…eneration
## Summary
Fixed the critical integration gap between WorkerPool and worker execution by implementing proper shard-based, range-aware generation. This enables deterministic parallel data generation where multiple workers can generate data concurrently while producing identical output to single-threaded generation.
## Changes
### Core Bug Fix: RNG State Management
**File: `src/generator/adapters/BaseAdapter.ts`**
Fixed critical determinism bug in `generateStream()` where `rangeStart` skipping only advanced the RNG once per document instead of once per field. This caused non-required fields (with null-rate checks) to have different null/non-null values between single-threaded and sharded generation.
**Before:** Only called `random()` once per skipped document
**After:** Calls `random()` for every non-PK, non-FK, non-reference field per skipped document
### Worker Integration
**File: `src/generator/worker.cts`**
- Added shard parameter support (`collectionIndex`, `rangeStart`, `count`) to `start_generation` event
- Routes to `generateCollectionWithRange()` when shard info is provided
- Maintains backward compatibility with non-sharded generation
**File: `src/generator/index.ts`**
- Added `generateCollectionWithRange()` method that generates a single collection's data within a specific range
- Properly initializes adapter with seed and handles schema resolution
- Enables parallel worker execution with deterministic output per shard
### WorkerPool Enhancements
**File: `src/generator/WorkerPool.ts`**
- Added `ShardTask` interface for structured shard task description
- Added `createShardTasks()` function to generate tasks for all collection shards
- Added `processShardedGeneration()` method for parallel task execution across workers
- Enhanced `ShardRange` interface with `count` property
- Workers now receive exact `(start, count)` ranges instead of processing entire collections
### Testing
**New File: `src/generator/WorkerPool.test.ts`**
- 13 tests covering shard computation, range division, and determinism
- Tests for edge cases: uneven division, more workers than items, contiguous ranges
**New File: `src/generator/deterministic.test.ts`**
- 9 tests proving single-threaded vs multi-worker output is identical
- Verifies IDs, names, emails all match exactly across 1, 2, and 4 worker configurations
**New File: `src/generator/range-determinism.test.ts`**
- 4 tests verifying range-based generation produces correct document counts and values
- Confirms first N and last N documents match single-threaded generation
## Verification
All 72 tests pass, including:
- 26 new tests for WorkerPool and deterministic generation
- All existing generator tests still pass
- TypeScript type-check passes
## Technical Details
**Deterministic Sharding Formula:**
```
Value = f(seed, collection_name, field_name, document_index)
```
Each worker's `rangeStart` parameter ensures it generates the exact same values as single-threaded generation for its assigned range, enabling:
- Horizontal scalability across CPU cores
- Identical output regardless of worker count
- No IPC overhead for data exchange (only coordination messages)
## Breaking Changes
None - backward compatible with existing non-sharded usage.
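The range splitting and RNG skipping described in this PR can be illustrated with a small sketch. It is not the project's code: `makeRng`, `computeShardRanges`, and `generateRange` are hypothetical names, the PRNG is a generic mulberry32-style generator, and the sketch assumes each field consumes exactly one `random()` draw per document.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the project's API.
type Rng = () => number;

// Small seeded PRNG (mulberry32-style) so that runs are reproducible.
function makeRng(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface ShardRange {
  start: number;
  count: number;
}

// Split `total` documents into contiguous ranges, at most one per worker.
// Never creates more shards than documents; earlier shards absorb the remainder.
function computeShardRanges(total: number, workers: number): ShardRange[] {
  const shards = Math.min(workers, total);
  const ranges: ShardRange[] = [];
  let start = 0;
  for (let i = 0; i < shards; i++) {
    const count = Math.floor(total / shards) + (i < total % shards ? 1 : 0);
    ranges.push({ start, count });
    start += count;
  }
  return ranges;
}

type GeneratedDoc = Record<string, number>;

function generateRange(seed: number, fields: string[], start: number, count: number): GeneratedDoc[] {
  const random = makeRng(seed);
  // Skipping must consume one draw per field per skipped document,
  // otherwise the RNG state diverges from single-threaded generation.
  for (let d = 0; d < start; d++) {
    for (let f = 0; f < fields.length; f++) random();
  }
  const docs: GeneratedDoc[] = [];
  for (let i = 0; i < count; i++) {
    const doc: GeneratedDoc = {};
    for (const field of fields) doc[field] = random();
    docs.push(doc);
  }
  return docs;
}

const fieldNames = ["name", "email", "age"];
const singleThreaded = generateRange(42, fieldNames, 0, 10);
const sharded: GeneratedDoc[] = [];
for (const range of computeShardRanges(10, 3)) {
  sharded.push(...generateRange(42, fieldNames, range.start, range.count));
}
console.log(JSON.stringify(singleThreaded) === JSON.stringify(sharded)); // true
```

Replacing the inner skip loop with a single `random()` call per skipped document reproduces the bug described under "Core Bug Fix": every shard after the first would start from the wrong RNG state.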
…egistry
## Summary
Phase 2 Week 1: Built a comprehensive constraint type system and central constraint registry for drawline-core. This enables field-level and cross-column validation with support for multiple enforcement modes (strict, warn, retry, skip).
## Motivation
The existing `ConstraintEngine` was a prototype focused on dependency graph sorting. Production use requires:
- A typed constraint system with clear severity levels
- A central registry for constraint management
- Reusable validator functions for common constraint patterns
- Support for different enforcement modes
- Comprehensive test coverage
## Changes
### New Module: `src/generator/core/constraints/`
#### `types.ts` - Constraint Type System
Defined a complete type hierarchy for constraints:
- **ConstraintType**: Six categories covering all constraint scenarios
  - `field_validation` - Basic field constraints (min, max, pattern, enum)
  - `cross_column` - Constraints comparing multiple columns in same row
  - `cross_table` - Constraints spanning multiple collections
  - `temporal` - Date/time-based constraints
  - `conditional` - If-then-else business rules
  - `aggregation` - Computed aggregate constraints
- **ConstraintSeverity**: Three levels for reporting
  - `error` - Must be satisfied, generation halts on violation
  - `warning` - Should be satisfied, logs warning but continues
  - `info` - Informational only
- **ConstraintMode**: Enforcement strategies
  - `strict` - Throw error on violation
  - `warn` - Log warning and continue
  - `retry` - Re-generate violating documents (up to N attempts)
  - `skip` - Skip documents that violate
- **Core Interfaces**:
  - `ConstraintValidator<T>` - Generic validator interface
  - `ConstraintContext` - Runtime context (document, allDocuments, random)
  - `ValidationResult` - Individual constraint result
  - `ConstraintViolation` - Detailed violation report
  - `ConstraintReport` - Batch validation summary
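As a rough illustration of how these types might fit together, here is a minimal sketch. The type names and the literal values of the unions come from the list above; every interface field (for example `name`, `type`, `severity`, and the `validate` signature) is an assumption rather than the actual contents of `types.ts`.

```typescript
// Hypothetical shapes: the names mirror the list above, the fields are assumed.
type ConstraintType =
  | "field_validation"
  | "cross_column"
  | "cross_table"
  | "temporal"
  | "conditional"
  | "aggregation";
type ConstraintSeverity = "error" | "warning" | "info";
type ConstraintMode = "strict" | "warn" | "retry" | "skip";

interface ConstraintContext {
  document: Record<string, unknown>;
  allDocuments: Record<string, unknown>[];
  random: () => number; // injected so tests can stay deterministic
}

interface ValidationResult {
  valid: boolean;
  errorMessage?: string;
}

interface ConstraintValidator<T = unknown> {
  name: string;
  type: ConstraintType;
  severity: ConstraintSeverity;
  validate(value: T, context: ConstraintContext): ValidationResult;
}

// Example validator: a price must not be negative.
const nonNegativePrice: ConstraintValidator<number> = {
  name: "price_non_negative",
  type: "field_validation",
  severity: "error",
  validate: (value) =>
    value >= 0
      ? { valid: true }
      : { valid: false, errorMessage: `price ${value} is negative` },
};

const sampleContext: ConstraintContext = {
  document: { price: -5 },
  allDocuments: [],
  random: () => 0.5,
};
const negativeResult = nonNegativePrice.validate(-5, sampleContext);
console.log(negativeResult.valid); // false
```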
#### `ConstraintRegistry.ts` - Central Constraint Management
A registry pattern for managing constraints throughout the generation lifecycle:
**Core Operations:**
- `register(name, validator)` - Register named validators
- `registerFieldConstraint(field, validator)` - Field-specific validators
- `registerDocumentConstraint(validator)` - Row-level validators
- `unregister(name)` - Remove validators
**Validation Methods:**
- `validateDocument(doc)` - Single document validation
- `validateBatch(docs)` - Batch validation with retry logic
- `fromSchemaFields(fields)` - Auto-generate from schema definitions
**Advanced Features:**
- Priority-based validation ordering
- Retry mechanism with configurable attempts
- Constraint cloning for parallel execution
- Statistics tracking
**Validation Modes:**
```typescript
// Strict: Fail on first violation
registry.setMode("strict");
// Warn: Continue but log violations
registry.setMode("warn");
// Retry: Re-generate up to 3 times
registry.setMode("retry");
registry.setMaxRetries(3);
// Skip: Skip violating documents
registry.setMode("skip");
```
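To make the difference between the four modes concrete, the sketch below applies each one to a batch. It is a standalone illustration, not the `ConstraintRegistry` implementation: `applyMode`, `Check`, and the `regenerate` callback are hypothetical, and dropping a document once its retries are exhausted is an assumption.

```typescript
// Hypothetical sketch of mode handling; not the actual ConstraintRegistry.
type Mode = "strict" | "warn" | "retry" | "skip";
type BatchDoc = Record<string, unknown>;
// A check returns a violation message, or null when the document is valid.
type Check = (doc: BatchDoc) => string | null;

function applyMode(
  docs: BatchDoc[],
  checks: Check[],
  mode: Mode,
  maxRetries = 3,
  regenerate: (doc: BatchDoc, attempt: number) => BatchDoc = (doc) => doc,
): { accepted: BatchDoc[]; violations: string[] } {
  const accepted: BatchDoc[] = [];
  const violations: string[] = [];
  const violationsOf = (doc: BatchDoc): string[] =>
    checks.map((check) => check(doc)).filter((m): m is string => m !== null);

  for (const original of docs) {
    let doc = original;
    let found = violationsOf(doc);
    if (mode === "retry") {
      // Re-generate until the document passes or the attempts run out.
      for (let attempt = 1; found.length > 0 && attempt <= maxRetries; attempt++) {
        doc = regenerate(doc, attempt);
        found = violationsOf(doc);
      }
    }
    if (found.length === 0) {
      accepted.push(doc);
      continue;
    }
    if (mode === "strict") throw new Error(found[0]); // fail on first violation
    violations.push(...found);
    if (mode === "warn") accepted.push(doc); // keep the document, report the problem
    // "skip" and exhausted "retry": the document is dropped (assumed behavior)
  }
  return { accepted, violations };
}

const positivePrice: Check = (doc) => {
  const price = doc["price"];
  return typeof price === "number" && price >= 0 ? null : "price must be >= 0";
};
const batch: BatchDoc[] = [{ price: 10 }, { price: -1 }];

const warned = applyMode(batch, [positivePrice], "warn");
const skipped = applyMode(batch, [positivePrice], "skip");
const retried = applyMode(batch, [positivePrice], "retry", 3, () => ({ price: 1 }));
console.log(warned.accepted.length, skipped.accepted.length, retried.accepted.length); // 2 1 2
```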
#### `validators/fieldValidators.ts` - Field-Level Validators
Built-in validators for common field constraints:
| Validator | Description |
|-----------|-------------|
| `createRangeValidator` | Numeric range with inclusive/exclusive bounds |
| `createStringLengthValidator` | String length constraints |
| `createPatternValidator` | Regex pattern matching |
| `createEnumValidator` | Allowed value set |
| `createEmailValidator` | Email format validation |
| `createUrlValidator` | URL format validation |
| `createNullRateValidator` | Maximum null percentage |
| `createUniqueValidator` | Uniqueness across batch |
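For a sense of what a factory like `createRangeValidator` could look like, here is a minimal sketch. The `(field, { min, max })` call shape follows the usage example in this description; the `minInclusive` and `maxInclusive` option names, the return type, and the error messages are assumptions, and the real validator may differ.

```typescript
// Hypothetical sketch; the real createRangeValidator may differ in shape.
interface RangeOptions {
  min?: number;
  max?: number;
  minInclusive?: boolean; // assumed option name, defaults to true
  maxInclusive?: boolean; // assumed option name, defaults to true
}

interface RangeResult {
  valid: boolean;
  errorMessage?: string;
}

type FieldValidator = (doc: Record<string, unknown>) => RangeResult;

function createRangeValidator(field: string, options: RangeOptions): FieldValidator {
  const { min, max, minInclusive = true, maxInclusive = true } = options;
  return (doc) => {
    const value = doc[field];
    if (typeof value !== "number" || Number.isNaN(value)) {
      return { valid: false, errorMessage: `${field} must be a number` };
    }
    // Inclusive bounds reject values strictly outside; exclusive bounds also reject the bound itself.
    if (min !== undefined && (minInclusive ? value < min : value <= min)) {
      return { valid: false, errorMessage: `${field} is below the allowed minimum ${min}` };
    }
    if (max !== undefined && (maxInclusive ? value > max : value >= max)) {
      return { valid: false, errorMessage: `${field} is above the allowed maximum ${max}` };
    }
    return { valid: true };
  };
}

const inclusivePrice = createRangeValidator("price", { min: 0, max: 1000 });
const exclusivePrice = createRangeValidator("price", { min: 0, minInclusive: false });
console.log(inclusivePrice({ price: 0 }).valid); // true
console.log(exclusivePrice({ price: 0 }).valid); // false
```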
#### `validators/crossColumnValidators.ts` - Cross-Column Validators
Built-in validators for cross-field constraints:
| Validator | Description |
|-----------|-------------|
| `createCrossColumnValidator` | Comparison operators (eq, ne, gt, gte, lt, lte) |
| `createSumOfValidator` | Field = sum(fields) constraint |
| `createRatioOfValidator` | Ratio with tolerance |
| `createPercentageOfValidator` | Percentage with tolerance |
| `createConditionalValidator` | If-then-else business rules |
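The comparison and sum validators in this table could be sketched roughly as follows. The option names (`sourceField`, `targetField`, `operator`, `targetFields`, `sumField`) follow the usage examples in this description; the return type, the numeric-only handling, and the `tolerance` option on the sum validator are assumptions.

```typescript
// Hypothetical sketch; not the actual crossColumnValidators.ts.
type Operator = "eq" | "ne" | "gt" | "gte" | "lt" | "lte";

interface CrossColumnResult {
  valid: boolean;
  errorMessage?: string;
}

type DocValidator = (doc: Record<string, unknown>) => CrossColumnResult;

const compare: Record<Operator, (a: number, b: number) => boolean> = {
  eq: (a, b) => a === b,
  ne: (a, b) => a !== b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

function createCrossColumnValidator(
  name: string,
  opts: { sourceField: string; targetField: string; operator: Operator },
): DocValidator {
  return (doc) => {
    const a = doc[opts.sourceField];
    const b = doc[opts.targetField];
    if (typeof a !== "number" || typeof b !== "number") {
      return { valid: false, errorMessage: `${name}: both fields must be numbers` };
    }
    return compare[opts.operator](a, b)
      ? { valid: true }
      : { valid: false, errorMessage: `${name}: ${opts.sourceField} must be ${opts.operator} ${opts.targetField}` };
  };
}

function createSumOfValidator(
  name: string,
  opts: { targetFields: string[]; sumField: string; tolerance?: number },
): DocValidator {
  const tolerance = opts.tolerance ?? 1e-9; // assumed: absorbs floating-point rounding
  return (doc) => {
    const total = doc[opts.sumField];
    if (typeof total !== "number") {
      return { valid: false, errorMessage: `${name}: ${opts.sumField} must be a number` };
    }
    let sum = 0;
    for (const field of opts.targetFields) {
      const part = doc[field];
      if (typeof part !== "number") {
        return { valid: false, errorMessage: `${name}: ${field} must be a number` };
      }
      sum += part;
    }
    return Math.abs(total - sum) <= tolerance
      ? { valid: true }
      : { valid: false, errorMessage: `${name}: ${opts.sumField} must equal the sum of ${opts.targetFields.join(", ")}` };
  };
}

const discountBelowPrice = createCrossColumnValidator("discount", {
  sourceField: "discount",
  targetField: "price",
  operator: "lt",
});
const totalIsSum = createSumOfValidator("total", { targetFields: ["subtotal", "tax"], sumField: "total" });
console.log(discountBelowPrice({ discount: 10, price: 100 }).valid); // true
console.log(totalIsSum({ subtotal: 90, tax: 10, total: 100 }).valid); // true
```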
### Testing
#### `ConstraintRegistry.test.ts` - 42 Tests
Comprehensive test coverage across:
**Registry Operations:**
- Registration and retrieval
- Field constraints
- Document constraints
- Cloning and clearing
- Statistics
**Validation Scenarios:**
- Single document validation
- Batch validation
- Multiple violations
- Schema field conversion
**Field Validators:**
- Range (inclusive/exclusive)
- String length
- Pattern matching
- Enum values
- Email format
- URL format
**Cross-Column Validators:**
- Comparison operators
- Sum constraints
- Ratio constraints
- Percentage constraints
- Conditional logic
**Integration Scenarios:**
- Complex business rules
- Multiple constraint types
- End-to-end validation
## Usage Examples
### Basic Field Validation
```typescript
// Assumes ConstraintRegistry is exported from the same module as the validators.
import { ConstraintRegistry, createRangeValidator, createEnumValidator } from "./constraints";
const registry = new ConstraintRegistry();
registry.registerFieldConstraint("price", createRangeValidator("price", { min: 0, max: 1000 }));
registry.registerFieldConstraint("status", createEnumValidator("status", ["active", "pending"]));
const violations = registry.validateDocument({ price: 50, status: "active" });
// violations = []
```
### Cross-Column Constraint
```typescript
import { createCrossColumnValidator, createSumOfValidator } from "./constraints";
registry.registerFieldConstraint("discount", createCrossColumnValidator("discount", {
  sourceField: "discount",
  targetField: "price",
  operator: "lt",
}));
registry.registerFieldConstraint("total", createSumOfValidator("total", {
  targetFields: ["subtotal", "tax"],
  sumField: "total",
}));
```
### Conditional Validation
```typescript
import { createConditionalValidator } from "./constraints";
registry.registerFieldConstraint("approved_at", createConditionalValidator("approved_at", {
  conditionField: "status",
  conditionOperator: "eq",
  conditionValue: "approved",
  thenField: "approved_at",
  thenConstraint: (value) => ({
    valid: value !== null,
    errorMessage: "Approved docs must have approved_at",
  }),
}));
```
### Batch Validation with Retry
```typescript
registry.setMode("retry");
registry.setMaxRetries(3);
const report = registry.validateBatch(documents, { mode: "retry" });
console.log(report.totalViolations);
console.log(report.executionTimeMs);
```
## Breaking Changes
None - all changes are additive. Existing `ConstraintEngine` in `core/ConstraintEngine.ts` remains functional.
## Backward Compatibility
The `ConstraintRegistry.fromSchemaFields()` method automatically converts existing `FieldConstraints` definitions from `schemaDesign.ts` into constraint validators, ensuring smooth migration.
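The conversion idea can be sketched as below. This is a guess at the general approach only: the `FieldConstraints` shape shown here (`min`, `max`, `pattern`, `enum`) is inferred from the `field_validation` category and is not copied from `schemaDesign.ts`, and `checksFromSchemaFields`, `SchemaField`, and `ValueCheck` are hypothetical names.

```typescript
// Hypothetical shapes; the real FieldConstraints in schemaDesign.ts may differ.
interface FieldConstraints {
  min?: number;
  max?: number;
  pattern?: string;
  enum?: unknown[];
}

interface SchemaField {
  name: string;
  constraints?: FieldConstraints;
}

// A check returns a violation message, or null when the value is valid.
type ValueCheck = (value: unknown) => string | null;

function checksFromSchemaFields(fields: SchemaField[]): Map<string, ValueCheck[]> {
  const byField = new Map<string, ValueCheck[]>();
  for (const field of fields) {
    const constraints = field.constraints;
    if (!constraints) continue; // unconstrained fields produce no validators
    const { min, max, pattern } = constraints;
    const allowed = constraints.enum;
    const checks: ValueCheck[] = [];
    if (min !== undefined) {
      checks.push((v) => (typeof v === "number" && v >= min ? null : `${field.name} must be >= ${min}`));
    }
    if (max !== undefined) {
      checks.push((v) => (typeof v === "number" && v <= max ? null : `${field.name} must be <= ${max}`));
    }
    if (pattern !== undefined) {
      const regex = new RegExp(pattern);
      checks.push((v) => (typeof v === "string" && regex.test(v) ? null : `${field.name} must match ${pattern}`));
    }
    if (allowed !== undefined) {
      checks.push((v) => (allowed.indexOf(v) !== -1 ? null : `${field.name} must be one of the allowed values`));
    }
    if (checks.length > 0) byField.set(field.name, checks);
  }
  return byField;
}

const schemaChecks = checksFromSchemaFields([
  { name: "price", constraints: { min: 0, max: 1000 } },
  { name: "status", constraints: { enum: ["active", "pending"] } },
  { name: "notes" },
]);
console.log(schemaChecks.has("price"), schemaChecks.has("notes")); // true false
```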
## Technical Notes
- All validators are pure functions for testability
- Random function injected via context for deterministic testing
- Supports both synchronous and potential async validation
- Memory-efficient batch processing with streaming support
## Performance
- O(n*m) batch validation where n=documents, m=constraints
- Constraint priority ordering minimizes early failures
- Retry logic uses exponential backoff strategy
Phase2 week1 constraint engine