Constraint Engine Phase 1 #24

Merged
Sauvikn98 merged 3 commits into main from dev on Apr 19, 2026

@Sauvikn98
Contributor

No description provided.

Sauvikn98 and others added 3 commits April 17, 2026 11:49
…eneration

## Summary

Fixed the critical integration gap between WorkerPool and worker execution by implementing proper shard-based, range-aware generation. This enables deterministic parallel data generation where multiple workers can generate data concurrently while producing identical output to single-threaded generation.

## Changes

### Core Bug Fix: RNG State Management

**File: `src/generator/adapters/BaseAdapter.ts`**

Fixed a critical determinism bug in `generateStream()`: when skipping ahead to `rangeStart`, the RNG was advanced only once per skipped document instead of once per field. As a result, non-required fields (which use null-rate checks) could get different null/non-null values in sharded generation than in single-threaded generation.

- **Before:** Called `random()` once per skipped document
- **After:** Calls `random()` for every non-PK, non-FK, non-reference field per skipped document
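
To illustrate why the skip has to consume one draw per field, here is a minimal sketch. It is not the code in `BaseAdapter.ts`: the `mulberry32` PRNG, the `FIELDS` list, `NULL_RATE`, and the `generateDoc`/`generateRange` helpers are all assumptions made for illustration, and it assumes each field consumes exactly one `random()` call.

```typescript
// Hypothetical sketch; none of these names come from BaseAdapter.ts.
type Rng = () => number;

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Stand-ins for the non-PK, non-FK, non-reference fields of a collection.
const FIELDS = ["name", "nickname", "bio"];
const NULL_RATE = 0.3;

type SketchDoc = Record<string, string | null>;

// One draw per field decides null vs. non-null, mirroring a null-rate check.
function generateDoc(index: number, random: Rng): SketchDoc {
  const doc: SketchDoc = {};
  for (const field of FIELDS) {
    doc[field] = random() < NULL_RATE ? null : `${field}-${index}`;
  }
  return doc;
}

function generateRange(seed: number, rangeStart: number, count: number): SketchDoc[] {
  const random = mulberry32(seed);
  // Skip phase: consume as many draws as generating each skipped document would.
  for (let i = 0; i < rangeStart; i++) {
    for (let f = 0; f < FIELDS.length; f++) random();
  }
  const docs: SketchDoc[] = [];
  for (let i = rangeStart; i < rangeStart + count; i++) {
    docs.push(generateDoc(i, random));
  }
  return docs;
}
```

Under these assumptions, `generateRange(42, 6, 4)` yields the same documents as the last four entries of `generateRange(42, 0, 10)`. Advancing the RNG only once per skipped document would leave it in a different state and break that equality.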

### Worker Integration

**File: `src/generator/worker.cts`**

- Added shard parameter support (`collectionIndex`, `rangeStart`, `count`) to `start_generation` event
- Routes to `generateCollectionWithRange()` when shard info is provided
- Maintains backward compatibility with non-sharded generation

**File: `src/generator/index.ts`**

- Added `generateCollectionWithRange()` method that generates a single collection's data within a specific range
- Properly initializes adapter with seed and handles schema resolution
- Enables parallel worker execution with deterministic output per shard

### WorkerPool Enhancements

**File: `src/generator/WorkerPool.ts`**

- Added `ShardTask` interface for structured shard task description
- Added `createShardTasks()` function to generate tasks for all collection shards
- Added `processShardedGeneration()` method for parallel task execution across workers
- Enhanced `ShardRange` interface with `count` property
- Workers now receive exact `(start, count)` ranges instead of processing entire collections
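
The range division described above might look roughly like the following sketch. The `SketchShardRange` shape and the `computeShardRanges` helper are hypothetical, not the signatures in `WorkerPool.ts`. The sketch assumes the remainder is spread over the first shards and that no more shards are created than there are items.

```typescript
// Hypothetical shape; the real ShardRange interface may carry other fields.
interface SketchShardRange {
  start: number;
  count: number;
}

// Split `total` documents into contiguous (start, count) ranges.
function computeShardRanges(total: number, workers: number): SketchShardRange[] {
  if (total <= 0 || workers <= 0) return [];
  // Never create more shards than there are items.
  const shards = Math.min(workers, total);
  const base = Math.floor(total / shards);
  const remainder = total % shards;
  const ranges: SketchShardRange[] = [];
  let start = 0;
  for (let i = 0; i < shards; i++) {
    // The first `remainder` shards take one extra document each.
    const count = base + (i < remainder ? 1 : 0);
    ranges.push({ start, count });
    start += count;
  }
  return ranges;
}
```

With this scheme, 10 documents across 3 workers become the ranges `(0, 4)`, `(4, 3)`, and `(7, 3)`: contiguous, non-overlapping, and summing to the collection size.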

### Testing

**New File: `src/generator/WorkerPool.test.ts`**
- 13 tests covering shard computation, range division, and determinism
- Tests for edge cases: uneven division, more workers than items, contiguous ranges

**New File: `src/generator/deterministic.test.ts`**
- 9 tests proving single-threaded vs multi-worker output is identical
- Verifies IDs, names, emails all match exactly across 1, 2, and 4 worker configurations

**New File: `src/generator/range-determinism.test.ts`**
- 4 tests verifying range-based generation produces correct document counts and values
- Confirms first N and last N documents match single-threaded generation

## Verification

All 72 tests pass, including:
- 26 new tests for WorkerPool and deterministic generation
- All existing generator tests still pass
- TypeScript type-check passes

## Technical Details

**Deterministic Sharding Formula:**
```
Value = f(seed, collection_name, field_name, document_index)
```

Each worker's `rangeStart` parameter ensures it generates the exact same values as single-threaded generation for its assigned range, enabling:
- Horizontal scalability across CPU cores
- Identical output regardless of worker count
- No IPC overhead for data exchange (only coordination messages)
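
One way to read the formula is as a stateless function of its four inputs. The sketch below uses a different technique from the skip-ahead RNG described in this PR: it hashes the inputs with FNV-1a (applied to UTF-16 code units) to get a value in [0, 1). The names `fnv1a`, `valueFor`, and `generateField` are illustrative assumptions, not part of the codebase.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Value = f(seed, collection_name, field_name, document_index), mapped to [0, 1).
function valueFor(seed: number, collection: string, field: string, index: number): number {
  return fnv1a(`${seed}:${collection}:${field}:${index}`) / 4294967296;
}

// Generate one field for a contiguous range of document indices.
function generateField(
  seed: number,
  collection: string,
  field: string,
  rangeStart: number,
  count: number,
): number[] {
  const values: number[] = [];
  for (let i = rangeStart; i < rangeStart + count; i++) {
    values.push(valueFor(seed, collection, field, i));
  }
  return values;
}
```

Because each value depends only on its inputs, concatenating the outputs for ranges `(0, 5)` and `(5, 5)` gives the same array as a single pass over `(0, 10)`, whichever worker produced each range.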

## Breaking Changes

None - backward compatible with existing non-sharded usage.
…egistry

## Summary

Phase 2 Week 1: Built a comprehensive constraint type system and central constraint registry for drawline-core. This enables field-level and cross-column validation with support for multiple enforcement modes (strict, warn, retry, skip).

## Motivation

The existing `ConstraintEngine` was a prototype focused on dependency graph sorting. Production use requires:
- A typed constraint system with clear severity levels
- A central registry for constraint management
- Reusable validator functions for common constraint patterns
- Support for different enforcement modes
- Comprehensive test coverage

## Changes

### New Module: `src/generator/core/constraints/`

#### `types.ts` - Constraint Type System

Defined a complete type hierarchy for constraints:

- **ConstraintType**: Six categories covering all constraint scenarios
  - `field_validation` - Basic field constraints (min, max, pattern, enum)
  - `cross_column` - Constraints comparing multiple columns in same row
  - `cross_table` - Constraints spanning multiple collections
  - `temporal` - Date/time-based constraints
  - `conditional` - If-then-else business rules
  - `aggregation` - Computed aggregate constraints

- **ConstraintSeverity**: Three levels for reporting
  - `error` - Must be satisfied, generation halts on violation
  - `warning` - Should be satisfied, logs warning but continues
  - `info` - Informational only

- **ConstraintMode**: Enforcement strategies
  - `strict` - Throw error on violation
  - `warn` - Log warning and continue
  - `retry` - Re-generate violating documents (up to N attempts)
  - `skip` - Skip documents that violate

- **Core Interfaces**:
  - `ConstraintValidator<T>` - Generic validator interface
  - `ConstraintContext` - Runtime context (document, allDocuments, random)
  - `ValidationResult` - Individual constraint result
  - `ConstraintViolation` - Detailed violation report
  - `ConstraintReport` - Batch validation summary
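
As a rough picture of how these types could fit together, the sketch below declares the three unions using the values listed above. The interface members are assumptions and the actual `types.ts` may differ; only `valid` and `errorMessage` are taken from the usage examples in this description.

```typescript
type ConstraintType =
  | "field_validation"
  | "cross_column"
  | "cross_table"
  | "temporal"
  | "conditional"
  | "aggregation";

type ConstraintSeverity = "error" | "warning" | "info";
type ConstraintMode = "strict" | "warn" | "retry" | "skip";

// Assumed members, based on the "(document, allDocuments, random)" note above.
interface ConstraintContext {
  document: Record<string, unknown>;
  allDocuments: Record<string, unknown>[];
  random: () => number;
}

interface ValidationResult {
  valid: boolean;
  errorMessage?: string;
}

// Assumed members: a name plus type/severity metadata and a validate function.
interface ConstraintValidator<T = unknown> {
  name: string;
  type: ConstraintType;
  severity: ConstraintSeverity;
  validate(value: T, context: ConstraintContext): ValidationResult;
}

// Example validator written against the sketched interface.
const nonNegative: ConstraintValidator<number> = {
  name: "non_negative",
  type: "field_validation",
  severity: "error",
  validate: (value) =>
    value >= 0 ? { valid: true } : { valid: false, errorMessage: "Value must be >= 0" },
};
```

Passing `random` through the context, rather than calling `Math.random()` inside a validator, is what keeps any validator that needs randomness reproducible under a fixed seed.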

#### `ConstraintRegistry.ts` - Central Constraint Management

A registry pattern for managing constraints throughout the generation lifecycle:

**Core Operations:**
- `register(name, validator)` - Register named validators
- `registerFieldConstraint(field, validator)` - Field-specific validators
- `registerDocumentConstraint(validator)` - Row-level validators
- `unregister(name)` - Remove validators

**Validation Methods:**
- `validateDocument(doc)` - Single document validation
- `validateBatch(docs)` - Batch validation with retry logic
- `fromSchemaFields(fields)` - Auto-generate from schema definitions

**Advanced Features:**
- Priority-based validation ordering
- Retry mechanism with configurable attempts
- Constraint cloning for parallel execution
- Statistics tracking
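
The registry pattern with priority ordering and cloning can be sketched in a few lines. `MiniRegistry` below is a hypothetical, much-reduced stand-in for `ConstraintRegistry`. It assumes that higher priority values run first and that a check returns an error message or `null`.

```typescript
type RegistryDoc = Record<string, unknown>;

interface RegistryViolation {
  constraint: string;
  message: string;
}

interface RegistryEntry {
  name: string;
  priority: number;
  // Returns an error message on violation, or null when the document passes.
  check: (doc: RegistryDoc) => string | null;
}

class MiniRegistry {
  private entries = new Map<string, RegistryEntry>();

  register(name: string, check: RegistryEntry["check"], priority = 0): void {
    this.entries.set(name, { name, priority, check });
  }

  unregister(name: string): boolean {
    return this.entries.delete(name);
  }

  // Run every check, highest priority first, and collect all violations.
  validateDocument(doc: RegistryDoc): RegistryViolation[] {
    const ordered = Array.from(this.entries.values()).sort((a, b) => b.priority - a.priority);
    const violations: RegistryViolation[] = [];
    for (const entry of ordered) {
      const message = entry.check(doc);
      if (message !== null) violations.push({ constraint: entry.name, message });
    }
    return violations;
  }

  // Independent copy, e.g. one per worker, so registrations do not leak across.
  clone(): MiniRegistry {
    const copy = new MiniRegistry();
    this.entries.forEach((entry) => copy.entries.set(entry.name, { ...entry }));
    return copy;
  }
}
```

Cloning gives each worker its own registry, so unregistering a constraint on one copy leaves the others unchanged.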

**Validation Modes:**
```typescript
// Strict: Fail on first violation
registry.setMode("strict");

// Warn: Continue but log violations
registry.setMode("warn");

// Retry: Re-generate up to 3 times
registry.setMode("retry");
registry.setMaxRetries(3);

// Skip: Skip violating documents
registry.setMode("skip");
```
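
To make the difference between the four modes concrete, here is a hedged sketch of how a batch might be processed under each one. The `enforce` function and its option names are hypothetical. The behavior when retries are exhausted (drop the document and count a warning) is an assumption, since the description does not specify it.

```typescript
type EnforcementMode = "strict" | "warn" | "retry" | "skip";
type Row = Record<string, unknown>;

interface EnforceOptions {
  mode: EnforcementMode;
  maxRetries: number;
  isValid: (row: Row) => boolean;
  // Produces a replacement document for the given retry attempt (1-based).
  regenerate: (row: Row, attempt: number) => Row;
}

function enforce(rows: Row[], opts: EnforceOptions): { kept: Row[]; warnings: number } {
  const kept: Row[] = [];
  let warnings = 0;
  for (const original of rows) {
    let row = original;
    if (opts.isValid(row)) {
      kept.push(row);
      continue;
    }
    switch (opts.mode) {
      case "strict":
        // Fail the whole batch on the first violation.
        throw new Error("Constraint violated in strict mode");
      case "warn":
        // Keep the document but record the violation.
        warnings++;
        kept.push(row);
        break;
      case "skip":
        // Drop the violating document silently.
        break;
      case "retry": {
        let ok = false;
        for (let attempt = 1; attempt <= opts.maxRetries && !ok; attempt++) {
          row = opts.regenerate(row, attempt);
          ok = opts.isValid(row);
        }
        // Assumption: a document that still fails after all retries is dropped.
        if (ok) kept.push(row);
        else warnings++;
        break;
      }
    }
  }
  return { kept, warnings };
}
```

In this sketch, `retry` is the only mode that changes document contents, which matters for determinism: the regeneration step would need to draw from the same seeded source as the original generation.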

#### `validators/fieldValidators.ts` - Field-Level Validators

Built-in validators for common field constraints:

| Validator | Description |
|-----------|-------------|
| `createRangeValidator` | Numeric range with inclusive/exclusive bounds |
| `createStringLengthValidator` | String length constraints |
| `createPatternValidator` | Regex pattern matching |
| `createEnumValidator` | Allowed value set |
| `createEmailValidator` | Email format validation |
| `createUrlValidator` | URL format validation |
| `createNullRateValidator` | Maximum null percentage |
| `createUniqueValidator` | Uniqueness across batch |
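
As an illustration of the inclusive/exclusive bounds mentioned for `createRangeValidator`, the sketch below implements a standalone range check. The `checkRange` function and the `minInclusive`/`maxInclusive` option names are assumptions. The real validator's options may be named differently.

```typescript
interface RangeOptions {
  min?: number;
  max?: number;
  minInclusive?: boolean; // assumed option name; defaults to inclusive
  maxInclusive?: boolean; // assumed option name; defaults to inclusive
}

interface RangeResult {
  valid: boolean;
  errorMessage?: string;
}

function checkRange(field: string, value: unknown, opts: RangeOptions): RangeResult {
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { valid: false, errorMessage: `${field} must be a number` };
  }
  const { min, max, minInclusive = true, maxInclusive = true } = opts;
  // With an inclusive bound the boundary value passes; with an exclusive one it fails.
  if (min !== undefined && (minInclusive ? value < min : value <= min)) {
    return { valid: false, errorMessage: `${field} must be ${minInclusive ? ">=" : ">"} ${min}` };
  }
  if (max !== undefined && (maxInclusive ? value > max : value >= max)) {
    return { valid: false, errorMessage: `${field} must be ${maxInclusive ? "<=" : "<"} ${max}` };
  }
  return { valid: true };
}
```

With the defaults, `checkRange("price", 0, { min: 0, max: 1000 })` passes, while the same call with `minInclusive: false` fails because the boundary value is excluded.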

#### `validators/crossColumnValidators.ts` - Cross-Column Validators

Built-in validators for cross-field constraints:

| Validator | Description |
|-----------|-------------|
| `createCrossColumnValidator` | Comparison operators (eq, ne, gt, gte, lt, lte) |
| `createSumOfValidator` | Field = sum(fields) constraint |
| `createRatioOfValidator` | Ratio with tolerance |
| `createPercentageOfValidator` | Percentage with tolerance |
| `createConditionalValidator` | If-then-else business rules |
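
A sum-of constraint on floating-point fields usually needs a small tolerance, since `0.1 + 0.2` is not exactly `0.3` in IEEE 754 arithmetic. The sketch below shows one possible check. `checkSumOf` and its `tolerance` parameter are assumptions: the description only mentions tolerance for the ratio and percentage validators.

```typescript
interface SumResult {
  valid: boolean;
  errorMessage?: string;
}

// Checks that row[sumField] equals the sum of row[targetFields], within `tolerance`.
function checkSumOf(
  row: Record<string, unknown>,
  sumField: string,
  targetFields: string[],
  tolerance = 1e-9,
): SumResult {
  let sum = 0;
  for (const field of targetFields) {
    const value = row[field];
    if (typeof value !== "number") {
      return { valid: false, errorMessage: `${field} must be a number` };
    }
    sum += value;
  }
  const actual = row[sumField];
  if (typeof actual !== "number") {
    return { valid: false, errorMessage: `${sumField} must be a number` };
  }
  if (Math.abs(actual - sum) > tolerance) {
    return {
      valid: false,
      errorMessage: `${sumField} (${actual}) must equal ${targetFields.join(" + ")} (${sum})`,
    };
  }
  return { valid: true };
}
```

For example, a row with `subtotal: 10`, `tax: 2.5`, and `total: 12.5` passes, and so does `subtotal: 0.1`, `tax: 0.2`, `total: 0.3` thanks to the tolerance.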

### Testing

#### `ConstraintRegistry.test.ts` - 42 Tests

Comprehensive test coverage across:

**Registry Operations:**
- Registration and retrieval
- Field constraints
- Document constraints
- Cloning and clearing
- Statistics

**Validation Scenarios:**
- Single document validation
- Batch validation
- Multiple violations
- Schema field conversion

**Field Validators:**
- Range (inclusive/exclusive)
- String length
- Pattern matching
- Enum values
- Email format
- URL format

**Cross-Column Validators:**
- Comparison operators
- Sum constraints
- Ratio constraints
- Percentage constraints
- Conditional logic

**Integration Scenarios:**
- Complex business rules
- Multiple constraint types
- End-to-end validation

## Usage Examples

### Basic Field Validation
```typescript
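// Assumes ConstraintRegistry is exported from the same module as the validators.
import { ConstraintRegistry, createRangeValidator, createEnumValidator } from "./constraints";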

const registry = new ConstraintRegistry();
registry.registerFieldConstraint("price", createRangeValidator("price", { min: 0, max: 1000 }));
registry.registerFieldConstraint("status", createEnumValidator("status", ["active", "pending"]));

const violations = registry.validateDocument({ price: 50, status: "active" });
// violations = []
```

### Cross-Column Constraint
```typescript
// Reuses the `registry` instance created in the basic field validation example.
import { createCrossColumnValidator, createSumOfValidator } from "./constraints";

registry.registerFieldConstraint("discount", createCrossColumnValidator("discount", {
  sourceField: "discount",
  targetField: "price",
  operator: "lt",
}));

registry.registerFieldConstraint("total", createSumOfValidator("total", {
  targetFields: ["subtotal", "tax"],
  sumField: "total",
}));
```

### Conditional Validation
```typescript
// Reuses the `registry` instance created in the basic field validation example.
import { createConditionalValidator } from "./constraints";

registry.registerFieldConstraint("approved_at", createConditionalValidator("approved_at", {
  conditionField: "status",
  conditionOperator: "eq",
  conditionValue: "approved",
  thenField: "approved_at",
  thenConstraint: (value) => ({
    valid: value !== null,
    errorMessage: "Approved docs must have approved_at",
  }),
}));
```

### Batch Validation with Retry
```typescript
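// Reuses `registry` from the examples above; `documents` is the batch of generated documents to validate.
registry.setMode("retry");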
registry.setMaxRetries(3);

const report = registry.validateBatch(documents, { mode: "retry" });
console.log(report.totalViolations);
console.log(report.executionTimeMs);
```

## Breaking Changes

None - all changes are additive. Existing `ConstraintEngine` in `core/ConstraintEngine.ts` remains functional.

## Backward Compatibility

The `ConstraintRegistry.fromSchemaFields()` method automatically converts existing `FieldConstraints` definitions from `schemaDesign.ts` into constraint validators, ensuring smooth migration.
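
The conversion could work along the lines of the following sketch. The shape of `FieldConstraints` in `schemaDesign.ts` is not shown in this description, so `SketchFieldConstraints` only assumes the four basic keys named earlier (min, max, pattern, enum). `checksFromField` and `runChecks` are hypothetical helpers.

```typescript
// Assumed shape; the real FieldConstraints type may differ.
interface SketchFieldConstraints {
  min?: number;
  max?: number;
  pattern?: string;
  enum?: unknown[];
}

interface SketchField {
  name: string;
  constraints?: SketchFieldConstraints;
}

// A check returns an error message on violation, or null when the value passes.
type FieldCheck = (value: unknown) => string | null;

function checksFromField(field: SketchField): FieldCheck[] {
  const c = field.constraints || {};
  const checks: FieldCheck[] = [];
  if (c.min !== undefined) {
    const min = c.min;
    checks.push((v) => (typeof v === "number" && v >= min ? null : `${field.name} must be >= ${min}`));
  }
  if (c.max !== undefined) {
    const max = c.max;
    checks.push((v) => (typeof v === "number" && v <= max ? null : `${field.name} must be <= ${max}`));
  }
  if (c.pattern !== undefined) {
    const re = new RegExp(c.pattern);
    checks.push((v) => (typeof v === "string" && re.test(v) ? null : `${field.name} must match ${c.pattern}`));
  }
  if (c.enum !== undefined) {
    const allowed = c.enum;
    checks.push((v) => (allowed.indexOf(v) !== -1 ? null : `${field.name} must be one of the allowed values`));
  }
  return checks;
}

// Collect the messages of all failing checks for a value.
function runChecks(checks: FieldCheck[], value: unknown): string[] {
  return checks.map((check) => check(value)).filter((m): m is string => m !== null);
}
```

Each constraint key present on a field becomes one check, so a field with `{ min: 0, max: 1000 }` yields two checks and a field without constraints yields none.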

## Technical Notes

- All validators are pure functions for testability
- Random function injected via context for deterministic testing
- Supports synchronous validation, with room for async validators
- Memory-efficient batch processing with streaming support

## Performance

- O(n*m) batch validation where n=documents, m=constraints
- Constraint priority ordering runs higher-priority constraints first, so violations surface early
- Retry logic uses exponential backoff strategy

@Sauvikn98 changed the title from "Constrain Engine Phase 1" to "Constraint Engine Phase 1" on Apr 19, 2026

@Sauvikn98 merged commit 04b6257 into main on Apr 19, 2026

4 checks passed