Mathematically Grounded, Engineering-Strong Database Seeding Engine
Drawline Core is a production-grade TypeScript library for intelligent, deterministic test data generation across multiple database systems. It provides a unified interface for schema inference, relationship resolution, and referentially intact data seeding with strong mathematical guarantees on data consistency.
- Overview
- Features Achieved
- Technical Architecture
- Mathematical Foundations
- Usage
- Roadmap
- Development
Drawline addresses one of the most challenging problems in software engineering: generating realistic, referentially intact test data at scale across heterogeneous database systems. Traditional approaches rely on simple random generation or expensive database lookups to maintain foreign key integrity. Drawline uses a mathematically derived deterministic generation protocol that guarantees referential integrity without any database queries during generation.
Given:
- A database schema $S$ with collections $C = \{c_1, c_2, \ldots, c_n\}$
- Relationships $R = \{r_1, r_2, \ldots, r_m\}$ defining foreign key dependencies
- A generation seed $\sigma \in \mathbb{N}$

Generate documents such that:
- All foreign key references point to existing primary keys
- The generation is fully deterministic: $G(\sigma, c, i) \rightarrow d_i$
- No database queries are required during generation
Drawline implements a unified adapter pattern supporting 11+ database systems:
| Adapter | Status | Key Features |
|---|---|---|
| PostgreSQL | ✅ Complete | Schema inference, FK constraints, serial types, composite PKs |
| MySQL | ✅ Complete | AUTO_INCREMENT, foreign keys, composite PKs |
| SQLite | ✅ Complete | Embedded testing, full FK support |
| MongoDB | ✅ Complete | ObjectId generation, document embedding |
| DynamoDB | ✅ Complete | Partition keys, sort keys, GSI support |
| Firestore | ✅ Complete | Collection groups, subcollections |
| Redis | ✅ Complete | Key-value, sets, sorted sets |
| SQL Server | ✅ Complete | Identity columns, stored procedures |
| InMemory | ✅ Complete | Mock adapter for testing |
| Ephemeral | ✅ Complete | Transient data for demos |
| Null | ✅ Complete | No-op adapter |
| CSV Export | ✅ Complete | Export to CSV files |
- SchemaCollection: Represents tables/collections with fields, constraints, and metadata
- SchemaField: Supports 20+ field types including composite keys
- SchemaRelationship: One-to-one, one-to-many, many-to-many with composite FK support
- FieldConstraints: min, max, minLength, maxLength, pattern, enum, unique, nullPercentage
Smart field generation based on semantic naming:
```typescript
// Score-based inference system
const rules = [
  { tokens: ['email'], score: 10, generator: f => f.internet.email() },
  { tokens: ['first', 'name'], score: 8, generator: f => f.person.firstName() },
  { tokens: ['created', 'at'], score: 10, generator: f => f.date.past().toISOString() },
  // ... 80+ rules implemented
];
```

Supports:
- Tokenization (camelCase, snake_case, PascalCase)
- Negative token filtering
- Score-based best-match selection
- Perfect-match bonuses
- Caching for performance
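The tokenization step can be illustrated with a minimal sketch. The `tokenize` helper below is hypothetical (it is not taken from the library) and only shows one way to split camelCase, snake_case, and PascalCase names into lowercase tokens.

```typescript
// Hypothetical helper: split a field name into lowercase tokens.
function tokenize(fieldName: string): string[] {
  return fieldName
    // Boundary between a lowercase letter or digit and an uppercase letter: "createdAt" -> "created At"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // Boundary between an acronym and a following word: "HTTPServer" -> "HTTP Server"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // Underscores, hyphens, spaces, and other separators
    .split(/[^A-Za-z0-9]+/)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

console.log(tokenize("createdAt"));  // [ 'created', 'at' ]
console.log(tokenize("first_name")); // [ 'first', 'name' ]
console.log(tokenize("UserEmail"));  // [ 'user', 'email' ]
```

Normalizing every naming style to the same token list is what lets a single rule such as `['created', 'at']` cover `createdAt`, `created_at`, and `CreatedAt`.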
Mathematically sound topological sorting:
- Strong vs Weak Dependencies: Distinguishes required vs optional FKs
- Cycle Detection: DFS-based cycle detection with Tarjan's algorithm principles
- Cycle Breaking: Intelligent break-point selection prioritizing weak deps
- Level Assignment: BFS-based level propagation for parallel execution
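Level assignment can be sketched as a breadth-first propagation over an already acyclic dependency graph. The `Dependency` shape and the `assignLevels` function are assumptions made for illustration; collections on the same level have no dependencies on each other and could be generated in parallel.

```typescript
// Hypothetical shape: `from` depends on `to` (for example posts -> users).
type Dependency = { from: string; to: string };

function assignLevels(nodes: string[], deps: Dependency[]): Map<string, number> {
  const pending = new Map<string, number>();      // unresolved dependency count per node
  const dependents = new Map<string, string[]>(); // nodes that depend on the key
  for (const n of nodes) {
    pending.set(n, 0);
    dependents.set(n, []);
  }
  for (const d of deps) {
    pending.set(d.from, (pending.get(d.from) ?? 0) + 1);
    dependents.get(d.to)?.push(d.from);
  }

  const level = new Map<string, number>();
  let frontier = nodes.filter((n) => pending.get(n) === 0);
  for (const n of frontier) level.set(n, 0);

  // Process one frontier at a time; a node is released once all its dependencies are done.
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const n of frontier) {
      for (const child of dependents.get(n) ?? []) {
        level.set(child, Math.max(level.get(child) ?? 0, (level.get(n) ?? 0) + 1));
        const remaining = (pending.get(child) ?? 0) - 1;
        pending.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    frontier = next;
  }
  return level;
}

const levels = assignLevels(
  ["users", "posts", "comments"],
  [
    { from: "posts", to: "users" },
    { from: "comments", to: "posts" },
    { from: "comments", to: "users" },
  ]
);
console.log(levels); // users -> 0, posts -> 1, comments -> 2
```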
Cross-column dependency resolution:
- ColumnDependencyGraph: Topological sort of field dependencies
- Binary Constraints: minColumn, maxColumn, gtColumn, ltColumn
- Temporal Constraints: startDate, endDate for timestamps
- Numeric Constraints: min, max with automatic range adjustment
- String Constraints: minLength, maxLength, pattern, trim, case
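As an illustration of a binary constraint with automatic range adjustment, the sketch below generates numeric fields in dependency order and raises the lower bound of a field that carries `gtColumn`. The `NumericField` shape, `generateRow`, and the `mulberry32` PRNG are assumptions chosen for the example, not the library's implementation.

```typescript
// Hypothetical constraint shape; only `gtColumn` is modeled here.
type NumericField = { name: string; min: number; max: number; gtColumn?: string };

// Small deterministic PRNG (mulberry32) so the sketch is reproducible for a given seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fields must already be ordered so that a referenced column comes first.
function generateRow(fields: NumericField[], rand: () => number): Record<string, number> {
  const row: Record<string, number> = {};
  for (const f of fields) {
    let lo = f.min;
    if (f.gtColumn !== undefined) {
      const other = row[f.gtColumn];
      if (other === undefined) {
        throw new Error(`${f.gtColumn} must be generated before ${f.name}`);
      }
      lo = Math.max(lo, other + 1); // range adjustment: value must exceed the other column
    }
    if (lo > f.max) throw new Error(`No valid value for ${f.name}`);
    row[f.name] = lo + Math.floor(rand() * (f.max - lo + 1));
  }
  return row;
}

const constrainedFields: NumericField[] = [
  { name: "startDay", min: 1, max: 20 },
  { name: "endDay", min: 1, max: 30, gtColumn: "startDay" },
];
console.log(generateRow(constrainedFields, mulberry32(12345)));
```

Because the generator is seeded, calling `generateRow` twice with a fresh `mulberry32(12345)` yields the same row.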
Math-based ID generation eliminating database lookups:
ID(collection, index) = H(collection + index + sessionId + seed)
where H is SHA-256 truncated to a specific format:
- UUID: 8-4-4-4-12 hex format
- ObjectId: 24-char hex string
- Integer: index + startId + 1
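A minimal sketch of this scheme using Node's built-in `crypto` module is shown below. The function name `deterministicId`, the `:` separator between hash inputs, and the parameter order are assumptions for illustration. The UUID branch reproduces only the 8-4-4-4-12 hex layout and does not set RFC version or variant bits.

```typescript
import { createHash } from "node:crypto";

type IdFormat = "uuid" | "objectId" | "integer";

// Hypothetical helper: derive an ID purely from its inputs, with no database access.
function deterministicId(
  collection: string,
  index: number,
  sessionId: string,
  seed: number,
  format: IdFormat,
  startId = 0
): string | number {
  if (format === "integer") return index + startId + 1;

  // ASSUMPTION: inputs are joined with ":" to avoid ambiguous concatenations.
  const hex = createHash("sha256")
    .update(`${collection}:${index}:${sessionId}:${seed}`)
    .digest("hex");

  if (format === "objectId") return hex.slice(0, 24); // 24-char hex string

  // 8-4-4-4-12 hex layout taken from the first 32 hex characters
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

const firstUserId = deterministicId("users", 0, "session-1", 12345, "uuid");
const sameUserId = deterministicId("users", 0, "session-1", 12345, "uuid");
console.log(firstUserId === sameUserId); // true
```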
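This scheme guarantees that the same inputs always yield the same ID, so a foreign key can be recomputed without a database lookup.
Composite keys and FK chains are also supported: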
- Composite Primary Keys: Up to N fields per PK
- Composite Foreign Keys: Multi-column FK references
- FK Chaining: Resolves nested FK chains (A→B→C)
- Cached Resolution: Parent row caching for performance
Generates type-safe ORM code from schema:
| ORM | Status | Output |
|---|---|---|
| Prisma | ✅ Complete | schema.prisma |
| TypeORM | ✅ Complete | entities/*.ts |
| Drizzle | ✅ Complete | schema.ts |
| Mongoose | ✅ Complete | schemas/*.ts |
- Full Sync Mode: Destructive schema changes allowed
- Additive Mode: Safe migrations only
- DDL Generation: CREATE TABLE, ALTER TABLE, DROP TABLE
- Type Migration: Type widening detection
- Foreign Key Resolution: Constraint ordering
```bash
drawline init                                            # Initialize project
drawline gen --schema schema.json --config config.json   # Generate data
drawline validate                                        # Validate schema
drawline diff                                            # Show schema changes
```
Parallel generation for large datasets:
- Worker Threads: Native Node.js worker_threads
- Sharding: Deterministic range-based sharding
- Progress Callbacks: Real-time progress reporting
- Task Queue: FIFO scheduling with backpressure
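Deterministic range-based sharding can be sketched as splitting the index range `[0, total)` into contiguous, non-overlapping slices, one per worker. The `Shard` type and `shardRanges` function are hypothetical names used for illustration; because each document depends only on its index and the seed, every worker can generate its slice independently.

```typescript
// Half-open index range [start, end) assigned to one worker.
type Shard = { shardIndex: number; start: number; end: number };

function shardRanges(total: number, workers: number): Shard[] {
  if (workers < 1) throw new Error("workers must be at least 1");
  const shards: Shard[] = [];
  const base = Math.floor(total / workers);
  const extra = total % workers; // the first `extra` shards get one more item
  let start = 0;
  for (let w = 0; w < workers; w++) {
    const size = base + (w < extra ? 1 : 0);
    shards.push({ shardIndex: w, start, end: start + size });
    start += size;
  }
  return shards;
}

console.log(shardRanges(10, 3));
// [ { shardIndex: 0, start: 0, end: 4 },
//   { shardIndex: 1, start: 4, end: 7 },
//   { shardIndex: 2, start: 7, end: 10 } ]
```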
```text
TestDataGeneratorService

1. initialize(config, collections, relationships)
   ├── Preload metadata from target DB
   ├── Build relationship map
   └── Initialize seeded RNG

2. buildDependencyOrder()
   ├── Build DAG from relationships
   ├── Detect and break cycles
   └── Return topological sort

3. generateAndPopulate()
   ├── For each collection in order:
   │   ├── ensureCollection()
   │   ├── generateCollectionData()
   │   └── insertDocuments()
   └── Validate referential integrity
```
```typescript
abstract class BaseAdapter {
  // Connection management
  abstract connect(): Promise<void>;
  abstract disconnect(): Promise<void>;

  // Schema operations
  abstract collectionExists(name: string): Promise<boolean>;
  abstract ensureCollection(name: string, fields: SchemaField[]): Promise<void>;
  abstract getCollectionDetails(name: string): Promise<CollectionDetails>;
  abstract getCollectionSchema(name: string): Promise<SchemaField[]>;

  // Data operations
  abstract insertDocuments(
    collectionName: string,
    documents: GeneratedDocument[]
  ): Promise<(string | number)[]>;
  abstract clearCollection(name: string): Promise<void>;
  abstract getDocumentCount(name: string): Promise<number>;

  // Validation
  abstract validateReference(
    collectionName: string,
    fieldName: string,
    value: unknown
  ): Promise<boolean>;
}
```

```text
BaseAdapter
├── PostgresAdapter
├── MySQLAdapter
├── SQLiteAdapter
├── MongoDBAdapter
├── DynamoDBAdapter
├── FirestoreAdapter
├── RedisAdapter
├── SQLServerAdapter
├── InMemoryAdapter (for testing)
├── EphemeralAdapter (for demos)
├── NullAdapter (no-op)
└── CSVExportAdapter (export)
```
Problem: Given a DAG $G = (V, E)$, find a linear ordering of $V$ such that for every edge $(v, w) \in E$, $v$ appears before $w$.
Algorithm: Kahn's algorithm with in-degree counting:

```text
TOPOLOGICAL-SORT(G):
    Compute in-degree(v) for all v ∈ V
    Queue ← { v | in-degree(v) = 0 }
    result ← []
    while Queue not empty:
        v ← Queue.pop()
        result.append(v)
        for each edge (v, w):
            in-degree(w) ← in-degree(w) - 1
            if in-degree(w) = 0:
                Queue.push(w)
    return result
```

Complexity: $O(|V| + |E|)$
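The pseudocode translates fairly directly into TypeScript. In this sketch an edge `[v, w]` means that `v` must be generated before `w` (for example a parent collection before its child); the function name and the error on leftover vertices are choices made for the example.

```typescript
function topologicalSort(vertices: string[], edges: Array<[string, string]>): string[] {
  const inDegree = new Map<string, number>();
  const adjacency = new Map<string, string[]>();
  for (const v of vertices) {
    inDegree.set(v, 0);
    adjacency.set(v, []);
  }
  for (const [v, w] of edges) {
    adjacency.get(v)!.push(w);
    inDegree.set(w, inDegree.get(w)! + 1);
  }

  // The array doubles as a FIFO queue: `head` is the read position.
  const queue = vertices.filter((v) => inDegree.get(v) === 0);
  const result: string[] = [];
  for (let head = 0; head < queue.length; head++) {
    const v = queue[head];
    result.push(v);
    for (const w of adjacency.get(v)!) {
      const remaining = inDegree.get(w)! - 1;
      inDegree.set(w, remaining);
      if (remaining === 0) queue.push(w);
    }
  }

  // Vertices left with a positive in-degree lie on a cycle.
  if (result.length !== vertices.length) throw new Error("Graph contains a cycle");
  return result;
}

const generationOrder = topologicalSort(
  ["users", "posts", "comments"],
  [
    ["users", "posts"],
    ["posts", "comments"],
    ["users", "comments"],
  ]
);
console.log(generationOrder); // [ 'users', 'posts', 'comments' ]
```

Each vertex is enqueued once and each edge is examined once, which is where the linear bound in the number of vertices and edges comes from.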
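Theorem: For any collections $c_{child}$ and $c_{parent}$ linked by a foreign key, every generated foreign key value equals the primary key of a generated parent document.
Proof: Using the deterministic hash: $$id(c, i) = \text{hash}(\text{collection}_c \oplus i \oplus \sigma)_{\text{constrained}}$$
The FK resolution computes: $$fk(c_{child}, i) = id(c_{parent},\ i \bmod |c_{parent}|)$$
By substitution: since $0 \le i \bmod |c_{parent}| < |c_{parent}|$, the right-hand side is the ID of a parent document that is actually generated, so the reference resolves without any lookup.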
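The argument can be checked numerically. The sketch below assumes the parent index is chosen as the child index modulo the parent count, and it uses a simplified hash input (collection, index, seed); the helper names `id` and `foreignKey` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Simplified deterministic ID: 24 hex characters of SHA-256 over the inputs.
function id(collection: string, index: number, seed: number): string {
  return createHash("sha256")
    .update(`${collection}:${index}:${seed}`)
    .digest("hex")
    .slice(0, 24);
}

// ASSUMPTION: child row i references parent row (i mod parentCount).
function foreignKey(parent: string, parentCount: number, childIndex: number, seed: number): string {
  return id(parent, childIndex % parentCount, seed);
}

const seed = 12345;
const parentIds = new Set(Array.from({ length: 100 }, (_, i) => id("users", i, seed)));
const allResolve = Array.from({ length: 1000 }, (_, i) => foreignKey("users", 100, i, seed))
  .every((fk) => parentIds.has(fk));

console.log(allResolve); // true
```

The child side never reads the parent set; the set is built here only to verify the result.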
Theorem: Any finite directed graph can be made acyclic by removing a finite set of edges; each cycle requires removing at least one of its edges.
Algorithm: Modified DFS with cycle breaking:

```text
DETECT-CYCLE(G):
    visited ← ∅
    recursionStack ← ∅

    DFS(v):
        visited.add(v)
        recursionStack.add(v)
        for each neighbor u of v:
            if u ∉ visited:
                if DFS(u) return true
            else if u ∈ recursionStack:
                return CYCLE-DETECTED(v, u)
        recursionStack.delete(v)
        return false

    for each vertex v:
        if v ∉ visited:
            if DFS(v) return true
    return false
```
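A sketch of detection combined with cycle breaking is shown below, under the assumption that each edge carries a `required` flag (strong versus weak dependency). The names `Edge`, `findCycle`, and `breakCycles` are hypothetical; the sketch removes a weak edge from each detected cycle when one exists and otherwise falls back to the edge that closed the cycle.

```typescript
type Edge = { from: string; to: string; required: boolean };

// Returns the edges of one cycle, or null if the graph is acyclic.
function findCycle(nodes: string[], edges: Edge[]): Edge[] | null {
  const out = new Map<string, Edge[]>();
  for (const n of nodes) out.set(n, []);
  for (const e of edges) out.get(e.from)?.push(e);

  const visited = new Set<string>();
  const onStack = new Set<string>(); // nodes on the current DFS path
  const path: Edge[] = [];           // edges on the current DFS path

  const dfs = (v: string): Edge[] | null => {
    visited.add(v);
    onStack.add(v);
    for (const e of out.get(v) ?? []) {
      if (onStack.has(e.to)) {
        // Back edge: the cycle is the path segment starting at e.to, closed by e.
        const startIdx = path.findIndex((p) => p.from === e.to);
        return [...(startIdx >= 0 ? path.slice(startIdx) : []), e];
      }
      if (!visited.has(e.to)) {
        path.push(e);
        const found = dfs(e.to);
        if (found) return found;
        path.pop();
      }
    }
    onStack.delete(v);
    return null;
  };

  for (const n of nodes) {
    if (!visited.has(n)) {
      const found = dfs(n);
      if (found) return found;
    }
  }
  return null;
}

// Repeatedly remove one edge per detected cycle, preferring weak (non-required) edges.
function breakCycles(nodes: string[], edges: Edge[]): { kept: Edge[]; removed: Edge[] } {
  let kept = [...edges];
  const removed: Edge[] = [];
  for (let cycle = findCycle(nodes, kept); cycle !== null; cycle = findCycle(nodes, kept)) {
    const victim = cycle.find((e) => !e.required) ?? cycle[cycle.length - 1];
    kept = kept.filter((e) => e !== victim);
    removed.push(victim);
  }
  return { kept, removed };
}

const broken = breakCycles(
  ["users", "teams"],
  [
    { from: "teams", to: "users", required: true },  // teams.owner_id is required
    { from: "users", to: "teams", required: false }, // users.team_id is optional
  ]
);
console.log(broken.removed); // [ { from: 'users', to: 'teams', required: false } ]
```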
Breaking Strategy: When cycles are detected, prioritize removing weak dependencies (non-required FKs) to preserve data integrity.
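Problem: Given a field name $f$, select the generator rule $r$ that best matches it.
Algorithm: Score-based matching. Each candidate rule $r$ is scored against the tokens of $f$, where:
- $\text{match}(r, f) = 5$ if $|tokens(f)| = |tokens(r)|$ (perfect match)
- $\text{noise}(r, f) = 0.5 \times (|tokens(f)| - |tokens(r)|)$

Select the rule with the highest score.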
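A minimal sketch of score-based selection follows. It assumes that the final score is the rule's base score plus the perfect-match bonus minus the noise term; this additive combination, the `Rule` shape, and the rule named `generic.name` are assumptions for illustration. Generators are represented by string labels to keep the example self-contained.

```typescript
type Rule = { tokens: string[]; score: number; generator: string };

// Returns null when the rule does not apply to the field at all.
function scoreRule(rule: Rule, fieldTokens: string[]): number | null {
  if (!rule.tokens.every((t) => fieldTokens.includes(t))) return null;
  const match = fieldTokens.length === rule.tokens.length ? 5 : 0; // perfect-match bonus
  const noise = 0.5 * (fieldTokens.length - rule.tokens.length);   // penalty for extra tokens
  return rule.score + match - noise; // ASSUMPTION: terms are combined additively
}

function bestRule(rules: Rule[], fieldTokens: string[]): Rule | null {
  let best: Rule | null = null;
  let bestScore = -Infinity;
  for (const rule of rules) {
    const s = scoreRule(rule, fieldTokens);
    if (s !== null && s > bestScore) {
      best = rule;
      bestScore = s;
    }
  }
  return best;
}

const exampleRules: Rule[] = [
  { tokens: ["email"], score: 10, generator: "internet.email" },
  { tokens: ["first", "name"], score: 8, generator: "person.firstName" },
  { tokens: ["name"], score: 5, generator: "generic.name" }, // hypothetical rule
];

console.log(bestRule(exampleRules, ["first", "name"])?.generator); // person.firstName
console.log(bestRule(exampleRules, ["user", "email"])?.generator); // internet.email
```

For the tokens `first, name`, the two-token rule scores 8 + 5 - 0 = 13, while the single-token `name` rule scores 5 + 0 - 0.5 = 4.5, so the more specific rule wins.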
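For composite FKs with child fields $f_1, \ldots, f_k$ referencing parent fields $p_1, \ldots, p_k$, child row $i$ is resolved as follows:
- Select parent row index $r = i \bmod |parent|$
- Retrieve cached parent row $P[r]$
- For each component $f_j$: $$value[f_j] = P[r][p_j]$$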
This ensures all FK components reference the same parent row.
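The steps above can be sketched as a small function over a cached list of parent rows. The `Row` type, the `resolveCompositeFk` name, and the field names in the example are hypothetical.

```typescript
type Row = Record<string, string | number>;

function resolveCompositeFk(
  parentRows: Row[],
  childIndex: number,
  mapping: Array<{ childField: string; parentField: string }>
): Row {
  if (parentRows.length === 0) throw new Error("parent collection is empty");
  const r = childIndex % parentRows.length; // parent row index
  const parent = parentRows[r];             // cached parent row P[r]
  const values: Row = {};
  for (const { childField, parentField } of mapping) {
    values[childField] = parent[parentField]; // every component comes from the same row
  }
  return values;
}

const orders: Row[] = [
  { tenant_id: 1, order_no: 10 },
  { tenant_id: 1, order_no: 11 },
  { tenant_id: 2, order_no: 10 },
];
const itemFk = resolveCompositeFk(orders, 4, [
  { childField: "order_tenant_id", parentField: "tenant_id" },
  { childField: "order_no", parentField: "order_no" },
]);
console.log(itemFk); // { order_tenant_id: 1, order_no: 11 }
```

Picking each component independently could pair `tenant_id: 2` with `order_no: 11`, a combination that exists in no parent row; reading both from `P[r]` rules that out.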
For constraints like
Where
```bash
npm install @solvaratech/drawline-core
```

```typescript
import { TestDataGeneratorService } from "@solvaratech/drawline-core/server";
import { PostgresAdapter } from "@solvaratech/drawline-core/generator/adapters/PostgresAdapter";

// 1. Configure adapter
const adapter = new PostgresAdapter({
  connectionString: "postgres://user:pass@localhost:5432/mydb"
});
await adapter.connect();

// 2. Initialize service
const service = new TestDataGeneratorService(adapter);

// 3. Define schema
const collections = [
  {
    id: "users",
    name: "users",
    fields: [
      { id: "id", name: "id", type: "uuid", isPrimaryKey: true },
      { id: "email", name: "email", type: "string", required: true },
      { id: "name", name: "name", type: "string" }
    ]
  },
  {
    id: "posts",
    name: "posts",
    fields: [
      { id: "id", name: "id", type: "uuid", isPrimaryKey: true },
      { id: "user_id", name: "user_id", type: "uuid", isForeignKey: true,
        referencedCollectionId: "users" },
      { id: "title", name: "title", type: "string" }
    ]
  }
];

const relationships = [
  {
    id: "posts->users",
    fromCollectionId: "posts",
    toCollectionId: "users",
    type: "many-to-one",
    fromField: "user_id",
    toField: "id"
  }
];

// 4. Generate configuration
const config = {
  collections: [
    { collectionName: "users", count: 100 },
    { collectionName: "posts", count: 1000 }
  ],
  seed: 12345
};

// 5. Execute generation
const result = await service.generateAndPopulate(
  collections,
  relationships,
  config
);
console.log(`Generated ${result.totalDocumentsGenerated} documents`);
```

```typescript
import { computeSchemaDiff, generateDDL } from "@solvaratech/drawline-core/schema";

// Compare current schema with database
const diff = computeSchemaDiff(databaseSnapshot, newSchema, "additive");

// Generate migration SQL
const statements = generateDDL(diff);
for (const stmt of statements) {
  console.log(stmt.sql);
}
```

```typescript
import { PrismaGenerator } from "@solvaratech/drawline-core/generators/orm";

const generator = new PrismaGenerator();
const output = generator.generate(collections, relationships);
console.log(output.content); // Prisma schema.prisma content
```

- Enhanced Validation: Post-generation integrity validation
- Data masking: Sensitive data identification and redaction
- Incremental generation: Delta seeding for existing databases
- Distribution profiles: Normal, exponential, power-law distributions
- Relationship visualization: Draw relationship graphs
- Web UI Dashboard: Visual schema editor and generator interface
- Data Templates: Reusable generation templates
- Export formats: More export adapters (Excel, JSON Lines)
- Audit logging: Generation audit trail
- CI/CD integration: GitHub Actions, GitLab CI
- GraphQL API: REST/GraphQL API for remote generation
- Multi-tenant: Isolated multi-tenant support
- Enterprise features: SSO, RBAC, audit
- Cloud dashboard: SaaS management console
- Plug-in system: Third-party generator plugins
- Node.js 18+
- TypeScript 5.9+
- pnpm or npm
```bash
npm install
npm run build
```

```bash
# Run all tests
npm test
# Watch mode
npm run test:watch
# UI
npm run test:ui
# CI (with coverage)
npm run test:ci
```

```bash
npm run type-check
```

```bash
npm run cli:build
npm link # Link globally
drawline init
drawline gen --schema schema.json --config config.json
```

```typescript
// Main exports
export * from "./types/schemaDesign";    // Schema types
export * from "./types/schemaDiff";      // Diff types
export * from "./utils/schemaConverter"; // Converters
export * from "./utils/errorMessages";   // Errors
export * from "./schema";                // Schema engine
export * from "./generators/orm";        // ORM generators

// Server exports
export * from "./connections";           // Database connections
export * from "./generator";             // Generation engine
```

```typescript
interface SchemaCollection {
  id: string;
  name: string;
  fields: SchemaField[];
  schema?: string;
  dbName?: string;
  position?: { x: number; y: number };
}

interface SchemaField {
  id: string;
  name: string;
  type: FieldType;
  required?: boolean;
  isPrimaryKey?: boolean;
  isForeignKey?: boolean;
  isSerial?: boolean;
  compositePrimaryKeyIndex?: number;
  compositeKeyGroup?: string;
  referencedCollectionId?: string;
  foreignKeyTarget?: string;
  rawType?: string;
  arrayItemType?: string;
  defaultValue?: any;
  constraints?: FieldConstraints;
}

interface SchemaRelationship {
  id: string;
  fromCollectionId: string;
  toCollectionId: string;
  type: "one-to-one" | "one-to-many" | "many-to-many";
  fromField?: string;
  toField?: string;
  fromFields?: string[];
  toFields?: string[];
}

interface TestDataConfig {
  collections: CollectionConfig[];
  seed?: number | string;
  batchSize?: number;
  onProgress?: (progress: ProgressUpdate) => Promise<void>;
}
```

MIT License. See LICENSE file for details.
See CONTRIBUTING.md for development guidelines.
- GitHub Issues: https://github.com/solvaratech/drawline-core/issues
- Documentation: https://drawline.app/docs