The current FieldInferenceEngine.ts relies on a limited set of heuristic rules to infer field types, constraints, and semantic meaning from schema definitions. A TODO comment in the code explicitly highlights this limitation:
// TODO - add more rules ( FAST )
Due to the limited rule set:
- Field type detection can be inaccurate or overly generic.
- Constraints such as uniqueness, optionality, or semantic intent are often missed.
- Inference quality drops significantly for real-world schemas that use inconsistent or domain-specific naming conventions.
This reduces both accuracy and coverage of the inference engine, especially as schema complexity grows.
Affected File
FieldInferenceEngine.ts
Current Behavior
The inference engine:
- Applies a small number of heuristic rules.
- Primarily depends on basic field-name pattern matching.
- Lacks deeper semantic inference for:
  - IDs vs references
  - Metadata fields (timestamps, soft deletes, audit fields)
  - Enums and constrained values
  - Domain-specific fields (e.g., finance, auth, analytics)
As a result:
- Many fields fall back to default or generic types.
- Downstream systems (data generation, validation, relationship detection) receive incomplete or suboptimal metadata.
Desired Solution
Expand and refine the heuristic rule set in FieldInferenceEngine to improve speed, accuracy, and coverage of field detection.
The enhanced inference engine should:
1. Improve Field Type Detection
Detect common field types more reliably based on naming patterns and context:
- Identifiers: id, *_id, uuid, guid
- Timestamps: created_at, updated_at, deleted_at, last_login
- Booleans: is_*, has_*, can_*, *_enabled
- Counters & metrics: count, total, views, score
- Monetary values: amount, price, cost, balance
- Textual content: description, bio, notes, comment
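The naming patterns above could be expressed as an ordered list of precompiled regular expressions. The following is a minimal sketch, not the engine's actual code: the `InferredType` labels, the `TYPE_RULES` table, and the `inferFieldType` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical type labels; the real engine may use a different vocabulary.
type InferredType =
  | "identifier"
  | "timestamp"
  | "boolean"
  | "counter"
  | "money"
  | "text"
  | "unknown";

interface TypeRule {
  type: InferredType;
  pattern: RegExp;
}

// Patterns taken from the list above. Order matters: the first match wins.
const TYPE_RULES: ReadonlyArray<TypeRule> = [
  { type: "identifier", pattern: /^(?:id|uuid|guid)$|_id$/ },
  { type: "timestamp", pattern: /_at$|^last_login$/ },
  { type: "boolean", pattern: /^(?:is|has|can)_|_enabled$/ },
  { type: "counter", pattern: /^(?:count|total|views|score)$/ },
  { type: "money", pattern: /^(?:amount|price|cost|balance)$/ },
  { type: "text", pattern: /^(?:description|bio|notes|comment)$/ },
];

// Returns the first matching type, or "unknown" when no rule applies.
function inferFieldType(fieldName: string): InferredType {
  const name = fieldName.trim().toLowerCase();
  for (const rule of TYPE_RULES) {
    if (rule.pattern.test(name)) {
      return rule.type;
    }
  }
  return "unknown";
}
```

For example, `inferFieldType("user_id")` yields `"identifier"` and `inferFieldType("is_active")` yields `"boolean"`, while a name that matches no rule falls through to `"unknown"`.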
2. Infer Constraints & Semantics
Add rules to infer constraints such as:
- nullable vs required
- unique fields (e.g., email, username)
- Length constraints (e.g., phone, otp, pin)
- Enum-like fields (status, type, role, category)
- Soft-delete indicators (deleted_at, is_deleted)
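One possible shape for constraint inference is a set of name lookups that fill in a constraints record. This is a sketch under stated assumptions: the `InferredConstraints` interface and `inferConstraints` function are hypothetical, the maximum lengths are placeholder values, and the nullability rule (only `deleted_at` is nullable) is a simplification chosen for illustration.

```typescript
// Hypothetical result shape; not the engine's actual metadata format.
interface InferredConstraints {
  nullable: boolean;
  unique: boolean;
  enumLike: boolean;
  softDelete: boolean;
  maxLength: number | undefined;
}

// Names taken from the list above.
const UNIQUE_NAMES = new Set<string>(["email", "username"]);
const ENUM_NAMES = new Set<string>(["status", "type", "role", "category"]);
const SOFT_DELETE_NAMES = new Set<string>(["deleted_at", "is_deleted"]);

// Placeholder lengths, assumed for illustration only.
const MAX_LENGTHS = new Map<string, number>([
  ["phone", 20],
  ["otp", 6],
  ["pin", 4],
]);

function inferConstraints(fieldName: string): InferredConstraints {
  const name = fieldName.trim().toLowerCase();
  return {
    // Assumption: a soft-delete timestamp stays null until the row is deleted;
    // every other field is treated as required in this sketch.
    nullable: name === "deleted_at",
    unique: UNIQUE_NAMES.has(name),
    enumLike: ENUM_NAMES.has(name),
    softDelete: SOFT_DELETE_NAMES.has(name),
    maxLength: MAX_LENGTHS.get(name),
  };
}
```

A real rule set would likely combine these name lookups with schema information (declared nullability, indexes) where it is available.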
3. Context-Aware Inference
Use surrounding context to improve accuracy:
- Table name + field name correlation
- Relationship hints (foreign keys, join tables)
- Cross-field dependencies (e.g., start_date + end_date)
- Composite patterns (min_* / max_*, from_* / to_*)
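Context-aware rules need to see more than one field name at a time. The sketch below illustrates two of the ideas above: pairing fields by prefix and guessing the table a `*_id` field refers to. The names `findFieldPairs` and `guessReferencedTable` are hypothetical, and the pluralization handling is deliberately naive (it only tries the stem and the stem plus "s").

```typescript
interface FieldPair {
  lower: string;
  upper: string;
}

// Prefix pairs from the list above; start_/end_ covers start_date + end_date.
const PAIR_PREFIXES: ReadonlyArray<readonly [string, string]> = [
  ["min_", "max_"],
  ["from_", "to_"],
  ["start_", "end_"],
];

// Finds fields that form a lower/upper pair within the same table.
function findFieldPairs(fieldNames: ReadonlyArray<string>): FieldPair[] {
  const lowered = fieldNames.map((n) => n.toLowerCase());
  const known = new Set<string>(lowered);
  const pairs: FieldPair[] = [];
  for (const name of lowered) {
    for (const [lowerPrefix, upperPrefix] of PAIR_PREFIXES) {
      if (!name.startsWith(lowerPrefix)) continue;
      const upper = upperPrefix + name.slice(lowerPrefix.length);
      if (known.has(upper)) pairs.push({ lower: name, upper });
    }
  }
  return pairs;
}

// Guesses which table a "*_id" field points at, or undefined if none fits.
function guessReferencedTable(
  fieldName: string,
  tableNames: ReadonlyArray<string>,
): string | undefined {
  const name = fieldName.toLowerCase();
  if (!name.endsWith("_id")) return undefined;
  const stem = name.slice(0, -3);
  if (stem.length === 0) return undefined;
  // Naive pluralization: only "stem" and "stem" + "s" are tried.
  return tableNames.find((table) => {
    const t = table.toLowerCase();
    return t === stem || t === stem + "s";
  });
}
```

With fields `start_date`, `end_date`, `min_age`, and `max_age`, `findFieldPairs` reports two pairs. `guessReferencedTable("user_id", ["orders", "users"])` resolves to `"users"`.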
4. Performance Considerations (FAST)
Since inference runs frequently:
- Rules should be lightweight and deterministic.
- Avoid expensive string operations where possible.
- Use prioritized rule evaluation with early exits.
- Prefer precompiled regex or lookup tables.
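One way to keep evaluation cheap is a two-tier lookup: an exact-name table checked first, then a short list of regular expressions compiled once at module load, returning on the first match. The sketch below assumes the caller has already normalized the name to lowercase; `EXACT_NAMES`, `PATTERN_RULES`, and `classifyFast` are hypothetical names, not part of the existing engine.

```typescript
// Tier 1: constant-time lookup for exact names.
const EXACT_NAMES: ReadonlyMap<string, string> = new Map<string, string>([
  ["id", "identifier"],
  ["uuid", "identifier"],
  ["guid", "identifier"],
  ["last_login", "timestamp"],
  ["count", "counter"],
  ["total", "counter"],
  ["amount", "money"],
  ["price", "money"],
]);

// Tier 2: patterns compiled once, never rebuilt per call.
// None of them uses the "g" flag, so test() keeps no state between calls.
const PATTERN_RULES: ReadonlyArray<{ pattern: RegExp; label: string }> = [
  { pattern: /_id$/, label: "identifier" },
  { pattern: /_at$/, label: "timestamp" },
  { pattern: /^(?:is|has|can)_|_enabled$/, label: "boolean" },
];

// Expects an already-lowercased name so normalization happens once upstream.
function classifyFast(normalizedName: string): string {
  const exact = EXACT_NAMES.get(normalizedName);
  if (exact !== undefined) return exact; // early exit, no regex work

  for (const rule of PATTERN_RULES) {
    if (rule.pattern.test(normalizedName)) return rule.label; // early exit
  }
  return "unknown";
}
```

Ordering `PATTERN_RULES` by how often each pattern is expected to match would shorten the average scan. Whether that is worthwhile depends on profiling against real schemas.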
5. Extensibility
Design the rule system so that:
- New rules can be added without modifying core logic.
- Rules can be ordered or weighted.
- Domain-specific rule sets can be plugged in later (optional).
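A small registry is one possible way to meet these goals: rules are plain objects, ordering comes from a numeric priority, and a domain rule set is just an array registered later. Everything in this sketch (`InferenceRule`, `RuleRegistry`, the priorities, and the `session_id` auth rule) is hypothetical and only illustrates the shape such a design might take.

```typescript
interface InferenceRule {
  name: string;
  priority: number; // higher priority is evaluated first
  match(fieldName: string): string | undefined;
}

class RuleRegistry {
  private rules: InferenceRule[] = [];

  // Adds rules without touching the evaluation logic below.
  register(...newRules: InferenceRule[]): this {
    this.rules.push(...newRules);
    this.rules.sort((a, b) => b.priority - a.priority);
    return this;
  }

  // Returns the label from the highest-priority rule that matches.
  infer(fieldName: string): string | undefined {
    const name = fieldName.toLowerCase();
    for (const rule of this.rules) {
      const label = rule.match(name);
      if (label !== undefined) return label;
    }
    return undefined;
  }
}

// A generic core rule.
const coreRules: InferenceRule[] = [
  {
    name: "id-suffix",
    priority: 10,
    match: (n) => (/_id$/.test(n) ? "identifier" : undefined),
  },
];

// A hypothetical auth-domain rule set that overrides the generic rule.
const authRules: InferenceRule[] = [
  {
    name: "session-id",
    priority: 50,
    match: (n) => (n === "session_id" ? "auth_token" : undefined),
  },
];

const registry = new RuleRegistry().register(...coreRules).register(...authRules);
```

Here `registry.infer("session_id")` returns `"auth_token"` because the auth rule outranks the generic suffix rule, while `registry.infer("user_id")` still returns `"identifier"`.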
Expected Benefits
- Higher-quality schema understanding
- More accurate data generation
- Better relationship and constraint detection
- Reduced manual overrides
- Improved support for real-world and legacy schemas
Acceptance Criteria