Feature: Improve Field Inference Engine Rules #2

@Sauvikn98

Description

The current FieldInferenceEngine.ts relies on a limited set of heuristic rules to infer field types, constraints, and semantic meaning from schema definitions. A TODO comment in the code explicitly highlights this limitation:

// TODO - add more rules ( FAST )

Due to the limited rule set:

  • Field type detection can be inaccurate or overly generic.
  • Constraints such as uniqueness, optionality, or semantic intent are often missed.
  • Inference quality drops significantly for real-world schemas that use inconsistent or domain-specific naming conventions.

This reduces both accuracy and coverage of the inference engine, especially as schema complexity grows.

Affected File

  • FieldInferenceEngine.ts

Current Behavior

The inference engine:

  • Applies a small number of heuristic rules.
  • Primarily depends on basic field-name pattern matching.
  • Lacks deeper semantic inference for:
      • IDs vs. references
      • Metadata fields (timestamps, soft deletes, audit fields)
      • Enums and constrained values
      • Domain-specific fields (e.g., finance, auth, analytics)

As a result:

  • Many fields fall back to default or generic types.
  • Downstream systems (data generation, validation, relationship detection) receive incomplete or suboptimal metadata.

Desired Solution

Expand and refine the heuristic rule set in FieldInferenceEngine to improve speed, accuracy, and coverage of field detection.

The enhanced inference engine should:

1. Improve Field Type Detection

Detect common field types more reliably based on naming patterns and context:

  • Identifiers: id, *_id, uuid, guid
  • Timestamps: created_at, updated_at, deleted_at, last_login
  • Booleans: is_*, has_*, can_*, *_enabled
  • Counters & metrics: count, total, views, score
  • Monetary values: amount, price, cost, balance
  • Textual content: description, bio, notes, comment
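
As an illustration only, the naming patterns above could be expressed as an ordered table of precompiled regular expressions. The type labels, `NAME_RULES`, and `inferFieldType` are hypothetical names and are not taken from FieldInferenceEngine.ts. Generalizing the timestamp examples to any `*_at` suffix is also an assumption.

```typescript
// Hypothetical type labels; the real engine may use different names.
type InferredType =
  | "identifier"
  | "timestamp"
  | "boolean"
  | "counter"
  | "money"
  | "text"
  | "unknown";

interface NameRule {
  type: InferredType;
  pattern: RegExp; // compiled once when the module loads
}

// Ordered list: the first matching rule wins.
const NAME_RULES: readonly NameRule[] = [
  { type: "identifier", pattern: /^(id|uuid|guid)$|_id$/ },
  // Assumption: any *_at field is a timestamp, not only the three listed.
  { type: "timestamp", pattern: /_at$|^last_login$/ },
  { type: "boolean", pattern: /^(is|has|can)_|_enabled$/ },
  { type: "counter", pattern: /^(count|total|views|score)$/ },
  { type: "money", pattern: /^(amount|price|cost|balance)$/ },
  { type: "text", pattern: /^(description|bio|notes|comment)$/ },
];

function inferFieldType(fieldName: string): InferredType {
  // Lowercasing handles USER_ID but not camelCase names such as createdAt.
  const name = fieldName.toLowerCase();
  for (const rule of NAME_RULES) {
    if (rule.pattern.test(name)) return rule.type;
  }
  return "unknown";
}
```

With this table, `inferFieldType("user_id")` yields `"identifier"` and `inferFieldType("is_active")` yields `"boolean"`. Names that match nothing fall through to `"unknown"` so that callers can still apply their existing defaults.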

2. Infer Constraints & Semantics

Add rules to infer constraints such as:

  • Nullable vs. required fields
  • Unique fields (e.g., email, username)
  • Length constraints (e.g., phone, otp, pin)
  • Enum-like fields (status, type, role, category)
  • Soft-delete indicators (deleted_at, is_deleted)
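
A minimal sketch of how these constraint rules might look as lookup tables. `InferredConstraints`, `inferConstraints`, and the set names are hypothetical. The `otp` and `pin` lengths are placeholder values for illustration, and treating only `deleted_at` as nullable is an assumption, not a rule stated in this issue.

```typescript
// Hypothetical result shape; not taken from FieldInferenceEngine.ts.
interface InferredConstraints {
  nullable: boolean;
  unique: boolean;
  enumLike: boolean;
  softDelete: boolean;
  maxLength?: number;
}

const UNIQUE_NAMES = new Set(["email", "username"]);
const ENUM_NAMES = new Set(["status", "type", "role", "category"]);
const SOFT_DELETE_NAMES = new Set(["deleted_at", "is_deleted"]);

// Placeholder limits for illustration only; real limits depend on the domain.
const MAX_LENGTHS = new Map<string, number>([
  ["phone", 15],
  ["otp", 6],
  ["pin", 4],
]);

function inferConstraints(fieldName: string): InferredConstraints {
  const name = fieldName.toLowerCase();
  const result: InferredConstraints = {
    // Assumption: a soft-delete timestamp stays null until the row is deleted.
    nullable: name === "deleted_at",
    unique: UNIQUE_NAMES.has(name),
    enumLike: ENUM_NAMES.has(name),
    softDelete: SOFT_DELETE_NAMES.has(name),
  };
  // Only set maxLength when a limit is known, so the key is otherwise absent.
  const maxLength = MAX_LENGTHS.get(name);
  if (maxLength !== undefined) result.maxLength = maxLength;
  return result;
}
```

Every check here is a single hash lookup on the lowercased name, which keeps the constraint pass cheap compared with scanning a list of patterns.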

3. Context-Aware Inference

Use surrounding context to improve accuracy:

  • Table name + field name correlation
  • Relationship hints (foreign keys, join tables)
  • Cross-field dependencies (e.g., start_date + end_date)
  • Composite patterns (min_* / max_*, from_* / to_*)
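
The context signals above might be combined roughly as follows. This is a sketch under stated assumptions: `TableContext`, `FieldRole`, and `inferRole` are hypothetical names, the singular form of a table name is approximated by stripping a trailing "s", and a field such as `orders.order_id` is assumed to be the table's own key.

```typescript
// Hypothetical context object: the table a field belongs to and its siblings.
interface TableContext {
  tableName: string;
  fieldNames: readonly string[];
}

type FieldRole =
  | "primary_key"
  | "foreign_key"
  | "range_start"
  | "range_end"
  | "plain";

// Composite prefix pairs taken from the list above.
const PAIRED_PREFIXES: ReadonlyArray<readonly [string, string]> = [
  ["start_", "end_"],
  ["min_", "max_"],
  ["from_", "to_"],
];

function inferRole(field: string, ctx: TableContext): FieldRole {
  if (field === "id") return "primary_key";

  if (field.endsWith("_id")) {
    // Table name + field name correlation, with naive singularization.
    const singular = ctx.tableName.replace(/s$/, "");
    return field === `${singular}_id` ? "primary_key" : "foreign_key";
  }

  // Cross-field dependency: a prefix only counts if its partner field exists.
  const siblings = new Set(ctx.fieldNames);
  for (const [startPrefix, endPrefix] of PAIRED_PREFIXES) {
    if (field.startsWith(startPrefix)) {
      const partner = endPrefix + field.slice(startPrefix.length);
      if (siblings.has(partner)) return "range_start";
    }
    if (field.startsWith(endPrefix)) {
      const partner = startPrefix + field.slice(endPrefix.length);
      if (siblings.has(partner)) return "range_end";
    }
  }
  return "plain";
}
```

Requiring the partner field avoids false positives: `min_qty` without a matching `max_qty` stays `"plain"`, while `start_date` next to `end_date` is recognized as a range.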

4. Performance Considerations (FAST)

Since inference runs frequently:

  • Rules should be lightweight and deterministic.
  • Avoid expensive string operations where possible.
  • Use prioritized rule evaluation with early exits.
  • Prefer precompiled regex or lookup tables.
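
One possible shape for a fast evaluation path: exact names resolve through a lookup table first, and precompiled patterns run only as a fallback, stopping at the first match. `FastType`, `EXACT_NAMES`, `FALLBACK_PATTERNS`, and `inferFast` are hypothetical names, and the rule contents are a small subset chosen for illustration.

```typescript
type FastType = "identifier" | "timestamp" | "boolean" | "unknown";

// Exact names are resolved with one Map lookup, before any regex runs.
const EXACT_NAMES: ReadonlyMap<string, FastType> = new Map<string, FastType>([
  ["id", "identifier"],
  ["uuid", "identifier"],
  ["guid", "identifier"],
  ["last_login", "timestamp"],
]);

// Compiled once at module load and listed in priority order.
const FALLBACK_PATTERNS: ReadonlyArray<readonly [RegExp, FastType]> = [
  [/_id$/, "identifier"],
  [/_at$/, "timestamp"],
  [/^(?:is|has|can)_|_enabled$/, "boolean"],
];

function inferFast(fieldName: string): FastType {
  const name = fieldName.toLowerCase();

  // Early exit 1: exact-name hit, no pattern matching needed.
  const exact = EXACT_NAMES.get(name);
  if (exact !== undefined) return exact;

  // Early exit 2: return on the first pattern that matches.
  for (const [pattern, type] of FALLBACK_PATTERNS) {
    if (pattern.test(name)) return type;
  }
  return "unknown";
}
```

None of the patterns use the `g` flag, so `RegExp.prototype.test` keeps no state between calls and the function stays deterministic for repeated inputs.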

5. Extensibility

Design the rule system so that:

  • New rules can be added without modifying core logic.
  • Rules can be ordered or weighted.
  • Domain-specific rule sets can be plugged in later (optional).
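
A rough sketch of a rule registry that would satisfy these points. `FieldInfo`, `InferenceRule`, `RuleRegistry`, and the example rules (including the finance `iban` rule) are all hypothetical and only show the shape of the idea, not the existing engine's API.

```typescript
interface FieldInfo {
  name: string;
}

// A rule returns a label when it applies and undefined when it does not.
interface InferenceRule {
  id: string;
  priority: number; // higher priority is evaluated first
  apply(field: FieldInfo): string | undefined;
}

class RuleRegistry {
  private rules: InferenceRule[] = [];

  // New rules are added here; the inference loop below never changes.
  register(...rules: InferenceRule[]): this {
    this.rules.push(...rules);
    this.rules.sort((a, b) => b.priority - a.priority);
    return this;
  }

  infer(field: FieldInfo): string {
    for (const rule of this.rules) {
      const label = rule.apply(field);
      if (label !== undefined) return label; // first applicable rule wins
    }
    return "unknown";
  }
}

// Core rules that ship with the engine (illustrative).
const coreRules: InferenceRule[] = [
  {
    id: "id-suffix",
    priority: 10,
    apply: (f) => (f.name.endsWith("_id") ? "identifier" : undefined),
  },
];

// An optional domain-specific rule set plugged in on top (illustrative).
const financeRules: InferenceRule[] = [
  {
    id: "iban",
    priority: 20,
    apply: (f) => (f.name === "iban" ? "bank_account" : undefined),
  },
];

const registry = new RuleRegistry()
  .register(...coreRules)
  .register(...financeRules);
```

Sorting on registration keeps `infer` a plain loop, and a domain rule set can override a core rule simply by registering with a higher priority.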

Expected Benefits

  • Higher-quality schema understanding
  • More accurate data generation
  • Better relationship and constraint detection
  • Reduced manual overrides
  • Improved support for real-world and legacy schemas

Acceptance Criteria

  • Field inference accuracy improves across diverse schemas
  • Common field patterns are reliably detected
  • Inference performance remains fast or improves
  • No regressions in existing inference behavior
  • Clear structure for adding future rules

Metadata

  • Labels: enhancement (New feature or request)
