Support Advanced Numeric Data Distributions (Normal, Exponential, Log-Normal) #4

@Sauvikn98

Description

Currently, our engine draws randomized numeric fields from a uniform distribution. That is fine for basic mock data, but many real-world scenarios call for non-uniform probability distributions, for example realistic user ages, dynamic pricing, or clustered activity timestamps.

This issue aims to introduce support for Normal, Exponential, and Log-Normal distributions for numeric fields during test data generation.

Tasks:

  • Update the TestDataConfig or the numeric SchemaField.constraints type definition to allow specifying a distribution property (e.g., "normal" | "exponential" | "log-normal") along with the parameters each distribution needs (mean and standard deviation for Normal and Log-Normal, lambda for Exponential).
  • Modify the data generator logic (likely within TestDataGeneratorService or our Faker utility wrappers) to calculate random values adhering to the provided distribution using our deterministic seedrandom instance.
  • Ensure generated values still respect existing field boundaries like min and max (for example by resampling or clamping out-of-range draws).
  • Add Vitest unit tests verifying that a large generated batch approximates the expected shape: a bell curve for Normal, a decay curve for Exponential, and a right-skewed curve for Log-Normal.
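
To make the first three tasks concrete, here is a minimal sketch of one possible shape. Everything in it is an assumption for illustration: the names `Distribution`, `NumericConstraints`, `Rng`, `drawUnbounded`, and `sampleNumber` are hypothetical and not taken from the codebase, and the uniform fallback only stands in for the existing behaviour. The `Rng` type is any function returning floats in [0, 1), which is what a generator created with `seedrandom(seed)` provides. Bounds are handled by resampling a limited number of times and then clamping; plain clamping or true truncated sampling would be equally valid choices.

```typescript
// Hypothetical constraint shape; the real SchemaField.constraints may differ.
type Distribution =
  | { kind: "normal"; mean: number; stdDev: number }
  | { kind: "exponential"; lambda: number }
  // For log-normal, mean and stdDev describe the underlying normal (the log of the value).
  | { kind: "log-normal"; mean: number; stdDev: number };

interface NumericConstraints {
  min?: number;
  max?: number;
  distribution?: Distribution; // undefined keeps the uniform behaviour
}

// Any source of floats in [0, 1), e.g. the function returned by seedrandom(seed).
type Rng = () => number;

// Standard normal draw via the Box-Muller transform.
function standardNormal(rng: Rng): number {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never receives 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// One draw from the requested distribution, ignoring min and max.
function drawUnbounded(d: Distribution, rng: Rng): number {
  switch (d.kind) {
    case "normal":
      return d.mean + d.stdDev * standardNormal(rng);
    case "exponential":
      // Inverse-CDF method: -ln(U) / lambda with U in (0, 1].
      return -Math.log(1 - rng()) / d.lambda;
    case "log-normal":
      return Math.exp(d.mean + d.stdDev * standardNormal(rng));
  }
}

// Draws a value that respects min and max: resample up to maxAttempts times,
// then clamp one final draw as a fallback so the function always terminates.
function sampleNumber(c: NumericConstraints, rng: Rng, maxAttempts = 100): number {
  if (!c.distribution) {
    // Assumed stand-in for the existing uniform path.
    const lo = c.min ?? 0;
    const hi = c.max ?? 1;
    return lo + rng() * (hi - lo);
  }
  const min = c.min ?? -Infinity;
  const max = c.max ?? Infinity;
  for (let i = 0; i < maxAttempts; i++) {
    const value = drawUnbounded(c.distribution, rng);
    if (value >= min && value <= max) return value;
  }
  return Math.min(max, Math.max(min, drawUnbounded(c.distribution, rng)));
}
```

With this shape, a user-age field might be configured as `{ min: 18, max: 90, distribution: { kind: "normal", mean: 35, stdDev: 12 } }`. Note that resampling keeps the bell shape inside the bounds, while clamping piles probability mass onto `min` and `max`, so the choice affects what the unit tests should expect.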

Where to start looking:

  • Check the types in TestDataConfig and SchemaField.
  • Look at how seedrandom is currently utilized for data generation to ensure your new math functions remain deterministic.
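
As a sketch of what "remain deterministic" could mean in practice: every random draw should come from the injected seeded generator and never from `Math.random()`. The example below is self-contained, so it uses a small linear congruential generator (`lcg`, a hypothetical stand-in) in place of seedrandom; in the real code the generator returned by `seedrandom(seed)` would be passed in instead. `sampleExponential`, `generateBatch`, and `mean` are likewise illustrative names, not existing functions.

```typescript
type Rng = () => number;

// Stand-in seeded generator (LCG with the Numerical Recipes constants).
// It only illustrates the contract: same seed, same sequence of floats in [0, 1).
function lcg(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Exponential draw via the inverse-CDF method; all randomness comes from rng.
function sampleExponential(lambda: number, rng: Rng): number {
  return -Math.log(1 - rng()) / lambda;
}

// Generates a batch from a fresh generator so the result depends only on the seed.
function generateBatch(seed: number, count: number, lambda: number): number[] {
  const rng = lcg(seed);
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    values.push(sampleExponential(lambda, rng));
  }
  return values;
}

// Sample mean, used to compare a large batch against the theoretical mean 1 / lambda.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```

Two properties are worth asserting in the Vitest suite: `generateBatch(seed, n, lambda)` returns an identical array when called twice with the same seed, and `mean(...)` of a large batch lands close to `1 / lambda` within a tolerance loose enough to avoid flaky tests.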
