@seebs (Contributor) commented on Nov 22, 2019

This adds "patterns" for use with joins and similar things, plus some cleanup such as removing the old copy of Apophenia.

We still had a reference to the old draft copy of apophenia. Removed it and
updated go.mod/go.sum.

The enumer package seems to have changed. I don't know why the
other thing also updated with it.

This adds a feature to allow creation of patterned data, with bitmasks
corresponding to the patterns being inserted in int-type fields.

The only currently supported pattern is "triangle", which generates
digits in a pattern like
	001012012301234[...]
up to a specified limit. It can also generate masks for columns
where the generated digit is equal to, greater than or equal to,
or less than a designated value, or marking the first time that values
up to N were generated in each run.
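
As a rough illustration of the triangle sequence described above, here is a minimal sketch. It is not the code from this PR: `triangleDigit` and `triangleString` are hypothetical names, and the sketch assumes that the "limit" is the length of the longest run and that the pattern restarts from the beginning once that run completes.

```go
package main

import "fmt"

// triangleDigit returns the digit at 0-based position i of the pattern
// 0 01 012 0123 ..., where the k-th run counts 0..k-1.
// Assumption: the pattern restarts after the run of length limit.
func triangleDigit(i, limit int) int {
	cycle := limit * (limit + 1) / 2 // positions in one full cycle
	i %= cycle
	run := 1
	for i >= run {
		i -= run // skip past the current run
		run++
	}
	return i
}

// triangleString renders the first n digits; only meaningful for limit <= 10,
// since each value is written as a single decimal digit.
func triangleString(n, limit int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = byte('0' + triangleDigit(i, limit))
	}
	return string(b)
}

func main() {
	fmt.Println(triangleString(15, 5)) // prints 001012012301234
}
```

With a limit of 5 the first 15 positions make up exactly one cycle, which matches the sample shown in the description.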

For N=10, this means that equal(0) is 10/55 of the input space,
while equal(9) is 1/55 of the input space, but distinct queries
against the values masked by them will produce exactly one value
either way; on the other hand, once(9) is 10/55 of the input space,
and denotes 10 distinct values.
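
The N=10 figures can be checked by brute force. The sketch below is an assumption-laden illustration, not the PR's implementation: `cycleDigits`, `equalMask`, `onceMask`, and `summarize` are hypothetical names, and `onceMask` assumes that once(v) marks the first position in a cycle at which each value up to v appears.

```go
package main

import "fmt"

// cycleDigits returns one full cycle of the triangle pattern:
// runs 0, 01, 012, ... ending with the run 0..limit-1.
func cycleDigits(limit int) []int {
	var out []int
	for run := 1; run <= limit; run++ {
		for d := 0; d < run; d++ {
			out = append(out, d)
		}
	}
	return out
}

// equalMask marks every position whose digit equals v.
func equalMask(digits []int, v int) []bool {
	mask := make([]bool, len(digits))
	for i, d := range digits {
		mask[i] = d == v
	}
	return mask
}

// onceMask marks the first position at which each digit <= v appears.
// Assumption: this is what "once" means in the description.
func onceMask(digits []int, v int) []bool {
	seen := make(map[int]bool)
	mask := make([]bool, len(digits))
	for i, d := range digits {
		if d <= v && !seen[d] {
			seen[d] = true
			mask[i] = true
		}
	}
	return mask
}

// summarize counts masked positions and the distinct digits under the mask.
func summarize(digits []int, mask []bool) (positions, distinct int) {
	vals := make(map[int]bool)
	for i, m := range mask {
		if m {
			positions++
			vals[digits[i]] = true
		}
	}
	return positions, len(vals)
}

func main() {
	d := cycleDigits(10)
	p, n := summarize(d, equalMask(d, 0))
	fmt.Printf("equal(0): %d/%d positions, %d distinct\n", p, len(d), n)
	p, n = summarize(d, equalMask(d, 9))
	fmt.Printf("equal(9): %d/%d positions, %d distinct\n", p, len(d), n)
	p, n = summarize(d, onceMask(d, 9))
	fmt.Printf("once(9): %d/%d positions, %d distinct\n", p, len(d), n)
}
```

Under these assumptions the counts come out as 10/55 with one distinct value, 1/55 with one distinct value, and 10/55 with ten distinct values, in line with the figures quoted above.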

Value computations were wrong for later splits (again!) because I was
dividing by the value of t.n when that wasn't actually relevant.

Made patterns independent; fields can specify the traits of their patterns,
or just inherit from a parent. This lets specific fields use
different pattern rules.
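
One way to picture the inherit-or-override behavior is the sketch below. Everything in it is hypothetical: the `patternSpec` type, its `Kind` and `Limit` traits, and the `resolve` helper are made up for illustration and are not taken from this PR's configuration format.

```go
package main

import "fmt"

// patternSpec is a hypothetical set of pattern traits. A nil pointer means
// "not set on this field, inherit from the parent".
type patternSpec struct {
	Kind  *string
	Limit *int
}

// resolve returns the child's traits, falling back to the parent's value
// for any trait the child leaves unset.
func resolve(parent, child patternSpec) patternSpec {
	out := child
	if out.Kind == nil {
		out.Kind = parent.Kind
	}
	if out.Limit == nil {
		out.Limit = parent.Limit
	}
	return out
}

func strPtr(s string) *string { return &s }
func intPtr(i int) *int       { return &i }

func main() {
	parent := patternSpec{Kind: strPtr("triangle"), Limit: intPtr(10)}
	field := patternSpec{Limit: intPtr(5)} // overrides only the limit
	r := resolve(parent, field)
	fmt.Println(*r.Kind, *r.Limit) // prints triangle 5
}
```

Pointer-typed traits are used here only so that "unset" can be told apart from a zero value; the real code may represent this differently.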

Pattern fields can now infer their maximum values, approximately; if no
maximum is specified, they'll guess their maximum range, although if
exponents are involved, they'll guess a bit high unless the column space is
close to filling out a cycle.