feat(hash): seed-avalanche + structured-key tests; fix sparse-key collisions in mh #217
Merged
…lisions in mh

New SMHasher-inspired framework tests (typed, all algorithms):
- SeedAvalanche: flipping a single seed bit must flip ~half the output bits (descriptors gain kSeeded; skipped for the seedless `simple`).
- StructuredKeysAreDistinct: all-zero inputs of every length 0..256, single-bit keys (16B/64B, every position), and cyclic patterns must hash pairwise distinct (~1200 keys, birthday bound ~2^-44).

The structured-key test immediately found 5 real collisions in mh: the round `rotl(h ^ block, r) * M` absorbs raw input bits, and a single input bit that lands on bit 63 (or 62, for multipliers == 1 mod 4) after rotation survives the odd-multiplier lane multiply as a single-bit state difference, cancelling against the matching bit of the next block (bit32@block_k == bit63@block_k+1).

Fix (values change, per the stability contract): absorbed blocks are premultiplied by a full-width odd constant (kMulIn) so single-bit inputs become multi-bit before meeting the accumulator - the same defense xxh64/murmur3/xxh3 employ. Residual single-bit cases (input bits 62/63) are diffused by the rotate-before-multiply of the lane itself. Tails (<8 bytes) need no premultiplication: they cannot carry bits 62/63, so cancellation is impossible there - keeping the short-input path at full speed.

Also: README notes big-endian is correct by construction but not exercised by CI; TODO.md easy tier cleared.
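As a rough illustration of what a seed-avalanche check measures, the sketch below flips each seed bit in turn and averages how many output bits change. Everything in it is an assumption made for illustration: `Mix64` (a splitmix64-style finalizer), `ToySeededHash`, `WeakSeededHash`, the sample of 16 seeds, and the 24..40 acceptance band are hypothetical and are not taken from the framework's actual SeedAvalanche test.

```cpp
#include <bitset>
#include <cstdint>

// splitmix64-style finalizer, used here only as a stand-in mixer.
inline std::uint64_t Mix64(std::uint64_t z) {
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Hypothetical seeded hash of a single 64-bit key (not one of the six algorithms).
inline std::uint64_t ToySeededHash(std::uint64_t seed, std::uint64_t key) {
  return Mix64(key ^ Mix64(seed));
}

// Deliberately weak: a flipped seed bit flips exactly one output bit.
inline std::uint64_t WeakSeededHash(std::uint64_t seed, std::uint64_t key) {
  return seed ^ key;
}

// Average number of output bits that change when one seed bit is flipped,
// taken over 16 base seeds and all 64 seed-bit positions.
template <typename Hash>
double MeanSeedFlips(Hash hash, std::uint64_t key) {
  constexpr int kSeeds = 16;
  std::uint64_t total = 0;
  for (int s = 0; s < kSeeds; ++s) {
    const std::uint64_t seed =
        static_cast<std::uint64_t>(s) * 0x9E3779B97F4A7C15ULL;
    const std::uint64_t base = hash(seed, key);
    for (int bit = 0; bit < 64; ++bit) {
      const std::uint64_t flipped = hash(seed ^ (1ULL << bit), key);
      total += std::bitset<64>(base ^ flipped).count();
    }
  }
  return static_cast<double>(total) / (kSeeds * 64);
}

// "About half" of 64 output bits; the 24..40 band is an assumed tolerance.
template <typename Hash>
bool PassesSeedAvalanche(Hash hash, std::uint64_t key) {
  const double mean = MeanSeedFlips(hash, key);
  return mean > 24.0 && mean < 40.0;
}
```

With these definitions, `PassesSeedAvalanche(WeakSeededHash, key)` is false for any key, because every seed-bit flip changes exactly one output bit. A seedless algorithm has nothing to flip, which is presumably why a descriptor flag such as kSeeded is needed to skip it.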
Fab-Cat
approved these changes
Jul 4, 2026
This clears the easy tier of `mbo/hash/TODO.md`, and the new test paid off instantly.

New framework tests (typed, cover all six algorithms automatically)

- SeedAvalanche: flipping a single seed bit must flip ~half the output bits. Descriptors gain `kSeeded`; the seedless `simple` skips.
- StructuredKeysAreDistinct: all-zero inputs of every length 0..256, single-bit keys (16B/64B, every position), and cyclic patterns must hash pairwise distinct (~1200 keys, birthday bound ~2^-44).

The catch: real sparse-key collisions in `mh`

The structured-key test found 5 collisions among single-bit keys on its first run, e.g. `bit32@block_k == bit63@block_{k+1}`. Root cause: the round `rotl(h ^ block, r) * M` absorbs raw input bits; a single bit landing on bit 63 (or 62, for multipliers ≡ 1 mod 4) after rotation survives the odd-multiplier lane multiply as a single-bit state difference and cancels against the matching bit of the next block.

The fix (values change, per the stability contract)

Absorbed blocks are premultiplied by a full-width odd constant (`kMulIn`) so single-bit inputs become multi-bit before meeting the accumulator. This is the same defense xxh64/murmur3/xxh3 employ, and it is why they all pay an input multiply. Residual single-bit cases (input bits 62/63) are diffused by the rotate-before-multiply of the lane itself.

Tails are exempt: a <8-byte tail cannot carry bits 62/63, so cancellation is impossible there, and the short-input path keeps its speed. Cost elsewhere: +1 multiply per 8-byte block; the CI benchmark on this PR shows the updated cross-platform picture.
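The cancellation and the effect of premultiplying can be reproduced in a single-lane toy model. This is only a sketch under stated assumptions: `kToyMul`, `kToyMulIn`, `kToyRot`, and both `ToyHash*` functions are hypothetical stand-ins, not mh's real constants, rotation, or lane layout. The rotation 31 is picked so that input bit 32 lands on bit 63, matching the `bit32@block_k == bit63@block_{k+1}` pattern.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for illustration; NOT mh's real constants or rotation.
constexpr std::uint64_t kToyMul = 0x9E3779B185EBCA87ULL;    // odd lane multiplier (M)
constexpr std::uint64_t kToyMulIn = 0x9E3779B97F4A7C15ULL;  // odd input premultiplier
constexpr int kToyRot = 31;  // input bit 32 rotates onto bit 63

using TwoBlocks = std::array<std::uint64_t, 2>;

inline std::uint64_t Rotl(std::uint64_t x, int r) {
  return (x << r) | (x >> (64 - r));  // valid for 0 < r < 64
}

// Round without the fix: raw input bits are XORed straight into the state.
inline std::uint64_t ToyHashRaw(std::uint64_t seed, const TwoBlocks& key) {
  std::uint64_t h = seed;
  for (std::uint64_t block : key) {
    h = Rotl(h ^ block, kToyRot) * kToyMul;
  }
  return h;
}

// Round with the fix: each block is multiplied by an odd constant first,
// so a single low or middle input bit becomes a multi-bit pattern.
inline std::uint64_t ToyHashPremul(std::uint64_t seed, const TwoBlocks& key) {
  std::uint64_t h = seed;
  for (std::uint64_t block : key) {
    h = Rotl(h ^ (block * kToyMulIn), kToyRot) * kToyMul;
  }
  return h;
}

// Key A sets only bit 32 of block 0; key B sets only bit 63 of block 1.
const TwoBlocks kKeyA = {{1ULL << 32, 0}};
const TwoBlocks kKeyB = {{0, 1ULL << 63}};
```

In the raw round, key A's bit reaches bit 63 after the rotation. Because 2^63 times any odd multiplier is still 2^63 modulo 2^64, the two states differ only in bit 63 after block 0, and key B's bit 63 in block 1 cancels that difference, so `ToyHashRaw(seed, kKeyA) == ToyHashRaw(seed, kKeyB)` for every seed. With premultiplication, key A's block becomes a multi-bit value, the state difference is no longer a lone bit 63, and the two toy hashes differ.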
Also: README notes big-endian is correct by construction but not exercised by CI; TODO.md easy tier cleared.