
feat(hash): seed-avalanche + structured-key tests; fix sparse-key collisions in mh#217

Merged
helly25 merged 2 commits into main from hash/easy-tests
Jul 4, 2026
Conversation

@helly25 helly25 commented Jul 3, 2026

Owner

Clears the easy tier of mbo/hash/TODO.md, and the new structured-key test paid off immediately.

New framework tests (typed, cover all six algorithms automatically)

  • SeedAvalanche (SMHasher-style): flipping a single seed bit must flip ~half the output bits (0.45–0.55 for strong algorithms, reactive floor for weak ones). Descriptors gain kSeeded; the seedless simple skips.
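  • StructuredKeysAreDistinct: ~1200 degenerate keys (all-zero inputs of every length 0–256, single-bit keys of 16 B and 64 B with every bit position set in turn, and cyclic patterns) must hash pairwise distinct. For a 64-bit hash the birthday bound puts the chance of an accidental collision among ~1200 keys at ≈ 2⁻⁴⁴, so any collision indicates a real defect.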
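
As a rough illustration of the SeedAvalanche idea, the sketch below measures the average fraction of output bits that flip when one seed bit is flipped. `ToyHash`, `SeedAvalancheRatio`, `PopCount64` and `CheckToyHashAvalanche` are hypothetical names invented for this example. The toy hash is a SplitMix64-style finalizer standing in for a seeded algorithm, not one of the six algorithms under test, and the real typed test in the framework may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (std::popcount would need C++20).
inline int PopCount64(std::uint64_t v) {
  int n = 0;
  for (; v != 0; v &= v - 1) ++n;  // clears the lowest set bit each step
  return n;
}

// Stand-in seeded hash (SplitMix64-style finalizer over key ^ seed).
// Assumption for illustration only; NOT one of the algorithms under test.
inline std::uint64_t ToyHash(std::uint64_t key, std::uint64_t seed) {
  std::uint64_t z = (key ^ seed) + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Average fraction of the 64 output bits that flip when a single seed bit is
// flipped, taken over all 64 seed bit positions and `num_keys` keys (> 0).
template <typename Hash>
double SeedAvalancheRatio(Hash hash, int num_keys) {
  std::uint64_t flipped = 0;
  for (int k = 0; k < num_keys; ++k) {
    const std::uint64_t key =
        0x9E3779B97F4A7C15ULL * static_cast<std::uint64_t>(k + 1);
    const std::uint64_t seed =
        0x0123456789ABCDEFULL + static_cast<std::uint64_t>(k);
    const std::uint64_t base = hash(key, seed);
    for (int bit = 0; bit < 64; ++bit) {
      const std::uint64_t other = hash(key, seed ^ (1ULL << bit));
      flipped += static_cast<std::uint64_t>(PopCount64(base ^ other));
    }
  }
  return static_cast<double>(flipped) / (64.0 * 64.0 * num_keys);
}

// Check in the spirit of the test: a strong algorithm should land in the
// 0.45-0.55 band quoted above.
inline void CheckToyHashAvalanche() {
  const double ratio = SeedAvalancheRatio(ToyHash, 256);
  assert(ratio > 0.45 && ratio < 0.55);
}
```

A hash that ignores its seed yields a ratio of exactly 0, which is why a seedless algorithm has to skip this kind of test rather than fail it.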

The catch: real sparse-key collisions in mh

The structured-key test found 5 collisions among single-bit keys on its first run, e.g. bit32@block_k == bit63@block_{k+1}: a key whose only set bit is bit 32 of block k hashes equal to a key whose only set bit is bit 63 of block k+1. Root cause: the round rotl(h ^ block, r) * M absorbs raw input bits. A single bit that lands on bit 63 (or 62, for multipliers ≡ 1 mod 4) after rotation survives the odd-multiplier lane multiply as a single-bit state difference, which then cancels against the matching bit of the next block.
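
The cancellation can be reproduced with a toy version of the round. This is a sketch under assumptions, not the mh implementation: `kToyMul` is a stand-in odd multiplier, `kToyRot = 31` is chosen only because it makes bit 32 rotate onto bit 63 (matching the example pair), and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kToyMul = 0x9E3779B97F4A7C15ULL;  // stand-in odd M
constexpr int kToyRot = 31;  // assumed: rotates bit 32 onto bit 63

constexpr std::uint64_t ToyRotl(std::uint64_t x, int r) {
  return (x << r) | (x >> (64 - r));  // valid for 0 < r < 64
}

// Round shape quoted in the PR text: rotl(h ^ block, r) * M.
constexpr std::uint64_t ToyRound(std::uint64_t h, std::uint64_t block) {
  return ToyRotl(h ^ block, kToyRot) * kToyMul;
}

constexpr std::uint64_t kBit32 = 1ULL << 32;
constexpr std::uint64_t kBit63 = 1ULL << 63;

// A bit-63 difference survives an odd multiply unchanged, because
// 2^63 * odd == 2^63 (mod 2^64).
static_assert((0x1234ULL ^ kBit63) * kToyMul ==
                  ((0x1234ULL * kToyMul) ^ kBit63),
              "bit 63 survives an odd multiply as a single bit");

// Key A sets only bit 32 of block k; key B sets only bit 63 of block k+1.
// After both blocks are absorbed the two states are identical.
inline void CheckSparseKeyCollision(std::uint64_t h0) {
  const std::uint64_t a = ToyRound(ToyRound(h0, kBit32), 0);
  const std::uint64_t b = ToyRound(ToyRound(h0, 0), kBit63);
  assert(a == b);
}
```

The collision holds for every starting state `h0`, since rotation and multiplication by an odd constant are both bijections on 64-bit words and neither spreads the bit-63 difference.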

The fix (values change, per the stability contract)

Absorbed blocks are premultiplied by a full-width odd constant (kMulIn) so single-bit inputs become multi-bit before meeting the accumulator — the same defense xxh64/murmur3/xxh3 employ (and why they all pay an input multiply). Residual single-bit cases (input bits 62/63) are diffused by the rotate-before-multiply of the lane itself.
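
A minimal sketch of the premultiply defense, again on a toy round. The constants `kToyMulIn` and `kToyMulLane` and the rotation `kToyRotFix` are assumptions standing in for kMulIn, M and r, whose real values are not given here, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kToyMulIn = 0x9E3779B97F4A7C15ULL;    // stand-in kMulIn
constexpr std::uint64_t kToyMulLane = 0xBF58476D1CE4E5B9ULL;  // stand-in M
constexpr int kToyRotFix = 31;                                // assumed r

constexpr std::uint64_t RotlFix(std::uint64_t x, int r) {
  return (x << r) | (x >> (64 - r));  // valid for 0 < r < 64
}

// Toy round with the input premultiplied before it meets the accumulator.
constexpr std::uint64_t PremulRound(std::uint64_t h, std::uint64_t block) {
  return RotlFix(h ^ (block * kToyMulIn), kToyRotFix) * kToyMulLane;
}

inline int BitsSet(std::uint64_t v) {
  int n = 0;
  for (; v != 0; v &= v - 1) ++n;
  return n;
}

inline void CheckPremultiply(std::uint64_t h0) {
  // The pair that collides without premultiplication is now distinct.
  assert(PremulRound(PremulRound(h0, 1ULL << 32), 0) !=
         PremulRound(PremulRound(h0, 0), 1ULL << 63));
  // A low single-bit input becomes multi-bit after the premultiply.
  assert(BitsSet((1ULL << 32) * kToyMulIn) > 1);
  // Input bit 63 stays a single bit; so does bit 62 here, because this
  // stand-in constant is congruent to 1 mod 4. These are the residual cases.
  assert(BitsSet((1ULL << 63) * kToyMulIn) == 1);
  assert(BitsSet((1ULL << 62) * kToyMulIn) == 1);
}
```

With this stand-in constant the pair can never collide: equality would require the premultiplied bit-32 block to be a single bit, which happens only if the low 32 bits of the constant equal 1.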

Tails are exempt: a <8-byte tail cannot carry bits 62/63, so cancellation is impossible there — the short-input path keeps its speed. Cost elsewhere: +1 multiply per 8-byte block; the CI benchmark on this PR shows the updated cross-platform picture.
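
The tail argument can be checked directly, assuming the tail is loaded little-endian and zero-extended into a 64-bit word (the actual load in mh may differ). `LoadTail` is a hypothetical helper for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed tail load: fewer than 8 bytes, little-endian, zero-extended.
inline std::uint64_t LoadTail(const unsigned char* data, std::size_t len) {
  assert(len < 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < len; ++i) {
    v |= static_cast<std::uint64_t>(data[i]) << (8 * i);
  }
  return v;  // at most 56 significant bits, so bits 56..63 are always zero
}
```

Even a 7-byte tail of all 0xFF bytes loads as 0x00FFFFFFFFFFFFFF, so bits 62 and 63 of the raw tail word can never be set.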

Also: README notes big-endian is correct by construction but not exercised by CI; TODO.md easy tier cleared.

helly25 and others added 2 commits July 4, 2026 00:12
…lisions in mh

New SMHasher-inspired framework tests (typed, all algorithms):
- SeedAvalanche: flipping a single seed bit must flip ~half the output bits
  (descriptors gain kSeeded; skipped for the seedless `simple`).
- StructuredKeysAreDistinct: all-zero inputs of every length 0..256,
  single-bit keys (16B/64B, every position), and cyclic patterns must hash
  pairwise distinct (~1200 keys, birthday bound ~2^-44).

The structured-key test immediately found 5 real collisions in mh: the round
`rotl(h ^ block, r) * M` absorbs raw input bits, and a single input bit that
lands on bit 63 (or 62, for multipliers == 1 mod 4) after rotation survives
the odd-multiplier lane multiply as a single-bit state difference, cancelling
against the matching bit of the next block (bit32@block_k == bit63@block_k+1).

Fix (values change, per the stability contract): absorbed blocks are
premultiplied by a full-width odd constant (kMulIn) so single-bit inputs
become multi-bit before meeting the accumulator - the same defense xxh64/
murmur3/xxh3 employ. Residual single-bit cases (input bits 62/63) are
diffused by the rotate-before-multiply of the lane itself. Tails (<8 bytes)
need no premultiplication: they cannot carry bits 62/63, so cancellation is
impossible there - keeping the short-input path at full speed.

Also: README notes big-endian is correct by construction but not exercised
by CI; TODO.md easy tier cleared.
@helly25 helly25 requested a review from Fab-Cat July 4, 2026 08:25
@helly25 helly25 merged commit 80e6268 into main Jul 4, 2026
40 checks passed
@helly25 helly25 deleted the hash/easy-tests branch July 4, 2026 09:32