snapshotting for indexers#13
Open
mohammed-deepinfra wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The engine's KV-event ZMQ tee (
KvZmqPublisher) broadcasts KV-cache events on azmq.PUBsocket so a Dynamo standalone indexer can reconstruct the cache tree. PUB/SUB is lossy, so a subscriber that detects a sequence gap asks the replayROUTERsocket to resend fromstart_seq. Until now replay could only answer from the in-memory ring buffer (the lastbuffer_stepsbatches).The bug this fixes: when the requested
start_seqhad already been evicted from the ring, the old code streamed only whatever was left in the buffer — leaving a hole at the front. This is exactly what happens when an indexer cold-joins (empty tree, requests replay from seq 0) or falls behind the ring window. Every block whose parent landed in the missing range was then rejected withParentBlockNotFound, and the subscriber's tree collapsed in a cascade.The fix: when
start_seqpredates the ring buffer, the publisher now serves a full snapshot of its current tree instead of a hole.kv_snapshot: {block_hash -> (tokens_hash, parent_hash, depth)}, kept in lockstep with the live stream (BlockStoredadds,BlockRemoved/AllBlocksClearedprune). It lives on the same thread that owns the sockets and the buffer, so no lock is needed._service_replaynow branches:start_seqstill in the ring → incremental replay as before;start_seqbelow the ring floor →_send_snapshot._send_snapshotstreams a new replay sub-protocol: aSNAPSHOT_SEQ(-2) header carrying the snapshot versionS(= last published seq), then oneAllBlocksClearedbatch to reset the worker, then the whole tree as depth-orderedBlockStoredbatches (parents strictly before children), then the existingEND_SEQ(-1) sentinel. Orphans — a non-root block whose parent was evicted — are skipped rather than emitted (emitting one would be rejected withParentBlockNotFoundand strand its subtree); they are re-learned from the live stream._SNAPSHOT_BATCH_BLOCKS = 1000per batch. ROUTER's mute action is to drop silently, so one-event-per-frame would overrun the send HWM and truncate a large tree — the very failure this feature exists to prevent. The replay socket's HWM is also held>= buffer_steps.Coupled wire-format change (required to keep each snapshot node compact): the engine now sends one precomputed
tokens_hashper block instead of the rawtoken_idsarray. The hash is XXH3-64 withXXH3_SEED = 1337, computed to be byte-identical to dynamo'scompute_block_hash_for_seq(LoRA name folded into the seed; sorted multimodalmm_hashvalues appended). The router no longer hashes tokens. This shrinks the wire payload and lets eachkv_snapshotnode store a singleu64rather than a block's worth of token ids.publish_storedloses itstoken_ids/num_block_tokensparams and gainstokens_hashes.Test Coverage
_compute_block_tokens_hashmust be byte-identical to dynamo'scompute_block_hash_for_seq, or every indexer query misses. Guarded bytokens_hash_parity_test.pyagainst the dynamo binding (lives in the consumer/dynamo repo, since it tests both sides).buffer_steps=16to force the snapshot path): restarted all 3 cold-joining indexers (perfect/routing/reality). Each detected the gap (expected seq 0, got ~432793), requested replay from 0, received a snapshot of ~89.7k blocks, and rebuilt in < 1 s with zeroParentBlockNotFound, even while ~3,250 live events arrived during/after the rebuild. A functional prefix query after rebuild returned the full longest-match on all three. Runbook + captured logs:standalone_indexer/snapshot_e2e_test.md.