used a wrapper on dynamo local kv indexer in the engine pod for snapshotting + replay #14

Open
mohammed-deepinfra wants to merge 1 commit into deep-main-v1.3.0rc16 from mohammed/kv-local-indexer-worker

@mohammed-deepinfra

[None][feat] In-process KV local indexer for ZMQ event recovery (ring-buffer + snapshot over HTTP)

Description

The KV-cache ZMQ tee (KvZmqPublisher) previously backed recovery with a fixed-window Python deque plus a ZMQ ROUTER replay socket (:5558). That design had two problems: it could only replay a bounded recent window (no full snapshot for a consumer that joins late or after eviction), and the deque path had known failure modes under large gaps (dropped-root cascade, ROUTER-HWM truncation).

This PR replaces that machinery with Dynamo's LocalKvIndexer (radix tree + replay ring buffer + tree snapshot), embedded in-process via a thin PyO3 wrapper crate. Every event the tee publishes is also fed into the indexer, keyed by event_id == seq (the same ZMQ batch sequence), so a consumer keeps detecting gaps on seq exactly as before.
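
A minimal consumer-side sketch of the gap detection described above, assuming the consumer keeps one watermark per worker. `SeqTracker` and its method are hypothetical names for illustration, not part of this PR, and whether the recovery range is inclusive at both ends is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SeqTracker:
    """Tracks the last applied seq for one worker (hypothetical helper)."""

    watermark: Optional[int] = None  # last seq applied; None before the first batch

    def observe(self, seq: int) -> Optional[Tuple[int, int]]:
        """Return the missing (start, end) range, or None if nothing is missing.

        The range is treated as inclusive here; that is an assumption.
        """
        if self.watermark is None or seq == self.watermark + 1:
            # First batch seen, or the next batch in order: just advance.
            self.watermark = seq
            return None
        if seq <= self.watermark:
            # Duplicate or stale batch: nothing to recover.
            return None
        # Gap: leave the watermark alone until recovery has filled the hole.
        return (self.watermark + 1, seq - 1)
```

On a gap, the caller would request the returned range from the recovery endpoint, apply the result, apply the batch that exposed the gap, and only then move the watermark forward.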

Recovery moves from ZMQ to HTTP. A new endpoint GET /kv_recover?start=&end= returns an externally-tagged WorkerKvQueryResponse JSON:

  • Events - the requested range is still in the ring buffer (normal gap replay).
  • TreeDump - start was evicted, so the response is a full reconstruction of the current tree as stored events with synthetic 0-based ids. Consumer rule: drop the worker's tree, apply all events, then set watermark = last_event_id.
  • TooNew - start is ahead of the newest event (no-op).
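
A rough sketch of how a consumer might call the endpoint and act on the three variants. Only the variant names, the `start`/`end` query parameters and the TreeDump consumer rule come from this PR. The payload field names (`events`, `last_event_id` inside TreeDump), the base URL, and the callbacks `reset_tree` and `apply_event` are assumptions for illustration; the real WorkerKvQueryResponse schema should be checked against Dynamo.

```python
import json
from typing import Any, Callable, Optional, Tuple
from urllib.parse import urlencode


def build_recover_url(base_url: str, start: Optional[int], end: Optional[int]) -> str:
    """Build the /kv_recover URL, omitting parameters that are None."""
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/kv_recover" + (f"?{query}" if query else "")


def split_tagged(response: Any) -> Tuple[str, Any]:
    """Split an externally-tagged value into (variant, payload)."""
    if isinstance(response, str):  # unit variant, e.g. "TooNew"
        return response, None
    if isinstance(response, dict) and len(response) == 1:
        return next(iter(response.items()))  # e.g. {"Events": [...]}
    raise ValueError("not an externally-tagged value")


def apply_recovery(
    body: str,
    reset_tree: Callable[[], None],
    apply_event: Callable[[Any], None],
) -> Optional[int]:
    """Apply a recovery response; return a new watermark only for TreeDump."""
    variant, payload = split_tagged(json.loads(body))
    if variant == "TooNew":
        return None  # no-op: start is ahead of the newest event
    if variant == "Events":
        # Assumed shape: either a bare list or an object with an "events" list.
        events = payload if isinstance(payload, list) else payload["events"]
        for event in events:
            apply_event(event)
        return None  # caller advances its watermark from the replayed seqs
    if variant == "TreeDump":
        reset_tree()  # drop the worker's tree before applying the dump
        for event in payload["events"]:  # assumed field name
            apply_event(event)
        return payload["last_event_id"]  # assumed field name
    raise ValueError(f"unexpected variant: {variant}")
```

The TreeDump branch returns the watermark explicitly because its event ids are synthetic and 0-based, so the consumer cannot derive the watermark from the events themselves.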

The live ZMQ stream wire format is unchanged (vLLM msgpack: [topic, 8-byte BE seq, msgpack([ts,[events],dp])]), so existing live subscribers are unaffected. Only the recovery transport changed (ROUTER :5558 -> HTTP /kv_recover).
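
For reference, a small sketch of how a subscriber could split one live multipart message into its three frames. `split_frames` is a hypothetical helper, and reading seq as an unsigned integer is an assumption. The payload is left as raw bytes; decoding it would need the third-party `msgpack` package.

```python
from typing import List, Tuple


def split_frames(frames: List[bytes]) -> Tuple[bytes, int, bytes]:
    """Split [topic, 8-byte BE seq, msgpack payload] into (topic, seq, payload)."""
    if len(frames) != 3:
        raise ValueError(f"expected 3 frames, got {len(frames)}")
    topic, seq_bytes, payload = frames
    if len(seq_bytes) != 8:
        raise ValueError(f"expected 8-byte seq, got {len(seq_bytes)} bytes")
    # Big-endian, read as unsigned (assumption).
    seq = int.from_bytes(seq_bytes, "big")
    # The payload could then be decoded with the third-party msgpack package:
    #   ts, events, dp = msgpack.unpackb(payload)
    return topic, seq, payload
```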

Key pieces:

  • New crate tensorrt_llm/serve/kv_local_indexer/ - PyO3 wrapper (LocalIndexer: apply_stored/apply_removed/apply_cleared/get_events_json/shutdown). Depends on dynamo-kv-router (no vendoring); built to a cp312 wheel via build_wheel.sh inside the base image so the ABI matches.
  • kv_zmq_publisher.py: removed the deque, the ROUTER replay socket, and _service_replay; the indexer is now fed from the single consumer thread; added get_recovery_json().
  • openai_server.py: registered /kv_recover (sync handler so FastAPI runs it in its threadpool, keeping the inference event loop free).
  • kv_events_config.py: new enable_local_indexer flag (replay_endpoint kept but deprecated/ignored).
  • commands/serve.py: updated --kv_events_config help.
  • Dockerfile.python : copies ./wheels/ and pip-installs the wrapper (tolerant if absent, so images that don't need recovery still build).

Gated behind enable_local_indexer (default off), so this is a no-op unless explicitly turned on.
