fix(cubestore): expose CUBESTORE_CACHESTORE_WAL_TTL_SECONDS to bound the cachestore WAL #11166

Open
killzoner wants to merge 1 commit into cube-js:master from killzoner:fix/cachestore-wal-ttl-seconds

Conversation

@killzoner killzoner commented Jun 26, 2026

Check List

  • Tests have been run in packages where changes have been made if available (cargo test -p cubestore cachestore + the new config tests pass)
  • Linter has been run for changed code (cargo fmt --check clean; cargo clippy shows no new issues in the changed files)
  • Tests for the changes have been added if not covered yet (unit tests for the WAL ttl min-clamp validation)
  • Docs have been added / updated if required (added CUBESTORE_CACHESTORE_WAL_TTL_SECONDS to the environment variables reference)

Issue Reference this PR resolves

Fixes #11168

Description of Changes Made

The cachestore's RocksDB keeps old write-ahead-log (WAL) files for a retention time that is currently hardcoded (~330s).

  • Under sustained write load these archived WAL files pile up and can fill the data volume (e.g. a Kubernetes emptyDir), which gets the pod evicted.
  • This adds CUBESTORE_CACHESTORE_WAL_TTL_SECONDS to make that retention configurable:
    • Unset -> same value as today, so behavior is unchanged.
    • Lower -> RocksDB drops archived WAL sooner, keeping it smaller on disk. Useful when cachestore log upload is off (the default) -- that is the only thing that needs the WAL kept.
    • Minimum 2s: lower values are clamped up to 2s, because 0 would turn WAL cleanup off entirely.
  • WAL_size_limit_MB was deliberately not used: enabling it alongside the ttl makes RocksDB switch from the fast cleanup cycle to a 10-minute one, so it keeps more WAL, not less.
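
The resolution rules above (unset falls back to the current default, anything below the minimum is raised to 2s) can be sketched as follows. This is an illustration only, not the code in this PR: the helper name `resolve_wal_ttl_secs`, the constant `MIN_WAL_TTL_SECS`, and the choice to return an error for non-numeric input are all assumptions.

```rust
/// Smallest ttl accepted; per the description, 0 would turn WAL cleanup off.
const MIN_WAL_TTL_SECS: u64 = 2;

/// Hypothetical helper: resolve the effective WAL ttl from the raw value of
/// CUBESTORE_CACHESTORE_WAL_TTL_SECONDS (`None` when the variable is unset).
/// `default_secs` stands for the existing hardcoded default.
fn resolve_wal_ttl_secs(raw: Option<&str>, default_secs: u64) -> Result<u64, String> {
    match raw.map(str::trim) {
        // Unset (or empty) keeps today's behavior.
        None | Some("") => Ok(default_secs),
        Some(text) => text
            .parse::<u64>()
            // Clamp up to the minimum instead of rejecting small values.
            .map(|secs| secs.max(MIN_WAL_TTL_SECS))
            // Assumption: non-numeric input is reported as an error.
            .map_err(|err| {
                format!("invalid CUBESTORE_CACHESTORE_WAL_TTL_SECONDS {text:?}: {err}")
            }),
    }
}
```

A caller could pass `std::env::var("CUBESTORE_CACHESTORE_WAL_TTL_SECONDS").ok().as_deref()` as `raw`. Taking the raw value as an argument rather than reading the environment inside the helper keeps the clamp logic easy to unit test.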

@github-actions github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels Jun 26, 2026
@killzoner killzoner force-pushed the fix/cachestore-wal-ttl-seconds branch from 0c5fa17 to 96189e6 Compare June 26, 2026 13:33
@killzoner killzoner force-pushed the fix/cachestore-wal-ttl-seconds branch from 96189e6 to 8b1e5b3 Compare June 26, 2026 13:57
@killzoner killzoner force-pushed the fix/cachestore-wal-ttl-seconds branch from 8b1e5b3 to d0ea25f Compare June 26, 2026 14:03
@killzoner killzoner force-pushed the fix/cachestore-wal-ttl-seconds branch from d0ea25f to 0410c00 Compare June 26, 2026 14:08
…the cachestore WAL

The cachestore opens RocksDB with a hardcoded WAL ttl of
meta_store_snapshot_interval + meta_store_log_upload_interval (~330s),
retained so get_updates_since can read recent batches for log upload.
Under sustained high-cardinality write churn the archived WAL is kept
for that whole window and grows large; on a Kubernetes emptyDir this can
fill the volume and the kubelet evicts the pod.

Expose the ttl as CUBESTORE_CACHESTORE_WAL_TTL_SECONDS. When unset it
falls back to the existing default (the snapshot + log-upload sum), so
behavior is unchanged; deployments that do not use cachestore log upload
can lower it to bound the archived WAL -- RocksDB purges archived WAL
older than the ttl on a wal_ttl/2 cycle, so a lower value gives a lower
on-disk plateau. Floored at 2s (0 would disable WAL-ttl purge).
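
To make the timing in the commit message concrete, the sketch below works through the arithmetic: the default ttl is the sum of the two intervals, and the purge check runs every wal_ttl/2. The function names are hypothetical, and the worst-case age (ttl plus one purge cycle) is an inference from the stated cycle, not a figure taken from the PR or from RocksDB.

```rust
use std::time::Duration;

/// Default ttl as described in the commit message: snapshot interval plus
/// log-upload interval (the individual values are not given in the source).
fn default_wal_ttl(snapshot_interval: Duration, log_upload_interval: Duration) -> Duration {
    snapshot_interval + log_upload_interval
}

/// Archived WAL is checked for expiry on a wal_ttl / 2 cycle.
fn purge_cycle(wal_ttl: Duration) -> Duration {
    wal_ttl / 2
}

/// Assumption: a file that expires just after a check survives until the
/// next one, so its age can reach roughly ttl + ttl / 2.
fn worst_case_archived_age(wal_ttl: Duration) -> Duration {
    wal_ttl + purge_cycle(wal_ttl)
}
```

With the current ~330s default this gives a 165s purge cycle and a worst-case archived age of about 495s; at the 2s floor the figures are 1s and 3s. That gap is why lowering the ttl lowers the on-disk plateau under the same write rate.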
@killzoner

Author

Judging by the commit history, @ovr is probably the best person to review this when time allows. Thanks in advance.

Development

Successfully merging this pull request may close these issues.

cubestore: cachestore WAL archive grows unbounded (no ttl config), filling the data volume

1 participant