Support the meta delete with tombstone to prevent dirty writes #73

Merged
drinkbeer merged 7 commits into main from support_delete_tombstones on Jun 5, 2026

@drinkbeer

Support the meta delete with tombstone to prevent dirty writes.

@drinkbeer force-pushed the support_delete_tombstones branch from 8175c0f to ca371ac on May 8, 2026 23:07
Comment thread lib/dalli/client.rb Outdated
Comment thread lib/dalli/protocol/meta/request_formatter.rb

@mrattle mrattle left a comment

A couple more comments on this.

Comment thread lib/dalli/protocol/meta/request_formatter.rb
Comment thread lib/dalli/protocol/meta/response_processor.rb Outdated

@oosidat oosidat left a comment

I went through some of pair-review's suggestions. This looks good overall; I just wanted to highlight some potential edge cases.


Review assisted by pair-review

Comment thread lib/dalli/protocol/base.rb
Comment thread lib/dalli/protocol/meta/response_processor.rb Outdated
Comment thread test/integration/test_tombstone.rb Outdated
@drinkbeer force-pushed the support_delete_tombstones branch from bb0b019 to 59dc053 on June 3, 2026 07:28
@drinkbeer drinkbeer requested review from mrattle and oosidat June 3, 2026 07:29

@mrattle mrattle left a comment

Looks good. I did some extra digging and tophatting to confirm the behaviour of the zero-byte value case, and it seems fine without any additional handling.

@mrattle mrattle left a comment

Commented on a couple of things that are extensions of latent behaviours (which don't seem to be blowing up anywhere, as far as we know) and one telemetry-related thing.

Comment thread lib/dalli/protocol/meta.rb Outdated
unless attributes.frozen?
  attributes['value_bytesize'] = total_value_bytesize
  attributes['hit_count'] = results.count { |_key, result| result.hit? }
  attributes['miss_count'] = results.count { |_key, result| result.miss? }

Two things here:

  • We treat stale hits as miss=false, so result.hit? returns true for them. Do we want to include them in the hit count when they don't return usable data? With the current hard-delete behaviour, the subsequent reads would be misses, so counting stale hits artificially inflates our hit rate with garbage data.
  • As written, hit_count + miss_count sums to keys.length, and we make two passes over the results to compute them. As a minor performance optimization, we could make a single pass that counts misses and subtract that from keys.length to get the hits. This could probably also be folded into the keys.each block above.

Author

Good point — agreed that API/result semantics and hit-rate metrics should be separated here. A stale tombstone can still be CacheResult#hit? because memcached returned an item, but counting it as a metrics hit would inflate hit rate compared to the previous hard-delete behavior.

I changed the metrics semantics to:

  • hit_count: fresh hits only
  • miss_count: true misses + stale tombstones / other non-fresh results
  • stale_count: stale tombstones, so we can observe them separately

For read_multi_with_status, the counts are computed while filling results rather than doing extra passes over the result hash.
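
A minimal sketch of that counting scheme, for illustration only. `Result`, `record`, and `fill_results` are hypothetical names standing in for the PR's CacheResult and its fill loop, which may differ; the assumption shown is that a stale tombstone is counted as a miss for hit-rate purposes and is also reported in stale_count, with all counters updated while results are stored.

```ruby
# Hypothetical stand-in for the PR's CacheResult; the real class may differ.
Result = Struct.new(:value, :miss, :stale, keyword_init: true) do
  def miss?
    miss == true
  end

  def stale?
    stale == true
  end
end

# Update the counters for a single result.
# Assumption: stale tombstones count as misses and are also tracked separately.
def record(counts, result)
  if result.stale?
    counts[:stale_count] += 1
    counts[:miss_count] += 1
  elsif result.miss?
    counts[:miss_count] += 1
  else
    counts[:hit_count] += 1
  end
end

# Store each response and tally it in the same loop, so no extra pass
# over the result hash is needed afterwards.
def fill_results(responses)
  counts = { hit_count: 0, miss_count: 0, stale_count: 0 }
  results = {}
  responses.each do |key, result|
    results[key] = result
    record(counts, result)
  end
  [results, counts]
end
```

With one fresh value, one true miss, and one stale tombstone as input, the sketch reports one hit, two misses, and one stale result.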

Comment thread lib/dalli/protocol/meta.rb Outdated
routing_suffix = RequestFormatter.routing_tokens(**routing_token_kwargs(req_options))
post_get_req = optimized_for_raw ? "v k q#{routing_suffix}\r\n" : "v f k q#{routing_suffix}\r\n"
keys.each do |key|
  @connection_manager.write("mg #{key} #{post_get_req}")

In this method we don't pass the key through KeyRegularizer.encode as we do in other methods. This is carried over from the existing read_multi_req method, which has no documented reason for skipping that normalization; I think it was an error by omission.

The end result is that a key can contain whitespace (which the server forbids and rejects with an error) or an injected CRLF, which can allow arbitrary commands to be sent via key injection. I don't think that's a plausible vector for any of our current use cases, but we apply this sanitization in (almost) every other method and in the header parsing, so it should probably be present here too.

@drinkbeer drinkbeer Jun 5, 2026

Good catch — agreed. I updated this path to run keys through KeyRegularizer.encode and include the b flag when needed, rather than interpolating the raw key into the meta command. Created a helper method write_read_multi_get to process the keys.

I also applied the same fix to the existing read_multi_req path since it had the same omission. Response parsing now goes through response_processor.key_from_tokens, so base64 response keys are decoded back to the original key before namespace stripping / miss filling.

Added coverage for Unicode, whitespace, and CRLF-containing keys for both get_multi and get_multi_with_status.
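
To illustrate why encoding closes the injection path, here is a rough standalone sketch. It is not the library's code: `encode_key`, `meta_get_line`, and `KEY_WHITESPACE` are hypothetical names, and the exact flag order on the wire is an assumption. The idea shown is that any key containing whitespace or non-ASCII bytes is base64-encoded and sent with the b flag, so a CR or LF inside the key can never terminate the command line early.

```ruby
# Hypothetical pattern for characters that may not appear in a raw key.
KEY_WHITESPACE = /\s/

# Returns [wire_key, base64?]. ASCII-only keys without whitespace pass
# through unchanged; everything else is base64-encoded without newlines.
def encode_key(key)
  return [key, false] if key.ascii_only? && !KEY_WHITESPACE.match?(key)

  [[key].pack('m0'), true]
end

# Build one meta-get line. The 'b' flag tells the server that the key
# is base64-encoded. Flag order here is illustrative only.
def meta_get_line(key)
  wire_key, base64 = encode_key(key)
  flags = base64 ? 'b v k q' : 'v k q'
  "mg #{wire_key} #{flags}\r\n"
end
```

For example, `meta_get_line("a\r\nflush_all")` yields a single line ending in one CRLF, because the embedded CRLF is hidden inside the base64 payload rather than interpolated raw.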

Author

Also added unit tests to cover the cases.

Comment thread lib/dalli/protocol/meta.rb Outdated
@connection_manager.flush
result = response_processor.meta_get_with_status
unless attributes.frozen?
  attributes['value_bytesize'] = result.value.nil? ? 0 : result.value.bytesize

Going to preface this one as another "bug" that exists elsewhere (in the existing get method) but does not seem to have any impact under our use conditions, so it can probably be safely ignored.

pair-review is calling out the potential for result.value to be a non-String type that doesn't respond to .bytesize, which would raise a NoMethodError.

A set of conditions needs to be met for this to happen, and I don't think we really encounter them:

  • OTel middleware is in use, so attributes.frozen? above is false
  • Dalli is used directly; going through the Rails cache protects against this
  • Non-String values are cached without being serialized first

To handle this cleanly, you could capture the raw byte count inside meta_get_with_status (where the wire read already happens via read_data) and return it alongside the CacheResult:

def meta_get_with_status
  tokens = error_on_unexpected!([VA, EN, HD])
  return [::Dalli::CacheResult.new(value: nil, miss: true), 0] if tokens.first == EN

  if tokens.first == VA
    raw_body = read_data(tokens[1].to_i)
    value = @value_marshaller.retrieve(raw_body, bitflags_from_tokens(tokens))
    [::Dalli::CacheResult.new(value: value, stale: stale_from_tokens(tokens)), raw_body.bytesize]
  else
    [::Dalli::CacheResult.new(value: nil, miss: true), 0]
  end
end

Then in get_with_status:

result, raw_bytesize = response_processor.meta_get_with_status
unless attributes.frozen?
  attributes['value_bytesize'] = raw_bytesize
  attributes['hit_count']  = result.miss? ? 0 : 1
  attributes['miss_count'] = result.miss? ? 1 : 0
end
result

Author

Agreed — this is a better place to measure the size. I updated meta_get_with_status to return both the CacheResult and the raw response body bytesize captured before deserialization, then get_with_status uses that raw bytesize for the OTel value_bytesize attribute.

I kept the updated stale metrics semantics from the earlier change (hit_count = fresh hits only, miss_count = non-fresh, stale_count separately reported).

Also added coverage for a non-String value through get_with_status with OTel enabled, which would have raised on result.value.bytesize before this change.
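
As a small illustration of the failure mode and the fix, the sketch below uses Marshal as a stand-in for whatever serializer is configured; `decode_with_bytesize` is a hypothetical helper, not a method from the PR. It shows that a deserialized Hash has no bytesize method, while the raw body read off the wire always does, so measuring before deserialization is safe for any value type.

```ruby
# Hypothetical helper: deserialize a raw body and report its wire size.
# Marshal stands in for the configured serializer (an assumption).
def decode_with_bytesize(raw_body)
  # Measure the raw String first; it always responds to #bytesize.
  size = raw_body.bytesize
  value = Marshal.load(raw_body)
  [value, size]
end

raw_body = Marshal.dump({ 'id' => 1 })
value, size = decode_with_bytesize(raw_body)

# The deserialized Hash cannot be measured directly:
value.respond_to?(:bytesize) # false, so value.bytesize would raise NoMethodError
size == raw_body.bytesize    # true
```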

@drinkbeer force-pushed the support_delete_tombstones branch from cc6e261 to 7182f37 on June 5, 2026 08:04
@drinkbeer drinkbeer merged commit 262698c into main Jun 5, 2026
19 of 20 checks passed
@drinkbeer drinkbeer deleted the support_delete_tombstones branch June 5, 2026 08:17
3 participants