Support the meta delete with tombstone to prevent dirty writes by drinkbeer · Pull Request #73 · Shopify/dalli

drinkbeer · 2026-05-08T23:02:42Z

Support the meta delete with tombstone to prevent dirty writes.

… supports x and i token

mrattle

A couple more comments on this.

oosidat

Went through some of pair-review's suggestions, looks good overall, just wanted to highlight some potential edge cases.

Review assisted by pair-review

mrattle

Looks good, I did some extra digging and tophatting to confirm the behaviour of the zero-byte value case and that seems fine without any additional handling.

mrattle

Commented on a couple of things that are extensions of latent behaviours (which don't seem to be blowing up anywhere as far as we know) and one telemetry-related thing

mrattle · 2026-06-04T21:26:20Z

+          unless attributes.frozen?
+            attributes['value_bytesize'] = total_value_bytesize
+            attributes['hit_count'] = results.count { |_key, result| result.hit? }
+            attributes['miss_count'] = results.count { |_key, result| result.miss? }


Two things here:

we're treating stale hits as miss=false and result.hit? will return true. Do we want to be including them in the hit count when they're not actually returning usable data? In the current setup where we simply do a hard delete the following reads would be treated as misses, so it feels like we're artificially inflating our hitrate with garbage data here.

as this is written, hit_count + miss_count will sum to keys.length. We're doing two passes of the keys here when we could just be doing a single pass for the misses and then subtracting that number from the length of the array of keys to get the number of hits for a minor performance optimization. Further, this can probably also be wrapped into the keys.each block above.

Good point — agreed that API/result semantics and hit-rate metrics should be separated here. A stale tombstone can still be CacheResult#hit? because memcached returned an item, but counting it as a metrics hit would inflate hit rate compared to the previous hard-delete behavior.

I changed the metrics semantics to:

hit_count: fresh hits only

miss_count: true misses + stale tombstones / other non-fresh results

stale_count: stale tombstones, so we can observe them separately

For read_multi_with_status, the counts are computed while filling results rather than doing extra passes over the result hash.

mrattle · 2026-06-04T21:32:19Z

+          routing_suffix = RequestFormatter.routing_tokens(**routing_token_kwargs(req_options))
+          post_get_req = optimized_for_raw ? "v k q#{routing_suffix}\r\n" : "v f k q#{routing_suffix}\r\n"
+          keys.each do |key|
+            @connection_manager.write("mg #{key} #{post_get_req}")


In this method we don't pass our key throughKeyRegularizer.encode as we do in other methods. This is carried through from the existing read_multi_req method, which has no documented reasoning for skipping that normalization and I think it was an error by ommission.

The end result is that you can potentially have a key with whitespace in it (which is forbidden by the server and will return an error), or you can have injected CRLF which can allow arbitrary commands to be sent via key injection. I don't think that's actually a plausible vector for any use-cases that we currently have, but we still leverage this sanitization in (almost) every other method + the header parsing so it should probably be present here too.

Good catch — agreed. I updated this path to run keys through KeyRegularizer.encode and include the b flag when needed, rather than interpolating the raw key into the meta command. Created a helper method write_read_multi_get to process the keys.

I also applied the same fix to the existing read_multi_req path since it had the same omission. Response parsing now goes through response_processor.key_from_tokens, so base64 response keys are decoded back to the original key before namespace stripping / miss filling.

Added coverage for Unicode, whitespace, and CRLF-containing keys for both get_multi and get_multi_with_status.

Also added unit tests to cover the cases.

mrattle · 2026-06-04T22:18:40Z

+          @connection_manager.flush
+          result = response_processor.meta_get_with_status
+          unless attributes.frozen?
+            attributes['value_bytesize'] = result.value.nil? ? 0 : result.value.bytesize


Going to preface this one as another "bug" that exists elsewhere (in the existing get method) but does not seem to have any impact under our use conditions, so it can probably be safely ignored.

pair-review is calling out the potential for result.value to be a non-String type that doesn't respond to .bytesize, which would raise a NoMethodError.

There are a set of conditions that need to be met for this to happen, and I don't think that we really encounter them:

Need to be using OTel middleware so the above attributes.frozen? is false

Need to be using Dalli directly, using Rails cache will protect against this

Need to be caching non-String types without first serializing them

To handle this cleanly, you could capture the raw byte count inside meta_get_with_status (where the wire read already happens via read_data) and return it alongside the CacheResult:

def meta_get_with_status tokens = error_on_unexpected!([VA, EN, HD]) return [::Dalli::CacheResult.new(value: nil, miss: true), 0] if tokens.first == EN if tokens.first == VA raw_body = read_data(tokens[1].to_i) value = @value_marshaller.retrieve(raw_body, bitflags_from_tokens(tokens)) [::Dalli::CacheResult.new(value: value, stale: stale_from_tokens(tokens)), raw_body.bytesize] else [::Dalli::CacheResult.new(value: nil, miss: true), 0] end end

Then in get_with_status:

result, raw_bytesize = response_processor.meta_get_with_status unless attributes.frozen? attributes['value_bytesize'] = raw_bytesize attributes['hit_count'] = result.miss? ? 0 : 1 attributes['miss_count'] = result.miss? ? 1 : 0 end result

Agreed — this is a better place to measure the size. I updated meta_get_with_status to return both the CacheResult and the raw response body bytesize captured before deserialization, then get_with_status uses that raw bytesize for the OTel value_bytesize attribute.

I kept the updated stale metrics semantics from the earlier change (hit_count = fresh hits only, miss_count = non-fresh, stale_count separately reported).

Also added coverage for a non-String value through get_with_status with OTel enabled, which would have raised on result.value.bytesize before this change.

drinkbeer force-pushed the support_delete_tombstones branch from 8175c0f to ca371ac Compare May 8, 2026 23:07

mrattle reviewed May 12, 2026

View reviewed changes

Comment thread lib/dalli/client.rb Outdated

Comment thread lib/dalli/protocol/meta/request_formatter.rb

drinkbeer added 4 commits May 26, 2026 12:17

Support the meta delete with tombstone to prevent dirty writes

9e9de02

Upgrade Github workflow Memcached version from 1.6.23 to 1.6.41 which…

61000e3

… supports x and i token

Fix the unit tests

c639ede

Make the HD result to be real miss, and add unit test to cover the cases

8ce08da

mrattle reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/dalli/protocol/meta/request_formatter.rb

Comment thread lib/dalli/protocol/meta/response_processor.rb Outdated

oosidat reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/dalli/protocol/base.rb

Comment thread lib/dalli/protocol/meta/response_processor.rb Outdated

Comment thread test/integration/test_tombstone.rb Outdated

Address comments

59dc053

drinkbeer force-pushed the support_delete_tombstones branch from bb0b019 to 59dc053 Compare June 3, 2026 07:28

drinkbeer requested review from mrattle and oosidat June 3, 2026 07:29

oosidat approved these changes Jun 3, 2026

View reviewed changes

mrattle approved these changes Jun 3, 2026

View reviewed changes

Add get_multi_with_status

7453f01

mrattle reviewed Jun 4, 2026

View reviewed changes

Address comments

7182f37

drinkbeer force-pushed the support_delete_tombstones branch from cc6e261 to 7182f37 Compare June 5, 2026 08:04

drinkbeer merged commit 262698c into main Jun 5, 2026
19 of 20 checks passed

drinkbeer deleted the support_delete_tombstones branch June 5, 2026 08:17

Conversation

drinkbeer commented May 8, 2026

Uh oh!

Uh oh!

Uh oh!

mrattle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

oosidat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mrattle left a comment

Choose a reason for hiding this comment

Uh oh!

mrattle left a comment

Choose a reason for hiding this comment

Uh oh!

mrattle Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

drinkbeer Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

mrattle Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

drinkbeer Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

drinkbeer Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

mrattle Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

drinkbeer Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

drinkbeer Jun 5, 2026 •

edited

Loading