Skip to content

Bug 5538: Do not update Content-Length from 304 responses#2401

Open
bowling233 wants to merge 2 commits intosquid-cache:masterfrom
ZJUSCT:fix-content-length-304
Open

Bug 5538: Do not update Content-Length from 304 responses#2401
bowling233 wants to merge 2 commits intosquid-cache:masterfrom
ZJUSCT:fix-content-length-304

Conversation

@bowling233
Copy link
Copy Markdown

@bowling233 bowling233 commented Apr 5, 2026

This minimal fix addresses several known problems with broken servers
and improves Squid HTTP compliance. More work is needed to improve
handling of other 304-related cases, but that work requires a lot more
changes, and this fix does not stand in the way of that effort.

RFC 9111 Section 3.2 explicitly excludes Content-Length from
the header fields that a cache MUST add to a stored response
when updating it from a 304. Some broken servers send
Content-Length: 0 in 304 responses, which replaces the
correct stored Content-Length and causes clients to receive
an empty body.
Copilot AI review requested due to automatic review settings April 5, 2026 05:54
@squid-anubis

This comment was marked as resolved.

@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Apr 5, 2026

This comment has been minimized.

@squid-anubis squid-anubis removed the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Apr 5, 2026
@yadij
Copy link
Copy Markdown
Contributor

yadij commented Apr 5, 2026

The problem I have with this approach (used previously for Vary issues), is that it does not take into account whether the 304 the server provided was generated from the representation of the resource the proxy has cached.

The specification does not require update because changes to size means the representation changed. In which case the "304" status is no longer valid. The origin server should have supplied a 2xx status with the new representation having that new size. PR #2400 solves this along with other similar cases.

Comment on lines +396 to +400
(id == Http::HdrType::VARY) ||
// RFC 9111 Section 3.2 explicitly excludes Content-Length
// from the "MUST add ..., replacing already present" list. Also,
// broken servers are known to send Content-Length:0 in their 304s.
(id == Http::HdrType::CONTENT_LENGTH);
Copy link
Copy Markdown
Contributor

@yadij yadij Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this was my first idea. But when I saw Vary and recalled the reasons we had for ignoring that were essentially the same as Content-Length. I realized ETag and all the negotiable Content-* headers also have the same issues possible.

In other words, all the headers which distinguish between representations of a resource have this problem.

When they change it means something in how the server identifies the representation is broken and none of the 304 should be used. Thus PR #2400 supersedes this.

@rousskov rousskov changed the title Do not update Content-Length from 304 responses Bug 5538: Do not update Content-Length from 304 responses Apr 5, 2026
rousskov
rousskov previously approved these changes Apr 5, 2026
Copy link
Copy Markdown
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR is a good step forward and, assuming it has been tested, should be merged.


@yadij, AFAICT, your negative review was based, in part, on incorrect assumptions about 304 meaning and related protocol requirements, as detailed below. Please reconsider your vote.

Amos: The problem I have with this approach, is that it does not take into account whether the 304 the server provided was generated from the representation of the resource the proxy has cached.

Server resources and their representations do change, but, from HTTP point of view, a 304 response essentially means "use what you have". It does not imply "I still have what you have":

A 304 (Not Modified) response status code indicates that the stored response can be updated and reused

In the above RFC 9111 quote, "stored response" refers to the response stored by Squid, not server.

In addition to other considerations, known server bugs make it impossible for Squid to distinguish changed representation from unchanged one in some contexts. However, this is a secondary concern.

The specification does not require update

Clarification: The specification essentially prohibits Content-Length update. PR code follows that important rule, fixing a Squid bug.

because changes to size means the representation changed.

... or known bugs in relatively popular servers. In either case, a 304 allows Squid to reuse its stored response.

N.B. HTTP specs talk about selecting stored responses that must be updated after a 304, but, AFAICT, they do not say or even imply "do not reuse a stored response if you cannot find a matching stored response". That HTTP-required matching algorithm does not even take Content-Length into account AFAICT, but even if it did, it would still allow Squid to reuse the stored response on Content-Length mismatches!

In which case the "304" status is no longer valid.

AFAICT, HTTP protocol does not have a concept of an invalid 304 status.

The origin server should have supplied a 2xx status with the new representation having that new size.

I see no such requirement in HTTP protocol. The specs say 304 means that the client has "a valid representation" that it may reuse, not "the same representation as the origin currently has":

The 304 (Not Modified) status code indicates that a conditional GET or HEAD request has been received and would have resulted in a 200 (OK) response if it were not for the fact that the condition evaluated to false. In other words, there is no need for the server to transfer a representation of the target resource because the request indicates that the client, which made the request conditional, already has a valid representation; the server is therefore redirecting the client to make use of that stored representation as if it were the content of a 200 (OK) response.

@rousskov
Copy link
Copy Markdown
Contributor

rousskov commented Apr 5, 2026

@bowling233, I expect one of the CI tests to fail because your contact information is not in the CONTRIBUTORS file. Please add that entry, preferably matching your GitHub account email. This should be done as a separate commit on the PR branch -- no need to squash, rebase, etc.

@bowling233 bowling233 force-pushed the fix-content-length-304 branch from 0cab5b5 to 030e2b2 Compare April 5, 2026 15:20
@rousskov rousskov added the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Apr 5, 2026
@yadij
Copy link
Copy Markdown
Contributor

yadij commented Apr 5, 2026

N.B. HTTP specs talk about selecting stored responses that must be updated after a 304, but, AFAICT, they do not say or even imply "do not reuse a stored response if you cannot find a matching stored response".

Indeed, your would have a point if we were doing the cache lookup. That is long past, what we need to enforce in this logic is that the server is actually talking about our already-found response before we change it.

The buggy servers do not provide 304 correctly matching the representation we are updating (eg. Vary identifier and Content-Length [ part of fingerprint identifier] changed). That means the section 3.2 "which headers to skip" is not even relevant - the cache object and 304 simply do not align properly.

@rousskov
Copy link
Copy Markdown
Contributor

rousskov commented Apr 5, 2026

N.B. HTTP specs talk about selecting stored responses that must be updated after a 304, but, AFAICT, they do not say or even imply "do not reuse a stored response if you cannot find a matching stored response".

Indeed, your would have a point if we were doing the cache lookup. That is long past, what we need to enforce in this logic is that the server is actually talking about our already-found response before we change it.

In PR-modified code, we are way past that "Is the server talking about our response?" decision as well. We have already decided, and that decision was "yes, the server is talking about our response; we found a match".

One may argue that we made a wrong decision, but that argument is outside this PR scope. This PR is about what to update when updating. In known bug manifestations, we do want to update. In code terms, this PR is about fixing updateOnNotModified() effects (in some cases); it is not about avoiding updateOnNotModified() calls (in some cases).

@yadij
Copy link
Copy Markdown
Contributor

yadij commented Apr 6, 2026

N.B. HTTP specs talk about selecting stored responses that must be updated after a 304, but, AFAICT, they do not say or even imply "do not reuse a stored response if you cannot find a matching stored response".

Indeed, your would have a point if we were doing the cache lookup. That is long past, what we need to enforce in this logic is that the server is actually talking about our already-found response before we change it.

In PR-modified code, we are way past that "Is the server talking about our response?" decision as well.

Exactly my point/objection. The bug is in that earlier code, not the later code changed by this PR.

Changing this later code is a bandaid over a single symptom of the actual bug. Other symptoms remain problematic until the bug is properly fixed - at which point the bandaid is not "fixing" anything.

@bowling233
Copy link
Copy Markdown
Author

@yadij, I understand your concern that the real bug lies in the code path that decides whether a 304 is applicable to the cached representation -- that is a valid issue and I agree PR #2400's approach to representation validation is worthwhile work.

However, I think these two PRs address different RFC 9111 requirements and should not block each other.

These are two separate MUST-level requirements at different layers. Section 4.3.4 (which PR #2400 targets) governs whether to update a stored response. But once a stored response is identified for update, the RFC chains to Section 3.2 for how to perform that update:

For each stored response identified, the cache MUST update its header fields with the header fields provided in the 304 (Not Modified) response, as per Section 3.2. (RFC 9111 Section 4.3.4)

Section 3.2 is unambiguous about Content-Length:

the cache MUST add each header field in the provided response to the stored response, replacing field values that are already present, with the following exceptions: [...] The Content-Length header field. (RFC 9111 Section 3.2)

This is a blanket exception -- there is no qualifier like "when the representation matches" or "unless the validator differs." It applies unconditionally every time update() is called. Even with perfect representation validation in place, Section 3.2 would still require skipping Content-Length.

The exclusion exists for a fundamental reason, not just as a workaround for mismatched representations. Section 3.2 explains the rationale:

In some cases, caches [...] store the results of processing the received response, rather than the response itself, and updating header fields that affect that processing can result in inconsistent behavior and security issues. (RFC 9111 Section 3.2)

Content-Length describes the body stored in the cache. A 304 response body is empty, so its Content-Length (whether 0 or absent) describes nothing relevant to the stored representation. A perfectly valid 304 for the correct representation should still not overwrite Content-Length.

In summary: PR #2400 addresses Section 4.3.4 compliance (representation matching before update). This PR addresses Section 3.2 compliance (correct header update rules during update). Section 4.3.4 itself defers to Section 3.2 for the update procedure. Both fixes are complementary and can coexist. Merging this minimal change does not preclude or complicate the broader work in PR #2400. I would appreciate if you would reconsider your review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants