Bug 5538: Do not update Content-Length from 304 responses#2401
Bug 5538: Do not update Content-Length from 304 responses#2401bowling233 wants to merge 2 commits intosquid-cache:masterfrom
Conversation
RFC 9111 Section 3.2 explicitly excludes Content-Length from the header fields that a cache MUST add to a stored response when updating it from a 304. Some broken servers send Content-Length: 0 in 304 responses, which replaces the correct stored Content-Length and causes clients to receive an empty body.
This comment was marked as resolved.
This comment was marked as resolved.
|
The problem I have with this approach (used previously for Vary issues), is that it does not take into account whether the 304 the server provided was generated from the representation of the resource the proxy has cached. The specification does not require update because changes to size means the representation changed. In which case the "304" status is no longer valid. The origin server should have supplied a 2xx status with the new representation having that new size. PR #2400 solves this along with other similar cases. |
| (id == Http::HdrType::VARY) || | ||
| // RFC 9111 Section 3.2 explicitly excludes Content-Length | ||
| // from the "MUST add ..., replacing already present" list. Also, | ||
| // broken servers are known to send Content-Length:0 in their 304s. | ||
| (id == Http::HdrType::CONTENT_LENGTH); |
There was a problem hiding this comment.
Doing this was my first idea. But when I saw Vary and recalled the reasons we had for ignoring that were essentially the same as Content-Length. I realized ETag and all the negotiable Content-* headers also have the same issues possible.
In other words, all the headers which distinguish between representations of a resource have this problem.
When they change it means something in how the server identifies the representation is broken and none of the 304 should be used. Thus PR #2400 supersedes this.
There was a problem hiding this comment.
I believe this PR is a good step forward and, assuming it has been tested, should be merged.
@yadij, AFAICT, your negative review was based, in part, on incorrect assumptions about 304 meaning and related protocol requirements, as detailed below. Please reconsider your vote.
Amos: The problem I have with this approach, is that it does not take into account whether the 304 the server provided was generated from the representation of the resource the proxy has cached.
Server resources and their representations do change, but, from HTTP point of view, a 304 response essentially means "use what you have". It does not imply "I still have what you have":
A 304 (Not Modified) response status code indicates that the stored response can be updated and reused
In the above RFC 9111 quote, "stored response" refers to the response stored by Squid, not server.
In addition to other considerations, known server bugs make it impossible for Squid to distinguish changed representation from unchanged one in some contexts. However, this is a secondary concern.
The specification does not require update
Clarification: The specification essentially prohibits Content-Length update. PR code follows that important rule, fixing a Squid bug.
because changes to size means the representation changed.
... or known bugs in relatively popular servers. In either case, a 304 allows Squid to reuse its stored response.
N.B. HTTP specs talk about selecting stored responses that must be updated after a 304, but, AFAICT, they do not say or even imply "do not reuse a stored response if you cannot find a matching stored response". That HTTP-required matching algorithm does not even take Content-Length into account AFAICT, but even if it did, it would still allow Squid to reuse the stored response on Content-Length mismatches!
In which case the "304" status is no longer valid.
AFAICT, HTTP protocol does not have a concept of an invalid 304 status.
The origin server should have supplied a 2xx status with the new representation having that new size.
I see no such requirement in HTTP protocol. The specs say 304 means that the client has "a valid representation" that it may reuse, not "the same representation as the origin currently has":
The 304 (Not Modified) status code indicates that a conditional GET or HEAD request has been received and would have resulted in a 200 (OK) response if it were not for the fact that the condition evaluated to false. In other words, there is no need for the server to transfer a representation of the target resource because the request indicates that the client, which made the request conditional, already has a valid representation; the server is therefore redirecting the client to make use of that stored representation as if it were the content of a 200 (OK) response.
|
@bowling233, I expect one of the CI tests to fail because your contact information is not in the CONTRIBUTORS file. Please add that entry, preferably matching your GitHub account email. This should be done as a separate commit on the PR branch -- no need to squash, rebase, etc. |
0cab5b5 to
030e2b2
Compare
Indeed, your would have a point if we were doing the cache lookup. That is long past, what we need to enforce in this logic is that the server is actually talking about our already-found response before we change it. The buggy servers do not provide 304 correctly matching the representation we are updating (eg. Vary identifier and Content-Length [ part of fingerprint identifier] changed). That means the section 3.2 "which headers to skip" is not even relevant - the cache object and 304 simply do not align properly. |
In PR-modified code, we are way past that "Is the server talking about our response?" decision as well. We have already decided, and that decision was "yes, the server is talking about our response; we found a match". One may argue that we made a wrong decision, but that argument is outside this PR scope. This PR is about what to update when updating. In known bug manifestations, we do want to update. In code terms, this PR is about fixing updateOnNotModified() effects (in some cases); it is not about avoiding updateOnNotModified() calls (in some cases). |
Exactly my point/objection. The bug is in that earlier code, not the later code changed by this PR. Changing this later code is a bandaid over a single symptom of the actual bug. Other symptoms remain problematic until the bug is properly fixed - at which point the bandaid is not "fixing" anything. |
|
@yadij, I understand your concern that the real bug lies in the code path that decides whether a 304 is applicable to the cached representation -- that is a valid issue and I agree PR #2400's approach to representation validation is worthwhile work. However, I think these two PRs address different RFC 9111 requirements and should not block each other. These are two separate MUST-level requirements at different layers. Section 4.3.4 (which PR #2400 targets) governs whether to update a stored response. But once a stored response is identified for update, the RFC chains to Section 3.2 for how to perform that update:
Section 3.2 is unambiguous about Content-Length:
This is a blanket exception -- there is no qualifier like "when the representation matches" or "unless the validator differs." It applies unconditionally every time The exclusion exists for a fundamental reason, not just as a workaround for mismatched representations. Section 3.2 explains the rationale:
Content-Length describes the body stored in the cache. A 304 response body is empty, so its Content-Length (whether 0 or absent) describes nothing relevant to the stored representation. A perfectly valid 304 for the correct representation should still not overwrite Content-Length. In summary: PR #2400 addresses Section 4.3.4 compliance (representation matching before update). This PR addresses Section 3.2 compliance (correct header update rules during update). Section 4.3.4 itself defers to Section 3.2 for the update procedure. Both fixes are complementary and can coexist. Merging this minimal change does not preclude or complicate the broader work in PR #2400. I would appreciate if you would reconsider your review. |
This minimal fix addresses several known problems with broken servers
and improves Squid HTTP compliance. More work is needed to improve
handling of other 304-related cases, but that work requires a lot more
changes, and this fix does not stand in the way of that effort.