Reject all RFC 9110 forbidden control characters in outbound headers #12689
rodrigobnogueira wants to merge 2 commits
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #12689 +/- ##
==========================================
+ Coverage 98.93% 98.95% +0.02%
==========================================
Files 131 131
Lines 46688 46708 +20
Branches 2421 2421
==========================================
+ Hits 46190 46220 +30
+ Misses 374 366 -8
+ Partials 124 122 -2
Merging this PR will not alter performance
Reflect the broader RFC 9110 §5.5 / RFC 9112 §4 forbidden-CTL rejection in §5.2's components list, selection note, threat row 2.1, mitigation rows 2.1/2.2/2.3/2.7/2.13, and the past-advisories recap. Adjust the mermaid flow label from "reject CR/LF/NUL" to "reject forbidden CTLs" and add the GHSA to the recap as the source of the tightening.
    # RFC 9110 §5.5 / RFC 9112 §4: reject all ASCII control characters
    # (0x00-0x08, 0x0A-0x1F, 0x7F) in headers, status lines, and reason
    # phrases. HTAB (0x09) and SP (0x20) remain permitted.
    if (ch < 0x20 and ch != 0x09) or ch == 0x7F:

Suggested change:

    - # RFC 9110 §5.5 / RFC 9112 §4: reject all ASCII control characters
    - # (0x00-0x08, 0x0A-0x1F, 0x7F) in headers, status lines, and reason
    - # phrases. HTAB (0x09) and SP (0x20) remain permitted.
    - if (ch < 0x20 and ch != 0x09) or ch == 0x7F:
    + # https://www.rfc-editor.org/info/rfc9110/#section-5.5-5
    + # https://www.rfc-editor.org/info/rfc9112/#section-4-3
    + if (ch < 0x20 and ch != 0x09) or ch == 0x7F:
    - | 2.1 | Header CR / LF / NUL injection | Both backends reject these bytes via `_write_str_raise_on_nlcr` (`_http_writer.pyx:_write_str_raise_on_nlcr`) and `_safe_header` (`http_writer.py:_safe_header`), raising `ValueError` from `_serialize_headers` before any byte hits the transport. Applied symmetrically to names, values, and the status line. | **The current tests import whichever `_serialize_headers` won the import, so only one backend is exercised. Parameterise like `tests/test_http_parser.py` does (cross-cuts [§6.1](#61-highest-leverage-recommendations) #3).** |
    - | 2.2 | Status-line `reason` injection | `web_response.Response._set_status` (`web_response.py:StreamResponse._set_status`) rejects `\r` / `\n` in `reason` *at set-time*. The writer also rejects them at write-time as part of the status-line validation. | None. |
    - | 2.3 | Request-line path / method | The full status line (`{method} {path} HTTP/{v}.{v}`) goes through `_write_str_raise_on_nlcr` / `_safe_header`, so CR / LF / NUL are caught regardless of whether `path` came from `yarl` or `method` was a caller-supplied string. yarl additionally rejects these bytes earlier per RFC 3986. | None. |
    + | 2.1 | Header forbidden-CTL injection | Both backends reject the full RFC 9110 §5.5 / RFC 9112 §4 forbidden set (`0x00-0x08`, `0x0A-0x1F`, `0x7F`; HTAB and SP permitted) via `_write_str_raise_on_nlcr` (`_http_writer.pyx:_write_str_raise_on_nlcr`) and `_safe_header` (`http_writer.py:_safe_header`), raising `ValueError` from `_serialize_headers` before any byte hits the transport. Applied symmetrically to names, values, and the status line. Hardening tightened in GHSA-xjx4-5hx2-2cv8. | **The current tests import whichever `_serialize_headers` won the import, so only one backend is exercised. Parameterise like `tests/test_http_parser.py` does (cross-cuts [§6.1](#61-highest-leverage-recommendations) #3).** |
@rodrigobnogueira Going to need to be more careful here. GHSA-xjx4-5hx2-2cv8 isn't public. We've also not deemed it a vulnerability, so this won't be published. Should just reference the PR (and probably only need it in the recap section at the bottom, rather than in here).
What do these changes do?

Tighten the outbound header serializer so it rejects every ASCII control character that :rfc:`9110#section-5.5` and :rfc:`9112#section-4` forbid in header field-values and reason-phrases (`0x00-0x08`, `0x0A-0x1F`, `0x7F`), not just `CR`, `LF` and `NUL`.

- `_safe_header()` in `aiohttp/http_writer.py` now uses a compiled regex covering the full forbidden set.
- `_write_str_raise_on_nlcr()` in `aiohttp/_http_writer.pyx` uses the equivalent inequality `(ch < 0x20 and ch != 0x09) or ch == 0x7F`.
- HTAB (`0x09`) and SP (`0x20`) remain permitted, matching RFC 9110.
- Tests cover the broader forbidden set across status lines, header names, and field values, plus a positive test for HTAB.

This aligns outbound validation with inbound strict parsing.
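
As a rough illustration of the two checks described above, the sketch below pairs a compiled regex with the inequality form and confirms they agree over the ASCII range. The names `_FORBIDDEN_CTL_RE`, `safe_header`, and `is_forbidden` are hypothetical and chosen for this example; this is not the code from `aiohttp/http_writer.py` or `aiohttp/_http_writer.pyx`, only a minimal sketch assuming the character set and error message quoted in this PR.

```python
import re

# Hypothetical names for illustration; not copied from aiohttp.
# Matches 0x00-0x08, 0x0A-0x1F and 0x7F; HTAB (0x09) and SP (0x20) pass.
_FORBIDDEN_CTL_RE = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def safe_header(value: str) -> str:
    """Return value unchanged, or raise if it holds a forbidden control character."""
    if _FORBIDDEN_CTL_RE.search(value):
        raise ValueError("Forbidden control character detected in headers.")
    return value


def is_forbidden(ch: int) -> bool:
    """The inequality form quoted in the PR, applied to one code point."""
    return (ch < 0x20 and ch != 0x09) or ch == 0x7F


# The regex and the inequality agree on every ASCII code point.
assert all(
    bool(_FORBIDDEN_CTL_RE.search(chr(c))) == is_forbidden(c) for c in range(128)
)
```

Keeping the two forms side by side makes it easy to check that a pure-Python path and a per-character loop reject the same inputs.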
Are there changes in behavior for the user?

Yes. Applications that placed bare control characters (other than HTAB) into outbound headers will now get a `ValueError` instead of silently emitting non-RFC-compliant bytes. The error message changes from "Newline, carriage return, or null byte detected in headers." to "Forbidden control character detected in headers."
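
To see which header values are affected by this behavior change, the following sketch compares the earlier rule (only `CR`, `LF`, and `NUL`, as stated in this PR) with the broader set. `OLD_RE`, `NEW_RE`, and `newly_rejected` are hypothetical names introduced for this example and do not come from aiohttp.

```python
import re

# Illustrative only; names are hypothetical, not aiohttp's.
OLD_RE = re.compile(r"[\r\n\x00]")                 # CR, LF, NUL only
NEW_RE = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")   # full forbidden set


def newly_rejected(value: str) -> bool:
    """True when the value passed the old rule but fails the new one."""
    return bool(NEW_RE.search(value)) and not OLD_RE.search(value)


samples = {
    "tab": "a\tb",
    "crlf": "a\r\nb",
    "vertical tab": "a\x0bb",
    "escape": "a\x1bb",
    "delete": "a\x7fb",
}

for name, value in samples.items():
    print(f"{name}: newly rejected = {newly_rejected(value)}")
```

Values containing only HTAB are unaffected, and values with CR or LF were already rejected; the change matters for the remaining control characters such as vertical tab, escape, and DEL.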
Is it a substantial burden for the maintainers to support this?
No. It is a small, contained change in two files (the pure-Python and
Cython serializers) with mirrored logic.
Related issue number
N/A
Checklist

- `CONTRIBUTORS.txt` (already listed)
- `CHANGES/` folder