Fix ZLibDecompressor dropping data past the first gzip member #12674
Ashutosh-177 wants to merge 6 commits into
When a response body contains concatenated gzip members (RFC 1952 §2.2), zlib sets eof and moves the remaining bytes to unused_data once the first member is fully consumed. decompress_sync() was not checking unused_data, so every member after the first was silently discarded.

Apply the same while-eof-and-unused_data loop that ZSTDDecompressor already uses for multi-frame zstd streams. Add unused_data to ZLibDecompressObjProtocol so the attribute is typed. Include three tests mirroring the existing ZSTD multi-frame test suite.

Fixes aio-libs#7157

Signed-off-by: Ashutosh Kumar Singh <ahutoshhjp1067@gmail.com>
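The zlib behaviour this commit message relies on can be reproduced with the standard library alone. The following is a minimal sketch, not code from the PR; the payloads and variable names are made up for illustration.

```python
import gzip
import zlib

# Two independent gzip members concatenated into one body (RFC 1952 §2.2).
# The payloads are illustrative only.
member_1 = gzip.compress(b"first ")
member_2 = gzip.compress(b"second")
body = member_1 + member_2

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
first_output = decompressor.decompress(body)

# The decompressor stops at the end of the first member: eof is set and
# the still-compressed second member is left in unused_data.
reached_eof = decompressor.eof
leftover = decompressor.unused_data
```

A caller that only looks at `first_output` sees `b"first "` and nothing else, which is the silent truncation described above. The second member is recoverable only by feeding `leftover` to a new decompressor.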
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | master | #12674 | +/- |
|---|---|---|---|
| Coverage | 98.95% | 98.95% | |
| Files | 131 | 131 | |
| Lines | 46688 | 46730 | +42 |
| Branches | 2421 | 2424 | +3 |
| Hits | 46200 | 46242 | +42 |
| Misses | 366 | 366 | |
| Partials | 122 | 122 | |
Add gzip and decompressor to the spelling wordlist, and replace the unresolvable Sphinx class cross-reference with a plain code literal.
Pull request overview
This PR fixes truncated decompression output when servers send concatenated gzip members (multi-member gzip), by teaching ZLibDecompressor.decompress_sync() to continue decompressing unused_data after the first member ends—similar to the existing multi-frame handling in ZSTDDecompressor.
Changes:
- Add a loop in `ZLibDecompressor.decompress_sync()` to process concatenated gzip/deflate members via `unused_data`.
- Extend typing for the zlib decompressor protocol to include `unused_data`.
- Add unit tests for concatenated gzip members and update spelling wordlist/changelog/contributors.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| aiohttp/compression_utils.py | Adds multi-member handling for concatenated gzip/deflate streams via unused_data. |
| tests/test_compression_utils.py | Adds tests for concatenated gzip-member decoding and max_length behavior. |
| docs/spelling_wordlist.txt | Adds “decompressor” and “gzip” to the spelling whitelist. |
| CHANGES/7157.bugfix.rst | Documents the bugfix for concatenated gzip/deflate decompression. |
| CONTRIBUTORS.txt | Adds contributor entry. |
Merging this PR will not alter performance
When the output budget is exhausted in the multi-member loop, store the leftover compressed bytes in _pending_unused_data so the next decompress_sync() call picks them up, matching the ZSTDDecompressor behaviour. Also expose _pending_unused_data in data_available and add a test that mirrors test_zstd_multi_frame_max_length_exhausted_preserves_unused_data.
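The budget handling described in this commit can be illustrated with a small standalone class. This is a hedged sketch under stated assumptions, not aiohttp's implementation: `BudgetedGzipDecompressor`, `_pending` and `has_pending` are hypothetical names standing in for the real `ZLibDecompressor`, `_pending_unused_data` and `data_available`, and the sketch requires a positive `max_length`.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # expect gzip framing


class BudgetedGzipDecompressor:
    """Hypothetical illustration; aiohttp's real class differs in detail."""

    def __init__(self) -> None:
        self._d = zlib.decompressobj(wbits=GZIP_WBITS)
        self._pending = b""  # compressed bytes held back for the next call

    @property
    def has_pending(self) -> bool:
        # Hypothetical analogue of exposing leftover input to the caller.
        return bool(self._pending)

    def decompress(self, data: bytes, max_length: int) -> bytes:
        if max_length <= 0:
            raise ValueError("this sketch requires a positive max_length")
        data = self._pending + data
        self._pending = b""
        remaining = max_length
        chunks = []
        while data:
            if remaining <= 0:
                # Budget exhausted: keep the unprocessed compressed bytes.
                self._pending = data
                break
            if self._d.eof:
                # Previous member finished; start a fresh one.
                self._d = zlib.decompressobj(wbits=GZIP_WBITS)
            chunk = self._d.decompress(data, remaining)
            chunks.append(chunk)
            remaining -= len(chunk)
            # After eof the rest is in unused_data; otherwise zlib reports
            # input it did not consume in unconsumed_tail.
            data = self._d.unused_data if self._d.eof else self._d.unconsumed_tail
        return b"".join(chunks)


body = b"".join(gzip.compress(p) for p in (b"aaaa", b"bbbb", b"cccc"))
dec = BudgetedGzipDecompressor()
first = dec.decompress(body, max_length=6)
pending_after_first = dec.has_pending
rest = dec.decompress(b"", max_length=100)
```

The point of the design is that a call which runs out of output budget never drops compressed input: whatever was not processed is carried over, so a later call with an empty `data` argument can still drain it.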
What do these changes do?
When a response body contains concatenated gzip members (e.g. a server that produces one gzip member per write call, as nginx does under certain configs), `zlib.decompressobj` sets `eof` and stores the remaining bytes in `unused_data` after it finishes the first member. `decompress_sync()` wasn't checking `unused_data` at all, so every member after the first was silently discarded. The caller got truncated output with no error.

The fix applies the same `while eof and unused_data` loop that `ZSTDDecompressor` already uses for multi-frame zstd streams. Each iteration creates a fresh decompressor and feeds it the leftover bytes, accumulating output across all members. `max_length` is tracked and honoured across the loop.

Also added `unused_data` to `ZLibDecompressObjProtocol` so the attribute is properly typed, and three tests that mirror the existing ZSTD multi-frame test suite.

Are there changes in behaviour for the end user?
Yes: responses with concatenated gzip members now decompress fully instead of being silently truncated at the first member boundary.
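The shape of the loop described above can be sketched outside aiohttp as follows. This is an illustration only: `decompress_all_members` is a hypothetical helper, it ignores `max_length`, and the actual change lives in `ZLibDecompressor.decompress_sync()`.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # expect gzip framing


def decompress_all_members(data: bytes) -> bytes:
    """Hypothetical helper: decompress every gzip member in `data`."""
    decompressor = zlib.decompressobj(wbits=GZIP_WBITS)
    chunks = [decompressor.decompress(data)]
    # While a member has ended and compressed bytes remain, start a fresh
    # decompressor on the leftover and keep accumulating output.
    while decompressor.eof and decompressor.unused_data:
        leftover = decompressor.unused_data
        decompressor = zlib.decompressobj(wbits=GZIP_WBITS)
        chunks.append(decompressor.decompress(leftover))
    return b"".join(chunks)


body = b"".join(gzip.compress(p) for p in (b"alpha,", b"beta,", b"gamma"))
result = decompress_all_members(body)
```

Without the `while` loop the helper would return only the first member's payload, which matches the truncated output users saw before the fix.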
Related issue number
Fixes #7157
Checklist
`make fmt` has been run N/A (formatting only) `CONTRIBUTORS.txt`
Drafted with Claude Sonnet 4.6; reviewed by Ashutosh-177.