Fix ZLibDecompressor dropping data past the first gzip member #12674

Open
Ashutosh-177 wants to merge 6 commits into aio-libs:master from Ashutosh-177:fix/zlib-multi-member-gzip

Conversation

@Ashutosh-177

What do these changes do?

When a response body contains concatenated gzip members (e.g. a server that produces one gzip member per write call, as nginx does under certain configs), zlib.decompressobj sets eof and stores the remaining bytes in unused_data after it finishes the first member. decompress_sync() wasn't checking unused_data at all, so every member after the first was silently discarded. The caller got truncated output with no error.
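
For reference, the underlying zlib behaviour can be reproduced with the standard library alone. This is a minimal sketch, not aiohttp code; the payload and variable names are made up for illustration.

```python
import gzip
import zlib

# Two independently compressed gzip members, concatenated into one stream.
member1 = gzip.compress(b"first ")
member2 = gzip.compress(b"second")
payload = member1 + member2

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
out = d.decompress(payload)

print(out)                       # only the first member's data
print(d.eof)                     # the first member has ended
print(d.unused_data == member2)  # the second member is left untouched
```

A caller that only looks at `out` sees the first member and nothing else, which matches the truncation described above.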

The fix applies the same while eof and unused_data loop that ZSTDDecompressor already uses for multi-frame zstd streams. Each iteration creates a fresh decompressor and feeds it the leftover bytes, accumulating output across all members. max_length is tracked and honoured across the loop.
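
A rough sketch of that loop, using plain `zlib` rather than the aiohttp classes. The helper name `decompress_all_members` is hypothetical, and treating `max_length=0` as "unlimited" is an assumption borrowed from zlib's own convention; the code in this PR may differ in detail.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # gzip header/trailer expected


def decompress_all_members(data: bytes, max_length: int = 0) -> bytes:
    """Hypothetical helper: decode every gzip member found in data.

    max_length=0 means no output limit (zlib's convention).
    """
    d = zlib.decompressobj(GZIP_WBITS)
    out = d.decompress(data, max_length)
    chunks = [out]
    produced = len(out)
    # eof + unused_data means one member ended and more bytes follow it.
    while d.eof and d.unused_data:
        if max_length and produced >= max_length:
            break  # output budget exhausted; stop before the next member
        leftover = d.unused_data
        d = zlib.decompressobj(GZIP_WBITS)  # fresh state for the next member
        remaining = max_length - produced if max_length else 0
        out = d.decompress(leftover, remaining)
        chunks.append(out)
        produced += len(out)
    return b"".join(chunks)


members = gzip.compress(b"first ") + gzip.compress(b"second ") + gzip.compress(b"third")
full = decompress_all_members(members)
capped = decompress_all_members(members, max_length=8)
```

Note that this stateless sketch simply drops whatever input is left when the budget runs out; preserving it for a later call needs extra state on the decompressor.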

Also added unused_data to ZLibDecompressObjProtocol so the attribute is properly typed, and three tests that mirror the existing ZSTD multi-frame test suite.
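
To illustrate the typing side, here is a sketch of what a protocol exposing `unused_data` could look like. `DecompressObjLike` and `leftover_after_member` are hypothetical stand-ins; the actual `ZLibDecompressObjProtocol` definition in aiohttp may be shaped differently.

```python
import gzip
import zlib
from typing import Protocol


class DecompressObjLike(Protocol):
    """Hypothetical stand-in for a zlib-style decompress object."""

    def decompress(self, data: bytes, max_length: int = ...) -> bytes: ...

    def flush(self, length: int = ...) -> bytes: ...

    @property
    def eof(self) -> bool: ...

    @property
    def unused_data(self) -> bytes: ...


def leftover_after_member(d: DecompressObjLike) -> bytes:
    # With unused_data declared on the protocol, a type checker accepts this access.
    return d.unused_data if d.eof else b""


first = gzip.compress(b"a")
second = gzip.compress(b"b")
obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
obj.decompress(first + second)
rest = leftover_after_member(obj)
```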

Are there changes in behaviour for the end user?

Yes — responses with concatenated gzip members now decompress fully instead of being silently truncated at the first member boundary.

Related issue number

Fixes #7157

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • make fmt has been run N/A (formatting only)
  • All the tests pass
  • Changelog entry added
  • Added myself to CONTRIBUTORS.txt

Drafted with Claude Sonnet 4.6; reviewed by Ashutosh-177.

When a response body contains concatenated gzip members (RFC 1952 §2.2),
zlib sets eof and moves the remaining bytes to unused_data once the
first member is fully consumed. decompress_sync() was not checking
unused_data, so every member after the first was silently discarded.

Apply the same while-eof-and-unused_data loop that ZSTDDecompressor
already uses for multi-frame zstd streams. Add unused_data to
ZLibDecompressObjProtocol so the attribute is typed. Include three
tests mirroring the existing ZSTD multi-frame test suite.

Fixes aio-libs#7157

Signed-off-by: Ashutosh Kumar Singh <ahutoshhjp1067@gmail.com>
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided label ("There is a change note present in this PR") on May 21, 2026
@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.95%. Comparing base (d2c203f) to head (fb80e0a).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12674   +/-   ##
=======================================
  Coverage   98.95%   98.95%           
=======================================
  Files         131      131           
  Lines       46688    46730   +42     
  Branches     2421     2424    +3     
=======================================
+ Hits        46200    46242   +42     
  Misses        366      366           
  Partials      122      122           
Flag Coverage Δ
Autobahn 22.41% <13.95%> (-0.01%) ⬇️
CI-GHA 98.92% <100.00%> (+<0.01%) ⬆️
OS-Linux 98.67% <100.00%> (-0.01%) ⬇️
OS-Windows 97.04% <100.00%> (+<0.01%) ⬆️
OS-macOS 97.93% <100.00%> (-0.01%) ⬇️
Py-3.10 98.15% <100.00%> (+<0.01%) ⬆️
Py-3.11 98.41% <100.00%> (+<0.01%) ⬆️
Py-3.12 98.50% <100.00%> (+<0.01%) ⬆️
Py-3.13 98.48% <100.00%> (-0.01%) ⬇️
Py-3.14 98.49% <100.00%> (+<0.01%) ⬆️
Py-3.14t 97.55% <100.00%> (+<0.01%) ⬆️
Py-pypy-3.11 97.42% <100.00%> (-0.02%) ⬇️
VM-macos 97.93% <100.00%> (-0.01%) ⬇️
VM-ubuntu 98.67% <100.00%> (-0.01%) ⬇️
VM-windows 97.04% <100.00%> (+<0.01%) ⬆️
cython-coverage 37.91% <4.65%> (-0.04%) ⬇️


Add gzip and decompressor to the spelling wordlist, and replace the
unresolvable Sphinx class cross-reference with a plain code literal.
@Ashutosh-177 Ashutosh-177 marked this pull request as ready for review May 21, 2026 18:28
Copilot AI review requested due to automatic review settings May 21, 2026 18:28

Copilot AI left a comment

Pull request overview

This PR fixes truncated decompression output when servers send concatenated gzip members (multi-member gzip), by teaching ZLibDecompressor.decompress_sync() to continue decompressing unused_data after the first member ends—similar to the existing multi-frame handling in ZSTDDecompressor.

Changes:

  • Add a loop in ZLibDecompressor.decompress_sync() to process concatenated gzip/deflate members via unused_data.
  • Extend typing for the zlib decompressor protocol to include unused_data.
  • Add unit tests for concatenated gzip members and update spelling wordlist/changelog/contributors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
  • aiohttp/compression_utils.py: Adds multi-member handling for concatenated gzip/deflate streams via unused_data.
  • tests/test_compression_utils.py: Adds tests for concatenated gzip-member decoding and max_length behavior.
  • docs/spelling_wordlist.txt: Adds “decompressor” and “gzip” to the spelling whitelist.
  • CHANGES/7157.bugfix.rst: Documents the bugfix for concatenated gzip/deflate decompression.
  • CONTRIBUTORS.txt: Adds contributor entry.

Comment thread aiohttp/compression_utils.py
Comment thread tests/test_compression_utils.py
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 21, 2026

Merging this PR will not alter performance

✅ 72 untouched benchmarks
⏩ 72 skipped benchmarks [1]


Comparing Ashutosh-177:fix/zlib-multi-member-gzip (fb80e0a) with master (d2c203f)

Footnotes

  1. 72 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Ashutosh-177 and others added 3 commits May 22, 2026 00:11
When the output budget is exhausted in the multi-member loop, store the
leftover compressed bytes in _pending_unused_data so the next
decompress_sync() call picks them up, matching the ZSTDDecompressor
behaviour. Also expose _pending_unused_data in data_available and add a
test that mirrors test_zstd_multi_frame_max_length_exhausted_preserves_unused_data.
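
The idea of carrying leftover compressed bytes across calls can be sketched as follows. `MultiMemberGzipReader` and its `_pending` buffer are hypothetical illustrations of the approach, not the aiohttp implementation, and `max_length=0` is again assumed to mean "unlimited".

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS


class MultiMemberGzipReader:
    """Hypothetical illustration of keeping leftover input between calls."""

    def __init__(self) -> None:
        self._d = zlib.decompressobj(GZIP_WBITS)
        self._pending = b""  # compressed bytes not yet consumed

    @property
    def data_available(self) -> bool:
        return bool(self._pending)

    def decompress(self, data: bytes = b"", max_length: int = 0) -> bytes:
        buf = self._pending + data
        chunks = []
        produced = 0
        while buf:
            if max_length and produced >= max_length:
                break  # budget exhausted; buf is saved below for the next call
            if self._d.eof:
                # The previous member is finished, so start a fresh one.
                self._d = zlib.decompressobj(GZIP_WBITS)
            remaining = max_length - produced if max_length else 0
            out = self._d.decompress(buf, remaining)
            chunks.append(out)
            produced += len(out)
            # After a member ends the rest is in unused_data; otherwise any
            # input held back by the output limit is in unconsumed_tail.
            buf = self._d.unused_data if self._d.eof else self._d.unconsumed_tail
        self._pending = buf
        return b"".join(chunks)


reader = MultiMemberGzipReader()
stream = gzip.compress(b"first ") + gzip.compress(b"second")
head = reader.decompress(stream, max_length=6)
had_more = reader.data_available
tail = reader.decompress()
```

Here the first call stops at the six-byte budget, `data_available` reports that compressed input is still queued, and the second call drains it.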

Labels

bot:chronographer:provided There is a change note present in this PR

Development

Successfully merging this pull request may close these issues.

Decompressing concatenated gzip

2 participants