PARQUET-3261: Fix integer overflow in CapacityByteArrayOutputStream#3525
PARQUET-3261: Fix integer overflow in CapacityByteArrayOutputStream#3525yadavay-amzn wants to merge 1 commit intoapache:masterfrom
Conversation
The overflow check in addSlab() used bytesUsed which is not updated until after addSlab() returns in write(). This caused the overflow guard to miss cases where bytesAllocated + nextSlabSize exceeds Integer.MAX_VALUE. Fix: - Use bytesAllocated instead of bytesUsed for the overflow check, since bytesAllocated is always up to date when addSlab() is called. - Cap nextSlabSize when it would cause bytesAllocated to overflow, instead of letting Math.addExact throw an uncaught ArithmeticException.
2dab1ed to
29bbc79
Compare
|
Friendly ping to @wgtmac @Fokko, this is a small doc/parser change that's been LGTM'd by @steveloughran. Could one of you please take a look when you have a moment? Thanks! |
There was a problem hiding this comment.
Pull request overview
This pull request fixes an integer-overflow edge case in CapacityByteArrayOutputStream.addSlab(int) that could previously surface as an uncaught ArithmeticException when writing very large values (e.g., large ARRAY<STRING> columns), and adds regression tests to validate the corrected behavior.
Changes:
- Adjust overflow guarding in
addSlab()to usebytesAllocated(which is current at slab-allocation time) rather thanbytesUsed. - Cap the computed
nextSlabSizesobytesAllocated + nextSlabSizecannot overflowInteger.MAX_VALUE, preventingMath.addExactfrom throwing unexpectedly. - Add JUnit regression tests covering both the “near max” capping case and the “true overflow”
OutOfMemoryErrorcase.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java | Fixes the overflow detection and prevents bytesAllocated growth from triggering an uncaught ArithmeticException. |
| parquet-common/src/test/java/org/apache/parquet/bytes/TestCapacityByteArrayOutputStreamOverflow.java | Adds regression tests validating slab-size capping near Integer.MAX_VALUE and correct OutOfMemoryError behavior on genuine overflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks @yadavay-amzn for fixing it and @steveloughran for the review! This makes sense to me. |
What changes were made?
Fix integer overflow in
CapacityByteArrayOutputStream.addSlab()that causesArithmeticExceptionwhen writing largeARRAY<STRING>columns (issue #3261).Root cause (identified by @Kimahriman)
The overflow check in
addSlab()usedbytesUsedto detect overflow, butbytesUsedis not updated until afteraddSlab()returns inwrite(). Additionally,nextSlabSizecan be larger thanminimumSize(due to the doubling strategy), so checking onlybytesUsed + minimumSizewas insufficient.This meant
bytesAllocated = Math.addExact(this.bytesAllocated, nextSlabSize)could overflow without being caught by the guard, throwing an uncaughtArithmeticExceptioninstead of the intendedOutOfMemoryError.Fix
bytesAllocatedinstead ofbytesUsedfor the overflow check —bytesAllocatedis always up to date whenaddSlab()is called.nextSlabSizewhen it would causebytesAllocatedto overflowInteger.MAX_VALUE, preventing the uncaughtArithmeticExceptionon theMath.addExactcall.Tests
Added
TestCapacityByteArrayOutputStreamOverflowwith two tests:Integer.MAX_VALUEsucceeds (previously threwArithmeticException)OutOfMemoryErroras intended