-
Notifications
You must be signed in to change notification settings - Fork 270
feat: add complex type support to native Parquet writer #3214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Enables support for complex types (arrays, maps, structs) in Comet's native Parquet writer by removing the blocking check that previously prevented them. Changes: - Remove complex type blocking check in CometDataWritingCommand.scala - Add comprehensive test coverage for complex types including: - Basic complex types (array, struct, map) - Nested complex types (array of structs, struct containing array, etc.) - Nullable complex types with nulls at various nesting levels - Complex types containing decimal and temporal types - Empty arrays and maps - Fuzz testing with randomly generated complex type schemas - Update documentation to reflect complex type support Co-Authored-By: Claude Opus 4.5 <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3214 +/- ##
============================================
+ Coverage 56.12% 59.63% +3.50%
- Complexity 976 1416 +440
============================================
Files 119 170 +51
Lines 11743 15700 +3957
Branches 2251 2595 +344
============================================
+ Hits 6591 9362 +2771
- Misses 4012 5021 +1009
- Partials 1140 1317 +177 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enable spark.comet.scan.allowIncompatible in complex type tests so that native_iceberg_compat scan is used (which supports complex types) instead of falling back to native_comet (which doesn't support complex types). Co-Authored-By: Claude Opus 4.5 <[email protected]>
c6bc58a to
93d1f82
Compare
The CI sets COMET_PARQUET_SCAN_IMPL=native_comet for some test profiles, which overrides the default auto mode. Since native_comet doesn't support complex types, the scan falls back to Spark's reader which produces OnHeapColumnVector instead of CometVector, causing the native writer to fail. This fix explicitly sets COMET_NATIVE_SCAN_IMPL to "auto" in the test configuration, allowing native_iceberg_compat to be used for complex types. Co-Authored-By: Claude Opus 4.5 <[email protected]>
@comphead ☝️ |

Summary
Changes
CometDataWritingCommand.scalaCometParquetWriterSuite.scala:Test plan
🤖 Generated with Claude Code