feat: add complex type support to native Parquet writer #3214

andygrove · 2026-01-18T19:24:33Z

Summary

- Enables support for complex types (arrays, maps, structs) in Comet's native Parquet writer
- Removes the blocking check that previously prevented complex types
- Adds comprehensive test coverage for complex types

Changes

- Remove complex type blocking check in CometDataWritingCommand.scala
- Add 12 new tests for complex types in CometParquetWriterSuite.scala:
  - Basic complex types (array, struct, map)
  - Nested complex types (array of structs, struct containing array, map with struct values, deeply nested)
  - Nullable complex types with nulls at various nesting levels
  - Complex types containing decimal and temporal types
  - Empty arrays and maps
  - Fuzz testing with randomly generated complex type schemas
- Update documentation to reflect complex type support

Test plan

- Tests verify round-trip compatibility (write with Comet, read with Spark/Comet)
- Fuzz testing with randomly generated schemas
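
To illustrate what "randomly generated complex type schemas" can look like, here is a minimal, hypothetical sketch in plain Python. It is not the generator used in CometParquetWriterSuite.scala; the names `PRIMITIVES`, `random_type`, and `random_schema`, the leaf-type list, and the nesting probabilities are all assumptions made for illustration. It builds a Spark DDL schema string with arrays, maps, and structs nested up to a fixed depth, seeded so that a failing schema can be reproduced.

```python
import random

# Leaf types placed inside complex types. The PR mentions decimal and
# temporal types, so a few are included; the exact list is an assumption.
PRIMITIVES = ["int", "bigint", "double", "string", "boolean",
              "decimal(10,2)", "date", "timestamp"]


def random_type(rng: random.Random, depth: int) -> str:
    """Return a Spark DDL type string with at most `depth` complex levels."""
    # Stop nesting when the depth budget is spent, or at random 30% of the time.
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(PRIMITIVES)
    kind = rng.choice(["array", "map", "struct"])
    if kind == "array":
        return f"array<{random_type(rng, depth - 1)}>"
    if kind == "map":
        # Map keys are kept to simple primitives in this sketch.
        key = rng.choice(["int", "string"])
        return f"map<{key},{random_type(rng, depth - 1)}>"
    # Struct with one to three fields named f0, f1, ...
    fields = [f"f{i}:{random_type(rng, depth - 1)}"
              for i in range(rng.randint(1, 3))]
    return f"struct<{','.join(fields)}>"


def random_schema(seed: int, num_columns: int = 4, max_depth: int = 3) -> str:
    """Return a DDL schema string such as 'c0 <type>, c1 <type>, ...'."""
    rng = random.Random(seed)  # seeded so a failing schema can be regenerated
    cols = [f"c{i} {random_type(rng, max_depth)}" for i in range(num_columns)]
    return ", ".join(cols)


if __name__ == "__main__":
    print(random_schema(seed=42))
```

A string produced this way could be used as the schema when building a test DataFrame, which is then written with the native writer and read back for comparison.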

🤖 Generated with Claude Code

Enables support for complex types (arrays, maps, structs) in Comet's native Parquet writer by removing the blocking check that previously prevented them.

Changes:
- Remove complex type blocking check in CometDataWritingCommand.scala
- Add comprehensive test coverage for complex types including:
  - Basic complex types (array, struct, map)
  - Nested complex types (array of structs, struct containing array, etc.)
  - Nullable complex types with nulls at various nesting levels
  - Complex types containing decimal and temporal types
  - Empty arrays and maps
  - Fuzz testing with randomly generated complex type schemas
- Update documentation to reflect complex type support

Co-Authored-By: Claude Opus 4.5 <[email protected]>

codecov-commenter · 2026-01-18T19:47:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.63%. Comparing base (f09f8af) to head (af1d474).
⚠️ Report is 855 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3214      +/-   ##
============================================
+ Coverage     56.12%   59.63%   +3.50%     
- Complexity      976     1416     +440     
============================================
  Files           119      170      +51     
  Lines         11743    15700    +3957     
  Branches       2251     2595     +344     
============================================
+ Hits           6591     9362    +2771     
- Misses         4012     5021    +1009     
- Partials       1140     1317     +177


Enable spark.comet.scan.allowIncompatible in complex type tests so that native_iceberg_compat scan is used (which supports complex types) instead of falling back to native_comet (which doesn't support complex types).

Co-Authored-By: Claude Opus 4.5 <[email protected]>

andygrove · 2026-01-18T21:16:12Z

With these changes, we can run the PySpark repartition benchmark fully native, and it shows an almost 2x speedup compared to Spark (and also ~2x compared to Comet when writes are disabled and Comet does the columnar-to-row transition).

The CI sets COMET_PARQUET_SCAN_IMPL=native_comet for some test profiles, which overrides the default auto mode. Since native_comet doesn't support complex types, the scan falls back to Spark's reader which produces OnHeapColumnVector instead of CometVector, causing the native writer to fail.

This fix explicitly sets COMET_NATIVE_SCAN_IMPL to "auto" in the test configuration, allowing native_iceberg_compat to be used for complex types.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

andygrove · 2026-01-18T22:47:53Z

With these changes, we can run the PySpark repartition benchmark fully native, and it shows an almost 2x speedup compared to Spark (and also ~2x compared to Comet when writes are disabled and Comet does the columnar-to-row transition).

@comphead ☝️

andygrove marked this pull request as draft January 18, 2026 20:17

andygrove force-pushed the feat/complex-type-parquet-write branch from c6bc58a to 93d1f82 Compare January 18, 2026 20:37

andygrove mentioned this pull request Jan 18, 2026

Comet should gracefully handle OnHeapColumnVector instead of failing #3215

Open

andygrove and others added 3 commits January 18, 2026 14:18

save

bc6c799

format

ce7b6d4

andygrove marked this pull request as ready for review January 18, 2026 22:42

andygrove requested a review from comphead January 18, 2026 22:47
