fix: configurable fallback when parquet vectorized reader is disabled (#4352) by andygrove · Pull Request #4355 · apache/datafusion-comet

andygrove · 2026-05-17T00:58:15Z

Which issue does this PR close?

Closes #4352.

Rationale for this change

Comet's native_datafusion scan rejects the same Parquet-to-Spark conversions that Spark's vectorized reader rejects, whereas Spark's parquet-mr (non-vectorized) path silently overflows or produces nulls instead. Disabling spark.sql.parquet.enableVectorizedReader opts into parquet-mr semantics that Comet has no equivalent for, so by default Comet should fall back to Spark in that case. Users who want Comet to handle the scan regardless can opt in.

What changes are included in this PR?

- New config spark.comet.scan.allowDisabledParquetVectorizedReader (default false: fall back to Spark when the vectorized reader is disabled).
- CometScanRule.nativeDataFusionScan skips itself when the vectorized reader is disabled and the opt-in flag is false.
- CometTestBase sets the flag to true so existing Comet tests continue to exercise the native scan.
- Re-enables (un-ignores) the affected ParquetTypeWideningSuite tests in the 4.0.2 and 4.1.1 diffs.
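
The gating rule described above can be sketched as a small truth table. This is only an illustrative model in Python, not the code in CometScanRule: the function and helper names are hypothetical, the configuration is represented as a plain dict of strings, and the real rule presumably checks other conditions as well. The two config keys and the false default of the opt-in flag come from this PR; the true default of spark.sql.parquet.enableVectorizedReader is Spark's.

```python
# Hypothetical model of the fallback decision; not Comet's actual implementation.
VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"
ALLOW_DISABLED_KEY = "spark.comet.scan.allowDisabledParquetVectorizedReader"


def _as_bool(conf, key, default):
    """Read a boolean setting from a dict of string-valued configs."""
    return str(conf.get(key, default)).strip().lower() == "true"


def should_use_native_scan(conf):
    """Return True if the native scan may handle the Parquet read.

    The scan is skipped (fall back to Spark) only when the vectorized
    reader is disabled AND the opt-in flag is left at false.
    """
    vectorized_enabled = _as_bool(conf, VECTORIZED_READER_KEY, "true")
    allow_when_disabled = _as_bool(conf, ALLOW_DISABLED_KEY, "false")
    return vectorized_enabled or allow_when_disabled
```

Under this model, an empty configuration keeps the native scan, setting only spark.sql.parquet.enableVectorizedReader=false triggers the fallback, and additionally setting the opt-in flag to true restores the native scan, which is what CometTestBase relies on.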

How are these changes tested?

Existing test suites: the previously ignored ParquetTypeWideningSuite tests are now exercised on Spark 4.0 and 4.1 via the parquet-mr fallback path.

andygrove force-pushed the fix-4352-configurable-fallback branch from 40e130e to 4bcb25c Compare May 17, 2026 02:34

andygrove marked this pull request as ready for review May 17, 2026 02:34

docs: note parquet vectorized-reader fallback in compatibility guide

1a437a3

andygrove mentioned this pull request May 17, 2026

chore: Remove config option for native_iceberg_compat #4019

Open

andygrove wants to merge 2 commits into apache:main from andygrove:fix-4352-configurable-fallback