fix: configurable fallback when parquet vectorized reader is disabled (#4352) #4355

Open
andygrove wants to merge 2 commits into apache:main from andygrove:fix-4352-configurable-fallback

@andygrove andygrove commented May 17, 2026

Which issue does this PR close?

Closes #4352.

Rationale for this change

Comet's native_datafusion scan rejects the Parquet-to-Spark conversions that Spark's vectorized reader rejects, whereas Spark's parquet-mr (non-vectorized) path accepts them and silently overflows or returns nulls. Disabling spark.sql.parquet.enableVectorizedReader opts into parquet-mr semantics, for which Comet has no equivalent, so by default Comet should fall back to Spark in that case. Users who want Comet to handle the scan regardless can opt in.
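
To make the difference in semantics concrete, here is a loose Python illustration. It is not Spark, parquet-mr, or Comet code: both function names are hypothetical, and the model (two's-complement wraparound for the permissive path, outright refusal of a narrowing conversion for the strict path) is an assumption chosen only to show why the two behaviors cannot be reconciled.

```python
from typing import Iterable, List


def narrow_permissive(value: int, bits: int = 32) -> int:
    """Hypothetical permissive narrowing: wrap into `bits` bits, never raise."""
    mask = (1 << bits) - 1
    wrapped = value & mask
    # Reinterpret the top bit as the sign bit (two's complement).
    if wrapped >= 1 << (bits - 1):
        return wrapped - (1 << bits)
    return wrapped


def convert_strict(values: Iterable[int], source_bits: int, target_bits: int) -> List[int]:
    """Hypothetical strict conversion: refuse any narrowing, whatever the data."""
    if target_bits < source_bits:
        raise TypeError(
            f"unsupported conversion: {source_bits}-bit to {target_bits}-bit"
        )
    return list(values)


# The permissive path returns a wrong-looking value instead of failing.
overflowed = narrow_permissive(2**31)

# The strict path fails before reading any value.
try:
    convert_strict([1, 2, 3], source_bits=64, target_bits=32)
    strict_rejected = False
except TypeError:
    strict_rejected = True
```

A test that asserts the wrapped value can only pass on the permissive path, which is why a scan with strict semantics has to step aside rather than try to emulate it.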

What changes are included in this PR?

  • New config spark.comet.scan.allowDisabledParquetVectorizedReader (default false: Comet falls back to Spark when the vectorized reader is disabled).
  • CometScanRule.nativeDataFusionScan skips itself when the vectorized reader is disabled and the opt-in flag is false.
  • CometTestBase sets the flag to true so existing Comet tests continue to exercise the native scan.
  • Re-enables (un-ignores) the affected ParquetTypeWideningSuite tests in the 4.0.2 and 4.1.1 diffs.
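
The fallback decision described in the list above can be sketched as follows. This is a minimal Python illustration, not the Scala code in CometScanRule: `should_use_native_scan` is a hypothetical name, and the sketch assumes the session configuration is available as a plain string-to-string map. The two config keys come from this PR and from Spark; the default of true for spark.sql.parquet.enableVectorizedReader is Spark's documented default.

```python
from typing import Mapping

VECTORIZED_KEY = "spark.sql.parquet.enableVectorizedReader"
ALLOW_KEY = "spark.comet.scan.allowDisabledParquetVectorizedReader"


def _as_bool(conf: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean config value stored as a string, falling back to a default."""
    raw = conf.get(key)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def should_use_native_scan(conf: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if the native scan may handle the read.

    The native scan is skipped only when the vectorized reader is disabled
    and the user has not opted in through the new flag.
    """
    vectorized_enabled = _as_bool(conf, VECTORIZED_KEY, default=True)
    allow_when_disabled = _as_bool(conf, ALLOW_KEY, default=False)
    return vectorized_enabled or allow_when_disabled


# Default session: the vectorized reader is on, so the native scan is used.
default_case = should_use_native_scan({})

# Vectorized reader off, no opt-in: fall back to Spark.
fallback_case = should_use_native_scan({VECTORIZED_KEY: "false"})

# Vectorized reader off, opt-in set: the native scan handles the read anyway.
opt_in_case = should_use_native_scan({VECTORIZED_KEY: "false", ALLOW_KEY: "true"})
```

Under these assumptions, the third case corresponds to what CometTestBase does by setting the flag to true, so existing tests keep exercising the native scan.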

How are these changes tested?

Existing test suites — the previously ignored ParquetTypeWideningSuite tests are now exercised on Spark 4.0 and 4.1 via the parquet-mr fallback path.

@andygrove andygrove force-pushed the fix-4352-configurable-fallback branch from 40e130e to 4bcb25c Compare May 17, 2026 02:34
@andygrove andygrove marked this pull request as ready for review May 17, 2026 02:34
Development

Successfully merging this pull request may close these issues.

native_datafusion: tests asserting parquet-mr's permissive overflow/narrowing behavior cannot be made to pass
