Make Physical CastExpr Field-aware and unify cast semantics across physical expressions #20814

Merged
kosiew merged 10 commits into apache:main from kosiew:cast-01-20164
Mar 10, 2026

@kosiew kosiew commented Mar 9, 2026

Which issue does this PR close?

Rationale for this change

Physical CastExpr previously stored only a target DataType. This caused field-level semantics (name, nullability, and metadata) to be lost when casts were represented in the physical layer. In contrast, logical expressions already carry this information through FieldRef.

This mismatch created several issues:

  • Physical and logical cast representations diverged in how they preserve schema semantics.
  • Struct casting logic behaved differently depending on whether the cast was represented as CastExpr or CastColumnExpr.
  • Downstream components (such as schema rewriting and ordering equivalence analysis) required additional branching and duplicated logic.

Making CastExpr field-aware aligns the physical representation with logical semantics and enables consistent schema propagation across execution planning and expression evaluation.

What changes are included in this PR?

This PR introduces field-aware semantics to CastExpr and simplifies several areas that previously relied on type-only casting.

Key changes include:

  1. Field-aware CastExpr

    • Replace the cast_type: DataType field with target_field: FieldRef.
    • Add new_with_target_field constructor to explicitly construct field-aware casts.
    • Keep the existing new(expr, DataType) constructor as a compatibility shim that creates a canonical field.
  2. Return-field and nullability behavior

    • return_field now returns the full target_field, preserving name, nullability, and metadata.
    • nullable() now derives its result from the resolved target field rather than the input expression.
    • Add compatibility logic for legacy type-only casts to preserve previous behavior.
  3. Struct cast validation improvements

    • Struct-to-struct casting now validates compatibility using field information before execution.
    • Planning-time validation prevents unsupported casts from reaching execution.
  4. Shared cast property logic

    • Introduce shared helper functions (cast_expr_properties, is_order_preserving_cast_family) for determining ordering preservation.
    • Reuse this logic in both CastExpr and CastColumnExpr to avoid duplicated implementations.
  5. Schema rewriter improvements

    • Refactor physical column resolution into resolve_physical_column.
    • Simplify cast insertion logic when logical and physical fields differ.
    • Pass explicit physical and logical fields to cast creation for improved correctness.
  6. Ordering equivalence simplification

    • Introduce substitute_cast_like_ordering helper to unify handling of CastExpr and CastColumnExpr in ordering equivalence analysis.
  7. Additional unit tests

    • Validate metadata propagation through return_field.
    • Verify nullability behavior for field-aware casts.
    • Ensure legacy type-only casts preserve existing semantics.
    • Test struct-cast validation with nested field semantics.
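
The constructor and `return_field` changes in items 1 and 2 can be pictured with a minimal, self-contained sketch. It does not use the real arrow or DataFusion types: `DataType`, `Field`, and `FieldRef` below are simplified stand-ins, the child expression is reduced to a string, and the shape of the "canonical field" built by the legacy constructor (empty name, nullable, no metadata) is an assumption made for illustration only.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for arrow's DataType / Field / FieldRef (assumption:
// the real types have many more variants and methods).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

pub type FieldRef = Arc<Field>;

/// Hypothetical, pared-down cast expression that stores the whole target
/// field instead of only a target `DataType`.
#[derive(Debug, Clone)]
pub struct CastExpr {
    pub child: String, // stand-in for the child physical expression
    pub target_field: FieldRef,
}

impl CastExpr {
    /// Type-only constructor kept as a compatibility shim. The canonical
    /// field built here (empty name, nullable, no metadata) is an assumption.
    pub fn new(child: &str, cast_type: DataType) -> Self {
        let canonical = Field {
            name: String::new(),
            data_type: cast_type,
            nullable: true,
            metadata: HashMap::new(),
        };
        Self::new_with_target_field(child, Arc::new(canonical))
    }

    /// Field-aware constructor: the caller supplies name, nullability,
    /// and metadata through the target field.
    pub fn new_with_target_field(child: &str, target_field: FieldRef) -> Self {
        Self {
            child: child.to_string(),
            target_field,
        }
    }

    /// Returns the full target field, so name and metadata survive the cast.
    pub fn return_field(&self) -> FieldRef {
        Arc::clone(&self.target_field)
    }
}
```

In this sketch the legacy constructor simply delegates to the field-aware one, which is one way a type-only API can remain available while the stored representation changes.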

Are these changes tested?

Yes.

New unit tests were added in physical-expr/src/expressions/cast.rs to verify:

  • Metadata propagation through field-aware casts
  • Correct nullability behavior derived from the target field
  • Backward compatibility with legacy type-only constructors
  • Struct cast compatibility validation using nested fields
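
As a rough illustration of what planning-time struct-cast validation over nested fields can look like, here is a self-contained sketch. The types are simplified stand-ins rather than arrow's, and the compatibility rules are assumptions chosen for the example (fields matched by name, a missing source field allowed only when the target field is nullable, a nullable source rejected for a non-nullable target, and a toy scalar rule); the rules actually applied in DataFusion may differ.

```rust
// Simplified stand-ins for arrow's DataType and Field (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

// Toy scalar rule (assumption): identical types, or widening Int32 -> Int64.
fn scalar_cast_supported(from: &DataType, to: &DataType) -> bool {
    from == to || matches!((from, to), (DataType::Int32, DataType::Int64))
}

/// Hypothetical validator: walks nested struct fields and reports the first
/// incompatibility before any data is touched.
pub fn validate_cast(from: &DataType, to: &DataType) -> Result<(), String> {
    match (from, to) {
        (DataType::Struct(src), DataType::Struct(dst)) => {
            for target in dst {
                match src.iter().find(|s| s.name == target.name) {
                    Some(source) => {
                        if source.nullable && !target.nullable {
                            return Err(format!(
                                "field '{}': nullable source, non-nullable target",
                                target.name
                            ));
                        }
                        // Recurse so nested structs are checked the same way.
                        validate_cast(&source.data_type, &target.data_type)?;
                    }
                    // A missing source field could be filled with nulls.
                    None if target.nullable => {}
                    None => {
                        return Err(format!(
                            "field '{}' is missing from the source struct",
                            target.name
                        ));
                    }
                }
            }
            Ok(())
        }
        (DataType::Struct(_), _) | (_, DataType::Struct(_)) => {
            Err("cannot cast between struct and non-struct types".to_string())
        }
        _ if scalar_cast_supported(from, to) => Ok(()),
        _ => Err(format!("unsupported cast {:?} -> {:?}", from, to)),
    }
}
```

Returning a `Result` from a pure function over field definitions keeps the check usable at planning time, since it needs no array data.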

Existing tests continue to pass and validate compatibility with the previous API behavior.

Are there any user-facing changes?

There are no direct user-facing behavior changes.

This change primarily improves internal schema semantics and consistency in the physical expression layer. Existing APIs remain compatible through the legacy constructor that accepts only a DataType.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 5 commits March 9, 2026 11:07
Centralize the duplicated cast-ordering predicate in cast.rs and reuse it
in cast_column.rs. Merge parallel CastExpr / CastColumnExpr logic into a
single helper in equivalence/properties/mod.rs. Extract physical-schema
resolution into resolve_physical_column to avoid inline
reimplementation in rewrite_column.
- Replaced cast_type with target_field in CastExpr to improve clarity and metadata handling.
- Introduced new method `new_with_target_field` for creating CastExpr instances with explicit target fields.
- Updated evaluation and return field methods to utilize the new target_field structure.
- Enhanced tests for field-aware casting, ensuring correct metadata preservation and nullability behavior.
Return target_field directly from return_field in CastExpr.
Update field-aware tests to assert target field's
name and nullability instead of child's.

Modify create_cast_column_expr to accept already-resolved
physical FieldRef, eliminating the additional schema lookup
and tightening the helper boundary.
Update `CastExpr` to store a `FieldRef` target, preserving
explicit target-field semantics. Maintain legacy type-only
construction paths for compatibility while ensuring fast
validation on incompatible struct casts. Adjust schema
rewriter to avoid re-resolving physical fields after
calling `resolve_physical_column`.
Consolidate logic in `cast.rs` by merging nullable() and
return_field() into a unified resolved_target_field() helper.
This reduces branching and improves readability while
maintaining existing APIs. In `schema_rewriter.rs`, simplify
the code by removing a redundant conditional branch,
ensuring all paths lead to a single cast-construction
method with consistent semantics.
@github-actions github-actions bot added the physical-expr label (Changes to the physical-expr crates) Mar 9, 2026
@kosiew kosiew marked this pull request as ready for review March 9, 2026 04:13
@kosiew kosiew requested a review from adriangb March 9, 2026 04:14
@adriangb adriangb left a comment

This looks good to me.

It would be really nice if, for this PR or future ones, we kept a strict separation between refactoring and behavioral changes. I would much rather review one PR that is mostly refactoring, with no new features or behavioral changes, and then a second PR that has just the API changes, than review everything in one PR. That's not always possible (e.g. if the refactor uses the new APIs), but I think some of the changes here could have been separated.

For this PR I left mostly some comments about comments (very meta).

Could you update #20164 after this is merged to summarize the current state?

kosiew commented Mar 9, 2026

@adriangb

Thanks for the review.

I found one behavior change worth addressing before merging.

  • CastExpr::nullable() was returning resolved_target_field(...).is_nullable().
  • That means a field-aware cast can now report false purely because the target field is marked non-nullable, even when the child expression is nullable and a regular CAST can still yield null for null input.
  • This diverges from the logical cast behavior, which explicitly propagates the input nullability rather than forcing the destination field's nullability.
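
To make the concern concrete, here is a small self-contained sketch. It is not the code from the commit referenced in this comment; `CastNullability` and `cast_i32_to_i64` are hypothetical names, and the two methods only contrast the strategies described in the bullets above.

```rust
/// Hypothetical summary of the two inputs that could drive `nullable()`.
#[derive(Debug, Clone, Copy)]
pub struct CastNullability {
    pub child_nullable: bool,
    pub target_field_nullable: bool,
}

impl CastNullability {
    /// Strategy from the first bullet: trust the target field alone.
    pub fn nullable_from_target_field(&self) -> bool {
        self.target_field_nullable
    }

    /// Strategy matching the logical cast: propagate the input nullability.
    pub fn nullable_from_child(&self) -> bool {
        self.child_nullable
    }
}

/// Toy cast: a null input stays null, whatever the target field claims.
pub fn cast_i32_to_i64(input: Option<i32>) -> Option<i64> {
    input.map(i64::from)
}
```

With a nullable child and a non-nullable target field, the first strategy reports `false` even though `cast_i32_to_i64(None)` still produces `None`, which is the mismatch an optimizer could act on incorrectly.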

Can you review 3f8d735 which makes optimizer reasoning safer?

@kosiew kosiew requested a review from adriangb March 9, 2026 07:53
@adriangb adriangb left a comment

3f8d735 looks good to me!

@kosiew kosiew added this pull request to the merge queue Mar 10, 2026
Merged via the queue into apache:main with commit fd97799 Mar 10, 2026
34 checks passed