Make Physical CastExpr Field-aware and unify cast semantics across physical expressions#20814
Make Physical CastExpr Field-aware and unify cast semantics across physical expressions#20814kosiew merged 10 commits intoapache:mainfrom
Conversation
Centralize the duplicated cast-ordering predicate in cast.rs and reuse it in cast_column.rs. Merge parallel CastExpr / CastColumnExpr logic into a single helper in equivalence/properties/mod.rs. Extract physical-schema resolution into resolve_physical_column to avoid inline reimplementation in rewrite_column.
- Replaced cast_type with target_field in CastExpr to improve clarity and metadata handling. - Introduced new method `new_with_target_field` for creating CastExpr instances with explicit target fields. - Updated evaluation and return field methods to utilize the new target_field structure. - Enhanced tests for field-aware casting, ensuring correct metadata preservation and nullability behavior.
Return target_field directly from return_field in CastExpr. Update field-aware tests to assert target field's name and nullability instead of child's. Modify create_cast_column_expr to accept already-resolved physical FieldRef, eliminating the additional schema lookup and tightening the helper boundary.
Update `CastExpr` to store a `FieldRef` target, preserving explicit target-field semantics. Maintain legacy type-only construction paths for compatibility while ensuring fast validation on incompatible struct casts. Adjust schema rewriter to avoid re-resolving physical fields after calling `resolve_physical_column`.
Consolidate logic in `cast.rs` by merging nullable() and return_field() into a unified resolved_target_field() helper. This reduces branching and improves readability while maintaining existing APIs. In `schema_rewriter.rs`, simplify the code by removing a redundant conditional branch, ensuring all paths lead to a single cast-construction method with consistent semantics.
adriangb
left a comment
There was a problem hiding this comment.
This looks good to me.
It would be really nice if for this PR or future we kept a strict separation between refactoring and changes. I would much rather review a PR that is mostly just refactoring with no new features or behavioral changes and then review another PR that has just the API changes rather than all in one PR. That's not always possible e.g. if the refactor uses the new APIs but I think some of the changes here are.
For this PR I left mostly some comments about comments (very meta).
Could you update #20164 after this is merged to summarize the current state?
…d field metadata handling
…s nullability and update related tests for accuracy
|
Thanks for the review. I found one behavior change worth addressing before merging.
Can you review 3f8d735 which makes optimizer reasoning safer? |
Which issue does this PR close?
Rationale for this change
Physical
CastExprpreviously stored only a targetDataType. This caused field-level semantics (name, nullability, and metadata) to be lost when casts were represented in the physical layer. In contrast, logical expressions already carry this information throughFieldRef.This mismatch created several issues:
CastExprorCastColumnExpr.Making
CastExprfield-aware aligns the physical representation with logical semantics and enables consistent schema propagation across execution planning and expression evaluation.What changes are included in this PR?
This PR introduces field-aware semantics to
CastExprand simplifies several areas that previously relied on type-only casting.Key changes include:
Field-aware CastExpr
cast_type: DataTypefield withtarget_field: FieldRef.new_with_target_fieldconstructor to explicitly construct field-aware casts.new(expr, DataType)constructor as a compatibility shim that creates a canonical field.Return-field and nullability behavior
return_fieldnow returns the fulltarget_field, preserving name, nullability, and metadata.nullable()now derives its result from the resolved target field rather than the input expression.Struct cast validation improvements
Shared cast property logic
cast_expr_properties,is_order_preserving_cast_family) for determining ordering preservation.CastExprandCastColumnExprto avoid duplicated implementations.Schema rewriter improvements
resolve_physical_column.Ordering equivalence simplification
substitute_cast_like_orderinghelper to unify handling ofCastExprandCastColumnExprin ordering equivalence analysis.Additional unit tests
return_field.Are these changes tested?
Yes.
New unit tests were added in
physical-expr/src/expressions/cast.rsto verify:Existing tests continue to pass and validate compatibility with the previous API behavior.
Are there any user-facing changes?
There are no direct user-facing behavior changes.
This change primarily improves internal schema semantics and consistency in the physical expression layer. Existing APIs remain compatible through the legacy constructor that accepts only a
DataType.LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.