-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: sqllogictest cannot convert <subquery> to Substrait #19739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@gabotechs if you could take a look. |
|
I'm struggling to find the time to take a look at this one. I'm requesting back up now! |
| Expr::GroupingSet(expr) => not_impl_err!("Cannot convert {expr:?} to Substrait"), | ||
| Expr::Placeholder(expr) => not_impl_err!("Cannot convert {expr:?} to Substrait"), | ||
| Expr::OuterReferenceColumn(_, _) => { | ||
| // OuterReferenceColumn requires tracking outer query schema context for correlated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still getting a lot of This feature is not implemented: Cannot convert OuterReferenceColumn errors when running the tests so maybe this PR can partially close the issue instead of completely? Unless you're still working on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for the remaining issues I intend to open follow up PRs. My goal is to resolve all the issue related to substrait in the next few weeks.
| }; | ||
|
|
||
| // Handle negated EXISTS (NOT EXISTS) | ||
| if exists.negated { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no PREDICATE_OP_NOT_EXISTS in the spec so I think this a reasonable workaround. Minor note, the consumer hardcodes negated:false so I don't think NOT EXISTS/NOT IN will round-trip correctly (Exists/InSubquery)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had look into it earlier, I was hoping to open a seperate discussion so not to clutter this PR.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a fast first pass review (I'm not so familiar with the codebase but am trying to help from the @substrait-io perspective), but in trying to understand things locally, I think I found some deadcode. Please let me know if I am misunderstanding!
| from_exists(self, exists, schema) | ||
| } | ||
|
|
||
| fn handle_outer_reference_column( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that this is dead code? Maybe was intended to be added to the match in datafusion/substrait/src/logical_plan/producer/expr/mod.rs, but is now being left for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks
| /// Outer reference columns reference columns from an outer query scope in correlated subqueries. | ||
| /// We convert them the same way as regular columns since the subquery plan will be | ||
| /// reconstructed with the proper schema context during consumption. | ||
| pub fn from_outer_reference_column( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also deadcode (once the single deadcode caller is deleted)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes true.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good to me now besides that one comment about return type.
| arguments: vec![FunctionArgument { | ||
| arg_type: Some(ArgType::Value(substrait_exists)), | ||
| }], | ||
| output_type: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is done consistently across the codebase, but this technically makes this substrait invalid. From the documentation:
// Must be set to the return type of the function, exactly as derived
// using the declaration in the extension.
Type output_type = 3;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I wish that the substrait-rs library were at a state where it could handle this for you, but it just isn't there yet)
|
@kosiew since you have worked with substrait, could you take a look whenever you get time. Thanks! |
Which issue does this PR close?
Rationale for this change
The sqllogictest for the substrait was failing for subquery.
What changes are included in this PR?
ScalarSubqueryandExistsexpressions in the Substrait producer.Are these changes tested?
Yes
Are there any user-facing changes?