Conversation

@kumarUjjawal
Contributor

@kumarUjjawal kumarUjjawal commented Jan 11, 2026

Which issue does this PR close?

Rationale for this change

The Substrait sqllogictest run was failing on subquery expressions:

query failed: DataFusion error: This feature is not implemented: Cannot convert <subquery> to Substrait

What changes are included in this PR?

  • Added support for ScalarSubquery and Exists expressions in the Substrait producer.

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jan 11, 2026
@kumarUjjawal
Contributor Author

@gabotechs, could you take a look?

@gabotechs
Contributor

I'm struggling to find the time to take a look at this one. I'm requesting backup now!

Expr::GroupingSet(expr) => not_impl_err!("Cannot convert {expr:?} to Substrait"),
Expr::Placeholder(expr) => not_impl_err!("Cannot convert {expr:?} to Substrait"),
Expr::OuterReferenceColumn(_, _) => {
// OuterReferenceColumn requires tracking outer query schema context for correlated

I'm still getting a lot of "This feature is not implemented: Cannot convert OuterReferenceColumn" errors when running the tests, so maybe this PR can partially close the issue instead of completely? Unless you're still working on it.

Contributor Author

Yeah, for the remaining issues I intend to open follow-up PRs. My goal is to resolve all the Substrait-related issues in the next few weeks.
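
For readers following the thread: the code comment above says OuterReferenceColumn needs the outer query's schema context. As far as I understand the spec, a Substrait field reference can be rooted in an outer reference that records how many subquery boundaries to step out of. A minimal sketch of the bookkeeping this implies, assuming hypothetical `OuterSchemas` and `OuterRef` types that are not part of DataFusion or substrait-rs:

```rust
/// Hypothetical result of resolving a correlated column: how many subquery
/// boundaries to step out of, and the field index in that outer schema.
#[derive(Debug, PartialEq)]
struct OuterRef {
    steps_out: u32,
    field: usize,
}

/// Hypothetical stack of enclosing query schemas (field names only).
/// The last entry is the closest enclosing query.
struct OuterSchemas {
    stack: Vec<Vec<String>>,
}

impl OuterSchemas {
    /// Search from the innermost enclosing query outwards and return the
    /// first schema that contains `column`.
    fn resolve(&self, column: &str) -> Option<OuterRef> {
        for (depth, schema) in self.stack.iter().rev().enumerate() {
            if let Some(field) = schema.iter().position(|name| name.as_str() == column) {
                return Some(OuterRef {
                    steps_out: depth as u32 + 1,
                    field,
                });
            }
        }
        None
    }
}
```

A producer along these lines would push the current schema before converting a subquery plan and pop it afterwards. Whether the follow-up PRs take this shape is an open question.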

};

// Handle negated EXISTS (NOT EXISTS)
if exists.negated {

There's no PREDICATE_OP_NOT_EXISTS in the spec, so I think this is a reasonable workaround. Minor note: the consumer hardcodes negated: false, so I don't think NOT EXISTS/NOT IN will round-trip correctly (Exists/InSubquery).
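
To make the round-trip concern concrete, here is a minimal sketch using hypothetical, simplified stand-in types (`LogicalExpr`, `PlanExpr`, `produce`, `consume`), not the real DataFusion or substrait-rs API. The producer wraps the EXISTS predicate in a `not` call, and the consumer shown here is one possible way to fold that shape back into a negated flag:

```rust
/// Hypothetical stand-in for a DataFusion-like logical expression.
#[derive(Debug, Clone, PartialEq)]
enum LogicalExpr {
    Exists { subquery: String, negated: bool },
}

/// Hypothetical stand-in for a Substrait-like expression. The set predicate
/// has no negated form, so negation is a wrapping `not` call.
#[derive(Debug, Clone, PartialEq)]
enum PlanExpr {
    ExistsPredicate { subquery: String },
    Not(Box<PlanExpr>),
}

/// Producer side: emit EXISTS, wrapped in `not` when the input is negated.
fn produce(expr: &LogicalExpr) -> PlanExpr {
    match expr {
        LogicalExpr::Exists { subquery, negated } => {
            let exists = PlanExpr::ExistsPredicate {
                subquery: subquery.clone(),
            };
            if *negated {
                PlanExpr::Not(Box::new(exists))
            } else {
                exists
            }
        }
    }
}

/// Consumer side: recognise `not(exists(..))` and restore `negated: true`.
/// Returns None for shapes this sketch does not handle.
fn consume(expr: &PlanExpr) -> Option<LogicalExpr> {
    match expr {
        PlanExpr::ExistsPredicate { subquery } => Some(LogicalExpr::Exists {
            subquery: subquery.clone(),
            negated: false,
        }),
        PlanExpr::Not(inner) => match inner.as_ref() {
            PlanExpr::ExistsPredicate { subquery } => Some(LogicalExpr::Exists {
                subquery: subquery.clone(),
                negated: true,
            }),
            _ => None,
        },
    }
}
```

A consumer that always sets negated: false on the predicate can still yield an equivalent plan if it keeps the surrounding `not`, but the expression it returns would not be identical to the one the producer started from.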

Contributor Author

I looked into it earlier. I was hoping to open a separate discussion so as not to clutter this PR.

Contributor

@benbellick benbellick left a comment

This is just a fast first-pass review (I'm not so familiar with the codebase but am trying to help from the @substrait-io perspective), but in trying to understand things locally, I think I found some dead code. Please let me know if I am misunderstanding!

from_exists(self, exists, schema)
}

fn handle_outer_reference_column(
Contributor

Am I correct that this is dead code? Maybe it was intended to be added to the match in datafusion/substrait/src/logical_plan/producer/expr/mod.rs, but is now being left for later.

Contributor Author

Good catch! Thanks

/// Outer reference columns reference columns from an outer query scope in correlated subqueries.
/// We convert them the same way as regular columns since the subquery plan will be
/// reconstructed with the proper schema context during consumption.
pub fn from_outer_reference_column(
Contributor

Is this also dead code (once the single dead-code caller is deleted)?

Contributor Author

Yes, true.

Contributor

@benbellick benbellick left a comment

I think it looks good to me now besides that one comment about return type.

arguments: vec![FunctionArgument {
arg_type: Some(ArgType::Value(substrait_exists)),
}],
output_type: None,
Contributor

I'm not sure if this is done consistently across the codebase, but this technically makes the produced Substrait invalid. From the documentation:

    // Must be set to the return type of the function, exactly as derived
    // using the declaration in the extension.
    Type output_type = 3;

Contributor

(I wish that the substrait-rs library were at a state where it could handle this for you, but it just isn't there yet)
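
As an illustration of what setting the return type could look like, here is a minimal sketch with hypothetical, simplified stand-ins for the Substrait messages (`Type`, `Nullability`, `Expression`, `ScalarFunction`, `negate`); the real substrait-rs structs have more fields and different type representations. It assumes that `not` returns a boolean whose nullability mirrors its argument, which should be checked against the extension declaration:

```rust
/// Hypothetical, simplified stand-ins for Substrait messages.
#[derive(Debug, Clone, PartialEq)]
enum Nullability {
    Required,
    Nullable,
}

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Bool(Nullability),
}

#[derive(Debug, Clone, PartialEq)]
enum Expression {
    ExistsPredicate,
    ScalarFunction(ScalarFunction),
}

#[derive(Debug, Clone, PartialEq)]
struct ScalarFunction {
    function_reference: u32,
    arguments: Vec<Expression>,
    /// Filled in rather than left as None, per the comment in the spec.
    output_type: Option<Type>,
}

/// Build a `not(arg)` call and derive its output type from the argument.
/// Assumption: `not` yields a boolean with the same nullability as `arg`.
fn negate(function_reference: u32, arg: Expression, arg_nullability: Nullability) -> ScalarFunction {
    ScalarFunction {
        function_reference,
        arguments: vec![arg],
        output_type: Some(Type::Bool(arg_nullability)),
    }
}
```

The point of the sketch is only that the producer derives and stores the type itself, since the library does not do so on its behalf.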

@kumarUjjawal
Contributor Author

@kosiew, since you have worked with Substrait, could you take a look whenever you get time? Thanks!

Labels

substrait Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Cannot convert <subquery> to Substrait

4 participants