fivetran-BradfordPaskewitz (Collaborator) commented Dec 2, 2025

Issue: https://fivetran.atlassian.net/browse/RD-1066808

Problem: In BigQuery, when you UNNEST an ARRAY<STRUCT<...>>, the struct fields can be referenced as unqualified columns (e.g., WHERE type = 'x'). This worked in simple queries but failed in correlated subqueries: sqlglot's type annotation couldn't resolve columns from outer scopes, so qualify_columns didn't know which struct fields to expose.
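The sketch below shows this behavior without depending on sqlglot. The helper `unnested_columns` is hypothetical and not part of sqlglot. It takes a BigQuery column type string and returns the field names that `UNNEST` would expose as unqualified columns. It assumes a reduced type grammar: `ARRAY<...>`, `STRUCT<name type, ...>`, and bare scalar names. Anything more exotic is out of scope.

```python
from typing import List, Optional


def _split_top_level(body: str) -> List[str]:
    """Split on commas that are not nested inside angle brackets."""
    parts: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i].strip())
            start = i + 1
    parts.append(body[start:].strip())
    return [p for p in parts if p]


def _unwrap(type_str: str, kind: str) -> Optional[str]:
    """Return the text inside KIND<...>, or None if type_str is not a KIND type."""
    s = type_str.strip()
    prefix = kind + "<"
    if s.upper().startswith(prefix) and s.endswith(">"):
        return s[len(prefix):-1]
    return None


def unnested_columns(type_str: str) -> List[str]:
    """Unqualified column names exposed by UNNEST(col), where col has type_str.

    Only ARRAY<STRUCT<...>> exposes named fields; other types expose none here.
    """
    element = _unwrap(type_str, "ARRAY")
    if element is None:
        return []
    fields = _unwrap(element, "STRUCT")
    if fields is None:
        return []
    # Each field is written "name type"; keep only the name.
    return [f.split(None, 1)[0] for f in _split_top_level(fields)]
```

For example, `unnested_columns("ARRAY<STRUCT<type STRING, value INT64>>")` returns `["type", "value"]`. That is why `WHERE type = 'x'` can resolve, but only once the column's type is known.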

Solution: Add a fallback mechanism in qualify_columns.py that queries the schema directly when type annotation fails. When processing an UNNEST source, if the expression isn't typed, traverse parent scopes to find the column's type definition in the schema (handling both table sources and CTEs), then extract and expose the struct field names as available columns in that scope.
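The fallback described above can be sketched with simplified stand-ins. `Scope`, `TableSource`, `lookup_column_type`, and `struct_fields` are illustrative names only; they are not sqlglot's classes or the PR's code. Types are encoded here as nested tuples rather than sqlglot `DataType` nodes.

The lookup starts at the scope containing the `UNNEST` and walks outward through parent scopes until one defines the referenced alias. It then reads the column type from the schema (for a table source) or from the CTE's known output types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical encoding: scalar types are strings; composite types are tuples
#   ("ARRAY", element) or ("STRUCT", ((field_name, field_type), ...)).
ITEMS_TYPE = ("ARRAY", ("STRUCT", (("type", "STRING"), ("value", "INT64"))))


@dataclass
class TableSource:
    name: str  # physical table name, resolved through the schema


@dataclass
class Scope:
    # alias -> source: a physical table or a CTE scope
    sources: Dict[str, Union[TableSource, "Scope"]] = field(default_factory=dict)
    # for CTE scopes only: output column name -> type
    column_types: Dict[str, object] = field(default_factory=dict)
    parent: Optional["Scope"] = None


def lookup_column_type(scope, schema, table_alias, column):
    """Walk outward from `scope` until some scope defines `table_alias`."""
    current = scope
    while current is not None:
        source = current.sources.get(table_alias)
        if isinstance(source, TableSource):
            return schema.get(source.name, {}).get(column)
        if isinstance(source, Scope):  # CTE: use its known output types
            return source.column_types.get(column)
        current = current.parent
    return None


def struct_fields(col_type) -> List[str]:
    """Field names that UNNEST of ARRAY<STRUCT<...>> exposes; [] otherwise."""
    if isinstance(col_type, tuple) and col_type[0] == "ARRAY":
        element = col_type[1]
        if isinstance(element, tuple) and element[0] == "STRUCT":
            return [name for name, _ in element[1]]
    return []
```

If no scope defines the alias, the lookup returns `None`. This keeps the fallback conservative: the column simply stays unresolved instead of raising an error.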

tobymao (Owner) commented Dec 2, 2025

can you fix the type annotation instead?

georgesittas (Collaborator) commented:

Yeah, I second that suggestion, to avoid making this more complicated than necessary. Keen to understand more if that is tricky to do; let me know and we can hop on a call and chat about this.

georgesittas (Collaborator) left a comment:

@fivetran-BradfordPaskewitz, keep in mind that you'll need to rebase and apply these changes to resolver.py because of commit 625654a.

georgesittas (Collaborator) left a comment:

@fivetran-BradfordPaskewitz another quick round of comments from me, but this should be good to merge soon.

@tobymao wanna take a look as well? Another pair of eyes is a good idea for this one.

if col_type and col_type.is_type(exp.DataType.Type.ARRAY):
    element_types = col_type.expressions
    if element_types:
        unnest.type = element_types[0]

Collaborator:

I think we should copy here?

Collaborator Author:

done
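Copying matters when tree nodes are shared by reference. Assigning the array's element type directly to `unnest.type` leaves one object reachable from two places, so a later in-place change to either one leaks into the other. sqlglot expressions provide a `copy()` method for this. The sketch below uses plain dicts and `copy.deepcopy` as a stand-in, and the `element_type` helper is hypothetical.

```python
import copy

# Plain-dict stand-in for a parsed column type: ARRAY<STRUCT<type STRING>>.
col_type = {"kind": "ARRAY", "element": {"kind": "STRUCT", "fields": [("type", "STRING")]}}


def element_type(array_type, copy_result=True):
    """Return the element type of an array type, optionally as a detached copy."""
    element = array_type["element"]
    return copy.deepcopy(element) if copy_result else element


aliased = element_type(col_type, copy_result=False)  # same object as in col_type
detached = element_type(col_type)                    # independent deep copy

# Later passes may mutate the UNNEST's type in place.
detached["fields"].append(("value", "INT64"))  # col_type is unaffected
aliased["fields"].append(("extra", "BOOL"))    # this change leaks into col_type
```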

if element_types:
    unnest.type = element_types[0]
else:
    unnest.type = col_type

Collaborator:

Ditto.

Collaborator Author:

done
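With both copy requests applied, the branch reduces to two cases. If the array declares an element type, use a copy of the first one; otherwise, use a copy of the column type itself.

The sketch below uses a hypothetical dict encoding, with `kind` and `expressions` keys loosely mirroring `col_type.expressions`. Returning `None` for non-array types is an assumption, since the excerpt does not show that case.

```python
import copy
from typing import Optional


def unnest_output_type(col_type: Optional[dict]) -> Optional[dict]:
    """Type produced by UNNEST(col), on a hypothetical dict encoding of types.

    - ARRAY with a known element type -> a copy of that element type
    - ARRAY with no element type      -> a copy of the column type itself
    - anything else                   -> None (assumed: left un-annotated)
    """
    if not col_type or col_type.get("kind") != "ARRAY":
        return None
    element_types = col_type.get("expressions", [])
    if element_types:
        return copy.deepcopy(element_types[0])
    return copy.deepcopy(col_type)
```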

return None
table_name = table_identifier.name

source = scope.sources[table_name]

georgesittas (Collaborator), Dec 10, 2025:

Is this lookup safe? Do we always know that table_name appears in sources? Should we be conservative?

Collaborator Author:

changed this to be defensive
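The updated code is not shown in this thread. One plausible shape for the defensive version treats a missing qualifier as "type unknown" rather than raising `KeyError`. In a correlated subquery, the qualifier can name an alias that exists only in an outer scope, which is exactly when an indexed lookup would fail. `lookup_strict`, `lookup_defensive`, and the sample scope are hypothetical.

```python
from typing import Dict, Optional


def lookup_strict(sources: Dict[str, str], table_name: str) -> str:
    # Mirrors `scope.sources[table_name]`: raises KeyError when the name is absent.
    return sources[table_name]


def lookup_defensive(sources: Dict[str, str], table_name: str) -> Optional[str]:
    # Conservative variant: an unknown qualifier means "type unknown", so the
    # caller can return None and skip the fallback instead of crashing.
    return sources.get(table_name)


# Hypothetical inner scope of a correlated subquery: only the UNNEST alias is
# local; the alias "o" (the outer table) is defined in the enclosing scope.
inner_scope_sources = {"u": "UNNEST(o.items)"}
```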

return self._get_column_type_from_scope(source, column.name)

def _get_column_type_from_scope(
    self, source: t.Union[Scope, exp.Table], col_name: str

Collaborator:

We should pass a Column instance here, not col_name: columns will already be normalized and quoted, so we won't risk odd normalization corner cases when looking them up in the schema, which also normalizes.

Collaborator Author:

good call, updated to use column instead of name
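The normalization concern can be shown with hypothetical stand-ins; `Identifier`, `normalize`, and `Schema` are not sqlglot classes. If the lookup receives only a bare name, the quoting information is lost, and the schema's own normalization can map the name to a different key.

The sketch assumes a dialect where unquoted identifiers are lowercased and quoted ones keep their case, as in PostgreSQL. sqlglot's real normalization rules vary by dialect.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for a column identifier that remembers quoting."""
    name: str
    quoted: bool = False


def normalize(ident: Identifier) -> Identifier:
    # Assumed rule: unquoted identifiers are case-insensitive (lowercased);
    # quoted identifiers keep their exact case.
    if ident.quoted:
        return ident
    return Identifier(ident.name.lower(), quoted=False)


class Schema:
    """Toy schema that normalizes keys both when storing and when looking up."""

    def __init__(self, columns: Dict[Identifier, str]):
        self._columns = {normalize(k).name: v for k, v in columns.items()}

    def column_type(self, column: Identifier) -> Optional[str]:
        return self._columns.get(normalize(column).name)


schema = Schema({Identifier("MyCol", quoted=True): "ARRAY<STRUCT<type STRING>>"})
col = Identifier("MyCol", quoted=True)  # already normalized and quoted upstream
```

Looking up `col` finds the type. Rebuilding an identifier from `col.name` alone drops the quoting, so the name is lowercased to `mycol` and the lookup misses.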

Comment on lines 366 to 367
for source_name in source.sources:
    nested_source = source.sources[source_name]

Collaborator:

Suggested change (replacing lines 366 to 367):
for source_name, nested_source in source.sources.items():

Collaborator Author:

updated
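The sketch below shows a recursive search over nested sources, of the kind this loop appears to perform. When a source is itself a CTE or subquery scope, the search looks for the column in that scope's nested sources, iterating key/value pairs with `.items()` as suggested. `Table`, `DerivedScope`, and `column_type_from_source` are illustrative names, not the PR's code. Here the alias is used to report the path through which the type was found.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Union


@dataclass
class Table:
    """Hypothetical physical table: column name -> type string."""
    columns: Dict[str, str]


@dataclass
class DerivedScope:
    """Hypothetical CTE/subquery scope: alias -> nested source."""
    sources: Dict[str, Union[Table, "DerivedScope"]] = field(default_factory=dict)


def column_type_from_source(
    source: Union[Table, DerivedScope],
    col_name: str,
    path: Tuple[str, ...] = (),
) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Depth-first search; returns (type, alias path) or None if not found."""
    if isinstance(source, Table):
        col_type = source.columns.get(col_name)
        return None if col_type is None else (col_type, path)
    # Iterate key/value pairs directly instead of re-indexing source.sources.
    for source_name, nested_source in source.sources.items():
        found = column_type_from_source(nested_source, col_name, path + (source_name,))
        if found is not None:
            return found
    return None
```

If the alias were not needed, `source.sources.values()` would be the tighter form.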

georgesittas requested a review from tobymao on December 10, 2025 13:57