fix(optimizer)!: query schema directly when type annotation fails for processing UNNEST source #6451

fivetran-BradfordPaskewitz · 2025-12-02T16:49:25Z

Issue: https://fivetran.atlassian.net/browse/RD-1066808

Problem: In BigQuery, when you UNNEST an ARRAY<STRUCT<...>>, the struct fields can be referenced as unqualified columns (e.g., WHERE type = 'x'). This worked in simple queries but failed in correlated subqueries because sqlglot's type annotation couldn't resolve columns from outer scopes, so qualify_columns didn't know what struct fields to expose.

Solution: Add a fallback mechanism in qualify_columns.py that queries the schema directly when type annotation fails. When processing an UNNEST source, if the expression isn't typed, traverse parent scopes to find the column's type definition in the schema (handling both table sources and CTEs), then extract and expose the struct field names as available columns in that scope.
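The fallback described above can be pictured with a small pure-Python model. This is only a sketch and not sqlglot's code: `ToyScope`, `SCHEMA`, `find_column_type`, and `struct_fields` are hypothetical names, and types are modelled as plain strings and dicts rather than `exp.DataType` nodes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyScope:
    """Hypothetical stand-in for an optimizer scope: maps source aliases to table names."""

    sources: Dict[str, str]
    parent: Optional["ToyScope"] = None


# Hypothetical schema: table -> column -> type.
# An ARRAY<STRUCT<...>> column is modelled as {"array_of": {field_name: field_type}}.
SCHEMA = {
    "events": {
        "id": "INT64",
        "items": {"array_of": {"type": "STRING", "qty": "INT64"}},
    }
}


def find_column_type(scope, source_name, column):
    """Walk from the current scope outwards until the source is found, then read the schema."""
    current = scope
    while current is not None:
        table = current.sources.get(source_name)
        if table is not None:
            return SCHEMA.get(table, {}).get(column)
        current = current.parent
    return None


def struct_fields(col_type):
    """Return the struct field names an UNNEST of this column would expose, if any."""
    if isinstance(col_type, dict) and isinstance(col_type.get("array_of"), dict):
        return list(col_type["array_of"])
    return []


outer = ToyScope(sources={"e": "events"})
inner = ToyScope(sources={}, parent=outer)  # correlated subquery: `e` lives in the outer scope

fields = struct_fields(find_column_type(inner, "e", "items"))
print(fields)  # ['type', 'qty']
```

The point of the model is the loop over `parent`: a lookup limited to the inner scope would find nothing, which mirrors the correlated-subquery failure described in the problem statement.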

tobymao · 2025-12-02T16:52:06Z

can you fix the type annotation instead?

georgesittas · 2025-12-02T16:56:06Z

Yeah, I second that suggestion in favor of avoiding making this more complicated than necessary. Keen to understand more if that is tricky to do; let me know and we can hop in a call and chat about this.

georgesittas

@fivetran-BradfordPaskewitz keep in mind that you'll need to rebase and apply these changes to resolver.py due to 625654a.

sqlglot/optimizer/qualify_columns.py

sqlglot/optimizer/resolver.py

georgesittas

@fivetran-BradfordPaskewitz another quick round of comments from me, but this should be good to merge soon.

@tobymao wanna take a look as well? Another pair of eyes is a good idea for this one.

georgesittas · 2025-12-10T13:17:20Z

sqlglot/optimizer/resolver.py

+                            if col_type and col_type.is_type(exp.DataType.Type.ARRAY):
+                                element_types = col_type.expressions
+                                if element_types:
+                                    unnest.type = element_types[0]


I think we should copy here?
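As a generic illustration of why a copy might be wanted here, the sketch below shows plain Python aliasing: assigning an existing object to a second owner shares it, so a later mutation is visible from both. `ToyType` is a hypothetical class, and nothing here claims to reproduce how sqlglot's type nodes behave.

```python
import copy


class ToyType:
    """Hypothetical mutable type node, loosely analogous to a data type expression."""

    def __init__(self, name, nullable=True):
        self.name = name
        self.nullable = nullable


array_element = ToyType("STRUCT")

# Sharing the reference: both names point at the same object.
shared = array_element
shared.nullable = False
print(array_element.nullable)  # False: the original element type changed too

# Copying first keeps the original isolated from later edits.
array_element = ToyType("STRUCT")
isolated = copy.deepcopy(array_element)
isolated.nullable = False
print(array_element.nullable)  # True
```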

georgesittas · 2025-12-10T13:17:28Z

sqlglot/optimizer/resolver.py

+                                if element_types:
+                                    unnest.type = element_types[0]
+                            else:
+                                unnest.type = col_type


georgesittas · 2025-12-10T13:19:08Z

sqlglot/optimizer/resolver.py

+                return None
+            table_name = table_identifier.name
+
+        source = scope.sources[table_name]


Is this lookup safe? Do we always know that table_name appears in sources? Should we be conservative?

changed this to be defensive
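The updated code is not shown in this thread, so the following is only one conservative shape such a lookup could take, assuming `sources` behaves like a dict. `lookup_source` is a hypothetical helper, not a function from the PR.

```python
from typing import Dict, Optional


def lookup_source(sources: Dict[str, object], table_name: str) -> Optional[object]:
    """Return the source registered under table_name, or None instead of raising KeyError."""
    return sources.get(table_name)


sources = {"t": "table-node"}
print(lookup_source(sources, "t"))        # table-node
print(lookup_source(sources, "missing"))  # None
```

Returning `None` lets the caller fall through to the "type unknown" path rather than failing the whole optimization pass on an unexpected name.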

georgesittas · 2025-12-10T13:23:39Z

sqlglot/optimizer/resolver.py

+        return self._get_column_type_from_scope(source, column.name)
+
+    def _get_column_type_from_scope(
+        self, source: t.Union[Scope, exp.Table], col_name: str


We should pass a Column instance here, not col_name, because columns will already be normalized & quoted and we won't risk weird normalization corner cases when looking them up in the schema, which also normalizes.

good call, updated to use column instead of name
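The kind of corner case mentioned above can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyIdentifier`, `normalize`, and the rule that unquoted names fold to lowercase are hypothetical and are not taken from sqlglot or from any particular dialect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIdentifier:
    """Hypothetical identifier that remembers whether it was quoted."""

    name: str
    quoted: bool = False


def normalize(ident):
    """Hypothetical rule: unquoted identifiers fold to lowercase, quoted ones keep their case."""
    return ident if ident.quoted else ToyIdentifier(ident.name.lower(), False)


schema_columns = {normalize(ToyIdentifier("Items", quoted=True)): "ARRAY"}

column = ToyIdentifier("Items", quoted=True)

# Passing only the name drops the quoting flag, so normalizing again folds the case.
by_name = schema_columns.get(normalize(ToyIdentifier(column.name)))

# Passing the identifier itself preserves the flag and the lookup succeeds.
by_identifier = schema_columns.get(normalize(column))

print(by_name, by_identifier)  # None ARRAY
```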

georgesittas · 2025-12-10T13:25:23Z

sqlglot/optimizer/resolver.py

+            for source_name in source.sources:
+                nested_source = source.sources[source_name]


Suggested change

-            for source_name in source.sources:
-                nested_source = source.sources[source_name]
+            for source_name, nested_source in source.sources.items():


fivetran-BradfordPaskewitz requested a review from georgesittas December 2, 2025 16:49

fivetran-BradfordPaskewitz self-assigned this Dec 2, 2025

georgesittas reviewed Dec 3, 2025


fivetran-BradfordPaskewitz force-pushed the fix_qualify_unnest branch 2 times, most recently from 3b0a522 to fea0204 Compare December 5, 2025 00:22

fivetran-BradfordPaskewitz requested a review from georgesittas December 5, 2025 00:23

georgesittas reviewed Dec 8, 2025


fivetran-BradfordPaskewitz force-pushed the fix_qualify_unnest branch from fea0204 to 41cfa9e Compare December 9, 2025 19:38

fivetran-BradfordPaskewitz requested a review from georgesittas December 9, 2025 19:39

georgesittas reviewed Dec 10, 2025


georgesittas requested a review from tobymao December 10, 2025 13:57

fix(optimizer)!: query schema directly when type annotation fails for processing UNNEST source (7428a39)

fivetran-BradfordPaskewitz force-pushed the fix_qualify_unnest branch from 41cfa9e to 7428a39 Compare December 11, 2025 01:04

fivetran-BradfordPaskewitz requested a review from georgesittas December 11, 2025 01:06
