[SPARK-57068][SQL] Make SaveMode.Overwrite create the table when missing for SupportsCatalogOptions sources by LuciferYang · Pull Request #56111 · apache/spark

LuciferYang · 2026-05-26T07:46:35Z

What changes were proposed in this pull request?

DataFrameWriter.saveCommand calls catalog.loadTable(ident) for SaveMode.Append | SaveMode.Overwrite without a try/catch when the V2 source implements SupportsCatalogOptions, so writing to a brand-new identifier throws:

org.apache.spark.sql.catalyst.analysis.NoSuchTableException: ...
  at DataFrameWriter.saveCommand(DataFrameWriter.scala:179)

This PR catches NoSuchTableException in the Overwrite case and falls back to CreateTableAsSelect. Append keeps throwing: it documents "append to existing data", and silently creating would hide bugs. A new internal conf spark.sql.legacy.dataFrameWriter.overwriteOnMissingTableThrows flips back to the old behavior. The CreateTableAsSelect plan that's now built in two places is extracted into a private helper.
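The control flow described above can be sketched in plain Scala, with no Spark on the classpath. This is a toy model under stated assumptions, not the code in `DataFrameWriter.saveCommand`: `ToyCatalog`, `NoSuchTable`, the `Plan` objects, `planFor`, and the `legacyThrows` parameter are all hypothetical names that stand in for the real catalog, exception, logical plans, and legacy conf.

```scala
import scala.collection.mutable

object OverwriteFallbackSketch {
  // Hypothetical stand-ins; none of these types are Spark's.
  sealed trait Mode
  case object Append extends Mode
  case object Overwrite extends Mode

  sealed trait Plan
  case object AppendData extends Plan
  case object OverwriteData extends Plan
  case object CreateTableAsSelect extends Plan

  final class NoSuchTable(ident: String)
    extends RuntimeException(s"Table not found: $ident")

  final class ToyCatalog {
    private val tables = mutable.Set.empty[String]
    def create(ident: String): Unit = tables += ident
    // Models a catalog lookup that throws when the identifier is unknown.
    def loadTable(ident: String): String =
      if (tables.contains(ident)) ident else throw new NoSuchTable(ident)
  }

  // legacyThrows models the legacy conf being set to true.
  def planFor(catalog: ToyCatalog, ident: String, mode: Mode, legacyThrows: Boolean): Plan =
    try {
      catalog.loadTable(ident)
      mode match {
        case Append    => AppendData
        case Overwrite => OverwriteData
      }
    } catch {
      // Only Overwrite falls back to creating the table.
      // For Append, or with the legacy flag on, the guard fails and the exception propagates.
      case _: NoSuchTable if mode == Overwrite && !legacyThrows => CreateTableAsSelect
    }
}
```

The guard on the `catch` case keeps the fallback narrow: any combination it does not match rethrows the original exception, which is how Append on a missing table stays a hard failure in this model.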

Why are the changes needed?

df.write.format(provider).mode("overwrite").save("/some/new/path") should work on a brand-new path the same way it does for parquet/json/orc. Today it throws NoSuchTableException for any V2 source implementing SupportsCatalogOptions (Iceberg, Lance, custom connectors). The asymmetry has been around since SPARK-29219 introduced SupportsCatalogOptions in Spark 3.0 — the V1 branch never goes through loadTable, so this only shows up on the V2 path. The full behavior matrix after this PR:

| Mode × target | V1 | V2 before | V2 after |
|---|---|---|---|
| Overwrite, missing | creates | throws | creates |
| Overwrite, existing | truncate + write | overwrite | unchanged |
| Append, missing | creates | throws | throws |
| Append, existing | append | append | unchanged |
| ErrorIfExists, missing | creates | creates | unchanged |
| ErrorIfExists, existing | throws | throws | unchanged |
| Ignore, missing | creates | creates | unchanged |
| Ignore, existing | no-op | no-op | unchanged |

Only the first row changes. Append-on-missing intentionally stays a strict failure; aligning it with V1 would silently create a table the user expected to already exist.
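As a quick check of the "only the first row changes" claim, the two V2 columns of the matrix can be encoded as lookups and compared cell by cell. This is an illustrative sketch only: `BehaviorMatrixSketch`, `v2Before`, `v2After`, and `changedCells` are hypothetical names, and the outcome strings are copied from the matrix rather than derived from Spark.

```scala
object BehaviorMatrixSketch {
  sealed trait Mode
  case object Overwrite extends Mode
  case object Append extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode

  val modes: List[Mode] = List(Overwrite, Append, ErrorIfExists, Ignore)

  // "V2 before" column, keyed by (mode, target exists).
  def v2Before(mode: Mode, exists: Boolean): String = (mode, exists) match {
    case (Overwrite, false)     => "throws"
    case (Overwrite, true)      => "overwrite"
    case (Append, false)        => "throws"
    case (Append, true)         => "append"
    case (ErrorIfExists, false) => "creates"
    case (ErrorIfExists, true)  => "throws"
    case (Ignore, false)        => "creates"
    case (Ignore, true)         => "no-op"
  }

  // "V2 after" column: identical except for Overwrite on a missing target.
  def v2After(mode: Mode, exists: Boolean): String = (mode, exists) match {
    case (Overwrite, false) => "creates"
    case _                  => v2Before(mode, exists)
  }

  // Every (mode, exists) cell whose outcome differs between the two columns.
  val changedCells: List[(Mode, Boolean)] =
    for {
      mode <- modes
      exists <- List(false, true)
      if v2Before(mode, exists) != v2After(mode, exists)
    } yield (mode, exists)
}
```

In this encoding `changedCells` contains the single pair `(Overwrite, false)`, and `v2After(Append, false)` is still `"throws"`, matching the intentional Append divergence from V1.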

Does this PR introduce any user-facing change?

Yes. mode("overwrite").save(<missing identifier>) now creates the table, and the migration guide is updated to reflect this. Setting spark.sql.legacy.dataFrameWriter.overwriteOnMissingTableThrows=true restores the old behavior. mode("append") on a missing table still throws.

How was this patch tested?

New tests in SupportsCatalogOptionsSuite. Four reuse the existing testCreateAndRead helper, so Overwrite-on-missing now has the same coverage shape (session/testcat catalog, with and without partitioning) as the existing ErrorIfExists and Ignore tests. Three more pin specific edges: Append-on-missing still throws, the legacy conf restores the throw, and withSchemaEvolution() + mode("overwrite") on a missing table raises UNSUPPORTED_SCHEMA_EVOLUTION.CREATE_TABLE.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

LuciferYang · 2026-05-26T07:50:43Z

+      .doc("When set to true, SaveMode.Overwrite against a missing table on a " +
+        "SupportsCatalogOptions source throws NoSuchTableException instead of " +
+        "creating the table. Restores the pre-SPARK-57068 behavior.")
+      .version("4.3.0")


4.3.0 or 4.2.0?

…ing for SupportsCatalogOptions sources

### What changes were proposed in this pull request?

In `DataFrameWriter.saveCommand`, the `SaveMode.Append | SaveMode.Overwrite` branch calls `catalog.loadTable(ident)` without catching `NoSuchTableException` when the V2 source implements `SupportsCatalogOptions`. The exception propagates straight to the user, even though `SaveMode.ErrorIfExists` and `SaveMode.Ignore` on the same call succeed by routing to `CreateTableAsSelect`.

This change catches `NoSuchTableException` for `SaveMode.Overwrite` only and routes to `CreateTableAsSelect(ignoreIfExists = false)`, mirroring the `createMode` arm immediately below. `SaveMode.Append` on a non-existent identifier intentionally continues to throw, because Append explicitly expects an existing table and silently creating would mask user mistakes.

A new internal SQL conf `spark.sql.legacy.dataFrameWriter.overwriteOnMissingTableThrows` restores the pre-fix behavior for users who depend on it.

The `CreateTableAsSelect` construction shared between the new fall-back path and the existing `createMode` arm is extracted into a private helper `createTableAsSelectForCatalogOptions` to keep both sites in sync.

### Why are the changes needed?

The most idiomatic write call for any V2 connector, `df.write.format(provider).mode("overwrite").save(newPath)`, fails with `NoSuchTableException` when `newPath` does not yet exist, whereas the equivalent V1 call (e.g. `format("parquet")`) succeeds by creating the table. V2 sources that implement `SupportsCatalogOptions` (Iceberg, Lance, and custom connectors) all hit this asymmetry.

The fix aligns V2 `SaveMode.Overwrite` semantics with V1: overwrite-on-missing creates the table, overwrite-on-existing truncates and writes.

Behavior matrix after this change:

| Mode × Target | V1 | V2 before | V2 after |
|---|---|---|---|
| Overwrite, missing | creates | **throws** | creates |
| Overwrite, existing | truncate+write | overwrite | unchanged |
| Append, missing | creates | throws | throws* |
| Append, existing | append | append | unchanged |
| ErrorIfExists, missing | creates | creates | unchanged |
| ErrorIfExists, existing | throws | throws | unchanged |
| Ignore, missing | creates | creates | unchanged |
| Ignore, existing | no-op | no-op | unchanged |

\* Intentional V1 divergence; see PR description.

There is an inherent race window between `loadTable` (throws) and `CreateTableAsSelect`: a concurrent writer creating the table in between will cause `TableAlreadyExistsException` rather than overwriting. This is acceptable; V1's filesystem-atomic path doesn't expose it because V1 never consults a catalog. Users retry.

### Does this PR introduce _any_ user-facing change?

Yes. `df.write.format(<V2 SupportsCatalogOptions source>).mode("overwrite").save(<new identifier>)` now creates the table instead of throwing `NoSuchTableException`. No behavior change for paths that already exist. The migration guide has been updated. The legacy flag `spark.sql.legacy.dataFrameWriter.overwriteOnMissingTableThrows` restores the prior behavior.

### How was this patch tested?

New tests in `SupportsCatalogOptionsSuite`:

- `save works with Overwrite - no table, no partitioning, session catalog`
- `save works with Overwrite - no table, with partitioning, session catalog`
- `save works with Overwrite - no table, no partitioning, testcat catalog`
- `save works with Overwrite - no table, with partitioning, testcat catalog`

These reuse the existing `testCreateAndRead` helper, which verifies catalog state (table identity, partitioning, columns) in addition to data.

Plus three behavior-pinning tests:

- `Append mode still fails when table is missing - testcat catalog` (pins the intentional Append divergence)
- `legacy flag restores throw on Overwrite-missing` (verifies the new conf)
- `Overwrite + withSchemaEvolution on missing table is rejected` (verifies the schema-evolution gate fires with the expected error class)

Existing tests continue to pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

LuciferYang mentioned this pull request May 26, 2026

SaveMode.Overwrite on a non-existent path throws NoSuchTableException lance-format/lance-spark#555

Open

LuciferYang force-pushed the fix-overwrite-nonexistent-catalog-options branch from 4a2a359 to 0b3a0ef Compare May 26, 2026 11:50

LuciferYang wants to merge 1 commit into apache:master from LuciferYang:fix-overwrite-nonexistent-catalog-options
