Summary
When a FULL model removes both a column and its clustered_by clause in the same edit, SQLMesh categorizes the change as non-breaking and attempts an in-place ALTER TABLE ... DROP COLUMN against the existing physical table. On Snowflake, this fails because the column it is trying to drop is still part of the clustering key on the existing physical table.
The plan aborts mid-apply with a partial state: some sibling models in the same plan have already had their physical tables created/cloned, while everything downstream of the failing model is skipped.
Environment
sqlmesh version: 0.231.1
- Engine: Snowflake
- Model kind:
FULL
- Plan flags:
--forward-only --skip-backfill --allow-destructive-model <other_model>
Scenario
A FULL model — call it ANALYTICS.SESSION_FACTS — was previously deployed with a clustering key that references a materialized column:
MODEL (
name ANALYTICS.SESSION_FACTS,
kind FULL,
clustered_by (TENANT_ID, PARTITION_MONTH),
...
);
SELECT
tenant_id,
user_id,
session_start_ts,
date_trunc('month', session_start_ts) AS partition_month,
...
FROM ...
The deployed physical table has PARTITION_MONTH as a stored column that participates in the clustering key.
The model is then edited to:
- Remove
clustered_by entirely.
- Stop projecting
PARTITION_MONTH (it is no longer in the SELECT).
MODEL (
name ANALYTICS.SESSION_FACTS,
kind FULL,
-- clustered_by removed
...
);
SELECT
tenant_id,
user_id,
session_start_ts,
-- partition_month no longer projected
...
FROM ...
Root Cause: CLONE inherits the clustering key
SQLMesh's non-breaking code path for FULL models uses Snowflake's CREATE TABLE ... CLONE to create the new physical snapshot from the existing one:
CREATE TABLE IF NOT EXISTS ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<new_fingerprint>__dev
CLONE ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<old_fingerprint>
CLONE in Snowflake copies the full table metadata, including clustering keys. SQLMesh then attempts to ALTER TABLE ... DROP COLUMN PARTITION_MONTH on the cloned snapshot — which fails because the inherited clustering key still references that column.
On failure, SQLMesh rolls back by dropping the cloned snapshot entirely. This means:
- Manually ALTERing the cloned target is not possible — by the time the error surfaces, the rollback has already dropped the new snapshot.
- Re-running the plan repeats the loop: clone from source → ALTER fail → drop. Every subsequent attempt fails identically.
Expected Behavior
SQLMesh should either:
- Sequence the destructive ALTERs correctly — after cloning, drop the inherited clustering key first (
ALTER TABLE ... DROP CLUSTERING KEY), then drop the now-orphaned column. Snowflake permits this when the order is correct.
- Reclassify the change as breaking when it detects that the column to be dropped participates in the existing clustering key, so a new physical table version is created via CTAS (not CLONE) — avoiding the inherited key entirely.
- Surface a more actionable error that identifies the source snapshot whose clustering key must be manually dropped before the plan can succeed.
Currently, SQLMesh attempts the column drop in isolation and surfaces Snowflake's raw error without any guidance on how to recover.
Actual Behavior
The plan fails partway through the physical layer update with:
Failed models
"ANALYTICS"."SESSION_FACTS"
ProgrammingError:
003203 (42601): <query_id>: Cannot drop column 'PARTITION_MONTH' which belongs to a clustering key
Error: Plan application failed.
By the time the error appears, ~17% of the physical layer update had already completed (sibling models cloned/created). The remaining models — including all downstream consumers of SESSION_FACTS — are reported as skipped.
Reproduction Steps
- Deploy a
FULL model that projects a derived column (e.g. DATE_TRUNC('MONTH', some_ts) AS partition_month) and references that materialized column directly in clustered_by. Apply the plan; let the physical table be created with the clustering key on that column.
- Edit the model to (a) remove
clustered_by and (b) stop projecting the materialized column.
- Run
sqlmesh plan <env> --forward-only --skip-backfill.
- Observe SQLMesh categorizing the change as non-breaking, cloning the existing physical snapshot, and then attempting
ALTER TABLE ... DROP COLUMN PARTITION_MONTH on the clone, which Snowflake rejects with error 003203 (42601).
- Observe SQLMesh rolling back by dropping the newly cloned snapshot.
- Re-run the plan — observe the loop repeats identically.
Workarounds
Only one workaround is effective:
Manually drop the clustering key on the source snapshot (the one being cloned from — typically the snapshot pointed to by the production environment's view) before re-running the plan:
ALTER TABLE ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<prod_fingerprint>
DROP CLUSTERING KEY;
Once the source snapshot no longer carries the clustering key, the next CLONE produces a snapshot without it, and the subsequent ALTER TABLE ... DROP COLUMN succeeds.
⚠️ This directly mutates a production physical snapshot. Snowflake's SELECT * view semantics are unaffected (the view still resolves correctly), but Snowflake's automatic clustering service stops operating on that table.
Attempted but ineffective workarounds:
--allow-destructive-model ANALYTICS.SESSION_FACTS — does not bypass the CLONE code path for FULL models in this scenario. The plan still clones the source snapshot, inheriting the clustering key, and the ALTER still fails.
- Manually
ALTER TABLE ... DROP CLUSTERING KEY on the cloned target — not possible because SQLMesh's rollback drops the cloned snapshot before the user can intervene.
Why This Matters
The change appears trivial from the model author's perspective — drop a clustering key, drop a derived column. The plan summary categorizes it as non-breaking, suggesting it should apply cleanly. The root issue is that SQLMesh's CLONE-based non-breaking code path faithfully copies the source's clustering constraints, but then issues ALTERs that conflict with those inherited constraints without accounting for the required sequencing.
The partial-apply outcome compounds the problem: the plan is left in an inconsistent state where some new physical tables exist but downstream consumers are stuck pointing at old snapshots, and the only recovery path requires directly mutating a production physical table.
Summary
When a
FULLmodel removes both a column and itsclustered_byclause in the same edit, SQLMesh categorizes the change as non-breaking and attempts an in-placeALTER TABLE ... DROP COLUMNagainst the existing physical table. On Snowflake, this fails because the column it is trying to drop is still part of the clustering key on the existing physical table.The plan aborts mid-apply with a partial state: some sibling models in the same plan have already had their physical tables created/cloned, while everything downstream of the failing model is skipped.
Environment
sqlmeshversion:0.231.1FULL--forward-only --skip-backfill --allow-destructive-model <other_model>Scenario
A
FULLmodel — call itANALYTICS.SESSION_FACTS— was previously deployed with a clustering key that references a materialized column:The deployed physical table has
PARTITION_MONTHas a stored column that participates in the clustering key.The model is then edited to:
clustered_byentirely.PARTITION_MONTH(it is no longer in the SELECT).Root Cause: CLONE inherits the clustering key
SQLMesh's non-breaking code path for
FULLmodels uses Snowflake'sCREATE TABLE ... CLONEto create the new physical snapshot from the existing one:CLONEin Snowflake copies the full table metadata, including clustering keys. SQLMesh then attempts toALTER TABLE ... DROP COLUMN PARTITION_MONTHon the cloned snapshot — which fails because the inherited clustering key still references that column.On failure, SQLMesh rolls back by dropping the cloned snapshot entirely. This means:
Expected Behavior
SQLMesh should either:
ALTER TABLE ... DROP CLUSTERING KEY), then drop the now-orphaned column. Snowflake permits this when the order is correct.Currently, SQLMesh attempts the column drop in isolation and surfaces Snowflake's raw error without any guidance on how to recover.
Actual Behavior
The plan fails partway through the physical layer update with:
By the time the error appears, ~17% of the physical layer update had already completed (sibling models cloned/created). The remaining models — including all downstream consumers of
SESSION_FACTS— are reported as skipped.Reproduction Steps
FULLmodel that projects a derived column (e.g.DATE_TRUNC('MONTH', some_ts) AS partition_month) and references that materialized column directly inclustered_by. Apply the plan; let the physical table be created with the clustering key on that column.clustered_byand (b) stop projecting the materialized column.sqlmesh plan <env> --forward-only --skip-backfill.ALTER TABLE ... DROP COLUMN PARTITION_MONTHon the clone, which Snowflake rejects with error003203 (42601).Workarounds
Only one workaround is effective:
Manually drop the clustering key on the source snapshot (the one being cloned from — typically the snapshot pointed to by the production environment's view) before re-running the plan:
Once the source snapshot no longer carries the clustering key, the next CLONE produces a snapshot without it, and the subsequent
ALTER TABLE ... DROP COLUMNsucceeds.Attempted but ineffective workarounds:
--allow-destructive-model ANALYTICS.SESSION_FACTS— does not bypass the CLONE code path forFULLmodels in this scenario. The plan still clones the source snapshot, inheriting the clustering key, and the ALTER still fails.ALTER TABLE ... DROP CLUSTERING KEYon the cloned target — not possible because SQLMesh's rollback drops the cloned snapshot before the user can intervene.Why This Matters
The change appears trivial from the model author's perspective — drop a clustering key, drop a derived column. The plan summary categorizes it as non-breaking, suggesting it should apply cleanly. The root issue is that SQLMesh's CLONE-based non-breaking code path faithfully copies the source's clustering constraints, but then issues ALTERs that conflict with those inherited constraints without accounting for the required sequencing.
The partial-apply outcome compounds the problem: the plan is left in an inconsistent state where some new physical tables exist but downstream consumers are stuck pointing at old snapshots, and the only recovery path requires directly mutating a production physical table.