Non-breaking plan fails when removed column is part of existing clustering key (Snowflake)

## Summary

When a `FULL` model removes both a column and its `clustered_by` clause in the same edit, SQLMesh categorizes the change as **non-breaking** and attempts an in-place `ALTER TABLE ... DROP COLUMN` against the existing physical table. On Snowflake, this fails because the column it is trying to drop is still part of the clustering key on the existing physical table.

The plan aborts mid-apply with a partial state: some sibling models in the same plan have already had their physical tables created/cloned, while everything downstream of the failing model is skipped.

## Environment

- `sqlmesh` version: `0.231.1`
- Engine: Snowflake
- Model kind: `FULL`
- Plan flags: `--forward-only --skip-backfill --allow-destructive-model <other_model>`

## Scenario

A `FULL` model — call it `ANALYTICS.SESSION_FACTS` — was previously deployed with a clustering key that references a materialized column:

```sql
MODEL (
  name ANALYTICS.SESSION_FACTS,
  kind FULL,
  clustered_by (TENANT_ID, PARTITION_MONTH),
  ...
);

SELECT
  tenant_id,
  user_id,
  session_start_ts,
  date_trunc('month', session_start_ts) AS partition_month,
  ...
FROM ...
```

The deployed physical table has `PARTITION_MONTH` as a stored column that participates in the clustering key.

The model is then edited to:
1. Remove `clustered_by` entirely.
2. Stop projecting `PARTITION_MONTH` (it is no longer in the SELECT).

```sql
MODEL (
  name ANALYTICS.SESSION_FACTS,
  kind FULL,
  -- clustered_by removed
  ...
);

SELECT
  tenant_id,
  user_id,
  session_start_ts,
  -- partition_month no longer projected
  ...
FROM ...
```

## Root Cause: CLONE inherits the clustering key

SQLMesh's non-breaking code path for `FULL` models uses Snowflake's `CREATE TABLE ... CLONE` to create the new physical snapshot from the existing one:

```sql
CREATE TABLE IF NOT EXISTS ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<new_fingerprint>__dev
CLONE ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<old_fingerprint>
```

`CLONE` in Snowflake copies the full table metadata, including clustering keys. SQLMesh then attempts to `ALTER TABLE ... DROP COLUMN PARTITION_MONTH` on the cloned snapshot — which fails because the inherited clustering key still references that column.

On failure, SQLMesh rolls back by dropping the cloned snapshot entirely. This means:

- **Manually ALTERing the cloned target is not possible** — by the time the error surfaces, the rollback has already dropped the new snapshot.
- **Re-running the plan repeats the loop**: clone from source → ALTER fail → drop. Every subsequent attempt fails identically.

## Expected Behavior

SQLMesh should either:

1. **Sequence the destructive ALTERs correctly** — after cloning, drop the inherited clustering key first (`ALTER TABLE ... DROP CLUSTERING KEY`), then drop the now-orphaned column. Snowflake permits this when the order is correct.
2. **Reclassify the change as breaking** when it detects that the column to be dropped participates in the existing clustering key, so a new physical table version is created via CTAS (not CLONE) — avoiding the inherited key entirely.
3. **Surface a more actionable error** that identifies the source snapshot whose clustering key must be manually dropped before the plan can succeed.

Currently, SQLMesh attempts the column drop in isolation and surfaces Snowflake's raw error without any guidance on how to recover.

## Actual Behavior

The plan fails partway through the physical layer update with:

```
Failed models

  "ANALYTICS"."SESSION_FACTS"

    ProgrammingError:
      003203 (42601): <query_id>: Cannot drop column 'PARTITION_MONTH' which belongs to a clustering key

Error: Plan application failed.
```

By the time the error appears, ~17% of the physical layer update had already completed (sibling models cloned/created). The remaining models — including all downstream consumers of `SESSION_FACTS` — are reported as skipped.

## Reproduction Steps

1. Deploy a `FULL` model that projects a derived column (e.g. `DATE_TRUNC('MONTH', some_ts) AS partition_month`) and references that materialized column directly in `clustered_by`. Apply the plan; let the physical table be created with the clustering key on that column.
2. Edit the model to (a) remove `clustered_by` and (b) stop projecting the materialized column.
3. Run `sqlmesh plan <env> --forward-only --skip-backfill`.
4. Observe SQLMesh categorizing the change as non-breaking, cloning the existing physical snapshot, and then attempting `ALTER TABLE ... DROP COLUMN PARTITION_MONTH` on the clone, which Snowflake rejects with error `003203 (42601)`.
5. Observe SQLMesh rolling back by dropping the newly cloned snapshot.
6. Re-run the plan — observe the loop repeats identically.

## Workarounds

**Only one workaround is effective:**

Manually drop the clustering key on the **source** snapshot (the one being cloned from — typically the snapshot pointed to by the production environment's view) before re-running the plan:

```sql
ALTER TABLE ANALYTICS.sqlmesh__ANALYTICS_SILVER.SESSION_FACTS__<prod_fingerprint>
  DROP CLUSTERING KEY;
```

Once the source snapshot no longer carries the clustering key, the next CLONE produces a snapshot without it, and the subsequent `ALTER TABLE ... DROP COLUMN` succeeds.

> ⚠️ This directly mutates a production physical snapshot. Snowflake's `SELECT *` view semantics are unaffected (the view still resolves correctly), but Snowflake's automatic clustering service stops operating on that table.

**Attempted but ineffective workarounds:**

- `--allow-destructive-model ANALYTICS.SESSION_FACTS` — does **not** bypass the CLONE code path for `FULL` models in this scenario. The plan still clones the source snapshot, inheriting the clustering key, and the ALTER still fails.
- Manually `ALTER TABLE ... DROP CLUSTERING KEY` on the cloned target — not possible because SQLMesh's rollback drops the cloned snapshot before the user can intervene.

## Why This Matters

The change appears trivial from the model author's perspective — drop a clustering key, drop a derived column. The plan summary categorizes it as non-breaking, suggesting it should apply cleanly. The root issue is that SQLMesh's CLONE-based non-breaking code path faithfully copies the source's clustering constraints, but then issues ALTERs that conflict with those inherited constraints without accounting for the required sequencing.

The partial-apply outcome compounds the problem: the plan is left in an inconsistent state where some new physical tables exist but downstream consumers are stuck pointing at old snapshots, and the only recovery path requires directly mutating a production physical table.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-breaking plan fails when removed column is part of existing clustering key (Snowflake) #5813

Summary

Environment

Scenario

Root Cause: CLONE inherits the clustering key

Expected Behavior

Actual Behavior

Reproduction Steps

Workarounds

Why This Matters

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Non-breaking plan fails when removed column is part of existing clustering key (Snowflake) #5813

Description

Summary

Environment

Scenario

Root Cause: CLONE inherits the clustering key

Expected Behavior

Actual Behavior

Reproduction Steps

Workarounds

Why This Matters

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions