[SPARK-56995][SQL][DML] Allow dataframe caching in the DSv2 Transaction API #56121

Open
andreaschat-db wants to merge 6 commits into apache:master from andreaschat-db:dsv2TransactionDFCachingFix
@andreaschat-db
Contributor

@andreaschat-db andreaschat-db commented May 26, 2026

Currently, the DSv2 Transaction API skips the dataframe cache, which can cause a significant performance regression in existing workloads. Dataframes cached before a transaction starts should be reusable within it. With this change, cache substitution during a transaction delegates to the connector via Transaction.registerScans: Spark hands every materialized scan in a candidate cached subtree to the active transaction, and the connector decides whether reusing the cached snapshots is compatible with its isolation contract.

What changes were proposed in this pull request?

This PR introduces Transaction.registerScans in the transaction API. See the description above for details.
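To illustrate the delegation described above, here is a minimal, self-contained sketch. It does not use the real Spark interfaces: `ScanInfo`, the stand-in `Transaction` interface, the boolean return value of `registerScans`, and the `SnapshotPinnedTransaction` policy are all assumptions made for illustration, and the actual signature in this PR may differ.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only. None of these types are the real Spark classes.
final class RegisterScansSketch {

    /** Hypothetical stand-in for a materialized scan: the table and the snapshot it read. */
    static final class ScanInfo {
        final String table;
        final long snapshotId;

        ScanInfo(String table, long snapshotId) {
            this.table = table;
            this.snapshotId = snapshotId;
        }
    }

    /** Hypothetical stand-in for the transaction hook; assumed here to return accept/reject. */
    interface Transaction {
        boolean registerScans(List<ScanInfo> scans);
    }

    /**
     * Example connector policy (an assumption, not the PR's behavior): accept cached
     * scans only if each one read exactly the snapshot the transaction is pinned to.
     */
    static final class SnapshotPinnedTransaction implements Transaction {
        private final Map<String, Long> pinnedSnapshots;

        SnapshotPinnedTransaction(Map<String, Long> pinnedSnapshots) {
            this.pinnedSnapshots = pinnedSnapshots;
        }

        @Override
        public boolean registerScans(List<ScanInfo> scans) {
            for (ScanInfo scan : scans) {
                Long pinned = pinnedSnapshots.get(scan.table);
                if (pinned == null || pinned != scan.snapshotId) {
                    return false; // reusing this cached data could violate isolation
                }
            }
            return true;
        }
    }

    /** Engine side: hand every scan in the candidate cached subtree to the transaction. */
    static boolean canUseCachedPlan(Transaction txn, List<ScanInfo> scansInCachedSubtree) {
        return txn.registerScans(scansInCachedSubtree);
    }

    public static void main(String[] args) {
        Transaction txn = new SnapshotPinnedTransaction(Map.of("orders", 7L));
        System.out.println(canUseCachedPlan(txn, List.of(new ScanInfo("orders", 7L))));
        System.out.println(canUseCachedPlan(txn, List.of(new ScanInfo("orders", 6L))));
    }
}
```

The point of the sketch is the division of responsibility: the engine only collects the scans and asks, while the compatibility decision stays entirely inside the connector's transaction.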

Why are the changes needed?

Without this fix, cached dataframes cannot be used in transactions. As a result, the DSv2 Transaction API would introduce a significant performance regression for the affected workloads.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing and new tests.

Was this patch authored or co-authored using generative AI tooling?

Opus 4.7

@huaxingao
Contributor

@aokolnychyi @andreaschat-db

Is this PR aiming for 4.2?

@andreaschat-db andreaschat-db marked this pull request as ready for review May 28, 2026 07:48
@andreaschat-db
Contributor Author

> @aokolnychyi @andreaschat-db
>
> Is this PR aiming for 4.2?

Yes. It is an important fix for the DSv2 Transaction API.
