[FEATURE REQ][Spark Connector] Add Spark 4.1 support #48849
Open
Labels: Client, Cosmos, Service Attention, cosmos:spark3, needs-team-attention
Summary
Add Spark 4.1 support via a new `azure-cosmos-spark_4-1_2-13` module. This addresses the SPARK-52787 package reorganization, in which `HDFSMetadataLog` moved from `org.apache.spark.sql.execution.streaming` to `org.apache.spark.sql.execution.streaming.checkpointing`.
Design
Create a shared `azure-cosmos-spark_4` base module (similar to `azure-cosmos-spark_3` for 3.x), then have both `azure-cosmos-spark_4-0_2-13` and `azure-cosmos-spark_4-1_2-13` inherit from it.
Why a shared base module?
Analysis shows that 18 Spark 4.x override files (12 main + 6 test Scala files) are identical between 4-0 and 4-1. Only 3 files differ (`ChangeFeedInitialOffsetWriter.scala`, `CosmosCatalogBase.scala`, `CosmosCatalogITestBase.scala`), all because of the `HDFSMetadataLog` import change.
Steps
1. `azure-cosmos-spark_4`: shared base module containing the 12 identical main Scala files + 6 identical test files that override `azure-cosmos-spark_3` for Spark 4.x API compatibility.
2. `azure-cosmos-spark_4-0_2-13`: change the parent to `azure-cosmos-spark_4`, remove the duplicated files, and keep only the 4-0 specific config.
3. `azure-cosmos-spark_4-1_2-13`: inherits from `azure-cosmos-spark_4`, overrides the 3 files with updated `HDFSMetadataLog` imports, and uses the Spark 4.1.0 dependency.
Reference
`azure-cosmos-spark_3` → `azure-cosmos-spark_3-x_2-12` / `azure-cosmos-spark_3-x_2-13`
Key difference from MetadataVersionUtil
With the latest code on upstream-main, the Spark connector has already removed its dependency on `MetadataVersionUtil` by inlining the version validation logic, so only the `HDFSMetadataLog` import needs updating for Spark 4.1.
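To make the package move concrete: in the three 4-1 specific files, the import would presumably change from `org.apache.spark.sql.execution.streaming.HDFSMetadataLog` to `org.apache.spark.sql.execution.streaming.checkpointing.HDFSMetadataLog`. The sketch below is only a diagnostic aid, not part of the proposed design (which resolves the import at compile time per module). It probes which of the two class names is loadable on the current classpath. The object name `HdfsMetadataLogLocation` and its members are hypothetical; the two package names are taken from the issue text above.

```scala
import scala.util.Try

// Hypothetical helper, for illustration only; not part of the connector.
object HdfsMetadataLogLocation {
  // Location of HDFSMetadataLog before SPARK-52787 (per the issue text).
  val LegacyFqcn: String =
    "org.apache.spark.sql.execution.streaming.HDFSMetadataLog"

  // Location after the SPARK-52787 reorganization in Spark 4.1 (per the issue text).
  val RelocatedFqcn: String =
    "org.apache.spark.sql.execution.streaming.checkpointing.HDFSMetadataLog"

  // Returns the first candidate name that `exists` reports as loadable, if any.
  // The relocated name is checked first.
  def resolve(exists: String => Boolean): Option[String] =
    Seq(RelocatedFqcn, LegacyFqcn).find(exists)

  // Default probe: attempt to load the class without initializing it.
  def onClasspath(name: String): Boolean =
    Try(Class.forName(name, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit =
    println(
      resolve(onClasspath).getOrElse("HDFSMetadataLog not found on classpath")
    )
}
```

Running this against each module's test classpath is one quick way to confirm that the 4-0 and 4-1 modules pick up the Spark version they are expected to.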