[FEATURE REQ][Spark Connector] Add Spark 4.1 support #48849

@xinlian12

Summary

Add Spark 4.1 support via a new azure-cosmos-spark_4-1_2-13 module. This addresses the SPARK-52787 package reorganization, in which HDFSMetadataLog moved from org.apache.spark.sql.execution.streaming to org.apache.spark.sql.execution.streaming.checkpointing.
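
To make the relocation concrete, here is a minimal sketch that maps a Spark version string to the fully qualified name of HDFSMetadataLog, using only the two package names given in this issue. The object and method names (HdfsMetadataLogLocation, fqcnFor) are hypothetical and not part of the connector, and treating every version from 4.1 onward as relocated is an assumption.

```scala
// Hypothetical helper, for illustration only; not part of the connector.
object HdfsMetadataLogLocation {
  // Package that hosts HDFSMetadataLog before Spark 4.1.
  val LegacyPackage = "org.apache.spark.sql.execution.streaming"
  // Package that hosts it from Spark 4.1 on, per SPARK-52787 as described in this issue.
  val RelocatedPackage = "org.apache.spark.sql.execution.streaming.checkpointing"

  // Captures the major and minor components, e.g. "4" and "1" from "4.1.0".
  private val MajorMinor = """^(\d+)\.(\d+).*""".r

  /** Returns the fully qualified class name expected for the given Spark version. */
  def fqcnFor(sparkVersion: String): String = sparkVersion match {
    case MajorMinor(major, minor) =>
      // Assumption: every version from 4.1 onward uses the relocated package.
      val relocated = major.toInt > 4 || (major.toInt == 4 && minor.toInt >= 1)
      val pkg = if (relocated) RelocatedPackage else LegacyPackage
      s"$pkg.HDFSMetadataLog"
    case _ =>
      throw new IllegalArgumentException(s"Unrecognized Spark version: $sparkVersion")
  }
}
```

In the proposed design this choice is made at compile time by the import in each version-specific source file, so a runtime lookup like this only illustrates which name applies to which module.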

Design

Create a shared azure-cosmos-spark_4 base module (similar to azure-cosmos-spark_3 for 3.x), then have both azure-cosmos-spark_4-0_2-13 and azure-cosmos-spark_4-1_2-13 inherit from it.

Why a shared base module?

Analysis shows that 18 Spark 4.x override files (12 main + 6 test Scala files) are identical between 4-0 and 4-1. Only 3 files differ (ChangeFeedInitialOffsetWriter.scala, CosmosCatalogBase.scala, CosmosCatalogITestBase.scala), all because of the HDFSMetadataLog import change.

Steps

  1. Create azure-cosmos-spark_4 — shared base module containing the 12 identical main Scala files + 6 identical test files that override azure-cosmos-spark_3 for Spark 4.x API compatibility
  2. Refactor azure-cosmos-spark_4-0_2-13 — change parent to azure-cosmos-spark_4, remove duplicated files, keep only 4-0 specific config
  3. Create azure-cosmos-spark_4-1_2-13 — inherits from azure-cosmos-spark_4, overrides the 3 files with updated HDFSMetadataLog imports, uses Spark 4.1.0 dependency
  4. Update CI/pipeline configs — emulator matrices, spark.yml, aggregate-reports, version entries
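
The layering in these steps can be modeled conceptually as follows. This is a sketch of the idea only, not the build configuration: the names SourceLayering, Module, and effectiveSources are hypothetical, SharedOverride.scala is a placeholder for the shared files, and the example assumes the three version-sensitive files also have a default copy in the base module, which the issue does not state explicitly.

```scala
// Conceptual model of "version module overrides base module"; not the real build logic.
object SourceLayering {
  /** A module and the source file names it contributes (names only, for illustration). */
  final case class Module(name: String, files: Set[String])

  /**
   * Resolves which module supplies each file when layers are applied in order.
   * Later layers win, mimicking a version-specific module overriding its base.
   */
  def effectiveSources(layers: Seq[Module]): Map[String, String] =
    layers.foldLeft(Map.empty[String, String]) { (acc, module) =>
      acc ++ module.files.map(file => file -> module.name)
    }
}

object SourceLayeringExample {
  import SourceLayering._

  // The three files named in the issue as differing between 4-0 and 4-1.
  val versionSensitive: Set[String] = Set(
    "ChangeFeedInitialOffsetWriter.scala",
    "CosmosCatalogBase.scala",
    "CosmosCatalogITestBase.scala"
  )

  // "SharedOverride.scala" is a placeholder standing in for the 18 shared files.
  val base: Module = Module("azure-cosmos-spark_4", versionSensitive + "SharedOverride.scala")
  val spark41: Module = Module("azure-cosmos-spark_4-1_2-13", versionSensitive)

  // Base first, then the 4.1 module, so the 4.1 copies take precedence.
  val effective: Map[String, String] = effectiveSources(Seq(base, spark41))
}
```

Under this model the 4.1 module ends up supplying only the three version-sensitive files, while everything else still resolves to the shared base.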

Reference

Key difference from MetadataVersionUtil

With the latest code on upstream-main, the Spark connector has already removed its dependency on MetadataVersionUtil (the version validation logic is now inlined). So only the HDFSMetadataLog import needs updating for Spark 4.1.
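
For readers unfamiliar with what inlining the version validation involves, here is a minimal sketch of such a check. It is an assumption about the general shape of the logic (parse a marker like "v1" and reject malformed or too-new versions), not the connector's actual code; the object name LogVersionValidation and the error messages are hypothetical.

```scala
// Illustrative stand-in for an inlined version check; not the connector's implementation.
object LogVersionValidation {
  /**
   * Parses a version marker such as "v1" and checks it against the maximum supported version.
   * Throws IllegalStateException when the marker is malformed or newer than supported.
   */
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    // Accept only "v" followed by a positive integer.
    val parsed: Option[Int] =
      if (text.startsWith("v")) scala.util.Try(text.substring(1).toInt).toOption.filter(_ > 0)
      else None

    parsed match {
      case Some(version) if version <= maxSupportedVersion => version
      case Some(version) =>
        throw new IllegalStateException(
          s"Unsupported log version: maximum supported is v$maxSupportedVersion, but encountered v$version.")
      case None =>
        throw new IllegalStateException(
          s"Malformed log file: could not read a valid version from '$text'.")
    }
  }
}
```

Keeping a check like this inside the connector means it does not depend on where Spark places its own utility class, which is why only the HDFSMetadataLog import remains version-sensitive.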

Metadata

Labels

Client, Cosmos, Service Attention, cosmos:spark3, needs-team-attention
