
Spark: Add Spark 4.2 module (copy of Spark 4.1) #16751

Open
rahulsmahadev wants to merge 1 commit into apache:main from rahulsmahadev:spark-4.2-module


@rahulsmahadev

Contributor

Summary

First step toward Spark 4.2 support: this PR adds spark/v4.2 as a mechanical, byte-identical copy of spark/v4.1, with zero content changes, no version bumps, and no build wiring. spark/v4.1 is untouched.

Why a copy-only PR

This is intentionally split into two PRs so the follow-up PR containing the actual Spark-4.2-specific changes (version bumps, API fixes, build wiring) has a small, reviewable diff instead of being buried in ~150k lines of copied code.

Because the copy is byte-identical, git's copy/rename detection (git log --follow -C, git blame -C -C) links every spark/v4.2 file back to its full v4.1/v4.0/v3.5 history — and this holds even under squash-merge, which does not preserve the git mv + copy-back commit pairs used previously. Verified on this branch: git blame -C -C on spark/v4.2/.../SparkCatalog.java attributes lines to the original 2020–2024 commits, not to the copy commit.
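For background on why "byte-identical" matters: git names file content by its blob ID, which (in a repository using the default SHA-1 object format) is the SHA-1 of a short "blob <size>" header followed by the raw bytes. A byte-identical copy therefore has exactly the same blob ID as its source, which lets git treat the pair as an exact match rather than relying on similarity scoring alone. The minimal Python sketch below recomputes that ID; `git_blob_id` is an illustrative helper, not part of git's tooling, and the sample file content is hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID git assigns to a blob with these bytes (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    original = b"public class SparkCatalog {}\n"  # hypothetical file content
    copy = bytes(original)                         # byte-identical copy
    edited = original.replace(b"{}", b"{ }")       # a one-character edit
    print(git_blob_id(original) == git_blob_id(copy))    # True
    print(git_blob_id(original) == git_blob_id(edited))  # False
```

Hashing the same relative path under spark/v4.1 and spark/v4.2 should give matching IDs for every file; `git hash-object <path>` computes the same value from the command line.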

Precedent note: Spark 4.0 (#13059) and Spark 4.1 (#14155) were each introduced as a triplet of commits ("Move X as Y", "Copy back Y as X", "Initial support"), rebase-merged to preserve the rename pair. This PR deliberately uses a plain byte-identical copy instead, which achieves the same history preservation regardless of merge strategy; the "initial support" content will come in the follow-up PR. Related: #14984 takes the established single-PR approach for 4.2.0 (RC). Happy to coordinate, or to defer to whichever structure the maintainers prefer.

Build impact: none

The new directory is invisible to the build until explicitly registered:

  • gradle.properties gates versions via systemProp.knownSparkVersions=3.5,4.0,4.1 (and defaultSparkVersions=4.1); 4.2 is not listed.
  • settings.gradle only includes spark subprojects inside explicit if (sparkVersions.contains("X")) blocks; there is no globbing of spark/*.
  • spark/build.gradle only does apply from: file("$projectDir/vX/build.gradle") for enabled versions, so spark/v4.2/build.gradle is never applied.
  • The Spark CI matrix is hardcoded to spark: ['3.5', '4.0', '4.1'].

CI and releases are therefore unaffected until the follow-up PR wires the module up. The RAT license check passes since every file is an identical copy of an already-licensed file (dev/.rat-excludes is glob-based, not path-specific).
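To make the gating concrete, here is a simplified Python model of the build logic described in the list above. It is a sketch, not the real build: the actual checks are Groovy in settings.gradle and spark/build.gradle, the `systemProp.defaultSparkVersions` key name is an assumption based on the defaultSparkVersions value quoted above, and rejecting an unknown version with an error is an assumed behavior. `parse_properties` and `enabled_spark_modules` are hypothetical helpers.

```python
from typing import Dict, List, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def enabled_spark_modules(props: Dict[str, str], requested: Optional[List[str]] = None) -> List[str]:
    """Map enabled Spark versions to module directories (hypothetical model)."""
    known = props["systemProp.knownSparkVersions"].split(",")
    # Assumption: with no explicit request, the default versions are used.
    if requested is not None:
        versions = requested
    else:
        versions = props["systemProp.defaultSparkVersions"].split(",")
    unknown = [v for v in versions if v not in known]
    if unknown:
        # Assumed behavior: a version absent from knownSparkVersions is rejected.
        raise ValueError(f"unknown Spark versions: {unknown}")
    # Only explicitly listed versions become modules; nothing globs spark/*.
    return [f"spark/v{v}" for v in versions]


# Hypothetical excerpt containing only the two relevant properties.
GRADLE_PROPERTIES = """
systemProp.defaultSparkVersions=4.1
systemProp.knownSparkVersions=3.5,4.0,4.1
"""

if __name__ == "__main__":
    props = parse_properties(GRADLE_PROPERTIES)
    print(enabled_spark_modules(props))  # ['spark/v4.1']
    try:
        enabled_spark_modules(props, ["4.2"])
    except ValueError as err:
        print(err)  # unknown Spark versions: ['4.2']
```

The point the model captures is that modules come only from an explicit version list, never from a directory listing, so a spark/v4.2 directory on disk changes nothing until 4.2 is added to that list.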

Verification

  • diff -r spark/v4.1 spark/v4.2 → empty (exit 0)
  • File counts equal: 627 files in spark/v4.1, 627 in spark/v4.2, no symlinks, all tracked
  • Single commit touching only spark/v4.2/** (627 files, +149,822 lines)
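The checks above can be reproduced with a small Python sketch, assuming both trees are on local disk. It mirrors `diff -r` (same relative paths, same bytes), fails on any symlink, and tests whether a list of changed paths, such as the output of `git diff --name-only`, stays under one prefix. The function names are hypothetical, it does not check git-tracked status, and the demo uses throwaway temporary directories rather than the real spark/v4.1 and spark/v4.2.

```python
import os
import tempfile
from pathlib import Path


def regular_files(root: Path) -> set:
    """Relative POSIX paths of all files under root; raises if any symlink is found."""
    files = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories, but it still lists them.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                raise ValueError(f"symlink found: {path}")
        for name in filenames:
            files.add((Path(dirpath) / name).relative_to(root).as_posix())
    return files


def trees_byte_identical(src: Path, dst: Path) -> bool:
    """Like `diff -r src dst` exiting 0: same relative paths and identical bytes."""
    src_files = regular_files(src)
    if src_files != regular_files(dst):
        return False
    return all((src / rel).read_bytes() == (dst / rel).read_bytes() for rel in src_files)


def only_under(changed_paths, prefix: str) -> bool:
    """True if every changed path (e.g. from `git diff --name-only`) lies under prefix/."""
    prefix = prefix.rstrip("/") + "/"
    return all(path.startswith(prefix) for path in changed_paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "v4.1"), Path(tmp, "v4.2")  # stand-ins for the real trees
        for root in (old, new):
            (root / "src").mkdir(parents=True)
            (root / "src" / "A.java").write_text("class A {}\n")
        print(trees_byte_identical(old, new))  # True
        (new / "src" / "A.java").write_text("class B {}\n")  # same size, different bytes
        print(trees_byte_identical(old, new))  # False
    print(only_under(["spark/v4.2/build.gradle"], "spark/v4.2"))  # True
```

Reading the bytes directly, rather than calling `filecmp.cmp` with its default `shallow=True`, avoids treating two files as equal just because their stat signatures match.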

Follow-up PR

Spark-4.2-specific changes come next, mirroring the v4.1 "initial support" commit:

  • add 4.2 to knownSparkVersions/defaultSparkVersions
  • settings.gradle, spark/build.gradle, and jmh.gradle wiring
  • gradle/libs.versions.toml entries
  • the .github/workflows/spark-ci.yml matrix
  • .gitignore benchmark paths
  • dev/stage-binaries.sh
  • version-string bumps inside spark/v4.2
  • any API fixes Spark 4.2 requires

This pull request and its description were written by Isaac.

@nssalian

Collaborator

@rahulsmahadev, @manuzhang has been working on this in #14984. Please coordinate with that PR so we don't duplicate work.

@rahulsmahadev

Contributor Author


Ah, I see. I didn't realize there was already a PR when I spoke to @szehon-ho today.

@nssalian

Collaborator


Your PR description references it. Please coordinate with @manuzhang to help add any missing pieces or help with reviews. Thanks.
