[SPARK-57075][INFRA] Share precompile Coursier cache with test/pyspark/sparkr jobs#56118
Draft
zhengruifeng wants to merge 1 commit into
What changes were proposed in this pull request?
Add the `precompile-coursier-` cache as a restore-key fallback for the `test`, `pyspark`, and `sparkr` jobs in `.github/workflows/build_and_test.yml`, so they can reuse the dependency JARs already resolved by the `precompile` job when their own Coursier cache misses.
Concretely, the `Cache Coursier local repository` step of each of the three jobs now has additional fallback restore-keys pointing at the `precompile-coursier-` cache; the existing primary key and prefix fallback are unchanged.
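The exact YAML diff is not reproduced here. As a rough illustration of the resulting lookup order only, the hypothetical helper `lookup_order` below lists the keys each job would try, assuming a single `precompile-coursier-` fallback appended after the existing entries. `deps_hash` is a placeholder for whatever hash the real primary key uses, and `17-hadoop3-` is a made-up example of the matrix values; the PR may add more than one precompile entry.

```python
# Hypothetical model of the keys one job's "Cache Coursier local repository"
# step would try after this change. Assumptions (not the PR's literal YAML):
# one "precompile-coursier-" fallback is appended, and deps_hash stands in
# for the hash used in the real primary key.

PRECOMPILE_PREFIX = "precompile-coursier-"


def lookup_order(job_prefix: str, deps_hash: str) -> list[str]:
    """Return [primary key, *restore-keys] in the order they are tried."""
    existing = [f"{job_prefix}{deps_hash}", job_prefix]  # unchanged entries
    return existing + [PRECOMPILE_PREFIX]                # new fallback goes last


if __name__ == "__main__":
    # "17-hadoop3-" is a made-up example of ${matrix.java}-${matrix.hadoop}-.
    for prefix in ("17-hadoop3-coursier-", "pyspark-coursier-", "sparkr-coursier-"):
        print(lookup_order(prefix, "abc123"))
```

Because the new entry is appended rather than inserted, the first lookups for every job are exactly the ones it tried before the change.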
Why are the changes needed?
The `precompile` job already resolves the full superset of dependencies (it builds with `-Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pvolcano`) and populates `~/.cache/coursier`, but it writes that cache under the key prefix `precompile-coursier-`. The downstream test jobs read from `${matrix.java}-${matrix.hadoop}-coursier-`, `pyspark-coursier-`, and `sparkr-coursier-` respectively, so none of their restore-keys prefix-match the precompile job's cache and they cannot see it.
The precompile artifact tarball only bundles `target/` directories (`.class` files and assemblies); it does not include the resolved JARs. So when a test job's own Coursier cache is cold (a new branch, or a modified `pom.xml`/`plugins.sbt`), SBT and Coursier still have to re-resolve and re-download the dependencies from scratch, even though the `precompile` job already downloaded them earlier in the same workflow run.
Adding the precompile cache as a restore-key fallback lets the test
jobs benefit from that work in the cold-cache case. The change is
purely additive: existing per-job caches still take precedence via the
primary key and the first restore-key entry.
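The precedence argument can be checked against a simplified model of `actions/cache` key matching: an exact hit on the primary key wins; otherwise each restore-key is tried in order as a prefix, and the most recently created match for the first prefix that matches anything is restored. This is an approximation written for illustration (it ignores branch scoping and cache versioning), and `CacheEntry`, `restore`, and the cache entries in the demo are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created_at: int  # hypothetical timestamp; larger means newer


def restore(entries: list[CacheEntry], key: str, restore_keys: list[str]) -> Optional[str]:
    """Simplified actions/cache matching (ignores branch scope and versioning)."""
    # 1. An exact match on the primary key wins outright.
    for entry in entries:
        if entry.key == key:
            return entry.key
    # 2. Otherwise try each restore-key in order; the first prefix that
    #    matches anything restores its most recently created match.
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at).key
    return None  # full miss: dependencies must be downloaded again


if __name__ == "__main__":
    before = ["pyspark-coursier-"]
    after = ["pyspark-coursier-", "precompile-coursier-"]
    warm = [
        CacheEntry("pyspark-coursier-oldhash", created_at=1),
        CacheEntry("precompile-coursier-newhash", created_at=2),
    ]
    cold = [CacheEntry("precompile-coursier-newhash", created_at=2)]
    # Warm per-job cache: the first restore-key still wins, even though the
    # precompile entry is newer.
    print(restore(warm, "pyspark-coursier-newhash", before))  # pyspark-coursier-oldhash
    print(restore(warm, "pyspark-coursier-newhash", after))   # pyspark-coursier-oldhash
    # Cold per-job cache: only the new fallback finds anything.
    print(restore(cold, "pyspark-coursier-newhash", before))  # None
    print(restore(cold, "pyspark-coursier-newhash", after))   # precompile-coursier-newhash
```

In this model the appended fallback only changes the outcome when every earlier entry misses, which is the additive behavior the change relies on.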
Does this PR introduce any user-facing change?
No. CI-only.
How was this patch tested?
The workflow YAML was validated with `python3 -c "import yaml; yaml.safe_load(...)"`. The effectiveness of the cache fallback can only be observed on actual GitHub Actions runs and will be evaluated by the CI on this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)