ci: refactor release workflows for atomic manifest updates #25
Merged
This was referenced Jan 20, 2026
Refactor CI/CD workflows to eliminate race conditions and improve reliability
by decoupling artifact generation from git operations.
Previous workflow: Each matrix job (per architecture) committed directly to git
→ Race conditions, flaky failures, difficult partial recovery
New workflow: Build → Artifact → Single Aggregation → Commit
→ No races, atomic updates, easy partial recovery
**reusable-release-python-tar.yml:**
- Rename job: release-assets-and-update-manifest → release-assets-and-generate-json
- Remove: All git operations (pull, commit, push)
- Remove: Complex bash script for manifest parsing
- Add: Call to generate_partial_manifest.py after release
- Add: Upload partial manifest as workflow artifact
- Simplified: ~150 lines removed, ~30 lines added (net ~120 fewer)
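As a rough illustration of what "generate a partial manifest, then upload it as an artifact" could look like, here is a minimal sketch. It is not the actual `generate_partial_manifest.py` from PR #23: the function names, the entry fields (`version`, `arch`, `filename`), and the output file naming are all assumptions made for this example.

```python
import json
from pathlib import Path


def build_partial_entry(version: str, arch: str, filename: str) -> dict:
    """Describe one released asset (hypothetical schema, not the real one)."""
    return {"version": version, "arch": arch, "filename": filename}


def write_partial_manifest(entry: dict, out_dir: str) -> Path:
    """Write one JSON file per (version, arch) so parallel jobs never share a file."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{entry['version']}-{entry['arch']}.json"
    # sort_keys keeps the output byte-stable across reruns of the same job
    path.write_text(json.dumps(entry, indent=2, sort_keys=True) + "\n")
    return path


if __name__ == "__main__":
    # Illustrative values only; the workflow would pass real build outputs.
    entry = build_partial_entry("3.12.1", "s390x", "python-3.12.1-linux-s390x.tar.gz")
    print(write_partial_manifest(entry, "partial-manifests"))
```

Because each job writes only its own file and performs no git operations, a failed or rerun job cannot conflict with its siblings.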
**release-latest-python-tag.yml & release-matching-python-tags.yml:**
- Add: New update-manifests job (runs after all builds complete)
- Add: Downloads all partial manifest artifacts
- Add: Calls apply_partial_manifests.py
- Add: Single atomic git commit/push
- Add: Concurrency control (manifest-update-${{ github.ref }})
- Change: Matrix fail-fast: false (preserve successful builds)
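The aggregation step described above can be sketched as a single writer that merges every downloaded partial manifest and then replaces the manifest file in one step. This is a hedged sketch, not the real `apply_partial_manifests.py`: the function names, the `(version, arch)` key, and the entry schema are assumptions, and the git commit/push that follows in the workflow is left out.

```python
import json
import os
import tempfile
from pathlib import Path


def apply_partials(manifest: list, partials: list) -> list:
    """Merge partial entries into the manifest, keyed by (version, arch)."""
    merged = {(e["version"], e["arch"]): e for e in manifest}
    for entry in partials:
        # A newer partial for the same key replaces the old entry
        merged[(entry["version"], entry["arch"])] = entry
    # Plain string sort: deterministic ordering, not semantic version ordering
    return [merged[key] for key in sorted(merged)]


def write_manifest_atomically(entries: list, path: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(entries, handle, indent=2, sort_keys=True)
        handle.write("\n")
    os.replace(tmp_name, target)  # readers see the old or the new file, never a mix
```

Only this one job touches the manifest, so there is a single commit point regardless of how many architectures were built.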
**generate_tar.yml, merge-manifest.yml, python-sample.yml:**
- Change: fail-fast: false in matrix strategy
Before:
After:
✅ **No race conditions:** Single commit point
✅ **Partial recovery:** Rerun only failed architectures
✅ **Better reliability:** fail-fast: false preserves successful builds
✅ **Atomic updates:** All-or-nothing manifest commits
✅ **Simpler logic:** Removed ~150 lines of bash script
- .github/workflows/reusable-release-python-tar.yml (+30, -152)
- .github/workflows/release-latest-python-tag.yml (+74, -6)
- .github/workflows/release-matching-python-tags.yml (+75, -6)
- .github/workflows/generate_tar.yml (+1, -1)
- .github/workflows/merge-manifest.yml (+1, -1)
- .github/workflows/python-sample.yml (+1, -1)
Signed-off-by: Adilhusain Shaikh <Adilhusain.Shaikh@ibm.com>
6497222 to 931589e
Problem
Race conditions in manifest updates: multiple architecture builds commit to git simultaneously, causing conflicts and non-deterministic results.
Current flow: each matrix job (one per architecture) builds and then commits directly to git.
Solution
Decouple builds from git operations using a single-writer pattern.
New flow: Build → Artifact → Single Aggregation → Commit
Changes
reusable-release-python-tar.yml (-122 lines)
- Calls generate_partial_manifest.py (from PR feat(tooling): add partial manifest generation and application scripts #23)
- Renames job: release-assets-and-update-manifest → release-assets-and-generate-json
release-latest-python-tag.yml & release-matching-python-tags.yml (+75 lines each)
- New update-manifests job: calls apply_partial_manifests.py (from PR feat(tooling): add partial manifest generation and application scripts #23)
- fail-fast: false - preserve successful builds if one fails
Minor Updates
generate_tar.yml, merge-manifest.yml, python-sample.yml: Added fail-fast: false
Impact
✅ Eliminates race conditions - single commit point
✅ Safe job retrigger - failed jobs can be rerun independently
✅ Idempotent aggregation - recollects artifacts and rebuilds manifests safely
✅ Atomic updates - all-or-nothing commits
✅ Simpler code - removed ~150 lines of complex bash
✅ Testable - manifest generation tested in PR #23
Recovery Example
If s390x build fails but ppc64le succeeds:
To recover: rerun only the failed s390x job.
Key: Aggregation is independent and idempotent. Retriggering a job is safe because the aggregation step recollects all artifacts and rebuilds the manifests.
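The idempotence claim can be checked with a small model of the aggregation step. This is a sketch under assumptions: the `rebuild` function, the `(version, arch)` key, and the sample entries are hypothetical, and it says nothing about whether the real workflow commits when one build has failed. It only shows that repeating the merge with the same artifacts changes nothing, and that a rerun which adds the missing architecture converges to the full manifest.

```python
def rebuild(manifest: list, partials: list) -> list:
    """Rebuild the manifest from the current entries plus all collected partials."""
    merged = {(e["version"], e["arch"]): e for e in manifest}
    merged.update({(p["version"], p["arch"]): p for p in partials})
    return [merged[key] for key in sorted(merged)]


# Illustrative entries only
ppc64le = {"version": "3.12.1", "arch": "ppc64le", "filename": "python-3.12.1-ppc64le.tar.gz"}
s390x = {"version": "3.12.1", "arch": "s390x", "filename": "python-3.12.1-s390x.tar.gz"}

# First attempt: only the ppc64le artifact exists because s390x failed
after_first = rebuild([], [ppc64le])

# After rerunning s390x, aggregation recollects every artifact, old and new
after_rerun = rebuild(after_first, [ppc64le, s390x])

# Triggering aggregation again with the same artifacts is a no-op
after_repeat = rebuild(after_rerun, [ppc64le, s390x])

assert after_rerun == after_repeat
assert after_rerun == rebuild([], [ppc64le, s390x])
```

The last assertion is the useful property for recovery: the final manifest does not depend on how many attempts it took to produce all the artifacts.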
Dependencies
Related