ci: refactor release workflows for atomic manifest updates #25

Merged: mtarsel merged 1 commit into IBM:main from adilhusain-s:pr4-release-workflow-refactor on Feb 4, 2026

adilhusain-s (Collaborator) commented Jan 18, 2026

Problem

Race conditions in manifest updates: the per-architecture build jobs commit to git concurrently, causing merge conflicts and non-deterministic results.

Current flow:

  1. Each architecture builds Python
  2. Each commits manifest changes directly → race conditions
  3. Retry on conflict → masks problems, doesn't solve them
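The race in the current flow can be reduced to a toy model. This is a simplified simulation, not the workflow's code: `Remote` and `job_commit` are hypothetical names, "3.12.1" is a placeholder version, and the branch is modeled as a revision counter that only accepts fast-forward pushes, as a non-force `git push` does. Both matrix jobs check out the same tip; whichever pushes second is rejected and must pull, merge, and retry.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Toy stand-in for a git branch: a tip revision plus manifest content."""
    rev: int = 0
    manifest: dict = field(default_factory=dict)

    def push(self, base_rev: int, new_manifest: dict) -> bool:
        # Like a non-force push: accepted only if it builds on the current tip.
        if base_rev != self.rev:
            return False
        self.rev += 1
        self.manifest = new_manifest
        return True


def job_commit(remote: Remote, base_rev: int, base: dict, arch: str) -> bool:
    """Hypothetical per-arch job: add its own entry to the manifest it checked out."""
    new_manifest = {version: list(archs) for version, archs in base.items()}
    new_manifest.setdefault("3.12.1", []).append(arch)
    return remote.push(base_rev, new_manifest)


remote = Remote()
# Both matrix jobs check out the same tip before either one pushes.
base_rev, base = remote.rev, dict(remote.manifest)
results = {arch: job_commit(remote, base_rev, base, arch) for arch in ("ppc64le", "s390x")}
print(results)          # {'ppc64le': True, 's390x': False}
print(remote.manifest)  # {'3.12.1': ['ppc64le']}
```

Which job loses depends only on finish order, so a retry loop changes when the conflict happens but not whether it happens.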

Solution

Decouple builds from git operations using a single-writer pattern.

New flow:

  1. Each architecture builds Python and generates partial manifest JSON (no git ops)
  2. Upload partials as workflow artifacts
  3. Single aggregation job downloads all partials and commits once atomically
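The build-side step (1) can be sketched as a function that turns one architecture's release into a self-contained JSON file. This is illustrative only: the PR's real logic lives in generate_partial_manifest.py, whose schema is not shown here, so `write_partial_manifest`, the `partial-<version>-<arch>.json` naming, the field names, and the example URL are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def write_partial_manifest(out_dir: Path, version: str, arch: str,
                           filename: str, download_url: str) -> Path:
    """Hypothetical helper: describe one architecture's tarball as a JSON file.

    No git operations: the file is only written locally, to be uploaded
    as a workflow artifact by a later step.
    """
    entry = {
        "version": version,
        "arch": arch,
        "filename": filename,
        "download_url": download_url,
    }
    path = out_dir / f"partial-{version}-{arch}.json"
    # sort_keys keeps the output identical across reruns of the same build.
    path.write_text(json.dumps(entry, indent=2, sort_keys=True) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_partial_manifest(
        Path(tmp), "3.12.1", "ppc64le",
        "python-3.12.1-linux-ppc64le.tar.gz",
        "https://example.com/python-3.12.1-linux-ppc64le.tar.gz",  # placeholder URL
    )
    written_name = written.name
    written_entry = json.loads(written.read_text())

print(written_name)           # partial-3.12.1-ppc64le.json
print(written_entry["arch"])  # ppc64le
```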

Changes

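reusable-release-python-tar.yml (net -122 lines)

  • Removed: all git operations (pull, commit, push) and the bash manifest-parsing script
  • Added: a generate_partial_manifest.py call after release, plus upload of the partial manifest as a workflow artifact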

release-latest-python-tag.yml & release-matching-python-tags.yml (+75 lines each)

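  • Added: update-manifests job, which runs after all builds complete, downloads every partial manifest artifact, applies them with apply_partial_manifests.py, and makes a single commit/push under the manifest-update-${{ github.ref }} concurrency group
  • Changed: matrix fail-fast: false, so successful builds are preserved if one fails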
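A minimal sketch of what the update-manifests job does between downloading artifacts and committing. It assumes a `{version: {arch: entry}}` layout, `partial-*.json` file names, and one subdirectory per downloaded artifact; all of these, and the names `load_partials` / `apply_partials`, are hypothetical, and the real apply_partial_manifests.py may differ. Because only this one job writes, the commit and push that follow (omitted here) have no competing writer.

```python
import json
import tempfile
from pathlib import Path


def load_partials(artifact_dir: Path) -> list[dict]:
    """Hypothetical: read every downloaded partial manifest in a stable (sorted) order."""
    return [json.loads(p.read_text())
            for p in sorted(artifact_dir.rglob("partial-*.json"))]


def apply_partials(manifest: dict, partials: list[dict]) -> dict:
    """Hypothetical merge into {version: {arch: entry}}.

    An existing entry for the same (version, arch) is replaced, never duplicated.
    """
    merged = {version: dict(archs) for version, archs in manifest.items()}
    for entry in partials:
        merged.setdefault(entry["version"], {})[entry["arch"]] = entry
    return merged


with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp)
    # Simulate two downloaded artifacts, one subdirectory per architecture.
    for arch in ("s390x", "ppc64le"):
        sub = artifacts / f"manifest-{arch}"
        sub.mkdir()
        (sub / f"partial-3.12.1-{arch}.json").write_text(
            json.dumps({"version": "3.12.1", "arch": arch}))
    updated = apply_partials({}, load_partials(artifacts))

print(sorted(updated["3.12.1"]))  # ['ppc64le', 's390x']
```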

Minor Updates

  • generate_tar.yml, merge-manifest.yml, python-sample.yml: Added fail-fast: false
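Why fail-fast: false matters for this design can be shown with a toy model of a matrix run. With the default fail-fast: true, GitHub Actions cancels the matrix's in-progress and queued jobs once any job fails; the sketch approximates that by cancelling every job that has not finished yet. `run_matrix` is a hypothetical helper, and real scheduling is reduced to a fixed finish order.

```python
def run_matrix(outcomes: list[tuple[str, bool]], fail_fast: bool) -> dict[str, str]:
    """Hypothetical toy model; outcomes are (arch, succeeded) pairs in finish order."""
    results: dict[str, str] = {}
    failed = False
    for arch, ok in outcomes:
        if failed and fail_fast:
            # A job still running when another fails is cancelled under fail-fast.
            results[arch] = "cancelled"
            continue
        results[arch] = "success" if ok else "failure"
        failed = failed or not ok
    return results


finish_order = [("s390x", False), ("ppc64le", True)]
print(run_matrix(finish_order, fail_fast=True))   # {'s390x': 'failure', 'ppc64le': 'cancelled'}
print(run_matrix(finish_order, fail_fast=False))  # {'s390x': 'failure', 'ppc64le': 'success'}
```

Under fail-fast: true the ppc64le job would never upload its partial manifest, leaving the aggregation job nothing to keep.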

Impact

  • Eliminates race conditions: single commit point
  • Safe job retrigger: failed jobs can be rerun independently
  • Idempotent aggregation: re-collects artifacts and rebuilds manifests safely
  • Atomic updates: all-or-nothing commits
  • Simpler code: removed ~150 lines of complex bash
  • Testable: manifest generation tested in PR #23
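The all-or-nothing idea can be sketched at file level. In the PR, atomicity comes from the single git commit; the validate-first step and the temp-file swap with `os.replace` below are an added illustration, not something the PR describes. `update_manifest_all_or_nothing` and `REQUIRED_KEYS` are hypothetical: if any partial is malformed, the manifest file is left untouched.

```python
import json
import os
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"version", "arch", "filename", "download_url"}  # hypothetical schema


def update_manifest_all_or_nothing(manifest_path: Path, partials: list[dict]) -> None:
    """Hypothetical: apply every partial, or none of them."""
    # 1. Validate every partial before touching the manifest file.
    for entry in partials:
        missing = REQUIRED_KEYS.difference(entry)
        if missing:
            raise ValueError(f"partial for {entry.get('arch')} missing {sorted(missing)}")
    # 2. Merge in memory.
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    for entry in partials:
        manifest.setdefault(entry["version"], {})[entry["arch"]] = entry
    # 3. Write a temp file beside the target, then swap it in with os.replace.
    fd, tmp_name = tempfile.mkstemp(dir=manifest_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    os.replace(tmp_name, manifest_path)


good = {"version": "3.12.1", "arch": "ppc64le",
        "filename": "a.tar.gz", "download_url": "https://example.com/a.tar.gz"}
bad = {"version": "3.12.1", "arch": "s390x"}  # missing filename and download_url

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "3.12.1.json"
    update_manifest_all_or_nothing(path, [good])
    before = path.read_text()
    try:
        update_manifest_all_or_nothing(path, [good, bad])
        rejected = False
    except ValueError:
        rejected = True
    unchanged = path.read_text() == before

print(rejected, unchanged)  # True True
```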

Recovery Example

If s390x build fails but ppc64le succeeds:

  1. ppc64le uploads partial manifest artifact
  2. s390x job fails (no artifact uploaded)
  3. Aggregation job downloads available artifacts (ppc64le only)
  4. Creates version manifest files and commits ppc64le entries

To recover:

  1. Retrigger only the failed s390x job
  2. s390x now uploads its partial manifest artifact
  3. Aggregation job runs again automatically
  4. Downloads all available artifacts (ppc64le + s390x)
  5. Recollects and recreates version manifest files with both architectures
  6. Commits atomically - no conflicts, no lost data
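The recovery steps can be traced with a small simulation. It is a sketch under stated assumptions: `rebuild_manifest` and the entry fields are hypothetical, and, as step 4 describes, the re-run aggregation job is assumed to see the ppc64le artifact from the first attempt alongside the new s390x one.

```python
def rebuild_manifest(partials: list[dict]) -> dict:
    """Hypothetical aggregation: rebuild {version: {arch: filename}} from all partials."""
    manifest: dict = {}
    for entry in partials:
        manifest.setdefault(entry["version"], {})[entry["arch"]] = entry["filename"]
    return manifest


ppc = {"version": "3.12.1", "arch": "ppc64le", "filename": "python-3.12.1-linux-ppc64le.tar.gz"}
s390x = {"version": "3.12.1", "arch": "s390x", "filename": "python-3.12.1-linux-s390x.tar.gz"}

# First attempt: s390x failed, so only the ppc64le artifact exists.
first_run = rebuild_manifest([ppc])
# After re-running s390x: both artifacts are available to the aggregation job.
recovered = rebuild_manifest([ppc, s390x])

print(sorted(first_run["3.12.1"]))  # ['ppc64le']
print(sorted(recovered["3.12.1"]))  # ['ppc64le', 's390x']
```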

Key point: the aggregation job is independent and idempotent. Retriggering any job causes it to re-collect all available artifacts and rebuild the manifests, so reruns are safe.
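One way to make the aggregation idempotent in practice is deterministic output: rebuilding from the same artifacts, in any download order, yields byte-identical files. `render_manifest` is a hypothetical sketch, not the PR's code, and assumes each (version, arch) appears in at most one partial.

```python
import json


def render_manifest(partials: list[dict]) -> str:
    """Hypothetical aggregation output: rebuild from all partials, serialize deterministically."""
    manifest: dict = {}
    for entry in partials:
        manifest.setdefault(entry["version"], {})[entry["arch"]] = entry["filename"]
    # sort_keys makes the text independent of artifact download order.
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"


partials = [
    {"version": "3.12.1", "arch": "s390x", "filename": "python-3.12.1-linux-s390x.tar.gz"},
    {"version": "3.12.1", "arch": "ppc64le", "filename": "python-3.12.1-linux-ppc64le.tar.gz"},
]

first = render_manifest(partials)
rerun = render_manifest(list(reversed(partials)))  # same artifacts, other order
print(first == rerun)  # True
```

Because a rerun with unchanged artifacts produces identical files, git sees no diff, and a job built this way could skip the commit instead of creating an empty one.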

Dependencies

Related

Refactor CI/CD workflows to eliminate race conditions and improve reliability
by decoupling artifact generation from git operations.

Previous workflow: Each matrix job (per architecture) committed directly to git
→ Race conditions, flaky failures, difficult partial recovery

New workflow: Build → Artifact → Single Aggregation → Commit
→ No races, atomic updates, easy partial recovery

**reusable-release-python-tar.yml:**
- Rename job: release-assets-and-update-manifest → release-assets-and-generate-json
- Remove: All git operations (pull, commit, push)
- Remove: Complex bash script for manifest parsing
- Add: Call to generate_partial_manifest.py after release
- Add: Upload partial manifest as workflow artifact
- Simplified: ~150 lines removed, ~30 lines added (net ~120 fewer)

**release-latest-python-tag.yml & release-matching-python-tags.yml:**
- Add: New update-manifests job (runs after all builds complete)
- Add: Downloads all partial manifest artifacts
- Add: Calls apply_partial_manifests.py
- Add: Single atomic git commit/push
- Add: Concurrency control (manifest-update-${{ github.ref }})
- Change: Matrix fail-fast: false (preserve successful builds)
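The manifest-update-${{ github.ref }} concurrency group keeps at most one update-manifests job per ref in progress. The Python below is only an analogy: GitHub Actions enforces this at the job level, not with an in-process lock, and `concurrency_group`, `group_lock`, and `commit_entry` are hypothetical. One lock per group key serializes writers for the same ref, so no read-modify-write update is lost.

```python
import threading
import time

_registry_guard = threading.Lock()
_group_locks = {}  # concurrency group key -> lock (hypothetical registry)
manifests = {}     # ref -> list of arch entries (stand-in for the manifest files)


def concurrency_group(ref: str) -> str:
    # Same key shape as manifest-update-${{ github.ref }}.
    return f"manifest-update-{ref}"


def group_lock(key: str) -> threading.Lock:
    # Guard the registry so two threads cannot create two locks for one key.
    with _registry_guard:
        return _group_locks.setdefault(key, threading.Lock())


def commit_entry(ref: str, arch: str) -> None:
    """Hypothetical writer: read-modify-write of the manifest, serialized per group."""
    with group_lock(concurrency_group(ref)):
        snapshot = list(manifests.get(ref, []))  # read
        time.sleep(0.01)                         # widen the window a race would need
        manifests[ref] = snapshot + [arch]       # write back


threads = [threading.Thread(target=commit_entry, args=("refs/heads/main", arch))
           for arch in ("ppc64le", "s390x")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(manifests["refs/heads/main"]))  # ['ppc64le', 's390x']
```

One difference from a lock, per the GitHub Actions concurrency documentation: a group holds at most one running and one pending job, and a newly queued job cancels any older pending job in the same group, so not every queued run is guaranteed to execute.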

**generate_tar.yml, merge-manifest.yml, python-sample.yml:**
- Change: fail-fast: false in matrix strategy

Before:

After:

✅ **No race conditions:** Single commit point
✅ **Partial recovery:** Rerun only failed architectures
✅ **Better reliability:** fail-fast: false preserves successful builds
✅ **Atomic updates:** All-or-nothing manifest commits
✅ **Simpler logic:** Removed ~150 lines of bash script

- .github/workflows/reusable-release-python-tar.yml (+30, -152)
- .github/workflows/release-latest-python-tag.yml (+74, -6)
- .github/workflows/release-matching-python-tags.yml (+75, -6)
- .github/workflows/generate_tar.yml (+1, -1)
- .github/workflows/merge-manifest.yml (+1, -1)
- .github/workflows/python-sample.yml (+1, -1)

Signed-off-by: Adilhusain Shaikh <Adilhusain.Shaikh@ibm.com>
adilhusain-s force-pushed the pr4-release-workflow-refactor branch from 6497222 to 931589e on February 4, 2026 09:08
mtarsel merged commit 6fb3811 into IBM:main on Feb 4, 2026
5 of 7 checks passed