Skip to content

Disallow RPM upload with the same NVR [RHELDST-37824]#403

Merged
rbikar merged 3 commits into
release-engineering:masterfrom
rbikar:disallowe-rpm-dupes
Jun 16, 2026
Merged

Disallow RPM upload with the same NVR [RHELDST-37824]#403
rbikar merged 3 commits into
release-engineering:masterfrom
rbikar:disallowe-rpm-dupes

Conversation

@rbikar

@rbikar rbikar commented Jun 5, 2026

Copy link
Copy Markdown
Member

This change introduces a functionality that will raise a fatal error
when upload of an RPM with identical cdn_path of already present unit in pulp
is attempted.

The RPMs in question differ only with their checksums (and signing keys),
further association to live repositories and publish will cause breakage
of repository.

Duplicates check is by default disabled, and can be enabled by setting
PUBTOOLS_PULP_ALLOW_DUPLICATE_UNITS env var.

This should be a workaround until proper suport for such scenario is
implemented.

@rbikar rbikar force-pushed the disallowe-rpm-dupes branch 3 times, most recently from 562d78c to aaca11a Compare June 5, 2026 08:55
@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.80%. Comparing base (0066a0d) to head (ffd2d44).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #403   +/-   ##
=======================================
  Coverage   99.80%   99.80%           
=======================================
  Files          54       54           
  Lines        3018     3030   +12     
=======================================
+ Hits         3012     3024   +12     
  Misses          6        6           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rbikar rbikar force-pushed the disallowe-rpm-dupes branch 3 times, most recently from c08da81 to 9726413 Compare June 10, 2026 08:34
@rbikar rbikar marked this pull request as ready for review June 10, 2026 09:39
@rbikar rbikar requested a review from rajulkumar as a code owner June 10, 2026 09:39
@rbikar rbikar requested a review from MichalHaluza June 10, 2026 09:39

@MichalHaluza MichalHaluza left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested on production-like data? AFAIK, we used to have the same RPMs signed with different GPG keys for different products in the past. This was (and still is) perfectly valid as long as the RPMs with the same NVRA but different signature aren't associated to the same repo. Checking at the time of upload may break these cases.

This is a workaround until we have a proper support for units with same NVR but different checksum in Pulp.
Such units are not allowed to be uploaded to Pulp as they would break the integrity of the repository when published.
"""
# import pdb; pdb.set_trace()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from debugging

@rbikar

rbikar commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Has this been tested on production-like data? AFAIK, we used to have the same RPMs signed with different GPG keys for different products in the past. This was (and still is) perfectly valid as long as the RPMs with the same NVRA but different signature aren't associated to the same repo. Checking at the time of upload may break these cases.

Haven't done any big tests on production data, but the case you mentioned will be disabled with current solution.
I just wanted workflow to fail as soon as possible.

I can move this check to Associate phase probably where I'll do the same or similar query but directly to dest repo ,
maybe (even better) doing query only to dest repo would be sufficient even in the Upload phase. Now we do generic content query.

With new query we will enable upload of NVR duplicates unless the same NVR is present in dest repo. Sounds good?

@MichalHaluza

Copy link
Copy Markdown
Contributor

Yes, checking only the dest repo would be preferred, however can we actually do that? Are the dest repos already known during the upload step? (Note that the same RPM can be uploaded to multiple repos during a signle push, and in RHEL it oftentimes is - mainline and dot)

@rbikar rbikar force-pushed the disallowe-rpm-dupes branch from 6d44c47 to 3697fda Compare June 11, 2026 13:07
@rbikar

rbikar commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Yes, checking only the dest repo would be preferred, however can we actually do that? Are the dest repos already known during the upload step? (Note that the same RPM can be uploaded to multiple repos during a signle push, and in RHEL it oftentimes is - mainline and dot)

Yep, dest repos are known after Query pulp step.

@rbikar rbikar force-pushed the disallowe-rpm-dupes branch from a0e03c2 to 80a79f2 Compare June 11, 2026 13:12
@rbikar

rbikar commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Implementation update - now the error is raised only in case when RPM would create a duplicate in destination repository.

This allows to upload of RPMs with same NVR and different gpg signing keys as long as RPMs target different repositories. Although I think this may also lead to problems with re-signed RPMs with the same path to origin even when targeting different repos. We would probably need to check cdn_path value or the gpg signing key.

Comment thread src/pubtools/_pulp/tasks/push/items/rpm.py Outdated
Comment thread src/pubtools/_pulp/tasks/push/items/rpm.py
This change introduces a functionality that will raise a fatal error
when upload of an RPM with identical `cdn_path` of already present unit in pulp
is attempted.

The RPMs in question differ only with their checksums (and signing keys),
further association to live repositories and publish will cause breakage
of repository.

Duplicates check is by default disabled, and can be enabled by setting
PUBTOOLS_PULP_ALLOW_DUPLICATE_UNITS env var.

This should be a workaround until proper suport for such scenario is
implemented.
@rbikar rbikar force-pushed the disallowe-rpm-dupes branch from 80a79f2 to ae7c8c2 Compare June 12, 2026 11:57
@rbikar rbikar force-pushed the disallowe-rpm-dupes branch from bf0eb5c to 8c702ab Compare June 12, 2026 12:02
@rbikar

rbikar commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

After consultation with @MichalHaluza this should the correct solution:

  • do not allow upload or association of RPM unit that has identical cdn_path (origin) as any unit in pulp.

@rbikar rbikar merged commit b842eeb into release-engineering:master Jun 16, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants