Ufm cert auto refresh #1106

Draft
hasayesh wants to merge 2 commits into NVIDIA:main from hasayesh:ufm-cert-auto-refresh
Conversation

@hasayesh
Contributor

Description

Add UFM cert auto-refresh endpoints.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hasayesh hasayesh force-pushed the ufm-cert-auto-refresh branch from 53152c2 to 420864e on April 23, 2026 19:42
    ## Description
    The credential reader in forge_vault.rs had a guard that converted an
    empty password to None (treated as missing credentials). This prevented
    the mTLS authentication path in ib/rest.rs from ever being reached for
    UFM credentials, since the empty password — the designed trigger for
    mTLS — was discarded before the caller could act on it.

    ## Type of Change
    <!-- Check one that best describes this PR -->
    - [ ] **Add** - New feature or capability
    - [ ] **Change** - Changes in existing functionality
    - [x] **Fix** - Bug fixes
    - [ ] **Remove** - Removed features or deprecated functionality
    - [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

    ## Related Issues (Optional)
    <!-- If applicable, provide GitHub Issue. -->

    ## Breaking Changes
    - [ ] This PR contains breaking changes

    <!-- If checked above, describe the breaking changes and migration steps
    -->

    ## Testing
    <!-- How was this tested? Check all that apply -->
    - [ ] Unit tests added/updated
    - [ ] Integration tests added/updated
    - [ ] Manual testing performed
    - [ ] No testing required (docs, internal refactor, etc.)

    ## Additional Notes
    <!-- Any additional context, deployment notes, or reviewer guidance -->

    Signed-off-by: hasayesh <hasayesh@nvidia.com>

Signed-off-by: Hamid Asayesh <hasayesh@nvidia.com>
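
The guard described in the commit message above can be illustrated with a small sketch. This is not the code from forge_vault.rs or ib/rest.rs: the `Credentials` and `AuthMode` types and all three function names are hypothetical, chosen only to show why discarding an empty password keeps the mTLS path from being reached.

```rust
/// Hypothetical credential shape; the real type in forge_vault.rs may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Credentials {
    username: String,
    password: String,
}

/// Hypothetical choice the caller makes once it has credentials.
#[derive(Debug, PartialEq, Eq)]
enum AuthMode {
    Mtls,
    Password,
}

/// Behaviour before the fix, as described: an empty password is
/// converted to None, i.e. treated as missing credentials.
fn read_with_guard(stored: Option<Credentials>) -> Option<Credentials> {
    stored.filter(|c| !c.password.is_empty())
}

/// Behaviour after the fix: the stored value is passed through
/// unchanged, so the caller can see the empty password.
fn read_without_guard(stored: Option<Credentials>) -> Option<Credentials> {
    stored
}

/// An empty password is the designed trigger for mTLS.
fn select_auth(creds: &Credentials) -> AuthMode {
    if creds.password.is_empty() {
        AuthMode::Mtls
    } else {
        AuthMode::Password
    }
}
```

With the guard in place the reader returns `None` for UFM credentials that have an empty password, so `select_auth` is never called; without it, the same input selects `AuthMode::Mtls`.
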
Add two new HTTP endpoints for UFM's cert_auto_refresh feature:
  GET /api/v1/ufm/{fabric}/certs/ca     - serves CA/intermediate cert
  GET /api/v1/ufm/{fabric}/certs/server - serves server cert + private key

Both endpoints call the existing Vault PKI certificate provider and
return PEM data. Mounted outside /admin so they bypass web UI OAuth2
but are still protected by the mTLS layer (CertDescriptionMiddleware).

UFM 6.19+ can be configured to periodically pull from these endpoints,
eliminating the need for manual 30-day certificate rotation.
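
As a rough illustration of the two routes listed above, the sketch below matches a request path against the `/api/v1/ufm/{fabric}/certs/ca` and `/api/v1/ufm/{fabric}/certs/server` patterns using only the standard library. The `CertKind` enum and `match_ufm_cert_route` function are hypothetical names introduced here; the PR itself presumably registers these routes with the project's HTTP framework, and nothing below reflects its actual handler code, the Vault PKI call, or the mTLS layer.

```rust
/// Which certificate material a request asks for (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum CertKind {
    /// CA/intermediate certificate.
    Ca,
    /// Server certificate plus private key.
    Server,
}

/// Match a path of the form `/api/v1/ufm/{fabric}/certs/{ca|server}`.
/// Returns the fabric name and the requested kind, or None when the
/// path does not match either route.
fn match_ufm_cert_route(path: &str) -> Option<(&str, CertKind)> {
    let rest = path.strip_prefix("/api/v1/ufm/")?;
    let mut parts = rest.split('/');

    // The fabric segment must be present and non-empty.
    let fabric = parts.next().filter(|f| !f.is_empty())?;

    if parts.next()? != "certs" {
        return None;
    }

    let kind = match parts.next()? {
        "ca" => CertKind::Ca,
        "server" => CertKind::Server,
        _ => return None,
    };

    // Reject trailing segments such as `/certs/ca/extra`.
    if parts.next().is_some() {
        return None;
    }

    Some((fabric, kind))
}
```

For example, `match_ufm_cert_route("/api/v1/ufm/default/certs/server")` yields the fabric `default` with `CertKind::Server`, while a path under `/admin` does not match at all.
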

- [x] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

- [ ] This PR contains breaking changes

- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [x] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

Requires infrastructure changes (DNS hostAliases + site config endpoint
hostname) per site before mTLS path can be activated.

Signed-off-by: hasayesh <hasayesh@nvidia.com>
Signed-off-by: Hamid Asayesh <hasayesh@nvidia.com>
@hasayesh hasayesh force-pushed the ufm-cert-auto-refresh branch from 420864e to a3005d0 on April 23, 2026 19:48
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ajf
Collaborator

ajf commented Apr 23, 2026

I am not sure why copy-pr-bot isn't auto-running tests. The commit is signed and @hasayesh is in NVIDIA org. @lachen-nv do you know?

@ajf
Collaborator

ajf commented Apr 23, 2026

I guess it's because it's a draft, but the copy-pr-bot comment usually says it's because it's a draft.

@lachen-nv
Contributor

Yes, because it is a draft PR, we need to manually apply ok to test. Once it is marked as ready, it will trigger automatically.

@lachen-nv
Contributor

/ok to test a3005d0

@lachen-nv
Contributor

We could also configure it to trigger on draft PRs. I think that would be fine if we want earlier signal, but it may also waste CI resources on work that is still in progress. My preference would be to keep draft PRs manual unless we are seeing a real need for earlier automatic validation.

@ajf
Collaborator

ajf commented Apr 27, 2026

@lachen-nv yeah I think the current setup of not running on draft PRs is fine. My initial comment was that the first comment by copy-pr-bot said the PR requires additional verification, not that it was a draft.

@ajf
Collaborator

ajf commented Apr 29, 2026

@hasayesh can you please update the description with more info, and add an accompanying document to the documentation on what this is for and how to use it? cc @Coco-Ben

3 participants