Skip to content

feat: Support fw updates to unregistered devices in NSM and PSM#442

Merged
spydaNVIDIA merged 5 commits into
mainfrom
feat/pyda-fw
May 4, 2026
Merged

feat: Support fw updates to unregistered devices in NSM and PSM#442
spydaNVIDIA merged 5 commits into
mainfrom
feat/pyda-fw

Conversation

@spydaNVIDIA
Copy link
Copy Markdown
Contributor

Description

feat: Support fw updates to unregistered devices in NSM and PSM

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@spydaNVIDIA spydaNVIDIA requested a review from kunzhao-nv April 29, 2026 23:17
@spydaNVIDIA spydaNVIDIA requested a review from a team as a code owner April 29, 2026 23:17
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Consolidates credential types to common/pkg/credential (adds nil-safe Credential.Equal); removes local credential/secretstring implementations; adds firmware-target support with per-target validation, auto-registration, and queuing in both managers; implements idempotent credential put/overwrite semantics and updates tests and imports accordingly.

Changes

Cohort / File(s) Summary
Shared Credential Package
common/pkg/credential/credential.go, common/pkg/credential/credential_test.go
Adds Credential.Equal(*Credential) bool with explicit nil handling; expands unit tests for equality, IsValid, Retrieve, and env-based constructors.
Deleted Local Implementations
nvswitch-manager/pkg/common/credential/credential.go, nvswitch-manager/pkg/common/secretstring/secretstring.go, nvswitch-manager/pkg/common/secretstring/secretstring_test.go
Removes local Credential and SecretString implementations and their tests; functionality consolidated to shared package.
Import Consolidation & Small Adjustments
nvswitch-manager/cmd/..., nvswitch-manager/internal/service/config.go, nvswitch-manager/pkg/credentials/credential_manager.go, nvswitch-manager/pkg/db/config.go, nvswitch-manager/pkg/objects/..., powershelf-manager/...
Repoints imports from local pkg/common/* to common/pkg/*; adjusts pointer/value usage where Credential is created, assigned, or passed.
Proto: Firmware Targets
nvswitch-manager/internal/proto/v1/nvswitch-manager.proto, powershelf-manager/internal/proto/v1/powershelf-manager.proto
Adds FirmwareTarget and related request/response fields; introduces targets repeated field; documents mutual-exclusivity and per-item best-effort semantics; adds response IP fields.
Service: Targets & Validation (nvswitch)
nvswitch-manager/internal/service/server_impl.go, ..._test.go
Enforces exactly-one-of switch_uuids/targets; adds target validation, conversion helpers, auto-registration path, queueFirmwareUpdate refactor, and per-target result population. Extensive tests added for validation, mapping, and contract semantics.
Service: Targets & Validation (powershelf)
powershelf-manager/internal/service/server_impl.go, ..._test.go
Adds target validation, per-target registration and conversion, refactors component upgrade flow, and adds tests for validation, mutual-exclusivity, and nil-manager guards.
Credential Storage — InMemory
nvswitch-manager/pkg/credentials/inmemory.go, powershelf-manager/pkg/credentials/inmemory.go, .../inmemory_test.go
Put/Patch now reject nil cred; Put reads existing entry and no-ops when Equal (info) or overwrites when different (warn). Tests updated/added for idempotency and overwrite semantics.
Credential Storage — Vault
nvswitch-manager/pkg/credentials/vault.go, powershelf-manager/pkg/credentials/vault.go, .../vault_test.go
Introduces errCredentialNotFound sentinel; Put becomes read-before-write (skip on Equal, overwrite on differ/unreadable); Patch now unconditionally writes. Tests validate upsert semantics.
Service Init Changes (nvswitch)
nvswitch-manager/internal/service/service.go
DB initialization now occurs only in persistent mode; persistent mode requires DB host and fails fast on connection/migration errors.
Widespread Tests & Adjustments
nvswitch-manager/..._test.go, powershelf-manager/..._test.go, nvswitch-manager/pkg/db/config_test.go, nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
Tests updated to use shared credential package, adjusted pointer semantics, and expanded to cover firmware-target and credential upsert behavior.
go.mod
go.mod
Promotes github.com/golang/protobuf v1.5.4 from indirect to direct requirement (version unchanged).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Service
    participant Registrar
    participant CredentialMgr
    participant DataStore

    Client->>Service: QueueUpdates(req with FirmwareTarget)
    Service->>Service: validateFirmwareTarget(target)
    alt validation fails
        Service-->>Client: return per-item INVALID_ARGUMENT result
    else validation passes
        Service->>Registrar: registerDevice(bmc/nvos details)
        alt registrar rejects (in-memory: missing creds)
            Registrar-->>Service: error -> populate per-target result
        else registrar succeeds (returns UUID)
            Service->>CredentialMgr: PutBMC(mac, cred)
            CredentialMgr->>DataStore: Get(mac)
            alt not found
                CredentialMgr->>DataStore: Write(mac, newCred)
            else found & Equal
                CredentialMgr-->>Service: no-op (info)
            else found & differs
                CredentialMgr->>DataStore: Overwrite(mac, newCred)
            end
            Registrar->>Service: UUID
            Service->>Service: queueFirmwareUpdate(UUID, bundle)
            Service-->>Client: return QueueUpdateResult(bmc_ip, status)
        end
    end
Loading
sequenceDiagram
    participant Caller
    participant CredentialManager
    participant Backend

    Caller->>CredentialManager: Put(id, newCred)
    CredentialManager->>Backend: Get(id)
    alt not found
        CredentialManager->>Backend: Write(id, newCred)
        CredentialManager-->>Caller: success
    else found
        alt newCred.Equal(existing)
            CredentialManager-->>Caller: no-op (info)
        else credentials differ
            CredentialManager->>Backend: Overwrite(id, newCred)
            CredentialManager-->>Caller: success (warn)
        end
    end

    Caller->>CredentialManager: Patch(id, cred)
    CredentialManager->>Backend: Write(id, cred)   <!-- unconditional -->
    CredentialManager-->>Caller: success
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary feature: adding firmware update support for unregistered devices in NSM and PSM, which is the main objective of this changeset.
Description check ✅ Passed The description explicitly states the feature intent and clarifies the change type as 'Chore', providing sufficient context about what is being added to the codebase.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/pyda-fw

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-04-29 23:18:59 UTC | Commit: 1061e78

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 29, 2026

Test Results

9 022 tests  +54   9 022 ✅ +54   7m 6s ⏱️ -14s
  150 suites  -  1       0 💤 ± 0 
   14 files   ± 0       0 ❌ ± 0 

Results for commit 8b7cd9f. ± Comparison against base commit bce7503.

This pull request removes 4 and adds 58 tests. Note that renamed tests count towards both.
github.com/NVIDIA/ncx-infra-controller-rest/nvswitch-manager/pkg/common/credential ‑ TestCredential
github.com/NVIDIA/ncx-infra-controller-rest/nvswitch-manager/pkg/common/credential ‑ TestNewCredentialFromEnv
github.com/NVIDIA/ncx-infra-controller-rest/nvswitch-manager/pkg/common/secretstring ‑ TestSecretString
github.com/NVIDIA/ncx-infra-controller-rest/nvswitch-manager/pkg/credentials ‑ TestInMemoryBMCPutGet/put_overwrites_existing_value
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredential
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/both_differ
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/both_nil
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/different_password
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/different_user
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/equal_values
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/first_nil
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/identical
github.com/NVIDIA/ncx-infra-controller-rest/common/pkg/credential ‑ TestCredentialEqual/second_nil
…

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (4)
nvswitch-manager/internal/proto/v1/nvswitch-manager.proto (1)

280-286: ⚡ Quick win

Clarify mixed-mode request behavior in the contract.

Given the service processes both switch_uuids and targets in one call, document whether combining them is intentional and how overlap should be handled. This avoids ambiguous client behavior.

As per coding guidelines **/*.proto: review Protobuf definitions for compatibility and expressiveness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/internal/proto/v1/nvswitch-manager.proto` around lines 280 -
286, The proto comment for QueueUpdatesRequest is ambiguous about mixed-mode
requests: callers can supply both switch_uuids and targets but the contract
doesn't say if that's allowed or how to handle overlaps; update the
QueueUpdatesRequest documentation to explicitly state whether combining
switch_uuids and targets is supported, define precedence or deduplication
behavior when the same device appears in both lists, and specify whether targets
will be auto-registered before update and how errors for already-registered or
conflicting entries are reported; reference the message QueueUpdatesRequest and
fields switch_uuids and targets so clients and servers implement consistent
behavior.
powershelf-manager/pkg/credentials/inmemory_test.go (1)

309-327: ⚡ Quick win

Assert the no-op branch explicitly.

This test claims idempotency, but it only checks the final state after the overwrite. It will still pass if the second identical Put replaces the stored pointer instead of being a true no-op.

Suggested strengthening
 func TestInMemoryPutIdempotentAndOverwrite(t *testing.T) {
 	ctx := context.Background()
 	mgr := NewInMemoryCredentialManager()
 	mac := parseMAC(t, "00:11:22:33:44:55")
+	first := newCred("admin", "secret")

 	// First Put succeeds
-	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))
+	assert.NoError(t, mgr.Put(ctx, mac, first))

 	// Idempotent Put with same credentials is a no-op
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))
+	got, err := mgr.Get(ctx, mac)
+	assert.NoError(t, err)
+	assert.Same(t, first, got)

 	// Put with different credentials overwrites (no error for in-memory)
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "different")))

 	// Credentials are now the new values
-	got, err := mgr.Get(ctx, mac)
+	got, err = mgr.Get(ctx, mac)
 	assert.NoError(t, err)
 	assert.Equal(t, "admin", got.User)
 	assert.Equal(t, "different", got.Password.Value)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/inmemory_test.go` around lines 309 - 327,
The test TestInMemoryPutIdempotentAndOverwrite must explicitly verify the
idempotent no-op branch: after the first mgr.Put(ctx, mac, newCred("admin",
"secret")) call, call mgr.Get(ctx, mac) to capture the stored credential (e.g.,
beforeCred), then perform the second identical mgr.Put(...) and call mgr.Get
again (afterCred); assert that afterCred represents the same stored object
(verify pointer identity or otherwise assert no new allocation/change) and that
values remain equal, before proceeding to the overwrite case—use the existing
functions/methods mgr.Put, mgr.Get, parseMAC, and newCred to locate and
implement this check.
nvswitch-manager/pkg/credentials/vault_test.go (1)

370-372: ⚡ Quick win

Assert the conflict contract, not just “some error”.

At Line 371, this will also pass on unrelated Vault failures. Please assert the conflict reason as well, so the test actually pins the new idempotent-vs-conflict behavior.

Suggested tightening
-	assert.Error(t, mgr.PutBMC(ctx, mac, newCredential("admin", "different")))
-	assert.Error(t, mgr.PutNVOS(ctx, mac, newCredential("nvos", "different")))
+	err := mgr.PutBMC(ctx, mac, newCredential("admin", "different"))
+	assert.Error(t, err)
+	assert.Contains(t, err.Error(), "differ from provided")
+
+	err = mgr.PutNVOS(ctx, mac, newCredential("nvos", "different"))
+	assert.Error(t, err)
+	assert.Contains(t, err.Error(), "differ from provided")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/credentials/vault_test.go` around lines 370 - 372,
Replace the loose assert.Error checks on mgr.PutBMC and mgr.PutNVOS with
assertions that the returned error specifically indicates a conflict (not any
Vault failure): call mgr.PutBMC(ctx, mac, newCredential(...)) and capture the
error, then assert errors.Is(err, ErrConflict) (or compare against the package's
ErrConflict/Error string) or check for Vault 409 conflict semantics; do the same
for mgr.PutNVOS so the test enforces the idempotent-vs-conflict contract for
PutBMC/PutNVOS rather than accepting any error.
nvswitch-manager/pkg/credentials/inmemory_test.go (1)

112-119: ⚡ Quick win

Make the second PutBMC data-driven.

Branching on name == "put same credential is no-op" means a test rename silently stops exercising the idempotent path. Add an explicit boolean to the table instead so the scenario stays declarative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/credentials/inmemory_test.go` around lines 112 - 119,
The test currently branches on the test name to exercise the idempotent PutBMC
path; add a boolean field (e.g., putSameCredential bool) to the table-driven
test case struct, set it true for the "put same credential is no-op" case, and
replace the name comparison with if tc.putSameCredential { assert.NoError(t,
mgr.PutBMC(ctx, mac, newCredential("user1","p1"))) } so the idempotent scenario
is driven by data, not the test name; update the case entries accordingly and
keep the initialPut logic using parseMAC, PutBMC and newCredential as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nvswitch-manager/internal/service/server_impl.go`:
- Around line 456-463: Iterate safely over req.Targets by first checking if each
target == nil before reading target.BmcIp or calling validateFirmwareTarget; for
any nil entry construct the pb.QueueUpdateResult (e.g., result :=
&pb.QueueUpdateResult{}) setting result.Status = pb.StatusCode_INVALID_ARGUMENT
and result.Error to a descriptive message like "nil firmware target", and
continue to the next item; similarly update validateFirmwareTarget to return an
error when passed a nil *pb.FirmwareTarget (or ensure callers guard nil) so you
never dereference target inside validateFirmwareTarget.

In `@nvswitch-manager/internal/service/service.go`:
- Around line 53-64: The persistent-mode initialization currently skips DB setup
when c.DBConf.Host == "" which leaves db nil; remove the extra guard so
persistent mode always attempts DB setup: change the conditional to only check
c.DataStoreType == nvswitchmanager.DatastoreTypePersistent, then call
postgres.New(ctx, c.DBConf) (or add an explicit validation that c.DBConf.Host is
required and return an error) and run migrations.Migrate(ctx, pg) as before;
reference postgres.New, migrations.Migrate, c.DataStoreType, and c.DBConf.Host
to locate and fix the logic so persistent mode fails fast instead of bypassing
DB initialization.

In `@nvswitch-manager/pkg/credentials/inmemory.go`:
- Around line 86-100: PutBMC and PutNVOS currently allow nil credentials to be
written into m.store, which later causes GetBMC/GetNVOS to panic when they call
cred.IsValid(); modify both InMemoryCredentialManager.PutBMC and PutNVOS to
validate the input immediately and return a non-nil error if cred is nil (do not
write to m.store), keeping the existing locking and overwrite logic otherwise;
update error messages to clearly indicate a nil credential was provided and
reference the key (use m.bmcKey or m.nvosKey) in the log/returned error to aid
debugging.

In `@nvswitch-manager/pkg/credentials/vault.go`:
- Around line 232-243: VaultCredentialManager.PutBMC/PutNVOS reject
existing-but-different credentials while
InMemoryCredentialManager.PutBMC/PutNVOS still overwrite, causing behavioral
mismatch in tests; make semantics consistent by changing the in-memory
implementations to mirror Vault: in InMemoryCredentialManager.PutBMC and
InMemoryCredentialManager.PutNVOS first call get for the key, if an existing
credential is present and not Equal(cred) return an error (same message/shape as
Vault’s fmt.Errorf hinting to use PatchBMC/PatchNVOS), otherwise no-op when
equal or call put when absent; update in-memory tests to expect the rejection
instead of overwrite.

In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go`:
- Around line 149-151: The test is storing the NVOS credential under the BMC MAC
(using tray.BMC.MAC) so nm.Get for the NVOS subsystem never finds it; change the
PutNVOS call in the setup branch (where tc.setupNVOSCred is true) to write the
credential under the NVOS MAC field (use the tray.NVOS.MAC field or the
appropriate tray.<NVOS>.MAC accessor) when calling nm.CredentialManager.PutNVOS
instead of tray.BMC.MAC so the NVOS lookup key matches what nm.Get expects.

In `@powershelf-manager/internal/service/server_impl.go`:
- Around line 220-236: Validation and registration error handling currently
appends a single UpdateComponentFirmwareResponse into resp.Components which
loses per-component results when a target has multiple components; modify the
branches that call validateFirmwareTarget(target) and s.registerPowershelf(...)
so that for each requested component in the target you append an individual
*pb.UpdateComponentFirmwareResponse with the same Status/Error (e.g., loop over
target.Components or the request's component list) to resp.Components, then
append resp to responses and continue; update code paths around
validateFirmwareTarget, firmwareTargetToRegisterRequest, and registerPowershelf
to populate per-component entries rather than a single element.
- Around line 249-260: In upgradeComponents, every per-component response must
include the original Component so callers can correlate results; when the
UpgradeTo nil validation triggers, set UpdateComponentFirmwareResponse.Component
= component.Component, and when appending the result of s.updateFirmware(ctx,
pmcMac, component.Component, component.UpgradeTo.Version) ensure the returned
*pb.UpdateComponentFirmwareResponse has its Component field populated (e.g.,
check the response from updateFirmware and, if resp.Component is empty, set
resp.Component = component.Component) before appending; this touches
upgradeComponents and the handling of responses from updateFirmware.

In `@powershelf-manager/pkg/credentials/inmemory.go`:
- Around line 73-87: In InMemoryCredentialManager.Put, reject nil credential
pointers immediately instead of persisting them: add a nil-check at the start of
Put(ctx, mac, cred) and return a non-nil error (and optional log) if cred == nil
so m.store[key] is never set to nil; keep the existing equality/overwrite logic
(existing.Equal(cred), log.Warnf) unchanged and ensure callers get an error
rather than causing later panics in Get().

In `@powershelf-manager/pkg/credentials/vault.go`:
- Around line 174-190: VaultCredentialManager.Put currently treats any error
from m.Get as "missing" and proceeds to overwrite; change Put so it only falls
through to m.write when m.Get returns a typed "not found" sentinel and returns
every other error to the caller. In practice, update VaultCredentialManager.Put
to call existing, err := m.Get(...); if err != nil { if !errors.Is(err,
credential.ErrNotFound) { return err } /* proceed to write because not found */
} else if existing != nil { ... } — use errors.Is to compare against the
package's not-found sentinel (e.g., credential.ErrNotFound or the vault SDK's
not-found error) and import "errors"; only call m.write when the error is that
sentinel.

---

Nitpick comments:
In `@nvswitch-manager/internal/proto/v1/nvswitch-manager.proto`:
- Around line 280-286: The proto comment for QueueUpdatesRequest is ambiguous
about mixed-mode requests: callers can supply both switch_uuids and targets but
the contract doesn't say if that's allowed or how to handle overlaps; update the
QueueUpdatesRequest documentation to explicitly state whether combining
switch_uuids and targets is supported, define precedence or deduplication
behavior when the same device appears in both lists, and specify whether targets
will be auto-registered before update and how errors for already-registered or
conflicting entries are reported; reference the message QueueUpdatesRequest and
fields switch_uuids and targets so clients and servers implement consistent
behavior.

In `@nvswitch-manager/pkg/credentials/inmemory_test.go`:
- Around line 112-119: The test currently branches on the test name to exercise
the idempotent PutBMC path; add a boolean field (e.g., putSameCredential bool)
to the table-driven test case struct, set it true for the "put same credential
is no-op" case, and replace the name comparison with if tc.putSameCredential {
assert.NoError(t, mgr.PutBMC(ctx, mac, newCredential("user1","p1"))) } so the
idempotent scenario is driven by data, not the test name; update the case
entries accordingly and keep the initialPut logic using parseMAC, PutBMC and
newCredential as-is.

In `@nvswitch-manager/pkg/credentials/vault_test.go`:
- Around line 370-372: Replace the loose assert.Error checks on mgr.PutBMC and
mgr.PutNVOS with assertions that the returned error specifically indicates a
conflict (not any Vault failure): call mgr.PutBMC(ctx, mac, newCredential(...))
and capture the error, then assert errors.Is(err, ErrConflict) (or compare
against the package's ErrConflict/Error string) or check for Vault 409 conflict
semantics; do the same for mgr.PutNVOS so the test enforces the
idempotent-vs-conflict contract for PutBMC/PutNVOS rather than accepting any
error.

In `@powershelf-manager/pkg/credentials/inmemory_test.go`:
- Around line 309-327: The test TestInMemoryPutIdempotentAndOverwrite must
explicitly verify the idempotent no-op branch: after the first mgr.Put(ctx, mac,
newCred("admin", "secret")) call, call mgr.Get(ctx, mac) to capture the stored
credential (e.g., beforeCred), then perform the second identical mgr.Put(...)
and call mgr.Get again (afterCred); assert that afterCred represents the same
stored object (verify pointer identity or otherwise assert no new
allocation/change) and that values remain equal, before proceeding to the
overwrite case—use the existing functions/methods mgr.Put, mgr.Get, parseMAC,
and newCred to locate and implement this check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e7cc0f7-df38-41a1-a33a-c14981231b38

📥 Commits

Reviewing files that changed from the base of the PR and between 33ddf04 and 1061e78.

⛔ Files ignored due to path filters (2)
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
📒 Files selected for processing (31)
  • common/pkg/credential/credential.go
  • common/pkg/credential/credential_test.go
  • nvswitch-manager/cmd/redfish.go
  • nvswitch-manager/cmd/serve.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/service.go
  • nvswitch-manager/pkg/common/credential/credential.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/pkg/credentials/inmemory_test.go
  • nvswitch-manager/pkg/credentials/vault.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/pkg/db/config_test.go
  • nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go
  • powershelf-manager/internal/service/server_impl_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/pkg/credentials/vault_test.go
💤 Files with no reviewable changes (3)
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/credential/credential.go

Comment thread nvswitch-manager/internal/service/server_impl.go
Comment thread nvswitch-manager/internal/service/service.go Outdated
Comment thread nvswitch-manager/pkg/credentials/inmemory.go
Comment thread nvswitch-manager/pkg/credentials/vault.go Outdated
Comment on lines 149 to +151
if tc.setupNVOSCred {
require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, credential.New("nvos_admin", "nvos_pass")))
c := credential.New("nvos_admin", "nvos_pass")
require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, &c))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Store the NVOS credential under the NVOS MAC.

Line 151 writes the NVOS credential with tray.BMC.MAC, so the success case never populates the key that nm.Get() needs for the NVOS subsystem.

Suggested fix
 			if tc.setupNVOSCred {
 				c := credential.New("nvos_admin", "nvos_pass")
-				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, &c))
+				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.NVOS.MAC, &c))
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if tc.setupNVOSCred {
require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, credential.New("nvos_admin", "nvos_pass")))
c := credential.New("nvos_admin", "nvos_pass")
require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, &c))
if tc.setupNVOSCred {
c := credential.New("nvos_admin", "nvos_pass")
require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.NVOS.MAC, &c))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go` around lines
149 - 151, The test is storing the NVOS credential under the BMC MAC (using
tray.BMC.MAC) so nm.Get for the NVOS subsystem never finds it; change the
PutNVOS call in the setup branch (where tc.setupNVOSCred is true) to write the
credential under the NVOS MAC field (use the tray.NVOS.MAC field or the
appropriate tray.<NVOS>.MAC accessor) when calling nm.CredentialManager.PutNVOS
instead of tray.BMC.MAC so the NVOS lookup key matches what nm.Get expects.

Comment thread powershelf-manager/internal/service/server_impl.go
Comment thread powershelf-manager/internal/service/server_impl.go
Comment thread powershelf-manager/pkg/credentials/inmemory.go
Comment thread powershelf-manager/pkg/credentials/vault.go Outdated
Comment thread nvswitch-manager/internal/service/server_impl.go
Comment thread nvswitch-manager/pkg/credentials/inmemory.go
Comment thread powershelf-manager/internal/service/server_impl.go Outdated
Comment thread powershelf-manager/internal/service/server_impl.go
Comment thread nvswitch-manager/internal/service/service.go Outdated
Comment thread nvswitch-manager/pkg/credentials/vault.go
Comment thread nvswitch-manager/internal/service/server_impl_test.go Outdated
Comment thread nvswitch-manager/pkg/credentials/vault_test.go Outdated
Comment thread nvswitch-manager/pkg/credentials/inmemory_test.go Outdated
Comment thread nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go (1)

149-151: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

NVOS credential stored under incorrect MAC key.

Line 151 stores the NVOS credential using tray.BMC.MAC, but nm.Get retrieves NVOS credentials via tray.NVOS.MAC. Since newTestTray assigns distinct MACs to BMC (AA:BB:CC:DD:EE:FF) and NVOS (11:22:33:44:55:66), this test cannot pass as written.

Suggested fix
 			if tc.setupNVOSCred {
 				c := credential.New("nvos_admin", "nvos_pass")
-				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, &c))
+				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.NVOS.MAC, &c))
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go` around lines
149 - 151, The test stores the NVOS credential under the wrong MAC key (using
tray.BMC.MAC) which doesn't match where nvOS credentials are looked up
(tray.NVOS.MAC); update the credential storage call so PutNVOS uses
tray.NVOS.MAC instead of tray.BMC.MAC (change the call site that invokes
nm.CredentialManager.PutNVOS) so the key matches the lookup path used by
nm.Get/newTestTray.
powershelf-manager/pkg/credentials/vault.go (1)

190-203: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only write on errCredentialNotFound; return other Get errors.

Line 200 still logs and overwrites on parse/transient Vault read failures. That can clobber an existing secret on the normal registration path, which still calls Put directly, and it makes Patch no safer than Put for error recovery.

Suggested fix
 	existing, err := m.Get(ctx, mac)
 	switch {
 	case err == nil && existing.Equal(cred):
 		log.Infof("PMC credentials for %s already exist and match; skipping write", mac)
 		return nil
 	case err == nil:
 		log.Warnf("PMC credentials for %s differ from existing; overwriting vault entry", mac)
 	case errors.Is(err, errCredentialNotFound):
 		// No existing secret; fall through to write.
 	default:
-		log.Warnf("PMC credentials for %s could not be read (%v); overwriting vault entry", mac, err)
+		return fmt.Errorf("checking existing PMC credentials for %s: %w", mac, err)
 	}
 
 	return m.write(ctx, mac, cred)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/vault.go` around lines 190 - 203, The
current logic overwrites the Vault entry on any non-nil Get error; change it so
only when errors.Is(err, errCredentialNotFound) we proceed to write, otherwise
return the error. Specifically, in the block around m.Get(ctx, mac) and the
existing.Equal(cred) checks, keep the early return when err == nil and contents
match, keep the overwrite path when err == nil and contents differ, but for any
other non-nil err (i.e., not errCredentialNotFound) return that error instead of
logging and calling m.write; only call m.write when errors.Is(err,
errCredentialNotFound).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nvswitch-manager/internal/proto/v1/nvswitch-manager.proto`:
- Around line 315-319: Update the comment for the field switch_uuid in the
response message (the field named "switch_uuid") to explicitly state that this
UUID is only populated for registered/UUID-based targets and may be empty for
target-based failures that occur before the switch is registered (e.g., failures
reported under QueueUpdatesRequest's best-effort contract or when returning
FirmwareUpdateInfo for a FirmwareTarget). Also add a short note advising clients
to rely on the status field (StatusCode status) and error message (string error)
rather than assuming switch_uuid is always present.
- Line 187: The proto comment for the field "Vendor vendor = 7" incorrectly
claims a default of VENDOR_NVIDIA; update the comment to state that under proto3
unset enum fields decode to the zero value (VENDOR_UNKNOWN) and there is no
field-level default. Additionally, if the API requires VENDOR_NVIDIA as the
effective default, add enforcement in the service validation routine (the
message validation function) or in the conversion function that maps
proto→internal types to either reject VENDOR_UNKNOWN with a validation error or
explicitly remap VENDOR_UNKNOWN to VENDOR_NVIDIA so the behavior matches the
documented contract.

---

Duplicate comments:
In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go`:
- Around line 149-151: The test stores the NVOS credential under the wrong MAC
key (using tray.BMC.MAC) which doesn't match where nvOS credentials are looked
up (tray.NVOS.MAC); update the credential storage call so PutNVOS uses
tray.NVOS.MAC instead of tray.BMC.MAC (change the call site that invokes
nm.CredentialManager.PutNVOS) so the key matches the lookup path used by
nm.Get/newTestTray.

In `@powershelf-manager/pkg/credentials/vault.go`:
- Around line 190-203: The current logic overwrites the Vault entry on any
non-nil Get error; change it so only when errors.Is(err, errCredentialNotFound)
we proceed to write, otherwise return the error. Specifically, in the block
around m.Get(ctx, mac) and the existing.Equal(cred) checks, keep the early
return when err == nil and contents match, keep the overwrite path when err ==
nil and contents differ, but for any other non-nil err (i.e., not
errCredentialNotFound) return that error instead of logging and calling m.write;
only call m.write when errors.Is(err, errCredentialNotFound).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d955170a-bb71-4b7c-a1f0-ac7839f0790e

📥 Commits

Reviewing files that changed from the base of the PR and between 1061e78 and 542317c.

⛔ Files ignored due to path filters (4)
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
📒 Files selected for processing (32)
  • common/pkg/credential/credential.go
  • common/pkg/credential/credential_test.go
  • go.mod
  • nvswitch-manager/cmd/redfish.go
  • nvswitch-manager/cmd/serve.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/service.go
  • nvswitch-manager/pkg/common/credential/credential.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/pkg/credentials/inmemory_test.go
  • nvswitch-manager/pkg/credentials/vault.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/pkg/db/config_test.go
  • nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go
  • powershelf-manager/internal/service/server_impl_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/pkg/credentials/vault_test.go
💤 Files with no reviewable changes (3)
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/credential/credential.go
✅ Files skipped from review due to trivial changes (10)
  • nvswitch-manager/cmd/redfish.go
  • common/pkg/credential/credential.go
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/pkg/db/config.go
  • go.mod
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • nvswitch-manager/cmd/serve.go
  • nvswitch-manager/internal/service/service.go
  • common/pkg/credential/credential_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/server_impl.go

string nvos_ip = 4; // NVOS SSH IP (can be same as bmc_ip)
Credentials nvos_credentials = 5;
int32 nvos_port = 6; // 0 = default (22)
Vendor vendor = 7; // Default: VENDOR_NVIDIA
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether the service explicitly remaps FirmwareTarget vendor=VENDOR_UNKNOWN to VENDOR_NVIDIA.
rg -nP --type=go -C3 '\bfirmwareTargetToRegisterRequest\b|\btarget\.Vendor\b|\bVENDOR_UNKNOWN\b|\bVENDOR_NVIDIA\b'

Repository: NVIDIA/infra-controller-rest

Length of output: 7498


🏁 Script executed:

# Check if there's validation or initialization of Vendor before firmwareTargetToRegisterRequest is called
rg -nP --type=go -B5 'firmwareTargetToRegisterRequest\(target\)' nvswitch-manager/internal/service/server_impl.go | head -40

Repository: NVIDIA/infra-controller-rest

Length of output: 258


🏁 Script executed:

# Look for where FirmwareTarget.Vendor might be validated or defaulted
rg -nP --type=go 'target.*Vendor|FirmwareTarget.*validate|vendor.*==' nvswitch-manager/internal/service/

Repository: NVIDIA/infra-controller-rest

Length of output: 280


🏁 Script executed:

# Check the proto file directly to see the full field definition context
sed -n '175,195p' nvswitch-manager/internal/proto/v1/nvswitch-manager.proto

Repository: NVIDIA/infra-controller-rest

Length of output: 998


🏁 Script executed:

# Check the validateFirmwareTarget function to see if it validates/defaults the Vendor field
sed -n '542,570p' nvswitch-manager/internal/service/server_impl.go

Repository: NVIDIA/infra-controller-rest

Length of output: 1272


🏁 Script executed:

# Verify proto3 enum default semantics by checking the proto spec and how the generated code handles unset enums
cat nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go | grep -A 10 "type FirmwareTarget struct"

Repository: NVIDIA/infra-controller-rest

Length of output: 1516


Proto comment misrepresents enum default semantics.

The comment at line 187, // Default: VENDOR_NVIDIA, is inaccurate under proto3 semantics. Proto3 provides no mechanism for field-level defaults; unset enum fields unconditionally deserialize to the zero value (VENDOR_UNKNOWN). The service validation function does not enforce a default, nor does the conversion function remap VENDOR_UNKNOWN to VENDOR_NVIDIA. This creates an API contract mismatch: clients omitting the vendor field will receive VENDOR_UNKNOWN in the deserialized message, contradicting the documented default.

Correct the comment to reflect proto3 behavior:

-    Vendor vendor = 7;           // Default: VENDOR_NVIDIA
+    Vendor vendor = 7;           // Proto3 default: VENDOR_UNKNOWN (0); set explicitly to override.

If VENDOR_NVIDIA is required, add validation to reject VENDOR_UNKNOWN.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Vendor vendor = 7; // Default: VENDOR_NVIDIA
Vendor vendor = 7; // Proto3 default: VENDOR_UNKNOWN (0); set explicitly to override.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/internal/proto/v1/nvswitch-manager.proto` at line 187, The
proto comment for the field "Vendor vendor = 7" incorrectly claims a default of
VENDOR_NVIDIA; update the comment to state that under proto3 unset enum fields
decode to the zero value (VENDOR_UNKNOWN) and there is no field-level default.
Additionally, if the API requires VENDOR_NVIDIA as the effective default, add
enforcement in the service validation routine (the message validation function)
or in the conversion function that maps proto→internal types to either reject
VENDOR_UNKNOWN with a validation error or explicitly remap VENDOR_UNKNOWN to
VENDOR_NVIDIA so the behavior matches the documented contract.

Comment on lines +315 to +319
string switch_uuid = 1; // UUID of the switch
StatusCode status = 2; // SUCCESS or error; per-item outcome under QueueUpdatesRequest's best-effort contract
string error = 3; // Error message if status != SUCCESS
repeated FirmwareUpdateInfo updates = 4; // Queued updates if successful
string bmc_ip = 5; // Set for FirmwareTarget responses; empty for UUID-based
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify switch_uuid availability for target-based failures.

At Line 315, switch_uuid is described unconditionally as “UUID of the switch,” but for targets that fail before registration, it remains empty. Tightening this comment will prevent client-side parsing assumptions.

Suggested comment clarification
-    string switch_uuid = 1;                  // UUID of the switch
+    string switch_uuid = 1;                  // UUID of the switch; empty when a target fails before auto-registration

As per coding guidelines, "**/*.proto: Review the Protobuf definitions, point out issues relative to compatibility, and expressiveness."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
string switch_uuid = 1; // UUID of the switch
StatusCode status = 2; // SUCCESS or error; per-item outcome under QueueUpdatesRequest's best-effort contract
string error = 3; // Error message if status != SUCCESS
repeated FirmwareUpdateInfo updates = 4; // Queued updates if successful
string bmc_ip = 5; // Set for FirmwareTarget responses; empty for UUID-based
string switch_uuid = 1; // UUID of the switch; empty when a target fails before auto-registration
StatusCode status = 2; // SUCCESS or error; per-item outcome under QueueUpdatesRequest's best-effort contract
string error = 3; // Error message if status != SUCCESS
repeated FirmwareUpdateInfo updates = 4; // Queued updates if successful
string bmc_ip = 5; // Set for FirmwareTarget responses; empty for UUID-based
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/internal/proto/v1/nvswitch-manager.proto` around lines 315 -
319, Update the comment for the field switch_uuid in the response message (the
field named "switch_uuid") to explicitly state that this UUID is only populated
for registered/UUID-based targets and may be empty for target-based failures
that occur before the switch is registered (e.g., failures reported under
QueueUpdatesRequest's best-effort contract or when returning FirmwareUpdateInfo
for a FirmwareTarget). Also add a short note advising clients to rely on the
status field (StatusCode status) and error message (string error) rather than
assuming switch_uuid is always present.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go (1)

149-151: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Store the NVOS credential under the NVOS MAC.

Line 151 still seeds the NVOS credential with tray.BMC.MAC, so the success case does not populate the NVOS lookup key it is meant to exercise. Use tray.NVOS.MAC here.

Proposed fix
 			if tc.setupNVOSCred {
 				c := credential.New("nvos_admin", "nvos_pass")
-				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.BMC.MAC, &c))
+				require.NoError(t, nm.CredentialManager.PutNVOS(ctx, tray.NVOS.MAC, &c))
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go` around lines
149 - 151, In the test setup block inside nvswitchmanager_test.go (the
tc.setupNVOSCred branch) you are seeding the NVOS credential under the wrong
key; change the PutNVOS call that currently uses tray.BMC.MAC to use
tray.NVOS.MAC so the NVOS lookup key is populated for the success case (the call
to nm.CredentialManager.PutNVOS with the credential created by credential.New
should store under tray.NVOS.MAC).
powershelf-manager/pkg/credentials/vault.go (1)

190-203: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat unreadable existing secrets as writable.

Put still falls through to m.write for every Get error except errCredentialNotFound. That turns transient Vault failures, ACL regressions, or malformed stored secrets into blind overwrites on the normal registration path. Patch already provides the force-replace path; Put should return the read error here.

Proposed fix
 	existing, err := m.Get(ctx, mac)
 	switch {
 	case err == nil && existing.Equal(cred):
 		log.Infof("PMC credentials for %s already exist and match; skipping write", mac)
 		return nil
 	case err == nil:
 		log.Warnf("PMC credentials for %s differ from existing; overwriting vault entry", mac)
 	case errors.Is(err, errCredentialNotFound):
 		// No existing secret; fall through to write.
 	default:
-		log.Warnf("PMC credentials for %s could not be read (%v); overwriting vault entry", mac, err)
+		return err
 	}
 
 	return m.write(ctx, mac, cred)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/vault.go` around lines 190 - 203, In Put,
don't treat arbitrary Get errors as a signal to overwrite: when m.Get(ctx, mac)
returns an error other than errCredentialNotFound, return that error instead of
calling m.write; only fall through to write for the not-found case
(errors.Is(err, errCredentialNotFound)) or when Get succeeds and values differ
(if Put semantics require overwrite) — keep the existing force-replace behavior
in Patch, and update the control flow around m.Get, errCredentialNotFound,
m.write, and any log messages in Put to return the read error for
transient/ACL/malformed-secret failures.
nvswitch-manager/pkg/credentials/vault.go (1)

247-260: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate Vault read failures instead of overwriting through them.

shouldSkipWrite returns false for any unreadable-existing-secret case, so PutBMC and PutNVOS overwrite after transient Vault failures or corrupted payloads. That makes the normal registration path behave like a forced rotation path. Keep overwrite semantics in Patch* and return the read error here.

Proposed fix
-func (m *VaultCredentialManager) shouldSkipWrite(ctx context.Context, key, kind string, mac net.HardwareAddr, cred *credential.Credential) bool {
+func (m *VaultCredentialManager) shouldSkipWrite(ctx context.Context, key, kind string, mac net.HardwareAddr, cred *credential.Credential) (bool, error) {
 	existing, err := m.get(ctx, key)
 	switch {
 	case err == nil && existing.Equal(cred):
 		log.Infof("%s credentials for %s already exist and match; skipping write", kind, mac)
-		return true
+		return true, nil
 	case err == nil:
 		log.Warnf("%s credentials for %s differ from existing; overwriting vault entry", kind, mac)
+		return false, nil
 	case errors.Is(err, errCredentialNotFound):
-		// No existing secret; fall through to write.
+		return false, nil
 	default:
-		log.Warnf("%s credentials for %s could not be read (%v); overwriting vault entry", kind, mac, err)
+		return false, err
 	}
-	return false
 }
@@
 	key := m.getBMCCredentialKey(mac)
-	if m.shouldSkipWrite(ctx, key, "BMC", mac, cred) {
+	skip, err := m.shouldSkipWrite(ctx, key, "BMC", mac, cred)
+	if err != nil {
+		return err
+	}
+	if skip {
 		return nil
 	}
 	return m.put(ctx, key, cred)
@@
 	key := m.getNVOSCredentialKey(mac)
-	if m.shouldSkipWrite(ctx, key, "NVOS", mac, cred) {
+	skip, err := m.shouldSkipWrite(ctx, key, "NVOS", mac, cred)
+	if err != nil {
+		return err
+	}
+	if skip {
 		return nil
 	}
 	return m.put(ctx, key, cred)
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go`:
- Around line 149-151: In the test setup block inside nvswitchmanager_test.go
(the tc.setupNVOSCred branch) you are seeding the NVOS credential under the
wrong key; change the PutNVOS call that currently uses tray.BMC.MAC to use
tray.NVOS.MAC so the NVOS lookup key is populated for the success case (the call
to nm.CredentialManager.PutNVOS with the credential created by credential.New
should store under tray.NVOS.MAC).

In `@powershelf-manager/pkg/credentials/vault.go`:
- Around line 190-203: In Put, don't treat arbitrary Get errors as a signal to
overwrite: when m.Get(ctx, mac) returns an error other than
errCredentialNotFound, return that error instead of calling m.write; only fall
through to write for the not-found case (errors.Is(err, errCredentialNotFound))
or when Get succeeds and values differ (if Put semantics require overwrite) —
keep the existing force-replace behavior in Patch, and update the control flow
around m.Get, errCredentialNotFound, m.write, and any log messages in Put to
return the read error for transient/ACL/malformed-secret failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61d1d4c0-4dc3-413a-a54e-d8793a506548

📥 Commits

Reviewing files that changed from the base of the PR and between 542317c and 54994bf.

⛔ Files ignored due to path filters (4)
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
📒 Files selected for processing (32)
  • common/pkg/credential/credential.go
  • common/pkg/credential/credential_test.go
  • go.mod
  • nvswitch-manager/cmd/redfish.go
  • nvswitch-manager/cmd/serve.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/service.go
  • nvswitch-manager/pkg/common/credential/credential.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/pkg/credentials/inmemory_test.go
  • nvswitch-manager/pkg/credentials/vault.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/pkg/db/config_test.go
  • nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go
  • powershelf-manager/internal/service/server_impl_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/pkg/credentials/vault_test.go
💤 Files with no reviewable changes (3)
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/credential/credential.go
✅ Files skipped from review due to trivial changes (11)
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/cmd/redfish.go
  • go.mod
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/pkg/credentials/inmemory.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • powershelf-manager/pkg/credentials/vault_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/internal/service/service.go
  • common/pkg/credential/credential_test.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go

@spydaNVIDIA spydaNVIDIA force-pushed the feat/pyda-fw branch 4 times, most recently from d242df5 to 8985bbd Compare May 1, 2026 21:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
powershelf-manager/pkg/credentials/vault.go (1)

190-203: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Do not treat unreadable existing secrets as safe-to-overwrite.

Put still drops into m.write for any non-errCredentialNotFound failure from Get. A transient Vault read failure or malformed stored secret can therefore be converted into a successful overwrite, which hides backend health problems and destroys the previous state. Only the typed not-found case should fall through to write; all other Get errors should be returned, with Patch kept as the explicit overwrite path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/vault.go` around lines 190 - 203, The Put
flow currently calls m.write for any Get error except errCredentialNotFound;
change Put so that after calling m.Get(ctx, mac) it only falls through to
m.write when errors.Is(err, errCredentialNotFound); if err == nil and
existing.Equal(cred) return nil; if err == nil and values differ log and return
a non-nil conflict/error directing callers to use Patch for overwrite; for any
other non-nil err (not errCredentialNotFound) return that error instead of
calling m.write. Update the logic referencing m.Get, existing.Equal,
errCredentialNotFound, m.write and the Put function accordingly.
nvswitch-manager/pkg/credentials/vault.go (1)

247-260: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Return non-not found Vault read errors instead of overwriting through them.

shouldSkipWrite still falls through to m.put on parse failures and transient Vault read errors. That means PutBMC/PutNVOS can return success after overwriting an entry whose prior state could not be determined, which masks Vault health issues and can destroy the last readable credential. Only the typed errCredentialNotFound case should proceed to write; every other error should bubble up, and callers that truly want blind replacement should use PatchBMC/PatchNVOS.

Suggested direction
-func (m *VaultCredentialManager) shouldSkipWrite(ctx context.Context, key, kind string, mac net.HardwareAddr, cred *credential.Credential) bool {
+func (m *VaultCredentialManager) shouldSkipWrite(ctx context.Context, key, kind string, mac net.HardwareAddr, cred *credential.Credential) (bool, error) {
 	existing, err := m.get(ctx, key)
 	switch {
 	case err == nil && existing.Equal(cred):
 		log.Infof("%s credentials for %s already exist and match; skipping write", kind, mac)
-		return true
+		return true, nil
 	case err == nil:
 		log.Warnf("%s credentials for %s differ from existing; overwriting vault entry", kind, mac)
+		return false, nil
 	case errors.Is(err, errCredentialNotFound):
-		// No existing secret; fall through to write.
+		return false, nil
 	default:
-		log.Warnf("%s credentials for %s could not be read (%v); overwriting vault entry", kind, mac, err)
+		return false, err
 	}
-	return false
 }
 
 func (m *VaultCredentialManager) PutBMC(ctx context.Context, mac net.HardwareAddr, cred *credential.Credential) error {
 	key := m.getBMCCredentialKey(mac)
-	if m.shouldSkipWrite(ctx, key, "BMC", mac, cred) {
+	skip, err := m.shouldSkipWrite(ctx, key, "BMC", mac, cred)
+	if err != nil {
+		return err
+	}
+	if skip {
 		return nil
 	}
 	return m.put(ctx, key, cred)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/pkg/credentials/vault.go` around lines 247 - 260, The
current shouldSkipWrite silently falls through to overwrite on any Vault read
error; change its contract and callers so only the typed errCredentialNotFound
leads to a write and all other read errors bubble up. Specifically: change
shouldSkipWrite(ctx, key, kind, mac, cred) to return (bool, error); keep the
true return when existing.Equal(cred); when m.get returns errCredentialNotFound
return (false, nil); when m.get returns any other non-nil error return (false,
err) (do not log-and-fall-through); and return (false, nil) only when you want
the caller to proceed to m.put. Update callers PutBMC/PutNVOS to propagate the
error returned by shouldSkipWrite (so they don’t blind-overwrite on Vault
errors) and document that blind replacement remains available via
PatchBMC/PatchNVOS; update any unit tests to assert the new error-propagation
behavior.
🧹 Nitpick comments (1)
nvswitch-manager/internal/service/server_impl_test.go (1)

117-210: ⚡ Quick win

Add edge-case coverage for nil targets and port bounds.

These tests lock the happy path and several field-level failures, but they still miss the two cases most likely to break the new targets flow: a nil *pb.FirmwareTarget element and a port outside 1..65535 before firmwareTargetToRegisterRequest casts to int32. Adding those cases would guard the new unregistered-device path much more tightly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/internal/service/server_impl_test.go` around lines 117 -
210, Add tests that cover a nil *pb.FirmwareTarget and out-of-range port values:
call validateFirmwareTarget(nil) and assert it returns an error, and add
firmwareTargetToRegisterRequest cases where Bmc.Port or Nvos.Port are 0 and
65536 (or otherwise outside 1..65535) and assert the function either returns an
error or that the code under test validates/handles the bounds before casting to
int32; if the production helpers (validateFirmwareTarget,
firmwareTargetToRegisterRequest) do not currently guard these, update them to
return a clear error for a nil target and for ports outside 1..65535 so your new
tests pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@powershelf-manager/internal/service/server_impl.go`:
- Around line 235-257: The code currently calls registerPowershelf() before
validating the component payload; add a validation-only pass on
targetReq.Components (check for empty slice, nil upgrade_to, and unsupported
component types) immediately after validateFirmwareTarget(target) and before
calling registerPowershelf(), and if validation fails set resp.Components =
fanOutComponentError(targetReq.Components, pb.StatusCode_INVALID_ARGUMENT,
"<clear error>") and append resp to responses without calling
registerPowershelf() or persisting anything; re-use/implement a small helper
like validateComponentRequests(targetReq.Components) and keep existing
upgradeComponents(ctx, target.PmcMacAddress, ...) and registerPowershelf(...)
calls only for fully validated targets.

---

Duplicate comments:
In `@nvswitch-manager/pkg/credentials/vault.go`:
- Around line 247-260: The current shouldSkipWrite silently falls through to
overwrite on any Vault read error; change its contract and callers so only the
typed errCredentialNotFound leads to a write and all other read errors bubble
up. Specifically: change shouldSkipWrite(ctx, key, kind, mac, cred) to return
(bool, error); keep the true return when existing.Equal(cred); when m.get
returns errCredentialNotFound return (false, nil); when m.get returns any other
non-nil error return (false, err) (do not log-and-fall-through); and return
(false, nil) only when you want the caller to proceed to m.put. Update callers
PutBMC/PutNVOS to propagate the error returned by shouldSkipWrite (so they don’t
blind-overwrite on Vault errors) and document that blind replacement remains
available via PatchBMC/PatchNVOS; update any unit tests to assert the new
error-propagation behavior.

In `@powershelf-manager/pkg/credentials/vault.go`:
- Around line 190-203: The Put flow currently calls m.write for any Get error
except errCredentialNotFound; change Put so that after calling m.Get(ctx, mac)
it only falls through to m.write when errors.Is(err, errCredentialNotFound); if
err == nil and existing.Equal(cred) return nil; if err == nil and values differ
log and return a non-nil conflict/error directing callers to use Patch for
overwrite; for any other non-nil err (not errCredentialNotFound) return that
error instead of calling m.write. Update the logic referencing m.Get,
existing.Equal, errCredentialNotFound, m.write and the Put function accordingly.

---

Nitpick comments:
In `@nvswitch-manager/internal/service/server_impl_test.go`:
- Around line 117-210: Add tests that cover a nil *pb.FirmwareTarget and
out-of-range port values: call validateFirmwareTarget(nil) and assert it returns
an error, and add firmwareTargetToRegisterRequest cases where Bmc.Port or
Nvos.Port are 0 and 65536 (or otherwise outside 1..65535) and assert the
function either returns an error or that the code under test validates/handles
the bounds before casting to int32; if the production helpers
(validateFirmwareTarget, firmwareTargetToRegisterRequest) do not currently guard
these, update them to return a clear error for a nil target and for ports
outside 1..65535 so your new tests pass.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 09d9745a-5df1-4f23-9bcf-aa5591e651c8

📥 Commits

Reviewing files that changed from the base of the PR and between 54994bf and d242df5.

⛔ Files ignored due to path filters (4)
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
📒 Files selected for processing (32)
  • common/pkg/credential/credential.go
  • common/pkg/credential/credential_test.go
  • go.mod
  • nvswitch-manager/cmd/redfish.go
  • nvswitch-manager/cmd/serve.go
  • nvswitch-manager/cmd/ssh.go
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/service.go
  • nvswitch-manager/pkg/common/credential/credential.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/pkg/credentials/inmemory_test.go
  • nvswitch-manager/pkg/credentials/vault.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/pkg/db/config_test.go
  • nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go
  • powershelf-manager/internal/service/server_impl_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/pkg/credentials/vault_test.go
💤 Files with no reviewable changes (3)
  • nvswitch-manager/pkg/common/secretstring/secretstring_test.go
  • nvswitch-manager/pkg/common/credential/credential.go
  • nvswitch-manager/pkg/common/secretstring/secretstring.go
✅ Files skipped from review due to trivial changes (7)
  • nvswitch-manager/cmd/update_strategy.go
  • nvswitch-manager/internal/service/config.go
  • nvswitch-manager/pkg/db/config.go
  • nvswitch-manager/pkg/objects/nvos/nvos.go
  • nvswitch-manager/pkg/credentials/credential_manager.go
  • nvswitch-manager/pkg/objects/bmc/bmc.go
  • nvswitch-manager/cmd/ssh.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • go.mod
  • nvswitch-manager/cmd/redfish.go
  • nvswitch-manager/pkg/nvswitchmanager/nvswitchmanager_test.go
  • nvswitch-manager/pkg/db/config_test.go
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto

Comment on lines +235 to +257
// Direct path: auto-register unregistered targets, then upgrade
for _, targetReq := range req.Targets {
target := targetReq.Target
resp := &pb.UpdatePowershelfFirmwareResponse{
PmcIp: target.GetPmcIpAddress(),
}

if err := validateFirmwareTarget(target); err != nil {
resp.Components = fanOutComponentError(targetReq.Components, pb.StatusCode_INVALID_ARGUMENT, err.Error())
responses = append(responses, resp)
continue
}

regResp := s.registerPowershelf(ctx, firmwareTargetToRegisterRequest(target))
if regResp.Status != pb.StatusCode_SUCCESS {
resp.Components = fanOutComponentError(targetReq.Components, regResp.Status,
fmt.Sprintf("failed to register target: %s", regResp.Error))
responses = append(responses, resp)
continue
}
resp.PmcMacAddress = regResp.PmcMacAddress
resp.Components = s.upgradeComponents(ctx, target.PmcMacAddress, targetReq.Components)
responses = append(responses, resp)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate target component requests before auto-registering the device.

registerPowershelf() runs before the component payload is known to be executable. A target request with an empty components list, a nil upgrade_to, or an unsupported component will still register/persist the device and then return only INVALID_ARGUMENT component results. That turns a bad firmware request into a state-changing operation.

Please add a validation-only pass for targetReq.Components before calling registerPowershelf(), and reject empty component lists explicitly so invalid requests cannot mutate inventory/credentials.

Suggested direction
+func validateUpdateComponents(components []*pb.UpdateComponentFirmwareRequest) error {
+	if len(components) == 0 {
+		return fmt.Errorf("at least one component is required")
+	}
+	for _, c := range components {
+		if c == nil {
+			return fmt.Errorf("component request is required")
+		}
+		if c.UpgradeTo == nil {
+			return fmt.Errorf("upgrade_to firmware version is required")
+		}
+		if c.Component != pb.PowershelfComponent_PMC {
+			return fmt.Errorf("PSM does not support upgrading %s component", c.Component.String())
+		}
+	}
+	return nil
+}
+
 // Direct path: auto-register unregistered targets, then upgrade
 for _, targetReq := range req.Targets {
 	target := targetReq.Target
 	resp := &pb.UpdatePowershelfFirmwareResponse{
 		PmcIp: target.GetPmcIpAddress(),
 	}
 
 	if err := validateFirmwareTarget(target); err != nil {
 		resp.Components = fanOutComponentError(targetReq.Components, pb.StatusCode_INVALID_ARGUMENT, err.Error())
 		responses = append(responses, resp)
 		continue
 	}
+	if err := validateUpdateComponents(targetReq.Components); err != nil {
+		return nil, status.Error(codes.InvalidArgument, err.Error())
+	}
 
 	regResp := s.registerPowershelf(ctx, firmwareTargetToRegisterRequest(target))
 	...
 }

Also applies to: 282-332

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/internal/service/server_impl.go` around lines 235 - 257,
The code currently calls registerPowershelf() before validating the component
payload; add a validation-only pass on targetReq.Components (check for empty
slice, nil upgrade_to, and unsupported component types) immediately after
validateFirmwareTarget(target) and before calling registerPowershelf(), and if
validation fails set resp.Components =
fanOutComponentError(targetReq.Components, pb.StatusCode_INVALID_ARGUMENT,
"<clear error>") and append resp to responses without calling
registerPowershelf() or persisting anything; re-use/implement a small helper
like validateComponentRequests(targetReq.Components) and keep existing
upgradeComponents(ctx, target.PmcMacAddress, ...) and registerPowershelf(...)
calls only for fully validated targets.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
powershelf-manager/pkg/credentials/vault_test.go (1)

132-142: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Abort helper setup failures instead of returning a nil manager.

assert.NoError / assert.NotNil allow the test to continue after cfg.NewManager() fails, which can turn the real setup error into a later nil-deref or misleading assertion. Use require here so the subtest fails fast.

Suggested fix
 import (
@@
-	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )
@@
 func newManagerWithServer(t *testing.T, srv *httptest.Server) *VaultCredentialManager {
 	t.Helper()
@@
-	assert.NoError(t, cfg.Validate())
+	require.NoError(t, cfg.Validate())
 	mgr, err := cfg.NewManager()
-	assert.NoError(t, err)
-	assert.NotNil(t, mgr)
+	require.NoError(t, err)
+	require.NotNil(t, mgr)
 	return mgr
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/vault_test.go` around lines 132 - 142, In
newManagerWithServer the test uses assert.NoError and assert.NotNil which can
let execution continue after setup failures; change those to require.NoError(t,
cfg.Validate()), require.NoError(t, err) and require.NotNil(t, mgr) (or combine
checks) so the subtest aborts immediately on failures when calling
VaultConfig.Validate() and cfg.NewManager() and when verifying mgr in
newManagerWithServer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nvswitch-manager/internal/service/server_impl.go`:
- Around line 526-530: The multi-item update branch currently maps a QueueUpdate
failure to result.Status = pb.StatusCode_INTERNAL_ERROR, which is inconsistent
with the single-item path that treats queueing failures as INVALID_ARGUMENT;
update the multi-item path (where s.fwm.QueueUpdate is called and the variable
updates is returned) to use the same per-item status mapping as the single-item
flow: set result.Status to pb.StatusCode_INVALID_ARGUMENT (and set result.Error
with the formatted error message) or, if result contains per-item entries,
iterate the updates/results and mark each item's status as INVALID_ARGUMENT with
the error text so the client sees the same failure semantics as the single-item
path.

In `@powershelf-manager/pkg/credentials/inmemory_test.go`:
- Around line 310-328: The idempotent Put test currently only checks values, not
that Put was a true no-op; update TestInMemoryPutIdempotentAndOverwrite to
create a credential instance (e.g., orig := newCred("admin","secret")), call
mgr.Put with orig, call Put again with the same orig, then after the second Put
call mgr.Get and assert that the returned credential pointer/value is the same
instance as orig (pointer equality) rather than just equal fields; use
NewInMemoryCredentialManager, Put, Get and newCred as the referenced symbols to
locate where to add the pointer-equality assertion and preserve the existing
overwrite checks.

In `@powershelf-manager/pkg/credentials/vault_test.go`:
- Around line 357-395: The test TestVaultManager_PutUpsertSemantics currently
only asserts final state and can’t detect whether the second identical Put was a
no-op; modify the fakeVaultKVServer (type fakeVaultKVServer and its handler())
to expose a simple request/write counter or spy for Put/Write operations (e.g.,
increment a PutWriteCount when the handler receives a write request), then
update the test to assert that after the first Put the counter is 1 and after
the second identical Put the counter remains 1 (while other operations like the
differing Put and Patch increment it as expected), keeping all existing
Get/Patch/Put calls and assertions but adding these counter checks to pin the
idempotent Put behavior.

---

Outside diff comments:
In `@powershelf-manager/pkg/credentials/vault_test.go`:
- Around line 132-142: In newManagerWithServer the test uses assert.NoError and
assert.NotNil which can let execution continue after setup failures; change
those to require.NoError(t, cfg.Validate()), require.NoError(t, err) and
require.NotNil(t, mgr) (or combine checks) so the subtest aborts immediately on
failures when calling VaultConfig.Validate() and cfg.NewManager() and when
verifying mgr in newManagerWithServer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 93c08e8f-e16f-47fc-ab13-fb3de4615cc2

📥 Commits

Reviewing files that changed from the base of the PR and between d242df5 and 8985bbd.

⛔ Files ignored due to path filters (3)
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
  • powershelf-manager/internal/proto/v1/powershelf-manager_grpc.pb.go is excluded by !**/*.pb.go, !**/*.pb.go
📒 Files selected for processing (16)
  • go.mod
  • nvswitch-manager/internal/proto/v1/nvswitch-manager.proto
  • nvswitch-manager/internal/service/server_impl.go
  • nvswitch-manager/internal/service/server_impl_test.go
  • nvswitch-manager/internal/service/service.go
  • nvswitch-manager/pkg/credentials/inmemory.go
  • nvswitch-manager/pkg/credentials/inmemory_test.go
  • nvswitch-manager/pkg/credentials/vault.go
  • nvswitch-manager/pkg/credentials/vault_test.go
  • powershelf-manager/internal/proto/v1/powershelf-manager.proto
  • powershelf-manager/internal/service/server_impl.go
  • powershelf-manager/internal/service/server_impl_test.go
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/inmemory_test.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/pkg/credentials/vault_test.go
✅ Files skipped from review due to trivial changes (1)
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (3)
  • powershelf-manager/pkg/credentials/inmemory.go
  • powershelf-manager/pkg/credentials/vault.go
  • powershelf-manager/internal/service/server_impl.go

Comment on lines +526 to +530
updates, err := s.fwm.QueueUpdate(ctx, switchUUID, bundleVersion, components)
if err != nil {
result.Status = pb.StatusCode_INTERNAL_ERROR
result.Error = fmt.Sprintf("failed to queue update: %v", err)
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use consistent per-item status mapping for queue failures.

At Line [528], every QueueUpdate failure is reported as INTERNAL_ERROR. In the single-item path (Line [405]), equivalent queueing failures are treated as invalid request input. This inconsistency makes client behavior harder to reason about.

Suggested minimal alignment
-		result.Status = pb.StatusCode_INTERNAL_ERROR
+		result.Status = pb.StatusCode_INVALID_ARGUMENT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
updates, err := s.fwm.QueueUpdate(ctx, switchUUID, bundleVersion, components)
if err != nil {
result.Status = pb.StatusCode_INTERNAL_ERROR
result.Error = fmt.Sprintf("failed to queue update: %v", err)
return
updates, err := s.fwm.QueueUpdate(ctx, switchUUID, bundleVersion, components)
if err != nil {
result.Status = pb.StatusCode_INVALID_ARGUMENT
result.Error = fmt.Sprintf("failed to queue update: %v", err)
return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nvswitch-manager/internal/service/server_impl.go` around lines 526 - 530, The
multi-item update branch currently maps a QueueUpdate failure to result.Status =
pb.StatusCode_INTERNAL_ERROR, which is inconsistent with the single-item path
that treats queueing failures as INVALID_ARGUMENT; update the multi-item path
(where s.fwm.QueueUpdate is called and the variable updates is returned) to use
the same per-item status mapping as the single-item flow: set result.Status to
pb.StatusCode_INVALID_ARGUMENT (and set result.Error with the formatted error
message) or, if result contains per-item entries, iterate the updates/results
and mark each item's status as INVALID_ARGUMENT with the error text so the
client sees the same failure semantics as the single-item path.

Comment on lines +310 to +328
func TestInMemoryPutIdempotentAndOverwrite(t *testing.T) {
ctx := context.Background()
mgr := NewInMemoryCredentialManager()
mac := parseMAC(t, "00:11:22:33:44:55")

// First Put succeeds
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))

// Idempotent Put with same credentials is a no-op
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))

// Put with different credentials overwrites (no error for in-memory)
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "different")))

// Credentials are now the new values
got, err := mgr.Get(ctx, mac)
assert.NoError(t, err)
assert.Equal(t, "admin", got.User)
assert.Equal(t, "different", got.Password.Value)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen the idempotent Put check.

This case only verifies that the second write succeeds and the values stay the same. It would still pass if the manager replaced the stored credential with a fresh pointer, so the no-op behavior is not actually exercised.

Proposed assertion tweak
 func TestInMemoryPutIdempotentAndOverwrite(t *testing.T) {
 	ctx := context.Background()
 	mgr := NewInMemoryCredentialManager()
 	mac := parseMAC(t, "00:11:22:33:44:55")

-	// First Put succeeds
-	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))
+	// First Put succeeds
+	first := newCred("admin", "secret")
+	assert.NoError(t, mgr.Put(ctx, mac, first))

 	// Idempotent Put with same credentials is a no-op
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))

 	// Put with different credentials overwrites (no error for in-memory)
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "different")))

 	// Credentials are now the new values
 	got, err := mgr.Get(ctx, mac)
 	assert.NoError(t, err)
 	assert.Equal(t, "admin", got.User)
 	assert.Equal(t, "different", got.Password.Value)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/inmemory_test.go` around lines 310 - 328,
The idempotent Put test currently only checks values, not that Put was a true
no-op; update TestInMemoryPutIdempotentAndOverwrite to create a credential
instance (e.g., orig := newCred("admin","secret")), call mgr.Put with orig, call
Put again with the same orig, then after the second Put call mgr.Get and assert
that the returned credential pointer/value is the same instance as orig (pointer
equality) rather than just equal fields; use NewInMemoryCredentialManager, Put,
Get and newCred as the referenced symbols to locate where to add the
pointer-equality assertion and preserve the existing overwrite checks.

Comment on lines +357 to +395
// TestVaultManager_PutUpsertSemantics pins the upsert contract that Put
// shares with the in-memory backend: identical credentials are a no-op,
// differing credentials overwrite (with a warning log not asserted here),
// and Patch unconditionally replaces.
func TestVaultManager_PutUpsertSemantics(t *testing.T) {
fake := newFakeVaultKVServer()
fake.mountPresent = true
srv := httptest.NewServer(fake.handler())
defer srv.Close()

ctx := context.Background()
mgr := newManagerWithServer(t, srv)
mac := parseMAC(t, "00:11:22:33:44:55")

// First Put writes the credentials.
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))

// Idempotent Put with identical credentials is skipped, not rewritten.
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))

// Put with different credentials succeeds and overwrites the existing
// entry (warn-and-overwrite). This matches the in-memory backend so that
// callers like pmcmanager.Register get consistent semantics across
// datastore types.
assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "rotated")))

got, err := mgr.Get(ctx, mac)
assert.NoError(t, err)
assert.Equal(t, "admin", got.User)
assert.Equal(t, "rotated", got.Password.Value)

// Patch unconditionally replaces, even when the existing entry differs
// from the new value.
assert.NoError(t, mgr.Patch(ctx, mac, newCred("root", "newpass")))
got, err = mgr.Get(ctx, mac)
assert.NoError(t, err)
assert.Equal(t, "root", got.User)
assert.Equal(t, "newpass", got.Password.Value)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pin the idempotent Put path more directly.

This test only checks the final stored secret, so it would still pass if the second identical Put rewrote the record instead of short-circuiting. If the no-op behavior is part of the contract, add a request counter or similar spy in fakeVaultKVServer so the test fails when that optimization regresses.

Suggested test tightening
 type fakeVaultKVServer struct {
 	store        map[string]map[string]interface{} // uppercase MAC -> secret data (UsernamePassword map)
 	mountPresent bool
+	putCount     map[string]int
 }
@@
 func newFakeVaultKVServer() *fakeVaultKVServer {
 	return &fakeVaultKVServer{
 		store:        make(map[string]map[string]interface{}),
+		putCount:     make(map[string]int),
 		mountPresent: false,
 	}
 }
@@
 		case strings.HasPrefix(path, "/secrets/data/machines/bmc/") && strings.HasSuffix(path, "/root") && (r.Method == http.MethodPut || r.Method == http.MethodPost):
 			mac := strings.TrimSuffix(strings.TrimPrefix(path, "/secrets/data/machines/bmc/"), "/root")
+			s.putCount[mac]++
 			var payload map[string]interface{}
@@
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))
 	assert.NoError(t, mgr.Put(ctx, mac, newCred("admin", "secret")))
+	assert.Equal(t, 1, fake.putCount[mac.String()])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@powershelf-manager/pkg/credentials/vault_test.go` around lines 357 - 395, The
test TestVaultManager_PutUpsertSemantics currently only asserts final state and
can’t detect whether the second identical Put was a no-op; modify the
fakeVaultKVServer (type fakeVaultKVServer and its handler()) to expose a simple
request/write counter or spy for Put/Write operations (e.g., increment a
PutWriteCount when the handler receives a write request), then update the test
to assert that after the first Put the counter is 1 and after the second
identical Put the counter remains 1 (while other operations like the differing
Put and Patch increment it as expected), keeping all existing Get/Patch/Put
calls and assertions but adding these counter checks to pin the idempotent Put
behavior.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
carbide-nsm 66 4 20 33 9 0
carbide-psm 58 6 29 13 2 8
carbide-rest-api 60 6 31 13 2 8
carbide-rest-cert-manager 54 4 28 13 1 8
carbide-rest-db 58 6 29 13 2 8
carbide-rest-site-agent 55 5 28 13 1 8
carbide-rest-site-manager 54 4 28 13 1 8
carbide-rest-workflow 59 6 30 13 2 8
carbide-rla 57 6 28 13 2 8
TOTAL 521 47 251 137 22 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@spydaNVIDIA spydaNVIDIA requested a review from kunzhao-nv May 1, 2026 22:27
@spydaNVIDIA spydaNVIDIA force-pushed the feat/pyda-fw branch 2 times, most recently from 621cb5c to 621d893 Compare May 1, 2026 22:44
Copy link
Copy Markdown
Contributor

@kunzhao-nv kunzhao-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update inventorysync in RLA accordingly.

Comment thread powershelf-manager/internal/service/server_impl.go
@spydaNVIDIA spydaNVIDIA enabled auto-merge (squash) May 4, 2026 16:52
@spydaNVIDIA spydaNVIDIA merged commit e02c20b into main May 4, 2026
98 checks passed
@spydaNVIDIA spydaNVIDIA deleted the feat/pyda-fw branch May 4, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants