fix: No-op Instance update when requested NVLink Interfaces matches with current#438
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds an order-independent multiset comparison for NVLink interfaces in UpdateInstanceHandler.Handle. If incoming NVLinkInterfaces exactly match an existing status bucket (Pending, Ready, or Deleting) by multiset of (NVLinkLogicalPartitionID, DeviceInstance), the handler treats the NVLink update as a no-op and skips partition fetches, DB writes, deletion marking, and re-appending unchanged NVLink interfaces. ChangesInstance NVLink fast-path
Sequence Diagram(s)sequenceDiagram
participant Client
participant Handler
participant DB
participant SiteController
Client->>Handler: UpdateInstance(req with NVLinkInterfaces)
alt NVLink multiset matches an existing status-bucket
Handler->>Handler: Build multiset keyed by (PartitionID, DeviceInstance)
Handler->>Handler: Match incoming tuples against multiset (decrement counts)
Handler-->>Client: Respond (no NVLink DB fetch/update or SiteController NVLink update)
else NVLink differs
Handler->>DB: Fetch NVLinkLogicalPartitions
Handler->>DB: Create new NVLink interface records (Pending)
Handler->>DB: Mark unmatched existing NVLink interfaces Deleting
Handler->>SiteController: Send updated GPU config / NVLink info
SiteController-->>Handler: Ack
Handler-->>Client: Respond with updated NVLink interfaces
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-04-28 22:18:44 UTC | Commit: e13071a |
Test Results9 281 tests +5 9 281 ✅ +5 7m 17s ⏱️ -6s Results for commit 1cc4217. ± Comparison against base commit 2645be3. This pull request removes 1 and adds 6 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/pkg/api/handler/instance.go (1)
2660-2675:⚠️ Potential issue | 🟠 MajorHandle explicit empty
nvLinkInterfacesupdates as no-ops too.Line 2660 gates this new check on
len(apiRequest.NVLinkInterfaces) > 0, so an explicit[]skips the rejection path. If the instance already has no active NVLink interfaces, that request still matches the current binding set and will incorrectly proceed as a no-op update.Suggested fix
- if len(apiRequest.NVLinkInterfaces) > 0 { + if apiRequest.NVLinkInterfaces != nil { nvlIfcDAO := cdbm.NewNVLinkInterfaceDAO(uih.dbSession) nvlIfcs, _, err := nvlIfcDAO.GetAll(ctx, nil, cdbm.NVLinkInterfaceFilterInput{InstanceIDs: []uuid.UUID{instance.ID}}, cdbp.PageInput{}, nil) if err != nil { logger.Error().Err(err).Msg("error retrieving NVLink Interfaces from DB for Instance") return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve NVLink Interfaces for Instance", nil) } // Discard if VPC has default NVLink Logical Partition specified and NVLink Interfaces are exists if len(nvlIfcs) > 0 && vpc.NVLinkLogicalPartitionID != nil { return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Cannot update NVLink Interfaces if VPC has default NVLink Logical Partition and NVLink Interfaces already exist for the Instance", nil) } if nvLinkInterfaceUpdateRequestMatchesExisting(apiRequest.NVLinkInterfaces, nvlIfcs) { return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Cannot update NVLink Interfaces because the requested NVLink Interfaces are identical to the Instance's current NVLink Interfaces", nil) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/instance.go` around lines 2660 - 2675, Remove the outer len(apiRequest.NVLinkInterfaces) > 0 guard so you always fetch current NVLink interfaces (nvlIfcDAO.GetAll -> nvlIfcs) and compare them with the request using nvLinkInterfaceUpdateRequestMatchesExisting; also adjust the VPC default-partition check to only block when the request is actually trying to set interfaces (i.e. keep the vpc.NVLinkLogicalPartitionID check conditional on len(apiRequest.NVLinkInterfaces) > 0 and len(nvlIfcs) > 0) so an explicit empty slice ([] ) will be treated as a no-op when it matches the existing nvlIfcs and correctly return the identical-update error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/pkg/api/handler/instance_test.go`:
- Around line 5757-5784: The test currently returns early on non-200 responses
so the Site Controller workflow-call assertion (controlled by
verifySiteControllerRequest) is skipped; move or duplicate the Site Controller
verification out of the early-return path so it always runs after the handler
invocation and explicitly assert no workflow calls for this case using the Site
Controller mock (scp) API (e.g., assert scp shows zero workflow requests / no
workflows invoked) when verifySiteControllerRequest is false in the failing test
case; update the harness logic around verifySiteControllerRequest or add an
explicit check in this specific test case so the "no Site Controller workflow
call" constraint is enforced even on 400 responses.
---
Outside diff comments:
In `@api/pkg/api/handler/instance.go`:
- Around line 2660-2675: Remove the outer len(apiRequest.NVLinkInterfaces) > 0
guard so you always fetch current NVLink interfaces (nvlIfcDAO.GetAll ->
nvlIfcs) and compare them with the request using
nvLinkInterfaceUpdateRequestMatchesExisting; also adjust the VPC
default-partition check to only block when the request is actually trying to set
interfaces (i.e. keep the vpc.NVLinkLogicalPartitionID check conditional on
len(apiRequest.NVLinkInterfaces) > 0 and len(nvlIfcs) > 0) so an explicit empty
slice ([] ) will be treated as a no-op when it matches the existing nvlIfcs and
correctly return the identical-update error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4726590c-f1d0-4e39-b80d-bb8d4503a7f2
📒 Files selected for processing (2)
api/pkg/api/handler/instance.goapi/pkg/api/handler/instance_test.go
| name: "test Instance update API endpoint fails when NVLink interfaces are unchanged (same multiset as current)", | ||
| fields: fields{ | ||
| dbSession: dbSession, | ||
| tc: tc, | ||
| scp: scp, | ||
| cfg: cfg, | ||
| }, | ||
| args: args{ | ||
| reqData: &model.APIInstanceUpdateRequest{ | ||
| IpxeScript: os2.IpxeScript, | ||
| // Order differs from DB creation order; must still be rejected as identical multiset. | ||
| NVLinkInterfaces: []model.APINVLinkInterfaceCreateOrUpdateRequest{ | ||
| {NVLinkLogicalPartitionID: nvllp2.ID.String(), DeviceInstance: 3}, | ||
| {NVLinkLogicalPartitionID: nvllp1.ID.String(), DeviceInstance: 0}, | ||
| {NVLinkLogicalPartitionID: nvllp2.ID.String(), DeviceInstance: 2}, | ||
| {NVLinkLogicalPartitionID: nvllp1.ID.String(), DeviceInstance: 1}, | ||
| }, | ||
| }, | ||
| reqInstance: inst13.ID.String(), | ||
| reqOrg: tnOrg1, | ||
| reqUser: tnu1, | ||
| respCode: http.StatusBadRequest, | ||
| respMessage: cdb.GetStrPtr("Cannot update NVLink Interfaces because the requested NVLink Interfaces are identical to the Instance's current NVLink Interfaces"), | ||
| }, | ||
| wantErr: false, | ||
| verifySiteControllerRequest: false, | ||
| verifyChildSpanner: false, | ||
| }, |
There was a problem hiding this comment.
Missing assertion for “no Site Controller workflow call” in the new rejection scenario.
This case checks 400 + error text, but it does not verify the side-effect constraint. Because non-200 responses return early at Line 5950, workflow-call checks are skipped entirely.
Proposed test-harness enhancement
type args struct {
...
+ expectNoSiteWorkflowCall bool
}
{
name: "test Instance update API endpoint fails when NVLink interfaces are unchanged (same multiset as current)",
...
- verifySiteControllerRequest: false,
+ expectNoSiteWorkflowCall: true,
+ verifySiteControllerRequest: false,
verifyChildSpanner: false,
},
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
+ var preCalls int
+ var siteMock *tmocks.Client
+ if tt.args.expectNoSiteWorkflowCall {
+ reqID, err := uuid.Parse(tt.args.reqInstance)
+ require.NoError(t, err)
+ ins, err := insDAO.GetByID(context.Background(), nil, reqID, nil)
+ require.NoError(t, err)
+ c, err := tt.fields.scp.GetClientByID(ins.SiteID)
+ require.NoError(t, err)
+ siteMock = c.(*tmocks.Client)
+ preCalls = len(siteMock.Calls)
+ }
...
require.Equal(t, tt.args.respCode, rec.Code)
+ if tt.args.expectNoSiteWorkflowCall {
+ assert.Equal(t, preCalls, len(siteMock.Calls), "site controller workflow should not be invoked")
+ }
if tt.args.respCode != http.StatusOK {
return
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/pkg/api/handler/instance_test.go` around lines 5757 - 5784, The test
currently returns early on non-200 responses so the Site Controller
workflow-call assertion (controlled by verifySiteControllerRequest) is skipped;
move or duplicate the Site Controller verification out of the early-return path
so it always runs after the handler invocation and explicitly assert no workflow
calls for this case using the Site Controller mock (scp) API (e.g., assert scp
shows zero workflow requests / no workflows invoked) when
verifySiteControllerRequest is false in the failing test case; update the
harness logic around verifySiteControllerRequest or add an explicit check in
this specific test case so the "no Site Controller workflow call" constraint is
enforced even on 400 responses.
There was a problem hiding this comment.
@hwadekar-nv Acknowledged. I’ll ignore this thread and won’t pursue the suggested harness change. If you want me to mark my earlier comment as resolved, just say the word.
🧠 Learnings used
Learnt from: thossain-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 257
File: api/pkg/api/handler/vpcpeering.go:203-262
Timestamp: 2026-03-18T01:37:21.156Z
Learning: In Go handler code under api/pkg/api/handler, RBAC should not short-circuit on the first matching role (e.g., if isProviderAdmin { ... } else { ... }). For dual-role service accounts (e.g., Latinum), an org may have both ProviderAdmin and TenantAdmin. Evaluate both authorization paths independently, accumulate errors per path, and only reject if all applicable paths fail. Implement a helper that checks each role path and collects errors, then decide based on whether any path grants access. Add tests covering mixed-role scenarios and ensure CI gates enforce this behavior across all relevant handlers in this directory.
Learnt from: pbreton
Repo: NVIDIA/ncx-infra-controller-rest PR: 293
File: api/pkg/api/handler/operatingsystem_ownership_test.go:119-139
Timestamp: 2026-03-30T23:24:33.337Z
Learning: In Go handler unit tests that mock Temporal SDK `ExecuteWorkflow` with a variadic signature (`args ...interface{}`), ensure the `testify/mock` expectation uses the exact same number of `mock.Anything` (or other matchers) as the total number of arguments passed at the call site. Testify flattens variadic arguments via `_m.Called(_ca...)`, so a mismatch in matcher count will break the test. If your handler uses a site-scoped Temporal client mock (e.g., `siteMock` from `scp.IDClientMap`) that is invoked with 4 arguments (e.g., `ctx, workflowOptions, workflowName, workflowInput`), keep the corresponding `On("ExecuteWorkflow", ...)` expectation at 4 matchers; do not “convert” it to the main client’s arity (e.g., a 5-arg mock) unless that specific mock is actually called with 5 arguments in the tested code path.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:09:55.214Z
Learning: In this codebase, the database interface is intentionally designed to persist the exact value it is given (including values derived from protobuf types, e.g., enum `.String()`). Repo-owned DB constants (e.g., `cdbm.VpcRoutingProfileTypeExternal`) are published for external callers/consumers, not as an indirection layer that the handler should use. Therefore, when reviewing this repository’s handlers, do not flag as a coupling issue the pattern of persisting a protobuf-derived string value directly via calls like `cdb.GetStrPtr(routingProfileTypeValue.String())`, as this is expected behavior rather than an accidental dependency on those string representations.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:22:33.123Z
Learning: Do not flag as an unacceptable coupling the persistence of protobuf enum values using the enum’s `.String()` output (e.g., storing `cdb.GetStrPtr(myProtoEnum.String())`) in inventory/round-trip paths. This codebase stores these `.String()` values because the gRPC back-end inventory process returns the same protobuf-derived string format, so DB round-trips must preserve the exact representation. If and when these inventory values change to plain strings with no backing proto enum, reassess the storage approach, but until then treat proto-enum `.String()` persistence as intentional and compatible.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 367
File: api/pkg/api/handler/machineinstancetype.go:593-606
Timestamp: 2026-04-10T16:39:25.404Z
Learning: In this codebase, Machine IDs are NOT UUIDs, while the deprecated Machine/InstanceType association IDs ARE UUIDs. When reviewing code that parses/validates incoming machine-related identifiers, treat `uuid.Parse` as a reliable discriminator between the deprecated UUID-based association IDs and the non-UUID Machine IDs. Do not raise/flag potential ambiguity between these two ID namespaces when a `uuid.Parse` check is used to separate them.
Learnt from: parmani-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 372
File: api/pkg/api/handler/fabric.go:137-143
Timestamp: 2026-04-15T23:49:00.903Z
Learning: In this repository, `cdb.ErrDoesNotExist` (from `db/pkg/db`) is a DB-layer sentinel error that is never wrapped. Therefore, in Go code that calls DAO/DB methods (e.g., `TenantSiteDAO.GetByTenantIDAndSiteID`), direct equality checks like `err == cdb.ErrDoesNotExist` / `err != cdb.ErrDoesNotExist` are intentional and safe. Do not flag these comparisons as robustness issues and do not suggest replacing them with `errors.Is()` for this specific sentinel.
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
08bf661 to
b2c7503
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
api/pkg/api/handler/instance.go (1)
3177-3186: 💤 Low valueMultiset consumption does not verify that all existing active entries are accounted for.
After consuming from
existingNvlIfcMapfor each request interface, there is no subsequent check to ensure the map is fully depleted. If the request contains a subset of the existing keys (but the total count matches), the logic would incorrectly report unchanged.Example: Existing has
{A:0: 2}, request has{A:0, B:0}with count 2. Both consume successfully from a map that initially had{A:0: 2, B:0: 0}(if B:0 doesn't exist, it fails). However, if existing has{A:0: 1, A:1: 1}and request has{A:0, A:0}, the second A:0 would fail to consume. The logic handles this case. But if existing has{A:0: 2}and request has{A:0, A:0}, it works correctly.Upon further analysis, the existing logic does handle this correctly since consumption failures set
nvLinkInterfacesUnchanged = false. However, adding a post-loop verification that remaining counts are zero would strengthen the invariant and improve maintainability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/instance.go` around lines 3177 - 3186, After consuming counts from existingNvlIfcMap while iterating request interfaces, add a post-loop verification that every remaining value in existingNvlIfcMap is zero; if any value > 0 set nvLinkInterfacesUnchanged = false. Update the logic around existingNvlIfcMap / nvLinkInterfacesUnchanged (the block that builds key := fmt.Sprintf("%s:%d", nvllPartitionID.String(), apiNvlIfc.DeviceInstance) and decrements existingNvlIfcMap[key]) to include this final sweep so we only report unchanged when all multinset counts are fully consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/pkg/api/handler/instance.go`:
- Line 3156: Fix the typographical error in the comment that currently reads "If
the number of pending/ready/deeleting NVLink interfaces is unchanged, do
nothing" by changing "deeleting" to "deleting" so the comment reads "If the
number of pending/ready/deleting NVLink interfaces is unchanged, do nothing";
update the comment text exactly where that string appears to correct the typo.
- Around line 3136-3154: existingNvlIfcMap is populated from existingNvlIfcs
without filtering out NVLinkInterfaceStatusDeleting entries, so Deleting
interfaces can incorrectly count toward unchanged checks; update the loop that
builds existingNvlIfcMap (iterating existingNvlIfcs) to skip entries whose
Status == cdbm.NVLinkInterfaceStatusDeleting before incrementing the map and
updating counters (keep updating existingDeletingNvlIfcsCount for bookkeeping),
ensuring only non-deleting interfaces are added to existingNvlIfcMap used for
the unchanged check.
- Around line 3159-3165: The count comparison is wrong because existingNvlIfcMap
aggregates all statuses; change the logic so each branch filters
existingNvlIfcMap to the corresponding-status entries before doing the multiset
match: when len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount, set
nvLinkInterfacesUnchanged only after building a map of Pending interfaces (from
existingNvlIfcMap) and performing the multiset compare against that filtered
map; do the same for the Ready branch using existingReadyNvlIfcsCount and Ready
entries; remove or disallow treating equality with existingDeletingNvlIfcsCount
as unchanged (do not accept Deleting entries) so the multiset compare never uses
Deleting-status entries.
---
Nitpick comments:
In `@api/pkg/api/handler/instance.go`:
- Around line 3177-3186: After consuming counts from existingNvlIfcMap while
iterating request interfaces, add a post-loop verification that every remaining
value in existingNvlIfcMap is zero; if any value > 0 set
nvLinkInterfacesUnchanged = false. Update the logic around existingNvlIfcMap /
nvLinkInterfacesUnchanged (the block that builds key := fmt.Sprintf("%s:%d",
nvllPartitionID.String(), apiNvlIfc.DeviceInstance) and decrements
existingNvlIfcMap[key]) to include this final sweep so we only report unchanged
when all multinset counts are fully consumed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 24e5782f-112f-43d1-a981-1bc4430fe8cf
📒 Files selected for processing (1)
api/pkg/api/handler/instance.go
| // Count the number of NVLink interfaces by NVLink Logical Partition ID and Device Instance | ||
| existingNvlIfcMap := make(map[string]int) | ||
| existingReadyNvlIfcsCount := 0 | ||
| existingPendingNvlIfcsCount := 0 | ||
| existingDeletingNvlIfcsCount := 0 | ||
| for i := range existingNvlIfcs { | ||
| key := fmt.Sprintf("%s:%d", existingNvlIfcs[i].NVLinkLogicalPartitionID.String(), existingNvlIfcs[i].DeviceInstance) | ||
| existingNvlIfcMap[key]++ | ||
| switch existingNvlIfcs[i].Status { | ||
| case cdbm.NVLinkInterfaceStatusReady: | ||
| existingReadyNvlIfcsCount++ | ||
| case cdbm.NVLinkInterfaceStatusPending: | ||
| existingPendingNvlIfcsCount++ | ||
| case cdbm.NVLinkInterfaceStatusDeleting: | ||
| existingDeletingNvlIfcsCount++ | ||
| default: | ||
| // Provisioning, Error, or other statuses are not counted in these buckets. | ||
| } | ||
| } |
There was a problem hiding this comment.
Multiset map includes all statuses but only non-Deleting interfaces should participate in the unchanged check.
Per the PR objective, interfaces in Deleting state should not be considered part of the current active binding. Currently, existingNvlIfcMap includes all interfaces regardless of status, which could lead to a request matching against a Deleting interface and incorrectly being treated as a no-op.
Proposed fix: exclude Deleting interfaces from the multiset
for i := range existingNvlIfcs {
+ // Skip Deleting interfaces - they should not be considered part of active bindings
+ if existingNvlIfcs[i].Status == cdbm.NVLinkInterfaceStatusDeleting {
+ existingDeletingNvlIfcsCount++
+ continue
+ }
key := fmt.Sprintf("%s:%d", existingNvlIfcs[i].NVLinkLogicalPartitionID.String(), existingNvlIfcs[i].DeviceInstance)
existingNvlIfcMap[key]++
switch existingNvlIfcs[i].Status {
case cdbm.NVLinkInterfaceStatusReady:
existingReadyNvlIfcsCount++
case cdbm.NVLinkInterfaceStatusPending:
existingPendingNvlIfcsCount++
- case cdbm.NVLinkInterfaceStatusDeleting:
- existingDeletingNvlIfcsCount++
default:
// Provisioning, Error, or other statuses are not counted in these buckets.
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/pkg/api/handler/instance.go` around lines 3136 - 3154, existingNvlIfcMap
is populated from existingNvlIfcs without filtering out
NVLinkInterfaceStatusDeleting entries, so Deleting interfaces can incorrectly
count toward unchanged checks; update the loop that builds existingNvlIfcMap
(iterating existingNvlIfcs) to skip entries whose Status ==
cdbm.NVLinkInterfaceStatusDeleting before incrementing the map and updating
counters (keep updating existingDeletingNvlIfcsCount for bookkeeping), ensuring
only non-deleting interfaces are added to existingNvlIfcMap used for the
unchanged check.
There was a problem hiding this comment.
Since existing Interfaces might be in the state one of those including Deleting and if incoming interfaces as in Deleting state then incoming operation would be no-op until existing interfaces completely gone.
There was a problem hiding this comment.
@hwadekar-nv, thank you for the clarification. I understand the design intent now: if the incoming request exactly matches the currently-Deleting interfaces (same count and same (NVLinkLogicalPartitionID, DeviceInstance) multiset), the operation is intentionally treated as a no-op — the system avoids creating new Pending records on top of an in-progress deletion, and lets the cleanup complete naturally.
Including Deleting-state interfaces in existingNvlIfcMap is therefore deliberate, since it is needed for the multiset match to succeed in that path. That is a valid design choice and I withdraw my earlier concern.
One minor consideration worth confirming: if existing interfaces span mixed statuses (e.g., 1 Ready + 1 Deleting), a count match against existingReadyNvlIfcsCount == 1 could trigger nvLinkInterfacesUnchanged = true, after which the multiset check might consume the Deleting key rather than the Ready key for a given (partition, device) pair — potentially masking a real change. This is only a risk when the same (partition, device) key appears in both Deleting and non-Deleting interfaces simultaneously. If your data model prevents that (e.g., a given GPU slot cannot have two concurrent records in different states), this is a non-issue. Otherwise a small guard per status bucket in the multiset pass might be worth a follow-up.
✏️ Learnings added
Learnt from: hwadekar-nv
Repo: NVIDIA/infra-controller-rest PR: 438
File: api/pkg/api/handler/instance.go:3136-3154
Timestamp: 2026-04-29T23:54:10.850Z
Learning: In api/pkg/api/handler/instance.go (UpdateInstanceHandler.Handle), the NVLink unchanged fast-path intentionally includes NVLink interfaces in the Deleting state inside existingNvlIfcMap when building the multiset. If the incoming request's NVLink interfaces match the count and (NVLinkLogicalPartitionID, DeviceInstance) multiset of currently-Deleting interfaces, the operation is treated as a no-op — the handler avoids creating new Pending records while an in-progress deletion completes. Do not flag the inclusion of Deleting-status interfaces in existingNvlIfcMap as a bug.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: thossain-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 257
File: api/pkg/api/handler/vpcpeering.go:203-262
Timestamp: 2026-03-18T01:37:21.156Z
Learning: In Go handler code under api/pkg/api/handler, RBAC should not short-circuit on the first matching role (e.g., if isProviderAdmin { ... } else { ... }). For dual-role service accounts (e.g., Latinum), an org may have both ProviderAdmin and TenantAdmin. Evaluate both authorization paths independently, accumulate errors per path, and only reject if all applicable paths fail. Implement a helper that checks each role path and collects errors, then decide based on whether any path grants access. Add tests covering mixed-role scenarios and ensure CI gates enforce this behavior across all relevant handlers in this directory.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:09:55.214Z
Learning: In this codebase, the database interface is intentionally designed to persist the exact value it is given (including values derived from protobuf types, e.g., enum `.String()`). Repo-owned DB constants (e.g., `cdbm.VpcRoutingProfileTypeExternal`) are published for external callers/consumers, not as an indirection layer that the handler should use. Therefore, when reviewing this repository’s handlers, do not flag as a coupling issue the pattern of persisting a protobuf-derived string value directly via calls like `cdb.GetStrPtr(routingProfileTypeValue.String())`, as this is expected behavior rather than an accidental dependency on those string representations.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:22:33.123Z
Learning: Do not flag as an unacceptable coupling the persistence of protobuf enum values using the enum’s `.String()` output (e.g., storing `cdb.GetStrPtr(myProtoEnum.String())`) in inventory/round-trip paths. This codebase stores these `.String()` values because the gRPC back-end inventory process returns the same protobuf-derived string format, so DB round-trips must preserve the exact representation. If and when these inventory values change to plain strings with no backing proto enum, reassess the storage approach, but until then treat proto-enum `.String()` persistence as intentional and compatible.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 367
File: api/pkg/api/handler/machineinstancetype.go:593-606
Timestamp: 2026-04-10T16:39:25.404Z
Learning: In this codebase, Machine IDs are NOT UUIDs, while the deprecated Machine/InstanceType association IDs ARE UUIDs. When reviewing code that parses/validates incoming machine-related identifiers, treat `uuid.Parse` as a reliable discriminator between the deprecated UUID-based association IDs and the non-UUID Machine IDs. Do not raise/flag potential ambiguity between these two ID namespaces when a `uuid.Parse` check is used to separate them.
Learnt from: parmani-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 372
File: api/pkg/api/handler/fabric.go:137-143
Timestamp: 2026-04-15T23:49:00.903Z
Learning: In this repository, `cdb.ErrDoesNotExist` (from `db/pkg/db`) is a DB-layer sentinel error that is never wrapped. Therefore, in Go code that calls DAO/DB methods (e.g., `TenantSiteDAO.GetByTenantIDAndSiteID`), direct equality checks like `err == cdb.ErrDoesNotExist` / `err != cdb.ErrDoesNotExist` are intentional and safe. Do not flag these comparisons as robustness issues and do not suggest replacing them with `errors.Is()` for this specific sentinel.
| if len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount { | ||
| nvLinkInterfacesUnchanged = true | ||
| } else if len(apiRequest.NVLinkInterfaces) == existingReadyNvlIfcsCount { | ||
| nvLinkInterfacesUnchanged = true | ||
| } else if len(apiRequest.NVLinkInterfaces) == existingDeletingNvlIfcsCount { | ||
| nvLinkInterfacesUnchanged = true | ||
| } |
There was a problem hiding this comment.
Logic inconsistency: count check is per-status but multiset matching operates on all statuses combined.
The current implementation checks if the request count equals one of the three status-specific counts (Pending, Ready, or Deleting), but existingNvlIfcMap aggregates interfaces across all statuses. This permits false positives where a request matches a count bucket but consumes entries from different status categories during multiset matching.
Additionally, allowing the request to match existingDeletingNvlIfcsCount contradicts the PR objective: "ensuring those interfaces are not in the Deleting state."
Consider constructing separate maps per status or filtering existingNvlIfcMap to only include interfaces whose status corresponds to the matched count bucket.
Suggested approach
- // If the number of pending/ready/deeleting NVLink interfaces is unchanged, do nothing
- // because existing NVLink interfaces includes pending/ready/deleting NVLink interfaces on the same logical partition and device instance
- // hence we need to verify if incoming request contains the same number of pending/ready/deleting NVLink interfaces
- if len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount {
- nvLinkInterfacesUnchanged = true
- } else if len(apiRequest.NVLinkInterfaces) == existingReadyNvlIfcsCount {
- nvLinkInterfacesUnchanged = true
- } else if len(apiRequest.NVLinkInterfaces) == existingDeletingNvlIfcsCount {
- nvLinkInterfacesUnchanged = true
- }
+ // Only consider Ready and Pending interfaces as "active" bindings (exclude Deleting)
+ activeNvlIfcsCount := existingReadyNvlIfcsCount + existingPendingNvlIfcsCount
+ if len(apiRequest.NVLinkInterfaces) == activeNvlIfcsCount {
+ nvLinkInterfacesUnchanged = true
+ }Then filter existingNvlIfcMap to only include non-Deleting interfaces for consistent multiset matching.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount { | |
| nvLinkInterfacesUnchanged = true | |
| } else if len(apiRequest.NVLinkInterfaces) == existingReadyNvlIfcsCount { | |
| nvLinkInterfacesUnchanged = true | |
| } else if len(apiRequest.NVLinkInterfaces) == existingDeletingNvlIfcsCount { | |
| nvLinkInterfacesUnchanged = true | |
| } | |
| // Only consider Ready and Pending interfaces as "active" bindings (exclude Deleting) | |
| activeNvlIfcsCount := existingReadyNvlIfcsCount + existingPendingNvlIfcsCount | |
| if len(apiRequest.NVLinkInterfaces) == activeNvlIfcsCount { | |
| nvLinkInterfacesUnchanged = true | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/pkg/api/handler/instance.go` around lines 3159 - 3165, The count
comparison is wrong because existingNvlIfcMap aggregates all statuses; change
the logic so each branch filters existingNvlIfcMap to the corresponding-status
entries before doing the multiset match: when len(apiRequest.NVLinkInterfaces)
== existingPendingNvlIfcsCount, set nvLinkInterfacesUnchanged only after
building a map of Pending interfaces (from existingNvlIfcMap) and performing the
multiset compare against that filtered map; do the same for the Ready branch
using existingReadyNvlIfcsCount and Ready entries; remove or disallow treating
equality with existingDeletingNvlIfcsCount as unchanged (do not accept Deleting
entries) so the multiset compare never uses Deleting-status entries.
There was a problem hiding this comment.
Since existing Interfaces might be in the state one of those including Deleting and if incoming interfaces as in Deleting state then incoming operation would be no-op until existing interfaces completely gone. We can't say always len(apiRequest.NVLinkInterfaces) == activeNvlIfcsCount
There was a problem hiding this comment.
@hwadekar-nv, understood — thank you for the clarification. The intent is clear: if a deletion is already in progress (interfaces in Deleting state) and the incoming request specifies the same (NVLinkLogicalPartitionID, DeviceInstance) multiset, the operation should be treated as a no-op to avoid stacking a new Pending creation on top of an in-flight deletion. Matching against existingDeletingNvlIfcsCount is therefore correct and intentional.
I will retract that part of the review concern. I note there is a subtle edge case where the combined existingNvlIfcMap (built across all statuses) could allow a request to satisfy the count check of one status bucket (e.g., Ready) but actually match keys belonging to a different status (e.g., Deleting). However, since the end result is still a safe no-op — no duplicate DB records are created — this is acceptable behavior in your design.
🧠 Learnings used
Learnt from: hwadekar-nv
Repo: NVIDIA/infra-controller-rest PR: 438
File: api/pkg/api/handler/instance.go:3136-3154
Timestamp: 2026-04-29T23:54:10.850Z
Learning: In api/pkg/api/handler/instance.go (UpdateInstanceHandler.Handle), the NVLink unchanged fast-path intentionally includes NVLink interfaces in the Deleting state inside existingNvlIfcMap when building the multiset. If the incoming request's NVLink interfaces match the count and (NVLinkLogicalPartitionID, DeviceInstance) multiset of currently-Deleting interfaces, the operation is treated as a no-op — the handler avoids creating new Pending records while an in-progress deletion completes. Do not flag the inclusion of Deleting-status interfaces in existingNvlIfcMap as a bug.
Learnt from: thossain-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 257
File: api/pkg/api/handler/vpcpeering.go:203-262
Timestamp: 2026-03-18T01:37:21.156Z
Learning: In Go handler code under api/pkg/api/handler, RBAC should not short-circuit on the first matching role (e.g., if isProviderAdmin { ... } else { ... }). For dual-role service accounts (e.g., Latinum), an org may have both ProviderAdmin and TenantAdmin. Evaluate both authorization paths independently, accumulate errors per path, and only reject if all applicable paths fail. Implement a helper that checks each role path and collects errors, then decide based on whether any path grants access. Add tests covering mixed-role scenarios and ensure CI gates enforce this behavior across all relevant handlers in this directory.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:09:55.214Z
Learning: In this codebase, the database interface is intentionally designed to persist the exact value it is given (including values derived from protobuf types, e.g., enum `.String()`). Repo-owned DB constants (e.g., `cdbm.VpcRoutingProfileTypeExternal`) are published for external callers/consumers, not as an indirection layer that the handler should use. Therefore, when reviewing this repository’s handlers, do not flag as a coupling issue the pattern of persisting a protobuf-derived string value directly via calls like `cdb.GetStrPtr(routingProfileTypeValue.String())`, as this is expected behavior rather than an accidental dependency on those string representations.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:22:33.123Z
Learning: Do not flag as an unacceptable coupling the persistence of protobuf enum values using the enum’s `.String()` output (e.g., storing `cdb.GetStrPtr(myProtoEnum.String())`) in inventory/round-trip paths. This codebase stores these `.String()` values because the gRPC back-end inventory process returns the same protobuf-derived string format, so DB round-trips must preserve the exact representation. If and when these inventory values change to plain strings with no backing proto enum, reassess the storage approach, but until then treat proto-enum `.String()` persistence as intentional and compatible.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 367
File: api/pkg/api/handler/machineinstancetype.go:593-606
Timestamp: 2026-04-10T16:39:25.404Z
Learning: In this codebase, Machine IDs are NOT UUIDs, while the deprecated Machine/InstanceType association IDs ARE UUIDs. When reviewing code that parses/validates incoming machine-related identifiers, treat `uuid.Parse` as a reliable discriminator between the deprecated UUID-based association IDs and the non-UUID Machine IDs. Do not raise/flag potential ambiguity between these two ID namespaces when a `uuid.Parse` check is used to separate them.
Learnt from: parmani-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 372
File: api/pkg/api/handler/fabric.go:137-143
Timestamp: 2026-04-15T23:49:00.903Z
Learning: In this repository, `cdb.ErrDoesNotExist` (from `db/pkg/db`) is a DB-layer sentinel error that is never wrapped. Therefore, in Go code that calls DAO/DB methods (e.g., `TenantSiteDAO.GetByTenantIDAndSiteID`), direct equality checks like `err == cdb.ErrDoesNotExist` / `err != cdb.ErrDoesNotExist` are intentional and safe. Do not flag these comparisons as robustness issues and do not suggest replacing them with `errors.Is()` for this specific sentinel.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/pkg/api/handler/instance.go (1)
3136-3185:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBlock the NVLink no-op fast-path when any existing interface is in an unhandled status.
At Line 3159, an explicit clear request (
[]) becomes a no-op whenever the counted buckets are zero, even ifexistingNvlIfcsstill containsProvisioningorErrorrows. The same issue can happen for mixed-state instances, because those rows are inserted intoexistingNvlIfcMapbut never participate in the bucket check. That can silently ignore a real NVLink update.Suggested fix
// Count the number of NVLink interfaces by NVLink Logical Partition ID and Device Instance existingNvlIfcMap := make(map[string]int) existingReadyNvlIfcsCount := 0 existingPendingNvlIfcsCount := 0 existingDeletingNvlIfcsCount := 0 + hasUnhandledNvlIfcStatus := false for i := range existingNvlIfcs { key := fmt.Sprintf("%s:%d", existingNvlIfcs[i].NVLinkLogicalPartitionID.String(), existingNvlIfcs[i].DeviceInstance) - existingNvlIfcMap[key]++ switch existingNvlIfcs[i].Status { case cdbm.NVLinkInterfaceStatusReady: existingReadyNvlIfcsCount++ + existingNvlIfcMap[key]++ case cdbm.NVLinkInterfaceStatusPending: existingPendingNvlIfcsCount++ + existingNvlIfcMap[key]++ case cdbm.NVLinkInterfaceStatusDeleting: existingDeletingNvlIfcsCount++ + existingNvlIfcMap[key]++ default: - // Provisioning, Error, or other statuses are not counted in these buckets. + hasUnhandledNvlIfcStatus = true } } // If the number of pending/ready/deleting NVLink interfaces is unchanged, do nothing // because existing NVLink interfaces includes pending/ready/deleting NVLink interfaces on the same logical partition and device instance // hence we need to verify if incoming request contains the same number of pending/ready/deleting NVLink interfaces - if len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount { - nvLinkInterfacesUnchanged = true - } else if len(apiRequest.NVLinkInterfaces) == existingReadyNvlIfcsCount { - nvLinkInterfacesUnchanged = true - } else if len(apiRequest.NVLinkInterfaces) == existingDeletingNvlIfcsCount { - nvLinkInterfacesUnchanged = true + if !hasUnhandledNvlIfcStatus { + if len(apiRequest.NVLinkInterfaces) == existingPendingNvlIfcsCount { + nvLinkInterfacesUnchanged = true + } else if len(apiRequest.NVLinkInterfaces) == existingReadyNvlIfcsCount { + nvLinkInterfacesUnchanged = true + } else if len(apiRequest.NVLinkInterfaces) == existingDeletingNvlIfcsCount { + nvLinkInterfacesUnchanged = true + } }Based on learnings: keep
Deleting-state interfaces eligible for this no-op path; the issue is that statuses outsidePending/Ready/Deletingcan currently trigger it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/instance.go` around lines 3136 - 3185, The no-op fast-path incorrectly fires when existingNvlIfcs contains rows in unhandled statuses (e.g., Provisioning or Error) because only Pending/Ready/Deleting were bucketed; add a guard that detects any unhandled status during the existingNvlIfcs iteration (e.g., set hasUnhandledNvlIfcStatus = true in the default switch case) and then only allow nvLinkInterfacesUnchanged to remain true if hasUnhandledNvlIfcStatus is false (and the multiset counts still match). Update references around existingNvlIfcs, existingNvlIfcMap, nvLinkInterfacesUnchanged and the status-counting switch to implement this check so unhandled statuses block the no-op path while keeping Deleting eligible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@api/pkg/api/handler/instance.go`:
- Around line 3136-3185: The no-op fast-path incorrectly fires when
existingNvlIfcs contains rows in unhandled statuses (e.g., Provisioning or
Error) because only Pending/Ready/Deleting were bucketed; add a guard that
detects any unhandled status during the existingNvlIfcs iteration (e.g., set
hasUnhandledNvlIfcStatus = true in the default switch case) and then only allow
nvLinkInterfacesUnchanged to remain true if hasUnhandledNvlIfcStatus is false
(and the multiset counts still match). Update references around existingNvlIfcs,
existingNvlIfcMap, nvLinkInterfacesUnchanged and the status-counting switch to
implement this check so unhandled statuses block the no-op path while keeping
Deleting eligible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9963253c-fdd4-4d90-a5d9-aa1560d81622
📒 Files selected for processing (1)
api/pkg/api/handler/instance.go
cdbeef9 to
7bc9293
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/pkg/api/handler/instance.go (1)
3105-3167:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe no-op fast-path never actually skips the NVLink writes.
Even when
nvLinkInterfacesUnchangedbecomestrue, Lines 3105-3167 still create newPendingrows and mark the current rowsDeletingwheneverapiRequest.NVLinkInterfaces != nil. That defeats the PR fix for identical requests. Branch this block on!nvLinkInterfacesUnchanged, and in the no-op branch preserveexistingNvlIfcsas the returned/current state.Suggested control-flow fix
- if apiRequest.NVLinkInterfaces != nil { + if apiRequest.NVLinkInterfaces != nil { + if nvLinkInterfacesUnchanged { + newNvlIfcs = existingNvlIfcs + } else { nvllpIDs := goset.NewSet[uuid.UUID]() for _, apiNvlIfc := range apiRequest.NVLinkInterfaces { // NVLink Logical Partition nvllPartitionID, err := uuid.Parse(apiNvlIfc.NVLinkLogicalPartitionID) if err != nil { @@ if err != nil { logger.Error().Err(err).Msg("failed to update NVLink Interface records in DB") return cutil.NewAPIError(http.StatusInternalServerError, "Failed to update NVLink Interfaces for Instance, DB error", nil) } } + } } else { newNvlIfcs = existingNvlIfcs }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/instance.go` around lines 3105 - 3167, The NVLink write block guarded only by apiRequest.NVLinkInterfaces still runs even when nvLinkInterfacesUnchanged is true; change the control flow to only execute the creation+deletion logic when !nvLinkInterfacesUnchanged (i.e., wrap the entire block that parses NVLinkLogicalPartitionIDs, calls nvlIfcDAO.Create and UpdateMultiple) and in the else/no-op branch leave existingNvlIfcs unchanged and set newNvlIfcs (or the returned NVLink list) to the existingNvlIfcs so the handler returns the current state without creating Pending rows or marking Deleting.
🧹 Nitpick comments (1)
api/pkg/api/handler/instance_test.go (1)
5664-5691: ⚡ Quick winAdd an explicit no-op DB assertion for NVLink interface state.
This new case targets “unchanged NVLink interfaces => no-op”, but it does not assert the key postcondition that existing NVLink interfaces are not moved to
Deleting.Proposed test-harness addition
type args struct { ... nvLinkGpuConfigsVerifyCountOnly bool + expectNoNVLinkDeleting bool }{ name: "test Instance update API endpoint success when NVLink interfaces unchanged (no-op, multiset order)", ... args: args{ ... nvLinkGpuConfigsVerifyCountOnly: true, + expectNoNVLinkDeleting: true, },+ if tt.args.expectNoNVLinkDeleting { + nvlIfcDAO := cdbm.NewNVLinkInterfaceDAO(tt.fields.dbSession) + deleting, _, _ := nvlIfcDAO.GetAll( + ec.Request().Context(), + nil, + cdbm.NVLinkInterfaceFilterInput{ + InstanceIDs: []uuid.UUID{reqIns.ID}, + Statuses: []string{cdbm.NVLinkInterfaceStatusDeleting}, + }, + cdbp.PageInput{}, + nil, + ) + assert.Len(t, deleting, 0) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/instance_test.go` around lines 5664 - 5691, This test case ("test Instance update API endpoint success when NVLink interfaces unchanged (no-op, multiset order)") is missing an explicit DB assertion that existing NVLink interfaces were not marked Deleting; after calling the handler, query the NVLink interface rows for the instance (use inst13.ID / NVLinkInterfaces tied to nvllp1/nvllp2 via DeviceInstance) via the test dbSession and assert each NVLink interface's State (or Status) field is still the live state (not "Deleting"); add this assertion to the test’s verify/tear-down block so the handler no-op path verifies DB rows were not transitioned to Deleting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/instance_test.go`:
- Around line 6155-6160: When nvLinkGpuConfigsVerifyCountOnly is true, the test
currently only checks length; instead verify the multiset of
(NVLinkLogicalPartitionID, DeviceInstance) pairs regardless of order by
comparing counts of each pair between siteReq.Config.Nvlink.GpuConfigs and
tt.args.reqData.NVLinkInterfaces. Locate the branch guarded by
nvLinkGpuConfigsVerifyCountOnly and replace the length-only assertion with logic
that iterates siteReq.Config.Nvlink.GpuConfigs and
tt.args.reqData.NVLinkInterfaces, constructs a map/counter keyed by the tuple
(LogicalPartitionId.Value, DeviceInstance) and asserts the two maps are equal so
order differences do not cause false positives.
In `@api/pkg/api/handler/instance.go`:
- Around line 3027-3040: The closure is shadowing outer variables causing them
to be inaccessible later: remove the inner declaration of newNvlIfcs and stop
using := for existingNvlIfcs and nvLinkInterfacesUnchanged inside the
transaction closure so they reassign the outer variables instead; specifically,
delete the inner "var newNvlIfcs" declaration, change the GetAll assignment from
":=" to "=" for existingNvlIfcs (handling the err as currently done), and move
the declaration of nvLinkInterfacesUnchanged to the outer scope (e.g., before
the transaction block) so the values set by the NVLinkInterfaceDAO/GetAll flow
are visible after the transaction. Ensure NVLinkInterfaceDAO
(NewNVLinkInterfaceDAO), GetAll, and any error handling continue to work with
the reassigned variables.
---
Outside diff comments:
In `@api/pkg/api/handler/instance.go`:
- Around line 3105-3167: The NVLink write block guarded only by
apiRequest.NVLinkInterfaces still runs even when nvLinkInterfacesUnchanged is
true; change the control flow to only execute the creation+deletion logic when
!nvLinkInterfacesUnchanged (i.e., wrap the entire block that parses
NVLinkLogicalPartitionIDs, calls nvlIfcDAO.Create and UpdateMultiple) and in the
else/no-op branch leave existingNvlIfcs unchanged and set newNvlIfcs (or the
returned NVLink list) to the existingNvlIfcs so the handler returns the current
state without creating Pending rows or marking Deleting.
---
Nitpick comments:
In `@api/pkg/api/handler/instance_test.go`:
- Around line 5664-5691: This test case ("test Instance update API endpoint
success when NVLink interfaces unchanged (no-op, multiset order)") is missing an
explicit DB assertion that existing NVLink interfaces were not marked Deleting;
after calling the handler, query the NVLink interface rows for the instance (use
inst13.ID / NVLinkInterfaces tied to nvllp1/nvllp2 via DeviceInstance) via the
test dbSession and assert each NVLink interface's State (or Status) field is
still the live state (not "Deleting"); add this assertion to the test’s
verify/tear-down block so the handler no-op path verifies DB rows were not
transitioned to Deleting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d5dcb542-360e-4125-974b-9d481aecdfe5
📒 Files selected for processing (2)
api/pkg/api/handler/instance.goapi/pkg/api/handler/instance_test.go
| } | ||
|
|
||
| // Check if any DPU Extension Service Deployments are being requested to be created or removed | ||
| existingDesdMap := map[string]*cdbm.DpuExtensionServiceDeployment{} | ||
| for _, desd := range existingDesds { | ||
| existingDesdMap[fmt.Sprintf("%s:%s", desd.DpuExtensionServiceID.String(), desd.Version)] = &desd | ||
| } | ||
|
|
||
| updatedDesdMap := map[string]*cdbm.DpuExtensionServiceDeployment{} | ||
|
|
||
| // If DPU extension services were omitted from the request, preserve existing | ||
| // services in the config we send to Site Controller. | ||
| if apiRequest.DpuExtensionServiceDeployments == nil { | ||
| updateDesds = append(updateDesds, existingDesds...) | ||
| } else { | ||
| for _, adesdr := range apiRequest.DpuExtensionServiceDeployments { | ||
| desdID, err := uuid.Parse(adesdr.DpuExtensionServiceID) | ||
| if err != nil { | ||
| return cutil.NewAPIError(http.StatusBadRequest, fmt.Sprintf("Invalid DPU Extension Service ID: %s specified in request", adesdr.DpuExtensionServiceID), nil) | ||
| } | ||
|
|
||
| desvID := fmt.Sprintf("%s:%s", desdID.String(), adesdr.Version) | ||
| // Create new NVLink Interface records in the DB if specified in request | ||
| var newNvlIfcs []cdbm.NVLinkInterface | ||
| nvlIfcDAO := cdbm.NewNVLinkInterfaceDAO(uih.dbSession) | ||
|
|
||
| existingDesd, exists := existingDesdMap[desvID] | ||
| if exists { | ||
| updateDesds = append(updateDesds, *existingDesd) | ||
| updatedDesdMap[desvID] = existingDesd | ||
| } else { | ||
| newDesd, serr := desdDAO.Create(ctx, tx, cdbm.DpuExtensionServiceDeploymentCreateInput{ | ||
| SiteID: site.ID, | ||
| TenantID: tenant.ID, | ||
| InstanceID: instance.ID, | ||
| DpuExtensionServiceID: desdID, | ||
| Version: adesdr.Version, | ||
| Status: cdbm.DpuExtensionServiceDeploymentStatusPending, | ||
| CreatedBy: dbUser.ID, | ||
| }) | ||
| if serr != nil { | ||
| logger.Error().Err(serr).Msg("error creating Instance DpuExtensionServiceDeployment record in DB") | ||
| return cutil.NewAPIError(http.StatusInternalServerError, "Failed to create DPU Extension Service Deployment for Instance, DB error", nil) | ||
| } | ||
| des, _ := desIDMap[desdID] | ||
| newDesd.DpuExtensionService = des | ||
| updateDesds = append(updateDesds, *newDesd) | ||
| updatedDesdMap[desvID] = newDesd | ||
| } | ||
| // OrderAscending is our best-effort to make sure we send Carbide the interfaces in the order it originally received them. so the config doesn't get rejected | ||
| existingNvlIfcs, _, err := nvlIfcDAO.GetAll(ctx, tx, cdbm.NVLinkInterfaceFilterInput{InstanceIDs: []uuid.UUID{instanceID}}, cdbp.PageInput{OrderBy: &cdbp.OrderBy{Field: cdbm.NVLinkInterfaceOrderByCreated, Order: cdbp.OrderAscending}}, nil) | ||
| if err != nil { | ||
| logger.Error().Err(err).Msg("failed to retrieve NVLink Interfaces details for Instance") | ||
| return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve NVLink interfaces for Instance, DB error", nil) | ||
| } | ||
|
|
||
| nvLinkInterfacesUnchanged := false |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = Path("api/pkg/api/handler/instance.go")
lines = path.read_text().splitlines()
for start, end in [(3018, 3042), (3398, 3407)]:
print(f"\n--- {path}:{start}-{end} ---")
for i in range(start, end + 1):
print(f"{i}: {lines[i-1]}")
print("\n--- occurrences of NVLink response-state symbols ---")
for i, line in enumerate(lines, 1):
if any(tok in line for tok in ("newNvlIfcs", "existingNvlIfcs", "nvLinkInterfacesUnchanged")):
print(f"{i}: {line}")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 4075
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = Path("api/pkg/api/handler/instance.go")
lines = path.read_text().splitlines()
# Show broader context around transaction boundaries
print("--- Transaction entry (around line 2700) ---")
for i in range(2698, 2730):
print(f"{i}: {lines[i-1]}")
print("\n--- Transaction exit (around line 3410) ---")
for i in range(3400, 3415):
print(f"{i}: {lines[i-1]}")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 2392
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = Path("api/pkg/api/handler/instance.go")
lines = path.read_text().splitlines()
# Look for the closure-ending pattern: likely a }) at the end of the WithTx
# Search from line 3150 onwards for the closing brace of the transaction
print("--- Looking for transaction closure end (line 3100-3300) ---")
for i in range(3100, 3300):
line = lines[i-1]
# Show lines with unmatched closing braces that might end the closure
if i >= 3200 and line.strip() in ["}", "})"] :
print(f"{i}: {line}")
elif i >= 3200 and i <= 3230:
print(f"{i}: {line}")
# Also check around line 3400 to see structure
print("\n--- Around line 3400-3410 (response assembly) ---")
for i in range(3395, 3415):
print(f"{i}: {lines[i-1]}")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 2528
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
import re
path = Path("api/pkg/api/handler/instance.go")
lines = path.read_text().splitlines()
# Find where the WithTx closure ends by searching for "}) " pattern
# The closure likely ends with "}) " (closing brace-paren-space)
print("--- Searching for WithTx closure end ---")
for i in range(3300, 3420):
line = lines[i-1]
# Look for the specific pattern of closure end
if re.match(r'^\s*\})\s*$', line):
print(f"FOUND CLOSURE END at {i}: {line}")
# Print surrounding context
for j in range(max(3300, i-5), min(3420, i+10)):
print(f"{j}: {lines[j-1]}")
break
elif i <= 3320 or i >= 3380:
print(f"{i}: {lines[i-1]}")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 796
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = Path("api/pkg/api/handler/instance.go")
lines = path.read_text().splitlines()
# Find where the WithTx closure ends by looking for }) at start of line
print("--- Searching for WithTx closure end (line 3300+) ---")
for i in range(3300, 3420):
line = lines[i-1]
if line.strip() == '})' or line.rstrip().endswith('})'):
print(f"FOUND CLOSURE END at {i}: {repr(line)}")
# Print surrounding context
for j in range(max(3300, i-3), min(3420, i+5)):
prefix = ">>>" if j == i else " "
print(f"{prefix} {j}: {lines[j-1]}")
break
# Also print some context every 20 lines
if i % 20 == 0:
print(f"{i}: {lines[i-1]}")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 691
Fix critical scope error: inner variable declarations shadow and become inaccessible outside the transaction closure.
Line 3030 redeclares newNvlIfcs, and line 3034 shadows existingNvlIfcs with the := operator inside the closure. Additionally, line 3040 declares nvLinkInterfacesUnchanged locally within the closure. These shadowed variables are then referenced outside the closure at line 3405–3407, where they are either empty or out of scope entirely. This is a compile-time error.
Remove the var newNvlIfcs declaration at line 3030. Reassign to the outer variables directly: change line 3034 to use = instead of := (with appropriate error handling), and move nvLinkInterfacesUnchanged declaration outside the closure (before line 2721) or refactor to avoid out-of-scope references.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/pkg/api/handler/instance.go` around lines 3027 - 3040, The closure is
shadowing outer variables causing them to be inaccessible later: remove the
inner declaration of newNvlIfcs and stop using := for existingNvlIfcs and
nvLinkInterfacesUnchanged inside the transaction closure so they reassign the
outer variables instead; specifically, delete the inner "var newNvlIfcs"
declaration, change the GetAll assignment from ":=" to "=" for existingNvlIfcs
(handling the err as currently done), and move the declaration of
nvLinkInterfacesUnchanged to the outer scope (e.g., before the transaction
block) so the values set by the NVLinkInterfaceDAO/GetAll flow are visible after
the transaction. Ensure NVLinkInterfaceDAO (NewNVLinkInterfaceDAO), GetAll, and
any error handling continue to work with the reassigned variables.
bae89d8 to
728c5b1
Compare
36b6c99 to
94f45cd
Compare
thossain-nv
left a comment
There was a problem hiding this comment.
Let's reduce the grace period to 90s and fix the final suggestions before merging.
| // This interface is already ready, we don't need to re-issue the NVLink interface | ||
| existingReadyNvlIfcsCount++ | ||
| } else { | ||
| if mostRecentNvlIfc.Updated.Before(time.Now().Add(-NVLinkInterfaceStatusSyncGraceWindow)) { |
There was a problem hiding this comment.
This should be mostRecentNvlIfc.Updated.After(time.Now().Add(-NVLinkInterfaceStatusSyncGraceWindow)) { as we want the no-op to happen if the Interfaces were updated during grace period not before.
57a9a2c to
eecca29
Compare
eecca29 to
1cc4217
Compare
Description
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes