OCPCLOUD-2664: Aggregate controller statuses in clusteroperator controller#574

Open
mdbooth wants to merge 4 commits into
openshift:mainfrom
openshift-cloud-team:OCPCLOUD-2664-co-aggregatator-01

Conversation

@mdbooth

@mdbooth mdbooth commented Jun 2, 2026

Contributor

The revision and installer controllers currently write 'SubConditions' to the ClusterOperator, prefixed RevisionController and InstallerController respectively, but these conditions are not reflected in the primary conditions published on the cluster-api ClusterOperator object.

Meanwhile the following controllers use ClusterOperatorStatusClient, which writes directly to the ClusterOperator's primary conditions:

  • corecluster
  • infracluster
  • secretsync
  • kubeconfig

This means that the published primary conditions are currently set non-deterministically by whichever of the above 4 controllers reconciled most recently.

This change is the first of a series of PRs to fix this. It rewrites the clusteroperator controller to aggregate the conditions of subcontrollers. However, the situation will not be fully fixed until all 4 of the above-mentioned controllers have also been updated to use operatorstatus.ReconcileResult for condition writing and have been integrated into the aggregating controller.
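A minimal sketch of the aggregation idea described above, not the PR's actual implementation: rather than each sub-controller overwriting the primary condition (last writer wins), each one publishes its own sub-condition and a single aggregator derives the primary condition from all of them. The `subCondition` type, the `aggregateAvailable` function, and the messages are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// subCondition is a hypothetical, simplified stand-in for a per-controller
// sub-condition written to the ClusterOperator.
type subCondition struct {
	Controller string
	Available  bool
	Message    string
}

// aggregateAvailable derives one primary Available value from all
// sub-conditions. The result depends only on the set of inputs, not on
// which controller happened to reconcile most recently.
func aggregateAvailable(subs []subCondition) (bool, string) {
	var unavailable []string
	for _, s := range subs {
		if !s.Available {
			unavailable = append(unavailable, s.Controller+": "+s.Message)
		}
	}
	if len(unavailable) == 0 {
		return true, "all controllers available"
	}
	sort.Strings(unavailable) // stable message regardless of input order
	return false, strings.Join(unavailable, "; ")
}

func main() {
	subs := []subCondition{
		{Controller: "InstallerController", Available: true},
		{Controller: "RevisionController", Available: false, Message: "revision not yet applied"},
	}
	ok, msg := aggregateAvailable(subs)
	fmt.Println(ok, msg)
}
```

Because the aggregator is the only writer of the primary condition, reordering the input slice does not change the published result.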

Additionally, this change ensures that we don't publish an updated operator version until the CAPI rollout is complete. We do this by moving responsibility for writing the operator version to the revision controller.
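As a rough illustration of the version gating described above, assuming "rollout complete" means the current revision matches the latest one (the revision tests discussed later in this PR are phrased that way). The `rolloutState` type, `versionToPublish` function, and the version string are hypothetical and only serve the example.

```go
package main

import "fmt"

// rolloutState is a hypothetical summary of what the revision controller knows.
type rolloutState struct {
	CurrentRevision string // revision currently applied
	LatestRevision  string // revision desired by this operator release
}

// versionToPublish returns the release version and true only once the rollout
// has converged. Otherwise the previously published version is left untouched.
func versionToPublish(s rolloutState, releaseVersion string) (string, bool) {
	if s.CurrentRevision == "" || s.CurrentRevision != s.LatestRevision {
		return "", false
	}
	return releaseVersion, true
}

func main() {
	states := []rolloutState{
		{CurrentRevision: "rev1", LatestRevision: "rev2"}, // rollout in progress
		{CurrentRevision: "rev2", LatestRevision: "rev2"}, // rollout complete
	}
	for _, s := range states {
		v, ok := versionToPublish(s, "5.0.0") // illustrative version string
		fmt.Printf("current=%s latest=%s publish=%t version=%q\n",
			s.CurrentRevision, s.LatestRevision, ok, v)
	}
}
```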

I recommend reviewing this change by commit.

Summary by CodeRabbit

  • Chores

    • Run code generation as part of verify/build so generated code is up-to-date before lint/format/build; stringer added to toolchain.
  • Refactor

    • Consolidated ClusterOperator status aggregation and explicit operator-version tracking for clearer, consistent status updates.
    • Standardized controller result/reason handling with stringified reason values.
  • Tests

    • Reworked tests and test-suite bootstrapping to validate aggregated status conditions and operator-version behavior.
  • Bug Fixes

    • Improved platform validation with explicit handling when infrastructure platform info is missing.
    • Adjusted degraded/available reason semantics to improve status messaging.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 2, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 2, 2026

@mdbooth: This pull request references OCPCLOUD-2664 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

  • operatorstatus: add WithUpdateOperatorVersion()
  • operatorstatus: convert Reason to iota type
  • Move version writing to revision controller
  • clusteroperator: aggregate sub-controller conditions

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Introduce a typed Reason enum with stringer, add code generation in the build, refactor operatorstatus to aggregate/merge/write ClusterOperator conditions and optionally update operator versions, export controller ResultGenerator instances, and update controllers and tests to the new status flow.
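A small sketch of what an ordered, typed Reason enum can look like. `ReasonAsExpected` and `ReasonEphemeralError` appear in this PR; the other constants, their relative order, and the strings returned by `String()` are assumptions for illustration. The PR generates `String()` with stringer, whereas it is written by hand here to keep the example self-contained.

```go
package main

import "fmt"

// Reason is an ordered enum: a larger value is treated as more severe.
type Reason int

const (
	ReasonAsExpected        Reason = iota
	ReasonProgressing              // hypothetical
	ReasonEphemeralError
	ReasonNonRetryableError        // hypothetical
)

// reasonNames is indexed by Reason value, so it must stay in const order.
var reasonNames = [...]string{"AsExpected", "Progressing", "EphemeralError", "NonRetryableError"}

// String makes Reason satisfy fmt.Stringer.
func (r Reason) String() string {
	if r < 0 || int(r) >= len(reasonNames) {
		return fmt.Sprintf("Reason(%d)", int(r))
	}
	return reasonNames[r]
}

// mostSevere relies on the integer ordering of the enum, which plain
// string constants could not provide.
func mostSevere(reasons ...Reason) Reason {
	worst := ReasonAsExpected
	for _, r := range reasons {
		if r > worst {
			worst = r
		}
	}
	return worst
}

func main() {
	fmt.Println(mostSevere(ReasonAsExpected, ReasonEphemeralError, ReasonProgressing))
}
```

The ordering is the point of the conversion: aggregation can compare reasons numerically and only stringify them when writing the condition.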

Changes

Operator Status System Refactoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Build tooling and generate target | Makefile, go.mod | Added generate Makefile target and golang.org/x/tools/cmd/stringer tool to run go generate ./... before fmt/lint/build. |
| Typed Reason enum and stringer | pkg/operatorstatus/controller_status.go, pkg/operatorstatus/reason_string.go, pkg/operatorstatus/operator_status.go | Introduce Reason typed enum, generate Reason.String(), update condition creation to accept Reason and write reason.String(). |
| operatorstatus core: result/merge/write and predicates | pkg/operatorstatus/controller_status.go, pkg/operatorstatus/watch_predicates.go | Add operatorVersion to ReconcileResult with WithUpdateOperatorVersion, export MergeConditions, rewrite WriteClusterOperatorStatus to compute/merge/patch conditions and optionally status.versions, and add ClusterOperatorStatusChanged() predicate using semantic deep-equality. |
| operatorstatus tests & envtest harness | pkg/operatorstatus/controller_status_test.go | Migrate controller-status tests to envtest, add helpers for seeding/SSA checks, switch reason assertions to Reason*.String(), and update MergeConditions tests. |
| ClusterOperator controller aggregation and write | pkg/controllers/clusteroperator/clusteroperator_controller.go | Reshape controller to use client.Client, aggregate sub-controller Available/Progressing into top-level conditions, compute aggregated reason/message, and write status via apply-configuration patching (including operator version on unsupported platforms). |
| Main wiring and platform handling | cmd/capi-operator/main.go | Load feature gates before platform computation, return sentinel when infra.Status.PlatformStatus is nil, compute supportedPlatform from feature gates and infra, and wire ClusterOperatorController with Client, ReleaseVersion, and IsUnsupportedPlatform. |
| ClusterOperator tests and suite updates | pkg/controllers/clusteroperator/*, pkg/controllers/clusteroperator/suite_test.go | Refactor ClusterOperator tests to context-aware/table-driven Ginkgo, add timeouts and kWithCtx helper, move envtest teardown into DeferCleanup, and update manager start/stop wiring to set controller fields directly. |
| Export controller ResultGenerator and propagate usage | pkg/controllers/installer/installer_controller.go, pkg/controllers/revision/revision_controller.go | Export controller ResultGenerator variables and replace unexported opresult usage across reconcile return paths. |
| Revision controller tests, secretsync and misc updates | pkg/controllers/revision/*, pkg/controllers/secretsync/secret_sync_controller.go, pkg/controllers/common_consts.go, pkg/test/conditions.go | Extend revision tests to assert operator version present/absent in lifecycle scenarios, change secret-sync reasons to ReasonEphemeralError, remove OperatorVersionKey from common_consts, and make test matcher handle fmt.Stringer. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| No-Sensitive-Data-In-Logs | ❌ Error | Error() method at controller_status.go:249 exposes raw error strings to ClusterOperator status conditions visible in logs, risking internal hostname/infrastructure detail disclosure. | Sanitize error messages before writing to status conditions to redact sensitive details like hostnames, IPs, and pod names. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.35% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Revision controller tests have 12 Eventually calls missing WithTimeout. They default to Gomega's 1s timeout instead of explicit constants like clusteroperator tests (defaultEventuallyTimeout=5s). | Add WithTimeout(defaultEventuallyTimeout) to all Eventually calls in revision test helpers and tests; define defaultEventuallyTimeout constant in revision suite_test.go. |
| Microshift Test Compatibility | ⚠️ Warning | New e2e tests use configv1.Infrastructure/PlatformType (config.openshift.io) and Machine/MachineSet (machine.openshift.io) APIs unavailable on MicroShift, without skip protections. | Add [apigroup:config.openshift.io] and [apigroup:machine.openshift.io] tags to test names, or wrap with exutil.IsMicroShiftCluster() guards. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: aggregating controller statuses into the clusteroperator controller, with a direct reference to the Jira ticket. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All Ginkgo test names and Go test function names in the PR are stable and deterministic with no dynamic content like pod names, timestamps, UUIDs, or generated identifiers. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No e2e tests were added in this PR. All test file changes are unit tests in pkg/ directory using envtest. The check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | No scheduling constraints added. This PR refactors controller status aggregation logic only, without introducing affinity rules, nodeSelectors, topology spread constraints, or PodDisruptionBudgets. |
| Ote Binary Stdout Contract | ✅ Passed | No stdout contract violations found. All process-level logging properly configured to stderr via klog.LogToStderr(true), logger usage, and fmt.Fprintf(os.Stderr,...) in TestMain. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR contains no new Ginkgo e2e tests. All test modifications are in unit tests (pkg/ directories using envtest), not e2e tests. Custom check does not apply. |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons were found in the PR changes. |
| Container-Privileges | ✅ Passed | No privileged container settings (privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation) were introduced in manifests or Dockerfile. |

@openshift-ci openshift-ci Bot requested review from damdo and racheljpg June 2, 2026 16:18
@openshift-ci

openshift-ci Bot commented Jun 2, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign damdo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/controllers/revision/revision_controller.go (1)

48-57: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix build: ResultGenerator cannot be a const (it’s a function call)

ResultGenerator is declared inside a const block as operatorstatus.ControllerResultGenerator(controllerName), which is not a compile-time constant, so the build will fail. Switch it to a var (same issue appears in pkg/controllers/installer/installer_controller.go too).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/revision/revision_controller.go` around lines 48 - 57, The
declaration of ResultGenerator inside the const block is invalid because
operatorstatus.ControllerResultGenerator(controllerName) is not a compile-time
constant; change ResultGenerator from a const to a var (move it out of the const
block and declare: var ResultGenerator =
operatorstatus.ControllerResultGenerator(controllerName)) so the function call
is allowed at init time; apply the same fix for the identical declaration in
pkg/controllers/installer/installer_controller.go referencing the Installer
controller name.
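The compile-time rule behind this finding can be shown in isolation. In Go, a `const` must be initialised with a constant expression, and a function call that returns a struct is not one, so the value has to be a package-level `var`. The `resultGenerator` type and `newResultGenerator` function below are hypothetical stand-ins for whatever `operatorstatus.ControllerResultGenerator` returns.

```go
package main

import "fmt"

const controllerName = "RevisionController"

// resultGenerator is a hypothetical stand-in for the generator type.
type resultGenerator struct{ controller string }

func newResultGenerator(name string) resultGenerator {
	return resultGenerator{controller: name}
}

// The following would not compile, because a function call is not a
// constant expression:
//
//	const ResultGenerator = newResultGenerator(controllerName)
//
// A package-level var is initialised once, before main runs.
var ResultGenerator = newResultGenerator(controllerName)

func main() {
	fmt.Println(ResultGenerator.controller)
}
```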
🧹 Nitpick comments (1)
pkg/controllers/revision/revision_controller_test.go (1)

353-407: ⚡ Quick win

Align the new Ginkgo specs with the repo’s test style.

These new It(...) titles should use the should ... form, and the phase comments are better expressed with By(...) so the spec output stays structured.

As per coding guidelines, "Use descriptive test names in 'should...' format in Ginkgo tests" and "Use By() for test steps to document test phases in Ginkgo tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/revision/revision_controller_test.go` around lines 353 - 407,
Rename the two Ginkgo It specs to use "should ..." phrasing (e.g., change
It("sets operator version when current revision matches latest", ...) to
It("should set operator version when current revision matches latest", ...) and
change the other to "should correct stale operator version when current revision
matches latest"). Replace inline phase comments with By(...) calls to document
steps: convert comments like "// BeforeEach created rev1. Set CurrentRevision to
match." and "// Seed an incorrect operator version", "// Trigger a reconcile",
"// Wait for the version to appear"/"// Wait for the version to be corrected"
into By("...") calls placed immediately before the code that performs each phase
(use the existing identifiers clusterAPI, co, coKey,
operatorstatus.OperatorVersionKey, kWithCtx, cl, metav1.SetMetaDataAnnotation to
find the blocks).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/capi-operator/main.go`:
- Around line 124-130: The code reads infra.Status.PlatformStatus.Type via
util.IsCAPIEnabledForPlatform(...) without guarding that PlatformStatus may be
nil, which can panic at startup; before computing supportedPlatform call the
existing validation/guard that ensures infra.Status.PlatformStatus is non-nil
(or explicitly check infra.Status.PlatformStatus != nil) and only then pass
PlatformStatus.Type to util.IsCAPIEnabledForPlatform, otherwise treat the
platform as unsupported; update the sequence around the
ClusterOperatorController construction (the IsUnsupportedPlatform value) so
supportedPlatform is derived after the nil-check/validation.

In `@pkg/controllers/clusteroperator/clusteroperator_controller.go`:
- Around line 196-197: The user-facing Available message currently says "Cluster
CAPI Operator"; update it to use the user-facing term "Cluster API" instead of
"CAPI". Locate the availableMessage string construction (availableMessage :=
fmt.Sprintf("Cluster CAPI Operator is available at %s", r.ReleaseVersion)) and
change the literal to "Cluster API Operator is available at %s" so the appended
condition (condition(configv1.OperatorAvailable, configv1.ConditionTrue,
operatorstatus.ReasonAsExpected, availableMessage)) reports the correct
user-facing wording.

In `@pkg/operatorstatus/controller_status.go`:
- Around line 340-344: The code appends a new operator version with
status.WithVersions which can duplicate an existing entry; before calling
WithVersions, filter the current status' Versions to remove any entry whose Name
== OperatorVersionKey, then call WithVersions only once with the filtered slice
plus the new OperandVersion configured with OperatorVersionKey and
r.operatorVersion; reference the existing symbols status, r.operatorVersion,
WithVersions, and OperatorVersionKey when implementing this replacement so the
old operator entry is removed before adding the updated one.
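A sketch of the replace-rather-than-append behaviour the last finding asks for, using plain structs instead of the real apply-configuration types. The `operandVersion` type, the `setOperatorVersion` helper, and the key value "operator" are assumptions made for this example.

```go
package main

import "fmt"

// operandVersion is a simplified stand-in for an entry in status.versions.
type operandVersion struct {
	Name    string
	Version string
}

// operatorVersionKey is an assumed value for the example.
const operatorVersionKey = "operator"

// setOperatorVersion drops any existing operator entry before adding the new
// one, so the resulting slice never holds two entries with the same name.
func setOperatorVersion(versions []operandVersion, version string) []operandVersion {
	out := make([]operandVersion, 0, len(versions)+1)
	for _, v := range versions {
		if v.Name != operatorVersionKey {
			out = append(out, v)
		}
	}
	return append(out, operandVersion{Name: operatorVersionKey, Version: version})
}

func main() {
	current := []operandVersion{
		{Name: operatorVersionKey, Version: "old"},
		{Name: "other", Version: "1"},
	}
	fmt.Println(setOperatorVersion(current, "new"))
}
```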

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0ee08c70-9976-436d-a70c-d7173fe61749

📥 Commits

Reviewing files that changed from the base of the PR and between 3f1aa11 and aebb6d0.

⛔ Files ignored due to path filters (2)
  • vendor/golang.org/x/tools/cmd/stringer/stringer.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (17)
  • Makefile
  • cmd/capi-operator/main.go
  • go.mod
  • pkg/controllers/clusteroperator/clusteroperator_controller.go
  • pkg/controllers/clusteroperator/clusteroperator_controller_test.go
  • pkg/controllers/clusteroperator/suite_test.go
  • pkg/controllers/common_consts.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/controllers/revision/revision_controller.go
  • pkg/controllers/revision/revision_controller_test.go
  • pkg/controllers/secretsync/secret_sync_controller.go
  • pkg/operatorstatus/controller_status.go
  • pkg/operatorstatus/controller_status_test.go
  • pkg/operatorstatus/operator_status.go
  • pkg/operatorstatus/reason_string.go
  • pkg/operatorstatus/watch_predicates.go
  • pkg/test/conditions.go
💤 Files with no reviewable changes (1)
  • pkg/controllers/common_consts.go

Comment thread cmd/capi-operator/main.go
Comment thread pkg/controllers/clusteroperator/clusteroperator_controller.go Outdated
Comment thread pkg/operatorstatus/controller_status.go
mdbooth added 3 commits June 2, 2026 18:58

Add WithUpdateOperatorVersion() to ReconcileResult, allowing controllers
to optionally update the operator version in the ClusterOperator status
when writing their conditions.

Also switches controller_status tests from fake client to envtest for
accurate SSA field ownership testing.

Convert Reason from string constants to an ordered iota type, enabling
severity-based comparison for condition aggregation.

Replaces ReasonSyncFailed with the standardised ReasonEphemeralError.

ClusterOperator version is now written by the revision controller
instead of the clusteroperator controller.
@mdbooth mdbooth force-pushed the OCPCLOUD-2664-co-aggregatator-01 branch from aebb6d0 to c238311 Compare June 2, 2026 18:02

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/clusteroperator/clusteroperator_controller.go`:
- Around line 221-224: The controller currently only uses
operatorstatus.ClusterOperatorConditionsChanged() so Reconcile won't run for
status.Versions-only updates (it owns operatorstatus.OperatorVersionKey),
causing version drift; update the controller builder (the
For(&configv1.ClusterOperator{})/Named(controllerName) chain) to include an
additional predicate that also enqueues on status.Versions changes or when the
OperatorVersionKey value differs (e.g., add a predicate.Funcs with an UpdateFunc
that compares old.Status.Versions vs new.Status.Versions and/or extracts
operatorstatus.OperatorVersionKey to trigger a requeue), so Reconcile runs for
version-only updates as well.
- Around line 70-75: The current logic only calls writeStatus when
operatorstatus.MergeConditions reports changes, which misses updates to
co.Status.Versions; add a small helper (e.g., getOperandVersionByName or
readOperandVersion) to lookup the current operand entry in co.Status.Versions by
name and compare its Version to the desired operand version, and then update the
reconciliation block so that if either MergeConditions(updated)==true OR the
operand version comparison shows a delta you call writeStatus(ctx, co,
conditions) to persist the new Versions entry; reference MergeConditions,
writeStatus, and co.Status.Versions when locating where to add the helper and
the extra comparison.
- Around line 183-190: Filter subcontrollerStatuses to only those whose
progressing condition is true before selecting the max reason: build a slice
(e.g., progressingStatuses) containing entries where entry.progressing.status ==
configv1.ConditionTrue, then use slices.MaxFunc(progressingStatuses, ...) to
compute priorityProgressing; only append the OperatorProgressing condition using
priorityProgressing when progressingStatuses is non-empty (keep existing
isProgressing check) so non-progressing controllers' reasons are not considered;
update references to priorityProgressing accordingly and ensure no
nil/empty-slice access occurs.
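The second finding above describes a lookup helper and an extra comparison. A sketch of what that suggestion amounts to follows; whether the author adopted it is not stated in this thread. The names `operandVersion`, `operandVersionByName`, and `needsStatusWrite` are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// operandVersion is a simplified stand-in for an entry in status.versions.
type operandVersion struct{ Name, Version string }

// operandVersionByName returns the version recorded under name, if any.
func operandVersionByName(versions []operandVersion, name string) (string, bool) {
	i := slices.IndexFunc(versions, func(v operandVersion) bool { return v.Name == name })
	if i < 0 {
		return "", false
	}
	return versions[i].Version, true
}

// needsStatusWrite reports whether status should be written: either the
// merged conditions changed, or the recorded version is missing or stale.
func needsStatusWrite(conditionsChanged bool, versions []operandVersion, name, desired string) bool {
	current, found := operandVersionByName(versions, name)
	return conditionsChanged || !found || current != desired
}

func main() {
	versions := []operandVersion{{Name: "operator", Version: "old"}}
	// Conditions are unchanged, but the version is stale, so a write is needed.
	fmt.Println(needsStatusWrite(false, versions, "operator", "new"))
}
```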
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4661e31d-64e4-47e6-9ddf-49d03b14f257

📥 Commits

Reviewing files that changed from the base of the PR and between aebb6d0 and c238311.

⛔ Files ignored due to path filters (2)
  • vendor/golang.org/x/tools/cmd/stringer/stringer.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (17)
  • Makefile
  • cmd/capi-operator/main.go
  • go.mod
  • pkg/controllers/clusteroperator/clusteroperator_controller.go
  • pkg/controllers/clusteroperator/clusteroperator_controller_test.go
  • pkg/controllers/clusteroperator/suite_test.go
  • pkg/controllers/common_consts.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/controllers/revision/revision_controller.go
  • pkg/controllers/revision/revision_controller_test.go
  • pkg/controllers/secretsync/secret_sync_controller.go
  • pkg/operatorstatus/controller_status.go
  • pkg/operatorstatus/controller_status_test.go
  • pkg/operatorstatus/operator_status.go
  • pkg/operatorstatus/reason_string.go
  • pkg/operatorstatus/watch_predicates.go
  • pkg/test/conditions.go
💤 Files with no reviewable changes (1)
  • pkg/controllers/common_consts.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/operatorstatus/reason_string.go
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (12)
  • pkg/operatorstatus/watch_predicates.go
  • cmd/capi-operator/main.go
  • pkg/test/conditions.go
  • pkg/controllers/secretsync/secret_sync_controller.go
  • pkg/controllers/revision/revision_controller_test.go
  • pkg/controllers/clusteroperator/suite_test.go
  • pkg/controllers/revision/revision_controller.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/operatorstatus/operator_status.go
  • Makefile
  • pkg/operatorstatus/controller_status_test.go
  • pkg/operatorstatus/controller_status.go

Comment thread pkg/controllers/clusteroperator/clusteroperator_controller.go Outdated
Comment thread pkg/controllers/clusteroperator/clusteroperator_controller.go Outdated
Comment thread pkg/controllers/clusteroperator/clusteroperator_controller.go
@mdbooth mdbooth force-pushed the OCPCLOUD-2664-co-aggregatator-01 branch from c238311 to a8ab8ad Compare June 2, 2026 18:43
@mdbooth

mdbooth commented Jun 2, 2026

Contributor Author

This unit test job took 38 minutes: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/574/pull-ci-openshift-cluster-capi-operator-main-unit/2061871187276861440

The usual duration, based on job history, is 10-20 minutes.

TODO: Investigate if this was a one-off. If there's a dramatic performance regression, try to determine the source and fix it.

Update: Re-run took 14 minutes. Looks like a one-off. No investigation required.

@mdbooth

mdbooth commented Jun 2, 2026

Contributor Author

/pipeline auto

@openshift-merge-bot

Contributor

Pipeline controller notification

The pipeline-auto label has been added to this PR. Second-stage tests will be triggered automatically when all first-stage tests pass.

@mdbooth

mdbooth commented Jun 2, 2026

Contributor Author

/pipeline run

@mdbooth mdbooth force-pushed the OCPCLOUD-2664-co-aggregatator-01 branch from a8ab8ad to 098f11e Compare June 2, 2026 20:47
@mdbooth

mdbooth commented Jun 2, 2026

Contributor Author

Pipeline controller died: force pushed a no-op commit to retrigger all jobs.

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@mdbooth

mdbooth commented Jun 3, 2026

Contributor Author

Flake

/test e2e-openstack-capi-techpreview

Comment on lines +199 to +205
priorityProgressing := slices.MaxFunc(subcontrollerStatuses, func(a, b subcontrollerStatus) int {
return cmp.Compare(a.progressing.reason, b.progressing.reason)
})

if isProgressing {
// Report the highest priority progressing reason
newConditions = append(newConditions, condition(configv1.OperatorProgressing, configv1.ConditionTrue, priorityProgressing.progressing.reason, fmt.Sprintf("%s: %s", priorityProgressing.controller, priorityProgressing.progressing.message)))

Suggested change
- priorityProgressing := slices.MaxFunc(subcontrollerStatuses, func(a, b subcontrollerStatus) int {
- return cmp.Compare(a.progressing.reason, b.progressing.reason)
- })
- if isProgressing {
- // Report the highest priority progressing reason
- newConditions = append(newConditions, condition(configv1.OperatorProgressing, configv1.ConditionTrue, priorityProgressing.progressing.reason, fmt.Sprintf("%s: %s", priorityProgressing.controller, priorityProgressing.progressing.message)))
+ if isProgressing {
+ priorityProgressing := slices.MaxFunc(subcontrollerStatuses, func(a, b subcontrollerStatus) int {
+ return cmp.Compare(a.progressing.reason, b.progressing.reason)
+ })
+ // Report the highest priority progressing reason
+ newConditions = append(newConditions, condition(configv1.OperatorProgressing, configv1.ConditionTrue, priorityProgressing.progressing.reason, fmt.Sprintf("%s: %s", priorityProgressing.controller, priorityProgressing.progressing.message)))

Contributor Author

I had this in an earlier version, but it turns out not to be correct. We want to report the highest-priority Progressing reason whatever the Progressing status is. The top priority is actually a non-retryable error, and that is always accompanied by Progressing=False, so the reason has to be computed outside the isProgressing check.

However, looking at it again, I'm clearly missing some test cases, because the Progressing=False condition still has a hardcoded reason 🤦. I need to fix this.

I'll also address the nit.
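
To make the argument above concrete, here is a minimal, self-contained sketch of that aggregation. It is not the PR's code. The `progressingReason` enum, its ordering, the simplified `subcontrollerStatus` fields, and the rule that the operator is progressing when any subcontroller is progressing are all assumptions made for illustration. The only point it demonstrates is that the highest-priority reason is picked with `slices.MaxFunc` before, and independently of, the Progressing check.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// progressingReason is a hypothetical ordered enum: a larger value means a
// higher reporting priority. The real reason type in the PR may differ.
type progressingReason int

const (
	reasonAsExpected progressingReason = iota
	reasonProgressing
	reasonNonRetryableError // assumed top priority, reported with Progressing=False
)

// subcontrollerStatus is a simplified stand-in for the type used in the PR.
type subcontrollerStatus struct {
	controller  string
	progressing bool
	reason      progressingReason
	message     string
}

// aggregateProgressing returns the aggregated Progressing status together
// with the reason and message of the highest-priority subcontroller.
func aggregateProgressing(statuses []subcontrollerStatus) (bool, progressingReason, string) {
	if len(statuses) == 0 {
		// slices.MaxFunc panics on an empty slice, so handle that case first.
		return false, reasonAsExpected, ""
	}

	// Assumption: the operator is progressing if any subcontroller is.
	isProgressing := slices.ContainsFunc(statuses, func(s subcontrollerStatus) bool {
		return s.progressing
	})

	// Computed unconditionally, so a high-priority reason that comes with
	// Progressing=False (such as a non-retryable error) is still reported.
	top := slices.MaxFunc(statuses, func(a, b subcontrollerStatus) int {
		return cmp.Compare(a.reason, b.reason)
	})

	return isProgressing, top.reason, fmt.Sprintf("%s: %s", top.controller, top.message)
}

func main() {
	statuses := []subcontrollerStatus{
		{controller: "RevisionController", reason: reasonAsExpected, message: "revision is current"},
		{controller: "InstallerController", reason: reasonNonRetryableError, message: "invalid manifest"},
	}

	progressing, reason, message := aggregateProgressing(statuses)
	fmt.Println(progressing, reason, message)
}
```

If the `slices.MaxFunc` call were moved inside an `if isProgressing` block, the example input would lose the non-retryable error reason, because no subcontroller in it reports Progressing=True.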

@stefanonardo

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026
@stefanonardo

ReasonSyncFailed is dead code now

Rewrite ClusterOperatorController to aggregate per-controller
sub-conditions (Available/Progressing) into top-level ClusterOperator
conditions.
@mdbooth mdbooth force-pushed the OCPCLOUD-2664-co-aggregatator-01 branch from 098f11e to 7189785 Compare June 3, 2026 13:20
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/controllers/clusteroperator/clusteroperator_controller_test.go (1)

59-69: ⚡ Quick win

Seed and assert Status.Versions in the supported-platform table.

These cases never populate the revision-owned operator version, so they don't cover the key regression risk here: this controller patches status directly and must not clobber an existing release version while updating conditions. Please seed Status.Versions in the setup and assert it survives each reconciliation.

Suggested test adjustment
 				capiClusterOperator = &configv1.ClusterOperator{
 					ObjectMeta: metav1.ObjectMeta{
 						Name: controllers.ClusterOperatorName,
 					},
 				}
 				Expect(cl.Create(ctx, capiClusterOperator)).To(Succeed())
 				DeferCleanup(func(ctx context.Context) {
 					testutils.CleanupResources(Default, ctx, testEnv.Config, cl, "", &configv1.ClusterOperator{})
 				})
+				capiClusterOperator.Status.Versions = []configv1.OperandVersion{{
+					Name:    operatorstatus.OperatorVersionKey,
+					Version: "previous-release-version",
+				}}
 				Expect(cl.Status().Update(ctx, capiClusterOperator)).To(Succeed())
 			})
@@
 				Eventually(co).
 					WithContext(ctx).
 					WithTimeout(defaultEventuallyTimeout).
 					Should(SatisfyAll(
 						HaveField("Status.Conditions", SatisfyAll(
 							expectedAvailable,
 							expectedProgressing,
 							test.HaveCondition(configv1.OperatorDegraded).WithStatus(configv1.ConditionFalse),
 							test.HaveCondition(configv1.OperatorUpgradeable).WithStatus(configv1.ConditionTrue),
 						)),
+						HaveField("Status.Versions", ContainElement(SatisfyAll(
+							HaveField("Name", Equal(operatorstatus.OperatorVersionKey)),
+							HaveField("Version", Equal("previous-release-version")),
+						))),
 					))

Based on learnings "Add appropriate unit tests for all code changes in Gomega".

Also applies to: 80-92

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/clusteroperator/clusteroperator_controller_test.go` around
lines 59 - 69, The test creates capiClusterOperator but never seeds
Status.Versions, so reconciliation may clobber an existing operator release;
update the setup for capiClusterOperator (the variable used in these It/By
blocks and controllers.ClusterOperatorName) to populate Status.Versions with an
entry for the "operator" (name matching your controller's expected name and a
fake Version string), then after each reconciliation assert that
capiClusterOperator.Status.Versions still contains that operator entry (i.e.,
its Version value was not overwritten) — apply the same seeding and
post-reconcile assertion to the other test block referenced (the similar case at
lines 80-92).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b7458e65-ffd0-4ca4-bc61-0ca8bd286a40

📥 Commits

Reviewing files that changed from the base of the PR and between 098f11e and 7189785.

📒 Files selected for processing (9)
  • cmd/capi-operator/main.go
  • pkg/controllers/clusteroperator/clusteroperator_controller.go
  • pkg/controllers/clusteroperator/clusteroperator_controller_test.go
  • pkg/controllers/clusteroperator/suite_test.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/controllers/revision/revision_controller.go
  • pkg/operatorstatus/controller_status.go
  • pkg/operatorstatus/controller_status_test.go
  • pkg/operatorstatus/watch_predicates.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/controllers/clusteroperator/suite_test.go
  • pkg/controllers/installer/installer_controller.go
  • pkg/controllers/revision/revision_controller.go
  • pkg/operatorstatus/controller_status.go
  • pkg/operatorstatus/controller_status_test.go

@stefanonardo

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2026
@mdbooth

mdbooth commented Jun 4, 2026

Contributor Author

/test build

@mdbooth

mdbooth commented Jun 4, 2026

Contributor Author

/pipeline required

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@mdbooth

mdbooth commented Jun 4, 2026

Contributor Author

/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview-upgrade #576

@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Contributor

@mdbooth: The following tests failed. Say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-techpreview-upgrade | 7189785 | link | true | /test e2e-azure-ovn-techpreview-upgrade |
| ci/prow/e2e-aws-ovn-techpreview-upgrade | 7189785 | link | true | /test e2e-aws-ovn-techpreview-upgrade |
| ci/prow/e2e-openstack-ovn-techpreview | 7189785 | link | false | /test e2e-openstack-ovn-techpreview |

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. pipeline-auto

3 participants