
intra-step leasing: Add lease proxy server specs in prowgen #4877

Open
danilo-gemoli wants to merge 3 commits into openshift:main from
danilo-gemoli:feat/ci-operator-prowgen/intra-step-lease

@danilo-gemoli
Contributor

@danilo-gemoli danilo-gemoli commented Dec 22, 2025

This is the preparatory work that ci-operator requires in order to run a small lease proxy server.

We are working on the intra-step leasing capability, which will allow a step from the ci-operator step registry to acquire a lease from boskos.
As of today, ci-operator acquires any leases required by a workflow and then runs the test. With the new feature, a step can also acquire leases itself by sending requests to ci-operator, which in turn forwards them to boskos.

A typical scenario looks like this:

  1. ci-operator is supposed to run the e2e-aws-ovn test:
- as: e2e-aws-ovn
  skip_if_only_changed: (^(docs|upi|hack)/)|((^|/)OWNERS(_ALIASES)?$)|((^|/)[A-Z]+\.md$)|(^images/(baremetal|libvirt|openstack))|(^cluster-api/providers/([^a]|a[^w]|aw[^s]).*/)|(/agent/)
  steps:
    cluster_profile: aws-4
    env:
      AWS_INSTALL_USE_MINIMAL_PERMISSIONS: "yes"
      USER_TAGS: |
        keyA valueA
        key2 value2
        keySpecial v@lu3-Special_.:/=+
        keyD valueD spaces
    workflow: openshift-e2e-aws
  2. The test uses a cluster_profile, so ci-operator runs a small HTTP server that acts as a lease proxy, forwarding any requests to boskos.
  3. The workflow openshift-e2e-aws executes.
  4. Every step of the workflow has the HTTP_SERVER_IP environment variable set, which it can use to send requests to the proxy run by ci-operator.
  5. The ipi-install-install step runs and tries to acquire a lease from the proxy.
  6. ci-operator receives the request, forwards it to boskos, and returns the result to the ipi-install-install step.
ipi-install-install.sh step
|
HTTP POST `/lease/acquire?type=aws-1-qe-quota-slice&count=1`
|
+----> lease-proxy-server run by `ci-operator`
          |
          HTTP POST `/lease/acquire?type=aws-1-qe-quota-slice&count=1`
          |
          +----> `boskos`

@openshift-ci-robot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai

coderabbitai bot commented Dec 22, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an HTTP server port constant and a PodSpec mutator that injects an HTTP port (8080) and an env var exposing the pod IP; the mutator is applied in prow job builder tests, many test fixtures and generated job YAMLs updated, and a test updated to read UNRESOLVED_CONFIG from env vars.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Constants | pkg/api/constant.go | Added exported constant CIOperatorHTTPServerPort = 8080. |
| Pod spec mutator & helpers | pkg/prowgen/podspec.go | Added unexported addPort, package var smallHTTPServerEnv, and exported HTTPServer() PodSpecMutator to add named port http (8080) and inject HTTP_SERVER_IP (from status.podIP) into the first container. |
| Prow job builder | pkg/prowgen/jobbase.go | NewProwJobBaseBuilderForTest now applies HTTPServer() when a lease/cluster profile is present (adds port/env to PodSpec in tests). |
| Test fixtures (prowgen & controller) | pkg/prowgen/testdata/..., pkg/controller/ephemeralcluster/testdata/... | Numerous test fixture YAMLs updated to include env HTTP_SERVER_IP: from status.podIP and container port 8080 named http. |
| Generated/integration job YAMLs | test/integration/**, test/integration/pj-rehearse/**, test/integration/repo-init/**, test/integration/ci-operator-prowgen/output/** | Many generated and expected CI job YAMLs updated to inject HTTP_SERVER_IP and expose port 8080 across presubmits/periodics/postsubmits. |
| PRPQR reconciler tests | pkg/controller/prpqr_reconciler/prpqr_reconciler_test.go, pkg/controller/prpqr_reconciler/testdata/... | Test updated to locate UNRESOLVED_CONFIG via EnvVar helper (findUnresolvedConfigEnv); YAML unmarshalling now uses that env var. Associated ProwJob testdata YAMLs also received HTTP_SERVER_IP and port updates. |
| Other prowgen testdata | pkg/prowgen/testdata/... | Multiple additional prowgen test fixtures updated to include HTTP_SERVER_IP and http port 8080 in container specs. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from jmguzik and liangxia December 22, 2025 13:27
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 22, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/prowgen/podspec.go (1)

184-189: Consider adding duplicate check for consistency.

The addPort helper always appends without checking for existing ports, unlike similar helpers (addEnvVar, addVolume, addVolumeMount) which prevent duplicates. While not an issue with current usage, adding a duplicate check would maintain consistency with the established pattern in this file.

🔎 Optional: Add duplicate port check
 func addPort(c *corev1.Container, name string, port int32) {
+	for _, p := range c.Ports {
+		if p.Name == name && p.ContainerPort == port {
+			return
+		}
+	}
 	c.Ports = append(c.Ports, corev1.ContainerPort{
 		Name:          name,
 		ContainerPort: port,
 	})
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9dfdb9e and c12dd13.

📒 Files selected for processing (4)
  • pkg/api/constant.go
  • pkg/prowgen/jobbase.go
  • pkg/prowgen/podspec.go
  • pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_cluster_profile.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/api/constant.go
  • pkg/prowgen/jobbase.go
  • pkg/prowgen/podspec.go
  • pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_cluster_profile.yaml
🧬 Code graph analysis (2)
pkg/prowgen/jobbase.go (1)
pkg/prowgen/podspec.go (1)
  • LeaseProxyServer (498-504)
pkg/prowgen/podspec.go (1)
pkg/api/constant.go (1)
  • LeaseProxyServerPort (89-89)
🔇 Additional comments (7)
pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_cluster_profile.yaml (2)

19-23: LGTM!

The environment variable correctly uses the downward API to expose the Pod IP, which is necessary for test steps to locate the lease proxy server.


27-29: LGTM!

The container port configuration correctly exposes port 8082 as "lease-proxy", matching the constant defined in pkg/api/constant.go and used by the LeaseProxyServer() mutator.

pkg/api/constant.go (1)

88-89: LGTM!

The constant definition is clear, follows existing naming conventions, and centralizes the port configuration for the lease proxy server.

pkg/prowgen/jobbase.go (2)

149-149: LGTM!

The LeaseProxyServer() mutator is correctly added when a cluster profile is present, enabling ci-operator to run the lease proxy alongside the lease client configuration.


162-162: LGTM!

The LeaseProxyServer() mutator is consistently added for the MultiStageTestConfiguration case, ensuring both configuration paths have the same lease proxy functionality.

pkg/prowgen/podspec.go (2)

470-478: LGTM!

The environment variable correctly uses the Kubernetes downward API to expose the Pod's IP address, which test steps can use to connect to the lease proxy server.


498-504: LGTM!

The LeaseProxyServer mutator correctly configures the pod specification to expose the lease proxy port and inject the Pod IP environment variable, enabling intra-step leasing functionality.

@danilo-gemoli
Contributor Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 8, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/prowgen/podspec.go (1)

504-510: LGTM! Consider adding a doc comment.

The implementation correctly adds the lease-proxy port and pod IP environment variable. The logic is sound and follows the same pattern as other mutators like LeaseClient().

📝 Optional: Add a doc comment for consistency

For consistency with similar functions like LeaseClient(), consider adding a doc comment:

+// LeaseProxyServer configures ci-operator to run an HTTP lease proxy server
+// that forwards lease requests from workflow steps to Boskos, exposing the
+// pod IP via CI_OPERATOR_POD_IP environment variable
 func LeaseProxyServer() PodSpecMutator {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2bb9007 and 63e8431.

📒 Files selected for processing (1)
  • pkg/prowgen/podspec.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/prowgen/podspec.go
🧬 Code graph analysis (1)
pkg/prowgen/podspec.go (1)
pkg/api/constant.go (1)
  • LeaseProxyServerPort (89-89)
🔇 Additional comments (2)
pkg/prowgen/podspec.go (2)

184-195: LGTM!

The addPort helper follows the established pattern of other helper functions in this file. The deduplication logic correctly prevents conflicts by checking both name and port number, and the implementation is clean and straightforward.


477-484: LGTM!

The environment variable correctly uses the Kubernetes downward API to expose the pod's IP address. The field reference to status.podIP is the standard way to inject pod metadata into containers.

@openshift-ci-robot
Contributor

Scheduling required tests:
/test e2e

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test integration-optional-test

@danilo-gemoli danilo-gemoli force-pushed the feat/ci-operator-prowgen/intra-step-lease branch from 63e8431 to 3123d61 Compare February 12, 2026 09:39
@danilo-gemoli
Contributor Author

/test e2e

@danilo-gemoli
Contributor Author

/retest-required

Member

@Prucek Prucek left a comment

/hold once this PR is merged, you would need to prepare a big PR changing all jobs in o/release. So whenever you are ready

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2026
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danilo-gemoli, Prucek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Prucek,danilo-gemoli]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danilo-gemoli danilo-gemoli force-pushed the feat/ci-operator-prowgen/intra-step-lease branch from 3123d61 to 3b411ab Compare February 13, 2026 10:21
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2026

New changes are detected. LGTM label has been removed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/controller/prpqr_reconciler/prpqr_reconciler_test.go`:
- Around line 513-519: The test checks for an unresolvedConfigEnv via
findUnresolvedConfigEnv and then immediately dereferences
unresolvedConfigEnv.Value; replace the t.Errorf call with t.Fatalf to abort the
test when unresolvedConfigEnv is nil (so you won't dereference a nil pointer).
Update the assertion on unresolvedConfigEnv in the test to use
t.Fatalf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job) (same
message/format) to stop execution, leaving the remainder of the test (reading
unresolvedConfigEnv.Value) unchanged.

Comment on lines 513 to 519
unresolvedConfigEnv := findUnresolvedConfigEnv(items[i].Spec.PodSpec.Containers[0].Env)
if unresolvedConfigEnv == nil {
	t.Errorf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
}

unresolvedConfig := unresolvedConfigEnv.Value

@coderabbitai coderabbitai bot Feb 13, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n pkg/controller/prpqr_reconciler/prpqr_reconciler_test.go | sed -n '500,540p'

Repository: openshift/ci-tools

Length of output: 1496


Use t.Fatalf instead of t.Errorf to stop execution before dereferencing.

Line 515 uses t.Errorf, which logs an error but allows execution to continue. When unresolvedConfigEnv is nil, line 518 will dereference it and panic. Use t.Fatalf to stop execution immediately upon missing UNRESOLVED_CONFIG.

🛠️ Proposed fix
			if unresolvedConfigEnv == nil {
-				t.Errorf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
+				t.Fatalf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
			}

			unresolvedConfig := unresolvedConfigEnv.Value
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 			unresolvedConfigEnv := findUnresolvedConfigEnv(items[i].Spec.PodSpec.Containers[0].Env)
 			if unresolvedConfigEnv == nil {
-				t.Errorf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
+				t.Fatalf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
 			}
 			unresolvedConfig := unresolvedConfigEnv.Value

Contributor Author

@danilo-gemoli danilo-gemoli Feb 13, 2026

When unresolvedConfigEnv is nil, the loop skips the current iteration, which prevents the nil pointer dereference.

@danilo-gemoli, you're absolutely right! I apologize for the confusion. Looking at the full file content, I can see there's a continue statement on line 516 right after the t.Errorf call:

if unresolvedConfigEnv == nil {
    t.Errorf("UNRESOLVED_CONFIG not set on prowjob %s", pj.Spec.Job)
    continue
}

This continue statement indeed skips to the next iteration of the loop, preventing the nil pointer dereference on line 518. The code is safe as written. My initial review comment was based on an incomplete view of the code that didn't show the continue statement. Thank you for the clarification!

@danilo-gemoli
Contributor Author

/test unit

@danilo-gemoli danilo-gemoli force-pushed the feat/ci-operator-prowgen/intra-step-lease branch from 3b411ab to af8d7f7 Compare February 13, 2026 13:44
@danilo-gemoli
Contributor Author

/test e2e

@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2026

@danilo-gemoli: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/images | af8d7f7 | link | true | /test images |
| ci/prow/breaking-changes | af8d7f7 | link | false | /test breaking-changes |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.


3 participants