CNTRLPLANE-2625:Add Auth network policy e2e cases#831
CNTRLPLANE-2625:Add Auth network policy e2e cases#831gangwgr wants to merge 8 commits intoopenshift:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds two new Ginkgo-based end-to-end test suites that validate, reconcile, and enforce Kubernetes NetworkPolicy behavior for the authentication operator and OAuth API server; promotes Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: gangwgr The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
d51d10b to
68ac2ee
Compare
68ac2ee to
6c9a415
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@test/e2e/network_policy.go`:
- Around line 466-477: In deleteAndWaitForNetworkPolicy, capture the
NetworkPolicy's original UID via
kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(...) before deleting,
then perform the Delete and poll until either Get returns NotFound or the
returned object's UID differs from the original UID to ensure the
deletion/reconciliation was observed; also surface any error returned by Delete
instead of discarding it. Use the unique function name
deleteAndWaitForNetworkPolicy and references to
NetworkingV1().NetworkPolicies(...).Get/Delete when locating where to add the
UID capture and adjust the wait condition.
- Around line 82-100: The test can match pre-existing events because
waitForNetworkPolicyEvent filters only by reason/message; fix by capturing a
"notBefore" timestamp immediately before calling deleteAndWaitForNetworkPolicy
in testNetworkPolicyEvents and modify waitForNetworkPolicyEvent to accept a
metav1.Time (or time.Time) notBefore parameter and ignore events with
EventTime/LastTimestamp <= notBefore, then update both calls in
testNetworkPolicyEvents to pass that notBefore value; ensure the event
comparison uses the proper Kubernetes event timestamp fields (EventTime if set,
otherwise LastTimestamp) when filtering.
test/e2e/network_policy.go
Outdated
| func testNetworkPolicyEvents() { | ||
| kubeConfig := e2e.NewClientConfigForTest(g.GinkgoTB()) | ||
| kubeClient, err := kubernetes.NewForConfig(kubeConfig) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| configClient, err := configclient.NewForConfig(kubeConfig) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| g.By("Waiting for authentication clusteroperator to be stable") | ||
| err = e2e.WaitForClusterOperatorAvailableNotProgressingNotDegraded(g.GinkgoTB(), configClient.ConfigV1(), "authentication") | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| g.By("Deleting oauth-server NetworkPolicy and verifying event emission") | ||
| deleteAndWaitForNetworkPolicy(kubeClient, "openshift-authentication", "oauth-server-networkpolicy") | ||
| waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-authentication", "oauth-server-networkpolicy") | ||
|
|
||
| g.By("Deleting oauth-apiserver NetworkPolicy and verifying event emission") | ||
| deleteAndWaitForNetworkPolicy(kubeClient, "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy") | ||
| waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy") | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd test/e2e && wc -l network_policy.goRepository: openshift/cluster-authentication-operator
Length of output: 106
🏁 Script executed:
cd test/e2e && cat -n network_policy.go | head -150Repository: openshift/cluster-authentication-operator
Length of output: 7674
🏁 Script executed:
# Search for waitForNetworkPolicyEvent function definition
rg -n "func waitForNetworkPolicyEvent" test/e2e/Repository: openshift/cluster-authentication-operator
Length of output: 241
🏁 Script executed:
# Search for how events are being listed/queried
rg -n "ListOptions\|FieldSelector\|creationTimestamp" test/e2e/network_policy.go -A 5 -B 2Repository: openshift/cluster-authentication-operator
Length of output: 67
🏁 Script executed:
cd test/e2e && sed -n '492,546p' network_policy.goRepository: openshift/cluster-authentication-operator
Length of output: 1839
🏁 Script executed:
# Verify event.CreationTimestamp field exists in corev1.Event
rg -A 5 "type Event struct" --type goRepository: openshift/cluster-authentication-operator
Length of output: 4858
🏁 Script executed:
# Look at corev1.Event structure more carefully for CreationTimestamp field
rg -A 30 "^type Event struct" vendor/k8s.io/api/core/v1/types.go | head -40Repository: openshift/cluster-authentication-operator
Length of output: 1559
🏁 Script executed:
# Check if metav1.ObjectMeta has CreationTimestamp field
rg -A 20 "^type ObjectMeta struct" vendor/k8s.io/apimachinery/pkg/apis/meta/v1/types.go | head -25Repository: openshift/cluster-authentication-operator
Length of output: 1207
🏁 Script executed:
# Search for CreationTimestamp field in ObjectMeta
rg "CreationTimestamp" vendor/k8s.io/apimachinery/pkg/apis/meta/v1/types.go -A 2 -B 2Repository: openshift/cluster-authentication-operator
Length of output: 776
Guard against matching pre-existing events.
Listing all events without a "not-before" boundary can match older NetworkPolicyCreated events and yield false positives. The waitForNetworkPolicyEvent function at line 492 filters only by reason and message content, with no timestamp checking. Capture a timestamp before deletion and require events created after it.
Proposed fix
g.By("Deleting oauth-server NetworkPolicy and verifying event emission")
-deleteAndWaitForNetworkPolicy(kubeClient, "openshift-authentication", "oauth-server-networkpolicy")
-waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-authentication", "oauth-server-networkpolicy")
+start := time.Now()
+deleteAndWaitForNetworkPolicy(kubeClient, "openshift-authentication", "oauth-server-networkpolicy")
+waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-authentication", "oauth-server-networkpolicy", start)
g.By("Deleting oauth-apiserver NetworkPolicy and verifying event emission")
-deleteAndWaitForNetworkPolicy(kubeClient, "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy")
-waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy")
+start = time.Now()
+deleteAndWaitForNetworkPolicy(kubeClient, "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy")
+waitForNetworkPolicyEvent(kubeClient, "openshift-authentication-operator", "NetworkPolicyCreated", "openshift-oauth-apiserver", "oauth-apiserver-networkpolicy", start)-func waitForNetworkPolicyEvent(kubeClient kubernetes.Interface, eventNamespace, reason, policyNamespace, policyName string) {
+func waitForNetworkPolicyEvent(kubeClient kubernetes.Interface, eventNamespace, reason, policyNamespace, policyName string, notBefore time.Time) {
g.GinkgoHelper()
g.GinkgoWriter.Printf("waiting for event %s in %s for networkpolicy %s/%s\n", reason, eventNamespace, policyNamespace, policyName)
@@ -504,7 +504,8 @@ func waitForNetworkPolicyEvent(kubeClient kubernetes.Interface, eventNamespace,
for _, event := range events.Items {
if event.Reason != reason {
continue
}
- if strings.Contains(event.Message, policyName) && strings.Contains(event.Message, policyNamespace) {
+ if strings.Contains(event.Message, policyName) && strings.Contains(event.Message, policyNamespace) &&
+ event.CreationTimestamp.Time.After(notBefore) {
return true, nil
}
}🤖 Prompt for AI Agents
In `@test/e2e/network_policy.go` around lines 82 - 100, The test can match
pre-existing events because waitForNetworkPolicyEvent filters only by
reason/message; fix by capturing a "notBefore" timestamp immediately before
calling deleteAndWaitForNetworkPolicy in testNetworkPolicyEvents and modify
waitForNetworkPolicyEvent to accept a metav1.Time (or time.Time) notBefore
parameter and ignore events with EventTime/LastTimestamp <= notBefore, then
update both calls in testNetworkPolicyEvents to pass that notBefore value;
ensure the event comparison uses the proper Kubernetes event timestamp fields
(EventTime if set, otherwise LastTimestamp) when filtering.
test/e2e/network_policy.go
Outdated
| func deleteAndWaitForNetworkPolicy(kubeClient kubernetes.Interface, namespace, name string) { | ||
| g.GinkgoHelper() | ||
|
|
||
| g.GinkgoWriter.Printf("deleting networkpolicy %s/%s\n", namespace, name) | ||
| _ = kubeClient.NetworkingV1().NetworkPolicies(namespace).Delete(context.TODO(), name, metav1.DeleteOptions{}) | ||
| err := wait.PollImmediate(2*time.Second, 10*time.Minute, func() (bool, error) { | ||
| _, err := kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(context.TODO(), name, metav1.GetOptions{}) | ||
| return err == nil, nil | ||
| }) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| g.GinkgoWriter.Printf("networkpolicy %s/%s restored\n", namespace, name) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's verify the file exists and check the code at lines 466-477
wc -l test/e2e/network_policy.goRepository: openshift/cluster-authentication-operator
Length of output: 115
🏁 Script executed:
# Read the specific lines to confirm they match the snippet
sed -n '466,477p' test/e2e/network_policy.goRepository: openshift/cluster-authentication-operator
Length of output: 719
🏁 Script executed:
# Look for similar delete/wait patterns in the codebase to see how they handle reconciliation
rg -A 8 "wait\.PollImmediate.*Delete" test/ --type goRepository: openshift/cluster-authentication-operator
Length of output: 67
🏁 Script executed:
# Search for other instances of this function or similar patterns checking UIDs
rg "\.UID" test/e2e/ --type go -B 2 -A 2Repository: openshift/cluster-authentication-operator
Length of output: 1023
Wait for actual deletion by tracking UID changes.
The poll exits when Get succeeds, but the resource may not have been deleted yet. If the original policy still exists or is immediately recreated, Get succeeds without confirming reconciliation occurred. Store the original UID before deletion and wait for it to change to ensure the delete was actually observed.
Proposed fix
g.GinkgoWriter.Printf("deleting networkpolicy %s/%s\n", namespace, name)
-_ = kubeClient.NetworkingV1().NetworkPolicies(namespace).Delete(context.TODO(), name, metav1.DeleteOptions{})
-err := wait.PollImmediate(2*time.Second, 10*time.Minute, func() (bool, error) {
- _, err := kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(context.TODO(), name, metav1.GetOptions{})
- return err == nil, nil
-})
+original, err := kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(context.TODO(), name, metav1.GetOptions{})
+o.Expect(err).NotTo(o.HaveOccurred())
+originalUID := original.UID
+
+_ = kubeClient.NetworkingV1().NetworkPolicies(namespace).Delete(context.TODO(), name, metav1.DeleteOptions{})
+err = wait.PollImmediate(2*time.Second, 10*time.Minute, func() (bool, error) {
+ policy, err := kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(context.TODO(), name, metav1.GetOptions{})
+ if err != nil {
+ return false, nil
+ }
+ return policy.UID != originalUID, nil
+})🤖 Prompt for AI Agents
In `@test/e2e/network_policy.go` around lines 466 - 477, In
deleteAndWaitForNetworkPolicy, capture the NetworkPolicy's original UID via
kubeClient.NetworkingV1().NetworkPolicies(namespace).Get(...) before deleting,
then perform the Delete and poll until either Get returns NotFound or the
returned object's UID differs from the original UID to ensure the
deletion/reconciliation was observed; also surface any error returned by Delete
instead of discarding it. Use the unique function name
deleteAndWaitForNetworkPolicy and references to
NetworkingV1().NetworkPolicies(...).Get/Delete when locating where to add the
UID capture and adjust the wait condition.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/e2e/network_policy_enforcement.go`:
- Around line 294-299: The code reads
completed.Status.ContainerStatuses[0].State.Terminated.ExitCode without checking
for a nil Terminated and can panic; update the pod-completion check (the block
that inspects completed.Status.ContainerStatuses in the function that calls
waitForPodCompletion) to verify completed.Status.ContainerStatuses is non-empty
and that completed.Status.ContainerStatuses[0].State.Terminated != nil before
accessing ExitCode, and if Terminated is nil return a descriptive error (e.g.,
"no terminated state recorded for pod <name>") or retry/handle as appropriate so
you never dereference a nil Terminated.
🧹 Nitpick comments (1)
test/e2e/network_policy_enforcement.go (1)
302-328: Consider consolidating wait helpers.
waitForPodReadyandwaitForPodCompletionare similar to helpers innetwork_policy.go. Consider extracting these totest/library/waits.goto reduce duplication across test files.
| if len(completed.Status.ContainerStatuses) == 0 { | ||
| return false, fmt.Errorf("no container status recorded for pod %s", name) | ||
| } | ||
| exitCode := completed.Status.ContainerStatuses[0].State.Terminated.ExitCode | ||
| g.GinkgoWriter.Printf("client pod %s/%s exitCode=%d\n", namespace, name, exitCode) | ||
| return exitCode == 0, nil |
There was a problem hiding this comment.
Guard against nil Terminated state.
After waitForPodCompletion, the container's State.Terminated field might still be nil in edge cases (e.g., race with kubelet status reporting). Accessing Terminated.ExitCode without a nil check will panic.
Proposed fix
if len(completed.Status.ContainerStatuses) == 0 {
return false, fmt.Errorf("no container status recorded for pod %s", name)
}
+ if completed.Status.ContainerStatuses[0].State.Terminated == nil {
+ return false, fmt.Errorf("container state not terminated for pod %s", name)
+ }
exitCode := completed.Status.ContainerStatuses[0].State.Terminated.ExitCode📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if len(completed.Status.ContainerStatuses) == 0 { | |
| return false, fmt.Errorf("no container status recorded for pod %s", name) | |
| } | |
| exitCode := completed.Status.ContainerStatuses[0].State.Terminated.ExitCode | |
| g.GinkgoWriter.Printf("client pod %s/%s exitCode=%d\n", namespace, name, exitCode) | |
| return exitCode == 0, nil | |
| if len(completed.Status.ContainerStatuses) == 0 { | |
| return false, fmt.Errorf("no container status recorded for pod %s", name) | |
| } | |
| if completed.Status.ContainerStatuses[0].State.Terminated == nil { | |
| return false, fmt.Errorf("container state not terminated for pod %s", name) | |
| } | |
| exitCode := completed.Status.ContainerStatuses[0].State.Terminated.ExitCode | |
| g.GinkgoWriter.Printf("client pod %s/%s exitCode=%d\n", namespace, name, exitCode) | |
| return exitCode == 0, nil |
🤖 Prompt for AI Agents
In `@test/e2e/network_policy_enforcement.go` around lines 294 - 299, The code
reads completed.Status.ContainerStatuses[0].State.Terminated.ExitCode without
checking for a nil Terminated and can panic; update the pod-completion check
(the block that inspects completed.Status.ContainerStatuses in the function that
calls waitForPodCompletion) to verify completed.Status.ContainerStatuses is
non-empty and that completed.Status.ContainerStatuses[0].State.Terminated != nil
before accessing ExitCode, and if Terminated is nil return a descriptive error
(e.g., "no terminated state recorded for pod <name>") or retry/handle as
appropriate so you never dereference a nil Terminated.
6c9a415 to
93d8f5d
Compare
|
/payload 4.22 nightly informing |
|
@gangwgr: trigger 68 job(s) of type informing for the nightly release of OCP 4.22
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9fabeea0-028f-11f1-821b-315b0366a22d-0 trigger 14 job(s) of type blocking for the nightly release of OCP 4.22
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9fabeea0-028f-11f1-821b-315b0366a22d-1 |
|
/retitle CNTRLPLANE-2610: Add Auth network policy e2e cases |
|
@gangwgr: This pull request references CNTRLPLANE-2610 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retitle CNTRLPLANE-2625:Add Auth network policy e2e cases |
|
@gangwgr: This pull request references CNTRLPLANE-2625 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/testwith openshift/cluster-authentication-operator/main/e2e-operator #825 |
|
@gangwgr, |
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-external-oidc-conformance-serial #825 |
|
/testwith openshift/cluster-kube-apiserver-operator/main/e2e-gcp-operator-serial-ote #825 |
|
/hold |
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-parallel-ote #825 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/e2e/network_policy.go`:
- Around line 194-204: The hasPort function treats a nil
NetworkPolicyPort.Protocol as a wildcard; change it to treat nil as TCP-only by
checking p.Protocol explicitly: if p.Protocol == nil then only return true when
the requested protocol equals corev1.ProtocolTCP, otherwise when p.Protocol !=
nil compare *p.Protocol == protocol; keep the existing port IntValue check and
the function name hasPort to locate the change.
20e5246 to
73a63d8
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-parallel-ote #825 |
73a63d8 to
7919d32
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-serial-ote #825 |
36b2a86 to
5c91957
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-serial-ote #825 |
bf9a1dd to
4db2ed2
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-serial-ote #825 |
|
@gangwgr, |
|
/testwith openshift/cluster-authentication-operator/master/e2e-aws-operator-serial-ote #825 |
4db2ed2 to
47c7ce9
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-gcp-operator-disruptive #825 |
|
@gangwgr, |
|
/testwith openshift/cluster-authentication-operator/master/e2e-gcp-operator-disruptive #825 |
|
@gangwgr, |
|
/testwith openshift/cluster-authentication-operator/master/e2e-gcp-operator-disruptive #825 |
|
@gangwgr, |
|
/testwith openshift/cluster-authentication-operator/master/e2e-gcp-operator-disruptive #825 |
47c7ce9 to
6ac4a5b
Compare
|
/testwith openshift/cluster-authentication-operator/master/e2e-gcp-operator-disruptive #825 |
|
@gangwgr: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Auth network policy e2e cases