K8SPG-943: check operator panic in e2e test #1441

Merged
egegunes merged 10 commits into main from K8SPG-943-e2e-test
Mar 4, 2026

Conversation

@mayankshah1607
Member

@mayankshah1607 mayankshah1607 commented Feb 13, 2026

CHANGE DESCRIPTION

Adds a check to the e2e tests that fails a test when a panic is found in the operator logs.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Copilot AI review requested due to automatic review settings February 13, 2026 09:59
Copilot AI left a comment

Pull request overview

This PR adds operator panic detection to E2E test cleanup phases. The change introduces a new check_operator_panic function that searches operator logs for panic messages, and the test cleanup scripts now call it before destroying the operator.

Changes:

  • Added check_operator_panic function to e2e-tests/functions that checks operator logs for "Observed a panic" messages
  • Updated 29 E2E test cleanup files to call check_operator_panic before destroy_operator

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| e2e-tests/functions | Adds new check_operator_panic function to detect panics in operator logs |
| e2e-tests/tests//99-.yaml | Updates 29 test cleanup scripts to check for operator panics before destroying the operator |

Comment on lines +40 to +46
check_operator_panic() {
    local operator_pod=$(get_operator_pod)
    if kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -q "Observed a panic"; then
        echo "Detected panic in operator"
        exit 1
    fi
}
Copilot AI Feb 13, 2026

The function doesn't verify that the operator pod exists before checking for panics. If get_operator_pod returns an empty string (no operator pod found), the kubectl logs command will fail, but the error will be masked by the pipeline to grep, causing the function to silently succeed without checking for panics.

Add validation that operator_pod is non-empty before attempting to retrieve logs. Consider also handling the case where kubectl logs fails due to the pod not existing or not being ready yet.
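One way the suggested validation could look is sketched below. This is not the code that was merged. The name check_operator_panic_strict is made up for the example, and get_operator_pod and fetch_operator_logs are hypothetical stand-ins (driven by FAKE_* variables) so the sketch runs without a cluster; in the real suite they would wrap kubectl. The sketch uses return instead of exit so that a caller can decide what to do with each status.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins so the sketch runs without a cluster.
# In the real suite these would wrap kubectl calls.
get_operator_pod() { printf '%s' "${FAKE_POD:-}"; }
fetch_operator_logs() {
    # Simulate a failed log fetch when FAKE_LOGS_FAIL is non-empty.
    [ -z "${FAKE_LOGS_FAIL:-}" ] || return 1
    printf '%s\n' "${FAKE_LOGS:-}"
}

# Returns 0 when no panic is found, 1 when a panic is found,
# and 2 when the check itself could not be performed.
check_operator_panic_strict() {
    local operator_pod logs
    operator_pod=$(get_operator_pod)
    if [ -z "$operator_pod" ]; then
        echo "No operator pod found; cannot check for panics" >&2
        return 2
    fi
    # Capture the logs first, so that a fetch failure is not hidden
    # behind the exit status of grep at the end of a pipeline.
    if ! logs=$(fetch_operator_logs "$operator_pod"); then
        echo "Failed to read logs from $operator_pod" >&2
        return 2
    fi
    if printf '%s\n' "$logs" | grep -q "Observed a panic"; then
        echo "Detected panic in operator"
        return 1
    fi
    return 0
}

# Demo with made-up values.
FAKE_POD="operator-0"
FAKE_LOGS="Observed a panic: example"
check_operator_panic_strict || echo "check returned $?"
```

Separating the log fetch from the grep matters because, without pipefail, a pipeline's exit status is that of its last command, so a failing first command goes unnoticed.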

Copilot AI review requested due to automatic review settings February 16, 2026 08:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

@egegunes egegunes added this to the v2.9.0 milestone Feb 19, 2026
Copilot AI review requested due to automatic review settings March 2, 2026 12:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

check_operator_panic() {
    local operator_pod=$(get_operator_pod)
    if kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -q "Observed a panic"; then
        echo "Detected panic in operator"
Contributor

Detected a panic, but what is the panic? grep -q suppresses all output. We need to print the panic and the stack trace if possible.
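The difference the reviewer points out can be seen in a small standalone sketch. The log content below is invented for illustration and is not real operator output; quiet_check and verbose_check are hypothetical names, and both read a log file path instead of calling kubectl.

```shell
#!/usr/bin/env bash
# Invented sample log, for illustration only.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
INFO starting manager
ERROR Observed a panic: example failure
goroutine 1 [running]:
main.reconcile()
EOF

# grep -q only sets the exit status; nothing is printed.
quiet_check() {
    grep -q "Observed a panic" "$1"
}

# Capturing grep output keeps the matching line and the lines after it,
# so the failure message can include the panic itself.
verbose_check() {
    local panic_log
    panic_log=$(grep -A 2 "Observed a panic" "$1")
    if [ -n "$panic_log" ]; then
        echo "Detected panic in operator:"
        echo "$panic_log"
        return 1
    fi
    return 0
}

quiet_check "$log_file" && echo "quiet_check: matched, but printed nothing"
verbose_check "$log_file" || echo "verbose_check: exit status $?"
```

Both variants detect the panic; only the second leaves something in the test output to diagnose it with.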

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment on lines +40 to +49
check_operator_panic() {
    local operator_pod=$(get_operator_pod)
    local panic_log
    panic_log=$(kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -A 100 "Observed a panic")
    if [ -n "$panic_log" ]; then
        echo "Detected panic in operator:"
        echo "$panic_log"
        exit 1
    fi
}
Copilot AI Mar 3, 2026

  1. Problem: Several test cleanup files in this repository were not updated with check_operator_panic, while all other similar test directories were. The following cleanup files are missing the new check_operator_panic call that was added to 29 other test suites:
  • e2e-tests/tests/backup-enable-disable/99-remove-cluster-gracefully.yaml
  • e2e-tests/tests/cert-manager-tls/99-remove-cluster-gracefully.yaml
  • e2e-tests/tests/demand-backup-offline-snapshot/99-cleanup.yaml
  • e2e-tests/tests/dynamic-configuration/99-remove-cluster-gracefully.yaml
  • e2e-tests/tests/pg-tde/99-remove-cluster-gracefully.yaml
  • e2e-tests/tests/backup-enable-disable/98-remove-datasource-cluster-gracefully.yaml

  2. Why it matters: These test suites will not detect operator panics, creating inconsistent coverage.

  3. Fix: Add check_operator_panic before destroy_operator in each of these cleanup files, matching the pattern used in all other test suites in this PR.
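Gaps like the ones listed above could be found mechanically. The sketch below is one possible approach, not part of the PR: find_unchecked_cleanups is a hypothetical helper that lists YAML files mentioning destroy_operator but not check_operator_panic. The demo directory and file contents are made up so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.yaml file under $1 that calls
# destroy_operator without also mentioning check_operator_panic.
find_unchecked_cleanups() {
    local root=$1 f
    find "$root" -type f -name '*.yaml' | sort | while IFS= read -r f; do
        if grep -q 'destroy_operator' "$f" && ! grep -q 'check_operator_panic' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Demo on a throwaway directory with made-up suites.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/suite-a" "$demo_root/suite-b"
printf 'check_operator_panic\ndestroy_operator\n' > "$demo_root/suite-a/99-cleanup.yaml"
printf 'destroy_operator\n' > "$demo_root/suite-b/99-cleanup.yaml"

# Lists only the suite-b file, which lacks the panic check.
find_unchecked_cleanups "$demo_root"
```

Note that this only checks whether both names appear in a file; it does not verify that check_operator_panic comes before destroy_operator.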

check_operator_panic() {
    local operator_pod=$(get_operator_pod)
    local panic_log
    panic_log=$(kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -A 100 "Observed a panic")
Copilot AI Mar 3, 2026

  1. Problem: The grep -A 100 hard limit may silently truncate panic output if a stack trace exceeds 100 lines, which is common in Go panics with deep call stacks or concurrent goroutine dumps.

  2. Why it matters: A truncated panic log could make it harder to diagnose the root cause of the panic, and any goroutine dump lines beyond line 100 would be lost.

  3. Fix: Use a larger context value (e.g., -A 500) or remove the limit entirely by piping through awk '/Observed a panic/{found=1} found{print}' to capture everything from the match to the end of the log.

Suggested change
- panic_log=$(kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | grep -A 100 "Observed a panic")
+ panic_log=$(kubectl logs -n "${OPERATOR_NS:-$NAMESPACE}" "$operator_pod" -c operator | awk '/Observed a panic/{found=1} found{print}')
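The truncation described in this comment can be reproduced without a cluster. The sketch below builds a synthetic log (not real operator output) with one panic line followed by 150 lines, then compares how many lines each approach keeps.

```shell
#!/usr/bin/env bash
# Synthetic log for illustration: 1 startup line, 1 panic line, 150 trailing lines.
log_file=$(mktemp)
awk 'BEGIN {
    print "INFO startup"
    print "Observed a panic: example"
    for (i = 1; i <= 150; i++) print "stack frame " i
}' > "$log_file"

# grep -A 100 keeps the matching line plus at most 100 lines after it.
capped=$(grep -A 100 "Observed a panic" "$log_file" | wc -l | tr -d ' ')

# awk sets a flag on the first match and prints every line from there on.
full=$(awk '/Observed a panic/{found=1} found{print}' "$log_file" | wc -l | tr -d ' ')

echo "grep -A 100 kept $capped lines; awk kept $full lines"
# prints: grep -A 100 kept 101 lines; awk kept 151 lines
rm -f "$log_file"
```

The trade-off is that the awk form also prints any unrelated log lines that follow the panic, since it never resets the flag.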

Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Copilot AI review requested due to automatic review settings March 3, 2026 13:03
Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated no new comments.

@JNKPercona
Collaborator

| Test Name | Result | Time |
| --- | --- | --- |
| backup-enable-disable | passed | 00:00:00 |
| builtin-extensions | passed | 00:00:00 |
| cert-manager-tls | passed | 00:00:00 |
| custom-envs | passed | 00:00:00 |
| custom-extensions | passed | 00:00:00 |
| custom-tls | passed | 00:00:00 |
| database-init-sql | passed | 00:00:00 |
| demand-backup | passed | 00:00:00 |
| demand-backup-offline-snapshot | passed | 00:00:00 |
| dynamic-configuration | passed | 00:00:00 |
| finalizers | passed | 00:00:00 |
| init-deploy | passed | 00:00:00 |
| huge-pages | passed | 00:00:00 |
| monitoring | passed | 00:00:00 |
| monitoring-pmm3 | passed | 00:00:00 |
| one-pod | passed | 00:00:00 |
| operator-self-healing | passed | 00:11:15 |
| pg-tde | passed | 00:00:00 |
| pitr | passed | 00:00:00 |
| scaling | passed | 00:00:00 |
| scheduled-backup | passed | 00:00:00 |
| self-healing | passed | 00:00:00 |
| sidecars | passed | 00:00:00 |
| standby-pgbackrest | passed | 00:00:00 |
| standby-streaming | passed | 00:00:00 |
| start-from-backup | passed | 00:00:00 |
| tablespaces | passed | 00:00:00 |
| telemetry-transfer | passed | 00:00:00 |
| upgrade-consistency | passed | 00:00:00 |
| upgrade-minor | passed | 00:00:00 |
| users | passed | 00:00:00 |

| Summary | Value |
| --- | --- |
| Tests Run | 31/31 |
| Job Duration | 00:26:15 |
| Total Test Time | 00:11:15 |

commit: ecd71ad
image: perconalab/percona-postgresql-operator:PR-1441-ecd71ad51

@egegunes egegunes merged commit 57b09c1 into main Mar 4, 2026
16 checks passed
@egegunes egegunes deleted the K8SPG-943-e2e-test branch March 4, 2026 06:03
6 participants