feat(ibmcloud): add GitHub Actions runner support for IBM Power and IBM Z#831
feat(ibmcloud): add GitHub Actions runner support for IBM Power and IBM Z#831deekay2310 wants to merge 17 commits into
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds GitHub Actions self-hosted runner provisioning for IBM Cloud PowerVS ( ChangesGitHub Actions Runner Multiarch Support on IBM Cloud
Sequence Diagram(s)sequenceDiagram
participant CLI as mapt ibm-power/ibm-z create
participant params as params.GithubRunnerArgs()
participant api as github.GenerateRegistrationToken
participant ghAPI as api.github.com
participant provider as ibmpower.New / ibmz.New
participant icdata as GetAvailableSystemTypes
participant deploy as deploy()
participant snippet as GetSetupScriptTemplate(archSnippets)
participant cloudInit as cloud-config template
CLI->>CLI: set ghRunnerArgs.Arch = Ppc64le | S390x
CLI->>params: GithubRunnerArgs()
alt --ghactions-runner-token absent, GITHUB_TOKEN present
params->>api: GenerateRegistrationToken(pat, repoURL)
api->>ghAPI: POST /repos/{owner}/{repo}/actions/runners/registration-token
ghAPI-->>api: 201 {token}
api-->>params: token
end
params-->>CLI: GithubRunnerArgs{Arch, RunnerImageRepo, Token, ...}
CLI->>provider: New(ContextArgs{GHRunnerArgs})
provider->>icdata: GetAvailableSystemTypes(preferred)
icdata-->>provider: SystemTypeResult{Types}
loop each sysType
provider->>deploy: deploy(sysType)
deploy->>snippet: GetSetupScriptTemplate() → arch-specific script
snippet-->>deploy: ppc64le or s390x Bash snippet
deploy->>cloudInit: render piUserData/izUserData with ghRunnerScript
cloudInit-->>deploy: base64 cloud-init user-data
alt capacity error
deploy-->>provider: error
provider->>provider: destroy partial stack, retry next sysType
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/provider/ibmcloud/action/ibm-z/ibm-z_test.go (1)
40-55: 💤 Low valueConsider adding a test case for GitHub runner provisioning.
The existing tests verify GitLab runner and otelcol integration but don't cover the new GitHub Actions runner path. A test case calling
izUserData(nil, "", " #!/bin/bash\n echo gh")and verifyinginstall-ghrunner.shappears in the decoded output would increase confidence in the template rendering.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/provider/ibmcloud/action/ibm-z/ibm-z_test.go` around lines 40 - 55, Add a new unit test to cover the GitHub Actions runner path: create a test (e.g., TestIzUserData_githubRunner) that calls izUserData(nil, "", " #!/bin/bash\n echo gh") and fails if izUserData returns an error; decode the output with decodeIzOutput(t, out) and assert the resulting cfg contains "install-ghrunner.sh" (and optionally that it includes any expected runcmd/write_files lines). Use the same helpers and style as TestIzUserData_noRunner so the new test lives alongside it and verifies the template renders the GitHub runner installer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/integrations/github/snippet-linux-ppc64le.sh`:
- Around line 4-7: The RunnerImageRepo value risks executing arbitrary code
because it's embedded into snippet-linux-ppc64le.sh and snippet-linux-s390x.sh
(git clone {{ .RunnerImageRepo }} then bash -c '. scripts/vm.sh ...'); update
cmd/mapt/cmd/params/params.go and pkg/integrations/github/ghrunner.go to enforce
a safe trust model: either restrict the --ghactions-runner-image-repo flag to
admin-only (check caller permissions where flags are parsed/used in params.go),
add validation/allowlist logic in ghrunner.go to validate RunnerImageRepo
against a configured set of allowed hostnames/URLs before embedding into the
snippet, and if neither is possible add clear documentation and runtime warning
logs wherever RunnerImageRepo is accepted (and sanitize inputs to prevent
local-path/ssh/git-protocol abuses); reference the symbols RunnerImageRepo,
cmd/mapt/cmd/params/params.go, pkg/integrations/github/ghrunner.go, and the two
snippet files when making the change.
In `@pkg/integrations/github/snippet-linux-s390x.sh`:
- Around line 4-7: The template embeds an untrusted RunnerImageRepo directly
into shell commands (git clone {{ .RunnerImageRepo }} ... and bash -c '.
scripts/vm.sh ...'), which allows arbitrary repo URLs to be executed; fix by
enforcing validation or admin-only restriction where RunnerImageRepo is set:
implement a validateRunnerImageRepo(url) check for HTTPS scheme, host ==
"github.com" (or other approved host), and an allowlist of specific owner/repo
patterns (reject raw strings, file://, ssh, or IP hosts), call this validation
where the --ghactions-runner-image-repo flag is parsed and refuse/exit on
invalid values, and then continue to render RunnerImageRepo into
pkg/integrations/github/snippet-linux-s390x.sh only after validation (also
consider quoting the variable in the git clone command and running git clone
--depth=1 to limit exposure); alternatively restrict the flag to trusted
administrators and document the requirement for approved repositories.
---
Nitpick comments:
In `@pkg/provider/ibmcloud/action/ibm-z/ibm-z_test.go`:
- Around line 40-55: Add a new unit test to cover the GitHub Actions runner
path: create a test (e.g., TestIzUserData_githubRunner) that calls
izUserData(nil, "", " #!/bin/bash\n echo gh") and fails if izUserData
returns an error; decode the output with decodeIzOutput(t, out) and assert the
resulting cfg contains "install-ghrunner.sh" (and optionally that it includes
any expected runcmd/write_files lines). Use the same helpers and style as
TestIzUserData_noRunner so the new test lives alongside it and verifies the
template renders the GitHub runner installer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 0708cbed-1d28-4c34-b06b-76c3e58bcb05
📒 Files selected for processing (14)
cmd/mapt/cmd/ibmcloud/hosts/ibm-power.gocmd/mapt/cmd/ibmcloud/hosts/ibm-z.gocmd/mapt/cmd/params/params.gopkg/integrations/github/ghrunner.gopkg/integrations/github/snippet-linux-ppc64le.shpkg/integrations/github/snippet-linux-s390x.shpkg/integrations/github/types.gopkg/integrations/integrations.gopkg/provider/ibmcloud/action/ibm-power/cloud-configpkg/provider/ibmcloud/action/ibm-power/ibm-power.gopkg/provider/ibmcloud/action/ibm-power/ibm-power_test.gopkg/provider/ibmcloud/action/ibm-z/cloud-configpkg/provider/ibmcloud/action/ibm-z/ibm-z.gopkg/provider/ibmcloud/action/ibm-z/ibm-z_test.go
- Quote the URL in snippet git clone commands to prevent shell injection - Add --depth=1 to limit clone exposure and speed up provisioning - Validate that only HTTPS URLs are accepted for the runner image repo Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upstream configure-limits.sh appends duplicate pam_limits.so entries to system-auth and password-auth, causing sshd to drop connections before sending its banner. Deduplicate PAM entries and restart sshd after build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…snippet Adds a background monitor that logs sshd status every 30s during the runner build to identify what breaks SSH. After build completion, dumps full sshd diagnostics (config test, journal, host key perms, crypto policies, PAM config) and attempts repair. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nostics The upstream configure-system.sh runs chmod -R 777 /usr/share which makes /usr/share/empty.sshd (sshd's privilege separation directory) world-writable. sshd refuses to start when this directory is not owned by root or is world-writable. Fix by restoring 755 after the build. Also adds sshd watchdog logging with COS upload so diagnostics are accessible even when SSH is broken. COS credentials are passed through cloud-config template variables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On RHEL 9/ppc64le, dotnet installs to /usr/lib64/dotnet via dnf, not /opt/dotnet. The GH runner is self-contained (uses ./bin/Runner.Listener) and does not need DOTNET_ROOT. The chown on /opt/dotnet caused cloud-init to fail after a successful build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
pkg/integrations/github/snippet-linux-ppc64le.sh (1)
22-23: ⚡ Quick win
|| truesilently masks all build failures, not just flaky tests.The comment states "Allow build to continue past flaky upstream test failures", but
|| truesuppresses all exit codes from the build, including genuine errors like missing dependencies, syntax errors, or configuration failures. This makes it impossible to distinguish between expected flaky failures and unexpected breakage.The check at lines 94-97 catches the most critical failure (missing runner binary), but other build issues will go unnoticed until runtime.
💡 Suggested improvement
Option 1: Capture and log the exit code
-bash -c '. scripts/vm.sh rhel 9 minimal --skip-snap-lxd' || true +bash -c '. scripts/vm.sh rhel 9 minimal --skip-snap-lxd' || { + BUILD_EXIT=$? + echo "WARNING: vm.sh exited with code $BUILD_EXIT (continuing due to known flaky tests)" >&2 +}Option 2: Check for specific known-flaky exit codes if possible
-bash -c '. scripts/vm.sh rhel 9 minimal --skip-snap-lxd' || true +bash -c '. scripts/vm.sh rhel 9 minimal --skip-snap-lxd' || { + BUILD_EXIT=$? + # Known flaky test exit codes + if [[ $BUILD_EXIT -ne 1 && $BUILD_EXIT -ne 2 ]]; then + echo "ERROR: vm.sh failed with unexpected exit code $BUILD_EXIT" >&2 + exit $BUILD_EXIT + fi + echo "WARNING: vm.sh failed with known-flaky exit code $BUILD_EXIT (continuing)" >&2 +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/integrations/github/snippet-linux-ppc64le.sh` around lines 22 - 23, The bash command executing scripts/vm.sh with the unconditional || true operator is suppressing all exit codes, including genuine failures. Remove the blanket || true from the end of the bash command and instead implement targeted error handling that either captures and logs the exit code before allowing the build to continue, or specifically checks for known flaky test exit codes while failing on actual build errors. This ensures that only expected flaky failures are ignored while real issues like missing dependencies or syntax errors are still caught and reported.pkg/integrations/github/snippet-linux-s390x.sh (1)
9-10: ⚡ Quick win
|| truesilently masks all build failures, not just flaky tests.The comment states "Allow build to continue past flaky upstream test failures", but
|| truesuppresses all exit codes from the build, including genuine errors. This makes it impossible to distinguish between expected flaky failures and real configuration or dependency issues.The check at lines 12-15 catches the most critical failure (missing runner binary), but other build problems will go undetected.
💡 Suggested improvement
Option 1: Capture and log the exit code
-bash -c '. scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd' || true +bash -c '. scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd' || { + BUILD_EXIT=$? + echo "WARNING: vm.sh exited with code $BUILD_EXIT (continuing due to known flaky tests)" >&2 +}Option 2: Check for specific known-flaky exit codes if possible
-bash -c '. scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd' || true +bash -c '. scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd' || { + BUILD_EXIT=$? + # Known flaky test exit codes + if [[ $BUILD_EXIT -ne 1 && $BUILD_EXIT -ne 2 ]]; then + echo "ERROR: vm.sh failed with unexpected exit code $BUILD_EXIT" >&2 + exit $BUILD_EXIT + fi + echo "WARNING: vm.sh failed with known-flaky exit code $BUILD_EXIT (continuing)" >&2 +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/integrations/github/snippet-linux-s390x.sh` around lines 9 - 10, The `|| true` operator at the end of the bash command that executes `scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd` unconditionally suppresses all exit codes, masking both flaky test failures and genuine errors alike. Instead of using `|| true`, capture the exit code from the bash command execution and conditionally handle it by either logging the exit code for debugging purposes or checking if it matches specific known-flaky exit codes before allowing the build to continue, ensuring that real configuration or dependency failures are not silently masked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/mapt/cmd/params/params.go`:
- Around line 293-333: The repoURL validation in the GithubRunnerArgs function
only checks if repoURL is empty when the token is being auto-generated (when
token is empty), but repoURL is required for GitHub Actions runner setup
regardless of whether the token is manually provided or auto-generated. Add an
additional validation check to ensure repoURL is not empty that applies to both
the manual token and auto-generated token paths. This validation should occur
after confirming that at least one of token or pat is provided, ensuring that
users cannot bypass the repoURL requirement by providing a manual token via the
ghactions-runner-token flag.
- Around line 335-340: The validateRunnerImageRepo function currently only
validates that the URL uses HTTPS, which is insufficient since a malicious HTTPS
URL can still be provided and executed as code. Either implement a domain
allowlist approach to restrict repository sources to known trusted domains, or
if an allowlist is too restrictive, add documentation to the flag description
warning that this repository will be cloned and executed with elevated
privileges, and add a runtime warning when the imageRepo flag is set to alert
users about the security implications of using untrusted repositories.
In `@pkg/integrations/github/api.go`:
- Around line 41-44: The code currently uses http.DefaultClient without a
timeout, which can cause indefinite hangs if the GitHub API is unresponsive.
Replace the http.DefaultClient.Do(req) call with a custom http.Client that has a
Timeout field set to an appropriate duration value. Create this client with a
reasonable timeout for GitHub API calls and use it instead of the default
client. Ensure the "time" package is imported at the top of the file.
In `@pkg/provider/ibmcloud/data/pisystempools.go`:
- Around line 112-117: The poolAvailableMemory function is inconsistent with
poolHasCapacity in how it checks capacity sources. While poolHasCapacity uses OR
logic to check three capacity sources (MaxCoresAvailable, MaxMemoryAvailable,
and MaxAvailable), poolAvailableMemory only checks MaxAvailable.Memory. Update
poolAvailableMemory to check all three capacity sources in the same manner as
poolHasCapacity, falling back appropriately through MaxCoresAvailable and
MaxMemoryAvailable when MaxAvailable is unavailable, to ensure accurate headroom
calculation and consistent pool prioritization.
---
Nitpick comments:
In `@pkg/integrations/github/snippet-linux-ppc64le.sh`:
- Around line 22-23: The bash command executing scripts/vm.sh with the
unconditional || true operator is suppressing all exit codes, including genuine
failures. Remove the blanket || true from the end of the bash command and
instead implement targeted error handling that either captures and logs the exit
code before allowing the build to continue, or specifically checks for known
flaky test exit codes while failing on actual build errors. This ensures that
only expected flaky failures are ignored while real issues like missing
dependencies or syntax errors are still caught and reported.
In `@pkg/integrations/github/snippet-linux-s390x.sh`:
- Around line 9-10: The `|| true` operator at the end of the bash command that
executes `scripts/vm.sh ubuntu 22.04 minimal --skip-snap-lxd` unconditionally
suppresses all exit codes, masking both flaky test failures and genuine errors
alike. Instead of using `|| true`, capture the exit code from the bash command
execution and conditionally handle it by either logging the exit code for
debugging purposes or checking if it matches specific known-flaky exit codes
before allowing the build to continue, ensuring that real configuration or
dependency failures are not silently masked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 84f5a3ce-a823-4bd7-aef3-cf2cab08c1c8
📒 Files selected for processing (8)
.github/workflows/smoke-test-s390x.yamlcmd/mapt/cmd/params/params.gopkg/integrations/github/api.gopkg/integrations/github/snippet-linux-ppc64le.shpkg/integrations/github/snippet-linux-s390x.shpkg/provider/ibmcloud/action/ibm-power/cloud-configpkg/provider/ibmcloud/action/ibm-power/ibm-power.gopkg/provider/ibmcloud/data/pisystempools.go
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/smoke-test-s390x.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/provider/ibmcloud/action/ibm-power/cloud-config
- pkg/provider/ibmcloud/action/ibm-power/ibm-power.go
The s390x runner snippet referenced /opt/dotnet in chown and environment setup, but the runner build extracts everything into /opt/runner-cache. This caused cloud-init to fail with "chown: cannot access '/opt/dotnet'" under set -euo pipefail, preventing runner registration even though the binary built successfully. Mirrors the same fix already applied to the ppc64le snippet in de8fc1f. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/manager/context/context.go`:
- Line 186: The change to use ProjectName instead of RunID for runner naming
requires all setup scripts to safely handle name reuse across multiple runs of
the same project. Review the four registration scripts: snippet-linux.sh,
snippet-darwin.sh, snippet-linux-ppc64le.sh, and snippet-linux-s390x.sh. Ensure
that either all scripts include the --replace flag in their config.sh calls, or
verify that the --ephemeral flag in the architecture-specific scripts
(snippet-linux-ppc64le.sh and snippet-linux-s390x.sh) is sufficient to prevent
registration failures due to naming conflicts. Add the --replace flag to any
script that lacks it, or add documentation confirming that --ephemeral alone
handles the collision prevention.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: a3e72b5e-269c-4353-9f7d-7f163c23c5a9
📒 Files selected for processing (5)
cmd/mapt/cmd/params/params.gopkg/integrations/github/api.gopkg/integrations/github/snippet-linux-s390x.shpkg/manager/context/context.gopkg/provider/ibmcloud/data/pisystempools.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/integrations/github/api.go
- cmd/mapt/cmd/params/params.go
- pkg/provider/ibmcloud/data/pisystempools.go
| func manageIntegration(c *Context, ca *ContextArgs) error { | ||
| if ca.GHRunnerArgs != nil { | ||
| ca.GHRunnerArgs.Name = c.RunID() | ||
| ca.GHRunnerArgs.Name = c.ProjectName() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate runner setup scripts and registration command usage =="
fd -i 'snippet-linux-.*\.sh' pkg/integrations/github --exec sed -n '1,220p' {}
echo
echo "== Find config.sh invocations and name/replace flags =="
rg -n -C3 --type=sh 'config\.sh|--name|--replace|ephemeral' pkg/integrations/github
echo
echo "== Confirm how GHRunnerArgs.Name flows into rendered user-data =="
rg -n -C3 --type=go 'GHRunnerArgs\.Name|UserDataValues\{|Name:' pkg/manager/context pkg/integrations/github pkg/provider/ibmcloudRepository: redhat-developer/mapt
Length of output: 31595
Add --replace flag to all runner registration scripts or verify ephemeral behavior prevents naming conflicts.
Line 186 changes GitHub runner naming from run-scoped (RunID) to project-scoped (ProjectName). The older setup scripts (snippet-linux.sh, snippet-darwin.sh) use config.sh --replace to handle reuse, but the newer architecture-specific scripts (snippet-linux-ppc64le.sh, snippet-linux-s390x.sh) use only --ephemeral without --replace. Repeated provisioning for the same project will fail registration unless all registration paths safely handle name collisions.
Setup script inconsistency
snippet-linux.sh and snippet-darwin.sh:
./config.sh ... --name "{{ .Name }}" ... --replace ...
snippet-linux-ppc64le.sh and snippet-linux-s390x.sh:
./config.sh ... --ephemeral ... --name "{{ .Name }}" ...
(no --replace)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/manager/context/context.go` at line 186, The change to use ProjectName
instead of RunID for runner naming requires all setup scripts to safely handle
name reuse across multiple runs of the same project. Review the four
registration scripts: snippet-linux.sh, snippet-darwin.sh,
snippet-linux-ppc64le.sh, and snippet-linux-s390x.sh. Ensure that either all
scripts include the --replace flag in their config.sh calls, or verify that the
--ephemeral flag in the architecture-specific scripts (snippet-linux-ppc64le.sh
and snippet-linux-s390x.sh) is sufficient to prevent registration failures due
to naming conflicts. Add the --replace flag to any script that lacks it, or add
documentation confirming that --ephemeral alone handles the collision
prevention.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/provider/ibmcloud/action/ibm-power/ibm-power.go`:
- Around line 180-184: The error handling for the destroyForRetry call is too
lenient and allows retries to continue even when partial stack cleanup fails,
risking dirty state. Modify the error handling in this block so that if
destroyForRetry returns an error, the function only continues to the next
sysType iteration if the error specifically indicates "no stack exists"; for any
other error, the function should return that error and stop retrying rather than
proceeding with potentially leftover resources.
In `@pkg/provider/ibmcloud/data/pisystemtypes.go`:
- Around line 65-78: The filterAndPrioritize function currently filters system
types to only include those present in both the priority list and the supported
list, which causes it to fail when a zone supports types not in
DefaultSystemTypePriority. After building the filtered list from the priority
list iteration, append any remaining supported system types that were not
included in the priority list before checking if filtered is empty. This ensures
that valid API-supported types are preserved as a fallback, maintaining the
preferred ordering while still accepting all valid candidates.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 8428a96e-9095-4043-95ce-333a60cee504
📒 Files selected for processing (2)
pkg/provider/ibmcloud/action/ibm-power/ibm-power.gopkg/provider/ibmcloud/data/pisystemtypes.go
| if i < len(sysTypes.Types)-1 { | ||
| logging.Infof("destroying partial stack before retry...") | ||
| if dErr := destroyForRetry(mCtx); dErr != nil { | ||
| logging.Warnf("failed to destroy partial stack: %v", dErr) | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not retry after failed partial-stack cleanup.
The next attempt reuses the same stack name/backend with a different sysType; if destroyForRetry failed, the retry can operate on dirty state or leftover resources. Stop here unless the error is positively known to mean “no stack exists.”
Proposed fix
if i < len(sysTypes.Types)-1 {
logging.Infof("destroying partial stack before retry...")
if dErr := destroyForRetry(mCtx); dErr != nil {
- logging.Warnf("failed to destroy partial stack: %v", dErr)
+ return fmt.Errorf("failed to destroy partial stack before retry: %w", dErr)
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if i < len(sysTypes.Types)-1 { | |
| logging.Infof("destroying partial stack before retry...") | |
| if dErr := destroyForRetry(mCtx); dErr != nil { | |
| logging.Warnf("failed to destroy partial stack: %v", dErr) | |
| } | |
| if i < len(sysTypes.Types)-1 { | |
| logging.Infof("destroying partial stack before retry...") | |
| if dErr := destroyForRetry(mCtx); dErr != nil { | |
| return fmt.Errorf("failed to destroy partial stack before retry: %w", dErr) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/provider/ibmcloud/action/ibm-power/ibm-power.go` around lines 180 - 184,
The error handling for the destroyForRetry call is too lenient and allows
retries to continue even when partial stack cleanup fails, risking dirty state.
Modify the error handling in this block so that if destroyForRetry returns an
error, the function only continues to the next sysType iteration if the error
specifically indicates "no stack exists"; for any other error, the function
should return that error and stop retrying rather than proceeding with
potentially leftover resources.
| priority := buildPriorityList(preferred) | ||
|
|
||
| var filtered []string | ||
| for _, t := range priority { | ||
| if slices.Contains(supported, t) { | ||
| filtered = append(filtered, t) | ||
| } | ||
| } | ||
|
|
||
| if len(filtered) == 0 { | ||
| return nil, fmt.Errorf( | ||
| "no system types from priority list %v are supported in zone (supported: %v)", | ||
| priority, supported) | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Keep API-supported system types as a fallback.
filterAndPrioritize fails when a zone only supports system types not listed in DefaultSystemTypePriority, even though the API returned valid candidates. Preserve the preferred/default ordering, then append remaining supported types before deciding there is nothing to try.
Proposed fix
priority := buildPriorityList(preferred)
var filtered []string
+ seen := map[string]struct{}{}
for _, t := range priority {
if slices.Contains(supported, t) {
filtered = append(filtered, t)
+ seen[t] = struct{}{}
+ }
+ }
+ for _, t := range supported {
+ if _, ok := seen[t]; ok {
+ continue
}
+ filtered = append(filtered, t)
}
if len(filtered) == 0 {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/provider/ibmcloud/data/pisystemtypes.go` around lines 65 - 78, The
filterAndPrioritize function currently filters system types to only include
those present in both the priority list and the supported list, which causes it
to fail when a zone supports types not in DefaultSystemTypePriority. After
building the filtered list from the priority list iteration, append any
remaining supported system types that were not included in the priority list
before checking if filtered is empty. This ensures that valid API-supported
types are preserved as a fallback, maintaining the preferred ordering while
still accepting all valid candidates.
Integrate the existing pkg/integrations/github framework into the IBM Cloud providers so that
mapt ibmcloud ibm-power createandmapt ibmcloud ibm-z createcan provision VMs that auto-register as ephemeral GitHub Actions self-hosted runners.Since no official runner binaries exist for ppc64le/s390x, arch-specific setup scripts clone action-runner-image-pz and build the runner from source on the target VM. A new --ghactions-runner-image-repo flag controls which repo is cloned (defaults to deekay2310 fork until the RHEL script merges to github.com/IBM/action-runner-image-pz).