
OCPEDGE-2737: Enable aarch64 native KVM support for agent-based installation#1908

Open
fonta-rh wants to merge 5 commits into openshift-metal3:master from fonta-rh:aarch64-native-kvm

@fonta-rh fonta-rh commented Jun 9, 2026

Summary

Enable IPI and agent-based installation (ABI) on native aarch64 KVM hosts (e.g. AWS Graviton bare metal). Three independent fixes that together unblock the full agent path on aarch64 (and IPI while we're at it):

  1. CPU model fix — Narrow the {% if is_aarch64 %} conditional in the VM template so native KVM falls through to host-passthrough instead of the emulation-only cortex-a57
  2. Hardcoded x86_64 references — Replace x86_64 strings with $(uname -m) / ${ARCH} across agent scripts and RHCOS boot image resolution
  3. CDROM bus fix — Set target.bus explicitly on all virt-xml CDROM attachment calls (scsi on aarch64, sata on x86_64) to prevent the virt machine type from defaulting to USB (which doesn't exist)

Changes

| File | Fix |
| --- | --- |
| 02_configure_host.sh | Narrow CPU model conditional for native aarch64 KVM |
| agent/06_agent_create_cluster.sh | Add CDROM_BUS variable; set target.bus on all 5 virt-xml CDROM lines |
| agent/common.sh | PXE boot filename: x86_64 → $(uname -m) |
| agent/iscsi_utils.sh | iSCSI boot filenames (DHCP bootp + iPXE) |
| agent/07_agent_add_extraworker_nodes.sh | Extra worker node ISO filename |
| agent/iso_no_registry.sh | OVE ISO cleanup exclusion pattern |
| agent/01_agent_requirements.sh | oc-mirror download URL |
| rhcos.sh | RHCOS format key lookup (.architectures.x86_64 → host arch) |

Test plan

  • Full OCP 4.22 fencing-agent deployment on AWS c7g.metal (Graviton3, native aarch64 KVM)
  • Both nodes Ready, etcd Available, zero degraded cluster operators
  • Verified x86_64 deployments unaffected (all substitutions evaluate to x86_64 / sata on x86 hosts)
  • 02_configure_host.sh: <os>, <features>, VNC conditionals still fire correctly (only CPU block narrowed)

Supersedes #1910.

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from andfasano and mkowalski June 9, 2026 09:04
openshift-ci Bot commented Jun 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign derekhiggins for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot commented Jun 9, 2026

Hi @fonta-rh. Thanks for your PR.

I'm waiting for an openshift-metal3 member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 9, 2026
Replace hardcoded x86_64 architecture strings with $(uname -m) or
${ARCH} so that agent-based installation scripts work on aarch64
hosts (e.g. AWS Graviton bare metal).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
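
The substitution described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `agent_iso_name` is hypothetical, and the `agent.<arch>.iso` naming is assumed here for the agent ISO.

```shell
#!/usr/bin/env bash
# ARCH defaults to the host architecture reported by uname -m
# (x86_64 on Intel/AMD hosts, aarch64 on Graviton), but can be overridden.
ARCH="${ARCH:-$(uname -m)}"

# Hypothetical helper: build an architecture-specific ISO filename
# instead of hardcoding "x86_64" in the name.
agent_iso_name() {
    local arch="${1:-$ARCH}"
    printf 'agent.%s.iso\n' "$arch"
}

# Prints the filename for the current host architecture.
agent_iso_name
```

On an x86_64 host the derived name is identical to the previously hardcoded one, which is why existing deployments should be unaffected.
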
The metal3-dev-env baremetalvm.xml.j2 template hardcodes cortex-a57
for all aarch64 VMs. This CPU model only works under QEMU emulation;
native aarch64 KVM (e.g. AWS Graviton bare metal) requires
host-passthrough.

Narrow the CPU-section conditional from `{% if is_aarch64 %}` to
`{% if is_aarch64 and libvirt_domain_type == 'qemu' %}` so that
native KVM falls through to host-passthrough. The other three
is_aarch64 blocks (<os>, <features>, VNC) are unaffected — the sed
targets only the CPU block by matching its adjacent HTML comment line.

Tested on AWS c7g.metal (Graviton3) with OCP 4.22.0-rc.5 — full
fencing-IPI deployment with Pacemaker/STONITH operational.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
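
A sketch of how a sed patch can narrow only one of several identical conditionals by keying on an adjacent comment line. The template fragment and the marker comment text below are hypothetical stand-ins (the real baremetalvm.xml.j2 is not shown here), the marker is assumed to sit on the line immediately before the conditional, and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Write a hypothetical template fragment with two identical conditionals.
tmpl="$(mktemp)"
cat > "$tmpl" <<'EOF'
{% if is_aarch64 %}
<os-placeholder/>
{% endif %}
<!-- hypothetical CPU marker comment -->
{% if is_aarch64 %}
<cpu mode='custom'><model>cortex-a57</model></cpu>
{% else %}
<cpu mode='host-passthrough'/>
{% endif %}
EOF

# Match the marker comment, advance to the next line (n), and rewrite the
# conditional only there. The first is_aarch64 block is left untouched.
sed -i "/<!-- hypothetical CPU marker comment -->/{n;s/{% if is_aarch64 %}/{% if is_aarch64 and libvirt_domain_type == 'qemu' %}/}" "$tmpl"

cat "$tmpl"
```

With this narrowing, a native KVM domain on aarch64 no longer matches the first branch and takes the host-passthrough branch instead.
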
@fonta-rh fonta-rh force-pushed the aarch64-native-kvm branch from 15e7ec5 to 15c417f Compare June 9, 2026 10:27
On aarch64 (virt machine type), virt-xml defaults the CDROM bus to USB
when no bus is specified. The virt machine type has no USB controller,
causing "USB is disabled for this domain" errors when attaching the
agent ISO at step 06.

Set target.bus explicitly on all five virt-xml CDROM attachment calls:
sata on x86_64 (q35 default), scsi on aarch64 (matching the bus
already configured by 02_configure_host.sh).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
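
The bus selection described above can be sketched like this. The function name `cdrom_bus_for_arch`, the domain name, and the ISO path are hypothetical; the script only prints the virt-xml command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the CDROM bus from the machine architecture.
# aarch64 uses scsi; every other architecture falls back to sata.
cdrom_bus_for_arch() {
    case "${1:-$(uname -m)}" in
        aarch64) echo scsi ;;
        *)       echo sata ;;
    esac
}

CDROM_BUS="$(cdrom_bus_for_arch)"

# Illustrative only: show the command that would attach the ISO with an
# explicit bus, so virt-xml does not pick a default for the machine type.
echo virt-xml example-domain --add-device \
    --disk "/path/to/agent.iso,device=cdrom,target.bus=${CDROM_BUS}"
```
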
@fonta-rh fonta-rh changed the title NO-JIRA: Fix aarch64 CPU model for native KVM deployments NO-JIRA: Enable aarch64 native KVM support for agent-based installation Jun 9, 2026
@fonta-rh fonta-rh changed the title NO-JIRA: Enable aarch64 native KVM support for agent-based installation OCPEDGE-2737: Enable aarch64 native KVM support for agent-based installation Jun 9, 2026
fonta-rh commented Jun 9, 2026

/jira refresh

fonta-rh and others added 2 commits June 11, 2026 11:11
The pinned metal3-dev-env hardcodes `linux-amd64` in the go_tarball
template variable. The 01_install_requirements.sh script already
detects the host architecture and passes GOARCH as an Ansible extra
var, but the template ignores it.

Add a sed patch (alongside the existing Ansible version patch) to
make go_tarball use the GOARCH variable, so the correct Go binary
is downloaded on aarch64 hosts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
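
For illustration, the mapping from host machine type to a Go tarball name might look like the sketch below. The function names and the Go version are hypothetical, and the actual detection logic in 01_install_requirements.sh is not reproduced here; only the `amd64`/`arm64` names are standard GOARCH values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate `uname -m` output to a GOARCH value.
goarch_for_machine() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        *)       echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the tarball name from a Go version and a
# machine type instead of hardcoding linux-amd64.
go_tarball_name() {
    local version="$1" machine="$2" goarch
    goarch="$(goarch_for_machine "$machine")" || return 1
    printf 'go%s.linux-%s.tar.gz\n' "$version" "$goarch"
}

# Example with a placeholder version number.
go_tarball_name 1.22.0 "$(uname -m)"
```
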
On aarch64 (virt machine type), AAVMF firmware does not reliably honor
boot_order for SCSI CDROM vs virtio disk. After the agent-based installer
writes the OS to disk and reboots, VMs boot back into the installation
ISO instead of from disk, causing installing-pending-user-action timeout.

Fix: eject CDROM media after VMs have booted from the ISO. The CoreOS
live agent runs entirely in RAM, so the ISO is not needed after boot.
When the agent triggers a reboot after image write, the empty CDROM is
skipped and the VM boots from disk.

Gated on aarch64 only — x86_64 OVMF handles boot_order correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
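
A rough sketch of the eject step, assuming the VMs are managed through virsh. The function names are hypothetical and this is not the code from the PR; it relies on the `Type Device Target Source` column layout of `virsh domblklist --details`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `virsh domblklist <domain> --details` output on
# stdin and print the target name of every cdrom device.
cdrom_targets() {
    awk '$2 == "cdrom" { print $3 }'
}

# Hypothetical helper: eject the media from every CDROM of one domain.
# The drive stays attached; only the ISO is removed from it.
eject_cdroms() {
    local domain="$1" target
    for target in $(virsh domblklist "$domain" --details | cdrom_targets); do
        virsh change-media "$domain" "$target" --eject
    done
}

# Gate on aarch64 only, mirroring the commit message. A real script would
# call eject_cdroms for each VM once it has booted from the ISO.
if [ "$(uname -m)" = "aarch64" ]; then
    echo "aarch64 host: CDROM media would be ejected after ISO boot"
fi
```
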