5 changes: 3 additions & 2 deletions CLAUDE.md
@@ -61,8 +61,9 @@ make test-e2e-setup # Setup E2E test environment
make test-e2e-cleanup # Cleanup after E2E tests

# Test variations
TEST_VIRT=true make test-e2e # Run virtualization tests
TEST_UPGRADE=true make test-e2e # Run upgrade tests
TEST_VIRT=true make test-e2e # Run virtualization tests (community HCO, KubeVirt 1.8+)
TEST_VIRT_GA=true make test-e2e # Run virtualization tests (OpenShift Virtualization from redhat-operators)
TEST_UPGRADE=true make test-e2e # Run upgrade tests
TEST_CLI=true make test-e2e # Run CLI-based tests

# Run focused tests
9 changes: 9 additions & 0 deletions Makefile
@@ -924,11 +924,16 @@ test-e2e-setup: login-required build-must-gather

VELERO_INSTANCE_NAME ?= velero-test
ARTIFACT_DIR ?= /tmp
# virt
HCO_UPSTREAM ?= false
TEST_VIRT_GA ?= false
TEST_VIRT ?= false
HCO_INDEX_TAG ?= 1.18.0
# hcp
TEST_HCP ?= false
TEST_HCP_EXTERNAL ?= false
HCP_EXTERNAL_ARGS ?= ""
# other
TEST_CLI ?= false
SKIP_MUST_GATHER ?= false
TEST_UPGRADE ?= false
@@ -938,6 +943,8 @@ $(SED) -r "s/[&]* [!] $(CLUSTER_TYPE)|[!] $(CLUSTER_TYPE) [&]*//")) || $(CLUSTER
#TEST_FILTER := $(shell echo '! aws && ! gcp && ! azure' | $(SED) -r "s/[&]* [!] $(CLUSTER_TYPE)|[!] $(CLUSTER_TYPE) [&]*//")
ifeq ($(TEST_VIRT),true)
TEST_FILTER += && (virt)
else ifeq ($(TEST_VIRT_GA),true)
TEST_FILTER += && (virt)
else
TEST_FILTER += && (! virt)
endif
@@ -985,6 +992,8 @@ test-e2e: test-e2e-setup install-ginkgo ## Run E2E tests against OADP operator i
-artifact_dir=$(ARTIFACT_DIR) \
-kvm_emulation=$(KVM_EMULATION) \
-hco_upstream=$(HCO_UPSTREAM) \
-hco_community=$(TEST_VIRT) \
-hco_index_tag=$(HCO_INDEX_TAG) \
-skipMustGather=$(SKIP_MUST_GATHER) \
$(HCP_EXTERNAL_ARGS) \
|| EXIT_CODE=$$?; \
16 changes: 10 additions & 6 deletions build/ci-Dockerfile
@@ -14,12 +14,16 @@ RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/s
chmod +x kubectl && \
mv kubectl /usr/local/bin/

# Install Node.js and Claude CLI
# Using NodeSource setup script for RHEL-based images
RUN curl -fsSL https://rpm.nodesource.com/setup_20.x | bash - && \
dnf install -y nodejs && \
npm install -g @anthropic-ai/claude-code && \
dnf clean all
# Install virtctl for KubeVirt VM operations in E2E tests
RUN export KV_VERSION=$(curl -s https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt) && \
curl -L -o virtctl "https://github.com/kubevirt/kubevirt/releases/download/${KV_VERSION}/virtctl-${KV_VERSION}-linux-${TARGETARCH}" && \
chmod +x virtctl && \
mv virtctl /usr/local/bin/

# Install Claude CLI (native binary, no Node.js dependency)
RUN curl -fsSL https://claude.ai/install.sh | bash && \
ln -sf ~/.local/bin/claude /usr/local/bin/claude && \
claude --version

# Clone openshift/velero source code for failure analysis
# Uses oadp-dev branch to match OADP operator development
276 changes: 276 additions & 0 deletions docs/design/specs/2026-04-21-kubevirt-datamover-e2e-test-design.md
@@ -0,0 +1,276 @@
# Kubevirt Datamover E2E Test Design

## Summary

Add a basic E2E test to `virt_backup_restore_suite_test.go` that validates the kubevirt-datamover backup path: enabling CBT on a CirrOS VM via HCO configuration and VM labels, deploying OADP with the `kubevirt-datamover` plugin, triggering a Velero backup with `SnapshotMoveData=true`, and verifying that a `VirtualMachineBackupTracker` is created (proving the kubevirt-datamover controller processed the backup).

## Approach

Extend the existing HCO/OLM-based virt test infrastructure. No new KubeVirt install paths.

## Prerequisites

- HCO version with KubeVirt >= v1.7 (CBT support, released Nov 2025)
- kubevirt-datamover-controller deployed (handled by OADP operator when `kubevirt-datamover` plugin is enabled)
- kubevirt-datamover-plugin image available (Velero init container, configured via DPA `DefaultPluginKubeVirtDataMover`)

---

## Existing KubeVirt Installation Flow (No Changes)

The existing virt test suite installs KubeVirt via the HyperConverged Cluster Operator (HCO) through OLM. This flow is in `tests/e2e/lib/virt_helpers.go` and is driven by the `BeforeAll` in `virt_backup_restore_suite_test.go`. The test does NOT install raw upstream KubeVirt; it always uses HCO.

### Step-by-step existing flow

1. **`GetVirtOperator(client, clientset, dynamicClient, useUpstreamHco)`** (line 65)
   - Selects namespace and OLM package based on `upstream` flag:
     - OpenShift (default): namespace `openshift-cnv`, PackageManifest `kubevirt-hyperconverged`, catalog `redhat-operators`
     - Upstream (`HCO_UPSTREAM=true`): namespace `kubevirt-hyperconverged`, PackageManifest `community-kubevirt-hyperconverged`, catalog `community-operators`
   - Reads the `stable` channel from the PackageManifest to get the current CSV name and version

2. **`EnsureVirtInstallation()`** (line 732) — only runs if HCO is not already present
   - `EnsureNamespace(v.Namespace)` — creates `openshift-cnv` or `kubevirt-hyperconverged`
   - `ensureOperatorGroup()` — creates OperatorGroup (upstream uses empty `TargetNamespaces`)
   - `ensureSubscription()` — creates OLM Subscription pointing to the catalog/channel/CSV
   - `ensureCsv(5min)` — waits for the ClusterServiceVersion to be ready
   - `ensureHco(5min)` — creates the `HyperConverged` CR and waits for health

3. **`installHco()`** (line 339) — creates the HyperConverged CR with **empty spec**:
   ```yaml
   apiVersion: hco.kubevirt.io/v1beta1
   kind: HyperConverged
   metadata:
     name: kubevirt-hyperconverged
     namespace: <v.Namespace>
   spec: {}
   ```
   HCO then creates and manages the KubeVirt CR, CDI, and other operands.

4. **Optional: KVM emulation** (`EnsureEmulation()`, line 686)
   - Only when `kvmEmulation=true` (cloud clusters without nested virt)
   - Patches the HCO CR's **annotation** `kubevirt.kubevirt.io/jsonpatch` with:
     ```json
     [{"op": "add", "path": "/spec/configuration/developerConfiguration", "value": {"useEmulation": true}}]
     ```
   - HCO applies this JSON patch to the KubeVirt CR it manages

5. **CirrOS boot image setup**
   - Downloads latest CirrOS image URL
   - Creates DataVolume `cirros` in `openshift-virtualization-os-images` namespace
   - Creates DataSource from the PVC

6. **DPA plugin configuration**
   - Appends `DefaultPluginKubeVirt` to `dpaCR.VeleroDefaultPlugins`

7. **Storage classes and RBAC**
   - Creates `test-sc-immediate` and `test-sc-wffc` StorageClasses
   - Installs `cirros-rbac.yaml`

### Key point: HCO annotations propagate to KubeVirt CR

HCO manages the KubeVirt CR. Direct edits to the KubeVirt CR are overwritten by HCO. To inject KubeVirt-level configuration that HCO doesn't directly expose, the pattern is to use the `kubevirt.kubevirt.io/jsonpatch` annotation on the HCO CR. This is already used for KVM emulation (step 4 above).

---

## CBT Enablement: Two Separate Configurations Required

Enabling ChangedBlockTracking on a VM requires two distinct cluster-level configurations plus a per-VM label.

**Note:** An older setup procedure used three `kubevirt.kubevirt.io/jsonpatch` operations to inject `IncrementalBackup` and `UtilityVolumes` feature gates plus the label selector. That is **no longer necessary** — HCO now exposes `incrementalBackup` as a first-class feature gate, and enabling it automatically enables `UtilityVolumes`. Only the label selector still requires the jsonpatch annotation.

### Configuration 1: HCO Feature Gate (`incrementalBackup`)

The HCO CR has a feature gate `spec.featureGates.incrementalBackup` (default: `false`). Setting this to `true`:
- Enables the `IncrementalBackup` feature gate in the KubeVirt CR
- Automatically enables the `UtilityVolumes` feature gate (required for backup output storage)
- This is a Tech Preview feature (Alpha graduation)

**What to set on HCO:**
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  featureGates:
    incrementalBackup: true
```

This is a direct field on the HCO spec, so it can be set via a standard merge patch on the HCO resource (no annotation needed).
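The merge-patch payload is small enough to sketch directly. The helper name `incrementalBackupPatch` is an assumption for illustration; the dynamic-client `Patch()` call that would send this body is not shown:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// incrementalBackupPatch builds the merge-patch payload that sets
// spec.featureGates.incrementalBackup to true on the HyperConverged CR.
// The dynamic-client Patch() call that would send this body is assumed
// and not shown here.
func incrementalBackupPatch() ([]byte, error) {
	return json.Marshal(map[string]any{
		"spec": map[string]any{
			"featureGates": map[string]any{
				"incrementalBackup": true,
			},
		},
	})
}

func main() {
	body, err := incrementalBackupPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because it is a merge patch, any other fields already set on the HCO spec are left untouched.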

### Configuration 2: KubeVirt Label Selector (`changedBlockTrackingLabelSelectors`)

The KubeVirt CR has a configuration field `spec.configuration.changedBlockTrackingLabelSelectors` that tells KubeVirt which VMs should have CBT enabled, using label selectors. This field is on the **KubeVirt CR**, not directly exposed by HCO.

**What to set on KubeVirt CR:**
```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
spec:
  configuration:
    changedBlockTrackingLabelSelectors:
      virtualMachineLabelSelector:
        matchLabels:
          changedBlockTracking: "true"
```

Since HCO manages the KubeVirt CR and overwrites direct edits, this must be injected via the `kubevirt.kubevirt.io/jsonpatch` annotation on the HCO CR (same mechanism as KVM emulation):

```json
[{"op": "add", "path": "/spec/configuration/changedBlockTrackingLabelSelectors", "value": {"virtualMachineLabelSelector": {"matchLabels": {"changedBlockTracking": "true"}}}}]
```

**Important:** The `kubevirt.kubevirt.io/jsonpatch` annotation is a single annotation holding a JSON array of patch operations. If KVM emulation is also enabled, both patches must be combined into one annotation value:
```json
[
  {"op": "add", "path": "/spec/configuration/developerConfiguration", "value": {"useEmulation": true}},
  {"op": "add", "path": "/spec/configuration/changedBlockTrackingLabelSelectors", "value": {"virtualMachineLabelSelector": {"matchLabels": {"changedBlockTracking": "true"}}}}
]
```

### Configuration 3: VM Label

The VM itself must carry the matching label. This is baked into the VM manifest:
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    changedBlockTracking: "true"
```

### VM Restart Required

Even with the label present from VM creation, a restart cycle is required for KubeVirt to:
1. Create a backend storage PVC
2. Create a qcow2 overlay on top of the raw disk
3. Update the VM's domain XML

After restart, the VM's `status.ChangedBlockTracking.State` transitions to `Enabled`.

### Full CBT activation sequence in the test

```
1. EnsureVirtInstallation() — existing flow, installs HCO with empty spec
2. EnableCBTFeatureGate() — patch HCO: spec.featureGates.incrementalBackup = true
3. EnableCBTLabelSelector() — patch HCO annotation: jsonpatch to set changedBlockTrackingLabelSelectors on KubeVirt CR
4. Deploy CirrOS VM with label — template has changedBlockTracking: "true"
5. Wait for VM Running
6. Restart VM (stop + start) — required for qcow2 overlay creation
7. Wait for VM Running again
8. Wait for status.ChangedBlockTracking.State == "Enabled"
9. Proceed with backup
```

Steps 2 and 3 are idempotent and can be placed in `BeforeAll` so they run once for the entire suite.

---

## Changes

### 1. New CirrOS VM template with CBT label

**File:** `tests/e2e/sample-applications/virtual-machines/cirros-test/cirros-test-cbt.yaml`

Based on existing `cirros-test.yaml`, with the addition of:
- `metadata.labels.changedBlockTracking: "true"` on the VirtualMachine
- Same CirrOS boot image, same storage class, same resource requests

### 2. New VirtOperator methods in `tests/e2e/lib/virt_helpers.go`

#### `EnableCBTFeatureGate() error`

Patches the HCO CR to set `spec.featureGates.incrementalBackup: true`. Uses the dynamic client to get the HCO, set the nested field, and update. Follows the same retry-on-conflict pattern as `EnsureEmulation`.

#### `EnableCBTLabelSelector() error`

Patches the HCO CR's `kubevirt.kubevirt.io/jsonpatch` annotation to inject `changedBlockTrackingLabelSelectors` into the KubeVirt CR. Must handle the case where:
- The annotation doesn't exist yet (create it with just the CBT patch)
- The annotation already has patches (e.g. emulation) — parse the existing JSON array, append the CBT patch if not already present, write back

#### `StartVm(namespace, name string) error`

REST call to the KubeVirt subresource API:
```
PUT /apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachines/{name}/start
```
Mirrors the existing `StopVm` method.
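The URL shape can be illustrated with a tiny helper. `subresourcePath` is a hypothetical name; the real method issues the PUT through the clientset's REST client:

```go
package main

import "fmt"

// subresourcePath builds the KubeVirt subresource API path for a VM
// lifecycle action such as "start" or "stop". Hypothetical helper; the
// real StartVm/StopVm methods send a PUT to this path via the REST client.
func subresourcePath(namespace, name, action string) string {
	return fmt.Sprintf(
		"/apis/subresources.kubevirt.io/v1/namespaces/%s/virtualmachines/%s/%s",
		namespace, name, action,
	)
}

func main() {
	fmt.Println(subresourcePath("test-vms", "cirros-test", "start"))
}
```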

#### `RestartVmAndWaitRunning(namespace, name string, timeout time.Duration) error`

Stops the VM, waits for Stopped status, starts it, and waits for Running status.

#### `WaitForCBTEnabled(namespace, name string, timeout time.Duration) error`

Polls the VM's `status.changedBlockTracking.state` via the dynamic client until it equals `"Enabled"` or times out.

### 3. DPA configuration in `virt_backup_restore_suite_test.go`

In `BeforeAll`, add `DefaultPluginKubeVirtDataMover` to `dpaCR.VeleroDefaultPlugins` alongside the existing `DefaultPluginKubeVirt`. This causes the OADP operator to:
- Add the kubevirt-datamover-plugin as a Velero init container
- Deploy the kubevirt-datamover-controller Deployment

### 4. New test entry in `virt_backup_restore_suite_test.go`

A new `ginkgo.Entry` in the existing `DescribeTable` with label `"virt"`:

**"no-application kubevirt-datamover backup, CirrOS VM with CBT"**

Uses a modified `runVmBackupAndRestore` flow or a dedicated run function that:

1. Creates DPA (via `prepareBackupAndRestore`)
2. Creates namespace, installs the CBT CirrOS VM template
3. Waits for VM Running
4. Calls `v.EnableCBTFeatureGate()` and `v.EnableCBTLabelSelector()` (idempotent, can also be in BeforeAll)
5. Restarts the VM (`v.RestartVmAndWaitRunning`)
6. Waits for `status.ChangedBlockTracking.State == Enabled` (`v.WaitForCBTEnabled`)
7. Triggers Velero backup (via existing `runBackup` with `CSIDataMover` type for `SnapshotMoveData=true`)
8. **Post-backup verification**: Checks that a `VirtualMachineBackupTracker` CR (`backup.kubevirt.io/v1alpha1`) was created in the VM's namespace. This is the definitive signal that the kubevirt-datamover-controller received and started processing the DataUpload — it creates the VMBT during the Accepted phase before creating the VMB for the actual qcow2 backup.
9. Deletes VM and namespace
10. Runs restore
11. **Post-restore verification**: VM comes back running (restore path is best-effort since the kubevirt-datamover-controller doesn't implement DataDownload reconciliation yet)

### 5. Verification helper

**`verifyVMBackupTrackerExists(dynamicClient, vmNamespace string)`**

Uses the dynamic client to list `VirtualMachineBackupTracker` resources (`backup.kubevirt.io/v1alpha1`) in the VM namespace and asserts at least one exists. This proves the full chain worked: BIA plugin created a DataUpload with `dataMover: kubevirt`, and the kubevirt-datamover-controller reconciled it and created the VMBT.

## Files Changed

| File | Type | Description |
|------|------|-------------|
| `tests/e2e/sample-applications/virtual-machines/cirros-test/cirros-test-cbt.yaml` | New | CirrOS VM template with `changedBlockTracking: "true"` label |
| `tests/e2e/lib/virt_helpers.go` | Modified | Add `EnableCBTFeatureGate`, `EnableCBTLabelSelector`, `StartVm`, `RestartVmAndWaitRunning`, `WaitForCBTEnabled` |
| `tests/e2e/virt_backup_restore_suite_test.go` | Modified | Add `DefaultPluginKubeVirtDataMover` to plugins, add CBT test entry, add VMBT verification |

## Test Labels and Execution

The test entry uses the `"virt"` label (same as the existing VM tests), so it is gated by `TEST_VIRT=true` or `TEST_VIRT_GA=true`. If the HCO version doesn't support the `incrementalBackup` feature gate or CBT, the enablement steps fail with a clear error.

## Volume Policy: `skip` Action Type

The kubevirt-datamover-plugin uses Velero's volume policy mechanism to determine which PVCs it should handle. Specifically, PVCs that have the `skip` action type in the volume policy are eligible for the kubevirt datamover path. The `skip` action prevents Velero from performing CSI snapshots on these PVCs, allowing the kubevirt-datamover-plugin's `BackupItemActionV2` to create a `DataUpload` CR with `DataMover: "kubevirt"` instead.

> **Future**: Once upstream Velero merges the `custom` action type ([velero-io/velero#9678](https://github.com/velero-io/velero/pull/9678)), the kubevirt-datamover-plugin will be updated to check for `custom` with kubevirt-specific parameters (see [kubevirt-datamover-plugin#4](https://github.com/migtools/kubevirt-datamover-plugin/issues/4)).

The E2E test creates a volume policy ConfigMap in the velero namespace with the `skip` action type:

```yaml
version: v1
volumePolicies:
- conditions:
    pvcLabels:
      changedBlockTracking: "true"
  action:
    type: skip
```

This ConfigMap is referenced via `Spec.ResourcePolicy` on the Backup CR. When Velero evaluates volume policies for PVCs with the `changedBlockTracking: "true"` label, it matches the `skip` action and returns `shouldSnapshot=false`, which the kubevirt-datamover-plugin interprets as eligibility for the kubevirt datamover path.

The helpers `EnsureKubevirtVolumePolicy` and `CreateBackupWithVolumePolicy` in `tests/e2e/lib/backup.go` manage this lifecycle.

## Out of Scope

- Restore via kubevirt-datamover (DataDownload controller not implemented)
- Raw upstream KubeVirt daily build installation
4 changes: 3 additions & 1 deletion docs/developer/testing/TESTING.md
@@ -21,7 +21,9 @@ To get started, you need to provide the following **required** environment varia
| `BSL_REGION` | The region of backupLocations | `us-east-1` | false |
| `OADP_TEST_NAMESPACE` | The namespace where OADP operator is installed | `openshift-adp` | false |
| `OPENSHIFT_CI` | Disable colored output from tests suite run | `true` | false |
| `TEST_VIRT` | Exclusively run Virtual Machine backup/restore testing | `false` | false |
| `TEST_VIRT` | Exclusively run VM backup/restore testing using community HCO from custom CatalogSource (mutually exclusive with TEST_VIRT_GA) | `false` | false |
| `TEST_VIRT_GA` | Exclusively run Virtual Machine backup/restore testing (OpenShift Virtualization from redhat-operators) | `false` | false |
| `HCO_INDEX_TAG` | HCO index image tag for the community CatalogSource (used with TEST_VIRT) | `1.18.0` | false |
| `TEST_HCP` | Exclusively run Hypershift backup/restore testing | `false` | false |
| `TEST_UPGRADE` | Exclusively run upgrade tests. Need to first run `make catalog-test-upgrade`, if testing non production operator | `false` | false |
| `TEST_CLI` | Exclusively run CLI-based backup/restore testing | `false` | false |
2 changes: 1 addition & 1 deletion go.mod
@@ -43,6 +43,7 @@ require (
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56
google.golang.org/api v0.256.0
k8s.io/klog/v2 v2.130.1
sigs.k8s.io/yaml v1.4.0
)

require (
@@ -199,7 +200,6 @@ require (
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)

replace github.com/vmware-tanzu/velero => github.com/openshift/velero v0.10.2-0.20260413161955-ea34d4d90057