Conversation

@charliedmcb (Contributor) commented Oct 9, 2025

Fixes #

Description
Built on top of this underlying framework:
#1200

These tests cover:

  • Assignment of post-provisioning networking labels that were previously handled by Karpenter
  • Karpenter's handling of DriftAction, and kubelet updates triggering DriftAction
  • PUT of a machine while a PUT on the managed cluster (MC) is in flight

How was this change tested?

Tested with this PR:

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

NONE

@charliedmcb charliedmcb changed the base branch from main to charliedmcb/addBYOMachinePoolE2EBaseStructure October 9, 2025 21:10
bK8sVersion := lo.Must(semver.Parse(*b.KubernetesVersion))
return aK8sVersion.GT(bK8sVersion)
}).KubernetesVersion
upgradedMC := env.ExpectSuccessfulUpgradeOfManagedCluster(kubernetesUpgradeVersion)
Collaborator

According to the implementation, this might unintentionally wait for PUT to be done, while we want PUT to be ongoing?

Member

Agree - I don't see how the cluster is still upgrading at the end of this test, given the above.

Contributor Author

Yea, good catch.

I think I had a local fix for this that changed the func to be async, with an optional wait/poll afterwards, so this test could avoid the wait while the regular k8s upgrade test could still use it. It must not have gotten pushed, and it's lost in the codespace now. Sorry about that.

Should be updated.
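For reference, a rough sketch of that split, with hypothetical names throughout (`ExpectBeginUpgradeOfManagedCluster` and the `managedClustersClient`/`ResourceGroup`/`ClusterName` fields on `Environment` are assumptions, not taken from this PR):

```go
// Sketch only: assumes Environment carries an armcontainerservice-style
// managed clusters client plus ResourceGroup/ClusterName fields, and that
// github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime is imported.

// ExpectBeginUpgradeOfManagedCluster starts the upgrade PUT and returns the
// poller without waiting, so a test can act while the upgrade is still ongoing.
func (env *Environment) ExpectBeginUpgradeOfManagedCluster(kubernetesVersion *string) *runtime.Poller[containerservice.ManagedClustersClientCreateOrUpdateResponse] {
	GinkgoHelper()
	mc := env.ExpectGetManagedCluster()
	mc.Properties.KubernetesVersion = kubernetesVersion
	poller, err := env.managedClustersClient.BeginCreateOrUpdate(env.Context, env.ResourceGroup, env.ClusterName, mc, nil)
	Expect(err).ToNot(HaveOccurred())
	return poller
}

// The regular KubernetesUpgrade test can still block on completion:
//   poller := env.ExpectBeginUpgradeOfManagedCluster(kubernetesUpgradeVersion)
//   _, err := poller.PollUntilDone(env.Context, nil)
//   Expect(err).ToNot(HaveOccurred())
```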

git_ref: ${{ inputs.git_ref }}
location: ${{ inputs.location }}
provisionmode: ${{ inputs.provisionmode }}
identity_type: ${{ inputs.suite == 'Machines' && 'UserAssigned' || 'SystemAssigned' }}
Member

Why do we need machine identity to be UserAssigned? Can we leave a comment here or in create-cluster/action.yaml explaining?

Contributor Author

This is required for handling kubelet update/upgrade.

You can't do the user-assigned kubelet update test without the cluster being in a user-assigned identity state.
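A sketch of the kind of comment that could go next to it (wording hypothetical, expression verbatim from the workflow):

```yaml
# The Machines suite exercises the user-assigned kubelet identity update test,
# which requires the cluster itself to be created with a UserAssigned identity.
identity_type: ${{ inputs.suite == 'Machines' && 'UserAssigned' || 'SystemAssigned' }}
```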

})
}

func (env *Environment) ExpectGetManagedCluster() containerservice.ManagedCluster {
Member

Suggested change
func (env *Environment) ExpectGetManagedCluster() containerservice.ManagedCluster {
func (env *Environment) ExpectGetManagedCluster() *containerservice.ManagedCluster {

func (env *Environment) WarnIfClusterNotInExpectedProvisioningState(expectedProvisioningState string) containerservice.ManagedCluster {
GinkgoHelper()
managedCluster := env.ExpectGetManagedCluster()
Expect(managedCluster.Properties.ProvisioningState).ToNot(BeNil())
Member

minor: assumes Properties is not nil
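A minimal guard, asserting on `Properties` before dereferencing it:

```go
managedCluster := env.ExpectGetManagedCluster()
// Guard the pointer chain before reading ProvisioningState.
Expect(managedCluster.Properties).ToNot(BeNil())
Expect(managedCluster.Properties.ProvisioningState).ToNot(BeNil())
```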


// WarnIfClusterNotInExpectedProvisioningState checks if the cluster's provisioning state is equal to the
// given expected provisioning state.
func (env *Environment) WarnIfClusterNotInExpectedProvisioningState(expectedProvisioningState string) containerservice.ManagedCluster {
Member

Suggested change
func (env *Environment) WarnIfClusterNotInExpectedProvisioningState(expectedProvisioningState string) containerservice.ManagedCluster {
func (env *Environment) WarnIfClusterNotInExpectedProvisioningState(expectedProvisioningState string) *containerservice.ManagedCluster {

fail-fast: false
matrix:
suite: [ACR, BYOK, Chaos, Consolidation, Drift, GPU, InPlaceUpdate, Integration, KubernetesUpgrade, NodeClaim, Scheduling, Spot, Subnet, Utilization]
suite: [ACR, BYOK, Chaos, Consolidation, Drift, GPU, InPlaceUpdate, Integration, KubernetesUpgrade, NodeClaim, Scheduling, Spot, Subnet, Utilization, Machines]
Member

minor: Alphabetical?

TBH might be a good idea at this point to just break this guy and put 1 suite per line.
This is YAML, after all; we don't need to mash everything into a single line AFAIK.
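i.e. something like (alphabetized, one suite per line):

```yaml
suite:
  - ACR
  - BYOK
  - Chaos
  - Consolidation
  - Drift
  - GPU
  - InPlaceUpdate
  - Integration
  - KubernetesUpgrade
  - Machines
  - NodeClaim
  - Scheduling
  - Spot
  - Subnet
  - Utilization
```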

Expect(node.Labels).To(HaveKeyWithValue("kubernetes.azure.com/azure-cni-overlay", "true"))
Expect(node.Labels).To(HaveKeyWithValue("kubernetes.azure.com/podnetwork-type", consts.NetworkPluginModeOverlay))

// Note: these labels we only check their keys since, the values are dynamic
Member

Suggested change
// Note: these labels we only check their keys since, the values are dynamic
// Note: these labels we only check their keys since the values are dynamic

})

// NOTE: ClusterTests modify the actual cluster itself, which means that performing tests after a cluster test
// might not have a clean environment, and might produce unexpected results. Ordering of cluster tests is important
Member

minor: Should we add a note here that this is safe in CI because each E2E runs on its own cluster?

Contributor Author

So, it's safe so long as the suite itself is set up correctly to handle the ordering. However, there is leakage between test cases within the suite.
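For example, Ginkgo v2's `Ordered` container decorator is one way to pin the ordering within a suite (a sketch, not necessarily how this suite is wired):

```go
// Ordered makes the specs in this container run in declaration order, so a
// spec that mutates the cluster runs before the specs that depend on having
// observed the pre-mutation state.
var _ = Describe("ClusterTests", Ordered, func() {
	It("upgrades the managed cluster", func() { /* mutates the cluster */ })
	It("verifies post-upgrade behavior", func() { /* relies on the upgrade above */ })
})
```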

// updatedKubeletIdentityResourceID := env.GetKubeletIdentityResourceID(env.Context)

// TODO: check if we want to have this possibly logged
// Expect(updatedKubeletIdentityResourceID).To(Equal(lo.FromPtr(newIdentity.ID)), "Expected updatedKubeletIdentityResourceID to match new kubelet resource id")
Member

Do we not want this assert? Not sure I follow why not

Contributor Author

This was a light concern about leaking the identity info. If there is no concern, then yes, this assert should be there.
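i.e., restoring the assertion that is commented out in the diff above:

```go
// Re-enabled from the commented-out lines in the diff.
updatedKubeletIdentityResourceID := env.GetKubeletIdentityResourceID(env.Context)
Expect(updatedKubeletIdentityResourceID).To(Equal(lo.FromPtr(newIdentity.ID)), "Expected updatedKubeletIdentityResourceID to match new kubelet resource id")
```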

}).WithTimeout(3 * time.Minute).Should(Succeed())

By("expecting nodes to drift")
env.EventuallyExpectDriftedWithTimeout(15*time.Minute, nodeClaims...)
Member

15m seems a long time?

Member

especially since this is just detection of drift, not actual drift? Does this really take 15m? Shouldn't it take more like 30s?

Contributor Author

Yea, I think this came from the other drift tests. Agreed, it can be shortened; though I feel like it'd be worth looking at the drift logic, or some metrics, to make sure it's not being shortened too far.
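e.g., a tighter bound for detection alone (the 2m value is a guess, pending the metrics check mentioned above):

```go
// Detection only needs the Drifted status condition to flip on the NodeClaims,
// so a much tighter bound than 15m should be enough; the actual node
// replacement is asserted separately with its own timeout.
env.EventuallyExpectDriftedWithTimeout(2*time.Minute, nodeClaims...)
```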

