Bug: RoleAssignments stuck in deletion because of authorization (scope) issues #5009

@RoRoMaan

Describe the bug

When a managed cluster (AKS) is deleted through the cluster-api, the deletion of the corresponding ASO resources is triggered as well.
ASO cleans up most of the Azure resources without any issues. However, it struggles with RoleAssignments.

The RoleAssignment is owned by another ASO resource (owner: StorageAccount). When a deletion is initiated (by deleting the resource group or the storage account object), ASO deletes the resources in Azure, but the RoleAssignment remains in a Failed state with the following message:

RESPONSE 403: 403 Forbidden
ERROR CODE: AuthorizationFailed
--------------------------------------------------------------------------------
{
  "error": {
    "code": "AuthorizationFailed",
    "message": "The client '<CLIENT_ID>' with object id '<OBJECT_ID>' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/delete' over scope '/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<CLUSTER_RG>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/providers/Microsoft.Authorization/roleAssignments/<ROLE_ASSIGNMENT_UUID>' or the scope is invalid. If access was recently granted, please refresh your credentials."
  }
}
--------------------------------------------------------------------------------

When I check the Azure portal, the storage account has indeed been deleted, so the scope message above makes sense (the scope is invalid because it no longer exists). But shouldn't the RoleAssignment be deleted before ASO even touches the StorageAccount?
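To make the scope relationship in the error explicit, here is a minimal Python sketch that splits the role assignment ID from the message into its parent scope and its name. The helper `split_role_assignment_id` is hypothetical and written only for illustration; it is not part of ASO or any Azure SDK, and it assumes the ID follows the `<scope>/providers/Microsoft.Authorization/roleAssignments/<name>` shape seen above.

```python
import re

# Shape assumed from the error message:
#   <scope>/providers/Microsoft.Authorization/roleAssignments/<name>
# The greedy prefix makes the pattern anchor on the LAST such segment.
_ROLE_ASSIGNMENT_RE = re.compile(
    r"^(?P<scope>/.*)/providers/Microsoft\.Authorization/roleAssignments/(?P<name>[^/]+)$",
    re.IGNORECASE,
)


def split_role_assignment_id(arm_id: str) -> tuple[str, str]:
    """Return (parent_scope, assignment_name) for a role assignment ID.

    Hypothetical helper for illustration only.
    """
    match = _ROLE_ASSIGNMENT_RE.match(arm_id)
    if match is None:
        raise ValueError(f"not a role assignment ID: {arm_id!r}")
    return match.group("scope"), match.group("name")


# The ID from the error message, placeholders left as reported.
example_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<CLUSTER_RG>"
    "/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"
    "/providers/Microsoft.Authorization/roleAssignments/<ROLE_ASSIGNMENT_UUID>"
)

scope, name = split_role_assignment_id(example_id)
print(scope)  # the StorageAccount ID, i.e. the scope that no longer exists
print(name)
```

The parent scope is the StorageAccount itself, so once that account is gone the delete request targets a scope that can no longer be resolved.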

Azure Service Operator Version: v2.11.0 (via cluster-api-azure)

Expected behavior

RoleAssignments, if they exist, are deleted before their owner.
Or: if the owner resource (e.g. StorageAccount) no longer exists, delete the RoleAssignment object, since the Azure resource it references is gone anyway.
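The two options above can be sketched in Python as follows. This is only an illustration of the requested behavior, not ASO's actual reconciliation code: the `Resource` type, `deletion_order`, and `can_treat_as_deleted` are hypothetical names, and the resource names are made up to mirror the setup in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    owner: Optional[str] = None  # name of the owning resource, None for a root


def deletion_order(resources: list[Resource], root: str) -> list[str]:
    """Option 1: list `root` and its descendants so that every
    resource comes before its owner (post-order traversal)."""
    children: dict[str, list[str]] = {}
    for res in resources:
        if res.owner is not None:
            children.setdefault(res.owner, []).append(res.name)

    order: list[str] = []

    def visit(name: str) -> None:
        for child in children.get(name, []):
            visit(child)
        order.append(name)  # appended only after all of its children

    visit(root)
    return order


def can_treat_as_deleted(status_code: int, error_code: str, owner_exists: bool) -> bool:
    """Option 2: accept a failed delete as done when the owner is already gone."""
    if status_code == 404:
        return True  # the resource itself is already gone
    return (
        status_code == 403
        and error_code == "AuthorizationFailed"
        and not owner_exists
    )


# Hypothetical resources mirroring the setup described in this issue.
resources = [
    Resource("rg"),
    Resource("storage-account", owner="rg"),
    Resource("identity", owner="rg"),
    Resource("role-assignment", owner="storage-account"),
]

print(deletion_order(resources, "rg"))
# ['role-assignment', 'storage-account', 'identity', 'rg']
print(can_treat_as_deleted(403, "AuthorizationFailed", owner_exists=False))
# True
```

Option 2 is deliberately narrow: a 403 while the owner still exists is a genuine permission problem and should keep surfacing as an error.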

To Reproduce

  1. create a ResourceGroup object
  2. create a StorageAccount within that resource group
  3. create a UserAssignedIdentity within that resource group
  4. assign a role to that managed identity via RoleAssignment -> owner / scope is the StorageAccount
  5. initiate a deletion either by...
    5.1. ...deleting the ResourceGroup object or...
    5.2. ...deleting the StorageAccount object
  6. wait until the resources have been deleted
  7. inspect the RoleAssignment object, which still exists but is in a Failed state, complaining about the scope of the deletion request

Additional context

We are using the cluster-api (CAPZ) to manage our Kubernetes clusters in Azure. CAPZ includes the ASO deployment.
With that, we can not only deploy Kubernetes clusters in Azure but also configure all the other necessary components (e.g. Private Endpoints, Storage Accounts, etc.).

I am aware that this ASO version (v2.11.0) is already a bit old, but I couldn't find any GitHub issue, feature, or bugfix that changes this behavior (or maybe my searching skills are lacking). So I am unsure whether updating the ASO controller would help.

Metadata

Labels: bug (Something isn't working), high-priority (Issues we intend to prioritize (security, outage, blocking bug))

Project status: Medium Term