fix: stop using remove node label #7

nickstern2002 · 2025-11-25T21:24:11Z

What type of PR is this?

What this PR does / why we need it:

Two changes present in this PR

Dropping logic that adds the remove-node label

The ToBeDeletedByClusterAutoscaler taint will now be used to indicate that a node should be removed from the node group. Control of the remove-node label returns to its original owner, a single internal controller.
The autoscaler's use of the remove-node label was causing conflicts with the internal controller responsible for reconciling node groups.
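A minimal sketch of the new indicator, assuming a consumer only needs to know whether the autoscaler has tainted a node. The Node and Taint types below are simplified, hypothetical stand-ins for the corev1 types so the example has no Kubernetes dependencies, and isMarkedForDeletion is an illustrative name, not a function from this PR.

```go
package main

import "fmt"

// toBeDeletedTaintKey is the taint key the cluster autoscaler places on nodes
// it intends to remove (ToBeDeletedByClusterAutoscaler, per the PR description).
const toBeDeletedTaintKey = "ToBeDeletedByClusterAutoscaler"

// Taint is a hypothetical, simplified stand-in for corev1.Taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Node is a hypothetical, simplified stand-in for corev1.Node.
type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

// isMarkedForDeletion reports whether the autoscaler has tainted the node for
// removal. Labels are deliberately ignored: after this change the remove-node
// label belongs to the internal controller, not the autoscaler.
func isMarkedForDeletion(n Node) bool {
	for _, t := range n.Taints {
		if t.Key == toBeDeletedTaintKey {
			return true
		}
	}
	return false
}

func main() {
	tainted := Node{Name: "node-a", Taints: []Taint{{Key: toBeDeletedTaintKey, Effect: "NoSchedule"}}}
	plain := Node{Name: "node-b"}
	fmt.Println(isMarkedForDeletion(tainted), isMarkedForDeletion(plain))
}
```

Matching on the key alone, rather than the value, keeps the check independent of whatever value the autoscaler writes into the taint.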

Taking the absolute value of delta in the DecreaseTargetSize

This fixes a bug in the current CoreWeave autoscaler implementation. When DecreaseTargetSize is called, delta is passed as a negative value. Because the method subtracts delta, the autoscaler ends up increasing the nodePool's target size instead of decreasing it. Taking the absolute value of delta ensures the target size always goes down.
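The sign problem can be illustrated with a small sketch. The cluster-autoscaler NodeGroup interface documents the delta passed to DecreaseTargetSize as negative, so subtracting it unchanged grows the target. The nodeGroup type, its fields, and the min-size and zero-delta checks below are hypothetical; a real implementation would also need to refuse reductions below the number of nodes already registered.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeGroup is a hypothetical, simplified node group used only for this sketch.
type nodeGroup struct {
	targetSize int
	minSize    int
}

// decreaseTargetSizeBuggy reproduces the bug: delta arrives negative,
// so subtracting it increases the target size.
func (ng *nodeGroup) decreaseTargetSizeBuggy(delta int) {
	ng.targetSize -= delta
}

// decreaseTargetSize subtracts |delta|, so the target shrinks regardless of
// the sign the caller uses. The zero and min-size checks are assumptions.
func (ng *nodeGroup) decreaseTargetSize(delta int) error {
	if delta == 0 {
		return errors.New("delta must be non-zero")
	}
	if delta < 0 {
		delta = -delta // Go has no integer abs in the standard library
	}
	newSize := ng.targetSize - delta
	if newSize < ng.minSize {
		return fmt.Errorf("target size %d would fall below min size %d", newSize, ng.minSize)
	}
	ng.targetSize = newSize
	return nil
}

func main() {
	buggy := &nodeGroup{targetSize: 5, minSize: 1}
	buggy.decreaseTargetSizeBuggy(-2)
	fmt.Println("buggy:", buggy.targetSize)

	fixed := &nodeGroup{targetSize: 5, minSize: 1}
	if err := fixed.decreaseTargetSize(-2); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("fixed:", fixed.targetSize)
}
```

With a starting target of 5 and a delta of -2, the buggy version moves the target to 7 while the fixed version moves it to 3.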

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

This improves the user experience when using the CoreWeave implementation of the Cluster Autoscaler. Specifically, it prevents scenarios such as too many nodes being removed and node groups being mistakenly scaled up.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

* fix: ensure vpa_recommender_vpa_objects_count UpdateModeInPlaceOrRecreate is reset
* move GetUpdateModes() to helpers.go
* update copyright

Co-authored-by: Adrian Moisey

Add support for Intel Habana Gaudi GPUs in the cluster autoscaler by:
- Define ResourceIntelGPU resource name (habana.ai/gaudi)
- Add Intel GPU to GPUVendorResourceNames list
- Refactor GPU detection logic to iterate through all GPU vendor resource names instead of checking vendors individually

This enables the autoscaler to properly detect and handle Intel GPU nodes alongside existing NVIDIA, AMD, and DirectX GPU support.

Extract the GPU allocatable detection loop into a new NodeHasGpuAllocatable helper function in utils/gpu/gpu.go. This eliminates code duplication across gpu_processor.go and makes the logic more maintainable. The new function returns both the GPU allocatable value and whether it exists, allowing callers to get both pieces of information in a single call.

Changes:
- Add NodeHasGpuAllocatable() helper in utils/gpu/gpu.go
- Update NodeHasGpu() to use the new helper
- Simplify FilterOutNodesWithUnreadyResources() in gpu_processor.go
- Simplify GetNodeGpuTarget() in gpu_processor.go
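A simplified sketch of the vendor loop and helper described in the two commits above, assuming allocatable resources can be modelled as a map from resource name to integer count (the real code works with corev1.ResourceList and resource.Quantity). The lowercase function names are illustrative. The vendor list here holds only nvidia.com/gpu, amd.com/gpu and the habana.ai/gaudi name from the commit; the real GPUVendorResourceNames list may contain more entries, and the real NodeHasGpu may also consult node labels.

```go
package main

import "fmt"

// gpuVendorResourceNames is an illustrative subset of GPU extended-resource names.
var gpuVendorResourceNames = []string{
	"nvidia.com/gpu",
	"amd.com/gpu",
	"habana.ai/gaudi",
}

// nodeHasGpuAllocatable iterates over every vendor resource name instead of
// checking vendors one by one. It returns the allocatable count of the first
// vendor resource found (an assumption about ordering) and whether any was found.
func nodeHasGpuAllocatable(allocatable map[string]int64) (int64, bool) {
	for _, name := range gpuVendorResourceNames {
		if qty, ok := allocatable[name]; ok {
			return qty, true
		}
	}
	return 0, false
}

// nodeHasGpu delegates to the helper and only reports presence.
func nodeHasGpu(allocatable map[string]int64) bool {
	_, ok := nodeHasGpuAllocatable(allocatable)
	return ok
}

func main() {
	nodes := []map[string]int64{
		{"cpu": 96, "habana.ai/gaudi": 8}, // Gaudi node with GPUs ready
		{"nvidia.com/gpu": 0},             // GPU advertised, nothing allocatable yet
		{"cpu": 16},                       // CPU-only node
	}
	for _, a := range nodes {
		qty, ok := nodeHasGpuAllocatable(a)
		fmt.Println(qty, ok, nodeHasGpu(a))
	}
}
```

Returning the count and the found flag together lets a caller tell "no GPU resource advertised" apart from "GPU resource advertised but zero allocatable", which is presumably the distinction an unready-resource filter needs.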

RixhersAjazi

LGTM, just a quick question for my understanding. Would suggest waiting on others to review as well to ensure bigger picture context isn't missing.

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go

LanceEa

A couple of open questions, but the gist looks right.

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go

nickstern2002 · 2025-12-03T16:45:41Z

Okay, cool. I'm not going to merge this in, as it would pollute our fork's commit history. Instead, I'm going to rebase this onto the upstream master branch and open this PR in the upstream repo. I've already cut an image from this branch's latest commit, so it's good to roll out now.

This will make this PR a little messy on our end, but it will look the same in the upstream PR. It's better than having to ask CBS to manually hard reset our fork.

I will eventually close this PR once the upstream PR has been merged.

nickstern2002 · 2025-12-03T17:12:21Z

Here is the upstream PR
kubernetes#8880

RixhersAjazi

Approving commit 6d42554

lxuan94-pp and others added 15 commits November 3, 2025 11:01

make maxNodeStartupTime configurable

bdb5d95

Add Unit Tests

83cd97c

fix: deprecated import of cacheddiscovery

dd06ef4

chore: rework Scaleway cloudprovider integration

543e339

fix gofmt test

62dd8bc

fix: deprecated import of cacheddiscovery

2f9d5b4

Merge pull request kubernetes#8782 from pablo-ruth/upstream-scaleway-…

799aabf

…rework chore: rework cluster-autoscaler Scaleway cloudprovider integration

Adjust OWNERS so that only API changes need api review

f2a54b6

Merge pull request kubernetes#8845 from adrianmoisey/adjust-owners-fo…

5d19bdf

…r-api-changes Adjust OWNERS so that only API changes need api review

Merge pull request kubernetes#8829 from gsixo/fix-of-deprecated-impor…

82018c9

…t-in-pod-auto-scaler fix: deprecated import of cacheddiscovery in vertical-pod-autoscaler

Merge pull request kubernetes#8543 from lxuan94-pp/xualiliu/oci-maxNo…

5e7f7a1

…deStartupTime Make maxNodeStartupTime configurable

Merge pull request kubernetes#8833 from gsixo/lala

7b95cb0

fix: deprecated import of cacheddiscovery in balancer

nickstern2002 self-assigned this Nov 26, 2025

k8s-ci-robot and others added 3 commits November 26, 2025 15:24

Merge pull request kubernetes#8853 from DorWeinstock/add-intel-gaudi-…

ffcbfee

…support Add Intel GPU (Habana Gaudi) autoscaler support

Adding metrics for latency of removal for unneeded/ unready nodes

97e45c5

Merge pull request kubernetes#8485 from ttetyanka/feature/deletionlat…

fb2899a

…encytracker Node removal latency metrics added

nickstern2002 force-pushed the ns/drop-remove-node-label branch from f152187 to 8ce8145 December 2, 2025 21:35

nickstern2002 requested a review from a team December 2, 2025 21:58

RixhersAjazi previously approved these changes Dec 2, 2025

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go (outdated)

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go

LanceEa reviewed Dec 3, 2025

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go (outdated)

cluster-autoscaler/cloudprovider/coreweave/coreweave_nodegroup.go

nickstern2002 dismissed RixhersAjazi’s stale review via e761968 December 3, 2025 14:10

LanceEa previously approved these changes Dec 3, 2025

LanceEa requested a review from RixhersAjazi December 3, 2025 14:34

nickstern2002 dismissed LanceEa’s stale review via 106ffbf December 3, 2025 16:56

nickstern2002 force-pushed the ns/drop-remove-node-label branch from e761968 to 106ffbf December 3, 2025 16:56

fix: drop remove node label and fix decrease size func

6d42554

nickstern2002 force-pushed the ns/drop-remove-node-label branch from 106ffbf to 6d42554 December 3, 2025 17:03

RixhersAjazi approved these changes Dec 3, 2025

nickstern2002 merged commit ac281ca into master Dec 5, 2025
9 checks passed

12 participants