# AEP-8898: Standardize VPA status condition handling

<!-- toc -->
- [Summary](#summary)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Motivation](#motivation)
- [Proposal](#proposal)
  - [Improve existing conditions](#improve-existing-conditions)
  - [Add new conditions](#add-new-conditions)
    - [ScalingBlocked](#scalingblocked)
    - [ScalingRequired](#scalingrequired)
    - [ScalingActionSucceeded](#scalingactionsucceeded)
- [Design Details](#design-details)
  - [Component Responsibilities](#component-responsibilities)
  - [Test Plan](#test-plan)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

Conditions are the standard Kubernetes mechanism for a controller to report the observed state of a resource. This AEP changes how the VPA handles its existing conditions, bringing them in line with current API conventions, and adds new conditions that give users more information for monitoring the behaviour of their VPAs.

### Goals

- Update existing conditions to conform to current [guidance](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties) from SIG Architecture by modifying their status, rather than deleting or adding conditions.
- Add new conditions that report useful status information to users.

### Non-Goals

- Removing unused conditions from the API.

## Motivation

The current VPA implementation handles conditions inconsistently: some conditions are deleted when they become "false" rather than having their status updated. This behavior deviates from [Kubernetes API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties), which recommend that conditions persist and toggle between `True` and `False` status values.

This inconsistency causes several problems:

1. **Monitoring and alerting**: Tools like Prometheus that track condition changes cannot reliably alert on VPA state, because conditions appear and disappear rather than transitioning between status values. Users cannot easily set up alerts like "alert if `ConfigUnsupported` has been `True` for more than 5 minutes" when the condition might not exist at all.

2. **E2E testing**: Tests that verify VPA behavior must resort to waiting for arbitrary time periods rather than checking for specific condition states. With proper condition semantics, tests can wait for a condition such as `ScalingRequired=False` to confirm that scaling has completed, which makes them faster and more reliable.

3. **Observability**: Users and operators lack visibility into why a VPA is or isn't taking action. New conditions such as `ScalingBlocked`, `ScalingRequired`, and `ScalingActionSucceeded` provide richer information about the VPA's decision-making process.

## Proposal

This proposal has two parts: improving existing conditions and adding new conditions.

### Improve existing conditions

Existing VPA conditions will be updated to persist with `status: False` instead of being deleted when their state becomes false. The following conditions are affected:

- `ConfigDeprecated`
- `ConfigUnsupported`
- `NoPodsMatched`

### Add new conditions

In addition to changing existing conditions, this AEP proposes new conditions. These are retrofits: conditions that the VPA should always have reported. The list is a work in progress and will be amended as further retrofitted conditions are identified.

Any new feature that requires new conditions will list those conditions in that feature's own AEP.

#### ScalingBlocked

Type: ScalingBlocked

True status reasons:

- InsufficientReplicas
- ScalingDisabled

False status reasons:

- SufficientReplicas
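
The mapping from updater state to the `ScalingBlocked` status and reasons above can be sketched in Go as follows. The `blockingInputs` struct and its fields are hypothetical; this AEP does not define how the updater detects either blocking case, and giving `ScalingDisabled` precedence over `InsufficientReplicas` when both apply is an assumption of this sketch.

```go
package main

// blockingInputs is a hypothetical summary of what the updater knows about
// a VPA's target; the field names are illustrative, not VPA API fields.
type blockingInputs struct {
	ScalingDisabled bool // configuration does not allow the updater to act
	LiveReplicas    int  // replicas currently running for the target
	MinReplicas     int  // replicas required before the updater will act
}

// scalingBlocked derives the ScalingBlocked status and reason.
// Checking ScalingDisabled before the replica count is a sketch-only choice.
func scalingBlocked(in blockingInputs) (status, reason string) {
	switch {
	case in.ScalingDisabled:
		return "True", "ScalingDisabled"
	case in.LiveReplicas < in.MinReplicas:
		return "True", "InsufficientReplicas"
	default:
		return "False", "SufficientReplicas"
	}
}
```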

#### ScalingRequired

Type: ScalingRequired

True status reasons:

- PodResourcesDiverged (pod resources differ from the recommendation)

False status reasons:

- NoEligiblePodsForScaling (no pods require scaling: either their resources match the recommendation or no pods can be safely updated)
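
A sketch of how the `ScalingRequired` reasons could be derived, under the simplifying assumption that each pod is compared to the recommendation by exact equality of its CPU and memory requests. The `podState` and `recommendation` types are hypothetical; an actual updater would likely compare against recommendation bounds or a tolerance rather than requiring an exact match.

```go
package main

// podState is a hypothetical, simplified view of one pod as seen by the updater.
type podState struct {
	CPUMilli    int64 // current CPU request in millicores
	MemoryBytes int64 // current memory request in bytes
	Updatable   bool  // whether the pod can currently be updated safely
}

// recommendation is a hypothetical target in the same units as podState.
type recommendation struct {
	CPUMilli    int64
	MemoryBytes int64
}

// scalingRequired returns True/PodResourcesDiverged if any updatable pod's
// requests differ from the recommendation, else False/NoEligiblePodsForScaling.
func scalingRequired(pods []podState, rec recommendation) (status, reason string) {
	for _, p := range pods {
		if !p.Updatable {
			continue // pods that cannot be safely updated are not eligible
		}
		if p.CPUMilli != rec.CPUMilli || p.MemoryBytes != rec.MemoryBytes {
			return "True", "PodResourcesDiverged"
		}
	}
	return "False", "NoEligiblePodsForScaling"
}
```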

#### ScalingActionSucceeded

Type: ScalingActionSucceeded

This condition is set only when a scaling action is actually attempted.

True status reasons:

- InPlaceResizeSuccessful
- EvictionSuccess

False status reasons:

- InPlaceResizeFailure
- EvictionFailed
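
Because `ScalingActionSucceeded` is only set when an action is attempted, a helper for it needs a way to say "leave the condition alone". In this hypothetical Go sketch, the `action` enumeration and `scalingActionResult` function are illustrative names, not VPA code; the third return value is `false` when no eviction or in-place resize was attempted.

```go
package main

// action is a hypothetical enumeration of what the updater did for a pod.
type action int

const (
	noAction action = iota
	evict
	inPlaceResize
)

// scalingActionResult maps an attempted action and its outcome to the
// ScalingActionSucceeded status and reason. set is false when no action was
// attempted; in that case the caller should leave the condition untouched.
func scalingActionResult(a action, succeeded bool) (status, reason string, set bool) {
	switch a {
	case evict:
		if succeeded {
			return "True", "EvictionSuccess", true
		}
		return "False", "EvictionFailed", true
	case inPlaceResize:
		if succeeded {
			return "True", "InPlaceResizeSuccessful", true
		}
		return "False", "InPlaceResizeFailure", true
	default:
		return "", "", false
	}
}
```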

## Design Details

### Component Responsibilities

The following table shows which VPA component is responsible for setting each condition:

| Condition | Component | Notes |
|-----------|-----------|-------|
| `ConfigDeprecated` | Recommender | Set when the VPA uses deprecated configuration |
| `ConfigUnsupported` | Recommender | Set when the VPA configuration is invalid |
| `NoPodsMatched` | Recommender | Set based on whether pods match the VPA selector |
| `RecommendationProvided` | Recommender | Set when a recommendation has been calculated |
| `ScalingBlocked` | Updater | Set when scaling cannot proceed (insufficient replicas, disabled mode) |
| `ScalingRequired` | Updater | Set based on whether pods need resource adjustments |
| `ScalingActionSucceeded` | Updater | Set after an actual scaling action (eviction or in-place resize) is attempted |

### Test Plan

**Unit Tests:**

- Test condition state transitions in the updater
- Test that conditions persist with `status: False` rather than being deleted
- Test that each condition/reason combination is set correctly based on VPA state

**E2E Tests:**

- Verify `ScalingRequired=False` is set when no pods need scaling
- Verify `ScalingActionSucceeded=True` is set after a successful eviction or in-place resize
- Verify `ScalingActionSucceeded=False` is set after failed scaling attempts
- Verify `ScalingBlocked=True` is set with the appropriate reason when scaling cannot proceed

**Note:** Several existing E2E tests can be updated to wait for the new `ScalingRequired` condition instead of arbitrary timeouts.

### Upgrade / Downgrade Strategy

#### Upgrade

- Existing VPAs will gain the new `ScalingBlocked` and `ScalingRequired` conditions on the first reconciliation after upgrade; `ScalingActionSucceeded` appears only once a scaling action is attempted
- Existing conditions that were previously deleted when their state became "false" will now persist with `status: False`
- This is non-breaking: clients that don't understand the new conditions will ignore them, as is standard Kubernetes behavior

#### Downgrade

- New conditions will remain on VPA objects but won't be updated by older controllers
- Older clients ignore unknown conditions (standard Kubernetes behavior)
- No manual cleanup is required

### Feature Enablement and Rollback

#### How can this feature be enabled / disabled in a live cluster?

This feature is always enabled and does not require a feature gate. The changes consist of:

1. **Bug fixes** to existing condition handling (persisting conditions with `status: False` instead of deleting them): this aligns the VPA with Kubernetes API conventions and is always active.

2. **New conditions** (`ScalingBlocked`, `ScalingRequired`, `ScalingActionSucceeded`): these are additive and do not affect existing functionality.

#### Rollback

To roll back, downgrade the VPA components (recommender, updater) to a previous version. After rollback:

- New conditions will remain on VPA objects but will no longer be updated
- Existing conditions will revert to the old behavior (being deleted instead of set to `False`)
- No manual cleanup is required

## Implementation History

- 2025-12-07: initial version
