Skip to content

Memory: configure and enforce unfreed memory threshold#43783

Open
jronak wants to merge 1 commit intoenvoyproxy:mainfrom
jronak:memory-threshold-enforcement
Open

Memory: configure and enforce unfreed memory threshold#43783
jronak wants to merge 1 commit intoenvoyproxy:mainfrom
jronak:memory-threshold-enforcement

Conversation

@jronak
Copy link
Copy Markdown
Contributor

@jronak jronak commented Mar 5, 2026

Commit Message: Currently xds and admin can trigger memory release from either of the allocators by invoking the utils static APIs. These APIs have static thresholds today, making it difficult to configure the limits using bootstrap configuration.

In production, our xds configs have large footprints causing memory churns over 800MB. Repeated churns cause tcmalloc central collector stalls due to large cleanup operations (especially gperftools), impacting worker threads waiting for memory from central free list due to thread cache exhaustion.

This PR introduces a configurable minimum_unfreed_memory_bytes_threshold to the MemoryAllocatorManager bootstrap configuration. Memory release is only triggered when unfreed memory exceeds this threshold, defaulting to 100MB. Setting it to 0 disables the threshold check.

This is the first PR in a stack, scoped to core threshold configuration and enforcement. Follow-up PRs will pass MemoryAllocatorManager through the subscription factory, gRPC mux, and xDS manager, and remove the remaining static Memory::Utils calls.

Split from #41892 for easier review.

Additional Description: NA
Risk Level: Low
Testing: Unit test
Docs Changes: NA
Release Notes: NA
Platform Specific Features: NA

Signed-off-by: Ronak Jain <ronakjainc@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #43783 was opened by jronak.

see: more, trace.

@paul-r-gall
Copy link
Copy Markdown
Contributor

@nezdolik ping for "stalled" bit.

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Left an API comment.

// Minimum unfreed memory bytes threshold that triggers memory release during admin handler.
// If the unfreed memory bytes is less than this value, no memory release will occur.
// If set to ``0``, the threshold is disabled and memory release will be attempted whenever unfreed
// memory is available (subject to allocator support). Defaults to 100 MB.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the default value: why was it set to 100MB, and is this the actual default value that was used prior to this PR? If the current default value is different, then this is a semi-breaking-change, and should be clearly declared as such (in the release notes).
In general, it is preferred to keep the same default behavior (so Envoy users that upgrade their binaries don't see an unexpected change), unless the new default value makes more sense for all/most deployments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It looks like this PR changes the prior behavior of heap shrinker (also the amount of bytes that will be released). Can we keep it aligned with past default behavior (when this config option is not enabled)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked the code this PR was based on, and the 100MB threshold was already the existing hardcoded behavior in the old shrink path, so that part of the change was intended to preserve the previous default.

I agree with @nezdolik’s point about the behavioral change on the tcmalloc path. The intent here was to keep the existing behavior backward compatible while moving the ownership/plumbing, but this refactor ended up switching the immediate shrink path away from the old fixed release behavior. That was not intentional.


void AllocatorManager::releaseFreeMemory() {
#if defined(TCMALLOC)
tcmallocRelease();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, with this new code path this will only release bytes_to_release_ amount of bytes, while before it would try to release maxUnfreedMemoryBytes().

// Minimum unfreed memory bytes threshold that triggers memory release during admin handler.
// If the unfreed memory bytes is less than this value, no memory release will occur.
// If set to ``0``, the threshold is disabled and memory release will be attempted whenever unfreed
// memory is available (subject to allocator support). Defaults to 100 MB.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It looks like this PR changes the prior behavior of heap shrinker (also the amount of bytes that will be released). Can we keep it aligned with past default behavior (when this config option is not enabled)?

@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Mar 25, 2026

could the latest heapshrinker achieve similar target? See #43321

@nezdolik
Copy link
Copy Markdown
Member

nezdolik commented Apr 7, 2026

/wait

@jronak
Copy link
Copy Markdown
Contributor Author

jronak commented Apr 16, 2026

could the latest heapshrinker achieve similar target? See #43321

Given the overlap with the newer changes, I don’t see value in continuing this PR in its current shape. Right now we have multiple memory-release paths with slightly different ownership and semantics, which is part of why
this became messy during the refactor. I still think there is value in a narrower cleanup that moves this ownership
into AllocatorManager and makes the immediate shrink paths use a single plumbing path without changing the default behavior.

If maintainers think that direction still makes sense, I can cut a smaller follow-up focused only on that refactor.
Otherwise I’m happy to close this out here. @wbpcode @nezdolik

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants