Skip to content

Fix Netty ByteBuf leak in GatewayServerErrorInjector fault injection delay paths#48880

Open
jeet1995 wants to merge 1 commit intoAzure:mainfrom
jeet1995:squad/netty-leak-fix
Open

Fix Netty ByteBuf leak in GatewayServerErrorInjector fault injection delay paths#48880
jeet1995 wants to merge 1 commit intoAzure:mainfrom
jeet1995:squad/netty-leak-fix

Conversation

@jeet1995
Copy link
Copy Markdown
Member

@jeet1995 jeet1995 commented Apr 20, 2026

Description

Fixes Netty ByteBuf leaks in GatewayServerErrorInjector when fault injection simulates response delays under HTTP/2.

Root Cause

When fault injection simulates a response delay via delayElement, the real HTTP response body was discarded without draining its ByteBuf content. Under HTTP/1.1, connection closure masks this because all ByteBufs are released when the channel closes. Under HTTP/2, the parent connection stays pooled and the undrained ByteBufs accumulate on the parent channel's allocator — causing growing memory leaks.

Fix

Two leak paths in injectGatewayServerResponseError:

Path Scenario Fix
delay >= timeout Response arrives but is discarded by .then(Mono.error(ReadTimeoutException)) — body is never consumed doOnNext drains and releases the body before delayElement buffers the response
delay < timeout In the happy path the response flows downstream after the delay and the caller drains the body normally. However, if the downstream cancels during delayElement (e.g., an outer timeout fires), delayElement silently drops the buffered HttpResponse without emitting it doOnDiscard(HttpResponse.class, ...) catches that discard and releases the body ByteBuf to prevent leaks

Both paths call releaseResponseBody() which subscribes to the body flux and ReferenceCountUtil.safeRelease()s each ByteBuf — consistent with existing patterns in ReactorNettyClient.java.

Testing

  • Needs validation with PerPartitionCircuitBreakerE2ETests (thin-client + HTTP/2 path)
  • Prior run with similar fix reduced leaks from 72 to 2 (remaining were pre-existing NIO/SslHandler leaks)

Related

@jeet1995 jeet1995 force-pushed the squad/netty-leak-fix branch 2 times, most recently from 0725648 to d360362 Compare April 20, 2026 21:33
@jeet1995 jeet1995 marked this pull request as ready for review April 20, 2026 21:37
Copilot AI review requested due to automatic review settings April 20, 2026 21:37
@jeet1995 jeet1995 requested review from a team and kirankumarkolli as code owners April 20, 2026 21:37
…delay paths

When fault injection simulates a response delay, the real HTTP response body
was discarded without draining its ByteBuf content. Under HTTP/2, this causes
ByteBuf leaks that accumulate on the long-lived parent channel.

Two leak paths fixed:
- delay >= timeout: doOnNext drains body before delayElement buffers it
- delay < timeout: doOnDiscard releases body if downstream cancels during delay

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995 jeet1995 force-pushed the squad/netty-leak-fix branch from d360362 to baf040e Compare April 20, 2026 21:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Netty ByteBuf leak in the Cosmos fault-injection layer when simulating gateway response delays under HTTP/2, by ensuring the real response body is drained/released even when the response is discarded or dropped on cancellation.

Changes:

  • Drain and release the response body before delaying when the injected delay exceeds the effective response timeout.
  • Release the response body when delayElement discards a buffered HttpResponse due to downstream cancellation (via doOnDiscard).
  • Add a helper (releaseResponseBody) to centralize the drain/release logic.

@jeet1995
Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants