Add exception handler on HTTP/2 parent channel to suppress WARN logs by jeet1995 · Pull Request #48687 · Azure/azure-sdk-for-java

jeet1995 · 2026-04-03T20:36:21Z

Problem

Customers see noisy Netty WARN logs in HTTP/2 scenarios:

An exceptionCaught() event was fired, and it reached at the tail of the pipeline.
io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed with error(-104): Connection reset by peer

These are cosmetic — zero operational impact — but trigger monitoring alerts.

Root Cause

In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection. The parent and child channels have different pipeline structures:

HTTP/1.1 pipeline (single channel — no leak to TailContext):

SslHandler → HttpClientCodec → ChannelOperationsHandler → [TAIL]
                                         ↑
                            Catches exceptions, bridges to
                            Reactor subscriber. Exception
                            never reaches TailContext.

HTTP/2 parent channel pipeline (BEFORE fix — leak to TailContext):

SslHandler → Http2FrameCodec → Http2MultiplexHandler → [TAIL]
                                                          ↑
                                            No handler catches it.
                                            TailContext logs WARN.

HTTP/2 parent channel pipeline (AFTER fix):

SslHandler → Http2FrameCodec → Http2MultiplexHandler → Http2ParentChannelExceptionHandler → [TAIL]
                                                                   ↑
                                                       Consumes ALL exceptions.
                                                       Log level based on connection state.

HTTP/2 child stream channel pipeline (unchanged):

H2ToHttp11Codec → IdleStateHandler → ChannelOperationsHandler → [TAIL]
                                              ↑
                               Same as HTTP/1.1 — catches exceptions,
                               bridges to Reactor subscriber.

Design: Connection-State-Based Log Level

The handler consumes all exceptions on the parent channel (no exception type filtering). The log level is determined by connection state:

DEBUG — when activeStreams == 0 OR !channelActive. No in-flight requests are affected (e.g., TCP RST from LB idle timeout, post-close cleanup).
WARN — when active streams exist on a live channel. The exception may affect in-flight requests.

Active streams	Channel active	Log level	Rationale
0	true/false	DEBUG	Idle connection — exception is noise
>0	false	DEBUG	Channel already dead — streams will fail via subscriber
>0	true	WARN	Live requests may be affected — worth alerting

Active stream count is retrieved via Http2FrameCodec.connection().numActiveStreams() on the same parent channel pipeline. Falls back to -1 if the codec is unavailable, which takes the safe WARN path.

Why no exception type filtering?

By the time any exception reaches our handler, all upstream handlers (Http2FrameCodec, Http2MultiplexHandler) have already handled the protocol actions (GOAWAY, stream reset, child channel error delivery). The exception reaching TailContext is an echo of already-handled work, regardless of type. Connection state (active streams + channel activity) is the only dimension that determines whether the exception has diagnostic value.

Why OR (not AND) for the DEBUG condition?

Either condition alone is sufficient to determine the exception is noise:

activeStreams == 0 — no in-flight requests affected, regardless of channel state
!channelActive — channel is already dead, any active streams will fail through their Reactor subscribers independently

Testing

5 EmbeddedChannel unit tests with production-matching pipeline (Http2FrameCodec → Http2MultiplexHandler → handler):

Test	What it proves
`withoutHandler_exceptionReachesTail`	BEFORE: exception reaches TailContext → WARN
`withHandler_zeroActiveStreams_consumedAtDebug`	0 active streams → consumed at DEBUG
`withHandler_exceptionDoesNotCloseChannel`	Handler does NOT close channel
`withHandler_runtimeException_zeroActiveStreams_consumed`	RuntimeException also consumed (no type filtering)
`withHandler_npe_zeroActiveStreams_consumed`	NPE also consumed (no type filtering)

Note: The !channelActive branch cannot be unit-tested with EmbeddedChannel because disconnect() tears down the pipeline before fireExceptionCaught can reach handlers. In production, exceptionCaught() fires while the channel is transitioning to inactive.

Impact

Zero perf impact — handler only overrides exceptionCaught(), Netty @Skip optimization bypasses it for all hot-path events
Zero lifecycle change — handler does NOT close the channel
Telemetry preserved — exceptions with active streams on a live channel still log at WARN
HTTP/1.1 parity — matches ChannelOperationsHandler behavior for connection-level exceptions

Copilot

Pull request overview

This PR addresses noisy Netty WARN logs that occur on HTTP/2 parent (TCP) connections when idle connections are reset, by installing a parent-channel exception handler to consume those exceptions at DEBUG level and close the channel (matching HTTP/1.1 behavior).

Changes:

Install an HTTP/2 parent-channel exception handler during doOnConnected by accessing connection.channel().parent().
Add Http2ParentChannelExceptionHandler to log parent-channel exceptions at DEBUG and close the channel.
Document the fix in azure-cosmos CHANGELOG.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java	Adds logic to install a parent (TCP) channel exception handler for HTTP/2 to suppress TailContext WARN logs.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2ParentChannelExceptionHandler.java	Introduces a Netty inbound handler that consumes/logs exceptions at DEBUG and closes the parent channel.
sdk/cosmos/azure-cosmos/CHANGELOG.md	Adds a “Bugs Fixed” entry describing the WARN suppression for HTTP/2 parent-channel exceptions.

Copilot · 2026-04-03T20:41:03Z


 #### Bugs Fixed
 Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)
+* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD)


CHANGELOG entry still contains a placeholder PR reference ("PR TBD" and a /pull/TBD link). This will be a broken link in released notes; please replace with the actual PR number (or remove the link if not available).

Suggested change

* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD)

* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior.

Copilot · 2026-04-03T20:41:03Z

@@ -8,6 +8,7 @@

 #### Bugs Fixed
 Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)


In the Unreleased 'Bugs Fixed' section, the first entry is not formatted as a bullet while the newly added entry is. Please make the formatting consistent (typically all entries in these sections are bulleted).

Suggested change

Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

* Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

Copilot · 2026-04-03T20:41:03Z

+                    Channel parent = connection.channel().parent();
+                    if (parent != null
+                        && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) {
+
+                        parent.pipeline().addLast(
+                            Http2ParentChannelExceptionHandler.HANDLER_NAME,
+                            new Http2ParentChannelExceptionHandler());


The handler installation on the shared HTTP/2 parent channel uses a non-atomic check-then-add (pipeline().get(name) == null then addLast). If multiple stream channels attempt this concurrently, addLast can throw due to a duplicate handler name. Consider making installation robust (e.g., run the add on the parent channel's event loop and/or catch the duplicate-name exception and ignore it) so new stream creation can’t fail because of a race.

Copilot · 2026-04-03T20:41:04Z

+                    // Install exception handler on the HTTP/2 parent (TCP) channel.
+                    // In H2, doOnConnected fires for stream (child) channels — channel.parent()
+                    // is the TCP connection. The parent pipeline has no ChannelOperationsHandler
+                    // (unlike H1.1), so TCP-level exceptions (RST, broken pipe) propagate to
+                    // Netty's TailContext and get logged as WARN. This handler matches H1.1
+                    // behavior by consuming exceptions at DEBUG level.
+                    Channel parent = connection.channel().parent();
+                    if (parent != null
+                        && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) {
+
+                        parent.pipeline().addLast(
+                            Http2ParentChannelExceptionHandler.HANDLER_NAME,
+                            new Http2ParentChannelExceptionHandler());
+                    }


This change introduces new behavior (consuming parent-channel exceptions and closing the parent connection) without accompanying test coverage. There are existing Netty/transport tests in azure-cosmos-tests (e.g., ones that use EmbeddedChannel); please add a unit/integration test that asserts the handler is installed on the H2 parent pipeline and that an exception on the parent is consumed (no TailContext WARN) and results in the parent channel closing.

Copilot · 2026-04-03T20:41:04Z

+    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
+        if (logger.isDebugEnabled()) {
+            logger.debug("Exception on HTTP/2 parent connection [id:{}]: {}",
+                ctx.channel().id().asShortText(), cause.getMessage(), cause);


exceptionCaught currently logs only cause.getMessage() in the formatted message. For many Netty exceptions the message can be null/empty, reducing diagnostics. Consider logging the exception class (e.g., cause.toString()) in the formatted portion so the DEBUG log remains useful even when getMessage() is null.

Suggested change

ctx.channel().id().asShortText(), cause.getMessage(), cause);

ctx.channel().id().asShortText(), cause.toString(), cause);

In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection. The parent channel pipeline has no ChannelOperationsHandler (unlike HTTP/1.1), so TCP-level exceptions like Connection reset by peer (ECONNRESET) propagate to Netty's TailContext, which logs them as WARN. This adds Http2ParentChannelExceptionHandler to the parent channel via doOnConnected (accessing channel.parent()). The handler consumes exceptions at DEBUG level WITHOUT closing the channel or altering connection lifecycle, matching HTTP/1.1 logging behavior. Changes: - Handler logs cause.toString() (not getMessage()) for null-safe diagnostics - Defensive try-catch for duplicate handler name on concurrent stream creation - Before/after verified with EmbeddedChannel unit tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jeet1995 · 2026-04-21T19:20:03Z

Superseded by #48890 (re-raised from fork for push access).

jeet1995 requested review from a team and kirankumarkolli as code owners April 3, 2026 20:36

Copilot AI review requested due to automatic review settings April 3, 2026 20:36

github-actions Bot added the Cosmos label Apr 3, 2026

Copilot started reviewing on behalf of jeet1995 April 3, 2026 20:37 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

jeet1995 marked this pull request as draft April 3, 2026 20:42

jeet1995 force-pushed the AzCosmos_Http2ParentChannelExceptionHandler branch 2 times, most recently from 0ec7d77 to e5f9537 Compare April 3, 2026 21:31

jeet1995 force-pushed the AzCosmos_Http2ParentChannelExceptionHandler branch 3 times, most recently from 0da600f to acea1fb Compare April 20, 2026 23:26

jeet1995 force-pushed the AzCosmos_Http2ParentChannelExceptionHandler branch from acea1fb to 3e2b4f4 Compare April 20, 2026 23:34

jeet1995 marked this pull request as ready for review April 21, 2026 00:55

jeet1995 mentioned this pull request Apr 21, 2026

Add exception handler on HTTP/2 parent channel to suppress WARN logs #48890

Open

jeet1995 closed this Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add exception handler on HTTP/2 parent channel to suppress WARN logs#48687

Add exception handler on HTTP/2 parent channel to suppress WARN logs#48687
jeet1995 wants to merge 1 commit intomainfrom
AzCosmos_Http2ParentChannelExceptionHandler

jeet1995 commented Apr 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

jeet1995 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD)
	* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior.

		@@ -8,6 +8,7 @@

		#### Bugs Fixed
		Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

	Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)
	* Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

	ctx.channel().id().asShortText(), cause.getMessage(), cause);
	ctx.channel().id().asShortText(), cause.toString(), cause);

Conversation

jeet1995 commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

HTTP/1.1 pipeline (single channel — no leak to TailContext):

HTTP/2 parent channel pipeline (BEFORE fix — leak to TailContext):

HTTP/2 parent channel pipeline (AFTER fix):

HTTP/2 child stream channel pipeline (unchanged):

Design: Connection-State-Based Log Level

Why no exception type filtering?

Why OR (not AND) for the DEBUG condition?

Testing

Impact

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

jeet1995 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jeet1995 commented Apr 3, 2026 •

edited

Loading