-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add exception handler on HTTP/2 parent channel to suppress WARN logs #48890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
jeet1995
merged 17 commits into
Azure:main
from
jeet1995:AzCosmos_Http2ParentChannelExceptionHandler
Apr 24, 2026
Merged
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
d30c4cf
Add exception handler on HTTP/2 parent channel to suppress WARN logs
jeet1995 5f84575
Revert azure-cosmos-tests pom.xml changes
jeet1995 2af3e52
Clarify why duplicate-name is the only possible IAE in handler install
jeet1995 fc3fe78
Move static utility method to bottom of test file
jeet1995 2a3b5b2
Address Copilot review: fix comment accuracy, remove duplicate cause.…
jeet1995 2eac1ce
Add tests for activeStreams > 0 path (WARN branch)
jeet1995 d6d6a09
Add test for codec-absent fallback path
jeet1995 6c26724
Address review: nullable getActiveStreamCount, Error propagation, deb…
jeet1995 fe29c0c
Fix: install H2 exception handler on correct channel
jeet1995 8495236
Simplify handler installation — use channelPipeline directly
jeet1995 08a5de5
Address Kushagra review: log addresses, make handler @Sharable singleton
jeet1995 76a3760
Use string append for log messages, pass channel directly
jeet1995 8166748
Use channel= prefix in log messages for clarity
jeet1995 2a2a75e
Add clientVmId to exception handler log messages
jeet1995 233b748
Fix: resolve vmId eagerly on caller thread, not event loop
jeet1995 ffb98ad
Fix: use non-blocking getCachedMachineId() for vmId
jeet1995 c13dd12
Use n/a fallback for unresolved vmId in logs
jeet1995 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
246 changes: 246 additions & 0 deletions
246
...est/java/com/azure/cosmos/implementation/http/Http2ParentChannelExceptionHandlerTest.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,246 @@ | ||
| // Copyright (c) Microsoft Corporation. All rights reserved. | ||
| // Licensed under the MIT License. | ||
|
|
||
| package com.azure.cosmos.implementation.http; | ||
|
|
||
| import io.netty.channel.ChannelInboundHandlerAdapter; | ||
| import io.netty.channel.embedded.EmbeddedChannel; | ||
| import io.netty.handler.codec.http2.Http2FrameCodec; | ||
| import io.netty.handler.codec.http2.Http2FrameCodecBuilder; | ||
| import io.netty.handler.codec.http2.Http2MultiplexHandler; | ||
| import org.testng.annotations.Test; | ||
|
|
||
| import java.io.IOException; | ||
|
|
||
| import static org.assertj.core.api.Assertions.assertThat; | ||
| import static org.assertj.core.api.Assertions.assertThatThrownBy; | ||
|
|
||
| /** | ||
| * Verifies that {@link Http2ParentChannelExceptionHandler} uses connection | ||
| * state — active stream count and channel activity — to determine whether | ||
| * exceptions are logged at DEBUG (suppressed) or WARN (preserved). | ||
| * Exception type is NOT a filtering dimension. | ||
| * | ||
| * The EmbeddedChannel is configured to mirror the production HTTP/2 parent | ||
| * channel pipeline: | ||
| * <pre> | ||
| * Http2FrameCodec → Http2MultiplexHandler → Http2ParentChannelExceptionHandler → TailContext | ||
| * </pre> | ||
| * (SslHandler is omitted because it requires an SSLContext and is not relevant | ||
| * to exception propagation behavior.) | ||
| * | ||
| * {@code checkException()} re-throws any exception that reached the pipeline tail. | ||
| */ | ||
| public class Http2ParentChannelExceptionHandlerTest { | ||
|
|
||
| /** | ||
| * BEFORE fix — without the handler, exceptions reach the pipeline tail. | ||
| * EmbeddedChannel's checkException() re-throws the unhandled exception, | ||
| * proving it reached Netty's TailContext (which in production logs as WARN). | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withoutHandler_exceptionReachesTail() { | ||
| EmbeddedChannel channel = createH2ParentChannel(false); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("Connection reset by peer")); | ||
|
|
||
| assertThatThrownBy(channel::checkException) | ||
| .isInstanceOf(IOException.class) | ||
| .hasMessageContaining("Connection reset by peer"); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * With handler — exception on idle connection (0 active streams) is | ||
| * consumed at DEBUG. The suppression is based on connection state | ||
| * (no active streams), not exception type. | ||
| * | ||
| * In production, channel.isActive() transitions to false during the | ||
| * RST handling cycle, satisfying the OR condition. In EmbeddedChannel | ||
| * we can only verify the activeStreams == 0 branch. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_zeroActiveStreams_consumedAtDebug() { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| Http2FrameCodec codec = channel.pipeline().get(Http2FrameCodec.class); | ||
| assertThat(codec).isNotNull(); | ||
| assertThat(codec.connection().numActiveStreams()).isEqualTo(0); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("recvAddress(..) failed with error(-104): Connection reset by peer")); | ||
|
|
||
| // Exception consumed — does NOT reach tail | ||
| channel.checkException(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * Handler does not close the channel — connection lifecycle is managed | ||
| * by reactor-netty's pool eviction, not by this handler. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_exceptionDoesNotCloseChannel() { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| assertThat(channel.isActive()).isTrue(); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("Connection reset by peer")); | ||
|
|
||
| channel.checkException(); | ||
| assertThat(channel.isOpen()).isTrue(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * RuntimeException on idle connection is also consumed — suppression | ||
| * is based on connection state, not exception type. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_runtimeException_zeroActiveStreams_consumed() { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new RuntimeException("Unexpected state error")); | ||
|
|
||
| channel.checkException(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * NullPointerException on idle connection is also consumed — same | ||
| * connection-state-based suppression regardless of exception type. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_npe_zeroActiveStreams_consumed() { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new NullPointerException("handler bug")); | ||
|
|
||
| channel.checkException(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * With handler — exception on a connection with active streams is | ||
| * consumed (does not reach TailContext). The handler logs at WARN | ||
| * instead of DEBUG because in-flight requests may be affected. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_activeStreams_consumedAtWarn() throws Exception { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| Http2FrameCodec codec = channel.pipeline().get(Http2FrameCodec.class); | ||
| assertThat(codec).isNotNull(); | ||
|
|
||
| // Create an active stream (client-initiated, odd stream ID) | ||
| codec.connection().local().createStream(1, false); | ||
| assertThat(codec.connection().numActiveStreams()).isEqualTo(1); | ||
| assertThat(channel.isActive()).isTrue(); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("Connection reset by peer")); | ||
|
|
||
| // Exception consumed — does NOT reach tail, even with active streams | ||
| channel.checkException(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * Handler does not close the channel even when active streams exist — | ||
| * connection lifecycle is managed by reactor-netty's pool eviction. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_activeStreams_channelNotClosed() throws Exception { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| Http2FrameCodec codec = channel.pipeline().get(Http2FrameCodec.class); | ||
| assertThat(codec).isNotNull(); | ||
|
|
||
| codec.connection().local().createStream(1, false); | ||
| assertThat(codec.connection().numActiveStreams()).isEqualTo(1); | ||
| assertThat(channel.isActive()).isTrue(); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("Connection reset by peer")); | ||
|
|
||
| channel.checkException(); | ||
| assertThat(channel.isOpen()).isTrue(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * With handler — when Http2FrameCodec is absent from the pipeline, | ||
| * getActiveStreamCount() returns null. Since the active stream count | ||
| * is unknown and the channel is active, the handler takes the safe | ||
| * WARN path. This covers the fallback behavior when the codec is | ||
| * unavailable (e.g., torn down during shutdown). | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_codecAbsent_fallsBackToWarnPath() { | ||
| EmbeddedChannel channel = new EmbeddedChannel( | ||
| Http2ParentChannelExceptionHandler.INSTANCE); | ||
|
|
||
| assertThat(channel.pipeline().get(Http2FrameCodec.class)).isNull(); | ||
| assertThat(channel.isActive()).isTrue(); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new IOException("Connection reset by peer")); | ||
|
|
||
| // Exception consumed — does NOT reach tail | ||
| channel.checkException(); | ||
| assertThat(channel.isOpen()).isTrue(); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * Error types (e.g., OutOfMemoryError) are NOT consumed by the handler — | ||
| * they propagate to TailContext. This ensures JVM-level errors are never | ||
| * silently swallowed. | ||
| */ | ||
| @Test(groups = "unit") | ||
| public void withHandler_errorNotConsumed_propagatesToTail() { | ||
| EmbeddedChannel channel = createH2ParentChannel(true); | ||
|
|
||
| channel.pipeline().fireExceptionCaught( | ||
| new OutOfMemoryError("test OOM")); | ||
|
|
||
| assertThatThrownBy(channel::checkException) | ||
| .isInstanceOf(OutOfMemoryError.class) | ||
| .hasMessageContaining("test OOM"); | ||
|
|
||
| channel.finishAndReleaseAll(); | ||
| } | ||
|
|
||
| /** | ||
| * Creates an EmbeddedChannel matching the production HTTP/2 parent channel | ||
| * pipeline (minus SslHandler): Http2FrameCodec → Http2MultiplexHandler → | ||
| * Http2ParentChannelExceptionHandler. | ||
| */ | ||
| private static EmbeddedChannel createH2ParentChannel(boolean withExceptionHandler) { | ||
| Http2FrameCodec codec = Http2FrameCodecBuilder.forClient() | ||
| .autoAckSettingsFrame(true) | ||
| .build(); | ||
|
|
||
| Http2MultiplexHandler multiplexHandler = new Http2MultiplexHandler( | ||
| new ChannelInboundHandlerAdapter()); | ||
|
|
||
| if (withExceptionHandler) { | ||
| return new EmbeddedChannel(codec, multiplexHandler, | ||
| Http2ParentChannelExceptionHandler.INSTANCE); | ||
| } else { | ||
| return new EmbeddedChannel(codec, multiplexHandler); | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
104 changes: 104 additions & 0 deletions
104
...rc/main/java/com/azure/cosmos/implementation/http/Http2ParentChannelExceptionHandler.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| // Copyright (c) Microsoft Corporation. All rights reserved. | ||
| // Licensed under the MIT License. | ||
|
|
||
| package com.azure.cosmos.implementation.http; | ||
|
|
||
| import io.netty.channel.ChannelHandler; | ||
| import io.netty.channel.ChannelHandlerContext; | ||
| import io.netty.channel.ChannelInboundHandlerAdapter; | ||
| import io.netty.handler.codec.http2.Http2FrameCodec; | ||
| import com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| /** | ||
| * Exception handler for the HTTP/2 parent (TCP) channel pipeline. | ||
| * <p> | ||
| * In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection. | ||
| * Child stream channels have {@code ChannelOperationsHandler} which catches exceptions | ||
| * and fails the active subscriber (matching HTTP/1.1 behavior). However, the parent | ||
| * channel has no such handler — exceptions propagate to Netty's {@code TailContext} | ||
| * which logs them as WARN ("An exceptionCaught() event was fired, and it reached at | ||
| * the tail of the pipeline"). | ||
| * <p> | ||
| * This handler consumes {@link Exception}s on the parent channel and uses connection | ||
| * state to decide the log level: | ||
| * <ul> | ||
| * <li><b>DEBUG</b> — when {@code activeStreams == 0} OR {@code !channelActive}. | ||
| * No in-flight requests are affected (e.g., TCP RST from LB idle timeout, | ||
| * post-close cleanup).</li> | ||
| * <li><b>WARN</b> — when active streams exist on a live channel, or when the | ||
| * active stream count cannot be determined. The exception may affect | ||
| * in-flight requests and is worth alerting on.</li> | ||
| * </ul> | ||
| * <p> | ||
| * {@link Error} types (e.g., {@code OutOfMemoryError}) are never consumed — they | ||
| * propagate to {@code TailContext} for standard Netty handling. | ||
| * <p> | ||
| * The handler does NOT close the channel or alter connection lifecycle — reactor-netty | ||
| * and the connection pool's eviction predicate ({@code !channel.isActive()}) handle that | ||
| * independently. | ||
| * | ||
| * @see ReactorNettyClient#configureChannelPipelineHandlers() | ||
| */ | ||
| @ChannelHandler.Sharable | ||
| final class Http2ParentChannelExceptionHandler extends ChannelInboundHandlerAdapter { | ||
|
|
||
| static final Http2ParentChannelExceptionHandler INSTANCE = new Http2ParentChannelExceptionHandler(); | ||
|
|
||
| static final String HANDLER_NAME = "cosmosH2ParentExceptionHandler"; | ||
|
|
||
| private static final Logger logger = LoggerFactory.getLogger(Http2ParentChannelExceptionHandler.class); | ||
|
|
||
| private Http2ParentChannelExceptionHandler() { | ||
| } | ||
|
|
||
| @Override | ||
| public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) { | ||
| // Do not consume JVM-level errors (OOM, StackOverflow, etc.) — let them | ||
| // propagate to TailContext for standard Netty handling. | ||
| if (cause instanceof Error) { | ||
| ctx.fireExceptionCaught(cause); | ||
| return; | ||
| } | ||
|
|
||
| Integer activeStreams = getActiveStreamCount(ctx); | ||
| boolean channelActive = ctx.channel().isActive(); | ||
|
|
||
| if ((activeStreams != null && activeStreams == 0) || !channelActive) { | ||
| // No active streams OR channel already inactive — exception is noise | ||
| // (e.g., TCP RST from LB idle timeout, post-close cleanup). | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug( | ||
| "Exception on HTTP/2 parent connection" | ||
| + " [channel=" + ctx.channel() | ||
| + ", activeStreams=" + activeStreams | ||
| + ", channelActive=" + channelActive | ||
| + ", clientVmId=" + ClientTelemetry.getCachedMachineId() + "]", | ||
| cause); | ||
| } | ||
| } else { | ||
| // Active streams on a live channel, or stream count unknown (null) — | ||
| // exception may affect in-flight requests. | ||
| logger.warn( | ||
|
jeet1995 marked this conversation as resolved.
|
||
| "Exception on HTTP/2 parent connection" | ||
| + " [channel=" + ctx.channel() | ||
| + ", activeStreams=" + activeStreams | ||
| + ", channelActive=" + channelActive | ||
| + ", clientVmId=" + ClientTelemetry.getCachedMachineId() + "]", | ||
| cause); | ||
| } | ||
| } | ||
|
|
||
| private static Integer getActiveStreamCount(ChannelHandlerContext ctx) { | ||
| try { | ||
| Http2FrameCodec codec = ctx.pipeline().get(Http2FrameCodec.class); | ||
| if (codec != null) { | ||
| return codec.connection().numActiveStreams(); | ||
| } | ||
| } catch (Exception e) { | ||
| logger.debug("Failed to retrieve active stream count from Http2FrameCodec", e); | ||
| } | ||
| return null; | ||
| } | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.