Describe the bug
We have discovered a possible memory leak in a very specific use case: our application communicates with Azure only at deployment time and never afterwards. Because the connection then sits idle, we run into Azure Service Bus AMQP idle-timeout reconnects. Analysing memory dumps taken before the app was OOM-killed, we found that every reconnect allocated ~67–134 MB of native memory that the allocator never returned to the OS. Over 6 days and dozens of reconnect cycles, this accumulated to ~1.8–2.0 GB of off-heap memory that JVM metrics do not show, exhausting the 6 GiB container limit.
Exception or Stack Trace
The specific chain:
Azure Service Bus broker: 300-second AMQP idle timeout
↓
broker sends amqp:connection:forced
↓
Azure SDK (Reactor/Netty): ReactorSession + RequestResponseChannel errors
↓
Netty PooledByteBufAllocator allocates new native Chunk(s) (~67–134 MB)
via sun.misc.Unsafe.allocateMemory() — bypasses ALL JVM memory metrics
↓
Old chunks returned to pool arena, but native OS pages NEVER freed
↓
RSS grows by ~67–134 MB per reconnect, permanently
↓
After ~11 reconnects: RSS at 97% of 6 GiB limit
↓
13:16:46Z: minor GC burst + CPU spike on both pods
↓
RSS crosses 6 GiB → kernel OOM killer → SIGKILL on both replicas
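Because the growth in the chain above does not show up in JVM metrics, one way to watch it from inside the pod is to log process RSS next to everything the JVM does report. The following is a minimal sketch using only the JDK and Linux `/proc`; `NativeGapProbe` and its method names are hypothetical helpers, not part of the Azure SDK or Netty. The resulting gap also contains thread stacks, GC structures and other JVM-internal memory, so the trend across reconnects is the meaningful signal rather than the absolute value.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical diagnostic helper (assumption: Linux container with /proc mounted).
final class NativeGapProbe {

    // Extracts the VmRSS value in kB from the lines of /proc/<pid>/status; -1 if absent.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String digits = line.replaceAll("[^0-9]", "");
                return digits.isEmpty() ? -1 : Long.parseLong(digits);
            }
        }
        return -1;
    }

    // Memory the JVM itself reports: committed heap + committed non-heap + NIO buffer pools.
    static long jvmReportedBytes() {
        var mem = ManagementFactory.getMemoryMXBean();
        long total = mem.getHeapMemoryUsage().getCommitted()
                + mem.getNonHeapMemoryUsage().getCommitted();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            total += Math.max(0, pool.getMemoryUsed()); // -1 means "not available"
        }
        return total;
    }

    // RSS minus JVM-reported memory: a rough upper bound on untracked native memory.
    static long unaccountedBytes(long rssKb, long reportedBytes) {
        return rssKb * 1024 - reportedBytes;
    }

    public static void main(String[] args) throws IOException {
        Path status = Path.of("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not readable; this sketch is Linux-only");
            return;
        }
        long rssKb = parseVmRssKb(Files.readAllLines(status));
        long reported = jvmReportedBytes();
        System.out.printf("rss=%d kB, jvmReported=%d B, unaccounted=%d B%n",
                rssKb, reported, unaccountedBytes(rssKb, reported));
    }
}
```

Logging this line periodically (for example from a scheduled task) and correlating jumps in `unaccounted` with the `amqp:connection:forced` timestamps would show whether each reconnect adds a step of the size described above.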
Setup (please complete the following information):
- Library/Libraries: azure-sdk-bom 1.3.6, netty 4.1.134-FINAL
- Java version: 21
- App Server/Environment: OpenShift
- Frameworks: Spring Boot 3.5.8
Additional context
I can provide the memory dumps and our analysis if needed.
Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report