[cosmos] Fix ClientTelemetry static-init NPE when IMDS access is disabled#48888

Merged: FabianMeiswinkel merged 4 commits into Azure:main from FabianMeiswinkel:users/fabianm/IMDSFix on Apr 21, 2026
@FabianMeiswinkel
Member

Description

Fixes a NullPointerException thrown during ClientTelemetry.<clinit> when IMDS access is disabled via COSMOS_DISABLE_IMDS_ACCESS=true (env var) or -DCOSMOS.DISABLE_IMDS_ACCESS=true (system property). The error makes ClientTelemetry permanently un-loadable for the lifetime of the JVM, so every subsequent CosmosClientBuilder.buildAsyncClient() fails with NoClassDefFoundError.

First shipped in 4.79.0 when CACHED_METADATA was introduced as a statically-initialized field; 4.78.0 is unaffected.

Root cause

In ClientTelemetry the fields are declared in this order:

private static final Mono<AzureVMMetadata> CACHED_METADATA = fetchAzureVmMetadata().cache();  // (1)
private static final AzureVMMetadata METADATA_NOT_AVAILABLE = new AzureVMMetadata();          // (2)

<clinit> evaluates them top-down. At (1), fetchAzureVmMetadata() runs and, when IMDS is disabled, eagerly returns:

return Mono.just(METADATA_NOT_AVAILABLE);

But METADATA_NOT_AVAILABLE hasn't been assigned yet, so Mono.just(null) throws NPE via Objects.requireNonNull. This escapes as ExceptionInInitializerError, and every later reference to the class surfaces as NoClassDefFoundError: Could not initialize class ... ClientTelemetry.
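The failure mode can be reproduced without the SDK or Reactor. The following is a minimal sketch, not the actual ClientTelemetry code: BrokenTelemetry, SENTINEL, CACHED and fetch() are hypothetical stand-ins, and Objects.requireNonNull plays the role of the null check inside Mono.just.

```java
import java.util.Objects;

// Hypothetical stand-in for ClientTelemetry; all names are illustrative.
final class BrokenTelemetry {
    // (1) Initialized first: fetch() runs while SENTINEL is still null.
    static final Object CACHED = fetch();
    // (2) Assigned too late for the call above.
    static final Object SENTINEL = new Object();

    static Object fetch() {
        // Mimics the null check performed by Mono.just(value).
        return Objects.requireNonNull(SENTINEL, "value");
    }
}

final class StaticInitOrderDemo {
    // Touches BrokenTelemetry and reports which Throwable type surfaced.
    static String touch() {
        try {
            return String.valueOf(BrokenTelemetry.CACHED);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use triggers <clinit>
        System.out.println(touch()); // class is now marked as failed
    }
}
```

The first access fails inside the static initializer; every later access fails because the JVM has marked the class as erroneous and does not retry initialization.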

Fix

Declare METADATA_NOT_AVAILABLE before CACHED_METADATA so the sentinel exists before fetchAzureVmMetadata() can read it. A short comment documents the ordering invariant.

The diff touches 2 files (+7/-3): reordering only, no behavioral change.
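As a rough illustration of the reordering (again with hypothetical names, not the real ClientTelemetry fields), declaring the sentinel first is enough for the eager read to succeed:

```java
import java.util.Objects;

// Hypothetical stand-in showing the corrected declaration order.
final class FixedTelemetry {
    // Ordering invariant: SENTINEL must be declared before any
    // static initializer that calls fetch().
    static final Object SENTINEL = new Object();
    static final Object CACHED = fetch();

    static Object fetch() {
        // Mimics the null check performed by Mono.just(value).
        return Objects.requireNonNull(SENTINEL, "value");
    }
}
```

With this order, static initialization completes and CACHED holds the sentinel instance.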

Why it matters

COSMOS_DISABLE_IMDS_ACCESS is the documented opt-out for non-Azure environments (local dev, CI, other clouds), where IMDS at 169.254.169.254 is not routable and the blind probe costs a multi-second timeout per cold start. Users who set it today on 4.79.x cannot build a Cosmos client at all.

Reproduction

System.setProperty("COSMOS.DISABLE_IMDS_ACCESS", "true");

new CosmosClientBuilder()
    .endpoint("https://localhost:8081")
    .key("dGVzdA==")
    .buildAsyncClient();

On 4.79.0 / 4.79.1 (before this PR):

java.lang.NoClassDefFoundError: Could not initialize class
  com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry
Caused by: java.lang.ExceptionInInitializerError
Caused by: java.lang.NullPointerException: value
    at java.util.Objects.requireNonNull(Objects.java:259)
    at reactor.core.publisher.MonoJust.<init>(MonoJust.java:35)
    at reactor.core.publisher.Mono.just(Mono.java:754)
    at com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry.fetchAzureVmMetadata(ClientTelemetry.java:178)
    at com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry.<clinit>(ClientTelemetry.java:54)

With this PR: the client builds normally; fetchAzureVmMetadata() returns a Mono.just(METADATA_NOT_AVAILABLE) that resolves to the sentinel on subscription, exactly as intended.

Note for reviewers (out of scope)

The same declaration-order pattern affects IMDS_AZURE_VM_METADATA and the three IMDS_DEFAULT_* timeout fields: they are also read inside fetchAzureVmMetadata() and declared after CACHED_METADATA. Today that path only runs when IMDS is enabled and happens to tolerate null/0 values at config build time, so it doesn't NPE, but the invariant is fragile. I kept this PR strictly minimal (single-field move + comment). Happy to move those fields too, or replace both declarations with a static { … } block, in a follow-up if you prefer.
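For reference, the static-block alternative could look roughly like the sketch below. It assumes hypothetical fields (SENTINEL, TIMEOUT, CACHED) rather than the real IMDS_* members, and only shows how an explicit block makes the dependency order visible in one place.

```java
import java.time.Duration;

// Hypothetical sketch of an ordered static initializer; names are illustrative.
final class OrderedTelemetryInit {
    static final Object SENTINEL;
    static final Duration TIMEOUT;
    static final String CACHED;

    static {
        // Dependencies first, in the order they are needed.
        SENTINEL = new Object();
        TIMEOUT = Duration.ofSeconds(5);
        // Dependent value last: describe() can now read both fields safely.
        CACHED = describe();
    }

    static String describe() {
        return "timeout=" + TIMEOUT.toMillis() + "ms, sentinelSet=" + (SENTINEL != null);
    }
}
```

The trade-off is that the fields become blank finals, so the compiler enforces that each is assigned exactly once, while the order of assignment no longer depends on where the declarations happen to sit in the file.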

All SDK Contribution checklist

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

    Rationale: the bug lives in class-static-initializer ordering. A test that reliably exercises it would need to fork a JVM with COSMOS.DISABLE_IMDS_ACCESS=true before ClientTelemetry is first referenced, then observe a successful buildAsyncClient(). Happy to add such a harness (e.g. a @TempJvmProperty + separate JUnit test process) if maintainers think it's worth the complexity for a one-line ordering fix; otherwise the existing integration tests running in an environment where COSMOS.DISABLE_IMDS_ACCESS=true is set will cover it implicitly.
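A forked-JVM harness of the kind described above might be sketched as follows. This is an assumption-laden outline, not the test in this PR: ForkedJvmProbe and the mainClass argument are hypothetical, and the child is expected to be a small probe class whose main() builds the client and exits with 0 on success.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: launches a fresh JVM with IMDS access disabled.
final class ForkedJvmProbe {
    // Builds the child command line; mainClass is a caller-supplied probe class.
    static List<String> command(String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        // Set before any class in the child is loaded.
        cmd.add("-DCOSMOS.DISABLE_IMDS_ACCESS=true");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    // Runs the child and returns its exit code, or -1 if it timed out.
    static int run(String mainClass) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command(mainClass)).inheritIO().start();
        if (!child.waitFor(60, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            return -1;
        }
        return child.exitValue();
    }
}
```

Passing the property on the command line, rather than calling System.setProperty in the test, guarantees it is visible before the class under test is first referenced in the child process.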

Copilot AI review requested due to automatic review settings April 21, 2026 15:10
@FabianMeiswinkel
Member Author

This issue was initially found by @lfavreli-betclic, who also contributed the product code fix in #48887.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a Cosmos SDK startup failure caused by ClientTelemetry’s class static-initializer (<clinit>) referencing a sentinel (METADATA_NOT_AVAILABLE) before it was initialized when IMDS access is disabled. The change ensures static initialization order is safe, preventing ExceptionInInitializerError/NoClassDefFoundError during client creation in non-Azure environments.

Changes:

  • Reworked ClientTelemetry static initialization so the sentinel and IMDS defaults are initialized before CACHED_METADATA is created.
  • Added a changelog entry documenting the fix.
  • Added a regression test that forks a fresh JVM with -DCOSMOS.DISABLE_IMDS_ACCESS=true to validate client build does not fail due to ClientTelemetry static init.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientTelemetry.java | Moves static initialization into an ordered static {} block to prevent NPE during <clinit> when IMDS is disabled. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Adds an unreleased bug-fix entry describing the ClientTelemetry static-init fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ImplementationBridgeHelpersTest.java | Adds a forked-JVM regression test to catch ClientTelemetry static initializer failures with IMDS disabled. |

@FabianMeiswinkel
Member Author

@sdkReviewAgent

Member

@xinlian12 xinlian12 left a comment

LGTM, thanks

@xinlian12
Member

Review complete (26:06)

No new comments — existing review coverage is sufficient.

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

@FabianMeiswinkel FabianMeiswinkel merged commit c4bc6bb into Azure:main Apr 21, 2026
29 checks passed