[cosmos] Fix ClientTelemetry static-init NPE when IMDS access is disabled #48887

Closed
lfavreli-betclic wants to merge 3 commits into Azure:main from lfavreli-betclic:fix/cosmos-clienttelemetry-static-init-npe

@lfavreli-betclic lfavreli-betclic commented Apr 21, 2026

Description

Fixes a NullPointerException thrown during ClientTelemetry.<clinit> when IMDS access is disabled via COSMOS_DISABLE_IMDS_ACCESS=true (env var) or -DCOSMOS.DISABLE_IMDS_ACCESS=true (system property). The error makes ClientTelemetry permanently un-loadable for the lifetime of the JVM, so every subsequent CosmosClientBuilder.buildAsyncClient() fails with NoClassDefFoundError.

The bug first shipped in 4.79.0, when CACHED_METADATA was introduced as a statically initialized field; 4.78.0 is unaffected.

Root cause

In ClientTelemetry the fields are declared in this order:

private static final Mono<AzureVMMetadata> CACHED_METADATA = fetchAzureVmMetadata().cache();  // (1)
private static final AzureVMMetadata METADATA_NOT_AVAILABLE = new AzureVMMetadata();          // (2)

<clinit> evaluates them top-down. At (1), fetchAzureVmMetadata() runs and, when IMDS is disabled, eagerly returns:

return Mono.just(METADATA_NOT_AVAILABLE);

But METADATA_NOT_AVAILABLE hasn't been assigned yet, so Mono.just(null) throws NPE via Objects.requireNonNull. This escapes as ExceptionInInitializerError, and every later reference to the class surfaces as NoClassDefFoundError: Could not initialize class ... ClientTelemetry.
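
For readers who want to see the mechanism in isolation, here is a minimal sketch that reproduces the same failure chain without the SDK or Reactor. BrokenOrder and StaticInitOrderDemo are hypothetical names used only for illustration, and Objects.requireNonNull stands in for the null check that Mono.just performs; this is not the SDK's code.

```java
import java.util.Objects;

// Hypothetical stand-in for ClientTelemetry (illustration only, not SDK code).
final class BrokenOrder {
    // (1) Initialized first: fetch() runs while SENTINEL still holds its default value, null.
    static final Object CACHED = fetch();
    // (2) Initialized second, too late for the call above.
    static final Object SENTINEL = new Object();

    static Object fetch() {
        // Stands in for Mono.just(...), which rejects null via Objects.requireNonNull.
        return Objects.requireNonNull(SENTINEL, "value");
    }
}

final class StaticInitOrderDemo {
    // Touches BrokenOrder and reports the simple name of whatever Throwable results.
    static String touch() {
        try {
            return String.valueOf(BrokenOrder.CACHED != null);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first access: ExceptionInInitializerError
        System.out.println(touch()); // later access: NoClassDefFoundError
    }
}
```

The second call never re-runs the initializer: once initialization has failed, the JVM marks the class as erroneous, which is why the error persists for the lifetime of the process.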

Fix

Declare METADATA_NOT_AVAILABLE before CACHED_METADATA so the sentinel exists before fetchAzureVmMetadata() can read it. A short comment documents the ordering invariant.
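
As a rough illustration of the reordering, the sketch below declares the sentinel first so the method called during class initialization sees a non-null value. FixedOrder and FixedOrderDemo are hypothetical names, and Objects.requireNonNull again stands in for the check inside Mono.just; the real change in ClientTelemetry is only the field move described above.

```java
import java.util.Objects;

// Hypothetical stand-in for the reordered ClientTelemetry fields (illustration only).
final class FixedOrder {
    // Must be declared before CACHED so fetch() never reads null during class init.
    static final Object SENTINEL = new Object();
    static final Object CACHED = fetch();

    static Object fetch() {
        // Stands in for Mono.just(...); SENTINEL is already assigned at this point.
        return Objects.requireNonNull(SENTINEL, "value");
    }
}

final class FixedOrderDemo {
    // True when the cached value is the sentinel, i.e. class initialization succeeded.
    static boolean cachedIsSentinel() {
        return FixedOrder.CACHED == FixedOrder.SENTINEL;
    }

    public static void main(String[] args) {
        System.out.println("cached is sentinel: " + cachedIsSentinel());
    }
}
```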

The diff touches 2 files (+7/-3); it only reorders declarations and makes no other behavioral change.

Why it matters

COSMOS_DISABLE_IMDS_ACCESS is the documented opt-out for non-Azure environments (local dev, CI, other clouds), where IMDS at 169.254.169.254 is not routable and the blind probe costs a multi-second timeout per cold start. Users who set it today on 4.79.x cannot build a Cosmos client at all.

Reproduction

System.setProperty("COSMOS.DISABLE_IMDS_ACCESS", "true");

new CosmosClientBuilder()
    .endpoint("https://localhost:8081")
    .key("dGVzdA==")
    .buildAsyncClient();

On 4.79.0 / 4.79.1 (before this PR):

java.lang.NoClassDefFoundError: Could not initialize class
  com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry
Caused by: java.lang.ExceptionInInitializerError
Caused by: java.lang.NullPointerException: value
    at java.util.Objects.requireNonNull(Objects.java:259)
    at reactor.core.publisher.MonoJust.<init>(MonoJust.java:35)
    at reactor.core.publisher.Mono.just(Mono.java:754)
    at com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry.fetchAzureVmMetadata(ClientTelemetry.java:178)
    at com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry.<clinit>(ClientTelemetry.java:54)

With this PR: the client builds normally; fetchAzureVmMetadata() returns a Mono.just(METADATA_NOT_AVAILABLE) that resolves to the sentinel on subscription, exactly as intended.

Note for reviewers (out of scope)

The same declaration-order pattern affects IMDS_AZURE_VM_METADATA and the three IMDS_DEFAULT_* timeout fields: they are also read inside fetchAzureVmMetadata() and declared after CACHED_METADATA. Today that path only runs when IMDS is enabled and happens to tolerate null/0 values at config build time, so it doesn't NPE, but the invariant is fragile. I kept this PR strictly minimal (single-field move + comment). Happy to move those fields too, or replace both declarations with a static { … } block, in a follow-up if you prefer.
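
To make the static-block alternative concrete, here is a hedged sketch of what it could look like. StaticBlockInit, TIMEOUT_MS, and the demo.timeoutMs property are hypothetical stand-ins (they are not the SDK's field names or configuration keys); the point is only that a single static block makes the initialization order explicit instead of relying on declaration order.

```java
import java.util.Objects;

// Hypothetical sketch of the static-block alternative (names are illustrative).
final class StaticBlockInit {
    static final Object SENTINEL;
    static final int TIMEOUT_MS;   // stand-in for a timeout field read by fetch()
    static final String CACHED;

    static {
        // Everything fetch() depends on is assigned first, in one visible place.
        SENTINEL = new Object();
        TIMEOUT_MS = Integer.getInteger("demo.timeoutMs", 5000);
        // Only then is the value that depends on them computed.
        CACHED = fetch();
    }

    static String fetch() {
        Objects.requireNonNull(SENTINEL, "value");
        // If TIMEOUT_MS were assigned after CACHED, this would silently read 0.
        return "timeoutMs=" + TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(CACHED);
    }
}
```

A static block does not stop a method from reading a field before it is assigned, so the ordering inside the block still matters; it just keeps that ordering in one place where a reviewer can check it.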

All SDK Contribution checklist

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

    Rationale: the bug lives in class-static-initializer ordering. A test that reliably exercises it would need to fork a JVM with COSMOS.DISABLE_IMDS_ACCESS=true before ClientTelemetry is first referenced, then observe a successful buildAsyncClient(). Happy to add such a harness (e.g. a @TempJvmProperty + separate JUnit test process) if maintainers think it's worth the complexity for a one-line ordering fix; otherwise, the existing integration tests running in an environment where COSMOS.DISABLE_IMDS_ACCESS=true is set will cover it implicitly.
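
A forked-JVM check along these lines might look like the sketch below. ChildJvmHarness and ChildMain are hypothetical names, not existing test infrastructure, and the child here is only a placeholder that verifies the system property arrived; a real regression test would call CosmosClientBuilder.buildAsyncClient() in the child instead. The sketch also assumes the compiled classes are on the parent's classpath so the child JVM can load them.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical harness: launches a child JVM with IMDS access disabled.
final class ChildJvmHarness {
    // Builds the command line for a child JVM that sees the property before any class loads.
    static List<String> childCommand(String mainClass) {
        String javaBin = System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(
            javaBin,
            "-DCOSMOS.DISABLE_IMDS_ACCESS=true",
            "-cp", System.getProperty("java.class.path"),
            mainClass);
    }

    // Runs the child to completion and returns its exit code.
    static int runChild(String mainClass) throws IOException, InterruptedException {
        return new ProcessBuilder(childCommand(mainClass)).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("child exit code: " + runChild("ChildMain"));
    }
}

// Placeholder child entry point; a real test would build a Cosmos client here
// and exit non-zero on ExceptionInInitializerError or NoClassDefFoundError.
final class ChildMain {
    public static void main(String[] args) {
        if (!Boolean.getBoolean("COSMOS.DISABLE_IMDS_ACCESS")) {
            System.exit(2);
        }
    }
}
```

Passing the flag on the child's command line, rather than calling System.setProperty in the test itself, avoids depending on whether ClientTelemetry was already initialized by an earlier test in the same JVM.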

ClientTelemetry.<clinit> evaluates

    CACHED_METADATA = fetchAzureVmMetadata().cache();

before declaring METADATA_NOT_AVAILABLE. When IMDS access is disabled
(e.g. via COSMOS_DISABLE_IMDS_ACCESS=true), fetchAzureVmMetadata() takes
the fast path:

    return Mono.just(METADATA_NOT_AVAILABLE);

At that point METADATA_NOT_AVAILABLE is still null, so Mono.just throws
NullPointerException. The resulting ExceptionInInitializerError leaves
ClientTelemetry permanently un-loadable and every subsequent
CosmosClientBuilder.buildAsyncClient() call fails with:

    NoClassDefFoundError: Could not initialize class
      com.azure.cosmos.implementation.clienttelemetry.ClientTelemetry

Fix: move METADATA_NOT_AVAILABLE ahead of CACHED_METADATA so the sentinel
exists before fetchAzureVmMetadata() can reference it during class init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 21, 2026 09:32
github-actions bot added the labels Community Contribution (community members are working on the issue), Cosmos, and customer-reported (issues that are reported by GitHub users external to the Azure organization) on Apr 21, 2026
@github-actions
Contributor

Thank you for your contribution @lfavreli-betclic! We will review the pull request and get back to you soon.

Contributor

Copilot AI left a comment

Pull request overview

Fixes a JVM class-initialization failure in the Cosmos SDK’s ClientTelemetry by ensuring the IMDS “metadata not available” sentinel is initialized before the cached IMDS Mono is created, preventing NoClassDefFoundError cascades when IMDS access is disabled.

Changes:

  • Reordered ClientTelemetry static field declarations and added a clarifying comment to prevent a static-init NPE when IMDS access is disabled.
  • Added a release note entry describing the fix in the Cosmos library changelog.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientTelemetry.java | Reorders static initialization to avoid Mono.just(null) during <clinit> when IMDS is disabled. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the bug fix in the unreleased version’s “Bugs Fixed” section. |

// - The fetch executes at most once
// - All concurrent subscribers share the single result
// - The HTTP client is created and disposed within the fetch
private static final Mono<AzureVMMetadata> CACHED_METADATA = fetchAzureVmMetadata().cache();

Copilot AI Apr 21, 2026

Consider adding a regression test to cover the class-initialization scenario this change fixes (IMDS access disabled causing ClientTelemetry to become unloadable). There is already child-JVM test infrastructure in azure-cosmos-tests (e.g., ImplementationBridgeHelpersTest uses ProcessBuilder) that could run a small main with -DCOSMOS.DISABLE_IMDS_ACCESS=true (or COSMOS_DISABLE_IMDS_ACCESS=true) before any Cosmos classes load, then assert that a CosmosClientBuilder.buildAsyncClient() call completes without ExceptionInInitializerError/NoClassDefFoundError. This would prevent future static-field reordering regressions.

Copilot generated this review using guidance from repository custom instructions.
Member

+1

Comment thread sdk/cosmos/azure-cosmos/CHANGELOG.md Outdated
@lfavreli-betclic
Author

@microsoft-github-policy-service agree [company="{your company}"]

@microsoft-github-policy-service agree company="Betclic Group"

// Must be declared before CACHED_METADATA so that fetchAzureVmMetadata()
// never reads a null value during class initialization (e.g. when IMDS
// access is disabled via COSMOS_DISABLE_IMDS_ACCESS).
private static final AzureVMMetadata METADATA_NOT_AVAILABLE = new AzureVMMetadata();
Member

Like you mentioned in the PR description, moving to a static initialization block to also fix the fragile IMDS_DEFAULT_* fields would be my preference. I can take the additional changes (adding test coverage and making this change) from here if you don't want to spend more time on this, which is perfectly understandable. If you want to finish/merge the PR, please let me know when these changes are added and I will quickly re-review and approve.

Author

Thanks a lot, @FabianMeiswinkel, for your quick review!

You have a much better overall understanding of the SDK, so I’m perfectly happy to let you take over and apply the remaining changes. I hope this initial work was helpful to you. 👌

Member

It was, and thanks for providing such detailed repro info. Really appreciated!

Member

Thanks @lfavreli-betclic - I have published PR #48888 for this and will get it merged ASAP.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread sdk/cosmos/azure-cosmos/CHANGELOG.md Outdated
@FabianMeiswinkel
Member

Thanks @lfavreli-betclic - as discussed, I have published PR #48888 for this and will get it merged ASAP.
