[FEATURE REQ] Port hub region caching per partition level from .NET SDK #48788

@jeet1995

Summary

Port the per-partition hub region caching feature from .NET SDK (PR Azure/azure-cosmos-dotnet-v3#5648) to the Java SDK.

Background

On single-master accounts encountering repeated 404/1002 (ReadSessionNotAvailable) errors, the SDK should discover the hub region via a 403/3 (WriteForbidden) discovery chain and cache the result per partition. Subsequent requests to the same partition route directly to the cached hub, eliminating redundant discovery round-trips.

Key Behavior

  • After 2 consecutive 404/1002 responses on a single-master account, set the `x-ms-cosmos-hub-region-processing-only` header
  • Non-hub regions return 403/3 (WriteForbidden) — SDK retries to next region (discovery chain)
  • Hub region responds with 200 OK — SDK caches hub URI for that partition
  • Future requests route directly to cached hub (warm path)
  • Works for both PPAF and non-PPAF accounts
  • Must be gated by a feature flag / environment variable (per Debdatta's guidance)
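
The behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the .NET implementation and not existing Java SDK code: the class `HubRegionDiscoverySketch`, the `Transport` interface, and the `Response` type are all hypothetical, and the sketch assumes that regions are tried in list order and that the warm path keeps the hub-only header set (the issue does not specify either).

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: every type and method name here is hypothetical
// and is not part of the Java SDK.
final class HubRegionDiscoverySketch {
    // Header name and threshold are taken from the issue text.
    static final String HUB_ONLY_HEADER = "x-ms-cosmos-hub-region-processing-only";
    static final int READ_SESSION_NOT_AVAILABLE_THRESHOLD = 2;

    static final class Response {
        final int status;
        final int subStatus;
        Response(int status, int subStatus) { this.status = status; this.subStatus = subStatus; }
        boolean isReadSessionNotAvailable() { return status == 404 && subStatus == 1002; }
        boolean isWriteForbidden() { return status == 403 && subStatus == 3; }
    }

    // Hypothetical transport: sends one request to a region; when
    // hubProcessingOnly is true the request carries HUB_ONLY_HEADER.
    interface Transport {
        Response send(URI region, boolean hubProcessingOnly);
    }

    private final Map<String, URI> hubByPartition = new ConcurrentHashMap<>();
    private final List<URI> regions;
    private final Transport transport;

    HubRegionDiscoverySketch(List<URI> regions, Transport transport) {
        if (regions.isEmpty()) throw new IllegalArgumentException("at least one region required");
        this.regions = regions;
        this.transport = transport;
    }

    Optional<URI> cachedHub(String partitionKeyRangeId) {
        return Optional.ofNullable(hubByPartition.get(partitionKeyRangeId));
    }

    Response execute(String partitionKeyRangeId) {
        // Warm path: a hub is already cached for this partition, so skip discovery.
        // Assumption: the header stays set on the warm path.
        URI cached = hubByPartition.get(partitionKeyRangeId);
        if (cached != null) {
            return transport.send(cached, true);
        }

        // Normal path: stop as soon as a response is not 404/1002,
        // or once the consecutive-failure threshold is reached.
        int consecutive = 0;
        Response last = null;
        for (URI region : regions) {
            last = transport.send(region, false);
            if (!last.isReadSessionNotAvailable()) return last;
            if (++consecutive >= READ_SESSION_NOT_AVAILABLE_THRESHOLD) break;
        }
        if (consecutive < READ_SESSION_NOT_AVAILABLE_THRESHOLD) return last;

        // Discovery chain: non-hub regions answer 403/3, the hub answers 200.
        for (URI region : regions) {
            last = transport.send(region, true);
            if (last.isWriteForbidden()) continue;
            if (last.status == 200) hubByPartition.put(partitionKeyRangeId, region);
            return last;
        }
        return last;
    }
}
```

Keying the cache by partition key range ID in a `ConcurrentHashMap` is one plausible choice for the per-partition cache; the real key type, eviction rules, and interaction with PPAF would need to follow the .NET change.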

.NET Reference

Java SDK Files Likely Affected

  • `ClientRetryPolicy.java`
  • `GlobalPartitionEndpointManager.java` / related partition failover classes
  • `RxGatewayStoreModel.java`
  • `Configs.java` (feature flag)

Acceptance Criteria

  • Hub region discovery via 403/3 chain after 2x 404/1002 on single-master
  • Per-partition hub region caching (warm path skips discovery)
  • Feature flag gate (environment variable, disabled by default initially)
  • Works for both PPAF and non-PPAF accounts
  • Unit tests + integration tests
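
For the feature-flag criterion, a gate that is disabled by default might look something like the sketch below. The property and environment variable names are invented for illustration and are not existing `Configs.java` settings; letting a system property take precedence over the environment variable is also an assumption.

```java
import java.util.function.Function;

// Illustrative sketch only: the flag names below are hypothetical.
final class HubRegionCachingFlagSketch {
    static final String PROPERTY = "COSMOS.IS_HUB_REGION_CACHING_ENABLED";
    static final String ENV_VAR = "COSMOS_IS_HUB_REGION_CACHING_ENABLED";

    // Lookups are injected so the gate can be unit tested without
    // touching the real process environment.
    static boolean isEnabled(Function<String, String> systemProperties,
                             Function<String, String> environment) {
        String raw = systemProperties.apply(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            raw = environment.apply(ENV_VAR);
        }
        // Anything other than "true" (case-insensitive) leaves the feature off.
        return raw != null && Boolean.parseBoolean(raw.trim());
    }

    static boolean isEnabled() {
        return isEnabled(System::getProperty, System::getenv);
    }
}
```

With neither value set, `isEnabled()` returns `false`, which matches the "disabled by default initially" requirement.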

Labels: Client, Cosmos