Skip to content

fix: provider state race#380

Open
toddbaert wants to merge 3 commits intomainfrom
fix/provider-race
Open

fix: provider state race#380
toddbaert wants to merge 3 commits intomainfrom
fix/provider-race

Conversation

@toddbaert
Copy link
Copy Markdown
Member

@toddbaert toddbaert commented Apr 23, 2026

  • moves "provider status" back into the provider interface
  • clarifies some ordering and atomicity
  • adds "migration" appendix describing the proposed migration strategy

For background, see: #365


PoCs for the migration strategy (not the actual spec changes) by @toddbaert here:

Go: open-feature/go-sdk#487
Java: open-feature/java-sdk#1892
JS: open-feature/js-sdk#1362


Fixes: #365

If/when this is merged, I will create issues for every implementation (thought I think Kotlin/Swift might already be corrected).

Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@toddbaert toddbaert requested a review from a team as a code owner April 23, 2026 11:51
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the OpenFeature specification to shift provider status ownership from the SDK to the provider itself, addressing race conditions in multi-threaded environments by ensuring status updates and event emissions are atomic. Key changes include the addition of Section 2.8 (Provider status), updates to lifecycle and event requirements to reflect delegation to the provider, and a new migration guide in Appendix E. Review feedback correctly identifies that Requirement 1.7.2.1 is inconsistent with other sections, as it omits the "FATAL" status and uses non-normative language instead of the required "MUST" keyword.

Comment thread specification.json Outdated
Comment thread specification/sections/01-flag-evaluation.md Outdated
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
> The `client` **MUST** define a `provider status` accessor which indicates the readiness of the associated provider, with possible values `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL`.

The SDK at all times maintains an up-to-date state corresponding to the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event.
The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: given that providers must emit events on lifecycle methods, this paragraph can be simplified to:

Suggested change
The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event.
The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the last emitted event.

> Status changes and any associated event emissions **MUST** be atomic from the perspective of external observers.

When a provider transitions between statuses and emits an event associated with that transition, external observers (such as SDK event handlers) must observe a consistent view: the updated `status` value and the emitted event are visible together.
This prevents ordering anomalies where, for example, a `PROVIDER_READY` handler runs while `status` still indicates `NOT_READY` or `ERROR`, or where the provider transitions out of a status before the associated event is dispatched.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Dispatch" may be a big ambiguous here:

or where the provider transitions out of a status before the associated event is dispatched

Is the intent here that event handlers see the exact status that triggered the event? If so, this may be problematic as it requires holding the status lock while all handlers run, preventing the provider from changing its own status.

I think what we're after here is establishing an observable "happens-before" relationship:

  • Status change must happen before the corresponding event handlers run.
  • Event handlers must only be run after all event handlers for the previous event has finished.

So an event handler may observe next status changes, the important bit is that it must observe the status change that triggered the event.

Comment on lines 214 to +217
> If the provider's `on context changed` function terminates normally, and no other invocations have yet to terminate, associated `PROVIDER_CONTEXT_CHANGED` handlers **MUST** run.

The implementation must run any `PROVIDER_CONTEXT_CHANGED` handlers associated with the provider after the provider has reconciled its state and returned from the `on context changed` function.
The `PROVIDER_CONTEXT_CHANGED` is not emitted from the provider itself; the SDK implementation must run the `PROVIDER_CONTEXT_CHANGED` handlers if the `on context changed` function terminates normally.
It's possible that the `on context changed` function is invoked simultaneously or in quick succession; in this case the SDK will only run the `PROVIDER_CONTEXT_CHANGED` handlers after all reentrant invocations have terminated, and the last to terminate was successful (terminated normally).
`PROVIDER_CONTEXT_CHANGED` handlers associated with the provider must run after the provider has reconciled its state and returned from the `on context changed` function.
It's possible that the `on context changed` function is invoked simultaneously or in quick succession; in this case `PROVIDER_CONTEXT_CHANGED` handlers only run after all reentrant invocations have terminated, and the last to terminate was successful (terminated normally).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking on words: the current phrasing says event handler must run after on context changed function returns, which doesn't make sense if event is emitted from that function.

The handling for reetrancy also seems overly-prescriptive and infeasible (the provider generally cannot emit events after all invocations terminated).

I think we can simplify this to:

  • Provider must emit PROVIDER_CONTEXT_CHANGED after it successfully reconciled the context.
  • It's possible that the on context changed function is invoked simultaneously or in quick succession, so the provider must be prepared to handle that.


- Call `initialize()`, `shutdown()`, and `on context change` lifecycle methods on the provider
- Forward provider-emitted events to registered domain and API-level event handlers
- Run late-attached handlers immediately if the provider is already in the associated state
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

major: I believe SDKs can't do this safely as this requires that subscription and emission of the current status is atomic (there should be no new events emitted between reading current status and running the handler) — and you can't guarantee that when another thread is emitting events (unless SDK implements event buffering which is cumbersome).

- Call `initialize()`, `shutdown()`, and `on context change` lifecycle methods on the provider
- Forward provider-emitted events to registered domain and API-level event handlers
- Run late-attached handlers immediately if the provider is already in the associated state
- Enforce short-circuit behavior for `NOT_READY` and `FATAL` statuses during flag evaluation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

major: multi-threaded SDKs can't enforce this because of TOCTOU

At registration time, check whether the provider implements the `StateManagingProvider` interface (or equivalent).
Store this as a flag on the internal provider wrapper for use during lifecycle calls and event handling.

#### SDK wrapper behavior
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor/aside: I'd argue that the better design is implementing an adapter that takes a non-StateManagingProvider and implements the StateManagingProvider interface. This way, all special-casing/mapping is concentrated in one place, and it is about the only thing that needs to be deleted in the next major release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provider lifecycle race condition in multi-threaded SDKs

2 participants