Conversation
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
There was a problem hiding this comment.
Code Review
This pull request updates the OpenFeature specification to shift provider status ownership from the SDK to the provider itself, addressing race conditions in multi-threaded environments by ensuring status updates and event emissions are atomic. Key changes include the addition of Section 2.8 (Provider status), updates to lifecycle and event requirements to reflect delegation to the provider, and a new migration guide in Appendix E. Review feedback correctly identifies that Requirement 1.7.2.1 is inconsistent with other sections, as it omits the "FATAL" status and uses non-normative language instead of the required "MUST" keyword.
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
| > The `client` **MUST** define a `provider status` accessor which indicates the readiness of the associated provider, with possible values `NOT_READY`, `READY`, `STALE`, `ERROR`, or `FATAL`. | ||
|
|
||
| The SDK at all times maintains an up-to-date state corresponding to the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event. | ||
| The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event. |
There was a problem hiding this comment.
minor: given that providers must emit events on lifecycle methods, this paragraph can be simplified to:
| The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the success/failure of the last lifecycle method (`initialize`, `shutdown`, `on context change`) or emitted event. | |
| The client's `provider status` accessor delegates to the associated provider's `status` accessor, which the provider keeps in sync with the last emitted event. |
| > Status changes and any associated event emissions **MUST** be atomic from the perspective of external observers. | ||
|
|
||
| When a provider transitions between statuses and emits an event associated with that transition, external observers (such as SDK event handlers) must observe a consistent view: the updated `status` value and the emitted event are visible together. | ||
| This prevents ordering anomalies where, for example, a `PROVIDER_READY` handler runs while `status` still indicates `NOT_READY` or `ERROR`, or where the provider transitions out of a status before the associated event is dispatched. |
There was a problem hiding this comment.
"Dispatch" may be a big ambiguous here:
or where the provider transitions out of a status before the associated event is dispatched
Is the intent here that event handlers see the exact status that triggered the event? If so, this may be problematic as it requires holding the status lock while all handlers run, preventing the provider from changing its own status.
I think what we're after here is establishing an observable "happens-before" relationship:
- Status change must happen before the corresponding event handlers run.
- Event handlers must only be run after all event handlers for the previous event has finished.
So an event handler may observe next status changes, the important bit is that it must observe the status change that triggered the event.
| > If the provider's `on context changed` function terminates normally, and no other invocations have yet to terminate, associated `PROVIDER_CONTEXT_CHANGED` handlers **MUST** run. | ||
|
|
||
| The implementation must run any `PROVIDER_CONTEXT_CHANGED` handlers associated with the provider after the provider has reconciled its state and returned from the `on context changed` function. | ||
| The `PROVIDER_CONTEXT_CHANGED` is not emitted from the provider itself; the SDK implementation must run the `PROVIDER_CONTEXT_CHANGED` handlers if the `on context changed` function terminates normally. | ||
| It's possible that the `on context changed` function is invoked simultaneously or in quick succession; in this case the SDK will only run the `PROVIDER_CONTEXT_CHANGED` handlers after all reentrant invocations have terminated, and the last to terminate was successful (terminated normally). | ||
| `PROVIDER_CONTEXT_CHANGED` handlers associated with the provider must run after the provider has reconciled its state and returned from the `on context changed` function. | ||
| It's possible that the `on context changed` function is invoked simultaneously or in quick succession; in this case `PROVIDER_CONTEXT_CHANGED` handlers only run after all reentrant invocations have terminated, and the last to terminate was successful (terminated normally). |
There was a problem hiding this comment.
Nitpicking on words: the current phrasing says event handler must run after on context changed function returns, which doesn't make sense if event is emitted from that function.
The handling for reetrancy also seems overly-prescriptive and infeasible (the provider generally cannot emit events after all invocations terminated).
I think we can simplify this to:
- Provider must emit
PROVIDER_CONTEXT_CHANGEDafter it successfully reconciled the context. - It's possible that the
on context changedfunction is invoked simultaneously or in quick succession, so the provider must be prepared to handle that.
|
|
||
| - Call `initialize()`, `shutdown()`, and `on context change` lifecycle methods on the provider | ||
| - Forward provider-emitted events to registered domain and API-level event handlers | ||
| - Run late-attached handlers immediately if the provider is already in the associated state |
There was a problem hiding this comment.
major: I believe SDKs can't do this safely as this requires that subscription and emission of the current status is atomic (there should be no new events emitted between reading current status and running the handler) — and you can't guarantee that when another thread is emitting events (unless SDK implements event buffering which is cumbersome).
| - Call `initialize()`, `shutdown()`, and `on context change` lifecycle methods on the provider | ||
| - Forward provider-emitted events to registered domain and API-level event handlers | ||
| - Run late-attached handlers immediately if the provider is already in the associated state | ||
| - Enforce short-circuit behavior for `NOT_READY` and `FATAL` statuses during flag evaluation |
There was a problem hiding this comment.
major: multi-threaded SDKs can't enforce this because of TOCTOU
| At registration time, check whether the provider implements the `StateManagingProvider` interface (or equivalent). | ||
| Store this as a flag on the internal provider wrapper for use during lifecycle calls and event handling. | ||
|
|
||
| #### SDK wrapper behavior |
There was a problem hiding this comment.
minor/aside: I'd argue that the better design is implementing an adapter that takes a non-StateManagingProvider and implements the StateManagingProvider interface. This way, all special-casing/mapping is concentrated in one place, and it is about the only thing that needs to be deleted in the next major release.
For background, see: #365
PoCs for the migration strategy (not the actual spec changes) by @toddbaert here:
Go: open-feature/go-sdk#487
Java: open-feature/java-sdk#1892
JS: open-feature/js-sdk#1362
Fixes: #365
If/when this is merged, I will create issues for every implementation (thought I think Kotlin/Swift might already be corrected).