API specs for offline support by camillecroci · Pull Request #1743 · Financial-Times/dotcom-reliability-kit

camillecroci · 2026-04-09T15:50:04Z

Description

This PR communicates the changes needed in the client-metrics API to support offline use.
Sorry, there are more code changes in the PR than I originally wanted, because I didn't want to forget them.

Current implementation

sendEvent pulls the first x events from a queue and calls fetch to POST them to the client-metrics server. Three things call sendEvent:

recordEvent: every time an event is added to the queue of events, it calls sendEvent if the queue is big enough
startTimer: creates a setInterval that calls sendEvent every 10 seconds
sendEvent: calls itself if there are still events in the queue after sending a first batch
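
To make the three triggers concrete, here is a minimal sketch of the behaviour described above. It is not the actual client code: the class name, the `post` callback (a stand-in for the fetch POST), the `batchSize` option and `stopTimer` are assumptions for illustration.

```javascript
// Hypothetical sketch of the three sendEvent triggers. Only recordEvent,
// startTimer and sendEvent are names taken from the description.
class CurrentClientSketch {
	#queue = [];
	#batchSize;
	#post;
	#timer = null;

	constructor({ batchSize = 3, post }) {
		this.#batchSize = batchSize;
		this.#post = post; // stand-in for fetch() POSTing to the server
	}

	// Trigger 1: recording an event sends once the queue is big enough
	recordEvent(event) {
		this.#queue.push(event);
		if (this.#queue.length >= this.#batchSize) {
			this.sendEvent();
		}
	}

	// Trigger 2: a timer calls sendEvent every 10 seconds
	startTimer(intervalMs = 10_000) {
		this.#timer = setInterval(() => this.sendEvent(), intervalMs);
	}

	stopTimer() {
		clearInterval(this.#timer);
	}

	sendEvent() {
		if (this.#queue.length === 0) {
			return;
		}
		// Pull the first batchSize events off the queue and send them
		const batch = this.#queue.splice(0, this.#batchSize);
		this.#post(batch);
		// Trigger 3: sendEvent calls itself while events remain
		if (this.#queue.length > 0) {
			this.sendEvent();
		}
	}
}
```

Note that nothing in this sketch reacts to a failed send, which is the gap the rest of this description is about.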

Stuff we want

if we try to send events and fail, we don't want to lose those events
we want some retry mechanism to try sending them again
we don't want a queue that grows indefinitely
we don't want infinite recursion of sendEvent
no breaking changes

Suggested changes

To avoid losing events: we add them back to the queue when a fetch fails
We don't really need a separate retry mechanism, because we have a timer that sends events every 10 seconds if there are any in the queue
We already have a mechanism within the queue that drops old events when the queue reaches a certain size
We can keep a counter of failed fetch attempts. If we reach a certain number of failed attempts, we block sending from the recursive call in sendEvent and from recordEvent. Those two functions follow the logic "if the queue is big enough, try to send". But once many attempts have failed, the size of the queue should no longer trigger a send; we rely only on the timer to retry.
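
The re-queue and the failure counter could fit together roughly as follows. This is a sketch under assumptions, not the code in this PR: the class name, the injected `post` function (standing in for fetch), the option names and the two public getters are hypothetical, and the drop-old-events behaviour of the real queue is left out.

```javascript
// Hypothetical sketch: put failed batches back and stop size-triggered sends
// once too many fetches have failed in a row.
class OfflineAwareClientSketch {
	#queue = [];
	#fetchFailed = 0;
	#maxFetchAttempt;
	#batchSize;
	#post;

	constructor({ post, batchSize = 2, maxFetchAttempt = 3 }) {
		this.#post = post; // async stand-in for the fetch POST
		this.#batchSize = batchSize;
		this.#maxFetchAttempt = maxFetchAttempt;
	}

	get queueLength() {
		return this.#queue.length;
	}

	get fetchFailed() {
		return this.#fetchFailed;
	}

	recordEvent(event) {
		this.#queue.push(event);
		// Queue size only triggers a send while attempts are not exhausted
		if (
			this.#queue.length >= this.#batchSize &&
			this.#fetchFailed < this.#maxFetchAttempt
		) {
			return this.sendEvent();
		}
		return Promise.resolve();
	}

	// The timer would call this regardless of the failure count (retry path)
	async sendEvent() {
		if (this.#queue.length === 0) {
			return;
		}
		const batch = this.#queue.splice(0, this.#batchSize);
		try {
			await this.#post(batch);
			this.#fetchFailed = 0;
		} catch {
			this.#fetchFailed += 1;
			// Put the events back at the front so they are not lost
			this.#queue.unshift(...batch);
		}
		// Recursion is bounded: it stops once attempts are exhausted
		if (this.#queue.length > 0 && this.#fetchFailed < this.#maxFetchAttempt) {
			await this.sendEvent();
		}
	}
}
```

With a `post` that always rejects, a size-triggered send stops after `maxFetchAttempt` tries and later `recordEvent` calls only enqueue. The next direct `sendEvent` call (the timer) is what tries again.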

Extra

Problem

If we think about the app going offline, it might be a lot to keep trying to send events every 10 seconds. This might have an impact on the mobile performance (battery, radio usage, and generally, the app doing an extra action when it could be doing nothing).

Suggested mitigation: exponential backoff

If we detect failures, we can increase the timeout by 20% (for example) after each failed attempt until we reach a maximum (2 minutes).
So if it's offline for a long time, the app only tries to send metrics every 2 minutes. As soon as a fetch succeeds, the timeout goes back to its initial 10 seconds.
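
As a rough illustration of the backoff arithmetic, assuming the example values above (10 seconds initial, 20% growth, 2 minute cap). The function and constant names are hypothetical.

```javascript
// Hypothetical sketch of the backoff calculation; values mirror the
// examples in the description and are not final.
const INITIAL_DELAY_MS = 10_000;
const MAX_DELAY_MS = 120_000;
const GROWTH_FACTOR = 1.2;

// Returns the delay to wait before the next send attempt
function nextDelay(currentDelayMs, lastFetchSucceeded) {
	if (lastFetchSucceeded) {
		// Any success resets the delay to the initial 10 seconds
		return INITIAL_DELAY_MS;
	}
	// Each failure grows the delay by 20%, capped at 2 minutes
	return Math.min(Math.round(currentDelayMs * GROWTH_FACTOR), MAX_DELAY_MS);
}
```

With these numbers the delay goes 10s, 12s, 14.4s and so on, and reaches the 2 minute cap after 14 consecutive failures. Since setInterval uses a fixed period, a variable delay would probably mean rescheduling with setTimeout after each attempt instead.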

Breaking change

I don't think there are any with the suggested changes

I confirm that the code in this PR has not been generated by AI

rowanmanning

Looks good to me, don't want to nitpick much because we're still early-stages but can see a few optimisations for the repeated logic 🙂

rowanmanning · 2026-04-13T14:00:09Z


-		if (this.#queue.size) {
+		if (this.#queue.size 
+			&& (this.#fetchFailed < this.#maxFetchAttempt)) {


nitpick: it's an implementation detail so fine if this isn't the right time to think about it, but I see the same conditions a bunch. Maybe this could be a getter on the class?

class MetricsClient {
	get #fetchTriesExhausted() {
		return this.#fetchFailed >= this.#maxFetchAttempt;
	}
}
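
For illustration, this is roughly how the condition from the diff could read once it goes through the getter. Everything apart from `#queue`, `#fetchFailed`, `#maxFetchAttempt` and `#fetchTriesExhausted` is a hypothetical name added so the sketch can run on its own.

```javascript
// Hypothetical sketch: the repeated comparison lives in one private getter.
class GetterSketch {
	#queue = new Set();
	#fetchFailed = 0;
	#maxFetchAttempt;

	constructor(maxFetchAttempt = 3) {
		this.#maxFetchAttempt = maxFetchAttempt;
	}

	// The getter suggested in the review comment
	get #fetchTriesExhausted() {
		return this.#fetchFailed >= this.#maxFetchAttempt;
	}

	// Same condition as in the diff, expressed through the getter
	get canSend() {
		return this.#queue.size > 0 && !this.#fetchTriesExhausted;
	}

	recordEvent(event) {
		this.#queue.add(event);
	}

	noteFetchFailure() {
		this.#fetchFailed += 1;
	}

	noteFetchSuccess() {
		this.#fetchFailed = 0;
	}
}
```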

This is how we are dealing with offline:
1. Because we are not waiting for a metrics send to succeed or fail before trying to send more metrics, we limit the number of fetches that can be in flight at the same time.
2. We also track how many fetches are failing, and when we reach a threshold we consider the client to be offline. When the client is offline, we stop sending metrics:
- when recording a new event
- recursively, from sendEvent checking the queue, because that would lead to an infinite loop: fail to fetch -> put the events back in the queue -> check whether there are events left in the queue -> try to fetch...
3. We add a timeout to fetch so it doesn't wait too long before deciding something is wrong. That's fine because we will retry anyway.
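
Points 1 and 3 could look something like the sketch below. It is only one way to do it and not necessarily what the commit does: `postWithTimeout`, `InFlightLimiter`, the option names and the 5 second default are all assumptions. The timeout uses the standard AbortController API.

```javascript
// Hypothetical helper for point 3: abort the request if it takes too long.
async function postWithTimeout(url, events, { timeoutMs = 5000, fetchImpl = fetch } = {}) {
	const controller = new AbortController();
	const timer = setTimeout(() => controller.abort(), timeoutMs);
	try {
		const response = await fetchImpl(url, {
			method: 'POST',
			headers: { 'content-type': 'application/json' },
			body: JSON.stringify(events),
			signal: controller.signal
		});
		if (!response.ok) {
			throw new Error(`Unexpected status ${response.status}`);
		}
	} finally {
		// Always clear the timer so it cannot fire after the request settles
		clearTimeout(timer);
	}
}

// Hypothetical helper for point 1: cap the number of concurrent fetches.
class InFlightLimiter {
	#inFlight = 0;
	#max;

	constructor(max) {
		this.#max = max;
	}

	// Runs task() only if a slot is free; resolves to false when the cap
	// is reached so the caller can leave the events in the queue
	async tryRun(task) {
		if (this.#inFlight >= this.#max) {
			return false;
		}
		this.#inFlight += 1;
		try {
			await task();
			return true;
		} finally {
			this.#inFlight -= 1;
		}
	}
}
```

A rejected `postWithTimeout` (timeout, network error or non-2xx status) would then count as one failed fetch towards the offline threshold in point 2.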

feat: add specs for offline support

edb33c4

camillecroci requested a review from a team as a code owner April 9, 2026 15:50

rowanmanning reviewed Apr 13, 2026

camillecroci added 2 commits April 14, 2026 08:30

feat: adds back event in the queue if fetch has failed

6e99444

camillecroci wants to merge 3 commits into main from cc/offline-support-specs
