API specs for offline support #1743

Open
camillecroci wants to merge 3 commits into main from cc/offline-support-specs

@camillecroci camillecroci commented Apr 9, 2026

Description

This PR is meant to communicate the changes needed in the client-metrics API to support offline use.
Sorry, there are more code changes in this PR than I originally wanted, because I didn't want to forget them.

Current implementation

sendEvent pulls the first x events from a queue and calls fetch to POST them to the client-metrics server. Three things call sendEvent:

  • recordEvent: every time an event is added to the event queue, it calls sendEvent if the queue is big enough
  • startTimer: creates a setInterval that calls sendEvent every 10 seconds
  • sendEvent: calls itself if there are still events in the queue after sending a first batch
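
To make the three call sites concrete, here is a minimal sketch of the flow as described. It is not the actual client code: the class name, the `post` callback (a stand-in for the fetch POST), and the `sendThreshold` and `batchSize` options are assumptions for illustration. It also assumes the POST is fired without waiting for its result.

```javascript
// Sketch of the current flow. Only sendEvent, recordEvent and startTimer are
// names from the description; everything else is hypothetical.
class CurrentFlowSketch {
    #queue = [];
    #post;
    #sendThreshold;
    #batchSize;
    #timer = null;

    constructor({ post, sendThreshold = 20, batchSize = 20 }) {
        this.#post = post;
        this.#sendThreshold = sendThreshold;
        this.#batchSize = batchSize;
    }

    // Call site 1: send as soon as the queue is big enough.
    recordEvent(event) {
        this.#queue.push(event);
        if (this.#queue.length >= this.#sendThreshold) {
            this.sendEvent();
        }
    }

    // Call site 2: try to send every 10 seconds.
    startTimer(intervalMs = 10_000) {
        this.#timer = setInterval(() => this.sendEvent(), intervalMs);
    }

    stopTimer() {
        clearInterval(this.#timer);
        this.#timer = null;
    }

    sendEvent() {
        if (this.#queue.length === 0) {
            return;
        }
        // Pull the first batchSize events and POST them without waiting.
        const batch = this.#queue.splice(0, this.#batchSize);
        this.#post(batch);
        // Call site 3: keep going while events remain in the queue.
        if (this.#queue.length > 0) {
            this.sendEvent();
        }
    }
}
```

In this shape, call site 3 is the one that would loop forever if a failed batch were simply pushed back into the queue.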

Stuff we want

  1. if we try to send events and fail, we don't want to lose those events
  2. we want some retry mechanism to try sending them again
  3. we don't want a queue that grows indefinitely
  4. we don't want infinite recursion in sendEvent
  5. no breaking changes

Suggested changes

  1. To avoid losing events: we add them back to the queue when a fetch fails
  2. We don't really need a separate retry mechanism, because we have a timer that sends events every 10 seconds if there are any in the queue
  3. We already have a mechanism within the queue that drops old events when the queue reaches a certain size
  4. We can keep a counter of failed fetch attempts. If we reach a certain number of failed attempts, we block sending from sendEvent and from recordEvent. Those 2 functions follow the logic "if the queue is big enough, try to send". But after many failed attempts, the size of the queue should no longer trigger a send; we rely only on the timer to retry
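
Points 1, 3 and 4 could fit together roughly like this. This is a sketch of the bookkeeping only, with no networking. `#fetchFailed` and `#maxFetchAttempt` echo the names used in the diff; the class name, method names, default sizes, and the choice to reset the counter on a successful fetch are all assumptions.

```javascript
// Sketch only: bookkeeping for re-queueing and the failed-attempt counter.
class OfflineAwareQueue {
    #queue = [];
    #fetchFailed = 0;
    #maxFetchAttempt;
    #batchSize;
    #maxQueueSize;

    constructor({ maxFetchAttempt = 3, batchSize = 20, maxQueueSize = 100 } = {}) {
        this.#maxFetchAttempt = maxFetchAttempt;
        this.#batchSize = batchSize;
        this.#maxQueueSize = maxQueueSize;
    }

    get size() {
        return this.#queue.length;
    }

    // True once enough fetches have failed to stop size-based sending.
    get fetchTriesExhausted() {
        return this.#fetchFailed >= this.#maxFetchAttempt;
    }

    // Trigger for recordEvent: the queue is big enough and we are not blocked.
    get shouldSendOnRecord() {
        return this.#queue.length >= this.#batchSize && !this.fetchTriesExhausted;
    }

    // Trigger for the recursive call in sendEvent: events remain and we are not blocked.
    get shouldSendRemaining() {
        return this.#queue.length > 0 && !this.fetchTriesExhausted;
    }

    // Point 3: drop the oldest events once the queue is over its maximum size.
    #trim() {
        while (this.#queue.length > this.#maxQueueSize) {
            this.#queue.shift();
        }
    }

    add(event) {
        this.#queue.push(event);
        this.#trim();
    }

    takeBatch() {
        return this.#queue.splice(0, this.#batchSize);
    }

    // Point 1: put the failed batch back at the front. Point 4: count the failure.
    onFetchFailure(batch) {
        this.#queue.unshift(...batch);
        this.#trim();
        this.#fetchFailed += 1;
    }

    // Assumption: one successful fetch clears the counter.
    onFetchSuccess() {
        this.#fetchFailed = 0;
    }
}
```

The timer path would ignore both `shouldSend…` getters and call sendEvent whenever the queue is non-empty, which is what keeps retries going while the size-based triggers are blocked.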

Extra

Problem

If we think about the app going offline, it might be a lot to keep trying to send events every 10 seconds. This might have an impact on the mobile performance (battery, radio usage, and generally, the app doing an extra action when it could be doing nothing).

Suggested mitigation: exponential backoff

If we detect failures, we can increase the timeout between attempts by 20% (for example) each time, until we reach a maximum (2 minutes).
So if the app is offline for a long time, it only tries to send metrics every 2 minutes. As soon as a fetch succeeds, the timeout goes back to its initial 10 seconds.
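
As a rough illustration of the numbers above, the next delay could be computed like this. The function and constant names are hypothetical, and 20% and 2 minutes are only the example values from this description.

```javascript
// Example values from the description; all names here are hypothetical.
const INITIAL_DELAY_MS = 10_000;
const MAX_DELAY_MS = 120_000;
const GROWTH_FACTOR = 1.2;

// Returns how long to wait before the next send attempt.
function nextDelay(currentDelayMs, lastFetchSucceeded) {
    if (lastFetchSucceeded) {
        // Back online: return to the initial 10 seconds.
        return INITIAL_DELAY_MS;
    }
    // Failure: grow the delay by 20%, capped at 2 minutes.
    return Math.min(Math.round(currentDelayMs * GROWTH_FACTOR), MAX_DELAY_MS);
}
```

With these values the delay reaches the 2 minute cap after 14 consecutive failures. Because the delay changes between attempts, this would probably mean rescheduling with setTimeout after each attempt rather than using a fixed setInterval.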

Breaking change

I don't think there are any with the suggested changes

  • I confirm that the code in this PR has not been generated by AI

@camillecroci camillecroci requested a review from a team as a code owner April 9, 2026 15:50

@rowanmanning rowanmanning left a comment

Looks good to me, don't want to nitpick much because we're still early-stages but can see a few optimisations for the repeated logic 🙂


- if (this.#queue.size) {
+ if (this.#queue.size
+     && (this.#fetchFailed < this.#maxFetchAttempt)) {

nitpick: it's an implementation detail so fine if this isn't the right time to think about it, but I see the same conditions a bunch. Maybe this could be a getter on the class?

class MetricsClient {
    get #fetchTriesExhausted() {
        return this.#fetchFailed >= this.#maxFetchAttempt;
    }
}

This is how we are dealing with offline:
1. Because we do not wait for a send to succeed or fail
before trying to send more metrics, we limit the number of fetches
that can be in flight at the same time.

2. We also count how many fetches fail, and when we reach
a threshold we consider the client to be offline.
When the client is offline, we stop sending metrics
- when recording a new event
- recursively, from sendEvent checking the queue,
because that would lead to an infinite loop:
-> fail to fetch -> put the events back in the queue
-> check if there are events left in the queue -> try to fetch...

3. We add a timeout to fetch so it does not wait too long before
considering that something is wrong. That is fine because we will retry anyway.
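
Points 1 and 3 can be sketched together as follows. This is only an illustration under assumptions: the class name, the option names and their defaults are hypothetical, and the injectable `fetchImpl` exists just to make the sketch easy to try without a network. `AbortSignal.timeout` is the standard way to give a fetch a deadline in current browsers and Node.

```javascript
// Sketch only: caps the number of in-flight requests and gives each one a deadline.
class ThrottledSender {
    #inFlight = 0;
    #url;
    #maxInFlight;
    #timeoutMs;
    #fetchImpl;

    constructor({
        url,
        maxInFlight = 2,
        timeoutMs = 5_000,
        fetchImpl = (input, init) => fetch(input, init),
    }) {
        this.#url = url;
        this.#maxInFlight = maxInFlight;
        this.#timeoutMs = timeoutMs;
        this.#fetchImpl = fetchImpl;
    }

    get inFlight() {
        return this.#inFlight;
    }

    // Resolves to true if the server accepted the batch, false if the send
    // was skipped, failed, or timed out.
    async send(batch) {
        if (this.#inFlight >= this.#maxInFlight) {
            // Too many requests already pending: skip this attempt.
            return false;
        }
        this.#inFlight += 1;
        try {
            const response = await this.#fetchImpl(this.#url, {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify(batch),
                // Abort the request if it takes longer than timeoutMs.
                signal: AbortSignal.timeout(this.#timeoutMs),
            });
            return response.ok;
        } catch {
            // Network error or timeout: the caller can re-queue and retry later.
            return false;
        } finally {
            this.#inFlight -= 1;
        }
    }
}
```

A caller would put the batch back in the queue and count a failure whenever `send` resolves to false.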