Skip to content

Conversation

@bdach
Copy link
Collaborator

@bdach bdach commented Nov 20, 2025

This is a very loose RFC that needs more testing, but first and foremost I would like to confirm that I am at least in the vaguely correct wheelhouse before proceeding further.

Video demo: https://drive.google.com/file/d/1BWlrohNgfKWDLDEENDunhsNnl6W4sDmY/view?usp=share_link (too large to live on GitHub).

Add liveness probes to API

As described in #35736.

Some ground rules:

  • I'm not going to attempt to decide where the probe should be hosted, this PR is mostly agnostic to that anyway.

  • When API is in online state, the probe is periodically queried every minute (interval up for debate).

    • If the probe URL is not set, nothing happens.
    • If the client can't reach the probe, nothing happens (or at least that's the intended design). This is such that client-local outages or inabilities to reach the probe don't knock the rest of the game offline even though everything can be fine with it.
    • If any other deserialisation error or anything else happens, nothing happens.
    • The only time where anything happens is when the probe is up and actively returns a result that shows that there is an active outage going on. In this case, API is instantly knocked to Failing state.
  • When API is in failing state, the probe is queried more actively (every 5 seconds as written, but this is also up for discussion).

    • If the probe actively returns a result that shows that there continues to be an active outage going on, the client does not attempt to get back online.
    • In any other circumstance (read: the probe being disabled, unreachable, or returning a result that indicates that there is no active outage), the client will try to get back online.
    • The side benefit to this is that this could provide relief from retry traffic when we have known outages going on.

The reason this is designed this way is to be fail-safe. In particular, my concern here is that e.g. the probe being potentially unavailable to Russia users due to network blocks does not completely delete online functions for those users.

Skip attempting submission when API is in failing state

Speeds up gameplay loading when loading is more or less doomed to fail anyway.

Will also spam notifications about submission not happening. This could be probably made smarter if we want to (e.g. notify once until API comes back online), but I'm not super sure we want that, and maybe being super loud about submission failures is the preferred way to go instead.

RFC. As described in ppy#35736.

Some ground rules:

- I'm not going to attempt to decide where the probe should be hosted,
  this PR is mostly agnostic to that anyway.

- When API is in online state, the probe is periodically queried every
  minute (interval up for debate).
  - If the probe URL is not set, nothing happens.
  - If the client can't reach the probe, nothing happens (or at least
    that's the intended design). This is such that client-local outages
    or inabilities to reach the probe don't knock the rest of the game
    offline *even though everything can be fine with it*.
  - If any other deserialisation error or anything else happens, nothing
    happens.
  - The only time where anything happens is when the probe is up and
    actively returns a result that shows that there is an active outage
    going on. In this case, API is instantly knocked to `Failing` state.

- When API is in failing state, the probe is queried more actively
  (every 5 seconds as written, but this is also up for discussion).
    - If the probe actively returns a result that shows that there
      continues to be an active outage going on, the client does not
      attempt to get back online.
    - In *any other circumstance* (read: the probe being disabled,
      unreachable, or returning a result that indicates that there is no
      active outage), the client will try to get back online.

The reason this is designed this way is to be failsafe. In particular,
my concern here is that e.g. the probe being potentially unavailable to
Russia users due to network blocks does not completely delete online
functions for those users.
Speeds up gameplay loading when loading is more or less doomed to fail
anyway.

Will also spam notifications about submission not happening. This could
be probably made smarter if we want to (e.g. notify once until API comes
back online), but I'm not super sure we *want* that, and maybe being
super loud about submission failures is the preferred way to go instead.
@bdach bdach requested a review from peppy November 20, 2025 14:13
@bdach bdach self-assigned this Nov 20, 2025
@bdach bdach added area:online functionality Deals with online fetching / sending but don't change much on a surface UI level. type/reliability Deals with game crashing or breaking in a serious way. labels Nov 20, 2025
@bdach bdach moved this from Next up to Pending Review in @peppy's untitled project Nov 20, 2025
if (!api.IsLoggedIn || api.State.Value == APIState.Failing)
{
handleTokenFailure(new InvalidOperationException("API is not online."));
handleTokenFailure(new InvalidOperationException("Online functionality is not available."), displayNotification: api.State.Value == APIState.Failing);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably also do

diff --git a/osu.Game/Screens/Play/SubmittingPlayer.cs b/osu.Game/Screens/Play/SubmittingPlayer.cs
index 78e16c0f4b..62a2fc6f56 100644
--- a/osu.Game/Screens/Play/SubmittingPlayer.cs
+++ b/osu.Game/Screens/Play/SubmittingPlayer.cs
@@ -162,7 +162,7 @@ void handleTokenFailure(Exception exception, bool displayNotification = false)
                                 break;
 
                             default:
-                                Logger.Log($"{whatWillHappen} {exception.Message}", level: LogLevel.Important);
+                                Logger.Log($"{exception.Message}\n\n{whatWillHappen}", level: LogLevel.Important);
                                 break;
                         }
                     }

to make the notification read more correctly. Bonus points if we have a custom styled notification for this.

I was also thinking, rather than a notification for this state, it might be nice to put informational text in the place where the epilepsy warning etc. is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus points if we have a custom styled notification for this.

See what you think of 28efcf7.

I was also thinking, rather than a notification for this state, it might be nice to put informational text in the place where the epilepsy warning etc. is.

Not trivially doable because those are in PlayerLoader and this is Player. Would need to either lift token retrieval out to player loader or figure out some awkward flow to shove the information into it from player.

@peppy
Copy link
Member

peppy commented Nov 21, 2025

General structure of this looks correct.

Stable also does this neat thing where when connectivity is unavailable, it shows a global overlay (broken chain link) in the bottom-right of the screen, making it very clear that things are offline. We already kinda do this with the avatar in the toolbar, so maybe that's enough to call a replacement.

I would say that the notification which appears as a result of the API going offline should have a custom style at very least (colour and icon) to make it stand out and look like a global broadcast.

@bdach
Copy link
Collaborator Author

bdach commented Nov 21, 2025

I would say that the notification which appears as a result of the API going offline should have a custom style at very least (colour and icon) to make it stand out and look like a global broadcast.

Attempt was made in 33398b9. See what you think.

Screen.Recording.2025-11-21.at.14.43.57.mov

Notably the score submission notifications become much more spammy because they are no longer debounced via the mechanism forwarding LogLevel.Information log entries into notifications.

{
Text = this.message = message;

IsCritical = true;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disputable, but seemed like a good idea.

[BackgroundDependencyLoader]
private void load()
{
Icon = FontAwesome.Solid.FireExtinguisher;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was very difficult to resist using FontAwesome.Solid.DumpsterFire here (which exists for some reason), but I don't think anyone else would have found it as funny and it's kind of a bad icon anyway.

Copy link
Member

@peppy peppy Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

i'd agree with using that, except the fire is in the wrong place.

I forget you had to do this in the old days...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:online functionality Deals with online fetching / sending but don't change much on a surface UI level. size/L type/reliability Deals with game crashing or breaking in a serious way.

Projects

Status: Pending Review

Development

Successfully merging this pull request may close these issues.

2 participants