Symptoms
API queries were timing out, which affected keymanweb.com, help.keyman.com, and keyman.com. KeymanWeb in particular was failing to load, e.g. with heavy queries such as:
https://api.keyman.com/cloud/4.0/keyboards?jsonp=keyman.register&languageidtype=bcp47&version=17.0&keyboardid=khmer_angkor,basic_kbdkni
Reported also by @LornaSIL this morning:
Something seems to be broken with displaying the osk on help files: https://help.keyman.com/keyboard/sil_bolivia and https://help.keyman.com/keyboard/sil_bwe_karen/1.0.1/sil_bwe_karen
Diagnostics
Looking at the cluster, the api.keyman.com database pod had been showing 100% CPU utilization for the last 2 days.

It is unclear why the CPU would spike at that point. There is no evidence of changes to the database, or of a spike in api.keyman.com visits. Memory and disk usage show a spike that starts hours after the high CPU begins. That's a bit weird too, but may be SQL Server resource management?
Mitigation
- Restarted the pod. Resolved the immediate issue.
- Will continue to monitor.
Additional actions
- Monitor
- We should have an alert set up for persistent high CPU (e.g. >10 minutes at >0.9 CPU avg?)
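
To make the proposed alert condition concrete, here is a minimal sketch in Python of the "average above threshold for a full window" check. The class name, the sample format (timestamp in seconds, CPU as a 0 to 1 fraction), and the default values are assumptions taken from the example numbers above, not an existing part of our monitoring stack. In practice this would more likely be expressed as a rule in whatever alerting tool the cluster uses.

```python
from collections import deque
from typing import Deque, Tuple

# Assumed defaults, taken from the example numbers in the bullet above.
WINDOW_SECONDS = 10 * 60  # ">10 minutes"
CPU_THRESHOLD = 0.9       # ">0.9 CPU avg"


class PersistentCpuAlert:
    """Hypothetical helper: fires when average CPU over a full window exceeds a threshold."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS,
                 threshold: float = CPU_THRESHOLD) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        # (timestamp_seconds, cpu_fraction) samples, oldest first
        self.samples: Deque[Tuple[float, float]] = deque()

    def observe(self, timestamp: float, cpu: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        self.samples.append((timestamp, cpu))
        # Drop samples that fall outside the sliding window.
        while self.samples[0][0] < timestamp - self.window_seconds:
            self.samples.popleft()
        # Only evaluate once the retained samples span the whole window,
        # so a short burst right after startup cannot trigger the alert.
        if timestamp - self.samples[0][0] < self.window_seconds:
            return False
        average = sum(c for _, c in self.samples) / len(self.samples)
        return average > self.threshold
```

Feeding this one sample per minute, a pod pinned at 100% CPU would trigger on the sample that completes the 10-minute window, while a single one-minute spike against an otherwise idle pod would not.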
cc @darcywong00 @tim-eves