
Enable the bsddb deadlock detector #90

Draft
otargowski wants to merge 2 commits into sio2project:master from otargowski:bsddb-deadlocks

Conversation

@otargowski (Contributor) commented Feb 27, 2026

bsddb seems to use page-level locks, which makes running a deadlock detector necessary: two parallel transactions that each make, e.g., two writes can deadlock even when no two of those four writes access the same key, because the keys may land on the same page.
From the documentation:

"[deadlock detection] is necessary for almost all applications in
which more than a single thread of control will be accessing the
database at one time. Even when Berkeley DB automatically handles
database locking, it is normally possible for deadlock to occur."

Almost all of my encounters with this issue were on fresh (a few days old) SIO2 instances.
My guess as to why it didn't (to my best knowledge) happen on SZKOpuł is that, by the time the migration from filetracker 1 was done, the database was already quite large, so the chance of two keys landing on the same page was low, especially since usually not many PUTs are done in parallel.

I added a test demonstrating the issue. On my machine, it reproduced the deadlocks quite reliably.

The fix runs a deadlock detector pass every time a lock isn't immediately available, which may abort one of the deadlocked transactions. This required adding retries to transactions.
The performance penalty from this should be negligible, especially since we don't have many conflicts anyway (thanks to using per-link and per-blob file locks).
Documentation of the needed setting: https://bsdwatch.net/docs/sharedocs/db5/api_reference/C/envset_lk_detect.html.
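For reference, enabling the detector through the `bsddb3` bindings would look roughly like the sketch below. This is a hedged illustration, not the PR's actual code: the flag set passed to `env.open()` and the function name `open_env` are assumptions.

```python
# Hedged sketch of enabling the deadlock detector on a Berkeley DB
# environment via the bsddb3 bindings; not the PR's actual code.
try:
    from bsddb3 import db   # Python bindings for Berkeley DB
except ImportError:         # bindings not installed: intent only
    db = None

def open_env(home):
    env = db.DBEnv()
    # Run the detector whenever a lock request conflicts;
    # DB_LOCK_DEFAULT uses the library's default victim-selection policy.
    env.set_lk_detect(db.DB_LOCK_DEFAULT)
    # Assumed flag set for a transactional, multi-threaded environment.
    env.open(home, db.DB_CREATE | db.DB_INIT_LOCK | db.DB_INIT_LOG
                   | db.DB_INIT_MPOOL | db.DB_INIT_TXN | db.DB_THREAD)
    return env
```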

Also wrap transactions to retry after terminations from the deadlock
detector. The reason for these changes is bsddb's use of page-level
locks. According to the underlying library's documentation:

"[deadlock detection] is necessary for almost all applications in
which more than a single thread of control will be accessing the
database at one time. Even when Berkeley DB automatically handles
database locking, it is normally possible for deadlock to occur."

Source (with more info):
https://docs.oracle.com/database/bdb181/html/programmer_reference/transapp_deadlock.html
https://web.archive.org/web/20260227161711/https://docs.oracle.com/database/bdb181/html/programmer_reference/transapp_deadlock.html
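The retry side mentioned above can be sketched as a small decorator. `DeadlockError` below is a stand-in for `bsddb3.db.DBLockDeadlockError` so the example runs without Berkeley DB installed; all names and retry parameters are illustrative, not the PR's code.

```python
import functools
import time

# Stand-in for bsddb3.db.DBLockDeadlockError, so this sketch runs
# without Berkeley DB; the real code would catch the bsddb3 exception.
class DeadlockError(Exception):
    pass

def retry_on_deadlock(max_retries=5, backoff=0.01):
    """Re-run a transaction aborted by the deadlock detector (sketch)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries - 1:
                        raise                       # give up eventually
                    time.sleep(backoff * 2 ** attempt)  # brief backoff
        return wrapper
    return decorator

# Demo: a "transaction" aborted twice by the detector, then succeeding.
calls = {"n": 0}

@retry_on_deadlock()
def put_blob():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("aborted by deadlock detector")
    return "ok"

print(put_blob())  # → ok (after two retried aborts)
```

Capping the retries (and backing off between them) keeps a persistent conflict from looping forever while still making transient detector aborts invisible to callers.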
@otargowski (Contributor, Author)

> My guess as to why it didn't (to my best knowledge) happen on SZKOpuł

Never mind: SZKOpuł's filetracker just went down, and the unusual part of the logs starts with:

2026-02-27 20:47:05 CRITICAL (126976) gunicorn.error  WORKER TIMEOUT (pid:127034)
2026-02-27 20:47:05 CRITICAL (126976) gunicorn.error  WORKER TIMEOUT (pid:127042)
2026-02-27 20:47:06 INFO     (126976) gunicorn.config child_exit() hook: sending SIGTERM to ourselves
2026-02-27 20:47:06 INFO     (126976) gunicorn.config child_exit() hook: sending SIGTERM to ourselves
2026-02-27 20:47:06 INFO     (28777) gunicorn.error  Booting worker with pid: 28777
2026-02-27 20:47:07 INFO     (28778) gunicorn.error  Booting worker with pid: 28778

These WORKER TIMEOUTs look just like a deadlock... and they cause those processes to exit ungracefully, which corrupts the bsddb environment and forces a restart followed by an absurdly slow recovery.

Start of previous downtime:

2026-01-27 12:55:03 CRITICAL (5152 ) gunicorn.error  WORKER TIMEOUT (pid:5179)
2026-01-27 12:55:03 CRITICAL (5152 ) gunicorn.error  WORKER TIMEOUT (pid:5250)
2026-01-27 12:55:04 CRITICAL (5152 ) gunicorn.error  WORKER TIMEOUT (pid:5180)
2026-01-27 12:55:04 CRITICAL (5152 ) gunicorn.error  WORKER TIMEOUT (pid:5232)
2026-01-27 12:55:04 INFO     (5152 ) gunicorn.config child_exit() hook: sending SIGTERM to ourselves
2026-01-27 12:55:04 INFO     (5152 ) gunicorn.config child_exit() hook: sending SIGTERM to ourselves
2026-01-27 12:55:04 INFO     (126115) gunicorn.error  Booting worker with pid: 126115
2026-01-27 12:55:04 INFO     (126116) gunicorn.error  Booting worker with pid: 126116

