[WIP] Switch to an SQLite storage backend#3405

Draft
badboy wants to merge 38 commits into main from sqlite-2026-approach

@badboy badboy commented Feb 20, 2026

No description provided.

This is a modified version of the kvstore/skv implementation:
https://searchfox.org/firefox-main/rev/cced10961b53e0d29e22e635404fec37728b2644/toolkit/components/kvstore/src/skv/connection.rs
which is itself based on application-services' sql-support.

It's stripped down to what we need in Glean:
* A file-backed database
* A schema set up on start, potentially applying migrations if we need that
* A read-write connection, which is re-used for all access.
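
A minimal sketch of the "schema set up on start, potentially applying migrations" idea, using only the standard library. Everything here is hypothetical: the `MIGRATIONS` list, the table and column names, and both function names are placeholders for illustration and are not taken from this PR. It assumes the current schema version is stored somewhere in the database (SQLite's `user_version` pragma is a common place for that, and it is 0 for a fresh database).

```rust
/// Hypothetical migration steps. Entry `i` upgrades schema version `i` to `i + 1`.
/// The table and column names are placeholders, not the schema from this PR.
const MIGRATIONS: &[&str] = &[
    "CREATE TABLE example_metrics (id TEXT PRIMARY KEY, value BLOB NOT NULL)",
    "ALTER TABLE example_metrics ADD COLUMN label TEXT",
];

/// The schema version a fully migrated database ends up at.
pub fn latest_version() -> usize {
    MIGRATIONS.len()
}

/// Statements still to run for a database currently at `current_version`.
/// A fresh database starts at version 0 and gets every step;
/// an up-to-date (or newer) database gets none.
pub fn pending_migrations(current_version: usize) -> &'static [&'static str] {
    MIGRATIONS.get(current_version..).unwrap_or(&[])
}
```

On start, the single read-write connection would run the pending statements in order inside one transaction and then record `latest_version()`, so a crash mid-migration leaves the old schema intact.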

This only integrates it into the module tree.
It compiles, but not warning-free.

It fully replaces the Rkv storage. No migration is implemented.

The bincode crate isn't maintained anymore.
While it's been stable and without issues for us for years,
switching to another format is easy while we're switching the database anyway.
MessagePack can be even smaller than bincode for the same data (just a couple of bytes here and there).

Whether it's actually faster has not been benchmarked. Compared to
everything else the (de)serialization overhead is probably a small
fraction of the whole thing.

Why do we need serialization anyway?
Ping assembly does not have any knowledge of metrics.
It only knows what's in the database.
So in order to put it in the right place in the ping payload we need to know the type of the stored data.
That type information needs to be stored somewhere.
By serializing the whole value (the `Metric` enum) we can deserialize it
into that enum and the serde part takes care of "knowing" the type.
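
To illustrate why serializing the whole enum is enough, here is a simplified, standard-library-only sketch. It is not the serde/MessagePack code path: the three-variant `Metric` enum is a cut-down stand-in, and the tag bytes and the `encode`/`decode` helpers are hypothetical. The point it shows is that the stored bytes carry the variant, so the reader needs no outside knowledge of the metric type.

```rust
use std::convert::TryFrom;

/// Cut-down stand-in for the `Metric` enum (only three variants, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum Metric {
    Boolean(bool),
    Counter(i32),
    String(String),
}

// Hypothetical one-byte tags. A serde-based format stores an equivalent
// discriminant for the enum variant.
const TAG_BOOLEAN: u8 = 0;
const TAG_COUNTER: u8 = 1;
const TAG_STRING: u8 = 2;

/// Encode the whole enum: the tag byte first, then the payload.
pub fn encode(metric: &Metric) -> Vec<u8> {
    match metric {
        Metric::Boolean(b) => vec![TAG_BOOLEAN, *b as u8],
        Metric::Counter(n) => {
            let mut out = vec![TAG_COUNTER];
            out.extend_from_slice(&n.to_le_bytes());
            out
        }
        Metric::String(s) => {
            let mut out = vec![TAG_STRING];
            out.extend_from_slice(s.as_bytes());
            out
        }
    }
}

/// Decode using only the stored bytes. Returns `None` for unknown tags
/// or malformed payloads.
pub fn decode(bytes: &[u8]) -> Option<Metric> {
    let (tag, payload) = bytes.split_first()?;
    match *tag {
        TAG_BOOLEAN => match payload {
            [0] => Some(Metric::Boolean(false)),
            [1] => Some(Metric::Boolean(true)),
            _ => None,
        },
        TAG_COUNTER => {
            let raw = <[u8; 4]>::try_from(payload).ok()?;
            Some(Metric::Counter(i32::from_le_bytes(raw)))
        }
        TAG_STRING => String::from_utf8(payload.to_vec()).ok().map(Metric::String),
        _ => None,
    }
}
```

Ping assembly can then `match` on the decoded value to decide where it goes in the payload, which is the part serde handles for us in the real implementation.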

We should also document some requirements and how it behaves when there's no label at all.
Now that it's just another column this becomes straightforward to do.

Same way this was done on Rkv: we just sum up the size of all files in
the database directory.
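
A minimal sketch of that size calculation, assuming a flat database directory (no subdirectories to recurse into). The function name `database_size` is hypothetical. With SQLite in WAL mode the directory typically holds the main database file plus its `-wal` and `-shm` companions, so summing all files covers them too.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sum the sizes (in bytes) of all regular files directly inside `dir`.
/// Subdirectories are skipped, not recursed into.
pub fn database_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let metadata = entry?.metadata()?;
        if metadata.is_file() {
            total += metadata.len();
        }
    }
    Ok(total)
}
```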
…ater point

downside: slightly worse error messages, but maybe we can inline them
…l moments

See all details:
https://sqlite.org/pragma.html#pragma_synchronous

The default (`FULL`) syncs on every write.
That gives slightly stronger guarantees, but is also costly.
We're already using WAL (write-ahead log), which is consistent and safe from corruption in
`NORMAL` mode.
It does lose durability: a committed transaction might roll back following a power loss or system crash.

Note: `rkv` does NOT sync at all. It only writes to disk (and moves
files around). That's strictly worse than WAL in `NORMAL` mode.
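
For reference, a small sketch of the pragma batch these settings correspond to. The `Synchronous` enum and `connection_pragmas` helper are hypothetical and only build the SQL text; how the connection in this PR actually applies them is not shown here. The four levels are the ones SQLite documents for `PRAGMA synchronous`.

```rust
/// The levels SQLite accepts for `PRAGMA synchronous`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Synchronous {
    Off,
    Normal,
    Full,
    Extra,
}

impl Synchronous {
    /// The keyword as it appears in the pragma statement.
    pub fn as_sql(self) -> &'static str {
        match self {
            Synchronous::Off => "OFF",
            Synchronous::Normal => "NORMAL",
            Synchronous::Full => "FULL",
            Synchronous::Extra => "EXTRA",
        }
    }
}

/// Build the batch to run once after opening the connection (hypothetical helper):
/// enable the write-ahead log, then pick the sync level.
pub fn connection_pragmas(sync: Synchronous) -> String {
    format!(
        "PRAGMA journal_mode = WAL;\nPRAGMA synchronous = {};\n",
        sync.as_sql()
    )
}
```

`connection_pragmas(Synchronous::Normal)` yields the WAL plus `NORMAL` combination described above; passing `Synchronous::Full` would restore the stricter default.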

Note that even _removing that database fully_ does not break the test.
It needs reverts of 3 commits from PR #3068 that added it.

The data was generated with `cargo run -p sample --bin verify -- tmp` on
an Rkv-powered Glean checkout.
The database (`tmp/db/data.safe.bin`) was then copied into `glean-core/rlb/tests/rkv-database.safe.bin`.