1 change: 1 addition & 0 deletions CLAUDE.md
@@ -369,6 +369,7 @@ Load these only when the task touches the topic.
- **[NAPI bridge](docs-internal/engine/napi-bridge.md)** — TSF callback slots, `ActorContextShared` cache reset, `#[napi(object)]` payload rules, cancellation token bridging, error prefix encoding. Read before touching `rivetkit-napi`.
- **[Envoy load balancing](docs-internal/engine/envoy-load-balancing.md)** — Hash-ring layout, virtual nodes, allocator flow, stale-envoy expiry, and tuning. Read before touching pegboard envoy allocation.
- **[BARE protocol crates](docs-internal/engine/bare-protocol-crates.md)** — vbare schema ordering, identity converters, `build.rs` TS codec generation pattern. Read before adding/changing protocol crates.
- **[Depot SQLite overview](docs-internal/engine/depot/overview.md)** — high-level map of the per-actor SQLite storage system: VFS↔depot-client↔depot, deltas/PIDX/shards, the read/write/commit path (inline vs remote envoy), compaction, GC, forking/pinning, and PITR. Start here, then drill into the `depot/` reference docs.
- **[SQLite VFS parity](docs-internal/engine/sqlite-vfs.md)** — native Rust VFS ↔ WASM TypeScript VFS 1:1 parity rule, v2 storage keys, chunk layout, delete/truncate strategy. Read before touching either VFS.
- **[SQLite optimizations](docs-internal/engine/SQLITE_OPTIMIZATIONS.md)** — brief tracker for SQLite cold-read, VFS, storage, preload, and benchmark optimization ideas.
- **[TLS trust roots](docs-internal/engine/tls-trust-roots.md)** — rustls native+webpki union rationale, which clients use which backend.
72 changes: 0 additions & 72 deletions docs-internal/engine/depot.md

This file was deleted.

@@ -1,14 +1,14 @@
# SQLite PITR Comparison To Other Systems

This design borrows proven ideas from adjacent systems, but the constraints are different: Rivet has single-writer database ownership, no local SQLite files, FDB as the source of truth, and storage-level fork primitives instead of storage-level rollback.
This design borrows proven ideas from adjacent systems, but the constraints are different: Rivet has single-writer database ownership, no local SQLite files, UDB as the source of truth, and storage-level fork primitives instead of storage-level rollback.

| System | What We Share | What We Diverge On | Why |
|---|---|---|---|
| Neon | Layer model, branching, dependency-graph GC. | Rough PITR by default instead of exact PITR everywhere; FDB is the durable page store instead of a pageserver. | Exact PITR is valuable for Postgres workloads but too expensive as the default for these databases. |
| Cloudflare Durable Objects SQLite | RestorePoint-like time tokens and the idea that snapshots can be built from log state. | Durable Objects use a follower quorum and do not expose fork primitives. | FDB replaces the multi-replica WAL quorum. Forking and bucket cloning are first-class goals here. |
| Neon | Layer model, branching, dependency-graph GC. | Rough PITR by default instead of exact PITR everywhere; UDB is the durable page store instead of a pageserver. | Exact PITR is valuable for Postgres workloads but too expensive as the default for these databases. |
| Cloudflare Durable Objects SQLite | RestorePoint-like time tokens and the idea that snapshots can be built from log state. | Durable Objects use a follower quorum and do not expose fork primitives. | UDB replaces the multi-replica WAL quorum. Forking and bucket cloning are first-class goals here. |
| Snowflake | Time travel and zero-copy clone by metadata. | Snowflake is OLAP/table-oriented; this storage layer is per-SQLite-database and exposes lower-level primitives to the engine. | The metadata-only clone idea carries over, but the unit of identity is a database branch, not a warehouse/table abstraction. |
| LiteFS | LTX file format and high-water-mark pending markers. | LiteFS uses local SQLite files and WAL replication. This design forbids local database files and builds PITR around branches. | Stateless database hosting cannot depend on local files. Branchable storage needs graph retention, not only replica catch-up. |
| Litestream | LTX-style incremental backup and rolling post-apply checksum. | Litestream backs up one SQLite database stream. It has no branch graph, bucket fork, or FDB tier. | Litestream answers "can I restore this database?" This design answers "can I fork this database or bucket cheaply?" |
| Litestream | LTX-style incremental backup and rolling post-apply checksum. | Litestream backs up one SQLite database stream. It has no branch graph, bucket fork, or UDB tier. | Litestream answers "can I restore this database?" This design answers "can I fork this database or bucket cheaply?" |
| mvSQLite | Versionstamp awareness as a concept. | mvSQLite's multi-writer PLCC/DLCC/MPC machinery and content-addressed dedup are deliberately not adopted. | Pegboard already guarantees a single writer per database. Multi-writer conflict machinery would add cost without buying correctness. |
| Turso/libSQL | Point-in-time fork/branch as a user-facing primitive. | Turso uses local SQLite files with replication and treats rollback as a storage operation. This design pushes rollback to the engine layer and exposes only fork/delete/restore_point primitives. | Keeping rollback out of storage removes mutable pointer swaps, pointer history, frozen states, and commit-vs-rollback races. |

@@ -15,9 +15,9 @@ Responsibilities:
- Maintain `META/head`, quota counters, and access-touch manifest fields.
- Update `SQLITE_CMP_DIRTY/{database_branch_id}` and send throttled `DeltasAvailable` workflow wakeups when hot lag crosses compaction thresholds.
- Create buckets, create databases, fork buckets, fork databases, and write branch records/catalog markers.
- Create and resolve restore points. Pinned restore points write FDB pins directly and start as `PinStatus::Ready`.
- Create and resolve restore points. Pinned restore points write UDB pins directly and start as `PinStatus::Ready`.

Lease ownership: none. Correctness relies on Pegboard single-writer exclusivity for a live database plus FDB transaction fences. The conveyer must not take compactor leases.
Lease ownership: none. Correctness relies on Pegboard single-writer exclusivity for a live database plus UDB transaction fences. The conveyer must not take compactor leases.

## Workflow Compaction

@@ -26,11 +26,11 @@ The workflow compaction path uses one persistent DB manager plus hot and reclaim
Responsibilities:

- Coalesce commit wakeups through `SQLITE_CMP_DIRTY/{database_branch_id}` and `DeltasAvailable` signals.
- Plan hot jobs from current FDB state instead of trusting signal payloads.
- Plan hot jobs from current UDB state instead of trusting signal payloads.
- Carry the branch lifecycle generation through planned jobs and reject stale stage, publish, or reclaim work after branch deletion or recreation.
- Have the hot companion write staged shard blobs under `CMP/stage/{job_id}/hot_shard`.
- Install matching hot job output by copying staged blobs to reader-visible `SHARD`, advancing `CMP/root`, and compare-and-clearing expected PIDX rows.
- Have the reclaimer delete only manager-authorized FDB rows and stale staged output.
- Have the reclaimer delete only manager-authorized UDB rows and stale staged output.
- Keep automatic PITR interval coverage and retained restore point pins live until reclaim can prove they are no longer needed.
- Stop the manager and companion workflows through `DestroyDatabaseBranch` when a database branch is no longer live.

@@ -42,6 +42,6 @@ Lease ownership: none. Gasoline workflow uniqueness uses only the database branc
|---|---|---|
| Conveyer | `META/head`, `COMMITS`, `VTX`, `PIDX`, `DELTA`, branch records, restore points | None |
| Workflow DB manager | `CMP/root`, live `SHARD`, `PITR_INTERVAL`, matching PIDX clears | None |
| Workflow companions | Staged hot output and manager-authorized FDB cleanup | None |
| Workflow companions | Staged hot output and manager-authorized UDB cleanup | None |

The components share branch metadata and pin counters, but each mutable manifest field has one owner.
@@ -5,28 +5,28 @@ This page records the constraints that shape the PITR/forking storage design. Th
## Binding Constraints

- **Single writer per database.** Pegboard exclusivity is the release-mode concurrency fence. Storage does not implement multi-writer conflict resolution.
- **No local SQLite files.** The durable database state is in FDB. Local files would make storage stateful and non-migratable.
- **Lazy reads.** Forks do not copy data. Reads walk branch ancestry and hydrate from FDB DELTA/SHARD rows only when needed.
- **No local SQLite files.** The durable database state is in UDB. Local files would make storage stateful and non-migratable.
- **Lazy reads.** Forks do not copy data. Reads walk branch ancestry and hydrate from UDB DELTA/SHARD rows only when needed.
- **Per-commit granularity.** PITR targets commits/versionstamps, not individual WAL frames inside a commit.
- **FDB is the source of truth.** OSS Depot has no object-backed cold tier.
- **UDB is the source of truth.** OSS Depot has no object-backed cold tier.
- **Branches are immutable.** A bucket id is its bucket branch id, and a database id is its database branch id.
- **Rollback is engine-owned.** Storage exposes fork primitives; the engine decides which database id a database currently uses.
- **Persisted wire/storage records use vbare.** Raw fixed-width bytes are reserved for atomic counters and simple indexes such as `VTX`.

## Rough PITR By Default

The design keeps rough PITR cheap by preserving enough FDB history for branch-at-position recovery without writing a full image for every commit. Exact recovery is opt-in through restore points, which write FDB history pins that workflow compaction must preserve.
The design keeps rough PITR cheap by preserving enough UDB history for branch-at-position recovery without writing a full image for every commit. Exact recovery is opt-in through restore points, which write UDB history pins that workflow compaction must preserve.

Compared with Neon's exact-PITR posture, this trades precision for lower steady-state cost. That fits Rivet Database-style workloads where "fork near this point" is usually enough, and exact restore points can be created explicitly for critical moments.

## Pages Are Self-Describing

LTX layers carry page numbers and checksums. That lets the system move bytes between DELTA and SHARD rows without a separate opaque page map. FDB PIDX remains the hot routing index.
LTX layers carry page numbers and checksums. That lets the system move bytes between DELTA and SHARD rows without a separate opaque page map. UDB PIDX remains the hot routing index.

The result is an LSM-shaped flow:

- L0: DELTAs in FDB.
- L1: versioned SHARDs in FDB.
- L0: DELTAs in UDB.
- L1: versioned SHARDs in UDB.
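The two-level flow above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `BranchStore`, `read_page`, `PAGES_PER_SHARD`, and the map layouts are hypothetical names invented for illustration, and the real PIDX/DELTA/SHARD rows live in UDB with their own key encoding. It assumes a PIDX row routes a page to the DELTA that holds its newest copy, and that a page with no PIDX row is served from its SHARD.

```rust
use std::collections::HashMap;

/// Hypothetical shard width; the real value is not stated in this document.
pub const PAGES_PER_SHARD: u32 = 64;

/// Hypothetical in-memory stand-in for one database branch's rows.
#[derive(Default)]
pub struct BranchStore {
    /// PIDX: page number -> txid of the DELTA holding the newest copy.
    pub pidx: HashMap<u32, u64>,
    /// L0: DELTA rows keyed by txid, each carrying the pages it wrote.
    pub deltas: HashMap<u64, HashMap<u32, Vec<u8>>>,
    /// L1: SHARD rows keyed by shard id, each carrying compacted pages.
    pub shards: HashMap<u32, HashMap<u32, Vec<u8>>>,
}

impl BranchStore {
    /// Read one page: follow PIDX into L0 if a routing row exists,
    /// otherwise fall back to the L1 shard that covers the page.
    pub fn read_page(&self, pgno: u32) -> Option<&[u8]> {
        if let Some(txid) = self.pidx.get(&pgno) {
            return self.deltas.get(txid)?.get(&pgno).map(Vec::as_slice);
        }
        let shard_id = pgno / PAGES_PER_SHARD;
        self.shards.get(&shard_id)?.get(&pgno).map(Vec::as_slice)
    }
}
```

In this model, compaction would copy a page from a DELTA into its SHARD and then clear the PIDX row, after which the same `read_page` call is served from L1 without the reader changing.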

## Why Versioned SHARDs

@@ -127,7 +127,7 @@ align to are durable rows recorded per database branch:
- `DB_PIN` rows: concrete `(txid, versionstamp)` for restore points, database
forks, and bucket forks.

Snapping then compares **FoundationDB versionstamps** (monotonic, globally
Snapping then compares **UDB versionstamps** (monotonic, globally
ordered commit tokens) against those recorded rows: it picks the covered row
with the largest txid whose `versionstamp <= fork_versionstamp`. No clock is
consulted in that decision. A bucket fork carries one `fork_versionstamp`, and
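
The snapping rule described above (largest txid among rows whose `versionstamp <= fork_versionstamp`) can be sketched as follows. The names `PinRow`, `snap_to_row`, and `vs` are hypothetical and exist only for this illustration. The 10-byte versionstamp width and the big-endian byte order are assumptions, chosen so that byte-wise comparison matches commit order. The sketch also assumes the caller has already filtered `rows` down to the covered rows.

```rust
/// Hypothetical stand-in for a recorded row such as a `DB_PIN`:
/// a concrete `(txid, versionstamp)` pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PinRow {
    pub txid: u64,
    /// Assumed 10-byte versionstamp, compared lexicographically.
    pub versionstamp: [u8; 10],
}

/// Test helper (hypothetical): build a versionstamp whose byte order
/// matches the numeric order of `n` by using big-endian encoding.
pub fn vs(n: u64) -> [u8; 10] {
    let mut out = [0u8; 10];
    out[..8].copy_from_slice(&n.to_be_bytes());
    out
}

/// Pick the row with the largest txid whose versionstamp is at or
/// before the fork point. No wall clock is consulted.
pub fn snap_to_row(rows: &[PinRow], fork_versionstamp: [u8; 10]) -> Option<PinRow> {
    rows.iter()
        .copied()
        .filter(|row| row.versionstamp <= fork_versionstamp)
        .max_by_key(|row| row.txid)
}
```

A fork point that precedes every recorded row yields `None` in this sketch; how the real system handles that case is not specified here.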