
Add tlog_witness crate and witness_worker implementing c2sp.org/tlog-witness#218

Open
lukevalenta wants to merge 1 commit into main from lvalenta/tlog-witness

Conversation

@lukevalenta
Contributor

@lukevalenta lukevalenta commented Apr 18, 2026

Introduces two new crates and an integration-test suite.

tlog_witness (spec-level, publishable to crates.io)

  • AddCheckpointRequest + parsers/serializers for the POST /add-checkpoint
    body format (old <N>\n<proof-lines>\n\n<checkpoint-note>) in both
    directions (log → witness requests, witness → log responses).
  • MAX_CONSISTENCY_PROOF_LINES = 63, matching the spec.
  • CONTENT_TYPE_TLOG_SIZE = "text/x.tlog.size" for 409 response bodies.
  • TlogWitnessError with variants for malformed request / response / note.
  • 12 unit tests covering happy path, round-trips, spec-example shape, and
    malformed-input rejection.
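
The body layout described above (`old <N>`, proof lines, a blank line, then the checkpoint note) can be illustrated with a minimal, std-only sketch. The names `RawAddCheckpoint` and `split_add_checkpoint` are hypothetical and are not the crate's API. Base64 decoding of the proof hashes and note verification are deliberately left out, and the digit check is an assumption about how strict the parser should be.

```rust
/// Maximum number of consistency-proof lines, as stated in the PR description.
const MAX_CONSISTENCY_PROOF_LINES: usize = 63;

/// Hypothetical split view of an add-checkpoint request body.
#[derive(Debug, PartialEq)]
struct RawAddCheckpoint<'a> {
    old_size: u64,
    /// Proof lines, still base64-encoded (decoding is out of scope here).
    proof_lines: Vec<&'a str>,
    /// The signed checkpoint note, untouched.
    note: &'a str,
}

/// Splits `old <N>\n<proof-lines>\n\n<checkpoint-note>` into its parts.
fn split_add_checkpoint(body: &str) -> Result<RawAddCheckpoint<'_>, String> {
    // The control block ends at the first empty line; the note follows it.
    let (control, note) = body
        .split_once("\n\n")
        .ok_or("missing blank line between proof and note")?;

    let mut lines = control.split('\n');
    let first = lines.next().ok_or("empty body")?;
    let digits = first
        .strip_prefix("old ")
        .ok_or("first line must start with \"old \"")?;
    // Assumption: only plain ASCII digits are accepted (u64::from_str alone
    // would also accept a leading '+').
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err("old size must be decimal digits".into());
    }
    let old_size = digits
        .parse::<u64>()
        .map_err(|e| format!("bad old size: {e}"))?;

    let proof_lines: Vec<&str> = lines.collect();
    if proof_lines.len() > MAX_CONSISTENCY_PROOF_LINES {
        return Err("too many proof lines".into());
    }

    Ok(RawAddCheckpoint { old_size, proof_lines, note })
}
```

Splitting on the first blank line works because the note's own blank line (between its text and its signatures) always comes later in the body.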

witness_worker (Cloudflare Worker deployment)

  • POST /add-checkpoint implementing all the MUSTs from the spec:
    • Look up log by origin → 404 on unknown.
    • Verify the checkpoint note via signed_note::Note::verify. Both
      UnverifiedNote (no trusted-key signature at all) and
      InvalidSignature (a trusted-key signature that failed to verify,
      making the note malformed per c2sp.org/signed-note) are surfaced as
      403 Forbidden, matching sunlight and sigsum-go. The specific
      NoteError variant is logged for debugging; other variants are
      surfaced as 400 Bad Request.
    • Range-check old_size <= checkpoint.size → 400 on violation.
    • Single-RPC atomic check-proof-and-update against the per-origin
      WitnessState Durable Object. The DO compares old_size against
      stored state (409 with text/x.tlog.size body on mismatch or
      same-size-different-hash), verifies the Merkle consistency proof
      against its stored latest_hash (422 on failure), and writes the
      new (size, hash). Atomicity comes from Durable Objects' input and
      output gates, documented inline at the handler; verifying the proof
      inside the same handler that reads and writes state keeps the whole
      sequence inside one atomic transaction and one DO round trip.
    • On 200 from the DO, produce and return a cosignature/v1 signature.
  • GET /metadata returns the witness's identity (name + DER SPKI),
    description, submission/monitoring URL prefixes, and the full list of
    configured logs (origin + trusted public keys). description fields
    are Option<&str> with #[serde(skip_serializing_if = "Option::is_none")]
    so they are omitted (not null) when unset; regression tests pin this.
  • WitnessState Durable Object (one per log origin, keyed by origin) holds
    per-origin (latest_size, latest_hash) state. The DO's on-disk JSON
    layout and the internal RPC wire shape (including the embedded
    consistency-proof hashes) are pinned by regression tests so format drift
    is caught before it can orphan deployed state.
  • Config model (per-log origin + log_public_keys SPKI list +
    top-level witness_name, submission_prefix, monitoring_prefix;
    WITNESS_SIGNING_KEY secret) validated at build time against a JSON
    Schema.
  • Produces cosignature/v1 signatures using the existing
    tlog_tiles::CosignatureV1CheckpointSigner (Ed25519-only, matching the
    witness spec's algorithm requirement).
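
The check-proof-and-update sequence described for the Durable Object can be sketched as a single function over in-memory state. This is an illustration under stated assumptions, not the PR's handler: `Outcome`, `WitnessState`, and `check_and_update` are hypothetical names, Merkle proof verification is stubbed as a caller-supplied closure, and the empty-proof requirements and the earlier `old_size <= checkpoint.size` range check are omitted.

```rust
/// Hypothetical result of the state check, mapped to HTTP statuses in comments.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// 200: state advanced, or an identical checkpoint was resubmitted.
    Accepted,
    /// 409: the response body would carry the witness's current size.
    Conflict { current_size: u64 },
    /// 422: the consistency proof did not verify.
    BadProof,
}

/// Hypothetical per-origin state, mirroring (latest_size, latest_hash).
struct WitnessState {
    latest_size: u64,
    latest_hash: [u8; 32],
}

/// `proof_ok` stands in for Merkle consistency-proof verification against
/// the stored hash; it is an assumption, not a real library call.
fn check_and_update(
    state: &mut WitnessState,
    old_size: u64,
    new_size: u64,
    new_hash: [u8; 32],
    proof_ok: impl FnOnce(&[u8; 32]) -> bool,
) -> Outcome {
    // The log must build on exactly the size this witness last cosigned.
    if old_size != state.latest_size {
        return Outcome::Conflict { current_size: state.latest_size };
    }
    // Same size: the hash must match too, otherwise the log has forked.
    if new_size == state.latest_size {
        return if new_hash == state.latest_hash {
            Outcome::Accepted
        } else {
            Outcome::Conflict { current_size: state.latest_size }
        };
    }
    // Larger size: verify the proof against the stored hash before writing.
    if !proof_ok(&state.latest_hash) {
        return Outcome::BadProof;
    }
    state.latest_size = new_size;
    state.latest_hash = new_hash;
    Outcome::Accepted
}
```

Keeping the comparison, the proof check, and the write in one function mirrors the description's point that nothing may interleave between reading the stored state and updating it.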

Integration tests

  • New crates/integration_tests/tests/tlog_witness.rs drives the witness
    end-to-end against a live wrangler dev instance: /metadata, two
    successful add-checkpoint submissions (with cosignature verification
    against the metadata-advertised public key), then 409 / 404 / 403 / 422 /
    400 error cases — including the trusted-(name,id)-with-bad-signature
    case that verifies the 403-on-InvalidSignature behavior. The cases are
    sequenced inside a single #[tokio::test] threaded through one
    in-memory ToyLog so the witness's persisted state evolves
    predictably and consistency proofs produced by the test always match
    what the witness expects.
  • New integration-tlog-witness CI job in
    .github/workflows/integration.yml builds the worker, starts
    wrangler dev, waits for /metadata to respond, then runs the test
    suite.
  • crates/witness_worker/config.dev.json now ships an Ed25519 SPKI whose
    matching seed is embedded in the test source as LOG_SIGNING_KEY_PEM.
    This keypair is dev-only, clearly documented as such, and never used in
    production.

This witness is deployable independently of any MTC work and can cosign
checkpoints from any signed-note-based log (static-ct, MTC, sigsum, etc.).

Comment thread crates/tlog_witness/src/add_checkpoint.rs Outdated
Comment thread crates/witness_worker/src/frontend_worker.rs
Comment thread crates/witness_worker/src/frontend_worker.rs Outdated
Comment thread crates/witness_worker/src/frontend_worker.rs Outdated
Comment thread crates/witness_worker/src/lib.rs Outdated
Comment thread crates/witness_worker/src/witness_state_do.rs
Comment thread crates/witness_worker/README.md Outdated
Comment thread crates/witness_worker/README.md Outdated
Comment thread crates/witness_worker/config.schema.json Outdated
@lukevalenta lukevalenta force-pushed the lvalenta/tlog-witness branch 4 times, most recently from 1c6863c to 5c882c8 Compare April 19, 2026 11:29
@lukevalenta
Contributor Author

/bonk review

@ask-bonk

ask-bonk bot commented Apr 19, 2026

UnknownError: ProviderInitError

github run

@ask-bonk

ask-bonk bot commented Apr 19, 2026

@lukevalenta Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

@ask-bonk

ask-bonk bot commented Apr 19, 2026

UnknownError: ProviderInitError

github run

@lukevalenta
Contributor Author

/bonk review

@ask-bonk

ask-bonk bot commented Apr 20, 2026

UnknownError: ProviderInitError

github run

@lukevalenta lukevalenta force-pushed the lvalenta/tlog-witness branch from 1029452 to d273275 Compare April 20, 2026 14:39
@lukevalenta
Contributor Author

/bonk review


@ask-bonk ask-bonk bot left a comment

Review — bonk pass

Solid PR overall. The spec-level / worker-level split is clean, the DO atomicity reasoning is well-argued (and well-documented inline), and the integration tests are genuinely end-to-end. The review below flags a few real issues and a pile of smaller nits.

Correctness

1. config.schema.json is missing the top-level description field.
AppConfig deserializes description: Option<String>, but the schema uses "additionalProperties": false and does not list description under properties. Any config that actually sets description will fail jsonschema::validate at build time. Either add description to the schema, or drop it from AppConfig. config.dev.json happens to set it — I'm surprised the build doesn't already reject it. Worth checking whether the dev build actually runs schema validation on the file with description set.

2. Note::verify can return NoteError::AmbiguousKey, which the witness surfaces as 400.
VerifierList::verifier returns AmbiguousKey when two configured keys share the same (name, key_id). For Ed25519 that requires a 32-bit key_id collision on two distinct SPKIs (~2^-32), so this is extremely unlikely in practice, but:

  • If it does happen it silently locks the log out (400 Bad Request returned to the log on every submission, rather than a loud operator-facing error),
  • and it's not covered by build.rs validation either.

Either reject ambiguous configured SPKIs at build time (preferred), or at least log at error level and map the case to 500 rather than 400 so it's visible in the worker logs.
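
A build-time or startup-time collision check along these lines could look like the following sketch. `ConfiguredKey` and `find_ambiguous_key` are hypothetical names, and the key IDs are assumed to have been derived from the configured SPKIs already; the derivation itself is not shown.

```rust
use std::collections::HashSet;

/// Hypothetical minimal view of one configured verifier key.
struct ConfiguredKey {
    name: String,
    /// 32-bit signed-note key ID, assumed to be precomputed.
    key_id: u32,
}

/// Returns the first (name, key_id) pair that appears more than once.
fn find_ambiguous_key(keys: &[ConfiguredKey]) -> Option<(&str, u32)> {
    let mut seen = HashSet::new();
    for key in keys {
        // `insert` returns false when the pair was already present.
        if !seen.insert((key.name.as_str(), key.key_id)) {
            return Some((key.name.as_str(), key.key_id));
        }
    }
    None
}
```

A caller in `build.rs` or at worker init could fail loudly when this returns `Some`, so the misconfiguration surfaces to the operator instead of as a 400 on every submission.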

3. Origin-mismatch check is a debug_assert_eq!.
frontend_worker.rs:220 asserts cp_text.origin() == origin under debug_assert_eq!. In release this is a no-op. checkpoint_origin uses text.lines().next() while CheckpointText::from_bytes does stricter parsing — if they ever disagree, the release build silently trusts the looser check for log-lookup purposes, then cosigns a CheckpointText whose origin may differ. Swap for a real if cp_text.origin() != origin { return Response::error(..., 400); } — it costs nothing and removes a sharp edge.
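
For illustration, a check that survives release builds might be shaped like this. `ensure_origin_matches` is a hypothetical helper that returns a bare status code; the actual handler would build a `Response` instead.

```rust
/// Returns Err(400) when the strictly parsed origin disagrees with the
/// origin used for the log lookup.
fn ensure_origin_matches(parsed_origin: &str, lookup_origin: &str) -> Result<(), u16> {
    // Unlike debug_assert_eq!, this comparison is compiled into release builds.
    if parsed_origin != lookup_origin {
        return Err(400);
    }
    Ok(())
}
```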

4. No upper bound on req.bytes() in add_checkpoint.
frontend_worker.rs:151 unconditionally buffers the whole request body. A reasonable upper bound is MAX_NOTE_SIZE (signed_note's cap) + ~5 KB for the control block (old N\n + up to 63 base64 hashes) ≈ 20 KB. Anything bigger is guaranteed to be rejected downstream anyway; rejecting it at the read boundary avoids accidental amplification.
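
One generic way to enforce such a cap at the read boundary is to read at most one byte past the limit and reject if that extra byte arrives. The sketch below uses only `std::io` and a hypothetical `read_capped` helper. It does not use the workers-rs request API, whose body handling is asynchronous and would need its own adaptation.

```rust
use std::io::{self, Read};

/// Reads the whole stream into memory, failing if it exceeds `max_len` bytes.
fn read_capped<R: Read>(reader: R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read at most max_len + 1 bytes: the extra byte is what makes an
    // oversized body detectable without buffering all of it.
    reader.take(max_len as u64 + 1).read_to_end(&mut buf)?;
    if buf.len() > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "request body too large",
        ));
    }
    Ok(buf)
}
```

With this shape, memory use is bounded by `max_len + 1` bytes regardless of how much the client sends.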

5. InvalidSignature mapped to 403 is defensible but not the only reasonable choice.
The inline comment argues this matches sunlight/sigsum-go, which is true. But the signed-note spec calls this a malformed note, and another reasonable mapping is 400. Your choice is consistent with the prior art, and the commit message explains why, so I think 403 is fine — just noting that sunlight's choice isn't itself justified by the tlog-witness spec, so if that behavior ever changes upstream you'll want to revisit.

Style / maintainability

6. monitoring_prefix: String + is_empty() sentinel → use Option<String>.
config::AppConfig::monitoring_prefix uses #[serde(default)] + is_empty() to mean "absent". Option<String> is self-documenting and removes the "did the operator deliberately configure an empty string?" ambiguity. Same pattern the schema uses for logs.*.description via Option.

7. load_witness_public_key_der recomputes the SPKI on every /metadata hit.
Small but it's a LazyLock<Vec<u8>> / OnceLock<Vec<u8>> away from being free. Low-traffic endpoint though, so low priority.
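
A minimal sketch of the caching idea with `std::sync::OnceLock` follows. The names `SPKI_DER`, `compute_spki_der`, and `witness_public_key_der` are hypothetical, the returned bytes are placeholders rather than a real SPKI, and the call counter exists only to make the compute-once behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Cache for the encoded public key; filled on first use.
static SPKI_DER: OnceLock<Vec<u8>> = OnceLock::new();
/// Counts how often the expensive computation actually runs.
static COMPUTE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for deriving the DER-encoded SPKI from the signing key.
fn compute_spki_der() -> Vec<u8> {
    COMPUTE_CALLS.fetch_add(1, Ordering::SeqCst);
    vec![0x30, 0x2a] // placeholder bytes, not a valid SPKI
}

/// Returns the cached encoding, computing it only on the first call.
fn witness_public_key_der() -> &'static [u8] {
    SPKI_DER.get_or_init(compute_spki_der)
}
```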

8. base64 dep in witness_worker_config is only used via the Base64 helper in serde_with.
You already depend on serde_with which bundles base64 via its base64 feature. Check whether dropping the direct base64.workspace = true dep from config/Cargo.toml is possible — cargo machete will tell you.

9. checkpoint_origin(&checkpoint) is used only once; inline it or delete it in favor of CheckpointText::origin().
CheckpointText::from_bytes already parses and validates the first line, so you can flip the order: parse cp_text first, then look up the log by cp_text.origin(). That removes the checkpoint_origin helper, removes the debug_assert_eq! entirely, and fixes issue #3 for free.

10. hash_hex::to_hex.
The manual hex loop reinvents hex::encode. Either pull in the hex crate (already a transitive dep via several things in the workspace, worth checking) or format each byte with {:02x} via std::fmt::Write. Not blocking.
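
If adding a dependency is undesirable, a std-only encoder is short. The sketch below formats byte by byte, since applying a debug specifier such as `{:02x?}` to a whole slice produces a bracketed debug list rather than a contiguous hex string. The name `to_hex` simply mirrors the helper under discussion; the body is not the PR's code.

```rust
use std::fmt::Write;

/// Encodes bytes as lowercase hex, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // `write!` into a String only fails if the formatter does, which
        // cannot happen for integer formatting.
        write!(out, "{b:02x}").expect("writing to a String is infallible");
    }
    out
}
```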

11. config::Base64 is a private duplicate of serde_with::base64::Base64.
The config crate already depends on serde_with. Replacing your custom Base64 helper with serde_with::base64::Base64 (as frontend_worker.rs:36 already does) would shrink config/src/lib.rs by ~10 lines and drop the direct base64 dep. LogParams::log_public_keys would become #[serde_as(as = "Vec<serde_with::base64::Base64>")].

12. AddCheckpointRequest::consistency_proof uses Vec<Hash>; consider documenting the empty-vs-non-empty invariants in the type.
The spec says the proof MUST be empty when old_size == 0 or old_size == new_size. That's validated in the DO today, but the parser-level type doesn't encode it. Fine as-is (keeps the parser general), but a doc sentence on AddCheckpointRequest::consistency_proof would help.

13. Tests for the response format are asymmetric.
request_matches_spec_example_shape pins the request shape; there's no equivalent for the response. A one-liner asserting serialize_add_checkpoint_response(&[ns]) starts with "\u{2014} " and ends with "\n" would lock that down.
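
The suggested shape assertion could look like the sketch below. `serialize_cosignature_lines` is a hypothetical stand-in for the crate's serializer, assumed to emit one signature line per entry in the signed-note form (U+2014, space, key name, space, base64 signature, newline). The base64 string used with it would be an arbitrary placeholder.

```rust
/// Hypothetical stand-in for the response serializer: one signature line
/// per (key name, base64 signature) pair.
fn serialize_cosignature_lines(sigs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (name, sig_b64) in sigs {
        out.push_str("\u{2014} ");
        out.push_str(name);
        out.push(' ');
        out.push_str(sig_b64);
        out.push('\n');
    }
    out
}

/// The shape check suggested in the review: leading U+2014 plus space,
/// trailing newline.
fn has_signature_line_shape(response: &str) -> bool {
    response.starts_with("\u{2014} ") && response.ends_with('\n')
}
```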

14. tlog_witness/src/lib.rs re-exports via pub use.
The re-export is fine, but dropping pub mod add_checkpoint; (keeping only the pub use) hides the inner module name from downstream docs. Or the reverse — pick one and be consistent; right now both the module and the re-exports are pub, so tlog_witness::add_checkpoint::AddCheckpointRequest and tlog_witness::AddCheckpointRequest are both valid paths.

Integration tests

15. LOG_SIGNING_KEY_PEM committed to the repo.
Already called out in the source comment and commit message. Worth adding a belt-and-braces check: a #[test] in witness_worker that reads config.dev.json, parses the committed SPKI, and asserts it matches the verifying key derived from the PEM in the integration test. Otherwise a future operator who rotates the dev key without touching LOG_SIGNING_KEY_PEM gets a confusing 403 at integration-test time instead of a targeted unit-test failure.

16. Integration test step (7) (trusted (name,id) with bad sig bytes) is excellent.
That's the case most witness implementations get wrong. Glad to see it explicitly.

17. wait_for_witness loops 30 × 2s = 60 s.
Matches the CI workflow's own wait loop. Fine, just double-booked.

18. .github/workflows/integration.yml installs wasm-pack but never invokes it.
The witness build uses worker-build, which is installed on the next line. wasm-pack install appears to be copy-pasted from a sibling job — it's ~30 s of wasted CI time. Worth dropping unless I'm missing something.

Nits

  • witness_state_do.rs:147 error message "consistency proof must be empty when old_size == checkpoint size" is good; consider the symmetric version for old_size == 0 to also mention "first cosignature" for debuggability.
  • frontend_worker.rs:297 sets content-type: text/plain; charset=utf-8 on the success response. The spec doesn't mandate a specific content type here — fine, but sunlight uses text/x.signed-checkpoint or similar on some responses; worth a sanity-check against what clients expect.
  • AGENTS.md mentions cargo test excludes integration_tests by default. Confirmed this PR's new test respects that (it's in integration_tests).

Pre-push checks

Per AGENTS.md, please confirm on your side:

cargo clippy --workspace --all-targets -- -Dwarnings -Dclippy::pedantic
cargo test
cargo fmt --all --check
cargo machete

all pass — in particular -Dclippy::pedantic on the new crates.

Nice work overall.

@ask-bonk

ask-bonk bot commented Apr 20, 2026

Posted a review on PR #218. The main findings:

Correctness issues worth fixing:

  1. config.schema.json is missing the top-level description field (while AppConfig accepts it and config.dev.json sets it). With additionalProperties: false this should be failing build-time schema validation.
  2. NoteError::AmbiguousKey is mapped to 400 in add_checkpoint, which silently locks out a log on colliding Ed25519 key_ids. Should be validated at build time or surfaced as 500 + logged loudly.
  3. The origin-match check in add_checkpoint is debug_assert_eq!, a no-op in release. Also suggested reordering to parse CheckpointText first and look up the log by cp_text.origin(), which eliminates the checkpoint_origin helper entirely.
  4. req.bytes() in add_checkpoint has no size cap.

Style / maintainability: monitoring_prefix should be Option<String> (not is_empty() sentinel); the config crate's custom Base64 helper duplicates serde_with::base64::Base64; hash_hex::to_hex reinvents hex::encode; wasm-pack install step in the new CI job appears unused.

Tests: Commended the trusted-(name,id)-with-bad-sig case. Suggested a unit test in witness_worker that cross-checks the committed dev SPKI against the PEM embedded in the integration test.

Also noted that the 403-for-InvalidSignature choice is defensible (matches sunlight/sigsum-go) even though the spec itself is silent on it.

github run

…witness

Introduces two new crates and an integration-test suite.

tlog_witness (spec-level, publishable to crates.io)

- `AddCheckpointRequest` + parsers/serializers for the `POST /add-checkpoint`
  body format (`old <N>\n<proof-lines>\n\n<checkpoint-note>`) in both
  directions (log → witness requests, witness → log responses).
- `MAX_CONSISTENCY_PROOF_LINES = 63`, matching the spec.
- `CONTENT_TYPE_TLOG_SIZE = "text/x.tlog.size"` for 409 response bodies.
- `TlogWitnessError` with variants for malformed request / response / note.
- 15 unit tests covering happy path, round-trips, spec-example shape,
  malformed-input rejection, and response wire-format shape.
- Internal module structure: the `add_checkpoint` module is private;
  all public items are re-exported from the crate root.

witness_worker (Cloudflare Worker deployment)

- `POST /add-checkpoint` implementing all the MUSTs from the spec:
  - Caps the request body at 1 MiB + 16 KiB before buffering to bound
    memory use on malformed or malicious input.
  - Parses `CheckpointText` from the note body first and uses
    `cp_text.origin()` for the log lookup so a single parse is
    authoritative for both the 404 check and the origin field.
  - Looks up configured log keys from `LOG_KEYS` — a `LazyLock`-built
    map populated at first access. Any non-Ed25519 SPKI or `(name,
    key_id)` collision across two configured keys for the same log
    causes the worker to abort at startup rather than failing silently
    at request time.
  - Verifies the checkpoint note via `signed_note::Note::verify`. Both
    `UnverifiedNote` (no trusted-key signature at all) and
    `InvalidSignature` (a trusted-key signature that failed to verify,
    making the note malformed per c2sp.org/signed-note) are surfaced as
    403 Forbidden, matching sunlight and sigsum-go. Other variants are
    surfaced as 400 Bad Request.
  - Range-check `old_size <= checkpoint.size` → 400 on violation.
  - Single-RPC atomic check-proof-and-update against the per-origin
    `WitnessState` Durable Object. The DO compares `old_size` against
    stored state (409 with `text/x.tlog.size` body on mismatch or
    same-size-different-hash), verifies the Merkle consistency proof
    against its stored `latest_hash` (422 on failure), and writes the
    new `(size, hash)`. Atomicity comes from Durable Objects' input and
    output gates, documented inline at the handler; verifying the proof
    inside the same handler that reads and writes state keeps the whole
    sequence inside one atomic transaction.
  - On 200 from the DO, produce and return a `cosignature/v1` signature.
- `GET /metadata` returns the witness's identity (name + DER SPKI),
  optional description, submission/monitoring URL prefixes, and the full
  list of configured logs. The witness SPKI is computed once and cached in
  a `OnceLock`. `description` and `monitoring_prefix` are
  `Option<String>`; `description` is omitted from the JSON when absent
  (not serialized as `null`); `monitoring_prefix` falls back to
  `submission_prefix` when absent. Regression tests pin the serde shape.
- `WitnessState` Durable Object holds per-origin `(latest_size,
  latest_hash)` state. The on-disk JSON layout and the internal RPC wire
  shape (including the embedded consistency-proof hashes) are pinned by
  regression tests so format drift is caught before it can orphan deployed
  state. Hex encoding uses the `hex` crate throughout.
- Config model (per-log `origin` + `log_public_keys` SPKI list, parsed
  via `serde_with::base64::Base64`; top-level `witness_name`,
  `submission_prefix`, optional `monitoring_prefix`; `WITNESS_SIGNING_KEY`
  secret) validated at build time against a JSON Schema.
- Produces `cosignature/v1` signatures using the existing
  `tlog_tiles::CosignatureV1CheckpointSigner` (Ed25519-only).
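
The `monitoring_prefix` fallback mentioned above can be sketched without serde. The struct below is a hypothetical, trimmed-down stand-in for the real config type, and `effective_monitoring_prefix` is an illustrative helper name.

```rust
/// Hypothetical, trimmed-down config with only the two prefix fields.
struct AppConfig {
    submission_prefix: String,
    /// `None` means "not configured", which an empty-string sentinel
    /// could not distinguish from a deliberately empty value.
    monitoring_prefix: Option<String>,
}

/// Returns the monitoring prefix, falling back to the submission prefix.
fn effective_monitoring_prefix(cfg: &AppConfig) -> &str {
    cfg.monitoring_prefix
        .as_deref()
        .unwrap_or(&cfg.submission_prefix)
}
```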

Integration tests

- New `crates/integration_tests/tests/tlog_witness.rs` drives the witness
  end-to-end against a live `wrangler dev` instance: `/metadata`, two
  successful `add-checkpoint` submissions (with cosignature verification
  against the metadata-advertised public key), then 409 / 404 / 403 / 422 /
  400 error cases — including the trusted-(name,id)-with-bad-signature
  case that verifies the 403-on-`InvalidSignature` behavior. The cases are
  sequenced inside a single `#[tokio::test]` threaded through one
  in-memory `ToyLog` so state evolves predictably and consistency proofs
  always match what the witness expects.
- A unit test in `witness_worker` cross-checks that the SPKI in
  `config.dev.json` matches the `LOG_SIGNING_KEY_PEM` embedded in the
  integration tests; a mismatch is caught at `cargo test` time rather than
  as a confusing 403 at integration-test time.
- New `integration-tlog-witness` CI job builds the worker, starts
  `wrangler dev`, and runs the test suite. The job intentionally omits
  `wasm-pack` (not needed for `witness_worker`, unlike the CT/MTC jobs).

This witness is deployable independently of any MTC work and can cosign
checkpoints from any `signed-note`-based log (static-ct, MTC, sigsum, etc.).
@lukevalenta lukevalenta force-pushed the lvalenta/tlog-witness branch from d273275 to 75f8f35 Compare April 20, 2026 15:58
@lukevalenta
Contributor Author

/bonk review again

@ask-bonk

ask-bonk bot commented Apr 20, 2026

All pre-push checks pass. The code already addresses essentially all of the actionable bonk review items. Let me write a response to the PR author summarizing the status.

Based on my analysis of the current PR state, here's my summary:

Status of bonk review items on PR #218

I've audited the current branch against the bonk review. Almost every actionable item is already addressed in the committed code:

Correctness items — all addressed

  1. config.schema.json missing description — bonk was wrong. The schema lists description under top-level properties (config.schema.json:16-18) and per-log (:35). With additionalProperties: false this already validates.
  2. NoteError::AmbiguousKey silently returns 400 — addressed at startup, not runtime: parse_log_keys (src/lib.rs:84-111) asserts on (name, key_id) collisions when building LOG_KEYS, so the worker aborts at init with a loud operator-facing panic rather than 400-ing every submission.
  3. debug_assert_eq! origin check — already gone. frontend_worker.rs:181-191 parses CheckpointText first and looks up the log by cp_text.origin(). The checkpoint_origin helper no longer exists.
  4. Unbounded req.bytes() — capped at MAX_ADD_CHECKPOINT_BODY_SIZE = 1 MiB + 16 KiB (frontend_worker.rs:155-161, 269), with a comment deriving the bound from MAX_NOTE_SIZE + envelope headroom.
  5. InvalidSignature → 403 — kept, with the spec-silence caveat documented inline (frontend_worker.rs:193-210).

Style / maintainability — all addressed

  1. monitoring_prefix: Option<String> — done (config/src/lib.rs:37), with fallback to submission_prefix in /metadata (frontend_worker.rs:114-117).
  2. Cached SPKI: WITNESS_PUBLIC_KEY_DER is a OnceLock<Vec<u8>> (src/lib.rs:135, populated in load_witness_public_key_der at :182-193).
  3. Extraneous base64 dep — not present; witness_worker/config/Cargo.toml only depends on serde and serde_with.
  4. Private Base64 duplicate — the config crate already uses serde_with::base64::Base64 directly (config/src/lib.rs:18,70).
  5. Response-shape pin test: response_matches_spec_line_shape (tlog_witness/src/add_checkpoint.rs:382-397) asserts leading "\u{2014} " and trailing \n.

Tests — addressed

  1. Cross-check test: dev_config_tests::dev_config_spki_matches_embedded_pem in src/lib.rs:223-244 decodes the committed dev SPKI and the integration-test PEM and asserts equality. A dev-key rotation that forgets to update both will now fail a targeted unit test, not a confusing 403 in integration.
  2. wasm-pack in CI — removed from the witness job with an explanatory comment (.github/workflows/integration.yml:122-124).

Remaining open items (all low priority or not actionable)

Pre-push checks

All four pass locally on this branch:

cargo clippy --workspace --all-targets -- -Dwarnings -Dclippy::pedantic   ✓
cargo test                                                                ✓
cargo fmt --all --check                                                   ✓
cargo machete                                                             ✓

No code changes are required from the bonk review. If you'd like me to harden any of the remaining low-priority nits (e.g. item #12's stronger parser-type encoding, or a more precise content-type on the 200 response per nit #2), say the word and I'll push a follow-up commit. Otherwise the PR is ready as-is from the review-response side.

github run

@lukevalenta lukevalenta marked this pull request as ready for review April 20, 2026 18:15
@lukevalenta lukevalenta requested review from a team and lbaquerofierro as code owners April 20, 2026 18:15