Add tlog_witness crate and witness_worker implementing c2sp.org/tlog-witness#218
Add tlog_witness crate and witness_worker implementing c2sp.org/tlog-witness#218lukevalenta wants to merge 1 commit intomainfrom
Conversation
1c6863c to
5c882c8
Compare
|
/bonk review |
|
UnknownError: ProviderInitError |
|
@lukevalenta Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
|
UnknownError: ProviderInitError |
5c882c8 to
ec86e84
Compare
ec86e84 to
8156350
Compare
8156350 to
d273275
Compare
|
/bonk review |
|
UnknownError: ProviderInitError |
1 similar comment
|
UnknownError: ProviderInitError |
1029452 to
d273275
Compare
|
/bonk review |
There was a problem hiding this comment.
Review — bonk pass
Solid PR overall. The spec-level / worker-level split is clean, the DO atomicity reasoning is well-argued (and well-documented inline), and the integration tests are genuinely end-to-end. The review below flags a few real issues and a pile of smaller nits.
Correctness
1. config.schema.json is missing the top-level description field.
AppConfig deserializes description: Option<String>, but the schema uses "additionalProperties": false and does not list description under properties. Any config that actually sets description will fail jsonschema::validate at build time. Either add description to the schema, or drop it from AppConfig. config.dev.json happens to set it — I'm surprised the build doesn't already reject it. Worth checking whether the dev build actually runs schema validation on the file with description set.
2. Note::verify can return NoteError::AmbiguousKey, which the witness surfaces as 400.
VerifierList::verifier returns AmbiguousKey when two configured keys share the same (name, key_id). For Ed25519 that requires a 32-bit key_id collision on two distinct SPKIs (~2^-32), so this is extremely unlikely in practice, but:
- If it does happen it silently locks the log out (400 Bad Request returned to the log on every submission, rather than a loud operator-facing error),
- and it's not covered by
build.rsvalidation either.
Either reject ambiguous configured SPKIs at build time (preferred), or at least log at error level and map the case to 500 rather than 400 so it's visible in the worker logs.
3. Origin-mismatch check is a debug_assert_eq!.
frontend_worker.rs:220 asserts cp_text.origin() == origin under debug_assert_eq!. In release this is a no-op. checkpoint_origin uses text.lines().next() while CheckpointText::from_bytes does stricter parsing — if they ever disagree, the release build silently trusts the looser check for log-lookup purposes, then cosigns a CheckpointText whose origin may differ. Swap for a real if cp_text.origin() != origin { return Response::error(..., 400); } — it costs nothing and removes a sharp edge.
4. No upper bound on req.bytes() in add_checkpoint.
frontend_worker.rs:151 unconditionally buffers the whole request body. A reasonable upper bound is MAX_NOTE_SIZE (signed_note's cap) + ~5 KB for the control block (old N\n + up to 63 base64 hashes) ≈ 20 KB. Anything bigger is guaranteed to be rejected downstream anyway; rejecting it at the read boundary avoids accidental amplification.
5. InvalidSignature mapped to 403 is defensible but not the only reasonable choice.
The inline comment argues this matches sunlight/sigsum-go, which is true. But the signed-note spec calls this a malformed note, and another reasonable mapping is 400. Your choice is consistent with the prior art, and the commit message explains why, so I think 403 is fine — just noting that sunlight's choice isn't itself justified by the tlog-witness spec, so if that behavior ever changes upstream you'll want to revisit.
Style / maintainability
6. monitoring_prefix: String + is_empty() sentinel → use Option<String>.
config::AppConfig::monitoring_prefix uses #[serde(default)] + is_empty() to mean "absent". Option<String> is self-documenting and removes the "did the operator deliberately configure an empty string?" ambiguity. Same pattern the schema uses for logs.*.description via Option.
7. load_witness_public_key_der recomputes the SPKI on every /metadata hit.
Small but it's a LazyLock<Vec<u8>> / OnceLock<Vec<u8>> away from being free. Low-traffic endpoint though, so low priority.
8. base64 dep in witness_worker_config is only used via the Base64 helper in serde_with.
You already depend on serde_with which bundles base64 via its base64 feature. Check whether dropping the direct base64.workspace = true dep from config/Cargo.toml is possible — cargo machete will tell you.
9. checkpoint_origin(&checkpoint) is used only once; inline it or delete it in favor of CheckpointText::origin().
CheckpointText::from_bytes already parses and validates the first line, so you can flip the order: parse cp_text first, then look up the log by cp_text.origin(). That removes the checkpoint_origin helper, removes the debug_assert_eq! entirely, and fixes issue #3 for free.
10. hash_hex::to_hex.
The manual hex loop reinvents what hex::encode does in three characters. Either pull in the hex crate (already a transitive dep via several things in the workspace, worth checking) or use format!("{:02x?}", ...)-style. Not blocking.
11. config::Base64 is a private duplicate of serde_with::base64::Base64.
The config crate already depends on serde_with. Replacing your custom Base64 helper with serde_with::base64::Base64 (as frontend_worker.rs:36 already does) would shrink config/src/lib.rs by ~10 lines and drop the direct base64 dep. LogParams::log_public_keys would become #[serde_as(as = "Vec<serde_with::base64::Base64>")].
12. AddCheckpointRequest::consistency_proof uses Vec<Hash>; consider documenting the empty-vs-non-empty invariants in the type.
The spec says the proof MUST be empty when old_size == 0 or old_size == new_size. That's validated in the DO today, but the parser-level type doesn't encode it. Fine as-is (keeps the parser general), but a doc sentence on AddCheckpointRequest::consistency_proof would help.
13. Tests for the response format are asymmetric.
request_matches_spec_example_shape pins the request shape; there's no equivalent for the response. A one-liner asserting serialize_add_checkpoint_response(&[ns]) starts with "\u{2014} " and ends with "\n" would lock that down.
14. tlog_witness/src/lib.rs re-exports via pub use.
The re-export is fine, but dropping pub mod add_checkpoint; (keeping only the pub use) hides the inner module name from downstream docs. Or the reverse — pick one and be consistent; right now both the module and the re-exports are pub, so tlog_witness::add_checkpoint::AddCheckpointRequest and tlog_witness::AddCheckpointRequest are both valid paths.
Integration tests
15. LOG_SIGNING_KEY_PEM committed to the repo.
Already called out in the source comment and commit message. Worth adding a belt-and-braces check: a #[test] in witness_worker that reads config.dev.json, parses the committed SPKI, and asserts it matches the verifying key derived from the PEM in the integration test. Otherwise a future operator who rotates the dev key without touching LOG_SIGNING_KEY_PEM gets a confusing 403 at integration-test time instead of a targeted unit-test failure.
16. Integration test step (7) (trusted (name,id) with bad sig bytes) is excellent.
That's the case most witness implementations get wrong. Glad to see it explicitly.
17. wait_for_witness loops 30 × 2s = 60 s.
Matches the CI workflow's own wait loop. Fine, just double-booked.
18. .github/workflows/integration.yml installs wasm-pack but never invokes it.
The witness build uses worker-build, which is installed on the next line. wasm-pack install appears to be copy-pasted from a sibling job — it's ~30 s of wasted CI time. Worth dropping unless I'm missing something.
Nits
witness_state_do.rs:147error message "consistency proof must be empty when old_size == checkpoint size" is good; consider the symmetric version forold_size == 0to also mention "first cosignature" for debuggability.frontend_worker.rs:297setscontent-type: text/plain; charset=utf-8on the success response. The spec doesn't mandate a specific content type here — fine, but sunlight usestext/x.signed-checkpointor similar on some responses; worth a sanity-check against what clients expect.AGENTS.mdmentionscargo testexcludesintegration_testsby default. Confirmed this PR's new test respects that (it's inintegration_tests).
Pre-push checks
Per AGENTS.md, please confirm on your side:
cargo clippy --workspace --all-targets -- -Dwarnings -Dclippy::pedantic
cargo test
cargo fmt --all --check
cargo machete
all pass — in particular -Dclippy::pedantic on the new crates.
Nice work overall.
|
Posted a review on PR #218. The main findings: Correctness issues worth fixing:
Style / maintainability: Tests: Commended the trusted-(name,id)-with-bad-sig case. Suggested a unit test in Also noted that the 403-for- |
…witness
Introduces two new crates and an integration-test suite.
tlog_witness (spec-level, publishable to crates.io)
- `AddCheckpointRequest` + parsers/serializers for the `POST /add-checkpoint`
body format (`old <N>\n<proof-lines>\n\n<checkpoint-note>`) in both
directions (log → witness requests, witness → log responses).
- `MAX_CONSISTENCY_PROOF_LINES = 63`, matching the spec.
- `CONTENT_TYPE_TLOG_SIZE = "text/x.tlog.size"` for 409 response bodies.
- `TlogWitnessError` with variants for malformed request / response / note.
- 15 unit tests covering happy path, round-trips, spec-example shape,
malformed-input rejection, and response wire-format shape.
- Internal module structure: the `add_checkpoint` module is private;
all public items are re-exported from the crate root.
witness_worker (Cloudflare Worker deployment)
- `POST /add-checkpoint` implementing all the MUSTs from the spec:
- Caps the request body at 1 MiB + 16 KiB before buffering to bound
memory use on malformed or malicious input.
- Parses `CheckpointText` from the note body first and uses
`cp_text.origin()` for the log lookup so a single parse is
authoritative for both the 404 check and the origin field.
- Looks up configured log keys from `LOG_KEYS` — a `LazyLock`-built
map populated at first access. Any non-Ed25519 SPKI or `(name,
key_id)` collision across two configured keys for the same log
causes the worker to abort at startup rather than failing silently
at request time.
- Verifies the checkpoint note via `signed_note::Note::verify`. Both
`UnverifiedNote` (no trusted-key signature at all) and
`InvalidSignature` (a trusted-key signature that failed to verify,
making the note malformed per c2sp.org/signed-note) are surfaced as
403 Forbidden, matching sunlight and sigsum-go. Other variants are
surfaced as 400 Bad Request.
- Range-check `old_size <= checkpoint.size` → 400 on violation.
- Single-RPC atomic check-proof-and-update against the per-origin
`WitnessState` Durable Object. The DO compares `old_size` against
stored state (409 with `text/x.tlog.size` body on mismatch or
same-size-different-hash), verifies the Merkle consistency proof
against its stored `latest_hash` (422 on failure), and writes the
new `(size, hash)`. Atomicity comes from Durable Objects' input and
output gates, documented inline at the handler; verifying the proof
inside the same handler that reads and writes state keeps the whole
sequence inside one atomic transaction.
- On 200 from the DO, produce and return a `cosignature/v1` signature.
- `GET /metadata` returns the witness's identity (name + DER SPKI),
optional description, submission/monitoring URL prefixes, and the full
list of configured logs. The witness SPKI is computed once and cached in
a `OnceLock`. `description` and `monitoring_prefix` are
`Option<String>`; `description` is omitted from the JSON when absent
(not serialized as `null`); `monitoring_prefix` falls back to
`submission_prefix` when absent. Regression tests pin the serde shape.
- `WitnessState` Durable Object holds per-origin `(latest_size,
latest_hash)` state. The on-disk JSON layout and the internal RPC wire
shape (including the embedded consistency-proof hashes) are pinned by
regression tests so format drift is caught before it can orphan deployed
state. Hex encoding uses the `hex` crate throughout.
- Config model (per-log `origin` + `log_public_keys` SPKI list, parsed
via `serde_with::base64::Base64`; top-level `witness_name`,
`submission_prefix`, optional `monitoring_prefix`; `WITNESS_SIGNING_KEY`
secret) validated at build time against a JSON Schema.
- Produces `cosignature/v1` signatures using the existing
`tlog_tiles::CosignatureV1CheckpointSigner` (Ed25519-only).
Integration tests
- New `crates/integration_tests/tests/tlog_witness.rs` drives the witness
end-to-end against a live `wrangler dev` instance: `/metadata`, two
successful `add-checkpoint` submissions (with cosignature verification
against the metadata-advertised public key), then 409 / 404 / 403 / 422 /
400 error cases — including the trusted-(name,id)-with-bad-signature
case that verifies the 403-on-`InvalidSignature` behavior. The cases are
sequenced inside a single `#[tokio::test]` threaded through one
in-memory `ToyLog` so state evolves predictably and consistency proofs
always match what the witness expects.
- A unit test in `witness_worker` cross-checks that the SPKI in
`config.dev.json` matches the `LOG_SIGNING_KEY_PEM` embedded in the
integration tests; a mismatch is caught at `cargo test` time rather than
as a confusing 403 at integration-test time.
- New `integration-tlog-witness` CI job builds the worker, starts
`wrangler dev`, and runs the test suite. The job intentionally omits
`wasm-pack` (not needed for `witness_worker`, unlike the CT/MTC jobs).
This witness is deployable independently of any MTC work and can cosign
checkpoints from any `signed-note`-based log (static-ct, MTC, sigsum, etc.).
d273275 to
75f8f35
Compare
|
/bonk review again |
|
All pre-push checks pass. The code already addresses essentially all of the actionable bonk review items. Let me write a response to the PR author summarizing the status. Based on my analysis of the current PR state, here's my summary: Status of bonk review items on PR #218I've audited the current branch against the bonk review. Almost every actionable item is already addressed in the committed code: Correctness items — all addressed
Style / maintainability — all addressed
Tests — addressed
Remaining open items (all low priority or not actionable)
Pre-push checksAll four pass locally on this branch: No code changes are required from the bonk review. If you'd like me to harden any of the remaining low-priority nits (e.g. item #12's stronger parser-type encoding, or a more precise |
Introduces two new crates and an integration-test suite.
tlog_witness (spec-level, publishable to crates.io)
AddCheckpointRequest+ parsers/serializers for thePOST /add-checkpointbody format (
old <N>\n<proof-lines>\n\n<checkpoint-note>) in bothdirections (log → witness requests, witness → log responses).
MAX_CONSISTENCY_PROOF_LINES = 63, matching the spec.CONTENT_TYPE_TLOG_SIZE = "text/x.tlog.size"for 409 response bodies.TlogWitnessErrorwith variants for malformed request / response / note.malformed-input rejection.
witness_worker (Cloudflare Worker deployment)
POST /add-checkpointimplementing all the MUSTs from the spec:signed_note::Note::verify. BothUnverifiedNote(no trusted-key signature at all) andInvalidSignature(a trusted-key signature that failed to verify,making the note malformed per c2sp.org/signed-note) are surfaced as
403 Forbidden, matching sunlight and sigsum-go. The specific
NoteErrorvariant is logged for debugging; other variants aresurfaced as 400 Bad Request.
old_size <= checkpoint.size→ 400 on violation.WitnessStateDurable Object. The DO comparesold_sizeagainststored state (409 with
text/x.tlog.sizebody on mismatch orsame-size-different-hash), verifies the Merkle consistency proof
against its stored
latest_hash(422 on failure), and writes thenew
(size, hash). Atomicity comes from Durable Objects' input andoutput gates, documented inline at the handler; verifying the proof
inside the same handler that reads and writes state keeps the whole
sequence inside one atomic transaction and one DO round trip.
cosignature/v1signature.GET /metadatareturns the witness's identity (name + DER SPKI),description, submission/monitoring URL prefixes, and the full list of
configured logs (origin + trusted public keys).
descriptionfieldsare
Option<&str>with#[serde(skip_serializing_if = "Option::is_none")]so they are omitted (not
null) when unset; regression tests pin this.WitnessStateDurable Object (one per log origin, keyed by origin) holdsper-origin
(latest_size, latest_hash)state. The DO's on-disk JSONlayout and the internal RPC wire shape (including the embedded
consistency-proof hashes) are pinned by regression tests so format drift
is caught before it can orphan deployed state.
origin+log_public_keysSPKI list +top-level
witness_name,submission_prefix,monitoring_prefix;WITNESS_SIGNING_KEYsecret) validated at build time against a JSONSchema.
cosignature/v1signatures using the existingtlog_tiles::CosignatureV1CheckpointSigner(Ed25519-only, matching thewitness spec's algorithm requirement).
Integration tests
crates/integration_tests/tests/tlog_witness.rsdrives the witnessend-to-end against a live
wrangler devinstance:/metadata, twosuccessful
add-checkpointsubmissions (with cosignature verificationagainst the metadata-advertised public key), then 409 / 404 / 403 / 422 /
400 error cases — including the trusted-(name,id)-with-bad-signature
case that verifies the 403-on-InvalidSignature behavior. The cases are
sequenced inside a single
#[tokio::test]threaded through onein-memory
ToyLogso the witness's persisted state evolvespredictably and consistency proofs produced by the test always match
what the witness expects.
integration-tlog-witnessCI job in.github/workflows/integration.ymlbuilds the worker, startswrangler dev, waits for/metadatato respond, then runs the testsuite.
crates/witness_worker/config.dev.jsonnow ships an Ed25519 SPKI whosematching seed is embedded in the test source as
LOG_SIGNING_KEY_PEM.This keypair is dev-only, clearly documented as such, and never used in
production.
This witness is deployable independently of any MTC work and can cosign
checkpoints from any
signed-note-based log (static-ct, MTC, sigsum, etc.).