Skip to content

fix: replay rebuilds credential_registry + sled-first balance reads + DID URL-decode#2709

Open
umwelt wants to merge 5 commits into
developmentfrom
fix/credential-replay-and-sled-balance-reads
Open

fix: replay rebuilds credential_registry + sled-first balance reads + DID URL-decode#2709
umwelt wants to merge 5 commits into
developmentfrom
fix/credential-replay-and-sled-balance-reads

Conversation

@umwelt

@umwelt umwelt commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

This branch fixes a chain of bugs that combined to make OPAQUE login impossible for every registered user across the cluster, plus make new SOV balances and welcome state invisible to mobile clients.

What was happening

  • Blockchain::load_from_store falls back to a replay loop in init.rs:613-773 when fast-hydrate fails. That loop was missing three tx-type processors (process_credential_transactions, process_nft_transactions, process_entity_registry_transactions). The fast-hydrate path was failing on every node in the cluster, so every node booted with credentials=0 despite sled containing 798 credential projection records.
  • OPAQUE login handler does bc.credential_registry.get(&username) — empty map → 401 "Invalid username or password" for every user (baseone, joshb413, ... all 798).
  • Profile lookups also returned no username because did_to_username is populated as a side-effect of the same missing call.
  • Identity handlers (get, verify) checked did_str.starts_with("did:zhtp:") against the raw URL path segment. Mobile sends did%3Azhtp%3A<hex> (percent-encoded), so the check missed and the handler prepended did:zhtp: to the encoded form — lookup key did:zhtp:did%3Azhtp%3A<hex> that never matches.
  • /api/v1/token/balances/<addr> (plural) and /api/v1/wallet/list/<did> (fallback path) read SOV balances from the in-memory TokenContract.balances HashMap. BlockExecutor writes SOV debits/credits to sled; the in-memory sync can lag or miss entries.

What this PR does

5 surgical commits, all independent:

  1. fix(lib-blockchain/init) — adds the three missing process_*_transactions calls to the replay-loop fallback so credential_registry / nft_registry / entity_registry rebuild on every restart. Aligns the load_from_store replay path with replay_block_state in replay.rs which already makes these calls.
  2. fix(api/identity)urlencoding::decode the DID path segment in handle_get_identity_by_did and handle_verify_identity_by_did before the starts_with("did:zhtp:") check.
  3. fix(api/token)handle_get_balances_for_address SOV branch reads sled-first (mirrors handle_get_balance singular which already does this).
  4. fix(api/wallet)handle_list_wallets fallback path reads sled-first for SOV (mirrors main path which already does this at line ~393).
  5. chore(lib-blockchain/executor) — adds [fee-debug] INFO-level instrumentation around apply_token_transfer in apply_block_inner. Used to diagnose the fee-debit path on testnet without restarting under RUST_LOG=trace.

Verification

Deployed and verified across all 7 cluster nodes (g1-g5, gw1, gw2). After deploy:

node credentials before credentials after
g1 0 798
g2 0 798
g3 0 798
g4 0 798
g5 0 798
gw1 0 798
gw2 0 798

Cluster remained at >4/5 quorum throughout the staged roll. Blocks continued committing during each individual restart.

Known follow-up

hot_state_projection_is_current returns false on every node, so every restart goes through the (now correct) full replay instead of the much faster projection-hydrate. Not user-blocking — replay is correct, just slower. Deferred to a separate PR.

Test plan

  • OPAQUE login as a known registered user (e.g. baseone) returns success against gw1 or gw2
  • /api/v1/identity/get/did:zhtp:<hex> returns the bound username field (not just raw DID)
  • A new domain-registration fee_payment_tx debits the user's wallet and the mobile balance view reflects the post-debit value
  • [fee-debug] log lines appear in the executor logs for TokenTransfer txs on testnet validators
  • All 7 nodes report credentials=798 (or current count) in Blockchain load complete after the next restart

umwelt added 5 commits June 13, 2026 11:54
…ft_registry, entity_registry

`Blockchain::load_from_store`'s fallback replay loop (used whenever
`hot_state_projection_is_current` returns false at startup) was missing
three tx-type processors that `replay_block_state` in replay.rs already
calls:

  - process_credential_transactions  (populates credential_registry + did_to_username)
  - process_nft_transactions         (populates nft_registry)
  - process_entity_registry_transactions (populates entity registry)

Cluster effect: every node in the cluster (5 validators + 2 gateways)
booted with credentials=0 and nft/entity registries empty, because the
fast-hydrate path never reported success on any node and the fallback
silently skipped these tables. Symptom: OPAQUE login returned 401
"Invalid username or password" for every registered user — the
credential_registry HashMap was empty so every lookup missed. Profile
queries also returned no username because did_to_username was empty
(it is populated as a side-effect of process_credential_transactions).

After this fix, all 7 nodes reload to credentials=798 at the next
restart, restoring login + profile for all registered users.

This aligns the load_from_store replay loop with replay_block_state in
replay.rs (line 41) which already makes these calls in the right order.
…fy handlers

handle_get_identity_by_did and handle_verify_identity_by_did extracted
the DID from the URL path and tested `did_str.starts_with("did:zhtp:")`
with literal colons. Mobile clients send the DID percent-encoded
(`did%3Azhtp%3A<hex>`), so the prefix check always missed and the
handler prepended `did:zhtp:` to the already-encoded form. Lookup key
became `did:zhtp:did%3Azhtp%3A<hex>` which never matches anything in
the chain map.

Result: profile fetches returned no display_name and no username for
every DID, because `bc.did_to_username.get(&malformed_key)` and
`bc.identity_display_name(&malformed_key)` both returned None. The
mobile UI rendered the raw DID hex where the username should have been.

Fix: decode the captured path segment via `urlencoding::decode` before
the `starts_with` check, then proceed with the canonical
`did:zhtp:<hex>` form. Both handlers now share the same decode pattern.
The SOV branch of handle_get_balances_for_address was reading
`token.balance_of(&wallet_key)` from the in-memory TokenContract.balances
HashMap. BlockExecutor writes SOV debits/credits to the sled
`token_balances` tree; the in-memory map is only kept in sync by a
post-apply sync step that can miss entries (e.g. when the initial
allocation was inserted under a non-synthetic PublicKey shape).

Practical symptom: after a domain-registration fee_payment_tx
committed and debited the user's SOV in sled, the mobile wallet still
displayed the stale pre-debit balance because /api/v1/token/balances
returned the in-memory value.

The singular endpoint (handle_get_balance) already does sled-first for
SOV with the same rationale — see existing comment at L530. This commit
brings the plural endpoint into alignment: read sled when a store is
attached, fall back to in-memory only when no store is present.
handle_list_wallets has a fallback branch (used when the in-memory
identity_manager has no entry for the requested DID) that surfaces
wallets from the on-chain wallet_registry. The SOV balance lookup in
this branch was reading
`token.balance_of(&PublicKey { dilithium_pk: [0; 2592], kyber_pk: [0; 1568], key_id })`
against the in-memory TokenContract.balances HashMap — same divergence
as the singular handle_get_balance and the plural
handle_get_balances_for_address: BlockExecutor writes to sled, the
in-memory map can lag.

Symptom: identities that aren't currently in the in-memory
identity_manager (typical for gateways that have been restarted, since
device→canonical-DID bindings live only in-memory) saw zero SOV in the
mobile wallet view even when sled had the correct post-debit balance.

The main path (line ~393) already calls
`blockchain.token_balance(&sov_token_id, &key_id)` (sled-first). This
commit brings the fallback into alignment.
…round apply_token_transfer

Adds INFO-level logs in BlockExecutor:
  - Entry into the per-tx loop in apply_block_inner with tx_hash,
    tx_type, index, height.
  - BEFORE apply_token_transfer in the TokenTransfer arm with token,
    from, to, amount, nonce, fee_bps.
  - AFTER apply_token_transfer with the same fields plus fee_collected.

Useful for tracing the fee-debit path end-to-end on testnet:
  mobile signs fee_payment_tx  →  gateway extracts + submits to mempool
  → broadcast → validator receives → block proposal → BFT commit →
  executor.apply_block → apply_token_transfer.

The previous incarnation of this investigation found that registrations
silently failed to debit the user's SOV because the in-memory balance
read was stale; these logs make the apply-time debit observable so the
state of the executor can be confirmed without restarting with
RUST_LOG=trace.

No behaviour change. Log volume is one line per TokenTransfer per block
plus two for the apply call.
Copilot AI review requested due to automatic review settings June 13, 2026 10:55
@umwelt umwelt requested a review from a team as a code owner June 13, 2026 10:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes several restart/replay and API read-path issues that caused credential registries to rebuild empty (breaking OPAQUE login), DID lookups to fail for percent-encoded mobile requests, and SOV balances to appear stale/zero due to reading from in-memory maps instead of sled.

Changes:

  • Rebuilds credential/NFT/entity registries during load_from_store fallback replay to match the canonical replay path.
  • URL-decodes DID path segments in identity get/verify endpoints before parsing/lookup.
  • Reads SOV balances sled-first in token balances and wallet listing fallback paths; adds fee-debug instrumentation in the executor.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lib-blockchain/src/blockchain/init.rs Adds missing replay processors so credential/NFT/entity registries rebuild correctly after fallback replay.
lib-blockchain/src/execution/executor.rs Adds [fee-debug] logging around per-tx loop and token-transfer fee application.
zhtp/src/api/handlers/identity/mod.rs URL-decodes DID path segment before did:zhtp: handling to fix mobile percent-encoded requests.
zhtp/src/api/handlers/token/mod.rs Makes /token/balances/{address} read SOV balance from sled when store is present.
zhtp/src/api/handlers/wallet/mod.rs Makes wallet-list fallback path read SOV balances via sled-first facade.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +761 to +767
// Replay entity-registry transactions so the registry index is populated.
if let Err(e) = blockchain.process_entity_registry_transactions(&block) {
warn!(
"⚠️ Failed to replay entity registry tx at height {}: {}",
height, e
);
}
Comment on lines +772 to +779
let _tx_hash_hex = hex::encode(tx.hash().as_bytes());
tracing::info!(
tx_hash = %&_tx_hash_hex[..16],
tx_type = ?tx.transaction_type,
index = index,
height = block_height,
"[fee-debug] executor entering per-tx loop"
);
Comment on lines +3130 to +3139
tracing::info!(
tx_hash = %hex::encode(&tx.hash().as_bytes()[..8]),
token = %hex::encode(&token.0[..8]),
from = %hex::encode(&from.0[..8]),
to = %hex::encode(&to.0[..8]),
amount = amount,
nonce = transfer_data.nonce,
fee_bps = fee_bps,
"[fee-debug] BEFORE apply_token_transfer"
);
Comment on lines +3149 to +3156
tracing::info!(
tx_hash = %hex::encode(&tx.hash().as_bytes()[..8]),
from = %hex::encode(&from.0[..8]),
to = %hex::encode(&to.0[..8]),
amount = amount,
fee_collected = _fee_collected,
"[fee-debug] AFTER apply_token_transfer (debit succeeded)"
);
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants