fix: replay rebuilds credential_registry + sled-first balance reads + DID URL-decode#2709
Open
umwelt wants to merge 5 commits into
Open
fix: replay rebuilds credential_registry + sled-first balance reads + DID URL-decode#2709umwelt wants to merge 5 commits into
umwelt wants to merge 5 commits into
Conversation
…ft_registry, entity_registry `Blockchain::load_from_store`'s fallback replay loop (used whenever `hot_state_projection_is_current` returns false at startup) was missing three tx-type processors that `replay_block_state` in replay.rs already calls: - process_credential_transactions (populates credential_registry + did_to_username) - process_nft_transactions (populates nft_registry) - process_entity_registry_transactions (populates entity registry) Cluster effect: every node in the cluster (5 validators + 2 gateways) booted with credentials=0 and nft/entity registries empty, because the fast-hydrate path never reported success on any node and the fallback silently skipped these tables. Symptom: OPAQUE login returned 401 "Invalid username or password" for every registered user — the credential_registry HashMap was empty so every lookup missed. Profile queries also returned no username because did_to_username was empty (it is populated as a side-effect of process_credential_transactions). After this fix, all 7 nodes reload to credentials=798 at the next restart, restoring login + profile for all registered users. This aligns the load_from_store replay loop with replay_block_state in replay.rs (line 41) which already makes these calls in the right order.
…fy handlers
handle_get_identity_by_did and handle_verify_identity_by_did extracted
the DID from the URL path and tested `did_str.starts_with("did:zhtp:")`
with literal colons. Mobile clients send the DID percent-encoded
(`did%3Azhtp%3A<hex>`), so the prefix check always missed and the
handler prepended `did:zhtp:` to the already-encoded form. Lookup key
became `did:zhtp:did%3Azhtp%3A<hex>` which never matches anything in
the chain map.
Result: profile fetches returned no display_name and no username for
every DID, because `bc.did_to_username.get(&malformed_key)` and
`bc.identity_display_name(&malformed_key)` both returned None. The
mobile UI rendered the raw DID hex where the username should have been.
Fix: decode the captured path segment via `urlencoding::decode` before
the `starts_with` check, then proceed with the canonical
`did:zhtp:<hex>` form. Both handlers now share the same decode pattern.
The SOV branch of handle_get_balances_for_address was reading `token.balance_of(&wallet_key)` from the in-memory TokenContract.balances HashMap. BlockExecutor writes SOV debits/credits to the sled `token_balances` tree; the in-memory map is only kept in sync by a post-apply sync step that can miss entries (e.g. when the initial allocation was inserted under a non-synthetic PublicKey shape). Practical symptom: after a domain-registration fee_payment_tx committed and debited the user's SOV in sled, the mobile wallet still displayed the stale pre-debit balance because /api/v1/token/balances returned the in-memory value. The singular endpoint (handle_get_balance) already does sled-first for SOV with the same rationale — see existing comment at L530. This commit brings the plural endpoint into alignment: read sled when a store is attached, fall back to in-memory only when no store is present.
handle_list_wallets has a fallback branch (used when the in-memory
identity_manager has no entry for the requested DID) that surfaces
wallets from the on-chain wallet_registry. The SOV balance lookup in
this branch was reading
`token.balance_of(&PublicKey { dilithium_pk: [0; 2592], kyber_pk: [0; 1568], key_id })`
against the in-memory TokenContract.balances HashMap — same divergence
as the singular handle_get_balance and the plural
handle_get_balances_for_address: BlockExecutor writes to sled, the
in-memory map can lag.
Symptom: identities that aren't currently in the in-memory
identity_manager (typical for gateways that have been restarted, since
device→canonical-DID bindings live only in-memory) saw zero SOV in the
mobile wallet view even when sled had the correct post-debit balance.
The main path (line ~393) already calls
`blockchain.token_balance(&sov_token_id, &key_id)` (sled-first). This
commit brings the fallback into alignment.
…round apply_token_transfer
Adds INFO-level logs in BlockExecutor:
- Entry into the per-tx loop in apply_block_inner with tx_hash,
tx_type, index, height.
- BEFORE apply_token_transfer in the TokenTransfer arm with token,
from, to, amount, nonce, fee_bps.
- AFTER apply_token_transfer with the same fields plus fee_collected.
Useful for tracing the fee-debit path end-to-end on testnet:
mobile signs fee_payment_tx → gateway extracts + submits to mempool
→ broadcast → validator receives → block proposal → BFT commit →
executor.apply_block → apply_token_transfer.
The previous incarnation of this investigation found that registrations
silently failed to debit the user's SOV because the in-memory balance
read was stale; these logs make the apply-time debit observable so the
state of the executor can be confirmed without restarting with
RUST_LOG=trace.
No behaviour change. Log volume is one line per TokenTransfer per block
plus two for the apply call.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes several restart/replay and API read-path issues that caused credential registries to rebuild empty (breaking OPAQUE login), DID lookups to fail for percent-encoded mobile requests, and SOV balances to appear stale/zero due to reading from in-memory maps instead of sled.
Changes:
- Rebuilds credential/NFT/entity registries during
load_from_storefallback replay to match the canonical replay path. - URL-decodes DID path segments in identity
get/verifyendpoints before parsing/lookup. - Reads SOV balances sled-first in token balances and wallet listing fallback paths; adds fee-debug instrumentation in the executor.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
lib-blockchain/src/blockchain/init.rs |
Adds missing replay processors so credential/NFT/entity registries rebuild correctly after fallback replay. |
lib-blockchain/src/execution/executor.rs |
Adds [fee-debug] logging around per-tx loop and token-transfer fee application. |
zhtp/src/api/handlers/identity/mod.rs |
URL-decodes DID path segment before did:zhtp: handling to fix mobile percent-encoded requests. |
zhtp/src/api/handlers/token/mod.rs |
Makes /token/balances/{address} read SOV balance from sled when store is present. |
zhtp/src/api/handlers/wallet/mod.rs |
Makes wallet-list fallback path read SOV balances via sled-first facade. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+761
to
+767
| // Replay entity-registry transactions so the registry index is populated. | ||
| if let Err(e) = blockchain.process_entity_registry_transactions(&block) { | ||
| warn!( | ||
| "⚠️ Failed to replay entity registry tx at height {}: {}", | ||
| height, e | ||
| ); | ||
| } |
Comment on lines
+772
to
+779
| let _tx_hash_hex = hex::encode(tx.hash().as_bytes()); | ||
| tracing::info!( | ||
| tx_hash = %&_tx_hash_hex[..16], | ||
| tx_type = ?tx.transaction_type, | ||
| index = index, | ||
| height = block_height, | ||
| "[fee-debug] executor entering per-tx loop" | ||
| ); |
Comment on lines
+3130
to
+3139
| tracing::info!( | ||
| tx_hash = %hex::encode(&tx.hash().as_bytes()[..8]), | ||
| token = %hex::encode(&token.0[..8]), | ||
| from = %hex::encode(&from.0[..8]), | ||
| to = %hex::encode(&to.0[..8]), | ||
| amount = amount, | ||
| nonce = transfer_data.nonce, | ||
| fee_bps = fee_bps, | ||
| "[fee-debug] BEFORE apply_token_transfer" | ||
| ); |
Comment on lines
+3149
to
+3156
| tracing::info!( | ||
| tx_hash = %hex::encode(&tx.hash().as_bytes()[..8]), | ||
| from = %hex::encode(&from.0[..8]), | ||
| to = %hex::encode(&to.0[..8]), | ||
| amount = amount, | ||
| fee_collected = _fee_collected, | ||
| "[fee-debug] AFTER apply_token_transfer (debit succeeded)" | ||
| ); |
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
This branch fixes a chain of bugs that combined to make OPAQUE login impossible for every registered user across the cluster, plus make new SOV balances and welcome state invisible to mobile clients.
What was happening
Blockchain::load_from_storefalls back to a replay loop ininit.rs:613-773when fast-hydrate fails. That loop was missing three tx-type processors (process_credential_transactions,process_nft_transactions,process_entity_registry_transactions). The fast-hydrate path was failing on every node in the cluster, so every node booted withcredentials=0despite sled containing 798 credential projection records.bc.credential_registry.get(&username)— empty map → 401 "Invalid username or password" for every user (baseone, joshb413, ... all 798).did_to_usernameis populated as a side-effect of the same missing call.get,verify) checkeddid_str.starts_with("did:zhtp:")against the raw URL path segment. Mobile sendsdid%3Azhtp%3A<hex>(percent-encoded), so the check missed and the handler prependeddid:zhtp:to the encoded form — lookup keydid:zhtp:did%3Azhtp%3A<hex>that never matches./api/v1/token/balances/<addr>(plural) and/api/v1/wallet/list/<did>(fallback path) read SOV balances from the in-memoryTokenContract.balancesHashMap. BlockExecutor writes SOV debits/credits to sled; the in-memory sync can lag or miss entries.What this PR does
5 surgical commits, all independent:
fix(lib-blockchain/init)— adds the three missingprocess_*_transactionscalls to the replay-loop fallback so credential_registry / nft_registry / entity_registry rebuild on every restart. Aligns the load_from_store replay path withreplay_block_statein replay.rs which already makes these calls.fix(api/identity)—urlencoding::decodethe DID path segment inhandle_get_identity_by_didandhandle_verify_identity_by_didbefore thestarts_with("did:zhtp:")check.fix(api/token)—handle_get_balances_for_addressSOV branch reads sled-first (mirrorshandle_get_balancesingular which already does this).fix(api/wallet)—handle_list_walletsfallback path reads sled-first for SOV (mirrors main path which already does this at line ~393).chore(lib-blockchain/executor)— adds[fee-debug]INFO-level instrumentation aroundapply_token_transferinapply_block_inner. Used to diagnose the fee-debit path on testnet without restarting underRUST_LOG=trace.Verification
Deployed and verified across all 7 cluster nodes (g1-g5, gw1, gw2). After deploy:
Cluster remained at >4/5 quorum throughout the staged roll. Blocks continued committing during each individual restart.
Known follow-up
hot_state_projection_is_currentreturns false on every node, so every restart goes through the (now correct) full replay instead of the much faster projection-hydrate. Not user-blocking — replay is correct, just slower. Deferred to a separate PR.Test plan
/api/v1/identity/get/did:zhtp:<hex>returns the boundusernamefield (not just raw DID)[fee-debug]log lines appear in the executor logs for TokenTransfer txs on testnet validatorscredentials=798(or current count) inBlockchain load completeafter the next restart