audit: implement storage truth enforcement and self-healing lifecycle #121
j-rafique wants to merge 1 commit into LEP-6-heal-op-lifecycle from
Reviewed the storage truth enforcement and self-healing lifecycle implementation. Found 2 behavioral issues and 1 dead code item.
    if isPass {
        state.CleanPassCount++
        state.LastCleanPassEpoch = epochID
    }

    if isFailure {
CleanPassCount is incremented on every PASS but never reset when a failure occurs. The recovery gate in shouldRecoverFromStorageTruthPostponement checks state.CleanPassCount >= requiredPasses (default 3). Because the counter accumulates over the node's entire lifetime, a node that earned 3+ clean passes before any violations would immediately satisfy the recovery gate once decay brings its score below the watch threshold -- even right after being postponed for severe storage-truth faults. The spec's intent ("recovery requires N clean passes") likely means N clean passes since the last failure or since postponement, not N total historical passes. Consider resetting CleanPassCount to 0 when a failure is recorded.
Suggested change (current):

    if isPass {
        state.CleanPassCount++
        state.LastCleanPassEpoch = epochID
    }
    if isFailure {

Suggested change (proposed):

    if isPass {
        state.CleanPassCount++
        state.LastCleanPassEpoch = epochID
    }
    if isFailure {
        // Reset clean pass streak -- recovery requires consecutive clean passes after failures.
        state.CleanPassCount = 0
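To make the effect of the reset concrete, here is a minimal, self-contained sketch of a streak-based recovery gate. The `nodeState` type and the `recordResult` and `canRecover` helpers are hypothetical stand-ins written for illustration; they are not the keeper's actual types or functions, and the real gate may check additional conditions.

```go
package main

import "fmt"

// nodeState is a hypothetical, trimmed-down stand-in for the per-node state.
type nodeState struct {
	CleanPassCount     uint64
	LastCleanPassEpoch uint64
}

// recordResult extends the streak on a pass and resets it on a failure.
func recordResult(s *nodeState, epochID uint64, isPass, isFailure bool) {
	if isPass {
		s.CleanPassCount++
		s.LastCleanPassEpoch = epochID
	}
	if isFailure {
		// Without this reset the counter would be a lifetime total.
		s.CleanPassCount = 0
	}
}

// canRecover models only the pass-count part of the recovery gate.
func canRecover(s nodeState, requiredPasses uint64) bool {
	return s.CleanPassCount >= requiredPasses
}

func main() {
	var s nodeState
	for epoch := uint64(1); epoch <= 3; epoch++ {
		recordResult(&s, epoch, true, false)
	}
	recordResult(&s, 4, false, true) // the failure wipes the earlier streak
	fmt.Println(canRecover(s, 3))    // false

	for epoch := uint64(5); epoch <= 7; epoch++ {
		recordResult(&s, epoch, true, false)
	}
	fmt.Println(canRecover(s, 3)) // true
}
```

With the reset in place, the three passes earned before epoch 4 no longer satisfy the gate; only the three passes after the failure do.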
    switch result.ResultClass {
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_HASH_MISMATCH,
        types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_RECHECK_CONFIRMED_FAIL:
        state.ClassACountWindow++
        state.LastClassAEpoch = epochID
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_TIMEOUT_OR_NO_RESPONSE:
        state.ClassBCountWindow++
        state.LastClassBEpoch = epochID
    }
    }
ClassACountWindow only increments for HASH_MISMATCH and RECHECK_CONFIRMED_FAIL, but the Class A definition in storageTruthIsClassAFault (used by enforcement predicates) also includes any failure with ARTIFACT_CLASS_INDEX regardless of result class. When fact indexes are empty, the enforcement fallback path in storageTruthPostponePredicatesMet uses state.ClassACountWindow to evaluate postpone predicates. This means INDEX artifact failures (e.g. TIMEOUT_OR_NO_RESPONSE on an INDEX artifact) would not be counted as Class A in the fallback, potentially preventing a justified postponement.
Suggested change (current):

    switch result.ResultClass {
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_HASH_MISMATCH,
        types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_RECHECK_CONFIRMED_FAIL:
        state.ClassACountWindow++
        state.LastClassAEpoch = epochID
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_TIMEOUT_OR_NO_RESPONSE:
        state.ClassBCountWindow++
        state.LastClassBEpoch = epochID
    }
    }

Suggested change (proposed):

    switch result.ResultClass {
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_HASH_MISMATCH,
        types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_RECHECK_CONFIRMED_FAIL:
        state.ClassACountWindow++
        state.LastClassAEpoch = epochID
    case types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_TIMEOUT_OR_NO_RESPONSE:
        state.ClassBCountWindow++
        state.LastClassBEpoch = epochID
    default:
        // Other failure result classes on INDEX artifacts are also Class A.
    }
    if result.ArtifactClass == types.StorageProofArtifactClass_STORAGE_PROOF_ARTIFACT_CLASS_INDEX {
        if result.ResultClass != types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_HASH_MISMATCH &&
            result.ResultClass != types.StorageProofResultClass_STORAGE_PROOF_RESULT_CLASS_RECHECK_CONFIRMED_FAIL {
            state.ClassACountWindow++
            state.LastClassAEpoch = epochID
        }
    }
    }
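An alternative way to keep the window counters aligned with the enforcement predicates is to route both through a single classification helper. The sketch below is illustrative only: the enum values, `windowCounters`, `isClassAFault`, and `countFailure` are hypothetical names, and the assumption that an INDEX artifact failure counts once (as Class A, not additionally as Class B) is a design choice that differs from the proposed change above, where a timeout on an INDEX artifact increments both counters. Which behavior the spec intends is not settled here.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the protobuf enums.
type resultClass int
type artifactClass int

const (
	resultHashMismatch resultClass = iota + 1
	resultRecheckConfirmedFail
	resultTimeoutOrNoResponse
)

const (
	artifactIndex artifactClass = iota + 1
	artifactOther
)

// windowCounters is a hypothetical subset of the per-node window state.
type windowCounters struct {
	ClassA          uint64
	ClassB          uint64
	LastClassAEpoch uint64
	LastClassBEpoch uint64
}

// isClassAFault is the single source of truth for Class A in this sketch:
// any failure on an INDEX artifact, or a hash mismatch / confirmed recheck failure.
func isClassAFault(rc resultClass, ac artifactClass) bool {
	return ac == artifactIndex ||
		rc == resultHashMismatch ||
		rc == resultRecheckConfirmedFail
}

// countFailure bumps exactly one counter per failure (an assumption of this sketch).
func countFailure(c *windowCounters, epochID uint64, rc resultClass, ac artifactClass) {
	switch {
	case isClassAFault(rc, ac):
		c.ClassA++
		c.LastClassAEpoch = epochID
	case rc == resultTimeoutOrNoResponse:
		c.ClassB++
		c.LastClassBEpoch = epochID
	}
}

func main() {
	var c windowCounters
	countFailure(&c, 10, resultTimeoutOrNoResponse, artifactIndex) // Class A via INDEX
	countFailure(&c, 11, resultTimeoutOrNoResponse, artifactOther) // Class B
	countFailure(&c, 12, resultHashMismatch, artifactOther)        // Class A
	fmt.Println(c.ClassA, c.ClassB)                                // 2 1
}
```

Because the counter update and the predicate share one helper, the fallback path cannot drift from the Class A definition when new result or artifact classes are added.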
        return result
    }

    func mulInt64ByUint64Saturated(v int64, m uint64) int64 {
mulInt64ByUint64Saturated is defined but has no callers in the codebase. It appears to be leftover from a previous iteration of the scoring logic. Consider removing it to reduce dead code.
Production-gate review by Zee: 15 findings
Methodology: full file-by-file read of every non-generated changed file in this PR's diff (pr-121 vs its base branch), cross-checked against:
Status legend: each finding's status is computed at the PR #122 stack-tip (consensus-gap-fixes commit
Severity breakdown: CRITICAL=1, HIGH=7, MEDIUM=7
121-F16 - DETERMINISM: float64 arithmetic in EndBlock consensus path drives state mutation
121-F1 - Recheck PASS double-counts original-reporter penalty: +12 (contradiction) + +25 (explicit) = +37 vs spec +25
121-F2 - Recheck +25 / −3 applied to
Pull request overview
Implements LEP-6 storage-truth “truth enforcement” and self-healing lifecycle by extending audit module params/state, adding new KV indexes for transcript/failure evidence, integrating recheck + divergence scoring, and tightening simulation/system/integration coverage around the new behaviors.
Changes:
- Add new storage-truth params (thresholds/windows/deadlines) and update validation/defaulting semantics (including treating UNSPECIFIED as a no-op mode).
- Implement transcript/failure indexing, reporter divergence scoring at epoch end, heal-op scheduling priority rules, and recheck evidence processing with replay protection.
- Expand unit/integration/system tests and simulation operation registry to cover the new LEP-6 workflows and edge cases.
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| x/audit/v1/types/params_test.go | Updates defaulting expectations for enforcement mode. |
| x/audit/v1/types/params.go | Adds LEP-6 params/keys/defaults and expands validation logic. |
| x/audit/v1/types/keys.go | Adds KV key prefixes/builders for postponement, recheck dedup, and fact indexes. |
| x/audit/v1/types/events.go | Adds new storage-truth event types and attributes. |
| x/audit/v1/simulation/storage_truth.go | Adds no-op simulation handlers for new storage-truth msgs. |
| x/audit/v1/module/simulation_test.go | Updates weighted-ops tests to include storage-truth ops. |
| x/audit/v1/module/simulation.go | Registers weighted operations for new storage-truth msgs. |
| x/audit/v1/keeper/storage_truth_scoring_internal_test.go | Updates scoring/decay/trust-band unit tests for new model. |
| x/audit/v1/keeper/storage_truth_scoring.go | Refactors scoring to be result-aware; adds new bookkeeping, decay, and history tracking. |
| x/audit/v1/keeper/storage_truth_recheck_state.go | Adds KV dedup state for recheck evidence submissions. |
| x/audit/v1/keeper/storage_truth_postponement_state.go | Adds KV state for storage-truth postponement tracking. |
| x/audit/v1/keeper/storage_truth_heal_ops_test.go | Adjusts heal-op scheduling tests for new eligibility predicate fields. |
| x/audit/v1/keeper/storage_truth_heal_ops.go | Adds enforcement gating, eligibility predicate, priority sort, and configurable deadlines. |
| x/audit/v1/keeper/storage_truth_fact_indexes.go | Introduces transcript/failure/reporter-result KV indexes and helpers. |
| x/audit/v1/keeper/storage_truth_divergence_test.go | Adds tests for reporter divergence penalties and volume gating. |
| x/audit/v1/keeper/storage_truth_divergence.go | Implements divergence scoring based on rolling-window outliers vs median. |
| x/audit/v1/keeper/query_storage_truth_test.go | Updates query expectations for new scoring deltas. |
| x/audit/v1/keeper/query_assigned_targets.go | Applies eligibility filtering for challengers before target assignment. |
| x/audit/v1/keeper/msg_submit_epoch_report_test.go | Adds FULL-mode compound proof coverage requirement test. |
| x/audit/v1/keeper/msg_submit_epoch_report_storage_truth_scores_test.go | Updates score expectations for new deltas/decay/trust multiplier logic. |
| x/audit/v1/keeper/msg_submit_epoch_report_storage_proofs.go | Adds compound coverage validation for FULL enforcement mode. |
| x/audit/v1/keeper/msg_submit_epoch_report.go | Filters challengers by eligibility and indexes transcripts before scoring. |
| x/audit/v1/keeper/msg_storage_truth_test.go | Adds extensive unit tests for recheck + heal verification quorum + edge cases. |
| x/audit/v1/keeper/msg_storage_truth.go | Implements recheck evidence flow; adjusts heal claim/verification and finalize behavior. |
| x/audit/v1/keeper/enforcement_predicates_test.go | Adds tests for enforcement matrix predicates and recovery gating. |
| x/audit/v1/keeper/enforcement.go | Adds storage-truth banding, enforcement, postponement tracking, and recovery path. |
| x/audit/v1/keeper/audit_peer_assignment_test.go | Adds tests for one-third coverage assignment vs legacy mode. |
| x/audit/v1/keeper/audit_peer_assignment.go | Adds storage-truth assignment algorithm + challenger eligibility filtering. |
| x/audit/v1/keeper/abci.go | Runs reporter divergence scoring at end-block before heal-op processing. |
| tests/systemtests/lep5_action_test.go | Extends test signature expiration horizon for stability. |
| tests/systemtests/audit_test_helpers_test.go | Adds epoch seed derivation, enforcement-mode mutator, and transcript seeding helpers. |
| tests/systemtests/audit_submit_and_query_test.go | Sets enforcement mode to UNSPECIFIED for legacy assignment expectations. |
| tests/systemtests/audit_storage_truth_edge_cases_test.go | Adds end-to-end edge-case tests for enforcement/recovery/replay/heal failure paths. |
| tests/systemtests/audit_recovery_enforcement_test.go | Increases epoch length + adds timeouts; forces UNSPECIFIED mode for determinism. |
| tests/systemtests/audit_peer_ports_enforcement_test.go | Uses epoch-derived seed instead of header hash for deterministic assignment. |
| tests/systemtests/audit_peer_observation_completeness_test.go | Ensures the actual prober is used when asserting completeness failure. |
| tests/systemtests/audit_host_requirements_enforcement_test.go | Forces UNSPECIFIED mode for legacy host-requirement enforcement tests. |
| tests/systemtests/audit_host_requirements_bypass_test.go | Forces UNSPECIFIED mode for legacy host-requirement bypass tests. |
| tests/system/audit/msg_storage_truth_test.go | Adds msg-server “system” tests using a live app wiring (no mocks). |
| tests/integration/audit/keeper_test.go | Adds keeper integration suite using real codec + IAVL KV store. |
| proto/lumera/audit/v1/params.proto | Extends Params protobuf with new LEP-6 fields. |
| proto/lumera/audit/v1/audit.proto | Extends protobuf state for node/reporter/ticket tracking and adds docs/comments. |
    // Replay protection: one recheck per (epoch, ticket, creator).
    if m.HasRecheckEvidence(sdkCtx, req.EpochId, req.TicketId, req.Creator) {
        return nil, errorsmod.Wrapf(types.ErrInvalidRecheckEvidence, "recheck evidence already submitted for epoch %d ticket %q by %q", req.EpochId, req.TicketId, req.Creator)
    }
    m.SetRecheckEvidence(sdkCtx, req.EpochId, req.TicketId, req.Creator)

    // Derive current epoch for scoring context.
    params := m.GetParams(sdkCtx).WithDefaults()
    currentEpoch, err := deriveEpochAtHeight(sdkCtx.BlockHeight(), params)
    if err != nil {
        return nil, err
    }
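The replay-protection rule in this excerpt (one recheck per epoch, ticket, and creator) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the keeper's KV implementation: `recheckKey`, `recheckStore`, `submit`, and `errDuplicateRecheck` are hypothetical names, and a Go map stands in for the store.

```go
package main

import (
	"errors"
	"fmt"
)

// recheckKey is a hypothetical composite key. Using a struct rather than a
// concatenated string avoids ambiguity when IDs contain a separator character.
type recheckKey struct {
	EpochID  uint64
	TicketID string
	Creator  string
}

var errDuplicateRecheck = errors.New("recheck evidence already submitted")

// recheckStore is an in-memory stand-in for the dedup state.
type recheckStore struct {
	seen map[recheckKey]struct{}
}

func newRecheckStore() *recheckStore {
	return &recheckStore{seen: make(map[recheckKey]struct{})}
}

// submit records the tuple once and rejects any repeat of the same tuple.
func (s *recheckStore) submit(epochID uint64, ticketID, creator string) error {
	k := recheckKey{EpochID: epochID, TicketID: ticketID, Creator: creator}
	if _, ok := s.seen[k]; ok {
		return fmt.Errorf("%w: epoch %d ticket %q by %q",
			errDuplicateRecheck, epochID, ticketID, creator)
	}
	s.seen[k] = struct{}{}
	return nil
}

func main() {
	s := newRecheckStore()
	fmt.Println(s.submit(7, "ticket-1", "alice") == nil) // true: first submission
	fmt.Println(s.submit(7, "ticket-1", "alice") == nil) // false: replay rejected
	fmt.Println(s.submit(7, "ticket-1", "bob") == nil)   // true: different creator
}
```

The key deliberately includes the creator, so two different reporters can each submit one recheck for the same ticket in the same epoch, while a repeat from the same reporter is rejected.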
No description provided.