audit: implement LEP-6 heal-op lifecycle and recheck handlers by j-rafique · Pull Request #120 · LumeraProtocol/lumera

j-rafique · 2026-04-20T11:24:38Z

Summary

This PR implements LEP-6 PR4 (heal-op lifecycle) in lumera: on-chain heal operation transitions, verifier-driven finalization, and epoch-end expiration/scheduling for self-heal ops.
The rollout remains non-breaking and keeps deferred LEP-6 enforcement behavior out of scope.

What’s Implemented

1) Heal-op tx lifecycle

Added/implemented keeper logic for:

MsgClaimHealComplete
MsgSubmitHealVerification
MsgSubmitStorageRecheckEvidence (validated + wired, intentionally gated as not active in this milestone)

Behavior:

Strict request validation and signer/role authorization
Status transitions:
- SCHEDULED / IN_PROGRESS -> HEALER_REPORTED
- HEALER_REPORTED -> VERIFIED (all required positives) or FAILED (any negative)
Single-node finalize path (no verifiers) with immediate completion
Ticket linkage updates on finalize (active_heal_op_id cleared; verified path updates probation/last-heal fields)
Event emission for healer reported / verified / failed
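
The transitions listed above can be sketched as a small state machine. This is a minimal illustration only: the type and function names (HealOpStatus, ClaimComplete, Resolve) are hypothetical and are not the keeper's actual API, and the zero-verifier single-node path is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// HealOpStatus mirrors the statuses named in the PR description. The Go names
// and numeric values are illustrative, not the generated protobuf enum.
type HealOpStatus int

const (
	StatusScheduled HealOpStatus = iota
	StatusInProgress
	StatusHealerReported
	StatusVerified
	StatusFailed
	StatusExpired
)

// ErrInvalidState is a hypothetical error for a claim in the wrong state.
var ErrInvalidState = errors.New("heal op is not in a claimable state")

// ClaimComplete applies SCHEDULED / IN_PROGRESS -> HEALER_REPORTED.
func ClaimComplete(s HealOpStatus) (HealOpStatus, error) {
	if s != StatusScheduled && s != StatusInProgress {
		return s, ErrInvalidState
	}
	return StatusHealerReported, nil
}

// Resolve decides the outcome of a HEALER_REPORTED op from the votes seen so
// far: any negative fails it, all required positives verify it, and otherwise
// it stays pending. It ranges over the required slice, not the votes map, so
// the result does not depend on map iteration order.
func Resolve(required []string, votes map[string]bool) HealOpStatus {
	positives := 0
	for _, verifier := range required {
		verified, ok := votes[verifier]
		if !ok {
			continue
		}
		if !verified {
			return StatusFailed
		}
		positives++
	}
	if len(required) > 0 && positives == len(required) {
		return StatusVerified
	}
	return StatusHealerReported
}

func main() {
	s, err := ClaimComplete(StatusScheduled)
	fmt.Println(s == StatusHealerReported, err)
	votes := map[string]bool{"v1": true}
	fmt.Println(Resolve([]string{"v1", "v2"}, votes) == StatusHealerReported)
}
```

Ranging over the assigned verifier list rather than the submissions map keeps the decision deterministic, which matters for anything executed in consensus code.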

2) Epoch-end heal-op processing

Implemented epoch-end lifecycle execution:

Expire overdue non-final heal ops (deadline_epoch_id <= current_epoch)
Clear stale active heal-op pointers from ticket deterioration state
Schedule new heal ops from deterioration candidates by priority:
- threshold/probation filtering
- max-per-epoch cap
- deterministic participant selection (healer + verifiers)
- deterministic tie-breaking and ID progression
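
The scheduling steps above (filter, prioritise, cap) might look roughly like the sketch below. The Candidate struct, the ">= threshold" comparison, the probation check, and the tie-break on ticket ID are all assumptions made for illustration; the PR text only states that filtering, a per-epoch cap, and deterministic tie-breaking exist.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical view of a ticket's deterioration state.
type Candidate struct {
	TicketID       string
	Deterioration  int64  // the score called D in the review
	ProbationUntil uint64 // assumed: ticket is ineligible while this is > epoch
	ActiveHealOpID uint64 // non-zero means a heal op is already in flight
}

// SelectForHealing filters candidates, orders them by descending score with
// ticket ID as a deterministic tie-break, and applies the per-epoch cap.
func SelectForHealing(cands []Candidate, epoch uint64, threshold int64, maxPerEpoch int) []Candidate {
	if maxPerEpoch < 0 {
		maxPerEpoch = 0
	}
	eligible := make([]Candidate, 0, len(cands))
	for _, c := range cands {
		if c.Deterioration < threshold || c.ProbationUntil > epoch || c.ActiveHealOpID != 0 {
			continue
		}
		eligible = append(eligible, c)
	}
	sort.Slice(eligible, func(i, j int) bool {
		if eligible[i].Deterioration != eligible[j].Deterioration {
			return eligible[i].Deterioration > eligible[j].Deterioration
		}
		return eligible[i].TicketID < eligible[j].TicketID
	})
	if len(eligible) > maxPerEpoch {
		eligible = eligible[:maxPerEpoch]
	}
	return eligible
}

func main() {
	cands := []Candidate{
		{TicketID: "a", Deterioration: 120},
		{TicketID: "b", Deterioration: 150},
		{TicketID: "c", Deterioration: 120},
	}
	for _, c := range SelectForHealing(cands, 5, 100, 2) {
		fmt.Println(c.TicketID, c.Deterioration)
	}
}
```

Because ticket IDs are assumed unique, the comparison is a total order and every validator produces the same ordering regardless of input order.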

3) State model integration

Integrated with existing LEP-6 storage-truth state surfaces:

HealOp + status/ticket indexes
verifier submissions keyed by (heal_op_id, verifier)
ticket deterioration state linkage/counters used by lifecycle transitions

Out of Scope (Deferred)

Recheck evidence activation logic (kept gated for later milestone)
Enforcement/penalty activation (PR5 scope)

Key Files

x/audit/v1/keeper/msg_storage_truth.go
x/audit/v1/keeper/storage_truth_heal_ops.go
x/audit/v1/keeper/abci.go
x/audit/v1/keeper/msg_storage_truth_test.go
x/audit/v1/keeper/storage_truth_heal_ops_test.go

Testing

Added/updated tests for:

tx validation + authorization paths
healer claim / verifier submission flows
verified and failed finalize branches
single-node immediate finalize
epoch-end expiration + scheduling behavior
ticket state linkage updates

Validation run:

go test ./x/audit/v1/... -count=1

- Implement storage-truth heal-op lifecycle and recheck handlers
- Scope LEP-6 heal op lifecycle to PR4

roomote-v0 · 2026-04-20T11:25:07Z

All 3 previously flagged issues have been addressed in aead156. No new issues found.

Bug: ClaimHealComplete double-appends req.Details to healOp.Notes in the zero-verifier (single-node) path -- once before calling finalizeHealOp and once inside it
Cleanup: finalizeHealOp calls m.GetParams(ctx).WithDefaults() twice in the same block; can be read once into a local variable (also, GetParams already applies WithDefaults internally)
Performance: GetAllHealOps full KV scan runs twice per epoch end -- once in expireStorageTruthHealOpsAtEpochEnd and again in scheduleStorageTruthHealOpsAtEpochEnd

Previous reviews

9cd434b: Review #1

mateeullahmalik · 2026-04-25T16:29:12Z

Production-gate review by Zee — 6 findings

Methodology: full file-by-file read of every non-generated changed file in this PR's diff (pr-120 vs its base branch), cross-checked against:

LEP-6 spec (Notion source of truth)
invariant-first-coding skill (write-path enumeration, sibling symmetry, single source of truth, post-fix re-audit)
Cosmos SDK consensus discipline (no float, no map iteration, bounded EndBlock, genesis round-trip, errorsmod wrapping)

Status legend: each finding's status is computed at the PR #122 stack-tip (consensus-gap-fixes commit a51c439), so 'FIXED' means a downstream PR in the stack already addresses it; 'OPEN' means it is still present at the tip and must be fixed before merge / Phase-2 activation.

Severity rubric in the charter (~/work/lep6-review/ctx/charter.md):
- CRITICAL = consensus halt / state corruption / non-determinism in ABCI
- HIGH = spec mismatch with economic impact, missing genesis round-trip, replay enabler
- MEDIUM = invariant asymmetry without immediate exploit, unbounded loop with practical bound, missing param validation

Severity breakdown: CRITICAL=1, HIGH=3, MEDIUM=2

120-F1 — `finalizeHealOp` ignored spec §20 score handling — verified heal didn't reduce D, failed heal didn't add D

Severity: CRITICAL
File: x/audit/v1/keeper/msg_storage_truth.go
Lines: PR #120 head: 231-269 (finalizeHealOp)
Status at PR #122 tip: FIXED in PR #122 (verified branch now sets D = max(8, D_old/4); failed branch applies D += 15 and advances the probation cooldown). Verify that the math goes through the lazy-decay helper before adding the +15, to avoid double-decay in subsequent reads.
What: The verified path only set LastHealEpoch and ProbationUntilEpoch, so D stayed at 110+. With the defaults HealThreshold=100, DecayPerEpoch=1, and ProbationEpochs=3, D decays by only 3 during probation, so the ticket is still above the threshold when probation ends and is re-scheduled 3 epochs later: an infinite heal loop. The failed path updated neither D nor ProbationUntilEpoch, so the ticket was re-eligible in the same epoch.
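
The arithmetic behind this finding can be checked with a short sketch. The formulas D = max(8, D_old/4) and D += 15 are taken from the status note above; the helper names, the integer division, and the floor at zero in the decay function are assumptions for illustration and may not match the actual lazy-decay helper.

```go
package main

import "fmt"

// decayed applies lazy linear decay for the epochs elapsed since the last
// update. Clamping at zero is an assumption, not something the review states.
func decayed(d, decayPerEpoch int64, lastEpoch, nowEpoch uint64) int64 {
	if nowEpoch <= lastEpoch {
		return d
	}
	d -= decayPerEpoch * int64(nowEpoch-lastEpoch)
	if d < 0 {
		return 0
	}
	return d
}

// afterVerified implements D = max(8, D_old/4) with integer division.
func afterVerified(d int64) int64 {
	if q := d / 4; q > 8 {
		return q
	}
	return 8
}

// afterFailed implements D += 15. The caller is expected to pass an already
// decayed D so that decay is not applied a second time on the next read.
func afterFailed(d int64) int64 {
	return d + 15
}

func main() {
	const threshold = 100
	// Before the fix: D is untouched by a verified heal, so after a 3-epoch
	// probation it has only decayed by 3 and is still above the threshold.
	unfixed := decayed(110, 1, 10, 13)
	fmt.Println(unfixed, unfixed >= threshold)
	// After the fix: the verified branch drops D well below the threshold.
	fixed := afterVerified(110)
	fmt.Println(fixed, fixed >= threshold)
	// Failed branch: decay first, then add the penalty.
	fmt.Println(afterFailed(decayed(110, 1, 10, 12)))
}
```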

120-F2 — Empty-verifier-set heal op finalized as VERIFIED on healer's word alone (verifier-quorum bypass)

Severity: HIGH
File: x/audit/v1/keeper/msg_storage_truth.go
Lines: PR #120 head: 109-124 (single-node finalize path)
Status at PR #122 tip: FIXED in PR #122 (ClaimHealComplete now refuses heal ops with an empty verifier set: errorsmod.Wrap(types.ErrHealOpInvalidState, "heal op has no independent verifier assignments")). Verify that the scheduler in assignStorageTruthHealParticipants also refuses to emit empty verifier sets when len(activeAccounts) < 2.
What: len(VerifierSupernodeAccounts) == 0 triggered immediate VERIFIED on healer self-attestation. Spec §19 has no carve-out.
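
A scheduler-side guard of the kind requested here could look like the sketch below. AssignParticipants and ErrNoIndependentVerifier are hypothetical names, and the rotate-by-seed selection is only one possible deterministic scheme; it is not taken from assignStorageTruthHealParticipants.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrNoIndependentVerifier is a hypothetical error for an unschedulable op.
var ErrNoIndependentVerifier = errors.New("no independent verifier available")

// AssignParticipants picks one healer and up to wantVerifiers distinct
// verifiers from the active set. It refuses to schedule when no account other
// than the healer is available, so an empty verifier set is never emitted.
// The active accounts are assumed to be unique.
func AssignParticipants(active []string, seed uint64, wantVerifiers int) (string, []string, error) {
	if len(active) < 2 || wantVerifiers < 1 {
		return "", nil, ErrNoIndependentVerifier
	}
	// Sort a copy so the result does not depend on the caller's ordering.
	sorted := append([]string(nil), active...)
	sort.Strings(sorted)

	start := int(seed % uint64(len(sorted)))
	healer := sorted[start]

	n := wantVerifiers
	if n > len(sorted)-1 {
		n = len(sorted) - 1
	}
	verifiers := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		verifiers = append(verifiers, sorted[(start+i)%len(sorted)])
	}
	return healer, verifiers, nil
}

func main() {
	healer, verifiers, err := AssignParticipants([]string{"c", "a", "b"}, 4, 2)
	fmt.Println(healer, verifiers, err)
	_, _, err = AssignParticipants([]string{"a"}, 0, 2)
	fmt.Println(err)
}
```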

120-F3 — Heal-op state and verifications never pruned — unbounded EndBlock work

Severity: HIGH
File: x/audit/v1/keeper/prune.go (omission); storage_truth_heal_ops.go, storage_truth_state.go
Status at PR #122 tip: OPEN. Verify at PR #122 head that PruneOldEpochs handles st/ho/, st/hot/, st/hos/, st/hov/ and cascades. EndBlock GetAllHealOps runs every epoch end.
What: Final-status (VERIFIED|FAILED|EXPIRED) heal ops never deleted. GetAllHealOps walks lifetime entries every epoch end → linear gas growth in EndBlock and state bloat.
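
One possible shape for a pruning rule is sketched below over an in-memory slice. The FinalizedEpoch field and the keepEpochs retention window are hypothetical; the review does not say which retention policy PruneOldEpochs should use, and a real implementation would delete KV entries and their index rows rather than filter a slice.

```go
package main

import "fmt"

type Status int

const (
	Scheduled Status = iota
	InProgress
	HealerReported
	Verified
	Failed
	Expired
)

// HealOp is a reduced, illustrative record; FinalizedEpoch is an assumed field.
type HealOp struct {
	ID             uint64
	Status         Status
	FinalizedEpoch uint64
}

// isFinal reports whether a heal op can no longer change state.
func isFinal(s Status) bool {
	return s == Verified || s == Failed || s == Expired
}

// PruneFinal drops final-status ops that were finalized more than keepEpochs
// epochs ago and keeps everything else, preserving the input order.
func PruneFinal(ops []HealOp, currentEpoch, keepEpochs uint64) []HealOp {
	kept := make([]HealOp, 0, len(ops))
	for _, op := range ops {
		old := currentEpoch > op.FinalizedEpoch && currentEpoch-op.FinalizedEpoch > keepEpochs
		if isFinal(op.Status) && old {
			continue
		}
		kept = append(kept, op)
	}
	return kept
}

func main() {
	ops := []HealOp{
		{ID: 1, Status: Verified, FinalizedEpoch: 10},
		{ID: 2, Status: Scheduled},
		{ID: 3, Status: Expired, FinalizedEpoch: 95},
	}
	for _, op := range PruneFinal(ops, 100, 20) {
		fmt.Println(op.ID)
	}
}
```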

120-F4 — Failed heal left ticket immediately re-eligible (no cooldown)

Severity: HIGH
File: x/audit/v1/keeper/msg_storage_truth.go
Lines: PR #120 head: 252-268
Status at PR #122 tip: FIXED in PR #122 (failed path advances ProbationUntilEpoch by StorageTruthProbationEpochs).
What: Failed branch only cleared ActiveHealOpId; next epoch's scheduler re-picks same ticket. Allows malicious healer to burn deterministic healer slots epoch after epoch.

120-F5 — `SubmitStorageRecheckEvidence` does state-touching reads then returns `ErrNotImplemented`

Severity: MEDIUM
File: x/audit/v1/keeper/msg_storage_truth.go
Lines: PR #120 head: 15-69
Status at PR #122 tip: VERIFY at PR #122 (recheck activated). The handler should now run real validation; ensure CheckTx is properly gated.
What: At PR #120 head, the gate was the trailing error returned after multiple supernodeKeeper.GetSuperNodeByAccount reads. Move ErrNotImplemented (or the now-real early-fail validation) above the keeper traffic to bound CheckTx cost.

120-F6 — `SubmitHealVerification` does not pin `req.VerificationHash` to `healOp.ResultHash`

Severity: MEDIUM
File: x/audit/v1/keeper/msg_storage_truth.go
Lines: PR #120: 144-228
Status at PR #122 tip: OPEN. Verify at PR #122 head whether positive attestations (req.Verified == true) require req.VerificationHash == healOp.ResultHash. Spec §19 trusts attestations, but pinning the verifier's claim to the healer's manifest closes one collusion mode.
What: Verifier may submit any hash; quorum is purely on req.Verified boolean. Allows verifier set to finalize a heal even if no verifier saw the same manifest as the healer.
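
The pinning check suggested in this finding is small. The sketch below uses hypothetical names (Verification, CheckPinned, ErrHashMismatch) and assumes that only positive attestations are pinned, since a verifier reporting a failed heal may legitimately have seen different content.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrHashMismatch is a hypothetical error for a positive attestation whose
// hash differs from the healer's reported result.
var ErrHashMismatch = errors.New("verification hash does not match heal result hash")

// Verification is a reduced stand-in for the verifier's submission.
type Verification struct {
	Verified         bool
	VerificationHash string
}

// CheckPinned rejects a positive attestation unless its hash equals the hash
// the healer reported. Negative attestations are accepted with any hash.
func CheckPinned(v Verification, resultHash string) error {
	if v.Verified && v.VerificationHash != resultHash {
		return ErrHashMismatch
	}
	return nil
}

func main() {
	fmt.Println(CheckPinned(Verification{Verified: true, VerificationHash: "abc"}, "abc"))
	fmt.Println(CheckPinned(Verification{Verified: true, VerificationHash: "xyz"}, "abc"))
}
```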

This review is posted as a COMMENT (not REQUEST_CHANGES) so it does not block merge mechanically — but the CRITICAL and HIGH items must be triaged before activation. I'm available to walk through any of these in detail.

— Zee

Copilot

Pull request overview

Implements the LEP-6 PR4 “heal-op lifecycle” and extends storage-truth state/scoring in x/audit/v1, including verifier submissions, epoch-end heal-op expiration/scheduling, and score state enrichment (trust band, contradiction/failure tracking).

Changes:

Added heal-op lifecycle handlers (healer claim, verifier verification, gated recheck evidence) plus new KV state for per-(heal_op, verifier) verification tracking.
Implemented epoch-end processing for heal ops: expire overdue ops, clear stale ticket pointers, and schedule new ops deterministically by priority/caps.
Introduced storage-truth scoring pipeline on report ingestion, including new reporter/ticket metadata fields and score update events.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

File	Description
x/audit/v1/types/keys.go	Adds KV prefixes/keys for heal-op verifier submissions.
x/audit/v1/types/events.go	Introduces event type/attribute constants for scoring + heal-op lifecycle.
x/audit/v1/types/errors.go	Adds error codes for heal-op lifecycle + recheck evidence validation.
x/audit/v1/types/audit.pb.go	Regenerated protobuf Go types (trust band enum, new state fields).
x/audit/v1/module/autocli.go	Updates CLI help text to reflect implemented heal-op txs.
x/audit/v1/keeper/storage_truth_state.go	Adds keeper getters/setters for heal-op verifier submissions.
x/audit/v1/keeper/storage_truth_state_test.go	Expands state round-trip tests and adds heal-op verification round-trip test.
x/audit/v1/keeper/storage_truth_scoring.go	Adds scoring implementation applied during report ingestion.
x/audit/v1/keeper/storage_truth_scoring_internal_test.go	Unit tests for scoring helper functions.
x/audit/v1/keeper/msg_submit_epoch_report.go	Wires scoring into `SubmitEpochReport`.
x/audit/v1/keeper/msg_submit_epoch_report_storage_truth_scores_test.go	Comprehensive tests for scoring behavior, decay, events, contradictions.
x/audit/v1/keeper/storage_truth_heal_ops.go	Implements epoch-end heal-op expiry + scheduling logic.
x/audit/v1/keeper/storage_truth_heal_ops_test.go	Tests for scheduling priority, expiry, and rescheduling after expiry.
x/audit/v1/keeper/msg_storage_truth.go	Implements `ClaimHealComplete`, `SubmitHealVerification`, and gated `SubmitStorageRecheckEvidence`.
x/audit/v1/keeper/msg_storage_truth_test.go	Tests tx validation/authorization and heal-op lifecycle flows.
x/audit/v1/keeper/msg_storage_truth_placeholders.go	Removes prior placeholder Msg server implementations.
x/audit/v1/keeper/msg_storage_truth_placeholders_test.go	Removes placeholder-only tests.
x/audit/v1/keeper/query_storage_truth_test.go	Updates queries for new state fields and adds an ingestion->query reflection test.
x/audit/v1/keeper/abci.go	Runs heal-op epoch-end processing in `EndBlocker`.
proto/lumera/audit/v1/audit.proto	Adds `ReporterTrustBand` and extends persisted scoring state messages.
app/proto_bridge.go	Registers new enum for proto bridge.

Copilot · 2026-04-26T03:20:24Z

+	if result != nil {
+		if isStorageTruthFailureClass(result.ResultClass) && epochID != state.LastFailureEpoch {
+			nextState.LastFailureEpoch = epochID
+			nextState.RecentFailureEpochCount = updateRecentFailureEpochCount(state, epochID, k.GetParams(ctx).WithDefaults())
+		} else if !found {
+			nextState.RecentFailureEpochCount = 0
+		}


applyTicketDeteriorationDelta doesn’t increment LastFailureEpoch/RecentFailureEpochCount for the first failure when epochID == 0 because the guard epochID != state.LastFailureEpoch is false for a zero-value (not-found) state. This causes epoch-0 failures to be recorded with RecentFailureEpochCount == 0, breaking repeated-failure escalation and the epoch-0 carryover behavior in later epochs. Consider treating the not-found case specially (initialize LastFailureEpoch=epochID and RecentFailureEpochCount=1 for failure classes) or changing the condition to allow updates when !found.
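
The special case proposed in this comment can be sketched as follows. TicketState and recordFailure are hypothetical names, and the plain +1 increment stands in for updateRecentFailureEpochCount, whose windowing logic is not shown in the diff excerpt.

```go
package main

import "fmt"

// TicketState is a reduced stand-in for the ticket deterioration state.
type TicketState struct {
	LastFailureEpoch        uint64
	RecentFailureEpochCount uint32
}

// recordFailure handles a failure-class result. When no prior state exists,
// it initializes the counters directly, so a first failure in epoch 0 is not
// mistaken for a repeat of the zero-value LastFailureEpoch.
func recordFailure(state TicketState, found bool, epochID uint64) TicketState {
	next := state
	if !found {
		next.LastFailureEpoch = epochID
		next.RecentFailureEpochCount = 1
		return next
	}
	if epochID != state.LastFailureEpoch {
		next.LastFailureEpoch = epochID
		// Simplification: the real helper applies a recency window.
		next.RecentFailureEpochCount = state.RecentFailureEpochCount + 1
	}
	return next
}

func main() {
	first := recordFailure(TicketState{}, false, 0)
	fmt.Println(first.LastFailureEpoch, first.RecentFailureEpochCount)
	second := recordFailure(first, true, 1)
	fmt.Println(second.LastFailureEpoch, second.RecentFailureEpochCount)
}
```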

Copilot · 2026-04-26T03:20:24Z

+	nextState := types.ReporterReliabilityState{
+		ReporterSupernodeAccount: reporterAccount,
+		ReliabilityScore:         next,
+		LastUpdatedEpoch:         epochID,
+		TrustBand:                reporterTrustBandForScore(next, k.GetParams(ctx).WithDefaults()),
+		ContradictionCount:       state.ContradictionCount + contradictionIncrements,
+	}


applyReporterReliabilityDelta and applyTicketDeteriorationDelta re-fetch params (k.GetParams(ctx).WithDefaults()) while applyStorageTruthScores already loaded params once. This adds repeated store reads in the per-result hot path and can also create subtle inconsistencies if params are ever mutated within a block. Prefer passing the already-loaded params (or the specific thresholds/decays needed) into these helpers and using it for reporterTrustBandForScore / updateRecentFailureEpochCount.

Copilot · 2026-04-26T03:20:24Z

+func (k Keeper) ProcessStorageTruthHealOpsAtEpochEnd(ctx sdk.Context, epochID uint64, params types.Params) error {
+	healOps, err := k.GetAllHealOps(ctx)
+	if err != nil {
+		return err
+	}


ProcessStorageTruthHealOpsAtEpochEnd calls GetAllHealOps, which iterates over every heal op ever created. Since heal ops aren’t pruned in PruneOldEpochs, this makes epoch-end processing O(total heal ops) and can grow without bound, increasing EndBlocker time over the life of the chain. Consider iterating only non-final statuses via the existing HealOpByStatus index (SCHEDULED / IN_PROGRESS / HEALER_REPORTED), and/or adding a pruning strategy for finalized/expired heal ops.
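
Iterating only non-final statuses could be sketched as below, with a Go map standing in for the HealOpByStatus index. The PendingIDs helper is hypothetical; a real keeper would use prefix iterators over the index keys rather than an in-memory map.

```go
package main

import (
	"fmt"
	"sort"
)

type Status int

const (
	Scheduled Status = iota
	InProgress
	HealerReported
	Verified
	Failed
	Expired
)

// nonFinal lists the statuses that epoch-end processing still has to visit.
// Ranging over this fixed slice avoids nondeterministic map iteration.
var nonFinal = []Status{Scheduled, InProgress, HealerReported}

// PendingIDs collects the IDs of all non-final heal ops from a status index
// and returns them in ascending order. Final-status entries are never read,
// so the cost tracks the number of open ops, not the lifetime total.
func PendingIDs(byStatus map[Status][]uint64) []uint64 {
	var ids []uint64
	for _, s := range nonFinal {
		ids = append(ids, byStatus[s]...)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	index := map[Status][]uint64{
		Scheduled:      {7, 2},
		HealerReported: {5},
		Verified:       {1, 3},
	}
	fmt.Println(PendingIDs(index))
}
```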

Copilot · 2026-04-26T03:20:24Z

+	sdkCtx.EventManager().EmitEvent(
+		sdk.NewEvent(
+			types.EventTypeHealOpHealerReported,
+			sdk.NewAttribute(sdk.AttributeKeyModule, types.ModuleName),
+			sdk.NewAttribute(types.AttributeKeyHealOpID, strconv.FormatUint(healOp.HealOpId, 10)),
+			sdk.NewAttribute(types.AttributeKeyTicketID, healOp.TicketId),
+			sdk.NewAttribute(types.AttributeKeyHealerSupernodeAccount, req.Creator),
+			sdk.NewAttribute(types.AttributeKeyTranscriptHash, req.HealManifestHash),
+		),


EventTypeHealOpHealerReported emits AttributeKeyTranscriptHash but the value is HealManifestHash from MsgClaimHealComplete. This makes the event payload ambiguous for indexers/consumers (a “transcript” hash vs a “heal manifest” hash are distinct concepts in the API). Consider adding a dedicated attribute key (e.g., heal_manifest_hash) and emitting that instead of reusing transcript_hash.

Copilot · 2026-04-26T03:20:25Z

@@ -0,0 +1,33 @@
+package types
+
+// Event types and attributes for storage-truth score updates.


File header comment says these are “Event types and attributes for storage-truth score updates”, but the constants also cover heal-op lifecycle and recheck evidence events. Updating the comment to reflect the broader scope will prevent confusion when adding/consuming events later.

Suggested change

- // Event types and attributes for storage-truth score updates.
+ // Event types and attributes for storage-truth score updates, heal-op lifecycle, and recheck evidence events.

j-rafique added 2 commits April 15, 2026 18:14

audit: add LEP-6 shadow scoring for storage truth

51ea7b0

audit: implement LEP-6 heal-op lifecycle and recheck handlers

9cd434b

- Implement storage-truth heal-op lifecycle and recheck handlers
- Scope LEP-6 heal op lifecycle to PR4

roomote-v0 Bot reviewed Apr 20, 2026

Comment thread x/audit/v1/keeper/msg_storage_truth.go Outdated

roomote-v0 Bot reviewed Apr 20, 2026

Comment thread x/audit/v1/keeper/msg_storage_truth.go Outdated

roomote-v0 Bot reviewed Apr 20, 2026

Comment thread x/audit/v1/keeper/storage_truth_heal_ops.go Outdated

j-rafique force-pushed the LEP-6-shadow-scoring branch from 51ea7b0 to c86ef0e April 22, 2026 08:22

audit: address PR120 heal-op lifecycle review feedback

aead156

roomote-v0 Bot approved these changes Apr 22, 2026

j-rafique self-assigned this Apr 22, 2026

a-ok123 requested a review from Copilot April 26, 2026 03:15

Copilot started reviewing on behalf of a-ok123 April 26, 2026 03:16

Copilot AI reviewed Apr 26, 2026

j-rafique wants to merge 3 commits into LEP-6-shadow-scoring from LEP-6-heal-op-lifecycle
