audit: implement LEP-6 heal-op lifecycle and recheck handlers #120

Open
j-rafique wants to merge 3 commits into LEP-6-shadow-scoring from LEP-6-heal-op-lifecycle

@j-rafique
Contributor

@j-rafique j-rafique commented Apr 20, 2026

Summary

This PR implements LEP-6 PR4 (heal-op lifecycle) in lumera: on-chain heal operation transitions, verifier-driven finalization, and epoch-end expiration/scheduling for self-heal ops.
The rollout remains non-breaking and keeps deferred LEP-6 enforcement behavior out of scope.

What’s Implemented

1) Heal-op tx lifecycle

Added/implemented keeper logic for:

  • MsgClaimHealComplete
  • MsgSubmitHealVerification
  • MsgSubmitStorageRecheckEvidence (validated + wired, intentionally gated as not active in this milestone)

Behavior:

  • Strict request validation and signer/role authorization
  • Status transitions:
    • SCHEDULED / IN_PROGRESS -> HEALER_REPORTED
    • HEALER_REPORTED -> VERIFIED (all required positives) or FAILED (any negative)
  • Single-node finalize path (no verifiers) with immediate completion
  • Ticket linkage updates on finalize (active_heal_op_id cleared; verified path updates probation/last-heal fields)
  • Event emission for healer reported / verified / failed
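
The transitions listed above can be sketched as a small state machine. This is a minimal illustration, not the keeper code: the Go type, constant, and function names are hypothetical, and the vote counting is simplified to integer tallies. With `required == 0` it reproduces the single-node path, where the op finalizes immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// HealOpStatus mirrors the statuses named in the PR description.
// The Go identifiers are hypothetical, not the generated proto names.
type HealOpStatus int

const (
	StatusScheduled HealOpStatus = iota
	StatusInProgress
	StatusHealerReported
	StatusVerified
	StatusFailed
	StatusExpired
)

var ErrInvalidTransition = errors.New("invalid heal-op status transition")

// ClaimHealComplete moves SCHEDULED or IN_PROGRESS to HEALER_REPORTED.
func ClaimHealComplete(s HealOpStatus) (HealOpStatus, error) {
	if s != StatusScheduled && s != StatusInProgress {
		return s, ErrInvalidTransition
	}
	return StatusHealerReported, nil
}

// Finalize resolves a HEALER_REPORTED op from the verifier tallies.
// Any negative vote fails the op; VERIFIED needs `required` positives.
// done is false while more votes are still needed.
func Finalize(s HealOpStatus, positives, negatives, required int) (next HealOpStatus, done bool, err error) {
	if s != StatusHealerReported {
		return s, false, ErrInvalidTransition
	}
	if negatives > 0 {
		return StatusFailed, true, nil
	}
	if positives >= required {
		return StatusVerified, true, nil
	}
	return s, false, nil
}

func main() {
	s, _ := ClaimHealComplete(StatusScheduled)
	next, done, _ := Finalize(s, 2, 0, 2)
	fmt.Println(next == StatusVerified, done)
}
```

Keeping the transition rules in pure functions like these makes the verified, failed, and pending branches easy to unit test without a keeper fixture.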

2) Epoch-end heal-op processing

Implemented epoch-end lifecycle execution:

  • Expire overdue non-final heal ops (deadline_epoch_id <= current_epoch)
  • Clear stale active heal-op pointers from ticket deterioration state
  • Schedule new heal ops from deterioration candidates by priority:
    • threshold/probation filtering
    • max-per-epoch cap
    • deterministic participant selection (healer + verifiers)
    • deterministic tie-breaking and ID progression
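
For illustration, the expiry check and the capped, priority-ordered selection might look like the sketch below. The `Candidate` type, the `Score` field, and the choice of ticket ID as the tie-breaker are assumptions made for the example; the PR only states that ordering and tie-breaking are deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical view of a ticket's deterioration state.
type Candidate struct {
	TicketID string
	Score    int64 // assumed: higher means more urgent
}

// isExpired applies the rule deadline_epoch_id <= current_epoch.
func isExpired(deadlineEpoch, currentEpoch uint64) bool {
	return deadlineEpoch <= currentEpoch
}

// selectForScheduling filters by threshold, orders by score (descending)
// with ticket ID as an assumed tie-breaker, and applies the per-epoch cap.
func selectForScheduling(cands []Candidate, threshold int64, maxPerEpoch int) []Candidate {
	eligible := make([]Candidate, 0, len(cands))
	for _, c := range cands {
		if c.Score >= threshold {
			eligible = append(eligible, c)
		}
	}
	sort.Slice(eligible, func(i, j int) bool {
		if eligible[i].Score != eligible[j].Score {
			return eligible[i].Score > eligible[j].Score
		}
		return eligible[i].TicketID < eligible[j].TicketID
	})
	if maxPerEpoch < 0 {
		maxPerEpoch = 0
	}
	if len(eligible) > maxPerEpoch {
		eligible = eligible[:maxPerEpoch]
	}
	return eligible
}

func main() {
	picked := selectForScheduling(
		[]Candidate{{"t-b", 5}, {"t-a", 5}, {"t-c", 9}, {"t-d", 1}}, 3, 2)
	nextID := uint64(100) // hypothetical next heal-op ID
	for _, c := range picked {
		fmt.Println(nextID, c.TicketID)
		nextID++
	}
}
```

Because the comparison is a total order over unique ticket IDs, every validator derives the same selection and the same ID progression from the same state.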

3) State model integration

Integrated with existing LEP-6 storage-truth state surfaces:

  • HealOp + status/ticket indexes
  • verifier submissions keyed by (heal_op_id, verifier)
  • ticket deterioration state linkage/counters used by lifecycle transitions
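
As a rough sketch of the `(heal_op_id, verifier)` keying, a composite KV key can be built from a prefix, the big-endian heal-op ID, and the verifier address. The prefix string below is a placeholder, not the module's actual prefix, and the function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Placeholder prefix for the example; the real prefix lives in types/keys.go.
var healOpVerificationPrefix = []byte("demo/heal-op-verification/")

// healOpPrefix returns prefix || big-endian(healOpID), which scopes an
// iterator to all verifier submissions of one heal op.
func healOpPrefix(healOpID uint64) []byte {
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], healOpID)
	key := make([]byte, 0, len(healOpVerificationPrefix)+len(id))
	key = append(key, healOpVerificationPrefix...)
	return append(key, id[:]...)
}

// verificationKey returns the full key for one (heal_op_id, verifier) pair.
func verificationKey(healOpID uint64, verifier string) []byte {
	return append(healOpPrefix(healOpID), verifier...)
}

// belongsToOp reports whether key was built for the given heal op.
func belongsToOp(key []byte, healOpID uint64) bool {
	return bytes.HasPrefix(key, healOpPrefix(healOpID))
}

func main() {
	k := verificationKey(42, "verifier-a")
	fmt.Println(belongsToOp(k, 42), belongsToOp(k, 43))
}
```

Big-endian encoding keeps keys in numeric heal-op order under bytewise comparison, so a prefix scan over one heal op never interleaves with another.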

Out of Scope (Deferred)

  • Recheck evidence activation logic (kept gated for later milestone)
  • Enforcement/penalty activation (PR5 scope)

Key Files

  • x/audit/v1/keeper/msg_storage_truth.go
  • x/audit/v1/keeper/storage_truth_heal_ops.go
  • x/audit/v1/keeper/abci.go
  • x/audit/v1/keeper/msg_storage_truth_test.go
  • x/audit/v1/keeper/storage_truth_heal_ops_test.go

Testing

Added/updated tests for:

  • tx validation + authorization paths
  • healer claim / verifier submission flows
  • verified and failed finalize branches
  • single-node immediate finalize
  • epoch-end expiration + scheduling behavior
  • ticket state linkage updates

Validation run:

  • go test ./x/audit/v1/... -count=1

- Implement storage-truth heal-op lifecycle and recheck handlers
- Scope LEP-6 heal op lifecycle to PR4
@roomote-v0

roomote-v0 Bot commented Apr 20, 2026

All 3 previously flagged issues have been addressed in aead156. No new issues found.

  • Bug: ClaimHealComplete double-appends req.Details to healOp.Notes in the zero-verifier (single-node) path -- once before calling finalizeHealOp and once inside it
  • Cleanup: finalizeHealOp calls m.GetParams(ctx).WithDefaults() twice in the same block; can be read once into a local variable (also, GetParams already applies WithDefaults internally)
  • Performance: GetAllHealOps full KV scan runs twice per epoch end -- once in expireStorageTruthHealOpsAtEpochEnd and again in scheduleStorageTruthHealOpsAtEpochEnd

Comment thread x/audit/v1/keeper/msg_storage_truth.go Outdated
Comment thread x/audit/v1/keeper/msg_storage_truth.go Outdated
Comment thread x/audit/v1/keeper/storage_truth_heal_ops.go Outdated
@j-rafique j-rafique force-pushed the LEP-6-shadow-scoring branch from 51ea7b0 to c86ef0e Compare April 22, 2026 08:22
@j-rafique j-rafique self-assigned this Apr 22, 2026
@mateeullahmalik
Contributor

Production-gate review by Zee — 6 findings

Methodology: full file-by-file read of every non-generated changed file in this PR's diff (pr-120 vs its base branch), cross-checked against:

  • LEP-6 spec (Notion source of truth)
  • invariant-first-coding skill (write-path enumeration, sibling symmetry, single source of truth, post-fix re-audit)
  • Cosmos SDK consensus discipline (no float, no map iteration, bounded EndBlock, genesis round-trip, errorsmod wrapping)

Status legend: each finding's status is computed at the PR #122 stack-tip (consensus-gap-fixes commit a51c439), so 'FIXED' means a downstream PR in the stack already addresses it; 'OPEN' means it is still present at the tip and must be fixed before merge / Phase-2 activation. Severity rubric in the charter (~/work/lep6-review/ctx/charter.md): CRITICAL = consensus halt / state corruption / non-determinism in ABCI; HIGH = spec mismatch with economic impact, missing genesis round-trip, replay enabler; MEDIUM = invariant asymmetry without immediate exploit, unbounded loop with practical bound, missing param validation.

Severity breakdown: CRITICAL=1, HIGH=3, MEDIUM=2


120-F1 — finalizeHealOp ignored spec §20 score handling — verified heal didn't reduce D, failed heal didn't add D

120-F2 — Empty-verifier-set heal op finalized as VERIFIED on healer's word alone (verifier-quorum bypass)

120-F3 — Heal-op state and verifications never pruned — unbounded EndBlock work

  • Severity: HIGH
  • File: x/audit/v1/keeper/prune.go (omission); storage_truth_heal_ops.go, storage_truth_state.go
  • Status at PR #122 tip (feat(audit): finalize LEP-6 consensus gap fixes): OPEN. Verify at the PR #122 head that PruneOldEpochs handles st/ho/, st/hot/, st/hos/, st/hov/ and cascades. EndBlock GetAllHealOps runs every epoch end.
  • What: Final-status (VERIFIED|FAILED|EXPIRED) heal ops never deleted. GetAllHealOps walks lifetime entries every epoch end → linear gas growth in EndBlock and state bloat.
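
One possible shape for the missing pruning step is sketched below. It assumes a retention window (`keepEpochs`) measured from the op's deadline epoch, which is not something the PR or this finding specifies, and it works on an in-memory slice rather than the KV store. A real implementation would also delete the index and verification entries for each pruned ID.

```go
package main

import "fmt"

type Status int

const (
	Scheduled Status = iota
	InProgress
	HealerReported
	Verified
	Failed
	Expired
)

// HealOp is a reduced, hypothetical view of the stored heal op.
type HealOp struct {
	ID            uint64
	Status        Status
	DeadlineEpoch uint64
}

func isFinal(s Status) bool {
	return s == Verified || s == Failed || s == Expired
}

// pruneFinalHealOps splits ops into those to keep and the IDs to delete.
// A final op is pruned once its deadline epoch is more than keepEpochs
// behind currentEpoch; non-final ops are always kept.
func pruneFinalHealOps(ops []HealOp, currentEpoch, keepEpochs uint64) (kept []HealOp, prunedIDs []uint64) {
	for _, op := range ops {
		// Guard the subtraction so early epochs cannot underflow.
		old := currentEpoch > keepEpochs && op.DeadlineEpoch < currentEpoch-keepEpochs
		if isFinal(op.Status) && old {
			prunedIDs = append(prunedIDs, op.ID)
			continue
		}
		kept = append(kept, op)
	}
	return kept, prunedIDs
}

func main() {
	ops := []HealOp{{1, Verified, 5}, {2, Scheduled, 5}, {3, Failed, 95}}
	kept, pruned := pruneFinalHealOps(ops, 100, 10)
	fmt.Println(len(kept), pruned)
}
```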

120-F4 — Failed heal left ticket immediately re-eligible (no cooldown)

120-F5 — SubmitStorageRecheckEvidence does state-touching reads then returns ErrNotImplemented

120-F6 — SubmitHealVerification does not pin req.VerificationHash to healOp.ResultHash


This review is posted as a COMMENT (not REQUEST_CHANGES) so it does not block merge mechanically — but the CRITICAL and HIGH items must be triaged before activation. I'm available to walk through any of these in detail.

— Zee

Contributor

Copilot AI left a comment

Pull request overview

Implements the LEP-6 PR4 “heal-op lifecycle” and extends storage-truth state/scoring in x/audit/v1, including verifier submissions, epoch-end heal-op expiration/scheduling, and score state enrichment (trust band, contradiction/failure tracking).

Changes:

  • Added heal-op lifecycle handlers (healer claim, verifier verification, gated recheck evidence) plus new KV state for per-(heal_op, verifier) verification tracking.
  • Implemented epoch-end processing for heal ops: expire overdue ops, clear stale ticket pointers, and schedule new ops deterministically by priority/caps.
  • Introduced storage-truth scoring pipeline on report ingestion, including new reporter/ticket metadata fields and score update events.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| x/audit/v1/types/keys.go | Adds KV prefixes/keys for heal-op verifier submissions. |
| x/audit/v1/types/events.go | Introduces event type/attribute constants for scoring + heal-op lifecycle. |
| x/audit/v1/types/errors.go | Adds error codes for heal-op lifecycle + recheck evidence validation. |
| x/audit/v1/types/audit.pb.go | Regenerated protobuf Go types (trust band enum, new state fields). |
| x/audit/v1/module/autocli.go | Updates CLI help text to reflect implemented heal-op txs. |
| x/audit/v1/keeper/storage_truth_state.go | Adds keeper getters/setters for heal-op verifier submissions. |
| x/audit/v1/keeper/storage_truth_state_test.go | Expands state round-trip tests and adds heal-op verification round-trip test. |
| x/audit/v1/keeper/storage_truth_scoring.go | Adds scoring implementation applied during report ingestion. |
| x/audit/v1/keeper/storage_truth_scoring_internal_test.go | Unit tests for scoring helper functions. |
| x/audit/v1/keeper/msg_submit_epoch_report.go | Wires scoring into SubmitEpochReport. |
| x/audit/v1/keeper/msg_submit_epoch_report_storage_truth_scores_test.go | Comprehensive tests for scoring behavior, decay, events, contradictions. |
| x/audit/v1/keeper/storage_truth_heal_ops.go | Implements epoch-end heal-op expiry + scheduling logic. |
| x/audit/v1/keeper/storage_truth_heal_ops_test.go | Tests for scheduling priority, expiry, and rescheduling after expiry. |
| x/audit/v1/keeper/msg_storage_truth.go | Implements ClaimHealComplete, SubmitHealVerification, and gated SubmitStorageRecheckEvidence. |
| x/audit/v1/keeper/msg_storage_truth_test.go | Tests tx validation/authorization and heal-op lifecycle flows. |
| x/audit/v1/keeper/msg_storage_truth_placeholders.go | Removes prior placeholder Msg server implementations. |
| x/audit/v1/keeper/msg_storage_truth_placeholders_test.go | Removes placeholder-only tests. |
| x/audit/v1/keeper/query_storage_truth_test.go | Updates queries for new state fields and adds an ingestion->query reflection test. |
| x/audit/v1/keeper/abci.go | Runs heal-op epoch-end processing in EndBlocker. |
| proto/lumera/audit/v1/audit.proto | Adds ReporterTrustBand and extends persisted scoring state messages. |
| app/proto_bridge.go | Registers new enum for proto bridge. |

Comment on lines +244 to +250
if result != nil {
    if isStorageTruthFailureClass(result.ResultClass) && epochID != state.LastFailureEpoch {
        nextState.LastFailureEpoch = epochID
        nextState.RecentFailureEpochCount = updateRecentFailureEpochCount(state, epochID, k.GetParams(ctx).WithDefaults())
    } else if !found {
        nextState.RecentFailureEpochCount = 0
    }
Copilot AI Apr 26, 2026

applyTicketDeteriorationDelta doesn’t increment LastFailureEpoch/RecentFailureEpochCount for the first failure when epochID == 0 because the guard epochID != state.LastFailureEpoch is false for a zero-value (not-found) state. This causes epoch-0 failures to be recorded with RecentFailureEpochCount == 0, breaking repeated-failure escalation and the epoch-0 carryover behavior in later epochs. Consider treating the not-found case specially (initialize LastFailureEpoch=epochID and RecentFailureEpochCount=1 for failure classes) or changing the condition to allow updates when !found.
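
A reduced reproduction of the guard, with one possible fix, is sketched below. It assumes the result is already known to be a failure class and replaces `updateRecentFailureEpochCount` with a plain increment, so the windowing that the real helper derives from params is not modelled. The type and function names are hypothetical.

```go
package main

import "fmt"

// TicketState is a hypothetical reduction of the ticket deterioration state.
type TicketState struct {
	LastFailureEpoch        uint64
	RecentFailureEpochCount uint32
}

// recordFailureBuggy mirrors the quoted guard: for a not-found (zero-value)
// state at epoch 0, epochID == state.LastFailureEpoch, so the failure is
// skipped and the else-branch leaves the count at 0.
func recordFailureBuggy(state TicketState, found bool, epochID uint64) TicketState {
	next := state
	if epochID != state.LastFailureEpoch {
		next.LastFailureEpoch = epochID
		next.RecentFailureEpochCount = state.RecentFailureEpochCount + 1
	} else if !found {
		next.RecentFailureEpochCount = 0
	}
	return next
}

// recordFailureFixed also updates when the state did not exist yet,
// so the first failure at epoch 0 is counted once.
func recordFailureFixed(state TicketState, found bool, epochID uint64) TicketState {
	next := state
	if !found || epochID != state.LastFailureEpoch {
		next.LastFailureEpoch = epochID
		next.RecentFailureEpochCount = state.RecentFailureEpochCount + 1
	}
	return next
}

func main() {
	fmt.Println(recordFailureBuggy(TicketState{}, false, 0).RecentFailureEpochCount)
	fmt.Println(recordFailureFixed(TicketState{}, false, 0).RecentFailureEpochCount)
}
```

The fixed variant still counts at most one failure per epoch for an existing state, which is the property the original guard was protecting.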

Comment on lines +204 to +210
nextState := types.ReporterReliabilityState{
    ReporterSupernodeAccount: reporterAccount,
    ReliabilityScore:         next,
    LastUpdatedEpoch:         epochID,
    TrustBand:                reporterTrustBandForScore(next, k.GetParams(ctx).WithDefaults()),
    ContradictionCount:       state.ContradictionCount + contradictionIncrements,
}
Copilot AI Apr 26, 2026

applyReporterReliabilityDelta and applyTicketDeteriorationDelta re-fetch params (k.GetParams(ctx).WithDefaults()) while applyStorageTruthScores already loaded params once. This adds repeated store reads in the per-result hot path and can also create subtle inconsistencies if params are ever mutated within a block. Prefer passing the already-loaded params (or the specific thresholds/decays needed) into these helpers and using it for reporterTrustBandForScore / updateRecentFailureEpochCount.
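
The suggested refactor can be illustrated with a toy store that counts reads. Everything here is hypothetical (the `Params` fields, the band names, and the `paramStore` type); the point is only that the caller loads params once and passes the value into the per-result helper.

```go
package main

import "fmt"

// Params holds hypothetical trust-band thresholds.
type Params struct {
	HighTrustMin int64
	LowTrustMax  int64
}

// paramStore stands in for the keeper and counts store reads.
type paramStore struct {
	params Params
	reads  int
}

func (s *paramStore) GetParams() Params {
	s.reads++
	return s.params
}

// trustBand takes params as an argument instead of re-reading them.
func trustBand(score int64, p Params) string {
	switch {
	case score >= p.HighTrustMin:
		return "HIGH"
	case score <= p.LowTrustMax:
		return "LOW"
	default:
		return "NORMAL"
	}
}

// applyScores reads params once per batch and reuses them for every result.
func applyScores(store *paramStore, scores []int64) []string {
	p := store.GetParams()
	bands := make([]string, 0, len(scores))
	for _, s := range scores {
		bands = append(bands, trustBand(s, p))
	}
	return bands
}

func main() {
	store := &paramStore{params: Params{HighTrustMin: 80, LowTrustMax: 20}}
	fmt.Println(applyScores(store, []int64{90, 50, 10}), store.reads)
}
```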

Comment on lines +15 to +19
func (k Keeper) ProcessStorageTruthHealOpsAtEpochEnd(ctx sdk.Context, epochID uint64, params types.Params) error {
    healOps, err := k.GetAllHealOps(ctx)
    if err != nil {
        return err
    }
Copilot AI Apr 26, 2026

ProcessStorageTruthHealOpsAtEpochEnd calls GetAllHealOps, which iterates over every heal op ever created. Since heal ops aren’t pruned in PruneOldEpochs, this makes epoch-end processing O(total heal ops) and can grow without bound, increasing EndBlocker time over the life of the chain. Consider iterating only non-final statuses via the existing HealOpByStatus index (SCHEDULED / IN_PROGRESS / HEALER_REPORTED), and/or adding a pruning strategy for finalized/expired heal ops.
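
A minimal sketch of the status-index approach follows, using an in-memory map in place of the HealOpByStatus KV index. The `statusIndex` type and `activeHealOpIDs` function are hypothetical; the real index would be read with prefix iterators.

```go
package main

import (
	"fmt"
	"sort"
)

type Status int

const (
	Scheduled Status = iota
	InProgress
	HealerReported
	Verified
	Failed
	Expired
)

// statusIndex stands in for the HealOpByStatus index: status -> heal-op IDs.
type statusIndex map[Status][]uint64

// nonFinalStatuses is a fixed slice, so no map iteration order is involved.
var nonFinalStatuses = []Status{Scheduled, InProgress, HealerReported}

// activeHealOpIDs collects only non-final heal ops, in ascending ID order.
// Work is proportional to the number of active ops, not lifetime ops.
func activeHealOpIDs(idx statusIndex) []uint64 {
	var ids []uint64
	for _, s := range nonFinalStatuses {
		ids = append(ids, idx[s]...)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	idx := statusIndex{
		Verified:       {1, 2},
		Scheduled:      {5},
		HealerReported: {3},
		Expired:        {4},
	}
	fmt.Println(activeHealOpIDs(idx))
}
```

Pruning finalized ops and iterating by status are complementary: the first bounds state size, the second bounds EndBlocker work even before anything is pruned.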

Comment on lines +130 to +138
sdkCtx.EventManager().EmitEvent(
    sdk.NewEvent(
        types.EventTypeHealOpHealerReported,
        sdk.NewAttribute(sdk.AttributeKeyModule, types.ModuleName),
        sdk.NewAttribute(types.AttributeKeyHealOpID, strconv.FormatUint(healOp.HealOpId, 10)),
        sdk.NewAttribute(types.AttributeKeyTicketID, healOp.TicketId),
        sdk.NewAttribute(types.AttributeKeyHealerSupernodeAccount, req.Creator),
        sdk.NewAttribute(types.AttributeKeyTranscriptHash, req.HealManifestHash),
    ),
Copilot AI Apr 26, 2026

EventTypeHealOpHealerReported emits AttributeKeyTranscriptHash but the value is HealManifestHash from MsgClaimHealComplete. This makes the event payload ambiguous for indexers/consumers (a “transcript” hash vs a “heal manifest” hash are distinct concepts in the API). Consider adding a dedicated attribute key (e.g., heal_manifest_hash) and emitting that instead of reusing transcript_hash.

@@ -0,0 +1,33 @@
package types

// Event types and attributes for storage-truth score updates.
Copilot AI Apr 26, 2026

File header comment says these are “Event types and attributes for storage-truth score updates”, but the constants also cover heal-op lifecycle and recheck evidence events. Updating the comment to reflect the broader scope will prevent confusion when adding/consuming events later.

Suggested change
- // Event types and attributes for storage-truth score updates.
+ // Event types and attributes for storage-truth score updates, heal-op lifecycle, and recheck evidence events.
