fix(devnet): green up everlight devnet test on master#123

mateeullahmalik merged 1 commit into master from fix/everlight-devnet-test-greenpath on Apr 27, 2026

@mateeullahmalik

Summary

make devnet-tests-everlight is failing on master (7ca770a, PR #113). This PR makes it green with a single commit that touches only files under devnet/.

Result on a fresh 5-validator devnet:

  PASS: 39   FAIL: 0   SKIP: 4
  EXIT=0

The 4 SKIPs are intentional (S3.5–S3.7 payout-timing on devnet, S4 longer state setup, S9 pre-Everlight upgrade flow, S10 full lifecycle).

What was broken

  1. devnet/go.mod pinned lumera v1.10.0 but tests/validator/lep5_test.go (PR LEP 5 #103) imports x/action/v1/merkle which only exists at HEAD — go mod tidy failed.
  2. everlight_test.sh had 9 cmd; rc=$? sites. Under set -euo pipefail a non-zero exit from cmd aborts the script before rc=$? ever runs, so the script died before any FAIL line could print (this is why the original symptom was S2.1 PASS … make: *** Error 1 with nothing in between).
  3. audit_current_epoch_id used .epoch_id // empty — proto3/gogoproto omits zero-valued scalars, so a legitimate epoch 0 response renders without .epoch_id and the function returned a missing-key error.
  4. audit assigned-targets and supernode report-supernode-metrics take their address as a positional argument. The script passed it as --supernode-account FOO and --validator-address FOO; those flags do not exist, so both calls silently failed.
  5. S7.7 attempted to gov-update epoch_length_blocks. The audit module correctly enforces this as immutable-after-genesis (consensus-critical epoch math). The test was wrong, not the chain.
  6. The supernode's host_reporter (PR #284) auto-submits MsgSubmitEpochReport every ~5s with the SN's reporter key. The test tries to submit using the same key from outside the SN and loses the account-sequence race every time → S2.3, S2.4, S2.6, S5.x all silently failed in the recovery loop.
  7. S5.4 asserted "remains eligible after growth jump". The chain correctly clamps the spike via smoothing + growth-cap, so eligibility flips false during ramp-up — the assertion races the very feature it's testing.
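
Item 2 is the least obvious of these failures, so here is a minimal sketch of the exit-code idiom involved. The function failing_cmd is a hypothetical stand-in for any CLI call in the test script, and the sketch uses plain set -eu because pipefail is not needed to show the effect.

```shell
set -eu

# Hypothetical stand-in for a CLI call that exits non-zero.
failing_cmd() { return 3; }

# Fragile form (left commented out): under `set -e` the shell exits as soon
# as failing_cmd returns non-zero, so `rc=$?` is never reached.
#   failing_cmd; rc=$?

# Hardened form: a command on the left of `||` is exempt from errexit,
# so the exit code can be captured and reported.
rc=0
failing_cmd || rc=$?

if [ "$rc" -ne 0 ]; then
  echo "FAIL: command exited with rc=$rc"
fi
```

With the hardened form the script keeps running and can record a FAIL or SKIP line instead of aborting silently.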

What this PR changes

  • devnet/go.mod — activate the local replace lumera => .. block so devnet builds against this repo's HEAD; bump baseline require to v1.11.1.
  • devnet/default-config/devnet-genesis.json — set epoch_length_blocks: 20 so devnet has fast (~20s) epochs. No chain-code change required; matches the design that this is a genesis-time decision.
  • devnet/generators/docker-compose.go — emit EVERLIGHT_TEST_TARGET=1 only on supernova_validator_1 (the test target).
  • devnet/scripts/supernode-setup.sh — when EVERLIGHT_TEST_TARGET=1, forward LUMERA_SUPERNODE_DISABLE_HOST_REPORTER=1 so v1's host_reporter is suppressed and the test can drive MsgSubmitEpochReport cleanly. The other 4 validators keep host_reporter running, so peer reachability observations still flow and ACTIVE-state eligibility still works.
  • devnet/tests/everlight/everlight_test.sh — 7 fixes: set -e hardening, jq epoch-0 fix, two flag→positional fixes, S7.7 immutable-by-design refactor, S2 recovery preamble, S5.4 anti-gaming property assertion rewrite.
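
The environment forwarding described for supernode-setup.sh can be pictured roughly as follows. This is a sketch under assumptions, not the script's actual code: the helper name host_reporter_env is hypothetical, and only the two variable names come from this PR.

```shell
# Hypothetical helper: print the extra environment assignment for the
# supernode process when this container is the Everlight test target.
host_reporter_env() {
  if [ "${EVERLIGHT_TEST_TARGET:-0}" = "1" ]; then
    echo "LUMERA_SUPERNODE_DISABLE_HOST_REPORTER=1"
  fi
}

# On the test target the assignment is emitted; on other validators nothing is.
EVERLIGHT_TEST_TARGET=1
echo "test target: $(host_reporter_env)"
EVERLIGHT_TEST_TARGET=0
echo "other validator: $(host_reporter_env)"
```

Defaulting the variable to 0 keeps the other four validators on the normal path, so their host_reporter keeps running.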

Companion change (required)

LumeraProtocol/supernode#285 — adds the LUMERA_SUPERNODE_DISABLE_HOST_REPORTER env knob. Without it the EVERLIGHT_TEST_TARGET=1 propagation in this PR is a no-op and S2/S5 will still fail.

No chain code touched

Every suspected "chain bug" turned out to be correct protocol behavior (epoch_length immutability, missing-report POSTPONE, host_reporter auto-submit, anti-gaming smoothing). This PR only updates devnet config, devnet test infrastructure, and the test script itself. x/audit, x/supernode, x/action, ante handlers, keepers, migrations — all untouched.

Risk / rollback

  • Production risk: zero. All changes are inside devnet/ and devnet-only configuration. MinCascadeBytesForPayment, MinCpuFreePercent, epoch_length_blocks semantics, and the audit/supernode modules are unchanged.
  • Rollback: revert this single commit.

Test evidence

=== Scenario 2: STORAGE_FULL State Transition (F12, F13) ===
  PASS: S2.1 resolved service supernode (validator=lumeravaloper1apws… state=SUPERNODE_STATE_POSTPONED)
  PASS: S2.3 submit audit epoch report with high disk usage
  PASS: S2.4 audit report transitions supernode to STORAGE_FULL
  PASS: S2.5 submit audit epoch report with healthy disk usage
  PASS: S2.6 audit report recovers supernode from STORAGE_FULL to ACTIVE
  PASS: S2.7 submitted legacy supernode metrics with high disk
  PASS: S2.8 legacy metrics path does not mutate state
…
  PASS: S5.4 anti-gaming smoothing clamps growth jump (smoothed=277104 < raw=21474836480)
  PASS: S5.5 smoothed_weight exposed via eligibility query
------------------------------------------------------------
  PASS: 39   FAIL: 0   SKIP: 4
============================================================
EXIT=0
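
The S5.4 line in the evidence above asserts a property rather than a timing-dependent state. A minimal sketch of that comparison, using the two values from the log line (the variable names are illustrative, not taken from the test script):

```shell
# Values copied from the S5.4 evidence line.
smoothed=277104
raw=21474836480   # 20 * 1024^3 bytes

# The property under test: smoothing plus the growth cap must keep the
# smoothed weight strictly below the raw post-jump value.
if [ "$smoothed" -lt "$raw" ]; then
  result="PASS"
else
  result="FAIL"
fi
echo "$result: smoothed=$smoothed raw=$raw"
```

Because the check compares two numbers read at the same moment, it does not depend on how far the smoothing ramp-up has progressed.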

Multi-faceted fix to make 'make devnet-tests-everlight' green on master:

- devnet/go.mod: activate the local 'replace lumera => ..' block (was
  pinned to v1.10.0 but tests/validator imports x/action/v1/merkle
  which only exists at HEAD).  Bump baseline require to v1.11.1.

- devnet/default-config/devnet-genesis.json: set epoch_length_blocks=20
  in genesis.  The audit module enforces 'epoch_length_blocks is
  immutable after genesis' (consensus-critical math); the previous
  test attempted to gov-update this at runtime which is correctly
  rejected.  Devnet wants short epochs (~20s) for fast lifecycle
  coverage.

- devnet/generators/docker-compose.go: emit EVERLIGHT_TEST_TARGET=1
  on supernova_validator_1 only.  The companion supernode change
  (LumeraProtocol/supernode#285) suppresses host_reporter on that
  validator so the test can drive MsgSubmitEpochReport without losing
  the account-sequence race.  Other validators keep host_reporter
  running so peer reachability data still flows for ACTIVE eligibility.

- devnet/scripts/supernode-setup.sh: forward EVERLIGHT_TEST_TARGET=1
  through to LUMERA_SUPERNODE_DISABLE_HOST_REPORTER=1 when starting
  the supernode binary.

- devnet/tests/everlight/everlight_test.sh: seven fixes
  1. Rewrite 9 'cmd; rc=$?' sites to 'rc=0; cmd || rc=$?' so 'set -e'
     no longer aborts before FAIL/SKIP can be recorded.
  2. audit_current_epoch_id: '.epoch_id // empty' -> '.epoch_id // 0'.
     proto3/gogoproto omits zero-valued scalars; epoch 0 (first 400
     blocks under default params, first 20 blocks under devnet) is a
     valid epoch but renders without the .epoch_id key.
  3. 'audit assigned-targets --supernode-account FOO' -> positional
     'FOO'.  The CLI takes a positional arg, the flag does not exist
     and silently fails the entire submit pipeline.
  4. 'supernode report-supernode-metrics --validator-address FOO' ->
     positional 'FOO'.  Same shape as #3, on the legacy metrics path.
  5. S7.7: stop trying to gov-update epoch_length_blocks (immutable by
     design) and convert the assertion into a presence + value check.
  6. Scenario 2 entry: drive a healthy self-report and wait one epoch
     so the target supernode recovers from POSTPONED -> ACTIVE before
     S2.3 attempts the STORAGE_FULL transition.  POSTPONED is the
     expected starting state when host_reporter is disabled
     (missing-report consecutive postponement is correct chain
     behavior).
  7. S5.4: rewrite assertion from 'eligible after growth jump' to
     'smoothed_weight < raw post-jump bytes'.  The original assertion
     races the growth-cap clamp; the rewritten one tests the actual
     anti-gaming property without a timing dependency on smoothing
     ramp-up.
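
To illustrate fix 2 in the list above, here is a small sketch of how the two jq filters behave. It assumes jq is installed; the response shape, including the epoch_start_height field and the string-encoded numbers, is an assumption for illustration and not copied from the chain's real output.

```shell
# Assumed epoch-0 response: proto3 JSON omits the zero-valued epoch_id key.
epoch0='{"epoch_start_height":"1"}'
# Assumed later response where epoch_id is present.
epoch5='{"epoch_id":"5","epoch_start_height":"101"}'

# Old filter: a missing key yields no output at all.
old=$(printf '%s' "$epoch0" | jq -r '.epoch_id // empty')
# New filter: a missing key falls back to 0.
new=$(printf '%s' "$epoch0" | jq -r '.epoch_id // 0')
# A present key is unaffected by the fallback.
later=$(printf '%s' "$epoch5" | jq -r '.epoch_id // 0')

echo "epoch 0, old filter: '${old}'"
echo "epoch 0, new filter: '${new}'"
echo "epoch 5, new filter: '${later}'"
```

The fallback alone cannot tell a genuine epoch 0 from an error payload that also lacks epoch_id, which is why a separate sanity check on the response is still useful.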

## Verification

5-validator devnet, fresh genesis, full make devnet-tests-everlight:

  PASS: 39   FAIL: 0   SKIP: 4
  EXIT=0

The 4 SKIPs are intentional: S3.5-S3.7 (payout candidates not yet
eligible at distribution time on devnet timing), S4 (needs longer
state setup), S9 (pre-Everlight upgrade flow), S10 (full lifecycle).

## Companion change

Requires LumeraProtocol/supernode#285 (LUMERA_SUPERNODE_DISABLE_HOST_REPORTER
env var).  Without it the EVERLIGHT_TEST_TARGET env propagation in this
PR is a no-op.
@roomote-v0

roomote-v0 Bot commented Apr 27, 2026

Reviewed all six changed files. The fixes are well-targeted and correct:

  • The go.mod local replace directive keeps devnet builds in lockstep with HEAD, fixing the missing x/action/v1/merkle import.
  • The set -e hardening (rc=0; cmd || rc=$? pattern) is the right idiom for capturing exit codes without tripping set -e.
  • The epoch_id // 0 fix correctly handles proto3 zero-value omission, with the epoch_start_height guard preventing silent acceptance of error payloads.
  • Both flag-to-positional fixes (assigned-targets, report-supernode-metrics) match the AutoCLI definitions ([supernode-account], [validator-address]).
  • S7.7 conversion from a doomed gov-update attempt to a presence check aligns with the immutability enforcement in x/audit/v1/keeper/msg_update_params.go.
  • The S2 recovery preamble and S5.4 anti-gaming assertion rewrite correctly address the host_reporter race and growth-cap clamp timing issues respectively.
  • The EVERLIGHT_TEST_TARGET / LUMERA_SUPERNODE_DISABLE_HOST_REPORTER plumbing through docker-compose and supernode-setup is clean and scoped to the single target validator.

No issues found.

@mateeullahmalik mateeullahmalik merged commit 02928ac into master Apr 27, 2026
16 checks passed