
htlcswitch: add FSM fuzz harness for channelLink commit protocol #1

Open
MPins wants to merge 9 commits into master from link_fsm_fuzz


MPins (Owner) commented Mar 12, 2026

The channelLink commit protocol (the sequence of CommitSig / RevokeAndAck exchanges that advance commitment heights on both sides of a channel) is one of the most critical and subtle state machines in lnd. Despite extensive unit tests, these messages can interleave in many different orders, which makes the protocol easy to get wrong. A single missed revocation or out-of-order commit can corrupt channel state irreparably.

This PR adds a coverage-guided fuzz harness that exercises the full commit protocol FSM by randomly interleaving HTLC additions, commits, revocations, settlements, and failures from both Alice and Bob. The fuzzer checks structural invariants (monotonic commit heights, mirror symmetry between peers) after every event, catching protocol violations that deterministic tests cannot anticipate.
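As a rough illustration of what a per-event invariant check can look like, the sketch below reduces each side of the channel to two commitment heights. Everything in it (`party`, `snapshot`, `checkInvariants`, `mirror`) is hypothetical and not taken from the harness. It also assumes that at most one unrevoked CommitSig can be in flight per direction, so the height one side has signed for its peer may lead the height the peer has received by at most one.

```go
package main

import "fmt"

// party is a hypothetical, simplified view of one side of the channel,
// reduced to the two commitment heights the invariants need.
type party struct {
	localHeight  uint64 // tip of this party's own commitment chain
	remoteHeight uint64 // tip of the peer's chain, as signed by this party
}

// snapshot captures Alice's and Bob's views at one point in time.
type snapshot struct {
	alice, bob party
}

// checkInvariants compares the state before an event (prev) with the
// state after it (cur) and reports the first violation it finds.
func checkInvariants(prev, cur snapshot) error {
	// Monotonicity: commitment heights must never move backwards.
	pairs := []struct {
		name          string
		before, after party
	}{
		{"alice", prev.alice, cur.alice},
		{"bob", prev.bob, cur.bob},
	}
	for _, p := range pairs {
		if p.after.localHeight < p.before.localHeight ||
			p.after.remoteHeight < p.before.remoteHeight {
			return fmt.Errorf("%s: commitment height decreased", p.name)
		}
	}

	// Mirror symmetry: what one side signed for its peer is either already
	// received (equal heights) or one CommitSig ahead (still in flight).
	if err := mirror("alice->bob", cur.alice.remoteHeight, cur.bob.localHeight); err != nil {
		return err
	}
	return mirror("bob->alice", cur.bob.remoteHeight, cur.alice.localHeight)
}

// mirror enforces signed-1 <= received <= signed.
func mirror(dir string, signed, received uint64) error {
	if received > signed || signed-received > 1 {
		return fmt.Errorf("%s: signed=%d received=%d", dir, signed, received)
	}
	return nil
}
```

A harness would call something like `checkInvariants(before, after)` after every applied event and fail the iteration on the first non-nil error.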

Testing

go test ./htlcswitch/ -run TestChannelLinkFSMScenarios -v
go test ./htlcswitch -run=^$ -fuzz=FuzzChannelLinkFSM -fuzztime=1m

coveralls commented Mar 12, 2026

Coverage Report for CI Build 24286915830

Coverage decreased (-10.1%) to 52.197%

Details

  • Coverage decreased (-10.1%) from the base build.
  • Patch coverage: 175 uncovered changes across 5 files (22 of 197 lines covered, 11.17%).
  • 30606 coverage regressions across 468 files.

Uncovered Changes

File                      Changed lines  Covered lines  Covered %
htlcswitch/test_utils.go  96             0              0.0%
htlcswitch/mock.go        55             0              0.0%
lnwallet/channel.go       31             19             61.29%
lnwallet/mock.go          8              0              0.0%
lnwallet/sigpool.go       7              3              42.86%

Coverage Regressions

30606 previously-covered lines in 468 files lost coverage.

Top 10 files by coverage loss

File                                Lines losing coverage  Coverage
lnwire/test_message.go              1469                   0.0%
lnwallet/channel.go                 1088                   68.48%
invoices/sql_store.go               1066                   0.0%
htlcswitch/test_utils.go            814                    0.0%
htlcswitch/mock.go                  531                    0.0%
peer/test_utils.go                  529                    0.0%
autopilot/agent.go                  395                    0.0%
channeldb/migration30/migration.go  367                    0.0%
lnwallet/test_utils.go              364                    0.0%
contractcourt/utxonursery.go        342                    38.01%

Coverage Stats

Relevant Lines: 193501
Covered Lines: 101002
Line Coverage: 52.2%
Coverage Strength: 1.62 hits per line

💛 - Coveralls

MPins force-pushed the link_fsm_fuzz branch 3 times, most recently from 001781c to 1400d9a (March 13, 2026 21:50)
MPins force-pushed the link_fsm_fuzz branch 4 times, most recently from 1b219d1 to ca4be0a (March 28, 2026 00:16)
MPins force-pushed the link_fsm_fuzz branch 10 times, most recently from fcb4bb0 to 6e4610c (April 1, 2026 12:59)
MPins added 3 commits April 8, 2026 16:40
Expose the `invoiceRegistry` field in `singleLinkTestHarness` so
tests can register and look up invoices directly.

Add `generateSingleHopHtlc`, a test helper that builds a single-hop
`UpdateAddHTLC` with a random preimage, intended for use in unit and
fuzz tests.

Add a no-op MailBox implementation and a no-op ticker for use in
the channelLink FSM fuzz harness.
Replace createChannelLinkWithPeer (which required a Switch and spawned the
htlcManager goroutine) with newFuzzLink, a minimal link factory that:

- accepts dependencies directly (registry, preimage cache, circuit map,
  bestHeight) instead of a mockServer, so no Switch or background goroutines
  are created at all
- sets link.upstream directly to a buffered channel controlled by the
  caller, bypassing the mailbox entirely
- attaches a mockMailBox so mailBox.ResetPackets() in resumeLink succeeds
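The idea of a caller-owned, buffered upstream channel can be sketched as follows. `fakeLink`, `deliver`, `drain` and the string-typed `msg` are hypothetical stand-ins (the real link carries lnwire messages and has its own handlers); the point is that the test goroutine enqueues and then processes messages itself, so no background goroutine decides the interleaving and each fuzz step is deterministic.

```go
package main

// msg is a hypothetical placeholder for an lnwire.Message.
type msg string

// fakeLink stands in for a link whose upstream channel is owned by the
// test rather than fed by a mailbox.
type fakeLink struct {
	upstream chan msg
	handled  []msg
}

// newFakeLink receives its dependencies (here only the buffer size)
// directly and starts no goroutines.
func newFakeLink(bufSize int) *fakeLink {
	return &fakeLink{upstream: make(chan msg, bufSize)}
}

// deliver enqueues m, returning false instead of blocking when full.
func (l *fakeLink) deliver(m msg) bool {
	select {
	case l.upstream <- m:
		return true
	default:
		return false
	}
}

// drain handles every queued message on the caller's goroutine and
// returns how many it processed once the channel is empty.
func (l *fakeLink) drain() int {
	n := 0
	for {
		select {
		case m := <-l.upstream:
			l.handled = append(l.handled, m)
			n++
		default:
			return n
		}
	}
}
```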
MPins (Owner, Author) commented Apr 9, 2026

@Crypt-iQ when you have time, could you take a look?

Crypt-iQ commented Apr 9, 2026

> @Crypt-iQ when you have time, could you take a look?

Sure, I will take a look.

MPins force-pushed the link_fsm_fuzz branch 3 times, most recently from b28e3d8 to 5f570d5 (April 11, 2026 01:34)
MPins added 5 commits April 11, 2026 13:36
Introduce `fuzz_link_test.go` with a model-based fuzzer that drives
the Alice-Bob channel link through arbitrary sequences of protocol
events and checks key invariants after each step.

fuzz_link_test

Introduce fuzzSigner and fuzzSigVerifier in the fuzz harness, along
with the SigVerifier hook in LightningChannel (WithSigVerifier,
verifySig) and a matching SigPool extension (VerifyFunc field) so the
harness can bypass secp256k1 verification end-to-end. Also refactors
createTestChannel to accept functional options (testChannelOpt) so
the signer and channel options can be injected from tests.
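The SigPool `VerifyFunc` extension can be pictured as an optional function field with a fallback to the real verifier. In this hypothetical sketch, `sigPool`, `VerifyFunc`, `fuzzVerify` and the byte-slice signature are assumptions (the real SigPool works with lnd's signature and public-key types). `fuzzVerify` accepts every signature; the PR's `fuzzSigVerifier` may behave differently.

```go
package main

import "errors"

// VerifyFunc is a hypothetical hook: it reports whether sig is a valid
// signature over msg for pubKey.
type VerifyFunc func(msg, sig, pubKey []byte) bool

// sigPool is a simplified stand-in for a signature pool. A nil VerifyFunc
// means the real (expensive) verifier is used.
type sigPool struct {
	VerifyFunc VerifyFunc
	realVerify VerifyFunc
}

var errBadSig = errors.New("invalid signature")

// verify runs the test override if one is installed, else the real check.
func (p *sigPool) verify(msg, sig, pubKey []byte) error {
	verify := p.realVerify
	if p.VerifyFunc != nil {
		verify = p.VerifyFunc
	}
	if !verify(msg, sig, pubKey) {
		return errBadSig
	}
	return nil
}

// fuzzVerify accepts any signature, letting a harness skip secp256k1.
func fuzzVerify(_, _, _ []byte) bool { return true }
```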
Introduce CommitKeyDeriverFunc and WithCommitKeyDeriver to allow
LightningChannel to bypass the secp256k1-based DeriveCommitmentKeys
on every commit round. All internal call sites are migrated to
lc.deriveCommitmentKeys. The fuzz harness injects fuzzCommitKeyDeriver,
a trivial identity deriver that avoids scalar-multiplication overhead.
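The functional-option injection of a key deriver might look roughly like this. Only the names `CommitKeyDeriverFunc`, `WithCommitKeyDeriver` and `deriveCommitmentKeys` come from the commit message; their signatures here are assumptions, and `commitKeys`, `channel`, `ChannelOpt`, `defaultDerive` and `identityDeriver` are hypothetical. In particular, `defaultDerive` is only a placeholder for the secp256k1 derivation.

```go
package main

// commitKeys is a stripped-down stand-in for a commitment key ring.
type commitKeys struct {
	localKey, remoteKey [33]byte
}

// CommitKeyDeriverFunc: name from the PR, signature assumed here.
type CommitKeyDeriverFunc func(commitPoint [33]byte) commitKeys

// channel is a hypothetical stand-in for LightningChannel.
type channel struct {
	deriver CommitKeyDeriverFunc
}

// ChannelOpt is a functional option applied by newChannel.
type ChannelOpt func(*channel)

// WithCommitKeyDeriver replaces the default derivation function.
func WithCommitKeyDeriver(f CommitKeyDeriverFunc) ChannelOpt {
	return func(c *channel) { c.deriver = f }
}

// defaultDerive is a placeholder for the secp256k1-based derivation; it
// returns zero keys only so this sketch needs no crypto dependencies.
func defaultDerive(_ [33]byte) commitKeys { return commitKeys{} }

func newChannel(opts ...ChannelOpt) *channel {
	c := &channel{deriver: defaultDerive}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// deriveCommitmentKeys is the single internal call site, so swapping the
// deriver affects every commit round at once.
func (c *channel) deriveCommitmentKeys(p [33]byte) commitKeys {
	return c.deriver(p)
}

// identityDeriver avoids scalar multiplication by reusing the commitment
// point as every key.
func identityDeriver(p [33]byte) commitKeys {
	return commitKeys{localKey: p, remoteKey: p}
}
```

Routing every caller through one method is what makes the swap complete: a call site that still used the package-level derivation directly would silently keep paying the secp256k1 cost.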
createTestChannel started alicePool and bobPool but never stopped
them. During fuzzing this caused goroutines to leak on every fuzz
iteration. Register t.Cleanup handlers to call Stop() on both pools so
all workers are torn down when the test ends.

newMockRegistry started an InvoiceRegistry but never stopped it.
InvoiceRegistry internally starts two background goroutines
(invoiceEventLoop and the InvoiceExpiryWatcher mainLoop) that
run for the lifetime of the registry. Without a matching Stop()
call both goroutines leaked for every test that called
newMockRegistry, accumulating thousands of goroutines during
fuzzing.

Register a t.Cleanup to call registry.Stop() so both loops are
torn down when the test ends.
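Both leak fixes follow the same pattern: whatever starts goroutines registers its Stop with the test's cleanup list. `testing.T.Cleanup` runs registered functions in last-added, first-called order. The hypothetical sketch below imitates that with a small `cleanups` type so it runs outside `go test`, and uses a toy `pool` (not lnd's SigPool) whose Stop closes a quit channel and waits on a `sync.WaitGroup`, so that after cleanup no worker is still running.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// pool is a hypothetical worker pool standing in for SigPool (or for the
// background loops an InvoiceRegistry starts).
type pool struct {
	quit    chan struct{}
	wg      sync.WaitGroup
	running atomic.Int32 // workers that have not exited yet
}

// Start launches n workers that run until Stop is called.
func (p *pool) Start(n int) {
	p.quit = make(chan struct{})
	for i := 0; i < n; i++ {
		p.wg.Add(1)
		p.running.Add(1)
		go func() {
			defer p.wg.Done()
			defer p.running.Add(-1) // runs before wg.Done (defers are LIFO)
			<-p.quit                // a real worker would also select on a job queue
		}()
	}
}

// Stop signals every worker and blocks until all of them have exited.
func (p *pool) Stop() {
	close(p.quit)
	p.wg.Wait()
}

// cleanups imitates testing.T.Cleanup: functions run in last-added,
// first-called order when Run is invoked.
type cleanups []func()

func (c *cleanups) Add(f func()) { *c = append(*c, f) }

func (c *cleanups) Run() {
	for i := len(*c) - 1; i >= 0; i-- {
		(*c)[i]()
	}
	*c = nil
}
```

In a real test the equivalent is simply `t.Cleanup(alicePool.Stop)` right after the pool is started.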

Crypt-iQ left a comment

Ok, I took a (quick) look at the fuzz test. I've lost context and am short on time, so I can't give the detailed, line-by-line code review I'd like to give. I skipped over the commits that stub out the signing code; that seems good, because signing could be a bottleneck. Using a RAM disk is a good idea, since you want to avoid disk I/O.

My main comment is that the fuzz harness could use more of the fuzz input. For example, when a new fee is sent, it is computed as newFee := len(f.aliceLink.channel.ActiveHtlcs())*100 + 1000. I think most of the messages should instead be constructed from the fuzz input (apart from things like signatures, which would be obviously invalid if taken from the input; even those, though, you could sometimes deliberately make invalid). So instead of the fuzz input being only a sequence of events, it would be a sequence of events plus a byte slice that can be parsed into message fields. You could also keep accepting a single byte blob, as you do here, but parse it into events plus message fields.

Also, if possible, I think it'd be good to check whether the link can start up after being stopped. There have been several bugs over the years where the link couldn't start up properly due to reestablishment (and I think the other work-in-progress link harness found an issue just like this).

Finally, it would be a good idea to measure the coverage, look for obvious blind spots in this fuzzer, and then address them by adding extra events.
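One way to parse a single blob into events plus message fields is sketched below. The `Event` constants, `step`, `reader` and `decode` are hypothetical (only `type Event uint8` appears in the PR). Each event byte is followed by the bytes its fields consume, and once the input runs out the reader yields zeros, so every input decodes to some sequence. A real harness might clamp fee rates and amounts so most messages stay plausible while still occasionally sending extremes.

```go
package main

// Event mirrors the PR's `type Event uint8`; the constants are hypothetical.
type Event uint8

const (
	EventAddHTLC Event = iota
	EventUpdateFee
	EventCommit
	numEvents
)

// step is one decoded action plus the message fields it carries.
type step struct {
	Event   Event
	Amount  uint64 // msat, only meaningful for EventAddHTLC
	FeeRate uint32 // sat/kw, only meaningful for EventUpdateFee
}

// reader hands out input bytes and returns zero once they are used up.
type reader struct{ data []byte }

func (r *reader) next() byte {
	if len(r.data) == 0 {
		return 0
	}
	b := r.data[0]
	r.data = r.data[1:]
	return b
}

// u32 reads four bytes as a big-endian integer.
func (r *reader) u32() uint32 {
	var v uint32
	for i := 0; i < 4; i++ {
		v = v<<8 | uint32(r.next())
	}
	return v
}

// decode interleaves events with their fields: the fee rate now comes from
// the input rather than from the number of active HTLCs.
func decode(data []byte) []step {
	r := &reader{data: data}
	var steps []step
	for len(r.data) > 0 {
		s := step{Event: Event(r.next()) % numEvents}
		switch s.Event {
		case EventAddHTLC:
			s.Amount = uint64(r.u32())
		case EventUpdateFee:
			s.FeeRate = r.u32()
		}
		steps = append(steps, s)
	}
	return steps
}
```

For example, the bytes `01 00 00 03 e8 02` decode to an UpdateFee step with fee rate 1000 followed by a Commit step.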


// Generate the ChannelReestablish messages that each side needs to
// receive in order to complete the sync handshake.
aliceSyncMsg, err := alice.channel.State().ChanSyncMsg()

Crypt-iQ:
could fuzz these
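One way to fuzz these is to perturb the height fields of each generated ChannelReestablish using input bytes. The `reestablish` struct below only mirrors two fields of `lnwire.ChannelReestablish` (`NextLocalCommitHeight`, `RemoteCommitTailHeight`) so the sketch is self-contained; `mutateHeight` and `fuzzReestablish` are hypothetical helpers. Off-by-one changes roughly imitate a peer that is one state behind or ahead, and the larger jump imitates a peer claiming a state that was never reached.

```go
package main

// reestablish mirrors two height fields of lnwire.ChannelReestablish.
type reestablish struct {
	NextLocalCommitHeight  uint64
	RemoteCommitTailHeight uint64
}

// mutateHeight uses one selector byte to decide how to perturb h.
func mutateHeight(h uint64, sel byte) uint64 {
	switch sel % 4 {
	case 0:
		return h // keep the honest value
	case 1:
		return h + 1 // peer claims one state ahead
	case 2:
		if h > 0 {
			return h - 1 // peer claims one state behind
		}
		return h
	default:
		return h + uint64(sel) // larger jump derived from the input
	}
}

// fuzzReestablish perturbs both heights with independent selector bytes.
func fuzzReestablish(msg reestablish, sel1, sel2 byte) reestablish {
	msg.NextLocalCommitHeight = mutateHeight(msg.NextLocalCommitHeight, sel1)
	msg.RemoteCommitTailHeight = mutateHeight(msg.RemoteCommitTailHeight, sel2)
	return msg
}
```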

}

// Check total balances.
var aliceHtlcAmt, bobHtlcAmt lnwire.MilliSatoshi

Crypt-iQ:
Is it possible to assert that Alice's or Bob's balance equals a specific expected value? This works, but I'm wondering whether there's any way to detect loss of funds.
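A common way to detect funds loss is a reference ledger updated alongside the links, so that the real balances can be compared with expected values after each event. This is a hypothetical sketch: `ledger`, `htlcKey` and the local `MilliSatoshi` type (mirroring `lnwire.MilliSatoshi`, a uint64 msat amount) are illustrative. It deliberately ignores commitment fees, anchors and dust, all of which a real check would have to account for.

```go
package main

import "fmt"

// MilliSatoshi mirrors lnwire.MilliSatoshi (an amount in msat).
type MilliSatoshi uint64

// htlcKey identifies an HTLC; IDs are only unique per direction.
type htlcKey struct {
	fromAlice bool
	id        uint64
}

// ledger is a reference model that ignores fees, anchors and dust.
type ledger struct {
	capacity   MilliSatoshi
	alice, bob MilliSatoshi
	inFlight   map[htlcKey]MilliSatoshi
}

func newLedger(alice, bob MilliSatoshi) *ledger {
	return &ledger{capacity: alice + bob, alice: alice, bob: bob,
		inFlight: make(map[htlcKey]MilliSatoshi)}
}

// add moves amt from the sender's balance into an in-flight HTLC.
func (l *ledger) add(k htlcKey, amt MilliSatoshi) {
	if k.fromAlice {
		l.alice -= amt
	} else {
		l.bob -= amt
	}
	l.inFlight[k] = amt
}

// resolve pays the receiver on settle or refunds the sender on fail.
func (l *ledger) resolve(k htlcKey, settle bool) {
	amt, ok := l.inFlight[k]
	if !ok {
		return
	}
	delete(l.inFlight, k)
	// Alice gets the funds if she is the receiver of a settled HTLC or
	// the sender of a failed one.
	if settle != k.fromAlice {
		l.alice += amt
	} else {
		l.bob += amt
	}
}

// check verifies conservation and compares observed balances to the model.
func (l *ledger) check(gotAlice, gotBob MilliSatoshi) error {
	var pending MilliSatoshi
	for _, amt := range l.inFlight {
		pending += amt
	}
	if l.alice+l.bob+pending != l.capacity {
		return fmt.Errorf("model does not conserve funds")
	}
	if gotAlice != l.alice || gotBob != l.bob {
		return fmt.Errorf("balance mismatch: alice %d (want %d), bob %d (want %d)",
			gotAlice, l.alice, gotBob, l.bob)
	}
	return nil
}
```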

Comment thread htlcswitch/link_test.go
}

var preimage lntypes.Preimage
r, err := generateRandomBytes(sha256.Size)

Crypt-iQ:
non-deterministic?

MPins (Owner, Author):
It might be good to use a corpus input with HTLC ID and sender ID to deterministically generate the preimage.
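A minimal sketch of that suggestion, assuming a seed taken from the corpus is hashed together with the sender and HTLC IDs using SHA-256. `derivePreimage`, `paymentHash` and the local `Preimage` type (mirroring `lntypes.Preimage`, a 32-byte array) are illustrative. Because the output depends only on the inputs, replaying a crashing corpus entry reproduces the same payment hashes, which `crypto/rand`-based generation cannot do.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Preimage mirrors lntypes.Preimage (a 32-byte array).
type Preimage [32]byte

// derivePreimage hashes seed || senderID || htlcID (big-endian). The two
// trailing fields are fixed-length, so different inputs cannot collide
// through ambiguous concatenation.
func derivePreimage(seed []byte, senderID byte, htlcID uint64) Preimage {
	h := sha256.New()
	h.Write(seed)
	h.Write([]byte{senderID})
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], htlcID)
	h.Write(id[:])

	var p Preimage
	copy(p[:], h.Sum(nil))
	return p
}

// paymentHash is the SHA-256 of the preimage, as carried in UpdateAddHTLC.
func paymentHash(p Preimage) [32]byte {
	return sha256.Sum256(p[:])
}
```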

}

// Pick the oldest preimage Alice tracks and settle it on her
// link.

Crypt-iQ:
Unnecessary - can choose randomly?

MPins (Owner, Author), Apr 13, 2026:
It might be even better to use the fuzz input itself to decide which HTLC to settle.
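Using the input to choose the HTLC might look like the hypothetical `pickHTLC` below, where the map of tracked preimages is an assumption about how the harness stores them. The IDs are sorted first because Go randomizes map iteration order, and an unseeded random choice would make a crashing input impossible to replay. The modulo introduces a slight bias toward low indices, which is harmless for fuzzing.

```go
package main

import "sort"

// pickHTLC selects one tracked HTLC ID using a byte of fuzz input.
func pickHTLC(pending map[uint64][32]byte, sel byte) (uint64, bool) {
	if len(pending) == 0 {
		return 0, false
	}
	ids := make([]uint64, 0, len(pending))
	for id := range pending {
		ids = append(ids, id)
	}
	// Sort so the choice does not depend on map iteration order.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids[int(sel)%len(ids)], true
}
```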


// Guard against excessively long inputs that would make the
// test run too long.
if len(data) > 250 {

Crypt-iQ:
Bump this up? Sometimes very large inputs are interesting.

// applyEvent dispatches a single fuzz-generated event to the FSM for either
// Alice or Bob. Events that cannot be applied in the current state are silently
// skipped so the fuzzer can keep making progress without failing the test.
func (f *fuzzFSM) applyEvent(e Event) {

Crypt-iQ:
Sometimes it may be worth sending messages without doing any pre-checks: for example, sending an HTLC for which Bob hasn't created the hold invoice, sending a settle for an HTLC that doesn't exist, etc.

MPins (Owner, Author):
Yes, I’ll look into more of those unexpected events.


type Event uint8

const (

Crypt-iQ:
If possible, it would be good to have an event where the links restart, and then assert that they can still sync.

MPins (Owner, Author):
Yes, it’s part of the follow-up work.
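A restart event could be modelled as: record each link's state, drop the in-memory copy, reload from what was persisted, then check that nothing changed and that the peers can still agree. In this hypothetical sketch, `heights`, `link`, `restartEvent` and `reestablish` are toy stand-ins, and `skipPersist` simulates a bug where the remote height is never written. The sync rule reuses the assumption that at most one CommitSig can be outstanding per direction.

```go
package main

import "fmt"

// heights is the pair of commitment heights a link tracks.
type heights struct{ local, remote uint64 }

// link keeps an in-memory view (lost on restart) and a persisted one.
type link struct {
	mem, disk   heights
	skipPersist bool // simulated bug: remote height is not written
}

func (l *link) signCommit()    { l.mem.remote++; l.persist() }
func (l *link) receiveCommit() { l.mem.local++; l.persist() }

func (l *link) persist() {
	l.disk.local = l.mem.local
	if !l.skipPersist {
		l.disk.remote = l.mem.remote
	}
}

// restart discards in-memory state and reloads it from disk.
func (l *link) restart() { l.mem = l.disk }

// restartEvent restarts both links and checks that no state was lost.
func restartEvent(alice, bob *link) error {
	wantA, wantB := alice.mem, bob.mem
	alice.restart()
	bob.restart()
	if alice.mem != wantA || bob.mem != wantB {
		return fmt.Errorf("state lost across restart: alice %+v->%+v, bob %+v->%+v",
			wantA, alice.mem, wantB, bob.mem)
	}
	return reestablish(alice, bob)
}

// reestablish: what one side signed may lead what its peer received by
// at most one commitment.
func reestablish(alice, bob *link) error {
	for _, p := range [][2]uint64{
		{alice.mem.remote, bob.mem.local},
		{bob.mem.remote, alice.mem.local},
	} {
		signed, received := p[0], p[1]
		if received > signed || signed-received > 1 {
			return fmt.Errorf("cannot sync: signed %d, received %d", signed, received)
		}
	}
	return nil
}
```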

MPins (Owner, Author) commented Apr 13, 2026

Thank you @Crypt-iQ for your time. I’ll be addressing the comments above.


4 participants