
Conversation

@joostjager
Contributor

@joostjager joostjager commented Jan 21, 2026

Introduce ldk-server-chaos, a testing harness that stress-tests ldk-server by running multiple nodes, opening channels, and continuously sending payments while randomly killing and restarting nodes.

This tool readily reproduces the long-existing channel monitor/manager desync issue that can occur when a node is forcefully terminated at an inopportune moment during channel state updates. By using SIGKILL at random intervals across 3 nodes while 20 concurrent payment loops are running, the harness creates exactly the conditions that trigger such desyncs without making assumptions about specific timing or failure scenarios.

The test can verify potential fixes in a robust way: if payments continue flowing successfully across thousands of kill/restart cycles, there's strong evidence the fix is working. Failure is detected when any payment direction times out (no success for 60 seconds), typically indicating a desync has rendered channels unusable.
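A minimal sketch of that failure-detection rule, assuming one tracker per payment direction (names are illustrative, not the harness's actual code):

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-direction tracker: the run is declared failed as soon as
/// any payment direction sees no successful payment for `timeout`.
struct DirectionHealth {
    last_success: Instant,
    timeout: Duration,
}

impl DirectionHealth {
    fn new() -> Self {
        Self { last_success: Instant::now(), timeout: Duration::from_secs(60) }
    }

    /// Called whenever a payment in this direction succeeds.
    fn record_success(&mut self) {
        self.last_success = Instant::now();
    }

    /// True if this direction has stalled, typically because a channel
    /// monitor/manager desync has rendered its channels unusable.
    fn has_failed(&self) -> bool {
        self.last_success.elapsed() > self.timeout
    }
}
```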

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

@joostjager joostjager changed the title from "Chaos" to "Add chaos testing framework for ldk-server" on Jan 21, 2026
Adds a new endpoint to connect to a peer on the Lightning Network
without opening a channel. This is useful for establishing connections
before channel operations or for maintaining peer connectivity.

The endpoint accepts node_pubkey, address, and an optional persist flag
that defaults to true for automatic reconnection on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
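For illustration, a sketch of the request shape this commit message implies; the field names come from the message, but the struct itself and the endpoint wiring are assumptions, not the actual ldk-server API:

```rust
/// Hypothetical request body for the connect-peer endpoint described above.
struct ConnectPeerRequest {
    /// Hex-encoded public key of the peer to connect to.
    node_pubkey: String,
    /// Network address of the peer, e.g. "127.0.0.1:9735".
    address: String,
    /// Whether to persist the peer for automatic reconnection on restart
    /// (defaults to true when omitted).
    persist: Option<bool>,
}
```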
Introduce ldk-server-chaos, a testing harness that stress-tests ldk-server
by running multiple nodes, opening channels, and continuously sending
payments while randomly killing and restarting nodes.

Features:
- Spawns 3 ldk-server nodes with auto-generated configs
- Creates a fully connected channel topology
- Runs concurrent payment loops between all node pairs
- Randomly kills and restarts nodes to test resilience
- Tracks payment success rates and detects timeout failures
- Uses bitcoind RPC for on-chain operations and block generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
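A rough sketch of the kill/restart loop described above. The timing, node list, and ldk-server invocation are assumptions; the relevant point is that `Child::kill` delivers SIGKILL on Unix, so the node gets no chance to shut down cleanly:

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Cheap pseudo-randomness from the clock, just to keep this sketch
/// dependency-free; the real harness would use a proper RNG.
fn pseudo_random(upper: u64) -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as u64
        % upper
}

/// Illustrative chaos loop: pick a node at random, SIGKILL it, then restart
/// it from its existing config/data directory.
fn chaos_loop(mut nodes: Vec<(String, Child)>) -> ! {
    loop {
        sleep(Duration::from_millis(500 + pseudo_random(5_000)));
        let idx = pseudo_random(nodes.len() as u64) as usize;
        let (config_path, child) = &mut nodes[idx];
        // SIGKILL means no chance to flush channel manager state --
        // the desync scenario under test.
        let _ = child.kill();
        let _ = child.wait();
        *child = Command::new("ldk-server")
            .arg(config_path.as_str())
            .spawn()
            .expect("failed to restart ldk-server");
    }
}
```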
Reports successful payments/sec rate every 10 seconds along with
success count and percentage, helping monitor payment throughput
during chaos testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
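A sketch of that reporting loop, assuming the payment loops share atomic counters with a reporter thread (names are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::sleep;
use std::time::Duration;

/// Illustrative reporter: every 10 seconds, print successes/sec, the running
/// success count, and the success percentage.
fn report_loop(successes: Arc<AtomicU64>, attempts: Arc<AtomicU64>) {
    let mut last_successes = 0u64;
    loop {
        sleep(Duration::from_secs(10));
        let total_ok = successes.load(Ordering::Relaxed);
        let total = attempts.load(Ordering::Relaxed);
        let rate = (total_ok - last_successes) as f64 / 10.0;
        let pct = if total > 0 { 100.0 * total_ok as f64 / total as f64 } else { 0.0 };
        println!("{rate:.1} payments/sec, {total_ok}/{total} ok ({pct:.1}%)");
        last_successes = total_ok;
    }
}
```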
@joostjager joostjager self-assigned this Jan 22, 2026
joostjager and others added 3 commits January 23, 2026 11:28
- Add trusted_peers_0conf config option to ldk-server
- Configure chaos test nodes to trust each other for 0-conf channels
- Open 100 channels from Node 0 to Node 1 and 100 from Node 1 to Node 2
- Add assertions to verify channel counts after opening
- Use hard exit on Ctrl+C for reliable termination

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
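For context, a sketch of how the new `trusted_peers_0conf` server option could map onto ldk-node's `Config::trusted_peers_0conf` field; this mapping is an assumption, and the actual wiring in ldk-server may differ:

```rust
use ldk_node::bitcoin::secp256k1::PublicKey;
use ldk_node::Config;
use std::str::FromStr;

/// Illustrative mapping: pubkeys listed under the server's new
/// `trusted_peers_0conf` option are trusted for 0-conf, so channels opened
/// by those peers become usable before the funding transaction confirms.
fn apply_trusted_peers(config: &mut Config, trusted: &[String]) {
    config.trusted_peers_0conf = trusted
        .iter()
        .map(|pk| PublicKey::from_str(pk).expect("invalid peer pubkey"))
        .collect();
}
```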
- Add panic hook and Ctrl+C handler to kill all ldk-server processes on exit
- Open channels in batches of 4 to avoid LDK's MAX_UNFUNDED_CHANS_PER_PEER limit
- Unify channel opening loop for 0->1 and 1->2 directions
- Merge visible/usable channel waiting into single loop
- Increase node funding to 20 block rewards for more channels

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
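A sketch of the batched opening from the commit above, with the per-batch RPC calls left as placeholder closures; the batch size of 4 is chosen to stay within LDK's `MAX_UNFUNDED_CHANS_PER_PEER` limit:

```rust
/// Illustrative batch-open loop: `open_one` starts a single channel open and
/// `wait_batch_funded` blocks until the current batch is out of the unfunded
/// state. Both are placeholders for the harness's real RPC calls.
fn open_channels_in_batches(
    count: usize,
    batch_size: usize, // 4 in the chaos harness
    mut open_one: impl FnMut(),
    mut wait_batch_funded: impl FnMut(),
) {
    let mut opened = 0;
    while opened < count {
        let batch = batch_size.min(count - opened);
        for _ in 0..batch {
            open_one();
        }
        // Too many pending unfunded channels per peer get rejected by LDK,
        // so wait for this batch before opening the next one.
        wait_batch_funded();
        opened += batch;
    }
}
```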
Allows running payment loops without node restarts for baseline testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
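A minimal sketch of the flag handling, assuming plain argument scanning (the actual harness may use a CLI parser):

```rust
use std::env;

/// Illustrative flag check: `--no-chaos` runs the payment loops without the
/// kill/restart loop, giving a baseline success rate to compare against.
fn chaos_enabled() -> bool {
    !env::args().any(|arg| arg == "--no-chaos")
}
```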
@joostjager
Contributor Author

The chaos testing infrastructure was enhanced with zero-confirmation channel support, scaling to 200 channels across 3 nodes, with channels opened in batches to stay under LDK's per-peer unfunded channel limit.

Cleanup handlers were added for proper process termination on exit.

A --no-chaos flag enables baseline testing without random restarts, and payment amounts were reduced to 1 msat to avoid channel depletion.
