
Conversation

@joostjager
Contributor

Update MonitorUpdatingPersister and MonitorUpdatingPersisterAsync to queue persist operations in memory instead of writing immediately to disk. The Persist trait methods now return ChannelMonitorUpdateStatus::InProgress and the actual writes happen when flush() is called.

This fixes a race condition that could cause channel force closures: previously, if the node crashed after writing channel monitors but before writing the channel manager, the monitors would be ahead of the manager on restart. By deferring monitor writes until after the channel manager is persisted (via flush()), we ensure the manager is always at least as up-to-date as the monitors.

Key changes:

  • Add PendingWrite enum to represent queued write/remove operations
  • Add pending_writes queue to MonitorUpdatingPersisterAsyncInner
  • Add flush() to Persist trait and ChainMonitor
  • Update Persist impl to queue writes and return InProgress
  • Call flush() in background processor after channel manager persistence
  • Remove unused event_notifier from AsyncPersister
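
A rough sketch of the queuing pattern described above, reusing the `PendingWrite` and `pending_writes` names from the key changes; the surrounding struct and methods are illustrative placeholders under assumption, not the actual MonitorUpdatingPersister internals:

```rust
use std::sync::Mutex;

/// A queued persistence operation, buffered until `flush()` is called.
/// The field layout here is assumed for illustration.
enum PendingWrite {
	/// Write serialized monitor (or monitor update) bytes under a key.
	Write { key: String, data: Vec<u8> },
	/// Remove a previously written key.
	Remove { key: String },
}

/// Simplified stand-in for the persister's pending-write queue.
struct QueuedPersister {
	pending_writes: Mutex<Vec<PendingWrite>>,
}

impl QueuedPersister {
	/// Called from the Persist methods: buffer the operation so the caller can
	/// return ChannelMonitorUpdateStatus::InProgress instead of writing now.
	fn queue(&self, op: PendingWrite) {
		self.pending_writes.lock().unwrap().push(op);
	}

	/// Called after the channel manager has been persisted: drain the queue so
	/// the caller can apply each operation to the underlying KVStore.
	fn flush(&self) -> Vec<PendingWrite> {
		std::mem::take(&mut *self.pending_writes.lock().unwrap())
	}
}
```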

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

use core::mem;
use core::ops::Deref;
use core::pin::{pin, Pin};
use core::pin::pin;
Collaborator

I think we should be able to do this without touching persist.rs.

Contributor Author

Where would we do it then? Queue in ChainMonitor, in KVStore, or somewhere else?

Contributor Author

ChainMonitor would need to store the actual monitor data to defer writes, not just track update IDs as it does now. This means either cloning expensive ChannelMonitor objects or storing serialized bytes, which leaks persistence format details into ChainMonitor?
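
For illustration, deferring inside ChainMonitor would mean buffering something like the following per queued write (a purely hypothetical type, not an existing LDK struct), which is exactly the tradeoff described above:

```rust
/// Hypothetical: what ChainMonitor would have to hold per deferred write.
/// `M` stands in for the monitor type to keep the sketch self-contained.
enum DeferredMonitorWrite<M> {
	/// Option 1: clone the full monitor; ChannelMonitor is large, so cloning
	/// it on every update is expensive.
	ClonedMonitor(M),
	/// Option 2: serialize eagerly; cheap to hold, but ChainMonitor then knows
	/// the persister's on-disk format (key naming, encoding, versioning).
	SerializedBytes { key: String, bytes: Vec<u8> },
}
```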

@joostjager joostjager self-assigned this Jan 15, 2026
@joostjager joostjager force-pushed the mon-barrier branch 2 times, most recently from 2b85294 to 93ff6c9 on January 16, 2026 09:35
Update MonitorUpdatingPersister and MonitorUpdatingPersisterAsync to
queue persist operations in memory instead of writing immediately to
disk. The Persist trait methods now return
ChannelMonitorUpdateStatus::InProgress and the actual writes happen
when flush() is called.

This fixes a race condition that could cause channel force closures:
previously, if the node crashed after writing channel monitors but
before writing the channel manager, the monitors would be ahead of
the manager on restart. By deferring monitor writes until after the
channel manager is persisted (via flush()), we ensure the manager is
always at least as up-to-date as the monitors.

The flush() method takes an optional count parameter to flush only a
specific number of queued writes. The background processor captures
the queue size before persisting the channel manager, then flushes
exactly that many writes afterward. This prevents flushing monitor
updates that arrived after the manager state was captured.
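
In background-processor terms the ordering looks roughly like the sketch below; `MonitorQueue`, `persist_round`, and the stub closure are placeholders for the real ChainMonitor/persister plumbing, not LDK APIs:

```rust
/// Placeholder for the persister's queue as seen through ChainMonitor.
struct MonitorQueue {
	queued_writes: Vec<Vec<u8>>,
}

impl MonitorQueue {
	fn pending_write_count(&self) -> usize {
		self.queued_writes.len()
	}

	/// Flush only the first `count` writes; anything queued later waits for
	/// the next persistence round.
	fn flush(&mut self, count: usize) {
		for write in self.queued_writes.drain(..count) {
			// Hand `write` to the underlying KVStore here.
			let _ = write;
		}
	}
}

fn persist_round(queue: &mut MonitorQueue, persist_channel_manager: impl FnOnce()) {
	// 1. Capture the queue size before the manager state is serialized.
	let count = queue.pending_write_count();
	// 2. Persist the channel manager; it reflects at least `count` monitor updates.
	persist_channel_manager();
	// 3. Flush exactly those writes, keeping the on-disk manager at least as
	//    up-to-date as the on-disk monitors.
	queue.flush(count);
}
```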

Key changes:
- Add PendingWrite enum to represent queued write/remove operations
- Add pending_writes queue to MonitorUpdatingPersisterAsyncInner
- Add pending_write_count() and flush(count) to Persist trait and ChainMonitor
- ChainMonitor::flush() calls channel_monitor_updated for each completed write
- Update Persist impl to queue writes and return InProgress
- Call flush() in background processor after channel manager persistence
- Remove unused event_notifier from AsyncPersister

Co-Authored-By: Claude Opus 4.5 <[email protected]>
