-
Notifications
You must be signed in to change notification settings - Fork 182
fix: remove unique events logic from the ChainGetEvents API
#6286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughRemoved the IndicesStore and related column/impls, consolidated event retrieval to AMT-backed StampedEvent::get_events (preserving order and duplicates), updated ChainStore::new to drop the indices/db-writer parameter across call sites, added StateEvents and EVENTS_AMT_BITWIDTH, and added a v0.31.0 DB migration removing the Indices column. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
dda3266 to
a15e4e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/shim/executor.rs (1)
340-415: StampedEvent AMT loader looks good; watch serde wire format & small perf nitThe new
StampedEvent::get_eventshelper correctly handles both v4-first and v3-fallback AMT loading and preserves storage order, which aligns with the new state-manager tests. Two minor follow‑ups to consider:
#[serde(untagged)]onStampedEventchanges its serialized shape (no variant tag). Please double‑check that nothing persists or exchanges the old tagged representation; if there is such usage, a compatibility note or migration test would help.- If you expect large event AMTs, you can avoid reallocations by reserving
eventscapacity from the AMT’s element count beforefor_each.src/state_manager/tests.rs (1)
341-406: Cache tests are correct; consider aligning events_root construction with productionThe receipt/events cache tests behave correctly:
update_cache_with_state_outputonly uses the providedeventsvector and treatsevents_rootas an opaque identifier, so building the events root with Amtv0 is safe here. For future clarity (especially now that production writes events into a specific-bitwidth AMT), it may be worth switching this test’s events_root construction to the same AMT flavor/bitwidth used in the main code, or adding a comment noting that the root isn’t dereferenced in this test.src/state_manager/mod.rs (1)
99-105: StateEvents and cache update logic look coherent; consider asserting vector invariantsMaking
StateEventspublic and wiringupdate_cache_with_state_outputto insert both events and their roots into the receipt/event cache handler matches howtipset_state_eventsand the RPC layer consume this data. One implicit assumption is thatevents.len()andevents_roots.len()always stay equal; given other code zips these two vectors, it may be worth adding a debug/assert (either when creatingStateEventsor before caching) to catch any future mismatch early.Also applies to: 511-528
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/chain/store/chain_store.rs(1 hunks)src/rpc/methods/chain.rs(1 hunks)src/rpc/methods/eth/filter/mod.rs(1 hunks)src/shim/executor.rs(3 hunks)src/state_manager/mod.rs(4 hunks)src/state_manager/tests.rs(4 hunks)src/tool/subcommands/api_cmd/test_snapshots.txt(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-02T14:23:53.808Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 5923
File: src/tool/subcommands/api_cmd/test_snapshots.txt:206-252
Timestamp: 2025-09-02T14:23:53.808Z
Learning: In the Forest project, .rpcsnap.json.zst snapshot files listed in src/tool/subcommands/api_cmd/test_snapshots.txt are not stored in the repository itself but are downloaded from a DigitalOcean (DO) bucket when needed. The manifest file serves as an index/catalog of available snapshots.
Applied to files:
src/tool/subcommands/api_cmd/test_snapshots.txt
📚 Learning: 2025-08-25T14:11:10.790Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5978
File: .github/workflows/unit-tests.yml:86-96
Timestamp: 2025-08-25T14:11:10.790Z
Learning: hanabi1224's RPC snapshot tests include a checksum validation mechanism that compares against HTTP HEAD headers, automatically re-downloading snapshots if the cached version doesn't match, providing protection against corrupted cached files.
Applied to files:
src/tool/subcommands/api_cmd/test_snapshots.txt
📚 Learning: 2025-07-17T15:21:40.753Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.753Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like `impl GetSize for MyType {}` is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.
Applied to files:
src/shim/executor.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
src/rpc/methods/chain.rs
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/state_manager/tests.rssrc/state_manager/mod.rs
🧬 Code graph analysis (2)
src/chain/store/chain_store.rs (2)
src/shim/executor.rs (1)
key(305-310)src/blocks/tipset.rs (2)
key(337-340)key(512-515)
src/state_manager/tests.rs (1)
src/shim/executor.rs (4)
events_root(180-186)event(383-388)events(118-124)get_events(392-414)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
🔇 Additional comments (7)
src/tool/subcommands/api_cmd/test_snapshots.txt (1)
7-7: The snapshot filefilecoin_chaingetevents_1764864316078100.rpcsnap.json.zstis correctly listed in the manifest and is already available in the DigitalOcean bucket (https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/). The change is valid and the test suite will be able to download this snapshot at runtime.src/rpc/methods/eth/filter/mod.rs (1)
379-395: Preserving AMT event duplicates and order is correctThe updated
collect_chain_eventsflow (filter by root, then flatten without dedup) now faithfully reflects the AMT contents, including duplicates and order, which is consistent with the new events tests and the API goal of returning all emitted events.src/chain/store/chain_store.rs (1)
201-208: Renamed tipset-key lookup clarifies events-root usageThe new
get_tipset_key_by_events_rootwrapper keeps the previous behavior while making the intent (lookup by events root) explicit, which matches its use fromChainGetEvents.src/rpc/methods/chain.rs (1)
254-263: ChainGetEvents now correctly resolves via events-root indexSwitching to
get_tipset_key_by_events_rootand updating the error message to reference the tipset for the given events root aligns the RPC with the new indexing logic; the subsequent tipset load and event collection remain consistent.src/state_manager/mod.rs (3)
471-595: Event-root indexing and blockstore fallback path are sensible; confirm all relevant tipsets pass through hereThe new logic in
tipset_state_outputandtipset_state_eventsgives a clear pipeline:
- When computing state, you index each non‑
Noneevents_root to itsTipsetKey.- When serving events, you first try the cache, then load from the events AMT in the blockstore (when an
events_rootis provided), and only fall back to recomputing state if that fails.This is a good layering and matches how
ChainGetEventsnow resolves root → tipset. One thing to verify is that, for all tipsets whose events you expect clients to query by root, there is at least one code path that callstipset_state_output(and thus writes these indices). If some validation or background paths only ever callapply_block_messagesdirectly without going throughtipset_state_output, their events_root mappings won’t be written and ChainGetEvents for those particular roots will correctly, but perhaps unexpectedly, return the “can't find tipset for events root …” error.
1898-2034: Receipt and events AMT construction/verification in apply_block_messages is well-structuredWithin
apply_block_messages, constructing the receipts root from the VM‑returned receipts and then, separately, materializing each non‑empty events AMT and asserting that its derived root matches the VM‑provided root gives you:
- A receipts root consistent with what
Receipt::get_receiptsexpects to read.- Events AMTs that are actually persisted, with an explicit equality check guarding against any divergence between how the VM computes roots and how the node rebuilds them (including bitwidth configuration).
The error path on an events‑root mismatch is clear and will surface early during tipset execution, which is appropriate for a consensus‑sensitive invariant.
1769-1819: validate_tipsets now bails cleanly on state mismatchSwitching
validate_tipsetsto usebail!("state mismatch")when the expected and actual state/receipt roots differ simplifies the error path without changing behavior, and the surrounding logging still records all the relevant details for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/state_manager/tests.rs (1)
437-625: Event AMT tests comprehensively cover duplicates, order, empties, and v3/v4 compatibilityThe new helpers and tests look solid:
create_raw_event_v3/v4generate deterministic RAW IPLD events, which is ideal for CID equality checks.test_events_store_and_retrieve_basic,test_events_entries_are_preserved_when_duplicates_are_stored, andtest_events_preserve_orderdirectly exerciseStampedEvent::get_eventsand confirm the “no dedup, preserve insertion order” behavior that this PR is aiming for.test_events_same_content_same_cidandtest_identical_events_produce_same_roottogether validate that identical event sequences yield identical AMT roots and remain readable throughget_events.test_events_empty_listandtest_events_v3_store_and_retrievenicely cover the empty‑root and v3‑compat paths.One small, purely optional nit:
test_events_same_content_same_cidandtest_identical_events_produce_same_rootoverlap quite a bit; you could fold them into a single test that asserts both CID equality and successful retrieval to reduce duplication, but it's not required.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
scripts/tests/api_compare/filter-list(0 hunks)scripts/tests/api_compare/filter-list-offline(0 hunks)src/state_manager/tests.rs(4 hunks)
💤 Files with no reviewable changes (2)
- scripts/tests/api_compare/filter-list-offline
- scripts/tests/api_compare/filter-list
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/state_manager/tests.rs
🧬 Code graph analysis (1)
src/state_manager/tests.rs (1)
src/shim/executor.rs (4)
events_root(180-186)event(383-388)events(118-124)get_events(392-414)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: All lint checks
🔇 Additional comments (2)
src/state_manager/tests.rs (2)
14-14: Importing bothAmtandAmtv0keeps tests aligned with runtime AMT variantsUsing
Amtfor the newEVENTS_AMT_BITWIDTH‑backed event AMTs andAmtv0for legacy roots (receipts / earlier tests) makes the intent explicit and avoids alias confusion; usage below is consistent with this split. No changes needed here, just noting the clarity.
351-352:Amtv0::new_from_itercorrectly mirrors v0‑formatted roots in cache testsSwitching both the mock
events_rootandreceipt_rootconstruction toAmtv0::new_from_iterensures these tests only pass when the blockstore actually contains a valid v0 AMT, matching how roots are stored/loaded in the state manager. This gives the cache tests more realistic coverage without overcomplicating them.Also applies to: 388-389
b1ae8f4 to
ad77eee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/state_manager/mod.rs (2)
511-528: tipset_state_events fast path currently poisons the events cache with partial dataWhen
events_rootis provided, the new logic:
- Checks the cache via
get_events_or_else.- Inside the supplier closure, tries to load only the events for that root via
StampedEvent::get_events, and if found, returns aStateEventscontaining a singleVec<StampedEvent>and a singleSome(root).Because this
StateEventsis then cached under the tipset key, later callers that expect a full tipset view (e.g.,collect_eventswithevents_root = None) will:
- Pull this truncated
StateEventsfrom the cache.- Hit the new length check in
collect_events(messages.len() == events.len()), and fail with a length mismatch.This can manifest whenever
Filecoin.ChainGetEventsis called for a tipset before any filter-based event queries.A safer pattern would be:
- Use the blockstore fast path outside of
get_events_or_elseand do not insert that partialStateEventsinto the cache, or- Restrict the cache to always store the full per-tipset
events/rootsproduced bycompute_tipset_state, regardless of whether anevents_roothint is present.For example, you could restructure
tipset_state_eventsroughly as:@@ - let key = tipset.key(); - let ts = tipset.clone(); - let this = Arc::clone(self); - let cids = tipset.cids(); - let events_root = events_root.cloned(); - self.receipt_event_cache_handler - .get_events_or_else( - key, - Box::new(move || { - Box::pin(async move { - // Try to load events directly from the blockstore - if let Some(stamped_events) = events_root - .as_ref() - .and_then(|root| StampedEvent::get_events(this.blockstore(), root).ok()) - .filter(|events| !events.is_empty()) - { - return Ok(StateEvents { - events: vec![stamped_events], - roots: vec![events_root], - }); - } - - // Fallback: compute the tipset state if events not found in the blockstore - let state_out = this - .compute_tipset_state(ts, NO_CALLBACK, VMTrace::NotTraced) - .await?; - trace!("Completed tipset state calculation {:?}", cids); - Ok(StateEvents { - events: state_out.events, - roots: state_out.events_roots, - }) - }) - }), - ) - .await + let events_root = events_root.cloned(); + + // Fast path: if a specific events root is requested and the AMT exists, + // serve it directly without mutating the per‑tipset cache. + if let Some(root) = events_root.as_ref() { + if let Ok(stamped_events) = StampedEvent::get_events(self.blockstore(), root) { + if !stamped_events.is_empty() { + return Ok(StateEvents { + events: vec![stamped_events], + roots: vec![events_root], + }); + } + } + } + + let key = tipset.key(); + let ts = tipset.clone(); + let this = Arc::clone(self); + let cids = tipset.cids(); + self.receipt_event_cache_handler + .get_events_or_else( + key, + Box::new(move || { + Box::pin(async move { + let state_out = this + .compute_tipset_state(ts, NO_CALLBACK, VMTrace::NotTraced) + .await?; + trace!("Completed tipset state calculation {:?}", cids); + Ok(StateEvents { + events: state_out.events, + roots: state_out.events_roots, + }) + }) + }), + ) + .awaitThis keeps the cache invariant (“cached StateEvents represent the full tipset”) while still leveraging the AMT-backed fast path for
ChainGetEvents.Also applies to: 555-595
199-205: Verify that event index insertion is not bypassed in future state computation paths
tipset_state_outputindexes events roots viaput_indexafter computing state, but this indexing only occurs in two paths:
- Main RPC path via
tipset_state_output(line 497)- Backfill utility via
process_tsin db_util.rsDirect calls to
compute_tipset_stateorapply_block_messagesoutside these two paths—such as inexecution_trace, validation, or tool commands—do not trigger indexing. While current production paths (sync viatipset_syncer, RPC viatipset_state) route throughtipset_state_outputproperly, future code that calls these functions directly risks creating persisted state without index entries, breakingChainGetEventsresolution.Consider either consolidating all state computations through
tipset_state_output, or moving theput_indexcall to the point where events roots are produced (e.g., inapply_block_messagesaftervm.flush()), to guarantee indexing regardless of caller.
🧹 Nitpick comments (3)
src/rpc/methods/eth/filter/mod.rs (1)
281-291: Stricter message/events length check is appropriateThe added
ensure!with concrete lengths will fail fast instead of silently zipping mismatchedmessagesandevents, which is much safer for debugging and prevents misaligned event attribution.src/state_manager/tests.rs (1)
437-625: Event AMT tests give good coverage of storage, order, duplicates, and v3/v4 handlingThe new helpers (
create_raw_event_v3/v4) and tests comprehensively cover:
- Basic store/retrieve via
EVENTS_AMT_BITWIDTHandStampedEvent::get_events.- Duplicate events (consecutive and non‑consecutive) preserving count and position.
- Strict order preservation across multiple entries.
- Identical content producing identical root CIDs.
- Empty-event lists round‑tripping correctly.
- V3-encoded events still being readable through
StampedEvent::get_events.The slight overlap between
test_events_same_content_same_cidandtest_identical_events_produce_same_rootis acceptable and keeps intent clear.src/state_manager/mod.rs (1)
1899-2035: Receipt and events AMT construction in apply_block_messages is consistent and well-guardedThe updated
apply_block_messagespipeline:
- Uses
Amtv0::new_from_iterto buildreceipt_rootfrom the receipts vector, aligning with the tests that now construct receipts usingAmtv0.- Iterates
(events, events_roots)and, when anevents_rootis present, reconstructs the AMT viaAmt::new_from_iter_with_bit_widthandEVENTS_AMT_BITWIDTH, thenensure!s that the derived root matches the FVM-provided root.That
ensure!gives a clear, early failure if the bitwidth or serialization diverge from what the VM produced, which is valuable for catching subtle incompatibilities at state-application time.As long as
EVENTS_AMT_BITWIDTHstays coordinated with the bitwidth used when events roots are produced, this looks sound.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/chain/store/chain_store.rs(1 hunks)src/rpc/methods/chain.rs(1 hunks)src/rpc/methods/eth/filter/mod.rs(3 hunks)src/shim/executor.rs(3 hunks)src/state_manager/mod.rs(4 hunks)src/state_manager/tests.rs(4 hunks)src/tool/subcommands/api_cmd/test_snapshots.txt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/tool/subcommands/api_cmd/test_snapshots.txt
- src/shim/executor.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 5923
File: src/rpc/registry/actors/miner.rs:221-223
Timestamp: 2025-09-02T10:05:34.350Z
Learning: For miner actor ChangeOwnerAddress and ChangeOwnerAddressExported methods: versions 8-10 use bare Address as parameter type, while versions 11+ use ChangeOwnerAddressParams. This reflects the evolution of the Filecoin miner actor parameter structures across versions.
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/state_manager/tests.rssrc/state_manager/mod.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
src/rpc/methods/chain.rs
🧬 Code graph analysis (2)
src/chain/store/chain_store.rs (2)
src/shim/executor.rs (1)
key(305-310)src/blocks/tipset.rs (2)
key(337-340)key(512-515)
src/state_manager/mod.rs (2)
src/shim/executor.rs (5)
events_root(180-186)key(305-310)new(254-266)get_events(392-414)events(118-124)src/state_manager/cache.rs (3)
get_events(152-152)get_events(206-208)get_events(263-265)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
🔇 Additional comments (5)
src/chain/store/chain_store.rs (1)
201-208: Method rename to clarify events-root index usage looks goodRenaming to
get_tipset_key_by_events_rootbetter reflects how the indices store is used here; behavior remains a straight passthrough toindices.read_objand is consistent with the ChainGetEvents call site.src/rpc/methods/chain.rs (1)
248-265: ChainGetEvents now resolves tipset explicitly via events-root indexSwitching to
get_tipset_key_by_events_rootwith the updated error context accurately describes the failure mode while keeping the rest of the flow (loading the tipset and collecting events) unchanged.src/rpc/methods/eth/filter/mod.rs (1)
369-403: ChainGetEvents now preserves duplicate events and order as stored in AMTThe new
ensure!onstate_events.roots.len() == state_events.events.len()plus removal of any deduplication ensures:
- You only proceed when roots and event vectors are structurally consistent.
- All events under the selected root (including intentional duplicates) are returned in AMT order.
This aligns with the stated goal of matching Lotus’ behavior for
ChainGetEvents.src/state_manager/tests.rs (1)
340-407: Receipt/events cache tests align with new AMTv0 usageUsing
Amtv0::new_from_iterto buildreceipt_rootand the mockevents_rootmatches the production change to useAmtv0for receipts while treating event roots as opaqueCids in the cache. The tests correctly assert that empty events do not populate the cache and that a valid AMT-backedreceipt_rootyields one cached receipt.src/state_manager/mod.rs (1)
93-105: StateEvents layout and sizing helpers look consistentMaking
StateEventspublic withevents: Vec<Vec<StampedEvent>>androots: Vec<Option<Cid>>, plus theGetSizeannotations, matches how it’s consumed by the cache layer and RPC code. The explicit heap-size helpers for nested vectors should keep cache accounting accurate.
51357fe to
35ac348
Compare
|
DB migration check in CI is failing because I've removed the I was thinking, if I can get some comments on the PR, to finalise before the migration, it would be helpful 🙏 @hanabi1224 @LesnyRumcajs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/state_manager/mod.rs (1)
1985-2006: Consider adding a length check before the zip operation.The
zipat line 1989 will silently truncate ifeventsandevents_rootshave different lengths. Whileapply_block_messagesshould return consistent lengths, adding an explicit check would catch bugs early.// step 5: construct receipt root from receipts let receipt_root = Amtv0::new_from_iter(chain_index.db(), receipts)?; // step 6: store events AMTs in the blockstore + ensure!( + events.len() == events_roots.len(), + "Events count ({}) and events_roots count ({}) mismatch", + events.len(), + events_roots.len() + ); for (msg_events, events_root) in events.iter().zip(events_roots.iter()) {src/rpc/methods/eth/filter/mod.rs (1)
368-383: LGTM onget_events_by_event_rootimplementation.The function correctly loads events via
StampedEvent::get_eventsfrom the blockstore without deduplication, aligning with Lotus behavior. The error message at line 377 provides useful context.Consider simplifying the error handling pattern:
- let state_events = - match StampedEvent::get_events(ctx.chain_store().blockstore(), events_root) { - Ok(e) => e, - Err(e) => { - return Err(anyhow::anyhow!("load events amt: {}", e)); - } - }; + let state_events = StampedEvent::get_events(ctx.chain_store().blockstore(), events_root) + .context("load events amt")?;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
src/benchmark_private/tipset_validation.rs(0 hunks)src/chain/store/chain_store.rs(3 hunks)src/chain_sync/chain_follower.rs(0 hunks)src/daemon/context.rs(0 hunks)src/daemon/db_util.rs(1 hunks)src/db/car/many.rs(1 hunks)src/db/memory.rs(1 hunks)src/db/mod.rs(0 hunks)src/db/parity_db.rs(1 hunks)src/libp2p/chain_exchange/provider.rs(0 hunks)src/rpc/methods/chain.rs(1 hunks)src/rpc/methods/eth/filter/mod.rs(3 hunks)src/rpc/methods/sync.rs(1 hunks)src/state_manager/mod.rs(4 hunks)src/state_manager/tests.rs(4 hunks)src/tool/offline_server/server.rs(0 hunks)src/tool/subcommands/api_cmd/api_compare_tests.rs(0 hunks)src/tool/subcommands/api_cmd/generate_test_snapshot.rs(1 hunks)src/tool/subcommands/api_cmd/test_snapshot.rs(2 hunks)src/tool/subcommands/api_cmd/test_snapshots.txt(1 hunks)src/tool/subcommands/index_cmd.rs(0 hunks)src/tool/subcommands/state_compute_cmd.rs(0 hunks)
💤 Files with no reviewable changes (9)
- src/tool/offline_server/server.rs
- src/libp2p/chain_exchange/provider.rs
- src/tool/subcommands/state_compute_cmd.rs
- src/daemon/context.rs
- src/db/mod.rs
- src/tool/subcommands/index_cmd.rs
- src/benchmark_private/tipset_validation.rs
- src/tool/subcommands/api_cmd/api_compare_tests.rs
- src/chain_sync/chain_follower.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/tool/subcommands/api_cmd/test_snapshots.txt
- src/state_manager/tests.rs
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-17T14:24:47.046Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.
Applied to files:
src/db/memory.rssrc/daemon/db_util.rssrc/tool/subcommands/api_cmd/generate_test_snapshot.rssrc/tool/subcommands/api_cmd/test_snapshot.rs
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/state_manager/mod.rssrc/db/car/many.rssrc/tool/subcommands/api_cmd/generate_test_snapshot.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
src/rpc/methods/chain.rssrc/daemon/db_util.rssrc/rpc/methods/eth/filter/mod.rssrc/tool/subcommands/api_cmd/test_snapshot.rs
📚 Learning: 2025-08-25T13:35:24.230Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5969
File: src/tool/subcommands/snapshot_cmd.rs:412-412
Timestamp: 2025-08-25T13:35:24.230Z
Learning: In src/tool/subcommands/snapshot_cmd.rs, the +1 in `last_epoch = ts.epoch() - epochs as i64 + 1` fixes an off-by-1 bug where specifying --check-stateroots=N would validate N+1 epochs instead of N epochs, causing out-of-bounds errors when the snapshot contains only N recent state roots.
Applied to files:
src/tool/subcommands/api_cmd/test_snapshot.rs
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/tool/subcommands/api_cmd/test_snapshot.rs
🧬 Code graph analysis (8)
src/db/memory.rs (3)
src/chain/store/chain_store.rs (1)
db(503-503)src/chain/store/index.rs (1)
db(48-50)src/rpc/methods/state.rs (1)
db(1443-1444)
src/rpc/methods/chain.rs (3)
src/rpc/methods/eth/filter/mod.rs (1)
get_events_by_event_root(369-383)src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)
ctx(99-151)src/tool/subcommands/api_cmd/test_snapshot.rs (1)
ctx(122-169)
src/db/car/many.rs (3)
src/db/memory.rs (2)
SettingsStoreExt(32-32)SettingsStoreExt(156-156)src/db/parity_db.rs (1)
SettingsStoreExt(177-177)src/chain/mod.rs (1)
SettingsStoreExt(45-45)
src/rpc/methods/sync.rs (1)
src/chain/store/chain_store.rs (4)
new(117-142)new(567-571)db(503-503)chain_config(247-249)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (5)
src/db/car/many.rs (1)
SettingsStoreExt(256-256)src/db/memory.rs (2)
SettingsStoreExt(32-32)SettingsStoreExt(156-156)src/db/parity_db.rs (1)
SettingsStoreExt(177-177)src/chain/mod.rs (1)
SettingsStoreExt(45-45)src/db/mod.rs (1)
open_db(283-285)
src/chain/store/chain_store.rs (3)
src/chain/store/index.rs (2)
db(48-50)new(42-46)src/state_manager/mod.rs (2)
new(181-183)chain_config(359-361)src/daemon/context.rs (1)
chain_config(65-67)
src/rpc/methods/eth/filter/mod.rs (1)
src/shim/executor.rs (3)
events(118-124)events_root(180-186)get_events(392-414)
src/tool/subcommands/api_cmd/test_snapshot.rs (1)
src/chain/store/chain_store.rs (6)
db(503-503)eth_mappings(180-182)eth_mappings(196-198)new(117-142)new(567-571)chain_config(247-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
🔇 Additional comments (25)
src/db/car/many.rs (1)
14-15: LGTM - Import cleanup aligns with IndicesStore removal.The consolidated import statement correctly removes
IndicesStorewhile retaining all other necessary traits forManyCarfunctionality.src/db/memory.rs (1)
6-6: LGTM - Clean removal of IndicesStore import.The import change correctly reflects the removal of the
IndicesStoretrait implementation fromMemoryDB.src/chain/store/chain_store.rs (3)
9-9: LGTM - Import cleanup consistent with IndicesStore removal.The import correctly removes
IndicesStoreandIndicesStoreExtwhile retainingEthMappingsStoreandEthMappingsStoreExtwhich are still used for Ethereum mapping functionality.
743-744: Test correctly updated for new ChainStore::new signature.The test now passes 5 arguments matching the updated constructor:
(db, heaviest_tipset_key_provider, eth_mappings, chain_config, genesis_block_header).
758-758: LGTM - Consistent with other test updates.src/rpc/methods/sync.rs (1)
189-191: LGTM - Test correctly updated for new ChainStore::new signature.The
ctx()function now constructsChainStorewith the updated 5-argument signature, consistent with the removal of theIndicesStoreparameter.src/tool/subcommands/api_cmd/test_snapshot.rs (2)
55-64: Simplified backfill logic using let-chains.The refactored function cleanly chains the conditionals. Note that
try_write()will silently skip the backfill if the lock is already held elsewhere. This is likely acceptable for test scenarios but worth being aware of.
134-140: Improved error handling with?operator.Good change - using
?instead ofunwrap()properly propagates errors through theanyhow::Resultreturn type.src/daemon/db_util.rs (1)
332-334: Event root indexing removed from tipset processing.The
compute_tipset_stateresult is no longer unpacked to extract and indexevents_roots. This aligns with the PR objective of serving events directly from the blockstore rather than maintaining a separate index. Events are now stored as part of state computation and retrieved viaStampedEvent::get_eventswhen needed.src/rpc/methods/chain.rs (2)
254-256: Simplified event retrieval aligned with Lotus behavior.The handler now directly fetches events from the blockstore via
EthEventHandler::get_events_by_event_root. This removes the previous tipset-resolution flow and duplicate-event logic that caused issue #6271. If events aren't present in the blockstore, the call will error out with an AMT load error, matching Lotus's behavior.
1610-1618: Test helper correctly updated for new ChainStore signature.The
_loadhelper now constructsChainStorewith the updated 5-argument signature, using separateMemoryDBinstances forheaviest_tipset_key_providerandeth_mappings.src/tool/subcommands/api_cmd/generate_test_snapshot.rs (3)
13-14: LGTM on import updates.The imports are correctly updated to remove
IndicesStorewhile retaining the necessaryEthMappingsStore,SettingsStore, and related types for the tracking store functionality.
83-97: LGTM on simplifiedbuild_indexfunction.The function now only builds the
eth_mappingsindex from the tracker, which aligns with the removal of the indices store column. The logic correctly returnsNonewhen no mappings are present.
112-121: VerifyChainStore::newsignature consistency.The constructor takes 5 parameters:
db(Arc),heaviest_tipset_key_provider(Arc<dyn HeaviestTipsetKeyProvider + Sync + Send>),eth_mappings(Arc<dyn EthMappingsStore + Sync + Send>),chain_config, andgenesis_block_header. All call sites consistently pass the same db instance for the first three parameters, which is valid since the concrete db type implements all required traits.src/db/parity_db.rs (3)
4-4: LGTM on import simplification.The
IndicesStoreimport is correctly removed, leaving only the necessary store traits.
23-42: LGTM onDbColumnenum update.The
Indicesvariant is cleanly removed. The remaining columns (GraphDagCborBlake2b256,GraphFull,Settings,EthMappings,PersistentGraph) are properly ordered and documented.
44-81: LGTM oncreate_column_optionsupdate.The match arms correctly handle all remaining column variants with appropriate options (preimage, btree_index, compression settings).
src/state_manager/mod.rs (6)
64-64: LGTM on import update.Adding
ensurefromanyhowis appropriate for the new validation logic in step 6.
99-105: LGTM onStateEventsstruct.The struct appropriately holds paired
eventsandrootsvectors with proper derives for debugging, cloning, and memory tracking.
550-576: LGTM ontipset_state_eventsfallback logic.The method now properly falls back to computing the tipset state when events are not found in cache, returning a
StateEventsstruct with both events and roots.
1992-2003: Good defensive check for AMT root consistency.The
ensure!macro correctly validates that the derived event root matches the FVM-computed root. This is essential for detecting any discrepancies in event serialization or AMT construction.
94-94: EVENTS_AMT_BITWIDTH value is verified against FVM at runtime.The bitwidth of 5 is already validated through the equality check at lines 2000-2001, where the derived event root is compared against the FVM-computed root. If this constant were misaligned with FVM's event AMT construction, the
ensure!()check would fail. No additional verification needed.
71-71: Confirm the intentionality of usingAmtv0for receipt roots andAmtfor event roots.The code uses
Amtv0::new_from_iterfor receipt construction andAmt::new_from_iter_with_bit_widthfor events. WhileAmtv0is documented as the legacy AMT encoding (for backward compatibility) andAmtis the current encoding, the codebase lacks explicit documentation explaining why receipts specifically requireAmtv0. Consider adding a comment referencing the Lotus behavior or actor specification that mandates this version distinction.src/rpc/methods/eth/filter/mod.rs (2)
40-40: LGTM onStampedEventimport.The import is correctly added to support the new
get_events_by_event_rootimplementation that loads events directly viaStampedEvent::get_events.
283-290: LGTM on improved error messages.The error message now includes the concrete lengths for better debugging when there's a mismatch between messages and events.
Before moving forward without the indices column, I'd like to first see some numbers showing the performance impact for affected methods. |
| match StampedEvent::get_events(ctx.chain_store().blockstore(), events_root) { | ||
| Ok(e) => e, | ||
| Err(e) => { | ||
| return Err(anyhow::anyhow!("load events amt: {}", e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| return Err(anyhow::anyhow!("load events amt: {}", e)); | |
| bail!("load events amt: {e}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, probably let state_events = bla.context("failed getting events")?; would make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current error doesn't make much sense but I was trying to match lotus API level error , hence this
| ) -> anyhow::Result<Vec<StampedEvent>> { | ||
| let mut events = Vec::new(); | ||
|
|
||
| // Try StampedEvent_v4 first (StampedEvent_v4 and StampedEvent_v3 are identical, use v4 here) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there other versions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, for events there is only two versions , V3 and V4,
Line 342 in f4008dd
| pub enum StampedEvent { |
src/shim/executor.rs
Outdated
|
|
||
| // Try StampedEvent_v4 first (StampedEvent_v4 and StampedEvent_v3 are identical, use v4 here) | ||
| if let Ok(amt) = Amt::<StampedEvent_v4, _>::load(events_root, db) { | ||
| amt.for_each(|_, event| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for_each_cacheless is a better option yielding faster iteration if the AMT is not iterated multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| // step 5: construct receipt root from receipts and flush the state-tree | ||
| let receipt_root = Amt::new_from_iter(chain_index.db(), receipts)?; | ||
| // step 5: construct receipt root from receipts | ||
| let receipt_root = Amtv0::new_from_iter(chain_index.db(), receipts)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why v0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier it was V0 only, but we renamed it to Amt .
It was being used like this
use fil_actors_shared::fvm_ipld_amt::Amtv0 as Amt;
| // step 5: construct receipt root from receipts | ||
| let receipt_root = Amtv0::new_from_iter(chain_index.db(), receipts)?; | ||
|
|
||
| // step 6: store events AMTs in the blockstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there corresponding Lotus logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah , lotus has similar logic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/shim/executor.rs (1)
390-414:get_eventsimplementation matches goals (v4‑first, v3 fallback, ordered, no dedup)The new
StampedEvent::get_eventscorrectly:
- Loads from the events AMT root via
Amt<StampedEvent_v4>first, then cleanly falls back toAmt<StampedEvent_v3>on error.- Uses
for_each_cacheless, which is appropriate for single‑pass iteration and preserves AMT index order, so events are returned in the stored order with duplicates intact (aligning with the new ChainGetEvents behavior).If you expect very large event sets, you could optionally pre‑allocate
eventswithevents.reserve(amt.count() as usize)before iteration for a minor perf win, but that’s not required for correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/shim/executor.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-17T15:21:40.753Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.753Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like `impl GetSize for MyType {}` is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.
Applied to files:
src/shim/executor.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests-release
- GitHub Check: cargo-publish-dry-run
- GitHub Check: tests
- GitHub Check: All lint checks
- GitHub Check: Build Ubuntu
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Build MacOS
🔇 Additional comments (2)
src/shim/executor.rs (2)
10-10: Amt import aligns with new events AMT usagePulling in
AmtalongsideAmtv0is consistent with using the new AMT API for events while keeping receipts onAmtv0. Please just double‑check that the version ofAmthere matches how events are actually stored and how Lotus reads them, so ChainGetEvents parity is guaranteed end‑to‑end.
340-346: UntaggedStampedEventserialization looks appropriate for RPC parityAdding
#[serde(untagged)]toStampedEventmatches howReceiptis handled and should make the serialized shape match the underlying FVM event types (and Lotus) instead of introducing an enum wrapper. Since we only deriveSerialize, this shouldn’t introduce deserialization ambiguity. Please confirm that no external clients relied on the previously tagged JSON shape ofStampedEvent(if it was ever exposed directly).
…vailable in cache
1dd279f to
d3f0c50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/daemon/db_util.rs (1)
371-378: Stale documentation: Events root → TipsetKey mapping no longer exists.The doc comment mentions "Events root [
Cid] -> [TipsetKey]" mapping, but this PR removes that functionality. The comment should be updated to reflect that only the Ethereum RPC API mappings are populated.📝 Suggested fix
-/// To support the Event RPC API, a new column has been added to parity-db to handle the mapping: -/// - Events root [`Cid`] -> [`TipsetKey`]. -/// -/// Similarly, to support the Ethereum RPC API, another column has been introduced to map: +/// To support the Ethereum RPC API, columns exist to map: /// - [`struct@EthHash`] -> [`TipsetKey`], /// - [`struct@EthHash`] -> Delegated message [`Cid`]. /// /// This function traverses the chain store and populates these columns accordingly.
🤖 Fix all issues with AI agents
In @src/db/migration/v0_31_0.rs:
- Around line 112-144: The EthMappings migration (DbColumn::EthMappings) is
storing TipsetKey bytes (val.value) as the value instead of the expected
serialized (Cid, u64) tuple, which corrupts data and breaks get_mapping and
get_message_cids; fix by decoding TipsetKey with fvm_ipld_encoding::from_slice,
extracting the Cid via TipsetKey::cid(), convert Cid -> EthHash for the key,
then serialize the tuple (cid, timestamp_u64) with fvm_ipld_encoding::to_vec and
pass that serialized bytes into new_db.db.commit_changes (via Db::set_operation)
instead of val.value, preserving the existing error handling flow around
tsk/cid/res and bubbling errors; add a unit test that runs the migration on a
test DB containing an old EthMappings entry and asserts the migrated value
deserializes to (Cid, u64) and that get_mapping/get_message_cids return the
expected results.
🧹 Nitpick comments (2)
src/db/migration/v0_31_0.rs (2)
163-252: Consider extracting shared column options logic to reduce duplication.The
paritydb_0_26_0andparitydb_0_31_0modules have nearly identicalcreate_column_optionsimplementations. While duplication in migration code is often acceptable for stability (avoiding changes to old migration logic), consider whether a shared helper could reduce maintenance burden for future migrations.This is a minor observation — keeping migrations self-contained is also a valid design choice for reproducibility.
Also applies to: 254-336
30-161: Missing unit tests for migration logic.This migration file lacks test coverage. Given that database migrations are critical operations that can lead to data loss if incorrect, consider adding tests that:
- Create a mock v0.26.0 database with sample data
- Run the migration
- Verify data integrity in the resulting v0.31.0 database
The existing migration tests in
migration_map.rstest the migration chain logic but not the actual data transformation.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
Cargo.tomlsrc/chain/store/chain_store.rssrc/chain_sync/chain_follower.rssrc/daemon/context.rssrc/daemon/db_util.rssrc/db/car/many.rssrc/db/memory.rssrc/db/migration/migration_map.rssrc/db/migration/mod.rssrc/db/migration/v0_31_0.rssrc/db/mod.rssrc/db/parity_db.rssrc/libp2p/chain_exchange/provider.rs
💤 Files with no reviewable changes (4)
- src/daemon/context.rs
- src/chain_sync/chain_follower.rs
- src/db/mod.rs
- src/libp2p/chain_exchange/provider.rs
✅ Files skipped from review due to trivial changes (1)
- Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- src/db/memory.rs
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/db/car/many.rs
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/db/car/many.rssrc/daemon/db_util.rssrc/db/migration/mod.rssrc/db/migration/migration_map.rssrc/chain/store/chain_store.rssrc/db/parity_db.rssrc/db/migration/v0_31_0.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/db/car/many.rssrc/daemon/db_util.rssrc/db/migration/mod.rssrc/db/migration/migration_map.rssrc/chain/store/chain_store.rssrc/db/parity_db.rssrc/db/migration/v0_31_0.rs
📚 Learning: 2025-10-17T14:24:47.046Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.
Applied to files:
src/daemon/db_util.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
src/daemon/db_util.rs
🧬 Code graph analysis (2)
src/db/car/many.rs (3)
src/db/memory.rs (2)
SettingsStoreExt(32-32)SettingsStoreExt(156-156)src/db/parity_db.rs (1)
SettingsStoreExt(177-177)src/chain/mod.rs (1)
SettingsStoreExt(45-45)
src/db/migration/v0_31_0.rs (2)
src/db/migration/migration_map.rs (12)
new(26-28)new(321-326)new(491-496)from(30-30)from(328-330)from(498-500)to(32-32)to(332-334)to(502-504)migrate_core(60-60)migrate_core(309-315)migrate_core(481-489)src/db/parity_db.rs (1)
set_operation(365-367)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: rubocop
- GitHub Check: All lint checks
- GitHub Check: Coverage
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
🔇 Additional comments (8)
src/chain/store/chain_store.rs (3)
9-9: LGTM!The import correctly removes
IndicesStoredependency, aligning with the PR objective to eliminate the events root → tipset key mapping from the database.
743-744: LGTM!Test correctly updated to use the new
ChainStore::newsignature. TheMemoryDBimplements all required traits (Blockstore,HeaviestTipsetKeyProvider,EthMappingsStore), making this a valid test setup.
758-758: LGTM!Consistent with the other test update.
src/daemon/db_util.rs (1)
342-344: LGTM!The change correctly removes events_roots indexing. Events are now stored directly in the blockstore during FVM execution, eliminating the need for a separate index.
src/db/car/many.rs (1)
14-16: LGTM!The import correctly removes
IndicesStorefrom the module. TheSettingsStoreandEthMappingsStoreimplementations remain intact, and this change aligns with the PR objective to remove the indices store column from the database.src/db/migration/migration_map.rs (1)
13-13: LGTM! Migration entry follows established patterns.The import and migration mapping are correctly added. The comment clarifies that v0.30.0 uses the same schema as v0.26.0, which explains why
Migration0_26_0_0_31_0is reused for the 0.30.0 → 0.31.0 path.Also applies to: 159-160
src/db/migration/mod.rs (1)
8-8: LGTM!Module declaration follows the established pattern and ordering.
src/db/parity_db.rs (1)
4-4: LGTM! Indices column correctly removed from schema.The
Indicescolumn was at position 5 (the last column), so removing it doesn't affect the indices of the remaining columns. The import and enum changes are consistent with the migration.Also applies to: 21-42
| DbColumn::GraphDagCborBlake2b256 | DbColumn::PersistentGraph => { | ||
| db.db.iter_column_while(col as u8, |val| { | ||
| let hash = MultihashCode::Blake2b256.digest(&val.value); | ||
| let cid = Cid::new_v1(DAG_CBOR, hash); | ||
| res = new_db | ||
| .db | ||
| .commit_changes([Db::set_operation( | ||
| col as u8, | ||
| cid.to_bytes(), | ||
| val.value, | ||
| )]) | ||
| .context("failed to commit"); | ||
|
|
||
| if res.is_err() { | ||
| return false; | ||
| } | ||
|
|
||
| true | ||
| })?; | ||
| res?; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for PersistentGraph usage and how data is stored in this column
rg -n "PersistentGraph" --type rust -A 3 -B 3 | head -100Repository: ChainSafe/forest
Length of output: 5798
🏁 Script executed:
#!/bin/bash
# Look for where data is written to PersistentGraph column to understand the format
rg -n "put.*[Pp]ersistent" --type rust -A 2 | head -80Repository: ChainSafe/forest
Length of output: 2537
🏁 Script executed:
#!/bin/bash
# Check how GraphDagCborBlake2b256 is populated to compare patterns
rg -n "GraphDagCborBlake2b256" --type rust -A 3 -B 3 | head -100Repository: ChainSafe/forest
Length of output: 6403
🏁 Script executed:
#!/bin/bash
# Check what data types are stored in PersistentGraph - specifically manifests
rg -n "manifest|Manifest" --type rust src/daemon/bundle.rs -B 5 -A 5 | head -100Repository: ChainSafe/forest
Length of output: 1558
🏁 Script executed:
#!/bin/bash
# Check the actual CID construction for data stored in PersistentGraph
rg -n "put_keyed_persistent|cid\|block" --type rust src/daemon/bundle.rs -B 5 -A 2 | head -80Repository: ChainSafe/forest
Length of output: 694
🏁 Script executed:
#!/bin/bash
# Look at the snapshot import logic for any clues about CID format for persistent data
rg -n "F3|manifest" --type rust src/daemon/ -l | head -20Repository: ChainSafe/forest
Length of output: 120
🏁 Script executed:
#!/bin/bash
# Check ActorBundleInfo definition to see the CID codec/hash for manifests
rg -n "ActorBundleInfo" --type rust -A 10 | head -60Repository: ChainSafe/forest
Length of output: 4368
🏁 Script executed:
#!/bin/bash
# Look for how CIDs are created when writing to PersistentGraph - check CAR stream handling
rg -n "CarBlock|CarStream" --type rust src/daemon/bundle.rs -B 3 -A 3Repository: ChainSafe/forest
Length of output: 1123
Migration assumption about PersistentGraph CID format requires validation.
The migration reconstructs CIDs for PersistentGraph data using hardcoded DAG_CBOR codec and Blake2b256 hash. However, PersistentGraph accepts arbitrary IPLD data via pre-computed CIDs from external sources (actor bundle CAR files). While actor bundles almost certainly use DAG_CBOR + Blake2b256, the code doesn't enforce or validate this assumption. If any data with a different codec or hash was stored in PersistentGraph, the migration would silently reconstruct incorrect CID keys, corrupting the data. Consider adding validation during migration to detect and reject data that doesn't match the expected format, rather than silently reconstructing potentially wrong CIDs.
| DbColumn::EthMappings => { | ||
| db.db.iter_column_while(col as u8, |val| { | ||
| let tsk: Result<TipsetKey, fvm_ipld_encoding::Error> = | ||
| fvm_ipld_encoding::from_slice(&val.value); | ||
| if tsk.is_err() { | ||
| res = Err(tsk.context("serde error").unwrap_err()); | ||
| return false; | ||
| } | ||
| let cid = tsk.unwrap().cid(); | ||
|
|
||
| if cid.is_err() { | ||
| res = Err(cid.context("serde error").unwrap_err()); | ||
| return false; | ||
| } | ||
|
|
||
| let hash: EthHash = cid.unwrap().into(); | ||
| res = new_db | ||
| .db | ||
| .commit_changes([Db::set_operation( | ||
| col as u8, | ||
| hash.0.as_bytes().to_vec(), | ||
| val.value, | ||
| )]) | ||
| .context("failed to commit"); | ||
|
|
||
| if res.is_err() { | ||
| return false; | ||
| } | ||
|
|
||
| true | ||
| })?; | ||
| res?; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the schema definitions in the migration file
fd v0_31_0.rs --type f -x cat -n {} | head -300Repository: ChainSafe/forest
Length of output: 13093
🏁 Script executed:
# Get the rest of the v0_31_0.rs file to see the complete new schema
fd v0_31_0.rs --type f -x tail -n +300 {}Repository: ChainSafe/forest
Length of output: 1163
🏁 Script executed:
# Search for EthMappings usage in the codebase to understand what data it stores
rg -n "EthMappings" --type rust -B 3 -A 3 | head -150Repository: ChainSafe/forest
Length of output: 9223
🏁 Script executed:
# Search for EthMappingsStore write operations to understand what data is actually stored
rg -n "write.*EthMappings|EthMappings.*write" --type rust -B 2 -A 2 | head -200Repository: ChainSafe/forest
Length of output: 2651
🏁 Script executed:
# Search for message_cids references to understand the (Cid, u64) format
rg -n "message_cids|MessageCid" --type rust -B 2 -A 2 | head -150Repository: ChainSafe/forest
Length of output: 7314
🏁 Script executed:
# Check git history of parity_db.rs around EthMappings to see if schema changed
git log --oneline --all -- src/db/parity_db.rs | head -20Repository: ChainSafe/forest
Length of output: 1128
🏁 Script executed:
# Look for old implementations of EthMappings in migration files
rg -n "EthMappings" src/db/migration/ --type rust -B 5 -A 5Repository: ChainSafe/forest
Length of output: 11339
🏁 Script executed:
# Check the v0_26_0 migration file to see how EthMappings was handled there
fd v0_26_0.rs --type f -x cat -n {} | grep -A 30 "DbColumn::EthMappings"Repository: ChainSafe/forest
Length of output: 4061
🏁 Script executed:
# Let's search for when the EthMappings schema changed from TipsetKey to (Cid, u64)
git log --all -p -- src/db/parity_db.rs | grep -B 10 -A 10 "Cid, u64" | head -100Repository: ChainSafe/forest
Length of output: 1793
🏁 Script executed:
# Look for get_message_cids implementation history
git log --oneline --all -- src/db/parity_db.rs src/db/migration/Repository: ChainSafe/forest
Length of output: 2763
🏁 Script executed:
# Search for where EthMappings values are actually written to understand the current schema
rg -n "write.*EthHash|write_bin.*eth" --type rust -B 5 -A 5 | head -150Repository: ChainSafe/forest
Length of output: 6176
🏁 Script executed:
# Search for where (Cid, u64) values are written to EthMappingsStore
rg -n "write_obj.*u64|write_bin" src/rpc/methods/eth.rs -B 5 -A 5 | head -200Repository: ChainSafe/forest
Length of output: 42
🏁 Script executed:
# Look for EthGetMessageCidByTransactionHash implementation
rg -n "EthGetMessageCidByTransactionHash" src/rpc/methods/eth.rs -B 2 -A 30 | head -150Repository: ChainSafe/forest
Length of output: 1606
🏁 Script executed:
# Search for where eth mappings are populated with message CIDs
rg -n "eth.*mapping|EthMappingsStore.*write" src/ --type rust -B 3 -A 3 | head -200Repository: ChainSafe/forest
Length of output: 12145
🏁 Script executed:
# Search for get_mapping implementation to see what it returns
rg -n "fn get_mapping" src/ --type rust -B 2 -A 10Repository: ChainSafe/forest
Length of output: 1019
🏁 Script executed:
# Search for EthMappingCollector to understand what it's supposed to write
rg -n "EthMappingCollector" src/ --type rust -B 2 -A 15 | head -150Repository: ChainSafe/forest
Length of output: 4277
🏁 Script executed:
# Check if there are any tests for the migration
rg -n "test.*migration|#\[test\]" src/db/migration/v0_31_0.rs --type rustRepository: ChainSafe/forest
Length of output: 42
The EthMappings migration stores the wrong data type and will corrupt migrated data.
The migration correctly transforms the key from the old format to EthHash, but it stores TipsetKey bytes as the value instead of serialized (Cid, u64) tuples. This causes:
get_mapping()(chain_store.rs:198) expects to deserialize values as(Cid, u64)but will getTipsetKeybytes, returningNonesilentlyget_message_cids()(parity_db.rs:207) usesif let Ok(...)which silently ignores deserialization failures- All Ethereum mappings from migrated databases will be effectively lost
The migration should extract the CID from the TipsetKey and serialize it with a timestamp as (Cid, u64) before storing:
Expected fix pattern (similar to GraphDagCborBlake2b256 migration)
DbColumn::EthMappings => {
db.db.iter_column_while(col as u8, |val| {
let tsk: Result<TipsetKey, _> = fvm_ipld_encoding::from_slice(&val.value);
if tsk.is_err() { /* handle error */ }
let cid = tsk.unwrap().cid();
if cid.is_err() { /* handle error */ }
let hash: EthHash = cid.unwrap().into();
// Serialize (Cid, u64) instead of storing val.value
let value = fvm_ipld_encoding::to_vec(&(cid.unwrap(), timestamp))?;
res = new_db.db.commit_changes([Db::set_operation(col as u8, hash.0.as_bytes().to_vec(), value)])...
})
}Additionally, no unit tests exist for this migration, allowing this corruption to go undetected.
🤖 Prompt for AI Agents
In @src/db/migration/v0_31_0.rs around lines 112 - 144, The EthMappings
migration (DbColumn::EthMappings) is storing TipsetKey bytes (val.value) as the
value instead of the expected serialized (Cid, u64) tuple, which corrupts data
and breaks get_mapping and get_message_cids; fix by decoding TipsetKey with
fvm_ipld_encoding::from_slice, extracting the Cid via TipsetKey::cid(), convert
Cid -> EthHash for the key, then serialize the tuple (cid, timestamp_u64) with
fvm_ipld_encoding::to_vec and pass that serialized bytes into
new_db.db.commit_changes (via Db::set_operation) instead of val.value,
preserving the existing error handling flow around tsk/cid/res and bubbling
errors; add a unit test that runs the migration on a test DB containing an old
EthMappings entry and asserts the migrated value deserializes to (Cid, u64) and
that get_mapping/get_message_cids return the expected results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/db/migration/v0_31_0.rs:
- Around line 144-152: The default branch currently commits each record
individually using new_db.db.commit_changes([Db::set_operation(col as u8, key,
value)]), causing the same per-record performance issue; change it to collect a
batch of Db::set_operation entries (same batch size and batching logic used in
the other branches), then call new_db.db.commit_changes once per batch (and once
for any remaining ops) while iterating the iterator (iter) so commits are
batched instead of per-record.
- Around line 91-109: The current loop in iter_column_while calls
new_db.db.commit_changes for every record (inside iter_column_while), causing
extreme slowdown; change iter_column_while usage to collect Db::set_operation
items into a Vec (e.g., ops) and call new_db.db.commit_changes(ops_slice) only
when the batch size threshold (e.g., 10_000) is reached and once more after the
loop to flush remaining ops; ensure you propagate errors by mapping
commit_changes().context("failed to commit") to res and return false from the
iterator on error (so the ? outside still works), clear the ops Vec after each
successful commit, and keep using MultihashCode::Blake2b256.digest, Cid::new_v1
and cid.to_bytes() to build the Db::set_operation entries.
🧹 Nitpick comments (3)
src/db/migration/v0_31_0.rs (3)
111-143: Simplify error handling with?operator or pattern matching.The
is_err()+unwrap()pattern is convoluted and error-prone. Consider restructuring to use idiomatic Rust error handling.♻️ Suggested refactor for cleaner error handling
DbColumn::EthMappings => { db.db.iter_column_while(col as u8, |val| { - let tsk: Result<TipsetKey, fvm_ipld_encoding::Error> = - fvm_ipld_encoding::from_slice(&val.value); - if tsk.is_err() { - res = Err(tsk.context("serde error").unwrap_err()); - return false; - } - let cid = tsk.unwrap().cid(); - - if cid.is_err() { - res = Err(cid.context("serde error").unwrap_err()); - return false; - } - - let hash: EthHash = cid.unwrap().into(); + let result = (|| -> anyhow::Result<()> { + let tsk: TipsetKey = fvm_ipld_encoding::from_slice(&val.value) + .context("failed to deserialize TipsetKey")?; + let cid = tsk.cid().context("failed to get CID from TipsetKey")?; + let hash: EthHash = cid.into(); + new_db.db.commit_changes([Db::set_operation( + col as u8, + hash.0.as_bytes().to_vec(), + val.value, + )]).context("failed to commit")?; + Ok(()) + })(); + + match result { + Ok(()) => true, + Err(e) => { + res = Err(e); + false + } + } - res = new_db - .db - .commit_changes([Db::set_operation( - col as u8, - hash.0.as_bytes().to_vec(), - val.value, - )]) - .context("failed to commit"); - - if res.is_err() { - return false; - } - - true })?; res?; }
162-251: LGTM!The old schema module correctly defines the v0.26.0 database structure with the
Indicescolumn.Minor: Line 201 has a typo "othewise" → "otherwise" (also present in the v0.31.0 module at line 292).
80-85: Add explicit documentation or assertion to safeguard column index compatibility in migration logic.The code correctly handles the migration by skipping the
Indicescolumn (which is removed) and copying columns 0-4 using their enum indices viacol as u8. This works because these columns maintain the same order and meaning in both schemas. However, the implicit coupling between the twoDbColumnenum definitions (oldparitydb_0_26_0and newparitydb_0_31_0) creates a maintenance risk: reordering columns in either version would silently break the migration.Consider adding a compile-time assertion or a detailed comment (near lines 80-85) explicitly documenting that columns 0-4 must remain in the same order across both schema versions for the migration to function correctly.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/db/migration/v0_31_0.rs
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/db/migration/v0_31_0.rs
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/db/migration/v0_31_0.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/db/migration/v0_31_0.rs
🧬 Code graph analysis (1)
src/db/migration/v0_31_0.rs (2)
src/db/migration/migration_map.rs (12)
new(26-28)new(321-326)new(491-496)from(30-30)from(328-330)from(498-500)to(32-32)to(332-334)to(502-504)migrate_core(60-60)migrate_core(309-315)migrate_core(481-489)src/db/parity_db.rs (1)
set_operation(365-367)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: Coverage
🔇 Additional comments (4)
src/db/migration/v0_31_0.rs (4)
1-28: LGTM!Imports are appropriate for the migration functionality. The module documentation clearly explains the purpose of the migration.
29-50: LGTM!The migration struct correctly implements the
MigrationOperationtrait methods.
55-72: LGTM!The CAR database copy logic correctly handles the case where the directory may not exist in older versions.
253-335: LGTM!The new schema module correctly removes the
Indicescolumn while preserving column indices 0-4 for the remaining columns.
| db.db.iter_column_while(col as u8, |val| { | ||
| let hash = MultihashCode::Blake2b256.digest(&val.value); | ||
| let cid = Cid::new_v1(DAG_CBOR, hash); | ||
| res = new_db | ||
| .db | ||
| .commit_changes([Db::set_operation( | ||
| col as u8, | ||
| cid.to_bytes(), | ||
| val.value, | ||
| )]) | ||
| .context("failed to commit"); | ||
|
|
||
| if res.is_err() { | ||
| return false; | ||
| } | ||
|
|
||
| true | ||
| })?; | ||
| res?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance: Per-record commits will be extremely slow for large databases.
Each record triggers a separate commit_changes call, which involves disk I/O synchronization. For databases with millions of records, this will make the migration prohibitively slow.
Consider batching commits (e.g., every 10,000 records) to significantly improve performance.
♻️ Suggested approach for batching commits
+const BATCH_SIZE: usize = 10_000;
+
match col {
DbColumn::GraphDagCborBlake2b256 | DbColumn::PersistentGraph => {
+ let mut batch = Vec::with_capacity(BATCH_SIZE);
db.db.iter_column_while(col as u8, |val| {
let hash = MultihashCode::Blake2b256.digest(&val.value);
let cid = Cid::new_v1(DAG_CBOR, hash);
- res = new_db
- .db
- .commit_changes([Db::set_operation(
- col as u8,
- cid.to_bytes(),
- val.value,
- )])
- .context("failed to commit");
-
- if res.is_err() {
- return false;
+ batch.push(Db::set_operation(col as u8, cid.to_bytes(), val.value));
+ if batch.len() >= BATCH_SIZE {
+ res = new_db.db.commit_changes(batch.drain(..)).context("failed to commit");
+ if res.is_err() {
+ return false;
+ }
}
-
true
})?;
+ // Commit remaining items
+ if !batch.is_empty() {
+ new_db.db.commit_changes(batch).context("failed to commit")?;
+ }
res?;
}🤖 Prompt for AI Agents
In @src/db/migration/v0_31_0.rs around lines 91 - 109, The current loop in
iter_column_while calls new_db.db.commit_changes for every record (inside
iter_column_while), causing extreme slowdown; change iter_column_while usage to
collect Db::set_operation items into a Vec (e.g., ops) and call
new_db.db.commit_changes(ops_slice) only when the batch size threshold (e.g.,
10_000) is reached and once more after the loop to flush remaining ops; ensure
you propagate errors by mapping commit_changes().context("failed to commit") to
res and return false from the iterator on error (so the ? outside still works),
clear the ops Vec after each successful commit, and keep using
MultihashCode::Blake2b256.digest, Cid::new_v1 and cid.to_bytes() to build the
Db::set_operation entries.
| _ => { | ||
| let mut iter = db.db.iter(col as u8)?; | ||
| while let Some((key, value)) = iter.next()? { | ||
| new_db | ||
| .db | ||
| .commit_changes([Db::set_operation(col as u8, key, value)]) | ||
| .context("failed to commit")?; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same per-record commit performance issue applies here.
Apply the same batching approach for the default column handling path.
🤖 Prompt for AI Agents
In @src/db/migration/v0_31_0.rs around lines 144 - 152, The default branch
currently commits each record individually using
new_db.db.commit_changes([Db::set_operation(col as u8, key, value)]), causing
the same per-record performance issue; change it to collect a batch of
Db::set_operation entries (same batch size and batching logic used in the other
branches), then call new_db.db.commit_changes once per batch (and once for any
remaining ops) while iterating the iterator (iter) so commits are batched
instead of per-record.
|
@akaladarshi could you please mark @coderabbitai comments as irrelevant, incorrect or address them if they make sense? |
|
Tip For best results, initiate chat on the files or code changes. @LesnyRumcajs, I'm here to help! I'm happy to clarify any of my review comments if needed, or provide additional context. @akaladarshi, feel free to let me know if you have questions about any of my suggestions or if you'd like me to re-evaluate any specific comments based on new information or context I may have missed. 🧠 Learnings used |
I was thinking we can address this as a separate PR, as the current migration code is taken from previous one , so we will also have to update those migrations as well. |
|
Tip For best results, initiate chat on the files or code changes. @akaladarshi, splitting a large PR into smaller, manageable pieces can improve clarity and facilitate more thorough reviews, so your proposal makes sense. Given that this PR already touches many files across the codebase to remove the Indices column and add the migration logic, keeping the scope focused on that core functionality is reasonable. Since the migration code patterns are inherited from previous migrations and would require updating multiple migration files to be consistent, deferring that cleanup to a separate PR is a pragmatic approach. This way:
Would you like me to help create a GitHub issue to track the migration code improvements as a follow-up task? That way, the work won't be forgotten and can be prioritized appropriately by the team. |
@coderabbitai Yes, Create a new issue for this. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@akaladarshi, I'll create a GitHub issue to track the migration code improvements. 🏁 Script executed: Length of output: 47 🏁 Script executed: Length of output: 47 @akaladarshi, I've successfully created issue #6413 to track the migration code improvements: #6413 The issue has been assigned to you and includes:
This will ensure the migration code improvements aren't forgotten and can be prioritized appropriately by the team. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CHANGELOG.md (1)
44-55: Changelog: consider referencing issue #6271 and keep link formatting consistent.Since this change closes an issue, consider using the issue reference (or include both), and add
:after the link for consistency with nearby entries.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CHANGELOG.mdscripts/tests/api_compare/filter-listscripts/tests/api_compare/filter-list-offlinesrc/rpc/methods/chain.rs
💤 Files with no reviewable changes (2)
- scripts/tests/api_compare/filter-list
- scripts/tests/api_compare/filter-list-offline
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/rpc/methods/chain.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/rpc/methods/chain.rs
🧬 Code graph analysis (1)
src/rpc/methods/chain.rs (3)
src/rpc/methods/eth/filter/mod.rs (1)
get_events_by_event_root(369-383)src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)
ctx(99-151)src/tool/subcommands/api_cmd/test_snapshot.rs (1)
ctx(122-169)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: cargo-publish-dry-run
- GitHub Check: All lint checks
- GitHub Check: Coverage
- GitHub Check: rubocop
🔇 Additional comments (1)
src/rpc/methods/chain.rs (1)
242-261: Add root CID to error context (helps parity debugging & ops).Right now the error path depends on downstream formatting and may omit
root_cid. Adding local context makes failures much easier to diagnose.Proposed tweak
async fn handle( ctx: Ctx<impl Blockstore + Send + Sync + 'static>, (root_cid,): Self::Params, ) -> Result<Self::Ok, ServerError> { - let events = EthEventHandler::get_events_by_event_root(&ctx, &root_cid)?; + let events = EthEventHandler::get_events_by_event_root(&ctx, &root_cid) + .with_context(|| format!("failed to load events for events_root={root_cid}"))?; Ok(events) }[建议_recommended_refactor]
|
Please ensure all @coderabbitai comments are closed. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@LesnyRumcajs, I'll check the status of all my review comments on this PR and help close any that have been addressed or are no longer relevant. 🏁 Script executed: Length of output: 122208 @LesnyRumcajs, I'll check the resolution status of the review comments to give you a complete picture. 🏁 Script executed: Length of output: 864 🏁 Script executed: Length of output: 2105 @LesnyRumcajs, I found 5 unresolved CodeRabbit review comments on this PR: Summary by Severity:🔴 Critical (2 comments) -
🟠 Major (2 comments) -
🟡 Minor (1 comment) -
Status:None of these comments have received any replies, meaning they are all still open and unaddressed. Recommendation:Since 4 of the 5 comments are flagged as Critical or Major severity issues in the migration code (specifically around per-record commit performance), I recommend reviewing these before closing them. The performance issues in particular could cause significant problems during database migration. Would you like me to:
🧠 Learnings used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/rpc/methods/eth/filter/mod.rs (1)
368-383: Consider simplifying the error handling pattern.The function correctly loads events from the blockstore using
StampedEvent::get_events, which aligns with the PR objective to serve events directly from the blockstore. The conversion toVec<Event>viaInto::intois clean.However, the error handling can be simplified using the
?operator withcontext:♻️ Suggested simplification
pub fn get_events_by_event_root<DB: Blockstore + Send + Sync + 'static>( ctx: &Ctx<DB>, events_root: &Cid, ) -> anyhow::Result<Vec<Event>> { - let state_events = - match StampedEvent::get_events(ctx.chain_store().blockstore(), events_root) { - Ok(e) => e, - Err(e) => { - return Err(anyhow::anyhow!("load events amt: {}", e)); - } - }; + let state_events = StampedEvent::get_events(ctx.chain_store().blockstore(), events_root) + .with_context(|| format!("load events amt for root {events_root}"))?; let chain_events: Vec<Event> = state_events.into_iter().map(Into::into).collect(); Ok(chain_events) }This provides consistent error handling with the rest of the codebase and includes the events root CID in the error context for easier debugging.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
CHANGELOG.mdCargo.tomlsrc/rpc/methods/eth/filter/mod.rssrc/state_manager/mod.rssrc/tool/subcommands/api_cmd/test_snapshots.txtsrc/tool/subcommands/index_cmd.rs
💤 Files with no reviewable changes (1)
- src/tool/subcommands/index_cmd.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- Cargo.toml
- CHANGELOG.md
- src/tool/subcommands/api_cmd/test_snapshots.txt
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-11T16:03:14.328Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 6000
File: src/tool/subcommands/api_cmd/state_decode_params_tests/miner.rs:4-9
Timestamp: 2025-09-11T16:03:14.328Z
Learning: In the Forest codebase's state_decode_params_tests module, common imports like `to_vec`, `TokenAmount`, `Cid`, etc. are centralized in the mod.rs file and made available to submodules through `use super::*;`, eliminating the need for duplicate imports in individual test files.
Applied to files:
src/state_manager/mod.rs
📚 Learning: 2026-01-05T13:02:14.604Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/shim/actors/builtin/cron/mod.rs:47-47
Timestamp: 2026-01-05T13:02:14.604Z
Learning: The Entry enum in src/shim/actors/builtin/cron/mod.rs cannot derive Eq because it wraps external fil_actor_cron_state::vX::Entry types (v8-v17) that don't implement Eq. Only PartialEq can be derived for this enum due to this external dependency constraint.
Applied to files:
src/state_manager/mod.rssrc/rpc/methods/eth/filter/mod.rs
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/state_manager/mod.rssrc/rpc/methods/eth/filter/mod.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/state_manager/mod.rssrc/rpc/methods/eth/filter/mod.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
src/rpc/methods/eth/filter/mod.rs
🧬 Code graph analysis (1)
src/rpc/methods/eth/filter/mod.rs (1)
src/shim/executor.rs (3)
events(118-124)events_root(180-186)get_events(392-414)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: Check
- GitHub Check: rubocop
- GitHub Check: tests-release
- GitHub Check: All lint checks
- GitHub Check: Coverage
🔇 Additional comments (8)
src/state_manager/mod.rs (6)
64-64: LGTM!The
ensure!macro import is correctly added to support the new verification check inapply_block_messages.
71-71: LGTM!The import of both
AmtandAmtv0is correct.Amtv0is used for receipts (legacy format) andAmtis used for events AMT with the configurable bit width.
93-93: LGTM!The
EVENTS_AMT_BITWIDTHconstant with value 5 aligns with the FVM's standard bit width for events AMT. This ensures consistency with Lotus behavior.
98-104: LGTM!The
StateEventsstruct cleanly encapsulates events data with appropriate derives and custom size calculation helpers.
562-570: LGTM!The fallback path correctly computes the tipset state when events are not found in the blockstore, aligning with the PR objective to serve events directly from the blockstore.
2014-2034: Error handling and verification logic are sound—no changes needed.The events AMT storage code correctly handles both failure scenarios. The
EVENTS_AMT_BITWIDTH = 5constant is already well-defined (src/state_manager/mod.rs:93) and validated consistently across tests and the codebase—the bitwidth verification request is unnecessary. The.map_err()provides descriptive context for storage failures, and theensure!()check properly validates the derived root matches the FVM-computed root. The error messages are clear and sufficient for debugging. While addingerror!level logging before the ensure check could serve as defensive instrumentation, it is optional given the explicit error message already provided.src/rpc/methods/eth/filter/mod.rs (2)
40-40: LGTM!The
StampedEventimport is correctly added to support the newget_events_by_event_rootfunction.
283-290: LGTM!Good improvement to the error message - including the actual lengths of messages and events helps with debugging when a mismatch occurs. The
tipset_state_eventscall is correctly updated to match the new signature.
Summary of changes
Changes introduced in this pull request:
blockstoreevents root->tipsetKeymappingblockstoreto serve theChainGetEventsAPI, and if events are not available we just error out, same as lotusReference issue to close (if applicable)
Closes #6271
Other information and links
The issue was: A message generated two identical events and
forestalways used to compute the state and get the events for serving theChainGetEventsAPI call request, so we were always getting two identical events, but thelotuswas serving directly from the blockstore so they were getting single event.Change checklist
Summary by CodeRabbit
Bug Fixes
Breaking Changes
New Features
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.