-
Notifications
You must be signed in to change notification settings - Fork 476
Reanalyze: reactive incremental analysis + churn/mermaid tooling #8092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+9,921
−1,346
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1b2651a to
9265b64
Compare
rescript
@rescript/darwin-arm64
@rescript/darwin-x64
@rescript/linux-arm64
@rescript/linux-x64
@rescript/runtime
@rescript/win32-x64
commit: |
de52abf to
210862e
Compare
cristianoc
commented
Dec 17, 2025
b171655 to
f329544
Compare
cristianoc
commented
Dec 24, 2025
cristianoc
commented
Dec 24, 2025
cristianoc
commented
Dec 24, 2025
cristianoc
commented
Dec 24, 2025
cristianoc
commented
Dec 24, 2025
cristianoc
commented
Dec 24, 2025
Vendor skip-lite library and integrate reactive analysis capabilities: - Vendor skip-lite marshal_cache and reactive_file_collection modules - Modify C++ code to handle ReScript CMT file format (CMI+CMT headers) - Add CmtCache module for mmap-based CMT file reading - Add ReactiveAnalysis module for incremental file processing - Add CLI flags: -cmt-cache, -reactive, -runs - Add README.md with usage and benchmark instructions Benchmark results (~5000 files): - Standard: CMT processing 0.78s, Total 1.01s - Reactive (warm): CMT processing 0.01s, Total 0.20s - Speedup: 74x for CMT processing, 5x total The reactive mode caches processed file_data and uses read_cmt_if_changed to skip unchanged files entirely on subsequent runs.
- Change --create-sourcedirs to default to true (always create .sourcedirs.json) - Hide the flag from help since it's now always enabled - Add deprecation warning when flag is explicitly used - Fix package name mismatches in test projects: - deadcode rescript.json: sample-typescript-app -> @tests/reanalyze-deadcode - rescript-react package.json: @tests/rescript-react -> @rescript/react
Simplify the reactive_file_collection implementation by making 'v t be the concrete record type used at runtime, removing the unused phantom fields and internal wrapper type. This eliminates warning 69 about unused record fields and relies directly on a single record with its process function stored as an Obj.t-based callback.
- CmtCache: rewritten using Unix.stat for file change detection (mtime, size, inode) instead of C++ mmap cache - ReactiveFileCollection: new pure OCaml module for reactive file collections with delta-based updates - ReactiveAnalysis: refactored to use ReactiveFileCollection, collection passed as parameter (no global mutable state) - Timing: only show parallel merge timing when applicable - Deleted skip-lite vendor directory (C++ code no longer needed) This eliminates the Linux/musl C++ compilation issue while maintaining the same incremental analysis performance: - Cold run: ~1.0s - Warm run: ~0.01s (90x faster, skips unchanged files)
- Create new analysis/reactive library with: - Reactive.ml: Core combinators (delta type, flatMap with merge) - ReactiveFileCollection.ml: Generic file collection with change detection - Comprehensive tests including multi-stage composition - Remove CmtCache module (logic absorbed into ReactiveFileCollection) - ReactiveAnalysis now uses generic ReactiveFileCollection with: - read_file: Cmt_format.read_cmt - process: CMT analysis function - Test composition: files -> word_counts (with merge) -> frequent_words Demonstrates delta propagation across multiple flatMap stages
- Add ReactiveMerge module for reactive merge of per-file DCE data - Add extraction functions (builder_to_list, create_*) to data modules - Expose types needed for reactive merge (CrossFileItems.t fields, etc.) - ReactiveAnalysis: add iter_file_data, collect_exception_results - runAnalysis: use ReactiveMerge for decls/annotations/cross_file when reactive mode enabled Note: refs and file_deps still use O(n) iteration because they need post-processing (type-label deps, exception refs). Next step is to make these reactive via indexed lookups.
- Add Reactive.lookup: single-key subscription from a collection - Add Reactive.join: reactive hash join between two collections - Add ReactiveTypeDeps: type-label dependencies via reactive join - Add ReactiveExceptionRefs: exception ref resolution via reactive join - Update ARCHITECTURE.md with generated SVG diagrams - Add diagram sources (.mmd) for batch pipeline, reactive pipeline, delta propagation The reactive modules express cross-file dependency resolution declaratively: - ReactiveTypeDeps uses flatMap to index decls by path, then join to connect impl<->intf - ReactiveExceptionRefs uses join to resolve exception paths to declaration locations
Bug fix: - ReactiveFileCollection.process now receives (path, raw) instead of just (raw) - ReactiveAnalysis passes cmtFilePath to DceFileProcessing.process_cmt_file - This fixes @genType annotations being incorrectly collected when .cmti exists Previously, reactive mode passed cmtFilePath:"" which made the .cmti existence check always return false, causing @genType annotations to be collected even for files with interface files (where they should be ignored). Architecture updates: - Updated reactive-pipeline.mmd with accurate node names - Added legend table explaining all diagram symbols - Diagram now shows: VR/TR (per-file refs), all ReactiveTypeDeps fields, ReactiveExceptionRefs flow, and combined output Helper functions added for debugging (kept as useful): - Declarations.length, FileAnnotations.length/iter - References.value_refs_length/type_refs_length - FileDeps.files_count/deps_count
Introduce five new store modules that wrap reactive collections directly, eliminating O(N) freeze/copy operations in the merge phase: - AnnotationStore: wraps FileAnnotations reactive collection - DeclarationStore: wraps Declarations reactive collection - ReferenceStore: combines 7 reactive sources (value_refs, type_refs, type_deps.*, exception_refs) without copying - FileDepsStore: wraps file deps reactive collections - CrossFileItemsStore: iterates reactive collection directly without intermediate allocation Performance improvement on 4900-file benchmark: - Merge time: 37ms → 0.002ms (eliminated) - Warm run total: 165ms → 129ms (22% faster) The stores provide a unified interface that works with both frozen (non-reactive) and reactive data, dispatching to the appropriate implementation at runtime. Signed-off-by: Cristiano Calcagno <[email protected]>
This is a significant simplification of the dead code analysis solver: **Removed:** - Backward recursive solver (resolveRecursiveRefs, solveDeadBackward) - FileDepsStore module (was only needed for file ordering in backward solver) - Unused find_value_targets/find_type_targets functions **Added:** - Liveness module: forward fixpoint liveness computation - ReactiveDeclRefs: maps declarations to their outgoing references - ReactiveLiveness: reactive liveness via Reactive.fixpoint combinator - Reactive.union: combine two collections with optional merge - Reactive.fixpoint: transitive closure combinator (init + edges → reachable) **Changed:** - References now stores both directions: refs_to (for reporting) and refs_from (for liveness) - ReactiveMerge exposes value_refs_from and type_refs_from - ReactiveTypeDeps produces all_type_refs_from - ReactiveExceptionRefs produces refs_from direction - Updated ARCHITECTURE.md and reactive pipeline diagram The forward algorithm is simpler and more suitable for reactive updates: 1. Identify roots (@live/@genType annotations or externally referenced) 2. Build index mapping declarations to outgoing references 3. Propagate liveness from roots through references 4. Return set of all live positions Both reactive and non-reactive modes correctly report 380 issues on test suite. Signed-Off-By: Cristiano Calcagno <[email protected]>
Removes refs_to direction from core data structures. Now only refs_from is stored, which is what the forward liveness algorithm needs. **Removed from storage:** - References.builder/t: removed value_refs_to, type_refs_to hashtables - ReactiveMerge.t: removed value_refs, type_refs fields - ReferenceStore: removed find_value_refs, find_type_refs functions **Added lazy computation:** - DeadCommon.RefsToLazy: computes refs_to by inverting refs_from - Only computed when debug OR transitive mode is enabled - Zero cost in the common case (no storage, no computation) Cost analysis: - Common case (no debug, no transitive): 0 storage, 0 computation - Debug or transitive enabled: O(refs) one-time cost at reporting This simplifies the reactive architecture - it now only deals with refs_from direction, matching the forward liveness algorithm. Signed-Off-By: Cristiano Calcagno <[email protected]>
- Add rank-based incremental fixpoint algorithm to Reactive module - BFS expansion when init/edges grow - Well-founded cascade contraction when init/edges shrink - Inverse index for efficient edge removal handling - Add comprehensive fixpoint tests (14 new test cases) - Cycles, diamond graphs, alternative support - Adding/removing base elements and edges - Delta emission verification - Use Lazy.t for ReactiveLiveness creation - Created on first force (after files processed) - Cached and reused across runs - Fix external refs detection in ReactiveLiveness - Use flatMap with synchronous decls.get lookup - Correctly identifies refs from non-declaration positions - Fix DceCommand.ml missing reactive_liveness parameter Performance: Run 1 ~11ms, Run 2+ ~1ms (incremental fixpoint) Signed-Off-By: Cristiano Calcagno <[email protected]>
- Change external_value_refs and external_type_refs from flatMap to join in ReactiveLiveness.ml, making the dependency on decls explicit - Update ReactiveLiveness to return a record with live, edges, roots fields for better debugging and introspection - Add trace_edges debug flag in Reactive.Fixpoint for future debugging - Document order-dependence issue in BUG-reactive-order-dependence.md The reactive pipeline has a delta ordering issue where refs arriving before decls causes spurious external refs that aren't properly cleaned up. Using Lazy.t as workaround until properly fixed.
- Remove RefsToLazy module (no longer builds inverse refs index) - Add shared make_hasRefBelow function used by both reactive and non-reactive modes - Non-reactive mode: iterates through References.t directly - Reactive mode: iterates through Reactive.t directly (no freezing needed) - Remove unused posHashFindSet and posHashAddSet helpers - Remove deadcode-minimal test directory (not useful) This eliminates the need for frozen refs in reactive mode when transitive=true, and shares the same O(total_refs) per-dead-decl algorithm between both modes.
- Clarify current status: collect_issues iterates all dead_decls + live_decls - Document TODO items: isInsideReportedValue, hasRefBelow, module marking - Document missing issues: 18 optional args (6 Redundant + 12 Unused) - Document correct issues: 362 dead code + 1 incorrect @dead
- Update ARCHITECTURE.md with current ReactiveSolver collections - Add table documenting all reactive collections in ReactiveSolver - Update pipeline stages to show full reactive flow - Regenerate reactive-pipeline.svg with detailed solver internals - Show timing breakdown (~0.7ms on cache hit for dead_code solving)
Track updates_received and updates_emitted for each reactive collection. Add stats field to Reactive.t type. Add print_stats to ReactiveSolver to show all collection statistics. Example output (benchmark, cold start): dead_decls: recv=142575 emit=40682 len=18050 live_decls: recv=142575 emit=27674 len=17500 dead_modules: recv=68356 emit=11856 len=1700 dead_decls_by_file: recv=40682 emit=40682 len=2900 issues_by_file: recv=40682 emit=40682 len=2900 incorrect_dead_decls: recv=38474 emit=50 len=50 dead_module_issues: recv=56563 emit=21416 len=1550 On cache hit (second run), counters don't increase - zero new updates.
Print roots, edges, and live (fixpoint) stats in timing mode. Example output (benchmark, cold start): ReactiveLiveness collection stats: roots: recv=156200 emit=156200 len=16203 edges: recv=445363 emit=445363 len=35550 live (fixpoint): recv=601563 emit=107025 len=31903 Key insights: - fixpoint receives 601,563 updates (roots + edges combined) - fixpoint emits 107,025 updates (BFS expansion deltas) - 31,903 positions are live (17,500 decls + type positions) - On cache hit (run 2), counters stay same = zero new work
Add a new Batch case to the delta type that allows sending multiple
updates at once. This enables more efficient processing during bulk
loading (e.g., when processing CMT files).
Key changes:
- Add Batch of ('k * 'v option) list to delta type
- Add set/remove helper functions for batch entries
- Update all combinators (flatMap, lookup, join, union, fixpoint)
to handle Batch deltas
- Process batch entries, deduplicate affected keys, emit as batch
- Add tests for batch processing
Batch processing uses hashtbl internally for deduplication but
passes batches as lists between combinators. This matches the
existing pattern in combinators and provides O(1) dedup.
Example usage:
emit (Batch [
Reactive.set "a" 1;
Reactive.set "b" 2;
Reactive.remove "c";
])
Process all files as a single Batch delta instead of individual deltas. This dramatically reduces the number of updates flowing through reactive combinators during bulk loading. Key changes: - Add process_files_batch to ReactiveFileCollection (emits single Batch) - Add process_file_silent helper (processes without emitting) - Update ReactiveAnalysis.process_files to use batch processing Performance improvement on deadcode-benchmark: - Before: fixpoint recv=601,563 updates - After: fixpoint recv=13 updates (batches) This means downstream combinators process all file data together instead of handling 600K+ individual updates one at a time.
Instead of calling apply_init_delta (which runs BFS) for each entry in a batch, collect all new roots first and run a single BFS expansion. Before: Batch of N roots → N separate BFS operations After: Batch of N roots → 1 BFS with N-element frontier This matches how the fixpoint initialization already works. Note: The main CMT processing overhead (~2.5s) comes from other combinators (flatMaps, joins), not the fixpoint. The fixpoint only receives 13 batch updates vs 600K+ individual deltas before batching.
This reverts commit 42d0efe.
… Reactive - Add adds_received/removes_received to track input churn - Add adds_emitted/removes_emitted to track output changes - Add process_count (runs) to track how many times each node processes - Delete unused V1 API (Reactive.ml/mli) - Rename ReactiveV2 to Reactive throughout codebase - Split ReactiveTest.ml into themed test files - Add Makefile and README for analysis/reactive - Clean up README to remove V1/V2 references
- Add -mermaid CLI flag to output auto-generated Mermaid diagram - Update manual diagram to show all nodes (no hidden intermediate nodes) - Use clearer names (less abbreviated, multi-line where needed) - Regenerate SVG
- Remove lookup combinator (was deleted) - Add source and Batch delta type - Update scheduler description to accumulate-then-propagate design - Replace ASCII diagram with reference to Mermaid diagram - Add Stats Tracking section documenting -timing output - Update Reactive module description
- Use readable names instead of abbreviations (decls not D, etc.) - Keep high-level abstraction (~25 nodes, hides internal details) - Document -mermaid flag for full 44-node auto-generated diagram - Regenerate SVG
- Generate reactive-pipeline-full.mmd and .svg (44 nodes) - Link from ARCHITECTURE.md to the full diagram
- Add label parameter to Registry.add_edge - flatMap, join, union edges labeled with combinator name - fixpoint edges labeled as 'roots' and 'edges' - Regenerate reactive-pipeline-full.mmd with labels
Multi-input combinators (join, union, fixpoint) are now shown as
separate diamond-shaped nodes in the diagram:
left --> combinator{join}
right --> combinator
combinator --> result
This makes the diagram more faithful to the actual data flow.
- join nodes: light blue (#e6f3ff) - union nodes: light orange (#fff0e6) - fixpoint nodes: light green (#e6ffe6) - Remove level annotations from node labels
- Dead module issues with ghost locations now use fileName from declaration instead of _none_ from the ghost location - hasRefBelow now correctly checks ALL refs (including cross-file) since refIsBelow returns true for any cross-file ref - Both fixes ensure reactive/non-reactive produce identical results - Verified against master: all 4 mode combinations match exactly
- Add -churn n CLI option to simulate file add/remove cycles - Alternates between removing n random files and adding them back - Shows issue count changes per run with detailed diff - Add remove_batch to ReactiveFileCollection for batched removals (28x faster) - Add reset_stats to Reactive for per-operation stat tracking - Include churn time in timing report with proper accounting - Print aggregate stats at end: mean/std for churn time and issue changes - Add skip_file parameter to filter removed files from processing
Remove the temporary -parallel flag and Domains-based parallel loader from reanalyze, and update docs/tests/bench scripts accordingly. Signed-off-by: Cristiano Calcagno <[email protected]>
5e3f37b to
4037f6d
Compare
- Remove stale references to -cmt-cache/skip-lite and outdated parallel benchmark notes - Clarify reactive CLI usage and document churn/mermaid flags Signed-off-by: Cristiano Calcagno <[email protected]>
- Print forward-graph summary (roots + decl-deps) in -debug mode\n- Add per-decl dependency context (incoming/outgoing decl edges)\n- Update tests expected output Signed-off-by: Cristiano Calcagno <[email protected]>
- When -timing is enabled, print a per-wave scheduler summary showing which nodes ran and their recv/emit deltas and time - Enhance Reactive.print_stats with a Top-5 nodes by total process time section Signed-off-by: Cristiano Calcagno <[email protected]>
zth
approved these changes
Dec 28, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Adds a reactive pipeline for reanalyze and a small reactive-collections library.
Key changes: