Restore write_unit=txn_fragment for shape consumers #3906

Draft
alco wants to merge 1 commit into main from alco/enable-txn-fragment-to-storage

@alco alco commented Feb 24, 2026

Context

PR #3783 introduced the infrastructure for streaming transaction fragments directly to storage (write_unit=txn_fragment) instead of buffering entire transactions in consumer memory. This dramatically reduces memory usage for large transactions (9GB → 500MB in benchmarks).

However, correctness issues emerged with subquery shapes, and the final version of #3783 sets write_unit=txn for all shapes to ship a safe baseline. All the fragment-streaming code paths remain in the codebase but are currently unreachable.

This PR tracks re-enabling write_unit=txn_fragment, starting with the simpler case (standalone shapes) and eventually covering all shapes.

Phase 1: Restore txn_fragment for standalone shapes (no subquery dependencies)

Standalone shapes have no materializer subscribers and no shape dependencies. The fragment-streaming code path was already working for these shapes before it was disabled.

  • In State.initialize_shape/3, set write_unit=txn_fragment for shapes where shape_dependencies == [] and is_subquery_shape? == false
  • Run the oracle property-based tests for standalone shapes and confirm no new failures compared to main
  • Verify memory usage improvement on large transactions with a manual or automated benchmark
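
The Phase 1 selection rule can be sketched as a small predicate. This is only an illustration, not the actual body of State.initialize_shape/3: the module name WriteUnitSketch, the plain-map argument, and the atoms :txn and :txn_fragment are assumptions made for the example. Only the two conditions (no shape dependencies, not a subquery shape) come from the description above.

```elixir
defmodule WriteUnitSketch do
  @moduledoc """
  Hypothetical helper illustrating the Phase 1 rule.
  Not part of Electric; names and data layout are assumptions.
  """

  # Standalone shapes (no dependencies, not used as a subquery shape)
  # stream transaction fragments straight to storage.
  def write_unit(%{shape_dependencies: [], is_subquery_shape?: false}), do: :txn_fragment

  # Every other shape keeps the safe baseline of whole-transaction writes.
  def write_unit(%{}), do: :txn
end
```

With this rule, a shape that has any dependency, or that serves as an inner subquery shape, stays on write_unit=txn until Phases 2 and 3 land.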

Phase 2: Restore txn_fragment for inner (dependency) subquery shapes

For inner shapes, each consumer process has a materializer process subscribed to it. The outer shape's consumer is in turn subscribed to the inner shape's materializer so that it can handle move-ins and move-outs correctly. Fragment streaming for these shapes requires the materializer to defer event processing until all changes for the current transaction have been processed.

  • Fix the materializer subscription race in subscribe_materializer (AI hallucinations: return the last committed offset from storage (Storage.fetch_latest_offset) instead of state.latest_offset, which can be a mid-transaction fragment offset ahead of the committed boundary)
    • File: lib/electric/shapes/consumer.ex, handle_call({:subscribe_materializer, ...})
  • Set write_unit=txn_fragment for shapes with is_subquery_shape? == true (and no shape_dependencies of their own)
  • Verify the commit: false / commit: true deferred notification path is exercised end-to-end
  • Add test coverage: inner shape with write_unit=txn_fragment and a materializer subscriber receives a multi-fragment transaction; the materializer's pending_events accumulate across fragments and only flush on commit
  • Run oracle tests for shapes-with-subqueries and confirm no regressions
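
The deferred notification behaviour that the new test should pin down can be modelled in a few lines. This is a minimal sketch under stated assumptions, not Electric's materializer: the module MaterializerSketch, the function handle_fragment/3, and the notified field are hypothetical. Only the pending_events name and the commit: false / commit: true distinction are taken from the checklist above.

```elixir
defmodule MaterializerSketch do
  @moduledoc """
  Hypothetical model of deferred notification across transaction fragments.
  Not Electric's materializer; all names except pending_events are assumptions.
  """

  # Fresh state: nothing buffered, nothing sent to subscribers yet.
  def new, do: %{pending_events: [], notified: []}

  # Buffer the fragment's events. Flush them to subscribers as one batch
  # only when the fragment carries the commit.
  def handle_fragment(%{pending_events: pending} = state, events, opts) do
    pending = pending ++ events

    if Keyword.get(opts, :commit, false) do
      %{state | pending_events: [], notified: state.notified ++ [pending]}
    else
      %{state | pending_events: pending}
    end
  end
end
```

In this model a two-fragment transaction produces no notification after the first fragment (commit: false) and a single batch containing both fragments' events after the second (commit: true), which is the property the multi-fragment test is meant to assert.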

Phase 3: Restore txn_fragment for outer (parent) subquery shapes

Outer shapes have shape_dependencies != [] and process materializer events (move-ins/move-outs) as part of their transaction handling. This is the hardest case.

  • Audit write_txn_fragment_to_storage for move-in/move-out correctness
  • Decide on the approach for materializer events arriving mid-fragment-write
  • Implement fragment-level change conversion that accounts for the shape's subquery state
  • Add test coverage: outer shape with dependencies receives a multi-fragment transaction while materializer events arrive from inner shapes mid-transaction
  • Run full oracle test suite and confirm parity with write_unit=txn

Additional items

  • Handle the edge case where a standalone consumer with write_unit=txn_fragment is later adopted as an inner shape for a newly created outer subquery shape


codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.20%. Comparing base (48bbbe3) to head (4348ebe).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3906   +/-   ##
=======================================
  Coverage   87.20%   87.20%           
=======================================
  Files          25       25           
  Lines        2391     2391           
  Branches      600      599    -1     
=======================================
  Hits         2085     2085           
  Misses        304      304           
  Partials        2        2           
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 91.87% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 87.20% <ø> (ø)
unit-tests 87.20% <ø> (ø)

blacksmith-sh bot commented Feb 24, 2026

Found 1 test failure on Blacksmith runners:

Failure

Test: Elixir.Electric.ShapeCacheTest/test get_or_create_shape_handle/2 against real db crashes when initial snapshot query fails to return data quickly enough

Base automatically changed from alco/write-txn-fragments-to-storage to main February 26, 2026 14:32
@alco alco force-pushed the alco/enable-txn-fragment-to-storage branch from c5100b8 to 4348ebe Compare February 26, 2026 14:34