
feat: Write transaction fragments directly to storage to reduce consumer memory footprint#3783

Merged
alco merged 33 commits into main from alco/write-txn-fragments-to-storage
Feb 26, 2026

Conversation


@alco alco commented Jan 27, 2026

Closes #3415

Summary

For shapes without subquery dependencies (the vast majority), the consumer now streams each transaction fragment directly to storage as it arrives, instead of buffering the entire transaction in memory. This significantly reduces per-shape-consumer memory usage for large transactions.

Shapes with subquery dependencies continue to buffer the full transaction (write_unit=txn) because they need the complete change set for move-in/move-out reasoning.

How it works

The consumer operates in one of two modes, determined at shape creation time:

  • write_unit=txn_fragment (default): Each fragment is filtered, converted to log items, and written to storage immediately. A lightweight PendingTxn struct tracks only metadata (xid, offset, byte count, change count) across fragments. When the final fragment with a commit arrives, storage is notified to advance its transaction boundary.

  • write_unit=txn (shapes with subquery dependencies): Fragments are accumulated into a complete transaction before writing, as before.

Key additions

Consumer.PendingTxn — tracks in-progress transaction metadata during fragment streaming. It is the lightweight counterpart to TransactionBuilder: where TransactionBuilder accumulates all changes in memory, PendingTxn only tracks offsets and counters while actual data is written to storage and discarded immediately.
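
As an illustration of how lightweight this tracking is, a PendingTxn-style struct might look like the following (a sketch only; field and function names are assumptions based on the description above, not the actual implementation):

```elixir
defmodule PendingTxnSketch do
  @moduledoc "Illustrative sketch; fields are assumptions based on the PR description."
  defstruct xid: nil, last_log_offset: nil, byte_count: 0, num_changes: 0

  # Fold one written fragment's stats into the running metadata.
  # The fragment's change data itself is written to storage and discarded.
  def track_fragment(%__MODULE__{} = txn, offset, bytes_written, changes_written) do
    %{
      txn
      | last_log_offset: offset,
        byte_count: txn.byte_count + bytes_written,
        num_changes: txn.num_changes + changes_written
    }
  end
end
```

The point of the sketch is that the struct holds only a handful of integers and an offset, regardless of how large the transaction is.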

New storage callbacks — append_fragment_to_log!/2 writes log items without advancing the persisted transaction offset (it can be called multiple times per transaction), and signal_txn_commit!/2 marks the transaction as complete so crash recovery sees the correct committed boundary. The existing full-transaction write path is refactored to share the same underlying logic.
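
The described callbacks would have signatures along these lines (a hedged sketch inferred from the callback names; the actual typespecs in the Storage behaviour may differ):

```elixir
# Sketch of the new Storage behaviour additions; argument types are assumptions.
@callback append_fragment_to_log!(log_items :: Enumerable.t(), storage_opts :: term()) :: :ok
@callback signal_txn_commit!(xid :: non_neg_integer(), storage_opts :: term()) :: :ok
```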

Support.StorageTracer — test helper using Erlang trace sessions to observe which storage functions are called from a process, enabling non-intrusive assertions on the write path taken without wrapper modules or function patching.
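
A rough sketch of how such a tracer can be built on OTP 27's trace sessions (the real Support.StorageTracer will differ in detail; the session name and traced module below are assumptions):

```elixir
# Sketch only: trace Storage calls made by a consumer process using an
# isolated trace session (:trace module, OTP 27+), without patching code.
session = :trace.session_create(:storage_tracer, self(), [])
:trace.process(session, consumer_pid, true, [:call])
:trace.function(session, {Electric.ShapeCache.Storage, :_, :_}, true, [])

# Traced calls arrive at the tracer as messages of the form
#   {:trace, pid, :call, {Electric.ShapeCache.Storage, fun, args}}
# which a test can collect with assert_receive before cleaning up:
:trace.session_destroy(session)
```

Because the session is isolated, this doesn't interfere with other tracing in the test run, which is what makes the assertions non-intrusive.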

Other notable changes

  • Initial snapshot filtering (xid-based skip) is now applied at the first fragment, avoiding accumulation of fragments for transactions that will be discarded.
  • The FlushTracker in ShapeLogCollector is now only invoked for fragments that include a commit, since mid-transaction fragments don't represent a flush boundary.
  • Removed CrashingFileStorage (dead code).

Expected memory footprint reduction

Previously, the consumer held an entire transaction's worth of change structs in memory until commit. Now, each fragment's changes are written to storage and released immediately. The consumer only retains the PendingTxn metadata struct (~100 bytes) per in-progress transaction, reducing peak memory by roughly the size of the buffered change data. The improvement scales linearly with transaction size.

Testing

Consumer and storage tests have been expanded to cover both write_unit modes, including tracing-based assertions on which storage functions are called.


coderabbitai bot commented Jan 27, 2026

Review skipped

Auto reviews are disabled on this repository.


netlify bot commented Jan 27, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 846aa9f
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/699dcb614dec2f00085735af
😎 Deploy Preview https://deploy-preview-3783--electric-next.netlify.app

codecov bot commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.20%. Comparing base (8f2f7bd) to head (b4b89a2).
⚠️ Report is 11 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3783      +/-   ##
==========================================
+ Coverage   87.06%   87.20%   +0.13%     
==========================================
  Files          25       25              
  Lines        2343     2391      +48     
  Branches      592      595       +3     
==========================================
+ Hits         2040     2085      +45     
- Misses        301      304       +3     
  Partials        2        2              
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 91.87% <ø> (+0.05%) ⬆️
packages/y-electric 56.05% <ø> (ø)
typescript 87.20% <ø> (+0.13%) ⬆️
unit-tests 87.20% <ø> (+0.13%) ⬆️


@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from d3dc0f4 to 1c882c2 Compare January 28, 2026 11:18
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 1c882c2 to 71b61f5 Compare February 9, 2026 15:12
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from dba7fc3 to c7d9d17 Compare February 10, 2026 12:42
@alco alco changed the title from "Alco/write txn fragments to storage" to "feat: Write transaction fragments directly to storage to reduce consumer memory footprint" Feb 10, 2026
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from c7d9d17 to ccaed2d Compare February 10, 2026 13:54

alco commented Feb 10, 2026

Manual memory benchmark results

The tests below were done separately with electric@main and electric@this-branch. The database had a single table:

CREATE TABLE IF NOT EXISTS public.users (
  id bigserial PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT NOT NULL                                                  
);

and Electric was initialized with 200 shapes, each with a where clause of the form N=N, for N in 1..200.

After creating the shapes and firing up Observer, I executed the following INSERT in the database:

insert into                              
  public.users(name, email)
    select
      'username_' || generate_series::text || '_' || repeat('x', 2000),
      'username_' || generate_series::text || '@example.com'
    from
      generate_series(1,10001);

Memory charts

The total memory usage on electric@main grows almost to 9GB during transaction processing. For electric@this-branch the total memory usage stays below 500MB.

electric@main

electric-main-mem-chart

electric@this-branch

electric-fragment-mem-chart

System stats comparison

For each run, I screenshotted the system stats in Observer both before the INSERT and after the memory usage had fallen back down (after GC and consumer hibernation).

The key thing to notice from these is that electric@main has residual process memory usage after the load test, which is not the case for electric@this-branch.

electric@main

electric-main-stats

electric@this-branch

electric-fragment-stats

@alco alco marked this pull request as ready for review February 10, 2026 14:26

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccaed2d3ca



alco commented Feb 10, 2026

@claude please review this PR.

  • Does it address the referenced issue?
  • Do you see any obvious issue in the code?
  • Lack of test coverage?
  • Any gnarly or hairy code?


alco commented Feb 11, 2026

I've run the replication throughput benchmark on our cloud fleet with the following params, comparing main (i.e. electricsql/electric:canary) with this PR (pushed to Docker Hub as alco/electric:fragment-direct-pr):

%{
  "row_size" => [150],
  "shape_count" => [5, 100, 1000, 2000],
  "tx_affects_shape" => [true, false],
  "tx_count" => [1000],
  "tx_row_count" => [90, 100, 120, 200, 220],
  "where_clause" => ["name = '\#{name}'", "name ILIKE '\#{name}%'"]
}
canvas


balegas commented Feb 11, 2026

Every microsecond counts!


@robacourt robacourt left a comment


LGTM. Great work! The interaction with the Materializer will need addressing; maybe consumers with materializers should use write_unit=txn.

@alco alco added the claude label Feb 11, 2026
claude bot commented Feb 11, 2026

Claude Code Review

Summary

Tenth review iteration of PR #3783 (2026-02-24). Since Review #9, a significant batch of commits landed that address all four remaining "What Remains" items plus two chatgpt-codex P1 concerns from Feb 10. The PR is now in excellent shape. One low-probability race condition from Review #9 remains open.


What's Working Well

Storage capability check (b5952c8837 + 596d828b25):

validate_storage_capabilities in State.initialize/3 is a clean implementation. It calls Storage.supports_txn_fragment_streaming?/1 and gracefully downgrades write_unit to :txn with a warning for unsupported backends. InMemoryStorage and TestStorage both implement supports_txn_fragment_streaming?/0 → false, closing the chatgpt-codex P1 concern about InMemoryStorage crashing on multi-fragment transactions.

Flush boundary offset fix (2198e59a6d):

maybe_complete_pending_txn now uses txn_fragment.last_log_offset (the WAL commit boundary) as the right-hand side of txn_offset_mapping, instead of the previous approach that could use the last-relevant-change offset. This correctly handles the case where the commit fragment has no relevant changes — the flush boundary now always advances to the true WAL commit offset, not just to the last relevant-change offset. This closes the chatgpt-codex P1 from Feb 10.

Split notification functions (b17c81fee2):

Splitting notify_new_changes into separate notify_materializer_of_new_changes and notify_clients_of_new_changes is a clean refactor. The :ok = ... assertion pattern at every call site is good defensive style. The materializer now correctly receives commit: false per fragment and a final commit: true commit signal (via the empty-list path in maybe_complete_pending_txn).

Expanded test coverage:

The new tests in consumer_test.exs (buffered fragments, snapshot filtering, idempotency, flush boundaries) provide substantial coverage that effectively replaces the "oracle test" concern from Review #9. Support.StorageTracer with Erlang :trace sessions is a well-designed non-intrusive tracing mechanism. The assert_receive_timeout-based timeout in collect_traced_calls/1 makes assertions reliable without flakiness.

is_subquery_shape? unit tests (e2b3dc8924):

state_test.exs now covers all three write_unit cases: standalone shape → :txn_fragment, shape with dependencies → :txn, and is_subquery_shape?: true → :txn. These tests directly prevent the regression risk noted in Review #9 suggestion #3.

Changeset (0f8f833d6b, b162c97291):

Now accurately documents the three new callbacks and the automatic fallback behavior for InMemoryStorage.


Issues Found

Important (Should Fix)

1. Pre-existing standalone consumer + new materializer subscription race

File: packages/sync-service/lib/electric/shapes/consumer.ex:210-215

Carried over from Review #9 — still unaddressed. subscribe_materializer returns state.latest_offset:

def handle_call({:subscribe_materializer, pid}, _from, state) do
  Process.monitor(pid, tag: :materializer_down)
  {:reply, {:ok, state.latest_offset}, %{state | materializer_subscribed?: true}, ...}
end

For a write_unit: :txn_fragment consumer, state.latest_offset is updated per-fragment in write_txn_fragment_to_storage. If a materializer subscribes while a multi-fragment transaction is in progress, state.latest_offset is a mid-transaction offset. The materializer's read_stream reads storage up to this offset, but PureFileStorage.get_log_stream is bounded by last_seen_txn_offset (only advanced by signal_txn_commit!/2). The materializer reads nothing for the in-flight transaction, and the per-fragment changes written before subscription are never re-delivered via notify_materializer_of_new_changes.

Impact: Incorrect move-in/out tracking in the outer shape for that one transaction.

Probability: Low — requires (a) inner shape created independently before outer shape, (b) outer shape created while a multi-fragment transaction is active on the inner shape.

Suggested mitigation: In subscribe_materializer, return the last committed offset from storage rather than state.latest_offset:

def handle_call({:subscribe_materializer, pid}, _from, state) do
  Process.monitor(pid, tag: :materializer_down)
  {:ok, safe_offset} = ShapeCache.Storage.fetch_latest_offset(state.storage)
  {:reply, {:ok, safe_offset}, %{state | materializer_subscribed?: true}, state.hibernate_after}
end

This ensures the materializer always starts from a committed boundary.


Suggestions (Nice to Have)

2. Changeset claim doesn't match behavior for backends missing supports_txn_fragment_streaming?/0

File: .changeset/brown-rice-float.md, packages/sync-service/lib/electric/shapes/consumer/state.ex:264-278

The changeset says: "Custom storage backends that do not implement these callbacks will automatically fall back to the default write_unit=txn mode". This is true for backends that implement supports_txn_fragment_streaming?/0 → false (like InMemoryStorage), but a backend that doesn't implement the callback at all would get an UndefinedFunctionError in validate_storage_capabilities rather than falling back.

The @callback definition will produce compile-time warnings for non-implementing backends, which mitigates the risk in practice. But if you want the changeset claim to be strictly accurate, a guard like function_exported?(mod, :supports_txn_fragment_streaming?, 0) would ensure clean fallback for older backends:

defp validate_storage_capabilities(%__MODULE__{write_unit: @write_unit_txn_fragment} = state, storage) do
  {mod, _opts} = storage
  supports? = function_exported?(mod, :supports_txn_fragment_streaming?, 0) and mod.supports_txn_fragment_streaming?()
  if supports? do
    state
  else
    # ... existing downgrade logic
  end
end

3. Pre-existing: pending_txn: nil crash on non-begin fragment

process_txn_fragment/2 at line ~616 accesses txn.consider_flushed? where txn = state.pending_txn. A non-begin fragment arriving with pending_txn: nil (malformed WAL stream or redelivery edge case) would crash with ArgumentError: argument error on nil.consider_flushed?. Very low probability in practice; still no change from Review #9.


Issue Conformance

Linked issue #3415: "Avoid holding whole transaction in consumer memory"

✅ Core objective delivered: memory improvement via direct fragment-to-storage streaming
✅ Fragment streaming for standalone shapes (no subquery deps, no materializer)
✅ Full-txn buffering for shapes in subquery hierarchy (is_subquery_shape?: true)
✅ Storage capability check: Falls back to :txn mode with warning for InMemoryStorage
✅ Materializer correctness: commit: false/true pattern ensures consistent transaction view
✅ Mid-transaction crash: Eliminated by setting write_unit at initialization
✅ Flush boundary fix: txn_offset_mapping now uses WAL commit offset
✅ Changeset: Documents new storage interface additions and fallback behavior
✅ Unit tests: Cover all write_unit modes, snapshot filtering, idempotency, flush boundaries
⚠️ Standalone materializer subscription race: Still present, low probability


Previous Review Status (Review #9 → Review #10)

What was addressed:

What remains:

  1. ⚠️ Standalone inner-shape materializer subscribe race — low probability, subscribe_materializer returns potentially mid-transaction offset

Overall Assessment

The PR is merge-ready from a correctness standpoint. The critical architecture decisions (write_unit set at init time, materializer commit deferral, flush boundary alignment) are all correct. The one remaining Important issue is a low-probability edge case that requires a specific timing window. If the team is comfortable with the probability, this can merge as-is and the fix can be a follow-up. If not, the mitigation is a one-line change in subscribe_materializer.


Review iteration: 10 | 2026-02-24

@alco alco force-pushed the alco/write-txn-fragments-to-storage branch 2 times, most recently from 5f18340 to 4714a7a Compare February 16, 2026 13:08

alco commented Feb 16, 2026

Update (Feb 16)

Rebased on current main and pushed new commits addressing review feedback:

  • Split materializer and client notifications (a93967e): Separated notify_new_changes into distinct functions for notifying the materializer vs. clients. This is needed to selectively notify the appropriate subscriber when processing individual txn fragments.

  • Use commit fragment offset for txn_offset_mapping and client notifications (78a629a): Fixes the flush alignment issue flagged in review — when the final commit fragment is filtered out, the mapping now uses the commit fragment's last_log_offset directly instead of the last relevant-change boundary. PendingTxn no longer needs to track last_log_offset.

  • Add supports_txn_fragment_streaming? callback to Storage behaviour (4e630ab): InMemoryStorage and TestStorage return false, PureFileStorage returns true. This allows runtime capability checking before enabling fragment streaming mode.

  • Downgrade write_unit to :txn when storage lacks fragment streaming support (4714a7a): On consumer init, validates that the storage backend supports txn fragment streaming. If unsupported (e.g. InMemoryStorage), falls back to full-transaction buffering with a warning log. This addresses the concern about InMemoryStorage crashing on fragment callbacks.

Still TODO: address @robacourt's feedback about shapes that are dependencies (innermost shapes with no dependencies themselves but that DO have a materializer) — these should probably use write_unit=txn to ensure the materializer gets the full transaction change set on commit.

Based on the current implementation, the Materializer sees changes from all txn fragments, but when it then notifies the consumer with a :materializer_changes message in return, the Consumer may still assume that a complete transaction has been processed.


alco commented Feb 16, 2026

Latest changes (3 new commits since last push)

Defer Materializer subscriber notifications until commit fragment

Previously, when write_unit=:txn_fragment, the inner Consumer called Materializer.new_changes() per fragment, and the Materializer immediately sent :materializer_changes (move-in/move-out events) to the outer Consumer after each fragment. This could cause redundant intermediate notifications within a single transaction.

Changes:

  • Added a commit: option to Materializer.new_changes/3 (defaults to true for backward compatibility). When commit: false, events are accumulated in a new pending_events field on the Materializer state without notifying subscribers. When commit: true, accumulated events are flushed to subscribers.
  • The fragment-streaming path in consumer.ex passes commit: false for every fragment. The flush happens via notify_new_changes in maybe_complete_pending_txn which uses commit: true (the default).
  • Events from multiple fragments are merged by prepending newer events to older ones, matching the existing reverse-chronological ordering convention used by increment_value/decrement_value.
  • Added a test verifying correct event ordering across uncommitted fragments.
  • Moved signal_txn_commit! inside the num_changes > 0 branch since there's no need to signal storage when no changes were written.
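
The accumulate-then-flush behaviour described in these bullets can be sketched roughly as follows (the state shape and helper names are assumptions, not the actual Materializer code):

```elixir
defmodule MaterializerSketch do
  @moduledoc "Hedged sketch of commit-deferred subscriber notification."

  def new_changes(state, events, opts \\ []) do
    commit? = Keyword.get(opts, :commit, true)
    # Newer events are prepended, matching the reverse-chronological
    # ordering convention mentioned above.
    state = %{state | pending_events: events ++ state.pending_events}

    if commit?, do: flush_to_subscribers(state), else: state
  end

  defp flush_to_subscribers(state) do
    for pid <- state.subscribers do
      send(pid, {:materializer_changes, state.pending_events})
    end

    %{state | pending_events: []}
  end
end
```

With commit: false the events only accumulate; the single flush at commit: true is what keeps the outer Consumer from seeing intermediate, per-fragment notifications.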


alco commented Feb 16, 2026

Regarding an earlier agentic comment about the potential impact of direct txn fragment writing to storage on PureFileStorage.trim_log!, I've asked Claude to analyse it; here's the result: https://gist.github.com/alco/588b3d7d7665f63d768d5f1b611499c7#file-fragment_storage_safety_analysis-md.

TL;DR: Fragment Streaming Storage Safety

All 7 crash/consistency scenarios are safe. The three-offset architecture (last_seen_offset / last_seen_txn_offset / last_persisted_txn_offset) provides the core safety guarantee:

  • trim_log!/2 correctly truncates uncommitted fragments — it uses last_persisted_txn_offset (only advanced after a complete txn is finalized + flushed) to remove both chunk index entries and log file bytes beyond the committed boundary.
  • Readers never see uncommitted datastream_main_log caps reads at last_seen_txn_offset, which isn't advanced until signal_txn_commit! calls finalize_txn.
  • ETS consistency is maintained — fragment entries stay in an in-memory buffer until finalize_txn; the disk/ETS split read strategy handles mid-transaction flushes correctly.
  • Compaction has one theoretical edge case (transaction spanning 30+ MB creating 3+ chunks) where the compacted file could retain uncommitted data after a crash, but readers are still bounded by the txn offset, so no data corruption occurs. Practically, this is a near-zero likelihood scenario.

No code changes are needed — the existing invariants hold correctly with the fragment write path.
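
As a reading aid, the ordering the analysis relies on can be stated as a simple invariant (offset names from the analysis above; the actual PureFileStorage state is more involved):

```elixir
# Sketch of the three-offset invariant. Each offset only moves forward,
# and they advance in this order:
#   last_seen_offset           - as each fragment is appended
#   last_seen_txn_offset       - when signal_txn_commit! finalizes the txn
#   last_persisted_txn_offset  - after the finalized txn is flushed to disk
defmodule OffsetInvariantSketch do
  def holds?(%{persisted: p, committed: c, seen: s}), do: p <= c and c <= s
end
```

Trimming back to `persisted` and capping reads at `committed` is what makes both crash recovery and concurrent reads safe against in-flight fragments.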


@robacourt robacourt left a comment


LGTM. To test the effect of this on subqueries, you could cherry-pick 0cc3b49 and run the tests with the number of mutations in the test transactions over the max txn size and see if there are any issues:

ELECTRIC_EXPERIMENTAL_MAX_TXN_SIZE=5 \
MUTATIONS_PER_TXN=15 \
mix test --only oracle test/integration/oracle_property_test.exs


alco commented Feb 16, 2026

@robacourt That's dope! It has found one failure already after running for ~10sec. I'll look into fixing it and see if it can discover anything else.

alco and others added 12 commits February 23, 2026 17:21
…rate functions

This is needed to be able to selectively notify the appropriate
subscriber when processing individual txn fragments.
…tions

PendingTxn no longer needs to track last_log_offset since the commit
fragment's offset (txn_fragment.last_log_offset) is now used directly in
maybe_complete_pending_txn. This fixes flush alignment lagging behind
when the final commit fragment is filtered out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InMemoryStorage and TestStorage return false, PureFileStorage returns
true. This allows runtime capability checking before enabling fragment
streaming mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport

When initializing a consumer, validate that the storage backend
supports txn fragment streaming before using :txn_fragment mode.
If unsupported (e.g. InMemoryStorage), fall back to full-transaction
buffering with a warning log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, Materializer sent :materializer_changes to subscribers on
every new_changes call. When write_unit=:txn_fragment, this meant the
outer Consumer could receive multiple intermediate move-in/move-out
notifications within a single transaction, causing redundant work.

Add a commit: option to Materializer.new_changes/3 (default: true).
Changes are accumulated in pending_events and only flushed to
subscribers when commit: true. The fragment-streaming path passes
commit: not is_nil(fragment.commit) to defer until the final fragment.
maybe_complete_pending_txn sends an empty new_changes with commit: true
to handle the edge case where the commit fragment has no matching
changes but earlier fragments accumulated events.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that splitting inserts across two new_changes calls (one
uncommitted, one committed) produces the same event order as sending
them in a single call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_to_storage

The :includes_truncate and {[], 0} branches returned {state, []} instead
of bare state, causing a FunctionClauseError crash when piped into
maybe_complete_pending_txn which pattern-matches on %State{}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…fications

maybe_complete_pending_txn was notifying clients with
txn_fragment.last_log_offset — the offset of the last change across ALL
tables in the commit fragment. But fetch_latest_offset returns the offset
of the last change written to THIS shape's storage. When a split
transaction has changes to tables not matching the shape, the client
receives a higher offset than what the shape's storage knows about,
causing "offset out of bounds" on the next poll.

Fix: use state.latest_offset (the last offset actually written to this
shape's log) instead of the fragment's cross-table offset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 751cf24 to cdf0cb7 Compare February 23, 2026 16:23

alco commented Feb 23, 2026

I have pushed a fix for the issue described in #3783 (comment) and have set write_unit=txn even for leaf Consumer processes - those that are started for the innermost shape of a shape with subqueries.

Still, it's impossible to be sure this branch doesn't add new failures on top of those already observed in main until we figure out the root cause of those. This is just one example of the different failures that are readily reproducible with oracle tests in main.

@alco alco marked this pull request as ready for review February 24, 2026 09:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdf0cb70ac


alco and others added 3 commits February 24, 2026 16:52
…hape-with-subqueries hierarchy

The write_txn_fragment_to_storage path in Consumer was designed with
simplifications that assumed the consumer has no inner shape dependencies:
it doesn't account for move-ins/move-outs, doesn't call State.track_change,
and evaluates WHERE clauses without extra_refs from inner materializers.

These simplifications are correct for true leaf consumers, but a consumer
can have a materializer subscribed to it by an outer shape even if it was
initially assigned write_unit=txn_fragment (based on having no dependencies
of its own). When such a consumer processes fragments directly, the outer
shape's do_handle_txn call to Materializer.get_all_as_refs sees intermediate
materializer state that hasn't been properly committed, leading to incorrect
WHERE clause evaluation in the outer shape.

Instead of reacting to materializer subscription events at runtime, we now
initialize the consumer with write_unit=txn upfront when its shape is known
to be part of a shape-with-subqueries hierarchy. This ensures that any
consumer whose state may be observed by an outer shape via extra_refs
buffers all fragments and processes the complete transaction atomically
from the start, guaranteeing the materializer's value_counts are updated
in a single consistent operation before the outer shape queries them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No need to pass the whole state, only stack_id is used from it. Makes
the data dependencies between functions clearer.
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from cdf0cb7 to 846aa9f Compare February 24, 2026 16:01

alco and others added 3 commits February 24, 2026 22:40
…angeset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tack

This is a temporary measure to get the code changes made in this branch
so far out of the door before polishing interop between subqueries and
individual txn fragment writing to storage.

alco commented Feb 24, 2026

In order to get this PR merged, I have disabled individual txn fragment writing to storage when the allow_subqueries feature flag is present in the stack config.

I will undo that and will continue fixing edge cases for shapes with subqueries in this follow-up PR.


@robacourt robacourt left a comment


LGTM. I just left a couple of comments

|> build_shape_dependencies(known)
|> build_shape_dependencies(false, known)

# Any inner shape of a root shape with subqueries must pass the is_subquery_shape? option
Contributor

I'm not sure why being a root is a special case. Any dependency should have the is_subquery_shape?: true

Contributor

Oh, ignore me. You're just excluding the root from being is_subquery_shape?. Fair enough.

Contributor

The logic that seems more straightforward in my head is that you tell a shape whether it's a subquery shape or not: when building the root that's false, and when it builds its dependencies it's true. No need to even mention the root. Also, is_subquery_shape? could always be set rather than requiring additional conditional logic. But this is a very minor nit!

Enum.zip(shape.shape_dependencies_handles, shape.shape_dependencies)
|> Enum.reject(fn {handle, _shape} -> MapSet.member?(known, handle) end)
|> build_shape_dependencies(known)
|> build_shape_dependencies(false, known)
Contributor

Can a shape become a subquery shape after it's created? So if SELECT id FROM products was created as a shape first and then id IN (SELECT id FROM products) later, would the 2nd shape use the first as the subquery shape? If so, setting is_subquery_shape? on creation breaks. If this is the case, personally I wouldn't be against ensuring that shapes and subquery shapes are distinct to prevent this from happening; it's only an optimisation to reuse shapes.

Member Author

I can address both your comments here.

You are right, a shape may start out independent but then have a materializer subscribe to it at a later point in time. I attempted to identify inner shapes using the is_subquery_shape? option passed at consumer startup but then determined that it wouldn't be sufficient. If we want to enable txn fragment processing for stacks that have allow_subqueries, we need to make it work properly even for shapes with subqueries, at least in the case where shape.shape_dependencies == []. At that point, the is_subquery_shape? flag becomes superfluous and will be removed from the codebase.

Contributor

Ok, so your plan for this PR is to not enable txn fragment processing for stacks that have allow_subqueries?

Member Author

Exactly.

Once this is merged, direct txn fragment writing to storage will only be enabled for stacks that don't have subqueries.

@alco alco merged commit 48bbbe3 into main Feb 26, 2026
63 of 64 checks passed
@alco alco deleted the alco/write-txn-fragments-to-storage branch February 26, 2026 14:32


Development

Successfully merging this pull request may close these issues.

Avoid holding whole transaction in consumer memory

3 participants