From 2d8b5eff163a7e8c131ce41f65e3e67a4676790a Mon Sep 17 00:00:00 2001
From: shruti2522 <shruti.apc01@gmail.com>
Date: Wed, 8 Apr 2026 15:25:21 +0000
Subject: [PATCH] docs: research wasmtime Gc architecture for boa_gc API
 redesign

---
 notes/wasm_gc_research/1_collector_arch.md    |  66 +++++++++++
 notes/wasm_gc_research/2_deferred_ref.md      |  68 +++++++++++
 notes/wasm_gc_research/3_object_layout.md     |  88 ++++++++++++++
 notes/wasm_gc_research/4_root_tracking.md     |  90 ++++++++++++++
 .../5_api_safety_and_tracer_design.md         | 106 +++++++++++++++++
 notes/wasm_gc_research/conclusion.md          | 110 ++++++++++++++++++
 6 files changed, 528 insertions(+)
 create mode 100644 notes/wasm_gc_research/1_collector_arch.md
 create mode 100644 notes/wasm_gc_research/2_deferred_ref.md
 create mode 100644 notes/wasm_gc_research/3_object_layout.md
 create mode 100644 notes/wasm_gc_research/4_root_tracking.md
 create mode 100644 notes/wasm_gc_research/5_api_safety_and_tracer_design.md
 create mode 100644 notes/wasm_gc_research/conclusion.md
diff --git a/notes/wasm_gc_research/1_collector_arch.md b/notes/wasm_gc_research/1_collector_arch.md
new file mode 100644
index 0000000..aa355df
--- /dev/null
+++ b/notes/wasm_gc_research/1_collector_arch.md
@@ -0,0 +1,66 @@
+# Part 1: Collector Architecture Lessons from Wasmtime
+
+Note author: shruti2522
+
+## Background
+
+Wasmtime recently shipped support for the Wasm GC proposal. They built a pluggable GC infrastructure in Rust to handle WasmGC structs and arrays. Wasmtime faces constraints similar to ours: a safe Rust API, multiple collector options and the need to add GC to an existing runtime that was not built with it in mind. Their architecture has some useful lessons.
+
+## Keep Collector Traits Internal
+
+Wasmtime has two internal traits for GC implementation: one used at compile time for object layout and write barrier insertion, and one used at runtime for allocation and collection. These traits are explicitly not public API. Embedders never see them.
+
+Instead, embedders pick a collector through a simple public enum. An `Auto` variant picks a sensible default, which can change between releases without breaking anything.
+
+The same pattern makes sense for `boa_gc`:
+
+```rust
+// internal trait in boa_gc (not public)
+pub(crate) trait Collector {
+    fn allocate(&mut self, layout: std::alloc::Layout) -> *mut u8;
+    fn collect(&mut self);
+    fn write_barrier(&mut self, obj: *mut GcHeader); // pointer: `GcBox<dyn Trace>` is unsized
+}
+
+// public configuration
+pub enum GcStrategy {
+    Auto,
+    MarkSweep,
+    NullCollector,
+}
+```
+
+This lets us add new collectors, change internal interfaces and swap the `Auto` default without breaking the public API.
+
+## Cargo Feature Flags Per Collector
+
+Each collector in Wasmtime is behind its own cargo feature. This keeps builds with no GC at zero cost, lets embedded builds include only lightweight collectors and speeds up test compile times.
+
+```toml
+[features]
+default = ["gc-mark-sweep"]
+gc-mark-sweep = []
+gc-null = []
+gc-drc = []
+```
+
+Test suites can then compile with `--no-default-features --features gc-null` for faster builds, and embedded targets can avoid pulling in a full collector.
+
+## Build the Null Collector First
+
+Wasmtime built a null collector (bump allocation, no collection) before implementing DRC. It traps when memory runs out. They did this to test the object model without needing a working collector, to get a performance baseline and because it is a legitimate option for very short-lived programs.
+
+The same approach works well for our prototype:
+
+```rust
+pub struct NullCollector {
+    heap: BumpAllocator,
+    limit: usize,
+}
+```
+
+The benefits are straightforward. We can run the full test suite against the new allocator before writing any collection code, validate `GcHeader` layout and `Trace` implementations early and measure allocation overhead separately from collection overhead.
+
+## Conclusion
+
+There are three clear takeaways from Wasmtime's architecture. Keep the collector trait internal to `boa_gc`. Put each collector behind its own feature flag. Build and validate with the null collector before adding a real one. Following this order keeps each step independently testable and reduces the risk of getting deep into collection code before the object model is solid.
diff --git a/notes/wasm_gc_research/2_deferred_ref.md b/notes/wasm_gc_research/2_deferred_ref.md
new file mode 100644
index 0000000..9f74582
--- /dev/null
+++ b/notes/wasm_gc_research/2_deferred_ref.md
@@ -0,0 +1,68 @@
+# Part 2: Deferred Reference Counting Analysis
+
+Note author: shruti2522
+
+## Background
+
+Wasmtime chose deferred reference counting (DRC) as their production GC. This note looks at why they made that choice and what we can take from it for `boa_gc`, even though we will likely go with mark sweep or something similar.
+
+## Why Reference Counting?
+
+Wasmtime picked reference counting over tracing GC for two reasons. First, refcounting spreads work across mutations rather than letting it build up into large stop-the-world (STW) pauses. Second, the failure mode is safer. If a tracing collector misses a live object, you get a dangling pointer. If refcounting misses a decrement, you get a leak. Leaks are much easier to deal with than memory corruption when adding GC to an existing codebase.
+
+The second point matters for Oscars. Boa was not designed around GC from day one. A refcount bug causing a leak during development is a lot better than a crash.
+
+## How Deferred Reference Counting Works
+
+Standard refcounting would mean an increment and decrement on every assignment. For Wasm that means refcount operations on every `local.get` and `local.set`, which is too expensive.
+
+Wasmtime avoids this by deferring. When a GC reference enters a Wasm frame it is inserted into `VMGcRefActivationsTable`. While Wasm runs, no refcount operations happen on local variables. Barriers only fire when a reference escapes the frame, such as being written to a struct field, global or table. Collection triggers when the activations table fills up. At that point Wasmtime walks the stack to find actually live objects and anything in the table that is no longer on the stack gets its refcount decremented.
+
+The fast path for table insertion is very cheap. The slow path only runs when the table fills.
+
+## The Cycle Problem
+
+DRC cannot collect cycles. Objects that reference each other keep each other alive until the entire `Store` is dropped. Wasmtime accepts this because many Wasm programs are short lived.
+
+For JavaScript this is not acceptable. JS creates cycles constantly. Closures capture their enclosing scope, prototype chains form back references, and event listeners hold references to the objects that registered them. A collector that cannot handle cycles will leak memory on almost any real program.
+
+## Lessons for boa_gc
+
+### Cycles Cannot Be Deferred
+
+Unlike Wasmtime, we cannot ship something that leaks cycles and patch it later. Cycle collection has to be part of the initial design. The simplest path is to use mark sweep from the start, which handles cycles naturally.
+
+```rust
+impl MarkSweepCollector {
+    fn collect(&mut self, roots: &RootSet) {
+        for root in roots.iter() {
+            self.mark(root);
+        }
+
+        self.heap.retain_mut(|obj| { // `retain_mut` because `unmark` mutates the header
+            if obj.header.is_marked() {
+                obj.header.unmark();
+                true
+            } else {
+                false
+            }
+        });
+    }
+}
+```
+
+Generational collection, incremental marking and concurrent marking can all be layered on later once we have performance data.
+
+### Do Not Optimize Early
+
+Wasmtime's DRC design is complex because it targets Wasm's specific performance needs. We should not carry that complexity over. Start simple and optimize when there is data showing where the bottlenecks actually are.
+
+### Deferred Barriers Are Still a Useful Idea
+
+Even without full DRC, the idea of skipping barriers on local variables is worth keeping in mind. If roots are tracked precisely through a handle table, temporary allocations that never escape the current scope do not need write barriers at all. Collection only scans the handle table, so dropped local handles are automatically excluded.
+
+## Conclusion
+
+Do not implement DRC for the prototype. The complexity is not worth it and the cycle limitation is a dealbreaker for JavaScript.
+
+Start with simple mark sweep, design precise root tracking from the start and defer optimization until there is real data to work from. The most useful thing to take from Wasmtime's DRC design is the principle of separating root tracking from the collection strategy, not the algorithm itself.
diff --git a/notes/wasm_gc_research/3_object_layout.md b/notes/wasm_gc_research/3_object_layout.md
new file mode 100644
index 0000000..5a63518
--- /dev/null
+++ b/notes/wasm_gc_research/3_object_layout.md
@@ -0,0 +1,88 @@
+# Part 3: Object Layout and Header Design
+
+Note author: shruti2522
+
+## Background
+
+Every GC managed object needs metadata so the collector can do its job. This note looks at how Wasmtime structures object headers and what we can take from that for `boa_gc`.
+
+## What Wasmtime Does
+
+Every GC object in Wasmtime starts with a `VMGcHeader`. In simplified form:
+
+```rust
+#[repr(C)]
+pub struct VMGcHeader {
+    type_index: u32,
+    gc_metadata: GcMetadata,
+}
+```
+
+A few things stand out here. The header is always at the start of every allocation. The type index is a plain `u32` used for runtime type checks and downcasts. The metadata slot is used differently depending on the collector: DRC stores a reference count there, the null collector leaves it unused and a future tracing collector would use it for mark bits or a forwarding pointer. Types are interned at the engine level, so only the index lives per object, keeping the header small.
+
+After the header comes the payload. Structs store fields inline. Arrays store a `u32` length at a fixed offset, then elements. The length being at a fixed offset matters because it allows bounds checks without a layout lookup on every array access.
+
+## Type IDs
+
+Wasmtime uses `u32` for type IDs and didn't try to use a smaller type to save space. For JavaScript this is the right call. Shapes are created dynamically and the count can grow large in real programs. `u32` is a reasonable starting point until we have data showing otherwise.
+
+## What This Means for boa_gc
+
+### The Header
+
+Every heap allocated object needs a fixed header. The header must be `#[repr(C)]` for predictable layout, identical in structure for all objects, and small while still reserving space for future collectors. Each allocation is laid out as a header followed immediately by the value:
+
+```rust
+#[repr(C)]
+pub struct GcBox<T: Trace> {
+    header: GcHeader,
+    value: T,
+}
+```
+
+### Reserve Space Now, Use Later
+
+The most important lesson here is to reserve header space even if the first collector does not use all of it. Adding a header field later means touching every allocation site, every size calculation and every unsafe offset in the codebase. An extra 8 bytes per object is negligible compared to the object payload. Moving collectors need forwarding pointers and generational collectors need age bits. Reserve the bits now:
+
+```rust
+pub struct GcHeader {
+    shape_id: u32,
+    gc_flags: u32,  // MARKED = 1 << 0, FORWARDED = 1 << 1, age bits, etc.
+}
+```
+
+Better to have unused space than to redesign the allocator later.
+
+### Shape Registry
+
+JS objects have dynamic shapes. We need a registry to keep per-object headers small:
+
+```rust
+pub struct ShapeRegistry {
+    shapes: Vec<Shape>,
+    shape_map: HashMap<PropertyLayout, ShapeId>,
+}
+
+#[derive(Copy, Clone)]
+pub struct ShapeId(u32);
+```
+
+The object header stores only the `u32` shape ID. The full layout descriptor lives in the registry on `GcContext`.
+
+### Arrays
+
+Following Wasmtime's pattern of a fixed offset for length:
+
+```rust
+#[repr(C)]
+pub struct JsArray {
+    header: GcHeader,
+    length: u32,
+    capacity: u32,
+    elements: *mut JsValue,
+}
+```
+
+## Conclusion
+
+The header design is foundational. Getting it right early avoids expensive refactoring later. Define `GcHeader` now with reserved space, use `u32` for shape IDs, keep the layout fixed and `#[repr(C)]`, and plan for future collectors by reserving bits for forwarding pointers and age tracking, even if they are unused today.
diff --git a/notes/wasm_gc_research/4_root_tracking.md b/notes/wasm_gc_research/4_root_tracking.md
new file mode 100644
index 0000000..04cc76a
--- /dev/null
+++ b/notes/wasm_gc_research/4_root_tracking.md
@@ -0,0 +1,90 @@
+# Part 4: Precise Root Tracking
+
+Note author: shruti2522
+
+## Background
+
+The collector needs to find all live objects. This means identifying "roots": references held outside the GC heap (stack slots, globals, handles) from which every other live object is reached by following pointers. Wasmtime handles this with precise stack maps generated at compile time. We do not have a JIT, so we need another approach.
+
+## Why Precise Roots Matter
+
+Conservative stack scanning blocks moving collectors. If any integer on the stack looks like a heap address, the collector cannot safely move that object. This rules out compacting, generational and copying collectors. It also causes false retentions, where an integer that happens to look like a pointer keeps an object alive when nothing actually references it.
+
+Precise roots avoid both problems.
+
+## The Problem for Oscars
+
+Wasmtime uses Cranelift to generate stack maps at compile time. A stack map records which stack slots hold live GC references at each potential collection point. We cannot do this without a JIT.
+
+Two practical approaches exist.
+
+### Approach 1: Shadow Stack
+
+Maintain a separate stack of GC pointers alongside the native call stack. Push on allocation, pop on scope exit.
+
+Pros: precise, simple to implement, no compiler needed.
+
+Cons: manual push/pop at every allocation point, easy to forget, leads to bugs.
+
+### Approach 2: Handle Table (Recommended)
+
+Store all GC references in a table on the `Context`. Stack frames hold indices into this table, not raw pointers.
+
+```rust
+pub struct HandleTable {
+    entries: Vec<TableEntry>,
+    free_list: Vec<u32>,
+}
+
+struct TableEntry {
+    ptr: *mut GcHeader,
+    refcount: u32,
+}
+
+// Not `Copy`: `Clone` and `Drop` (not shown) are assumed to adjust the entry's refcount.
+pub struct Handle<T> {
+    index: u32,
+    _marker: PhantomData<T>,
+}
+```
+
+The collector only needs to scan the table:
+
+```rust
+fn collect(&mut self, ctx: &mut Context) {
+    let roots: Vec<*mut GcHeader> = ctx.handle_table.iter_live().collect();
+    self.mark_sweep(&roots);
+}
+```
+
+When the collector moves an object, it updates the table entry. All existing handles continue to work unchanged. Only pointers stored inside other heap objects still need fixing up, unless those also go through the table:
+
+```rust
+fn compact(&mut self, handle_table: &mut HandleTable) {
+    for entry in &mut handle_table.entries {
+        if entry.refcount > 0 {
+            let new_ptr = self.copy_object(entry.ptr);
+            entry.ptr = new_ptr;
+        }
+    }
+}
+```
+
+Pros: precise roots without manual tracking, safe to move objects, automatic cleanup when handles drop, no compiler needed.
+
+Cons: indirection overhead on every access, memory overhead for table storage.
+
+## Comparison
+
+| Feature | Shadow Stack | Handle Table | Stack Maps (JIT) |
+|---|---|---|---|
+| Precision | Yes | Yes | Yes |
+| Manual tracking | Required | Automatic | Automatic |
+| Moving GC support | Yes | Yes | Yes |
+| Implementation complexity | Low | Medium | High |
+
+## Conclusion
+
+Use the handle table for the prototype. It gives precise roots without JIT support and keeps the door open for a compacting or generational collector later. Conservative scanning should not be used at all because it blocks future improvements. When a JIT is added later, stack maps can replace the handle table for JIT code while the interpreter keeps using the table.
+
+Precise root tracking is a hard requirement for compacting and generational GC. Getting this right early matters a lot.
diff --git a/notes/wasm_gc_research/5_api_safety_and_tracer_design.md b/notes/wasm_gc_research/5_api_safety_and_tracer_design.md
new file mode 100644
index 0000000..d800feb
--- /dev/null
+++ b/notes/wasm_gc_research/5_api_safety_and_tracer_design.md
@@ -0,0 +1,106 @@
+# Part 5: API Safety and Tracer Abstraction
+
+Note author: shruti2522
+
+## Background
+
+Two things need to be sorted out before the prototype is usable: keeping the public API safe for embedders, and structuring the tracer so collection is efficient.
+
+## API Safety
+
+Wasmtime's GC RFC is clear that losing the safe by default API would be a failure. The approach is straightforward. All methods on GC managed objects take a `Store` or `Context` reference. Mutating methods take `&mut Store`, so the borrow checker proves the caller has exclusive access without needing locks. All unsafe code lives inside Wasmtime's runtime internals and never surfaces to the public API.
+
+The same idea applies to `boa_gc`. Embedders should not need to write unsafe code to use the GC.
+
+```rust
+pub struct Context {
+    handle_table: HandleTable,
+    collector: Box<dyn Collector>,
+    _not_thread_safe: std::marker::PhantomData<*mut ()>,
+}
+
+impl Context {
+    pub fn allocate<T: Trace>(&mut self, value: T) -> Handle<T> {
+        let ptr = self.collector.allocate(value);
+        let index = self.handle_table.insert(ptr);
+        Handle::new(index)
+    }
+
+    pub fn collect(&mut self) {
+        self.collector.collect(&self.collect_roots());
+    }
+}
+
+// The raw-pointer `PhantomData` keeps `Context` `!Send` and `!Sync` on stable Rust (negative impls are unstable).
+```
+
+All `unsafe` blocks stay inside the `boa_gc` implementation, documented with `SAFETY:` comments and localized to allocator and collector code. An unsafe fast path can be offered for performance-critical code, but the default path is always safe.
+
+## Tracer Abstraction
+
+### The Problem with the Current Approach
+
+Boa's current `boa_gc` uses a `Tracer` that collects reachable objects into a `Vec` while walking the object graph. There are a few issues with this. A flat `Vec` is tied to one traversal strategy. It does not distinguish between objects that are discovered, being traced or fully processed. Growing a flat list is also slow for large heaps.
+
+### Separating Root Discovery from Collection
+
+Wasmtime separates root discovery from collection. Root discovery builds the root set by walking the stack and scanning the activations table. Collection receives that root set and works from there. Changing the collector does not require changing root discovery and vice versa.
+
+```rust
+pub struct RootSet {
+    roots: Vec<*mut GcHeader>,
+}
+
+impl Context {
+    fn collect_roots(&self) -> RootSet {
+        let mut roots = Vec::new();
+        for ptr in self.handle_table.iter_live() {
+            roots.push(ptr);
+        }
+        RootSet { roots }
+    }
+}
+
+trait Collector {
+    fn collect(&mut self, roots: &RootSet);
+}
+```
+
+The tracing strategy stays internal to the collector. A tri-color work queue is more efficient than a flat `Vec`:
+
+```rust
+pub struct MarkSweepCollector {
+    heap: Vec<Box<GcBox<dyn Trace>>>, // boxed: `GcBox<dyn Trace>` is unsized
+    grey_queue: VecDeque<*mut GcHeader>,
+}
+
+impl Collector for MarkSweepCollector {
+    fn collect(&mut self, roots: &RootSet) {
+        for &root in &roots.roots {
+            self.mark(root);
+        }
+        while let Some(grey) = self.grey_queue.pop_front() {
+            self.trace_children(grey);
+        }
+        self.sweep();
+    }
+}
+```
+
+White means not yet discovered. Grey means discovered but children not yet traced. Black means fully traced. Object state is always explicit.
+
+### Trace Trait
+
+The `Trace` trait should be simple:
+
+```rust
+pub trait Trace {
+    fn trace(&self, tracer: &mut dyn FnMut(Gc<dyn Trace>));
+}
+```
+
+A closure based tracer is simple to implement and lets the collector decide what to do with each child pointer. It does not lock you into a specific collection strategy.
+
+## Conclusion
+
+Keep the public API safe so embedders never need to write unsafe code. Separate root discovery from collection so each concern can change independently. Use a tri-color work queue inside the collector rather than a flat list. These decisions are straightforward to get right now and difficult to fix later.
diff --git a/notes/wasm_gc_research/conclusion.md b/notes/wasm_gc_research/conclusion.md
new file mode 100644
index 0000000..1bf145c
--- /dev/null
+++ b/notes/wasm_gc_research/conclusion.md
@@ -0,0 +1,110 @@
+# Conclusion: Key Learnings for Oscars GC Design
+
+Note author: shruti2522
+
+## Background
+
+This research looked at Wasmtime's GC implementation to pull out useful lessons for `boa_gc` and Oscars. The goal was not to copy their design, but to learn from their decisions and apply the relevant parts to our own API redesign.
+
+## What We Learned
+
+### Keep the Collector Interface Internal
+
+Wasmtime exposes a simple enum publicly but keeps the actual collector traits internal with no stability guarantees. We should do the same: expose a simple config enum and keep collector traits private to `boa_gc`.
+
+```rust
+// Public
+pub enum GcStrategy { Auto, MarkSweep, NullCollector }
+
+// Internal (not public)
+pub(crate) trait Collector { /* ... */ }
+```
+
+This lets us add new collectors, change internal interfaces and swap the default without breaking the public API.
+
+### Build the Null Collector First
+
+Wasmtime built a bump allocator with no collection before writing any real collector code. This separated testing the object model from testing collection logic. We should do the same. The first milestone for Oscars is a `NullCollector` that traps on heap exhaustion. It lets us validate object headers and layout, run the full test suite and measure pure allocation overhead before adding any collection complexity.
+
+### Precise Roots Are Not Optional
+
+Wasmtime uses precise stack maps to track roots. We do not have a JIT but we still need precise roots. The handle table is the right approach for the prototype.
+
+```rust
+pub struct Context {
+    handle_table: HandleTable,
+}
+
+pub struct Handle<T> {
+    index: u32,
+    _marker: PhantomData<T>,
+}
+```
+
+Conservative scanning would block future moving and generational collectors. Start with precise roots now.
+
+### Cycle Collection Cannot Wait
+
+Wasmtime's DRC collector cannot collect cycles. For Wasm workloads this is acceptable. For JS it is not. JS creates cycles constantly through closures, prototype chains and event listeners. The collector has to handle cycles from day one. Mark sweep is the simplest path since it handles cycles naturally.
+
+### Reserve Header Space Early
+
+Define `GcHeader` now with reserved fields for future collectors. Adding header fields later means touching every allocation site in the codebase.
+
+```rust
+#[repr(C)]
+pub struct GcHeader {
+    shape_id: u32,
+    gc_flags: u32,  // reserve even if only using 2 bits initially
+}
+```
+
+The memory overhead is negligible. The redesign cost later is not.
+
+### Separate Root Discovery from Collection
+
+Root finding and the collection algorithm should be separate concerns. The collector receives a root set and works from there.
+
+```rust
+impl Context {
+    fn collect_roots(&self) -> RootSet { /* scan handle table */ }
+}
+
+trait Collector {
+    fn collect(&mut self, roots: &RootSet);
+}
+```
+
+### Keep the Public API Safe
+
+All unsafe code should live inside the boa_gc implementation, documented with `SAFETY:` comments and never surface at the public API boundary. Embedders and `boa_engine` callers should not need to write unsafe code to use the GC.
+
+## What We Are Not Doing
+
+No deferred reference counting. The cycle limitation is a dealbreaker for JS and the complexity is not worth it at this stage. No conservative stack scanning. No exposing collector traits publicly. No premature optimization: start with mark sweep and optimize once there is real data.
+
+## Recommended Order
+
+1. Define internal `Collector` trait (keep private)
+2. Implement `NullCollector` (bump allocator, no collection)
+3. Validate the object model with tests
+4. Build the handle table for precise root tracking
+5. Implement `MarkSweepCollector` with cycle collection
+6. Add cargo feature flags per collector
+7. Optimize later with data
+
+## Conclusion
+
+Three things have to be right from the start: precise root tracking, cycle collection and reserved header space. Everything else can be improved over time. These three are very hard to add in later.
+
+The useful thing from the Wasmtime research is not the DRC algorithm itself but the design patterns around it: internal flexibility with a simple public API, separation of root discovery from collection and an incremental implementation path. Those apply regardless of which collector we use.
+
+## References
+
+- Wasmtime DRC design: https://bytecodealliance.org/articles/reference-types-in-wasmtime
+- Wasmtime GC RFC: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md
+- Wasmtime Collector enum docs: https://docs.wasmtime.dev/api/wasmtime/enum.Collector.html
+- Wasmtime proposal status: https://docs.wasmtime.dev/stability-wasm-proposals.html
+- Wasmtime 27.0 release: https://bytecodealliance.org/articles/wasmtime-27.0
+- New stack maps: https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime
+- WasmGC proposal: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md