66 changes: 66 additions & 0 deletions notes/wasm_gc_research/1_collector_arch.md
@@ -0,0 +1,66 @@
# Part 1: Collector Architecture Lessons from Wasmtime

Note author: shruti2522

## Background

Wasmtime recently shipped support for the Wasm GC proposal. They built a pluggable GC infrastructure in Rust to handle WasmGC structs and arrays. Wasmtime faces constraints similar to ours: a safe Rust API, multiple collector options, and the need to add GC to an existing runtime that was not built with it in mind. Their architecture has some useful lessons.

## Keep Collector Traits Internal

Wasmtime has two internal traits for GC implementation, one used at compile time for object layout and write barrier insertion and one used at runtime for allocation and collection. These traits are explicitly not public API. Embedders never see them.

Instead, embedders pick a collector through a simple public enum. An `Auto` variant picks a sensible default, which can change between releases without breaking anything.

The same pattern makes sense for `boa_gc`:

```rust
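use std::alloc::Layout;

// Internal trait in boa_gc (not part of the public API).
pub(crate) trait Collector {
    /// Allocates raw storage for an object with the given layout.
    fn allocate(&mut self, layout: Layout) -> *mut u8;
    /// Runs a collection cycle.
    fn collect(&mut self);
    /// Called when a reference to `obj` is written into a heap object.
    /// Taken by reference because `dyn Trace` is unsized; this assumes
    /// `GcBox<T>` accepts `T: ?Sized`.
    fn write_barrier(&mut self, obj: &GcBox<dyn Trace>);
}

// Public configuration.
pub enum GcStrategy {
    Auto,
    MarkSweep,
    NullCollector,
}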
```

This lets us add new collectors, change internal interfaces and swap the `Auto` default without breaking the public API.
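As a rough illustration of why the enum keeps the public API stable, the sketch below resolves the public choice to a crate-private collector kind in one place. `CollectorKind` and `resolve` are hypothetical names introduced here and are not part of `boa_gc` or Wasmtime.

```rust
/// Public knob exposed to embedders (mirrors the enum above).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GcStrategy {
    Auto,
    MarkSweep,
    NullCollector,
}

/// Crate-private list of concrete collectors (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CollectorKind {
    MarkSweep,
    Null,
}

impl GcStrategy {
    /// Maps the public choice to a concrete collector. Changing what `Auto`
    /// means only touches this function, never the public enum.
    fn resolve(self) -> CollectorKind {
        match self {
            GcStrategy::Auto | GcStrategy::MarkSweep => CollectorKind::MarkSweep,
            GcStrategy::NullCollector => CollectorKind::Null,
        }
    }
}
```

If a later release switches the default, only the `Auto` arm moves and embedder code that names `GcStrategy::Auto` keeps compiling.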

## Cargo Feature Flags Per Collector

Each collector in Wasmtime is behind its own cargo feature. This keeps builds with no GC at zero cost, lets embedded builds include only lightweight collectors and speeds up test compile times.

```toml
[features]
default = ["gc-mark-sweep"]
gc-mark-sweep = []
gc-null = []
gc-drc = []
```

Test suites can then compile with `--no-default-features --features gc-null` for faster builds and embedded targets can avoid pulling in a full collector.
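On the Rust side, the features above would typically gate modules with `#[cfg(feature = "...")]`. The following is a minimal sketch under that assumption; the module name `null` and the helper `compiled_collectors` are made up for illustration.

```rust
/// Only compiled when the `gc-null` feature is enabled.
#[cfg(feature = "gc-null")]
pub mod null {
    /// Placeholder for the real null collector type.
    pub struct NullCollector;
}

/// Lists the collectors that were compiled into this build.
/// The feature names match the `[features]` table above.
pub fn compiled_collectors() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "gc-mark-sweep") {
        names.push("mark-sweep");
    }
    if cfg!(feature = "gc-null") {
        names.push("null");
    }
    if cfg!(feature = "gc-drc") {
        names.push("drc");
    }
    names
}
```

A helper like this can also back a runtime check that rejects a `GcStrategy` whose collector was not compiled in.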

## Build the Null Collector First

Wasmtime built a null collector (bump allocation, no collection) before implementing DRC. It traps when memory runs out. They did this to test the object model without needing a working collector, to get a performance baseline and because it is a legitimate option for very short lived programs.

The same approach works well for our prototype:

```rust
pub struct NullCollector {
    heap: BumpAllocator,
    limit: usize,
}
```

The benefits are straightforward. We can run the full test suite against the new allocator before writing any collection code, validate `GcHeader` layout and `Trace` implementations early and measure allocation overhead separately from collection overhead.
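To make the idea concrete, here is a minimal sketch of a bump-only collector. It assumes a fixed byte buffer and returns offsets into that buffer instead of raw pointers, and it reports exhaustion with `None` where a real runtime would trap. The method names are illustrative, not an existing `boa_gc` API.

```rust
/// Bump allocator over a fixed-size buffer. Nothing is ever freed.
pub struct NullCollector {
    heap: Vec<u8>,
    next: usize,
}

impl NullCollector {
    pub fn with_limit(limit: usize) -> Self {
        Self { heap: vec![0; limit], next: 0 }
    }

    /// Returns the offset of a fresh block, or `None` when the heap is full.
    /// `align` must be a power of two; alignment is relative to the buffer start.
    pub fn allocate(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round the bump pointer up to the requested alignment.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.heap.len() {
            return None; // a real runtime would trap or report OOM here
        }
        self.next = end;
        Some(start)
    }

    /// Collection is a no-op by design.
    pub fn collect(&mut self) {}

    pub fn bytes_used(&self) -> usize {
        self.next
    }
}
```

Because `collect` does nothing, any time measured with this collector is pure allocation and mutator cost, which is the baseline the note asks for.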

## Conclusion

There are three clear takeaways from Wasmtime's architecture. Keep the collector trait internal to `boa_gc`. Put each collector behind its own feature flag. Build and validate with the null collector before adding a real one. Following this order keeps each step independently testable and reduces the risk of getting deep into collection code before the object model is solid.
68 changes: 68 additions & 0 deletions notes/wasm_gc_research/2_deferred_ref.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Part 2: Deferred Reference Counting Analysis

Note author: shruti2522

## Background

Wasmtime chose deferred reference counting (DRC) as their production GC. This note looks at why they made that choice and what we can take from it for boa_gc, even though we will likely go with mark sweep or something similar.

## Why Reference Counting?

Wasmtime picked reference counting over tracing GC for two reasons. First, refcounting spreads work across mutations rather than letting it build up into large stop-the-world (STW) pauses. Second, the failure mode is safer. If a tracing collector misses a live object, it frees it and you get a dangling pointer. If refcounting misses a decrement, you get a leak. Leaks are much easier to deal with than memory corruption when adding GC to an existing codebase.

The second point matters for oscars. Boa was not designed around GC from day one. A refcount bug causing a leak during development is a lot better than a crash.

## How Deferred Reference Counting Works

Standard refcounting would mean an increment and decrement on every assignment. For Wasm that means refcount operations on every `local.get` and `local.set`, which is too expensive.

Wasmtime avoids this by deferring. When a GC reference enters a Wasm frame it is inserted into `VMGcRefActivationsTable`. While Wasm runs, no refcount operations happen on local variables. Barriers only fire when a reference escapes the frame, such as being written to a struct field, global or table. Collection triggers when the activations table fills up. At that point Wasmtime walks the stack to find actually live objects and anything in the table that is no longer on the stack gets its refcount decremented.

The fast path for table insertion is very cheap. The slow path only runs when the table fills.
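The flow described above can be modelled in a few lines. This is a deliberately simplified model and not Wasmtime's implementation: references are plain integers, the "stack walk" is a set passed in by the caller, and `DeferredRc`, `enter_frame` and `store_to_heap` are invented names.

```rust
use std::collections::{HashMap, HashSet};

type GcRef = u32;

struct DeferredRc {
    counts: HashMap<GcRef, u32>,
    activations: Vec<GcRef>,
    capacity: usize,
}

impl DeferredRc {
    fn new(capacity: usize) -> Self {
        Self { counts: HashMap::new(), activations: Vec::new(), capacity }
    }

    /// Fast path: the table takes one count for the reference. Local reads
    /// and writes afterwards touch no counts at all.
    /// Returns `true` when the table is full and a collection is due.
    fn enter_frame(&mut self, r: GcRef) -> bool {
        *self.counts.entry(r).or_insert(0) += 1;
        self.activations.push(r);
        self.activations.len() >= self.capacity
    }

    /// Barrier for a reference escaping into a struct field, global or table.
    fn store_to_heap(&mut self, r: GcRef) {
        *self.counts.entry(r).or_insert(0) += 1;
    }

    /// Slow path: drop table entries that are no longer on the stack and
    /// decrement their counts. Returns the objects whose count reached zero.
    fn collect(&mut self, on_stack: &HashSet<GcRef>) -> Vec<GcRef> {
        let mut freed = Vec::new();
        for r in std::mem::take(&mut self.activations) {
            if on_stack.contains(&r) {
                self.activations.push(r);
                continue;
            }
            if let Some(count) = self.counts.get_mut(&r) {
                *count -= 1;
                if *count == 0 {
                    self.counts.remove(&r);
                    freed.push(r);
                }
            }
        }
        freed
    }
}
```

In this model an object that escaped to the heap survives the table sweep because the barrier gave it a second count, while a purely local object is freed once it leaves the stack.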

## The Cycle Problem

DRC cannot collect cycles. Objects that reference each other keep each other alive until the entire `Store` is dropped. Wasmtime accepts this because many Wasm programs are short lived.

For JavaScript this is not acceptable. JS creates cycles constantly. Closures capture their enclosing scope, prototype chains form back references, and event listeners hold references to the objects that registered them. A collector that cannot handle cycles will leak memory on almost any real program.
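The leak is easy to reproduce with Rust's own reference-counted pointer. The sketch below uses `std::rc::Rc` as a stand-in for any refcounting collector; `Node`, `leak_cycle` and `acyclic` are illustrative names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

/// Builds two nodes that point at each other, drops both external handles
/// and returns a weak observer to the first node.
fn leak_cycle() -> Weak<Node> {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    *a.other.borrow_mut() = Some(Rc::clone(&b));
    let observer = Rc::downgrade(&a);
    observer
    // `a` and `b` go out of scope here, but each node still holds the other.
}

/// Same shape without the back edge: `b` points at `a`, `a` points nowhere.
fn acyclic() -> Weak<Node> {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let _b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    let observer = Rc::downgrade(&a);
    observer
}
```

After `leak_cycle` returns, the observer still reports a strong count of 1 even though no code can reach either node. In the acyclic case the count drops to 0 and the memory is released.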

## Lessons for boa_gc

### Cycles Cannot Be Deferred

Unlike Wasmtime, we cannot ship something that leaks cycles and patch it later. Cycle collection has to be part of the initial design. The simplest path is to use mark sweep from the start, which handles cycles naturally.

```rust
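impl MarkSweepCollector {
    fn collect(&mut self, roots: &RootSet) {
        // Mark phase: flag everything reachable from the roots.
        for root in roots.iter() {
            self.mark(root);
        }

        // Sweep phase: keep marked objects (clearing the mark for the
        // next cycle) and drop everything else, including dead cycles.
        self.heap.retain(|obj| {
            if obj.header.is_marked() {
                obj.header.unmark();
                true
            } else {
                false
            }
        });
    }
}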
```

Generational collection, incremental marking and concurrent marking can all be layered on later once we have performance data.
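For a version of the mark and sweep phases that runs on its own, the sketch below uses a toy heap where objects are slots in a `Vec` and references are slot indices. `ToyHeap` and its methods are assumptions made for illustration and do not reflect the planned `boa_gc` heap.

```rust
struct Object {
    marked: bool,
    children: Vec<usize>, // indices of other slots
}

struct ToyHeap {
    slots: Vec<Option<Object>>,
}

impl ToyHeap {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    fn alloc(&mut self, children: Vec<usize>) -> usize {
        self.slots.push(Some(Object { marked: false, children }));
        self.slots.len() - 1
    }

    fn set_children(&mut self, id: usize, children: Vec<usize>) {
        if let Some(obj) = self.slots[id].as_mut() {
            obj.children = children;
        }
    }

    fn is_live(&self, id: usize) -> bool {
        self.slots[id].is_some()
    }

    /// Marks everything reachable from `root` using an explicit worklist.
    fn mark(&mut self, root: usize) {
        let mut worklist = vec![root];
        while let Some(id) = worklist.pop() {
            if let Some(obj) = self.slots.get_mut(id).and_then(|s| s.as_mut()) {
                if !obj.marked {
                    obj.marked = true;
                    worklist.extend(obj.children.iter().copied());
                }
            }
        }
    }

    /// Runs mark then sweep and returns the number of freed objects.
    fn collect(&mut self, roots: &[usize]) -> usize {
        for &root in roots {
            self.mark(root);
        }
        let mut freed = 0;
        for slot in &mut self.slots {
            // Read the mark and clear it for the next cycle in one step.
            let keep = match slot {
                Some(obj) => std::mem::replace(&mut obj.marked, false),
                None => continue,
            };
            if !keep {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

An unreachable pair of objects that point at each other is freed in a single pass, because liveness is decided by reachability from the roots and not by incoming reference counts.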

### Do Not Optimize Early

Wasmtime's DRC design is complex because it targets Wasm's specific performance needs. We should not carry that complexity over. Start simple and optimize when there is data showing where the bottlenecks actually are.

### Deferred Barriers Are Still a Useful Idea

Even without full DRC, the idea of skipping barriers on local variables is worth keeping in mind. If roots are tracked precisely through a handle table, temporary allocations that never escape the current scope do not need write barriers at all. Collection only scans the handle table, so dropped local handles are automatically excluded.

## Conclusion

Do not implement DRC for the prototype. The complexity is not worth it and the cycle limitation is a dealbreaker for JavaScript.

Start with simple mark sweep, design precise root tracking from the start and defer optimization until there is real data to work from. The most useful thing to take from Wasmtime's DRC design is the principle of separating root tracking from the collection strategy, not the algorithm itself.
88 changes: 88 additions & 0 deletions notes/wasm_gc_research/3_object_layout.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
# Part 3: Object Layout and Header Design

Note author: shruti2522

## Background

Every GC managed object needs metadata so the collector can do its job. This note looks at how Wasmtime structures object headers and what we can take from that for boa_gc.

## What Wasmtime Does

Every GC object in Wasmtime starts with a `VMGcHeader`. In simplified form:

```rust
#[repr(C)]
pub struct VMGcHeader {
    type_index: u32,
    gc_metadata: GcMetadata,
}
```

A few things stand out here. The header is always at the start of every allocation. The type index is a plain `u32` used for runtime type checks and downcasts. The metadata slot is used differently depending on the collector: DRC stores a reference count there, the null collector leaves it unused and a future tracing collector would use it for mark bits or a forwarding pointer. Types are interned at the engine level, so only the index lives per object, keeping the header small.

After the header comes the payload. Structs store fields inline. Arrays store a `u32` length at a fixed offset, then elements. The length being at a fixed offset matters because it allows bounds checks without a layout lookup on every array access.
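The value of a fixed length offset can be shown with a small sketch. It assumes an 8 byte header (two `u32` fields, with `gc_metadata` modelled as a plain `u32`) and treats an object as a raw byte slice; `ARRAY_LEN_OFFSET`, `array_len` and `in_bounds` are illustrative names.

```rust
use std::mem::size_of;

#[repr(C)]
struct Header {
    type_index: u32,
    gc_metadata: u32, // assumed to be 4 bytes for this sketch
}

/// The length always sits directly after the header, for every array type.
const ARRAY_LEN_OFFSET: usize = size_of::<Header>();

/// Reads the array length from the raw object bytes without consulting
/// any per-type layout information.
fn array_len(object: &[u8]) -> Option<u32> {
    let bytes = object.get(ARRAY_LEN_OFFSET..ARRAY_LEN_OFFSET + 4)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_ne_bytes(buf))
}

/// Bounds check that needs only the object itself.
fn in_bounds(object: &[u8], index: u32) -> bool {
    matches!(array_len(object), Some(len) if index < len)
}
```

Because the offset is a compile-time constant, the bounds check is a single load and compare regardless of the element type.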

## Type IDs

Wasmtime uses `u32` for type IDs and didn't try to use a smaller type to save space. For JavaScript this is the right call. Shapes are created dynamically and the count can grow large in real programs. `u32` is a reasonable starting point until we have data showing otherwise.

## What This Means for boa_gc

### The Header

Every heap-allocated object needs a fixed header. The header must be `#[repr(C)]` for a predictable layout, identical in structure for all objects, and small while still reserving space for future collectors. Each allocation is laid out as a header followed immediately by the value:

```rust
#[repr(C)]
pub struct GcBox<T: Trace> {
    header: GcHeader,
    value: T,
}
```

### Reserve Space Now, Use Later

The most important lesson here is to reserve header space even if the first collector does not use all of it. Adding a header field later means touching every allocation site, every size calculation and every unsafe offset in the codebase. An extra 8 bytes per object is usually small compared to the object payload. Moving collectors need forwarding pointers and generational GC needs age bits, so reserve the bits now:

```rust
#[repr(C)] // fixed field order, as required for the header above
pub struct GcHeader {
    shape_id: u32,
    gc_flags: u32, // MARKED = 1 << 0, FORWARDED = 1 << 1, age bits, etc.
}
```

Better to have unused space than to redesign the allocator later.
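One possible way to carve up `gc_flags` is sketched below. The bit positions for `MARKED` and `FORWARDED` follow the comment in the struct above; the two age bits and the accessor names are assumptions made for this example.

```rust
#[repr(C)]
#[derive(Default)]
pub struct GcHeader {
    shape_id: u32,
    gc_flags: u32,
}

const MARKED: u32 = 1 << 0;
const FORWARDED: u32 = 1 << 1;
const AGE_SHIFT: u32 = 2;
const AGE_MASK: u32 = 0b11 << AGE_SHIFT; // two age bits, chosen arbitrarily

impl GcHeader {
    fn is_marked(&self) -> bool {
        self.gc_flags & MARKED != 0
    }

    fn mark(&mut self) {
        self.gc_flags |= MARKED;
    }

    fn unmark(&mut self) {
        self.gc_flags &= !MARKED;
    }

    fn is_forwarded(&self) -> bool {
        self.gc_flags & FORWARDED != 0
    }

    fn age(&self) -> u32 {
        (self.gc_flags & AGE_MASK) >> AGE_SHIFT
    }

    /// Increments the age, saturating at the largest value the bits can hold.
    fn bump_age(&mut self) {
        let age = (self.age() + 1).min(AGE_MASK >> AGE_SHIFT);
        self.gc_flags = (self.gc_flags & !AGE_MASK) | (age << AGE_SHIFT);
    }
}
```

A mark sweep collector only touches `MARKED`, so the remaining bits cost nothing today and stay available for later collectors without changing the header size.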

### Shape Registry

JS objects have dynamic shapes. We need a registry to keep per object headers small:

```rust
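use std::collections::HashMap;

pub struct ShapeRegistry {
    // Indexed by `ShapeId`.
    shapes: Vec<Shape>,
    // Deduplicates layouts so equal shapes share one ID.
    shape_map: HashMap<PropertyLayout, ShapeId>,
}

#[derive(Copy, Clone)]
pub struct ShapeId(u32);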
```

The object header stores only the `u32` shape ID. The full layout descriptor lives in the registry on `GcContext`.
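A minimal interning sketch shows how the registry hands out IDs. Here `PropertyLayout` is assumed to be an ordered list of property names and the registry stores layouts directly instead of a separate `Shape` type; `intern` and `layout` are hypothetical method names.

```rust
use std::collections::HashMap;

/// Assumption for this sketch: a layout is the ordered list of property names.
type PropertyLayout = Vec<String>;

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ShapeId(u32);

#[derive(Default)]
pub struct ShapeRegistry {
    shapes: Vec<PropertyLayout>,
    shape_map: HashMap<PropertyLayout, ShapeId>,
}

impl ShapeRegistry {
    /// Returns the existing ID for a known layout or registers a new one.
    pub fn intern(&mut self, layout: PropertyLayout) -> ShapeId {
        if let Some(&id) = self.shape_map.get(&layout) {
            return id;
        }
        let id = ShapeId(self.shapes.len() as u32);
        self.shapes.push(layout.clone());
        self.shape_map.insert(layout, id);
        id
    }

    /// Looks up the full layout for an ID stored in an object header.
    pub fn layout(&self, id: ShapeId) -> Option<&PropertyLayout> {
        self.shapes.get(id.0 as usize)
    }
}
```

Objects created with the same properties in the same order share one ID, so the per-object cost stays at four bytes no matter how large the layout is.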

### Arrays

Following Wasmtime's pattern of a fixed offset for length:

```rust
#[repr(C)]
pub struct JsArray {
    header: GcHeader,
    length: u32,
    capacity: u32,
    elements: *mut JsValue,
}
```

## Conclusion

The header design is foundational, and getting it right early avoids expensive refactoring later. Define `GcHeader` now with reserved space, use `u32` for shape IDs, keep the layout fixed and `#[repr(C)]`, and plan for future collectors by reserving bits for forwarding pointers and age tracking even if they are unused today.
90 changes: 90 additions & 0 deletions notes/wasm_gc_research/4_root_tracking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Part 4: Precise Root Tracking

Note author: shruti2522

## Background

The collector needs to find all live objects. It starts from the "roots": references held outside the GC heap (on the stack, in globals, in handles) from which every live object is reachable. Wasmtime handles this with precise stack maps generated at compile time. We do not have a JIT, so we need another approach.

## Why Precise Roots Matter

Conservative stack scanning blocks moving collectors. If any integer on the stack looks like a heap address, the collector cannot safely move that object, because it cannot rewrite a word that might be a plain integer. This rules out fully compacting and copying collectors, and with them the usual moving generational designs. It also causes false retention, where an integer that happens to look like a pointer keeps an object alive when nothing actually references it.

Precise roots avoid both problems.
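The difference can be seen in a toy model where the stack is a slice of machine words. Both functions and the boolean slot map are made up for illustration; a real stack map or handle table carries the same information in a different form.

```rust
use std::ops::Range;

/// Conservative scan: every stack word that falls inside the heap range is
/// treated as a pointer, whether or not it really is one.
fn conservative_roots(stack_words: &[usize], heap: &Range<usize>) -> Vec<usize> {
    stack_words
        .iter()
        .copied()
        .filter(|word| heap.contains(word))
        .collect()
}

/// Precise scan: a side table says which slots hold GC references.
fn precise_roots(stack_words: &[usize], is_ref_slot: &[bool]) -> Vec<usize> {
    stack_words
        .iter()
        .zip(is_ref_slot)
        .filter(|pair| *pair.1)
        .map(|(word, _)| *word)
        .collect()
}
```

With a heap at `0x1000..0x2000`, an integer that happens to equal `0x1010` shows up as a root under the conservative scan and pins whatever lives at that address. The precise scan ignores it.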

## The Problem for oscars

Wasmtime uses Cranelift to generate stack maps at compile time. A stack map records which stack slots hold live GC references at each potential collection point. We cannot do this without a JIT.

Two practical approaches exist.

### Approach 1: Shadow Stack

Maintain a separate stack of GC pointers alongside the native call stack. Push on allocation, pop on scope exit.

Pros: precise, simple to implement, no compiler needed.

Cons: manual push/pop at every allocation point, easy to forget, leads to bugs.
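A minimal shadow stack might look like the sketch below. It goes one step beyond purely manual push and pop by using a scope guard that truncates the stack on drop, which is one common way to reduce the forgotten-pop risk. `ShadowStack`, `Scope` and the use of `usize` as a stand-in for a GC pointer are assumptions for this example.

```rust
struct ShadowStack {
    roots: Vec<usize>, // `usize` stands in for a GC pointer
}

impl ShadowStack {
    /// Opens a scope that remembers the current stack height.
    fn scope(&mut self) -> Scope<'_> {
        let base = self.roots.len();
        Scope { stack: self, base }
    }
}

struct Scope<'a> {
    stack: &'a mut ShadowStack,
    base: usize,
}

impl Scope<'_> {
    /// Registers a freshly allocated object as a root.
    fn push(&mut self, ptr: usize) {
        self.stack.roots.push(ptr);
    }

    /// All roots currently visible to the collector.
    fn roots(&self) -> &[usize] {
        &self.stack.roots
    }

    /// Opens a nested scope, for example for a callee.
    fn nested(&mut self) -> Scope<'_> {
        self.stack.scope()
    }
}

impl Drop for Scope<'_> {
    /// Pops everything pushed since the scope was opened.
    fn drop(&mut self) {
        self.stack.roots.truncate(self.base);
    }
}
```

The guard removes the need for explicit pops, but every allocation site still has to remember to call `push`, which is the weakness listed above.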

### Approach 2: Handle Table (Recommended)

Store all GC references in a table on the Context. Stack frames hold indices into this table, not raw pointers.

```rust
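use std::marker::PhantomData;

pub struct HandleTable {
    entries: Vec<TableEntry>,
    free_list: Vec<u32>,
}

struct TableEntry {
    ptr: *mut GcHeader,
    refcount: u32,
}

// Deliberately not `Copy`: cloning or dropping a handle has to update the
// entry's `refcount`, and a `Copy` type cannot implement `Drop`.
pub struct Handle<T> {
    index: u32,
    _marker: PhantomData<T>,
}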
```

The collector only needs to scan the table:

```rust
fn collect(&mut self, ctx: &mut Context) {
    let roots: Vec<*mut GcHeader> = ctx.handle_table.iter_live().collect();
    self.mark_sweep(&roots);
}
```

When the collector moves an object, it updates the table entry. All existing handles continue to work with no other changes needed:

```rust
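fn compact(&mut self, handle_table: &mut HandleTable) {
    for entry in &mut handle_table.entries {
        if entry.refcount > 0 {
            // A real implementation must first check for a forwarding
            // pointer so an object shared by several entries is copied once.
            let new_ptr = self.copy_object(entry.ptr);
            entry.ptr = new_ptr;
        }
    }
}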
```

Pros: precise roots without manual tracking, safe to move objects, automatic cleanup when handles drop, no compiler needed.

Cons: indirection overhead on every access, memory overhead for table storage.
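The pieces above can be combined into a small runnable model. It is a sketch under simplifying assumptions: object addresses are plain `usize` values, handles are released explicitly instead of through `Drop`, and `insert`, `release`, `get` and `relocate` are invented names.

```rust
/// Stand-in for a heap address.
type ObjPtr = usize;

struct TableEntry {
    ptr: ObjPtr,
    refcount: u32,
}

#[derive(Default)]
struct HandleTable {
    entries: Vec<TableEntry>,
    free_list: Vec<u32>,
}

/// Index into the table. Not `Copy`, so each handle is released exactly once.
#[derive(Debug, PartialEq, Eq)]
struct Handle(u32);

impl HandleTable {
    /// Roots `ptr` and returns a handle, reusing a free slot when possible.
    fn insert(&mut self, ptr: ObjPtr) -> Handle {
        let entry = TableEntry { ptr, refcount: 1 };
        match self.free_list.pop() {
            Some(index) => {
                self.entries[index as usize] = entry;
                Handle(index)
            }
            None => {
                self.entries.push(entry);
                Handle((self.entries.len() - 1) as u32)
            }
        }
    }

    /// Gives up a handle. The slot becomes reusable once unreferenced.
    fn release(&mut self, handle: Handle) {
        let entry = &mut self.entries[handle.0 as usize];
        entry.refcount -= 1;
        if entry.refcount == 0 {
            self.free_list.push(handle.0);
        }
    }

    /// Resolves a handle to the object's current address.
    fn get(&self, handle: &Handle) -> ObjPtr {
        self.entries[handle.0 as usize].ptr
    }

    /// The root set: every entry that is still referenced.
    fn iter_live(&self) -> impl Iterator<Item = ObjPtr> + '_ {
        self.entries.iter().filter(|e| e.refcount > 0).map(|e| e.ptr)
    }

    /// Called by a moving collector to rewrite the address of live entries.
    fn relocate(&mut self, mut new_location: impl FnMut(ObjPtr) -> ObjPtr) {
        for entry in self.entries.iter_mut().filter(|e| e.refcount > 0) {
            entry.ptr = new_location(entry.ptr);
        }
    }
}
```

After `relocate` runs, existing handles resolve to the new addresses through `get` without any change on the holder's side, which is what makes the table compatible with a moving collector.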

## Comparison

| Feature | Shadow Stack | Handle Table | Stack Maps (JIT) |
|---|---|---|---|
| Precision | Yes | Yes | Yes |
| Manual tracking | Required | Automatic | Automatic |
| Moving GC support | Yes | Yes | Yes |
| Implementation complexity | Low | Medium | High |

## Conclusion

Use the handle table for the prototype. It gives precise roots without JIT support and keeps the door open for a compacting or generational collector later. Conservative scanning should not be used at all because it blocks future improvements. When a JIT is added later, stack maps can replace the handle table for JIT code while the interpreter keeps using the table.

Precise root tracking is a hard requirement for compacting and generational GC. Getting this right early matters a lot.