Skip to content

feat(rivetkit): native actor plugin host (cdylib via dlopen)#5312

Open
NathanFlurry wants to merge 1 commit into
mainfrom
feat/dylib-actor-plugin
Open

feat(rivetkit): native actor plugin host (cdylib via dlopen)#5312
NathanFlurry wants to merge 1 commit into
mainfrom
feat/dylib-actor-plugin

Conversation

@NathanFlurry

Copy link
Copy Markdown
Member

Rebased onto main (2.3.2). Adds the native actor-plugin host so external cdylib plugins (e.g. agent-os) load via dlopen/the RivetKit native-plugin ABI (createNativePluginFactory, host loader), and removes the in-tree bundled agent-os. Inherits main's engine auto-spawn + auto runner-config.

Verified locally: rivetkit JS + napi build clean; vanilla zero-config counter passes (auto engine spawn + auto runner-config) on the rebased build.

🤖 Generated with Claude Code

@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-5312 June 22, 2026 07:57 Destroyed
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Code Review: feat(rivetkit): native actor plugin host (cdylib via dlopen)

This PR introduces a generic dlopen-based native actor plugin ABI for RivetKit, removes the in-tree bundled agent-os TS implementation, and adds Rust encoding parity for $Uint8Array payloads. The core architecture is well-designed: stable C-ABI, version-locked lockstep, catch_unwind on every FFI boundary, per-allocator OwnedBuf free. Below are the issues that need addressing before merge.


Blockers

1. Planning docs committed to the repo root

PLAN2.md, RIVETKIT_RUST_FIX.md, and TODOLIST.md are committed to the repo root. Per CLAUDE.md, working files belong in ~/.agents/specs/ or ~/.agents/todo/ (never inside the repo). Design specs go in docs-internal/ or .claude/reference/ and are linked from the Reference Docs index. These three files should be removed from the PR.

2. TOCTOU race in load_plugin can call plugin_init twice

Two concurrent calls for the same path will both miss the cache check (the lock is dropped after the get) and both reach load_uncached, calling plugin_init twice. The spec requires exactly one init per path. Pattern in native_plugin.rs:

if let Some(existing) = cache().lock().get(&key).cloned() { return Ok(existing); }
// lock drops; both callers miss and both call load_uncached() -> plugin_init called twice
let loaded = unsafe { load_uncached(path) }?;
cache().lock().insert(key, arc.clone());

Fix: use scc::HashMap::entry_async for atomic check-then-init, or protect the entire load with a per-path OnceLock so only one caller performs the load.

3. Mutex<HashMap> prohibited by CLAUDE.md

cache() returns &'static Mutex<HashMap<PathBuf, Arc<LoadedPlugin>>>. CLAUDE.md: "Never use Mutex<HashMap<...>>" — use scc::HashMap (preferred) or moka::Cache. This guidance exists precisely for process-global caches like this one.

4. host_ctx_clone returns a dangling pointer on panic

extern "C" fn host_ctx_clone(ptr: *const c_void) -> *const c_void {
    let _ = std::panic::catch_unwind(|| unsafe {
        Arc::increment_strong_count(ptr as *const HostCtxState);
    });
    ptr   // returned regardless of whether the increment succeeded
}

If increment_strong_count panics, the caller receives ptr thinking it holds a new owner, but the ref count was never bumped. The eventual ctx_release would decrement past zero and corrupt the allocation. The correct failure mode on panic is to return null (or abort), not silently hand out a non-owning pointer. Same issue in host_ctx_release: silently discarding the decrement on panic leaks the object.


Important Issues

5. agentOs() is deleted with no in-PR replacement

All of src/agent-os/actor/index.ts, config.ts, types.ts, fs/database-vfs.ts, and the barrel index.ts are deleted. No new agentOs() export is introduced in this PR. Any consumer that imports from rivetkit/agent-os will break on upgrade. Either ship a fail-loud stub before deleting the old code, or keep the TS implementation until the Rust replacement lands.

6. rivet-actor-plugin-abi added unconditionally to rivetkit-core

libloading is correctly gated behind native-runtime, but rivet-actor-plugin-abi is pulled in unconditionally in rivetkit-core/Cargo.toml. It will not break wasm (no native deps), but it adds unnecessary binary size and a wasm-irrelevant dependency surface. Gate it behind native-runtime alongside libloading.

7. has_database = true hardcoded in NAPI from_native_plugin

In rivetkit-napi/src/actor_factory.rs, from_native_plugin unconditionally sets config.has_database = true, opting every native plugin into the SQLite lifecycle path regardless of whether the plugin uses SQLite. The plugin should declare this capability via its config envelope or the ActorConfigKnobs struct that already exists in the ABI crate.

8. nativeFactoryBuilder is a mutable public field

In rivetkit-typescript/packages/rivetkit/src/actor/definition.ts, nativeFactoryBuilder is a non-readonly public instance property. External packages set it via (definition as any).nativeFactoryBuilder = ..., which lets any code silently replace any actor's factory after construction. Make it readonly and settable only in the constructor, or expose a createNativeActorDefinition(builder) factory function that removes the as any workaround.


Minor

9. .npmrc global node-linker=hoisted

Changing from pnpm's default isolated linker to hoisted affects the entire monorepo's package resolution and can mask missing explicit dependencies. If this is needed for a specific package, scope it to that package's local .npmrc rather than the root.

10. ABI version starts at 13

RIVET_ACTOR_ABI_VERSION: u64 = 13 in a brand-new crate suggests multiple pre-PR iterations. Add a short comment explaining what changed between versions 1-12 (or "v13: initial public release") so future readers understand the version space.

11. NATIVE_NAPI_ACTOR_STUB uses a double type assertion

In drivers/engine/actor-driver.ts, NATIVE_NAPI_ACTOR_STUB is cast as as unknown as AnyActorInstance. If AnyActorInstance gains a new required method later, nothing will warn at compile time. Consider a discriminated field on the handler so the engine driver skips JS actor method calls for native actors entirely.


Summary

Count
Blockers 4
Important 4
Minor 3

The ABI design, catch_unwind coverage on every host vtable callback, per-allocator OwnedBuf free contract, and integration test fixture are solid. The main concerns are the repo-root planning docs, the TOCTOU race on first plugin load, the prohibited Mutex<HashMap> for the plugin cache, and the breaking removal of agentOs() without a replacement in this PR.

Generated with Claude Code

@railway-app

railway-app Bot commented Jun 22, 2026

Copy link
Copy Markdown

🚅 Environment rivet-pr-5312-9585139d-d004-44f6-b072-b38065040390 in rivet-frontend has no services deployed.

…tegration tests

Rebased onto main (2.3.2). Adds the native actor-plugin host (createNativePluginFactory, ABI host loader) so external cdylib plugins (e.g. agent-os) load via dlopen; removes the in-tree bundled agent-os; keeps main's engine auto-spawn + runner-config.
@NathanFlurry NathanFlurry force-pushed the feat/dylib-actor-plugin branch from c0c7f71 to c44621f Compare June 22, 2026 08:13
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-5312 June 22, 2026 08:13 Destroyed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant