1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -47,6 +47,7 @@ jobs:
python examples/http_driver_demo.py
python examples/tutorial.py
python examples/readme_quickstart.py
python examples/trace_export_demo.py

conformance_stub:
name: "Weaver Spec Conformance Stub (v0.1.0)"
1 change: 1 addition & 0 deletions AGENTS.md
@@ -138,6 +138,7 @@ See [docs/agent-context/review-checklist.md](docs/agent-context/review-checklist
| Driver integration patterns | [docs/integrations.md](docs/integrations.md) |
| Capability design conventions | [docs/capabilities.md](docs/capabilities.md) |
| Context firewall details | [docs/context_firewall.md](docs/context_firewall.md) |
| Action trace export contract | [docs/trace_export.md](docs/trace_export.md) |

## Update policy

21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
a settings rename to `weaver-kernel` is the optional final step.

### Added
- **Action trace export contract (#94).** New `export_action_trace` /
`export_action_traces` produce a stable, versioned, JSON-serialisable shape
for `ActionTrace` records so downstream tools (e.g. LessonWeaver-style lesson
extraction) can consume the audit trail without depending on internals. The
export is derived only from already-redaction-safe trace fields — `args`
(memory payloads stripped) and `result_summary` (post-firewall counts/flags)
— so it cannot widen the I-01 boundary. `ActionTrace` now carries the invoked
capability's `sensitivity`, and downstream human-correction metadata can be
attached at export time. New [`docs/trace_export.md`](docs/trace_export.md)
(including how it differs from the OpenTelemetry export) and
[`examples/trace_export_demo.py`](examples/trace_export_demo.py), wired into
`make ci`.
- **Property-based invariant tests (#99).** New `tests/test_policy_properties.py`
uses Hypothesis to assert authorization invariants across generated
principals, capabilities, scopes, constraints, handles, and tokens: every
decision carries a stable reason code, `max_rows` never exceeds the policy
cap, handle expansion never exceeds the original grant (indirect-use
scenario), tokens never verify outside their scope and tampered/expired
tokens are always rejected, policy traces never leak raw scope values, and
the trace export is always JSON-serialisable. Adds `hypothesis` as a dev
dependency.
- README repositioned to lead with the unique **capability-token + tamper-evident
audit** value, with explicit boundary framing for the policy engine (vs
`AgentFence`, #111) and the context firewall (vs `contextweaver`, #110) so a
1 change: 1 addition & 0 deletions Makefile
@@ -25,5 +25,6 @@ example:
python examples/repository_safety_check.py
python examples/chainweaver_flow.py
python examples/evaluation_artifact_policy.py
python examples/trace_export_demo.py

ci: fmt-check lint type test example
4 changes: 3 additions & 1 deletion docs/architecture.md
@@ -154,7 +154,9 @@ Transforms `RawResult → Frame`. Never exposes raw output to the LLM.
Stores full results by opaque handle ID with TTL. `expand()` supports pagination, field selection, and basic equality filtering.

### TraceStore
Records every `ActionTrace`. `explain(action_id)` returns the full audit record. On a successful invocation the trace also carries a `result_summary` — a redaction-safe dict of counts/flags (`fact_count`, `row_count`, `warning_count`, `has_handle`) derived from the firewalled `Frame`, never from raw driver data — so an invocation's outcome is auditable directly (e.g. a repository safety check passed iff `result_summary["row_count"] == 0`). Failed runs have `result_summary == None`.
Records every `ActionTrace`. `explain(action_id)` returns the full audit record. On a successful invocation the trace also carries a `result_summary` — a redaction-safe dict of counts/flags (`fact_count`, `row_count`, `warning_count`, `has_handle`) derived from the firewalled `Frame`, never from raw driver data — so an invocation's outcome is auditable directly (e.g. a repository safety check passed iff `result_summary["row_count"] == 0`). Failed runs have `result_summary == None`. Each trace also records the invoked capability's `sensitivity` (`NONE`/`PII`/`PCI`/`SECRETS`/`MEMORY`).

`export_action_trace` / `export_action_traces` serialise traces into a stable, versioned, JSON-serialisable shape for downstream analysis tools (distinct from the OpenTelemetry observability export). See [trace_export.md](trace_export.md).
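
As a rough illustration of reading an outcome from `result_summary`, a consumer-side check might look like the sketch below. The helper name `safety_check_passed` is hypothetical and not part of `weaver_kernel`; it assumes only the `row_count` key and the `None`-on-failure behaviour described above.

```python
from typing import Any, Optional


def safety_check_passed(result_summary: Optional[dict[str, Any]]) -> bool:
    """Return True only for a successful run that reported zero rows.

    Hypothetical helper for illustration; not a weaver_kernel API.
    """
    # Failed runs carry result_summary == None, so they never count as a pass.
    if result_summary is None:
        return False
    return result_summary.get("row_count") == 0
```

Treating a missing summary as "not passed" keeps a driver failure from being mistaken for a clean result.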

### Adapters (`weaver_kernel.adapters`)
Vendor-specific tool-format adapters that translate between `Capability` objects
143 changes: 143 additions & 0 deletions docs/trace_export.md
@@ -0,0 +1,143 @@
# Action Trace Export

agent-kernel records an [`ActionTrace`](architecture.md) for every invocation.
The **trace export contract** turns those records into a stable,
JSON-serialisable shape that an external tool can consume — for example a
[LessonWeaver](https://github.com/dgenio/weaver-spec)-style lesson-extraction
layer that learns from past actions, policies, denials, corrections, and
outcomes.

```python
from weaver_kernel import export_action_traces

envelope = export_action_traces(kernel._traces.list_all())
```

Runnable companion: [`examples/trace_export_demo.py`](../examples/trace_export_demo.py).

## How this differs from OpenTelemetry export

agent-kernel also ships an OpenTelemetry integration
([`weaver_kernel.otel`](architecture.md), `pip install weaver-kernel[otel]`).
The two serve different consumers and do **not** compete:

| | OpenTelemetry (`instrument_kernel`) | Trace export (`export_action_traces`) |
|---|---|---|
| Consumer | Live observability backends (traces/metrics) | Offline analysis / learning tools |
| Shape | OTel spans + metrics, vendor-defined | Stable JSON envelope defined here |
| Timing | Emitted during execution | Pulled after the fact from the `TraceStore` |
| Stability | Tracks OTel semantic conventions | Versioned by `TRACE_EXPORT_VERSION` |

Use OTel for dashboards and alerting; use the export contract when another
program needs a durable, replayable record of what the agent did.

## Privacy

The export is derived **only** from fields the `ActionTrace` already holds,
all of which are redaction-safe by construction:

- `args` has memory payloads stripped at record time (keys like `payload`,
`content`, `value`, `memory`, `text`, `body` for `memory.*` capabilities
become `"[REDACTED]"`).
- `result_summary` carries counts and flags taken from the **post-firewall**
`Frame` — never raw driver data.

The contract adds no field the trace did not already carry, so exporting can
never widen the I-01 firewall boundary or leak sensitive payloads. A *denied*
request never produces an `ActionTrace` — policy gates before invocation
(I-02) — so the export only ever describes authorised invocations; denials are
surfaced separately via `PolicyDenied` / `Kernel.explain_denial`.
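
The key-based redaction described above can be pictured with a small sketch. This is not the kernel's implementation: the function name `redact_memory_args` is hypothetical, and the key list is copied from the bullet above and may not match the real one exactly.

```python
from typing import Any

# Assumption: key names taken from the documentation above.
_MEMORY_PAYLOAD_KEYS = frozenset({"payload", "content", "value", "memory", "text", "body"})


def redact_memory_args(capability_id: str, args: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of args with memory payloads replaced by a marker.

    Hypothetical helper for illustration; not a weaver_kernel API.
    """
    # Only memory.* capabilities are redacted; other args pass through unchanged.
    if not capability_id.startswith("memory."):
        return dict(args)
    return {
        key: "[REDACTED]" if key in _MEMORY_PAYLOAD_KEYS else value
        for key, value in args.items()
    }
```

Because the redaction happens before the trace is stored, anything built from the stored `args`, including the export, only ever sees the marker.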

## Envelope shape

`export_action_traces(...)` returns a versioned envelope:

```json
{
  "schema": "weaver_kernel.action_trace_export",
  "version": "1",
  "traces": [ /* one object per ActionTrace */ ]
}
```

Each trace object:

| Field | Type | Notes |
|-------|------|-------|
| `action_id` | string | Unique id; matches `Kernel.explain(action_id)`. |
| `capability_id` | string | The capability (tool) that was invoked. |
| `principal_id` | string | Who invoked it. |
| `token_id` | string | The capability token used. |
| `invoked_at` | string | ISO 8601 timestamp. |
| `response_mode` | string | `summary` / `table` / `handle_only` / `raw`. |
| `driver_id` | string | Driver that served the call (`""` on failure). |
| `handle_id` | string \| null | Handle for the full dataset, if one was minted. |
| `sensitivity` | string | `NONE` / `PII` / `PCI` / `SECRETS` / `MEMORY`. |
| `status` | string | `succeeded` or `failed` (derived from `error`). |
| `error` | string \| null | Failure reason; `null` on success. |
| `args` | object | Redacted invocation arguments. |
| `result_summary` | object \| null | Post-firewall counts/flags; `null` on failure. |
| `correction` | object \| null | Optional human-correction metadata (see below). |
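
A consumer may want to sanity-check each trace object against the table above before using it. The sketch below is one way to do that on plain dicts; `check_trace` and `derive_status` are hypothetical names, and treating all fourteen fields as always present is an assumption based on the table and the example output.

```python
from typing import Any, Optional

# Assumption: every field in the table is always present in an exported trace.
REQUIRED_FIELDS = (
    "action_id", "capability_id", "principal_id", "token_id", "invoked_at",
    "response_mode", "driver_id", "handle_id", "sensitivity", "status",
    "error", "args", "result_summary", "correction",
)


def derive_status(error: Optional[str]) -> str:
    """Mirror the documented rule: status is derived from error."""
    return "failed" if error is not None else "succeeded"


def check_trace(trace: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the trace looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in trace]
    if problems:
        return problems
    if trace["status"] != derive_status(trace["error"]):
        problems.append("status does not match error")
    if trace["status"] == "failed" and trace["result_summary"] is not None:
        problems.append("failed trace carries a result_summary")
    return problems
```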

### Human corrections

agent-kernel does not record human corrections itself. A downstream tool can
attach them at export time by passing a mapping of `action_id` → metadata:

```python
envelope = export_action_traces(
    traces,
    corrections={"act-123": {"corrected_by": "reviewer", "note": "wrong customer"}},
)
```
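
To make the lookup semantics concrete, here is a minimal sketch of the same idea on plain dicts. It is not how `export_action_traces` is implemented; `attach_corrections` is a hypothetical name, and silently ignoring correction entries whose `action_id` matches no trace is an assumption.

```python
from typing import Any, Mapping, Optional


def attach_corrections(
    traces: list[dict[str, Any]],
    corrections: Optional[Mapping[str, dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    """Return copies of the trace objects with a `correction` field set.

    Hypothetical helper for illustration; not a weaver_kernel API.
    """
    lookup = corrections or {}
    # Traces without an entry get correction=None, matching the field table.
    return [{**trace, "correction": lookup.get(trace["action_id"])} for trace in traces]
```

Keying by `action_id` lets a review tool keep its corrections in its own store and join them only when an export is produced.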

## Example output

```json
{
  "schema": "weaver_kernel.action_trace_export",
  "version": "1",
  "traces": [
    {
      "action_id": "0a1b...",
      "capability_id": "billing.list_invoices",
      "principal_id": "agent-007",
      "token_id": "f3c2...",
      "invoked_at": "2026-06-05T12:00:00+00:00",
      "response_mode": "summary",
      "driver_id": "billing",
      "handle_id": "9d7e...",
      "sensitivity": "PII",
      "status": "succeeded",
      "error": null,
      "args": {"operation": "list_invoices", "status": "paid"},
      "result_summary": {"fact_count": 4, "row_count": 0, "warning_count": 1, "has_handle": true},
      "correction": null
    },
    {
      "action_id": "5e6f...",
      "capability_id": "billing.flaky_report",
      "principal_id": "agent-007",
      "token_id": "11aa...",
      "invoked_at": "2026-06-05T12:00:01+00:00",
      "response_mode": "summary",
      "driver_id": "",
      "handle_id": null,
      "sensitivity": "NONE",
      "status": "failed",
      "error": "Handler for operation='flaky_report' raised: reporting backend is unavailable",
      "args": {"operation": "flaky_report"},
      "result_summary": null,
      "correction": {"corrected_by": "on-call", "note": "known outage; retried later"}
    }
  ]
}
```
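
An offline tool would typically aggregate over an envelope like the one above. The sketch below shows one possible roll-up using only the documented fields; the function name `summarise` and the shape of its return value are hypothetical.

```python
from collections import Counter
from typing import Any


def summarise(envelope: dict[str, Any]) -> dict[str, Any]:
    """Roll an export envelope up into a few counts (illustrative only)."""
    traces = envelope["traces"]
    return {
        # How many traces succeeded vs failed.
        "by_status": dict(Counter(t["status"] for t in traces)),
        # Which invocations a reviewer may want to look at first.
        "failed_action_ids": [t["action_id"] for t in traces if t["status"] == "failed"],
        # How many traces carry human-correction metadata.
        "corrected": sum(1 for t in traces if t.get("correction") is not None),
    }
```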

## Stability

`TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
shape. New optional fields may be added without a bump, so consumers should
ignore unknown keys. Assert on `status`, `sensitivity`, and the presence of
`error` rather than on human-readable strings (the `error` text itself may
evolve).
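
A consumer following these rules might load an envelope along the lines of the sketch below. The `schema` and `version` values come from the envelope shown earlier; `load_traces`, `KNOWN_FIELDS`, and the choice of which fields to keep are hypothetical.

```python
import json
from typing import Any

EXPECTED_SCHEMA = "weaver_kernel.action_trace_export"
SUPPORTED_VERSIONS = {"1"}
# Fields this hypothetical consumer cares about; anything else is ignored.
KNOWN_FIELDS = ("action_id", "capability_id", "sensitivity", "status", "error")


def load_traces(raw: str) -> list[dict[str, Any]]:
    """Parse an export envelope, rejecting unknown schemas or versions."""
    envelope = json.loads(raw)
    if envelope.get("schema") != EXPECTED_SCHEMA:
        raise ValueError(f"unexpected schema: {envelope.get('schema')!r}")
    if envelope.get("version") not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {envelope.get('version')!r}")
    # Project onto known keys so newly added optional fields cannot break us.
    return [{key: trace.get(key) for key in KNOWN_FIELDS} for trace in envelope["traces"]]
```

Failing fast on an unsupported version and dropping unknown keys means a non-breaking field addition passes through silently, while a breaking bump is caught before any analysis runs.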
133 changes: 133 additions & 0 deletions examples/trace_export_demo.py
@@ -0,0 +1,133 @@
"""trace_export_demo.py — export action traces for downstream analysis (#94).

The written contract lives in ``docs/trace_export.md``. This script is the
runnable companion. It shows how to turn the kernel's audit trail into a
stable, redaction-safe JSON shape that an external tool (for example a
LessonWeaver-style lesson-extraction layer) can consume without depending on
agent-kernel internals.

The demo records two invocations so the export covers both outcomes the
contract distinguishes:

1. ``billing.list_invoices`` — a normal READ that **succeeds**.
2. ``billing.flaky_report`` — a READ whose driver **fails**, producing a
``status: "failed"`` trace (a *denied* request never reaches invoke, so
it never produces a trace; denials surface via ``explain_denial``).

It then prints the versioned export envelope, attaching optional human
correction metadata to one trace. Everything is offline and deterministic.

Run with: ``python examples/trace_export_demo.py``
"""

from __future__ import annotations

import asyncio
import json

from weaver_kernel import (
    Capability,
    CapabilityRegistry,
    DriverError,
    HMACTokenProvider,
    Kernel,
    Principal,
    SafetyClass,
    SensitivityTag,
    StaticRouter,
    export_action_traces,
    make_billing_driver,
)
from weaver_kernel.drivers.base import ExecutionContext
from weaver_kernel.models import CapabilityRequest, ImplementationRef

_SECRET = "example-secret-do-not-use-in-prod"


def _build_kernel() -> Kernel:
    capabilities = [
        Capability(
            capability_id="billing.list_invoices",
            name="List Invoices",
            description="List invoices for a customer",
            safety_class=SafetyClass.READ,
            sensitivity=SensitivityTag.PII,
            allowed_fields=["id", "amount", "currency", "status", "date"],
            impl=ImplementationRef(driver_id="billing", operation="list_invoices"),
        ),
        Capability(
            capability_id="billing.flaky_report",
            name="Flaky Report",
            description="A report whose backing service is currently failing",
            safety_class=SafetyClass.READ,
            impl=ImplementationRef(driver_id="billing", operation="flaky_report"),
        ),
    ]
    registry = CapabilityRegistry()
    registry.register_many(capabilities)

    driver = make_billing_driver()

    def flaky_report(ctx: ExecutionContext) -> object:
        raise DriverError("reporting backend is unavailable")

    driver.register_handler("flaky_report", flaky_report)

    router = StaticRouter(
        routes={
            "billing.list_invoices": ["billing"],
            "billing.flaky_report": ["billing"],
        }
    )
    kernel = Kernel(
        registry=registry,
        token_provider=HMACTokenProvider(secret=_SECRET),
        router=router,
    )
    kernel.register_driver(driver)
    return kernel


async def main() -> None:
    kernel = _build_kernel()
    principal = Principal(
        principal_id="agent-007",
        roles=["reader"],
        attributes={"tenant": "acme"},
    )

    # 1. A successful READ: produces a status="succeeded" trace.
    list_req = CapabilityRequest(capability_id="billing.list_invoices", goal="list invoices")
    list_token = kernel.get_token(list_req, principal, justification="")
    ok_frame = await kernel.invoke(
        list_token,
        principal=principal,
        args={"operation": "list_invoices", "status": "paid"},
    )
    print(f"succeeded: action_id={ok_frame.action_id} facts={len(ok_frame.facts)}")

    # 2. A failing READ: produces a status="failed" trace.
    flaky_req = CapabilityRequest(capability_id="billing.flaky_report", goal="run report")
    flaky_token = kernel.get_token(flaky_req, principal, justification="")
    failed_action_id = ""
    try:
        await kernel.invoke(flaky_token, principal=principal, args={"operation": "flaky_report"})
    except DriverError as exc:
        print(f"failed: {exc}")
        # The failure was still recorded; grab the most recent trace's id.
        failed_action_id = kernel._traces.list_all()[-1].action_id

    # Export everything. Attach an optional human correction to the failed run.
    corrections = (
        {failed_action_id: {"corrected_by": "on-call", "note": "known outage; retried later"}}
        if failed_action_id
        else None
    )
    envelope = export_action_traces(kernel._traces.list_all(), corrections=corrections)

    print("\nExported trace envelope:")
    print(json.dumps(envelope, indent=2))


if __name__ == "__main__":
    asyncio.run(main())
1 change: 1 addition & 0 deletions pyproject.toml
@@ -51,6 +51,7 @@ dev = [
"pytest>=8.0",
"pytest-cov>=5.0",
"pytest-asyncio>=0.23",
"hypothesis>=6.100",
"ruff>=0.4",
"mypy>=1.10",
"httpx>=0.27",
14 changes: 13 additions & 1 deletion src/weaver_kernel/__init__.py
@@ -26,6 +26,7 @@
Handles & traces::

from weaver_kernel import HandleStore, TraceStore
from weaver_kernel import export_action_trace, export_action_traces

LLM tool-format adapters::

@@ -140,7 +141,13 @@
from .registry import CapabilityRegistry
from .router import StaticRouter
from .tokens import CapabilityToken, HMACTokenProvider
from .trace import TraceStore
from .trace import (
    TRACE_EXPORT_SCHEMA,
    TRACE_EXPORT_VERSION,
    TraceStore,
    export_action_trace,
    export_action_traces,
)

# Single source of truth: read the version from the installed distribution
# metadata (the PyPI dist name is ``weaver-kernel``, distinct from the import
@@ -250,6 +257,11 @@
# stores
"HandleStore",
"TraceStore",
# trace export (issue #94)
"TRACE_EXPORT_SCHEMA",
"TRACE_EXPORT_VERSION",
"export_action_trace",
"export_action_traces",
# adapters
"AnthropicMiddleware",
"OpenAIMiddleware",
1 change: 1 addition & 0 deletions src/weaver_kernel/kernel/__init__.py
@@ -234,6 +234,7 @@ async def invoke(
args=args,
response_mode=response_mode,
plan=plan,
capability=capability,
)

async def invoke_stream(