Summary
Ship a small policy-testing toolkit so teams can unit-test their own policies the way they test code: declare scenario fixtures (principal, capability, request context) and expected decisions (allow/deny/ask + reason code), run them with pytest or a one-line runner, and get a readable diff when a policy change alters any decision.
Why this matters
A policy nobody can test is a policy nobody will trust in production. OPA's opa test made policy testing a category norm; bringing the same workflow to agent tool-call policy turns the DSL from "config file" into "engineered artifact", and gives teams a regression net for exactly the change AGENTS.md warns about (rule-ordering silently bypassing sensitivity checks). It also generates the shared fixture format the AgentFence policy-contract work (#111, #120) will need.
Proposed scope
- weaver_kernel.policy_testing module: PolicyScenario dataclass (principal, capability id, safety class, sensitivity, intent/scope, justification) + expect (decision and reason code); run_scenarios(engine, scenarios) -> ScenarioReport.
- YAML scenario-file format mirroring the DSL's vocabulary, loadable with the existing [policy] extra.
- Pytest integration: a documented pattern (parametrized fixture helper) so scenarios run as individual test cases with clear ids.
- Decision-diff output: given two engines (or one engine and two policy files), report every scenario whose decision changed — the building block for safe policy rollouts.
- Docs page (docs/policy_testing.md) + scenarios for the cookbook recipes if the cookbook lands.
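
A minimal sketch of how the scenario runner and the decision diff might fit together. Everything here is illustrative: the field subset, the Expect and ScenarioReport shapes, diff_decisions, the toy engines, and the reason codes are assumptions for discussion, not the existing weaver_kernel API. The engine is modelled as a plain callable so the sketch runs without the kernel installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Decision = Tuple[str, str]  # (decision, reason_code)


@dataclass(frozen=True)
class Expect:
    decision: str      # "allow" | "deny" | "ask"
    reason_code: str   # stable code, never a message string


@dataclass(frozen=True)
class PolicyScenario:
    # Hypothetical subset of the fields listed in the proposal.
    name: str
    principal: str
    capability_id: str
    safety_class: str
    sensitivity: str
    expect: Expect


# Stand-in for a policy engine: any callable from scenario to decision.
Engine = Callable[[PolicyScenario], Decision]


@dataclass
class ScenarioReport:
    failures: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures


def run_scenarios(engine: Engine, scenarios: Iterable[PolicyScenario]) -> ScenarioReport:
    """Evaluate each scenario and record every mismatch with its expectation."""
    report = ScenarioReport()
    for s in scenarios:
        expected = (s.expect.decision, s.expect.reason_code)
        actual = engine(s)
        if actual != expected:
            report.failures.append(f"{s.name}: expected {expected}, got {actual}")
    return report


def diff_decisions(old: Engine, new: Engine,
                   scenarios: Iterable[PolicyScenario]) -> Dict[str, Tuple[Decision, Decision]]:
    """Return {scenario name: (before, after)} for every decision that changed."""
    changed = {}
    for s in scenarios:
        before, after = old(s), new(s)
        if before != after:
            changed[s.name] = (before, after)
    return changed


# Toy engines with made-up reason codes, for demonstration only.
def strict_engine(s: PolicyScenario) -> Decision:
    if s.sensitivity == "high":
        return ("deny", "SENSITIVITY_TOO_HIGH")
    return ("allow", "DEFAULT_ALLOW")


def reordered_engine(s: PolicyScenario) -> Decision:
    # A reordered rule lets read capabilities through before the sensitivity check.
    if s.safety_class == "read":
        return ("allow", "READ_ALLOWED")
    return strict_engine(s)


SCENARIOS = [
    PolicyScenario("read-secrets", "agent-1", "secrets.read", "read", "high",
                   Expect("deny", "SENSITIVITY_TOO_HIGH")),
    PolicyScenario("write-docs", "agent-1", "docs.write", "write", "low",
                   Expect("allow", "DEFAULT_ALLOW")),
]

if __name__ == "__main__":
    print(run_scenarios(strict_engine, SCENARIOS).ok)
    print(diff_decisions(strict_engine, reordered_engine, SCENARIOS))
```

In this toy setup the reordered engine flips only the read-secrets scenario, which is the rule-ordering regression the diff is meant to surface. A real implementation would build CapabilityRequest/Principal objects and call the actual engine instead of the stand-in callables.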
Implementation notes
- Evaluation path: DefaultPolicyEngine.evaluate() / DeclarativePolicyEngine. Scenarios should construct real CapabilityRequest/Principal objects, not mocks, so tests exercise the true code path.
- Assert stable codes from policy_reasons.py, never message strings (repo convention).
- dry_run() (kernel/_dry_run.py) covers full-kernel scenarios; the toolkit targets engine-level testing without needing drivers. Document when to use which.
- New module must stay ≤300 lines; split loader/report if needed.
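
One way the pytest pattern could look, with each scenario as its own test case carrying a readable id and assertions made on the reason code rather than the message. This is a sketch under assumptions: evaluate() is a stand-in for a real engine call, and the two reason-code constants are invented here (in the repo they would come from policy_reasons.py).

```python
import pytest

# Hypothetical stable reason codes, defined locally so the sketch is self-contained.
DENY_SENSITIVE = "DENY_SENSITIVE"
ALLOW_DEFAULT = "ALLOW_DEFAULT"


def evaluate(sensitivity: str):
    """Stand-in for an engine call; returns (decision, reason_code, message)."""
    if sensitivity == "high":
        return ("deny", DENY_SENSITIVE, "blocked: capability is marked high sensitivity")
    return ("allow", ALLOW_DEFAULT, "no rule matched; default allow")


# Each pytest.param becomes one test case; the id appears in the pytest report.
SCENARIOS = [
    pytest.param("high", "deny", DENY_SENSITIVE, id="high-sensitivity-denied"),
    pytest.param("low", "allow", ALLOW_DEFAULT, id="low-sensitivity-allowed"),
]


@pytest.mark.parametrize("sensitivity, decision, reason_code", SCENARIOS)
def test_policy_decision(sensitivity, decision, reason_code):
    actual_decision, actual_code, _message = evaluate(sensitivity)
    assert actual_decision == decision
    # Compare the stable code only; the human-readable message may be reworded freely.
    assert actual_code == reason_code
```

Because the message is deliberately ignored, rewording it does not break the suite, while a changed decision or reason code fails the specific scenario by id.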
Acceptance criteria
Out of scope
References
- OPA opa test and Cedar policy-test patterns as neutral ecosystem precedents.
- In-repo: policy_reasons.py, policy_dsl.py, kernel/_dry_run.py, AGENTS.md "Adding a policy rule".
Priority: P1 · Effort: M · Impact: High