[DX] Policy testing framework: assert allow/deny/ask decisions as fixtures #138

## Summary

Ship a small policy-testing toolkit so teams can unit-test their *own* policies the way they test code: declare scenario fixtures (principal, capability, request context) and expected decisions (allow/deny/ask + reason code), run them with pytest or a one-line runner, and get a readable diff when a policy change alters any decision.

## Why this matters

A policy nobody can test is a policy nobody will trust in production. OPA's `opa test` made policy testing a category norm; bringing the same workflow to agent tool-call policy turns the DSL from a "config file" into an "engineered artifact" and gives teams a regression net for exactly the change AGENTS.md warns about: rule ordering that silently bypasses sensitivity checks. It also establishes the shared fixture format that the AgentFence policy-contract work (#111, #120) will need.

## Proposed scope

- `weaver_kernel.policy_testing` module: `PolicyScenario` dataclass (principal, capability id, safety class, sensitivity, intent/scope, justification) + `expect` (decision and reason code); `run_scenarios(engine, scenarios) -> ScenarioReport`.
- YAML scenario-file format mirroring the DSL's vocabulary, loadable with the existing `[policy]` extra.
- Pytest integration: a documented pattern (parametrized fixture helper) so scenarios run as individual test cases with clear ids.
- Decision-diff output: given two engines (or one engine and two policy files), report every scenario whose decision changed — the building block for safe policy rollouts.
- Docs page (`docs/policy_testing.md`) + scenarios for the cookbook recipes if the cookbook lands.
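
To make the scope concrete, here is a rough sketch of how the scenario dataclass, runner, and decision diff could fit together. Everything in it is an assumption for illustration: the `Expect` type, the field layout, `diff_decisions`, and the reason codes are hypothetical, and the engine is a plain callable standing in for the real policy engines so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Expect:
    decision: str      # "allow", "deny", or "ask"
    reason_code: str   # stable code, never a message string


@dataclass(frozen=True)
class PolicyScenario:
    # Hypothetical field layout, loosely following the scope list.
    name: str
    principal: str
    capability_id: str
    safety_class: str
    sensitivity: str
    expect: Expect


# Stand-in for a real engine: any callable mapping a scenario to a decision.
Engine = Callable[[PolicyScenario], Expect]


@dataclass
class ScenarioReport:
    # Each failure is (scenario name, expected, actual).
    failures: list[tuple[str, Expect, Expect]] = field(default_factory=list)
    total: int = 0

    @property
    def ok(self) -> bool:
        return not self.failures


def run_scenarios(engine: Engine, scenarios: list[PolicyScenario]) -> ScenarioReport:
    report = ScenarioReport(total=len(scenarios))
    for s in scenarios:
        actual = engine(s)
        if actual != s.expect:
            report.failures.append((s.name, s.expect, actual))
    return report


def diff_decisions(old: Engine, new: Engine, scenarios: list[PolicyScenario]):
    """Return (name, old decision, new decision) for every scenario that changed."""
    changed = []
    for s in scenarios:
        before, after = old(s), new(s)
        if before != after:
            changed.append((s.name, before, after))
    return changed


# Two toy policies. The second puts a role shortcut ahead of the sensitivity check.
def sensitivity_first(s: PolicyScenario) -> Expect:
    if s.sensitivity == "high":
        return Expect("deny", "SENSITIVE_DENIED")
    return Expect("allow", "DEFAULT_ALLOW")


def role_first(s: PolicyScenario) -> Expect:
    if s.principal == "admin":
        return Expect("allow", "ADMIN_ALLOW")
    return sensitivity_first(s)


SCENARIOS = [
    PolicyScenario("reader-low", "reader", "docs.read", "READ", "low",
                   Expect("allow", "DEFAULT_ALLOW")),
    PolicyScenario("admin-high", "admin", "secrets.read", "READ", "high",
                   Expect("deny", "SENSITIVE_DENIED")),
]

report = run_scenarios(sensitivity_first, SCENARIOS)
changed = diff_decisions(sensitivity_first, role_first, SCENARIOS)
```

In this toy setup the diff flags only `admin-high`, which is the kind of rule-ordering regression the toolkit is meant to surface. A real implementation would build `CapabilityRequest` and `Principal` objects from each scenario instead of passing the scenario to the engine directly.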

## Implementation notes

- Evaluation path: `DefaultPolicyEngine.evaluate()` / `DeclarativePolicyEngine` — scenarios should construct real `CapabilityRequest`/`Principal` objects, not mocks, so tests exercise the true code path.
- Assert stable codes from `policy_reasons.py`, never message strings (repo convention).
- `dry_run()` (`kernel/_dry_run.py`) covers full-kernel scenarios; the toolkit targets engine-level testing without needing drivers — document when to use which.
- New module must stay ≤300 lines; split loader/report if needed.
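
The reason-code convention and the "clear ids" goal can be illustrated with a small sketch. The `Decision` shape, the two codes, and the `matches` and `case_id` helpers below are hypothetical; the real codes would come from `policy_reasons.py`.

```python
from dataclasses import dataclass

# Hypothetical codes for illustration; real ones would come from policy_reasons.py.
SENSITIVE_DENIED = "SENSITIVE_DENIED"
NEEDS_CONFIRMATION = "NEEDS_CONFIRMATION"


@dataclass(frozen=True)
class Decision:
    outcome: str       # "allow", "deny", or "ask"
    reason_code: str   # stable, safe to assert on
    message: str       # human-readable, may be reworded at any time


def matches(decision: Decision, outcome: str, reason_code: str) -> bool:
    """Compare outcome and reason code only; the message is deliberately ignored."""
    return (decision.outcome, decision.reason_code) == (outcome, reason_code)


def case_id(case: dict) -> str:
    """Build a readable test id such as 'admin:secrets.read->deny'."""
    return f"{case['principal']}:{case['capability']}->{case['outcome']}"


CASES = [
    {"principal": "admin", "capability": "secrets.read",
     "outcome": "deny", "reason_code": SENSITIVE_DENIED},
    {"principal": "reader", "capability": "files.delete",
     "outcome": "ask", "reason_code": NEEDS_CONFIRMATION},
]

# The same decision with two different wordings still matches.
v1 = Decision("deny", SENSITIVE_DENIED, "Denied: capability is sensitive.")
v2 = Decision("deny", SENSITIVE_DENIED, "Sensitive capability; request refused.")
ids = [case_id(c) for c in CASES]
```

With pytest, a list like `CASES` could be passed to `pytest.mark.parametrize("case", CASES, ids=case_id)` so each scenario shows up as its own test case with a readable id.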

## Acceptance criteria

- [ ] Scenario dataclass + YAML format + runner shipped and exported.
- [ ] Pytest pattern documented and used in this repo's own tests for the default policy.
- [ ] Decision-diff between two policy files works and is covered by tests.
- [ ] Docs page with a worked example; CHANGELOG updated.

## Out of scope

- Property-based/fuzz generation of scenarios (#99 covers that direction).
- Cross-system conformance with AgentFence (#111/#120) — but the fixture format should be designed with that consumer in mind.
- A standalone CLI command (composes later with the CLI work in #124).

## References

- OPA `opa test` and Cedar policy-test patterns as neutral ecosystem precedents.
- In-repo: `policy_reasons.py`, `policy_dsl.py`, `kernel/_dry_run.py`, AGENTS.md "Adding a policy rule".

---
*Priority: P1 · Effort: M · Impact: High*
