TeaAgent is a thin governance-first agent harness. It does not implement its own LLM framework — it connects to model providers through adapters and enforces safety boundaries around tool execution.
┌─────────────────────────────────────────────────────────────┐
│ CLI / TUI │
├─────────────────────────────────────────────────────────────┤
│ ModelDecisionEngine │
│ (prompt assembly → JSON parsing → structured decisions) │
├──────────────────────┬──────────────────────────────────────┤
│ AgentRunner │ ChatAgentConfig │
│ (decision loop, │ (high-level convenience wrapper │
│ budget, approval, │ around AgentRunner + LLM) │
│ audit) │ │
├──────────────────────┴───────────┬──────────────────────────┤
│ ToolRegistry │ ApprovalPolicy │
│ (register, dispatch, validate) │ (5 permission modes) │
├──────────────────────────────────┴──────────────────────────┤
│ Workspace Tools │
│ read_file · write_file · apply_patch · edit_at_hash │
│ run_shell_inspect · run_shell_mutate · list_files │
│ search_text · git_status │
├─────────────────────────────────────────────────────────────┤
│ Multi-Agent Coordination Layer (Phase 4-5) │
│ TaskCoordinator · AgentFactory · ToolPermissionManager │
│ WorkflowEngine (polish mode, multi-step execution) │
│ ContextBus (cross-sandbox Delta sharing) │
│ JITApprovalServer (remote SSE with timeout) │
│ CentralizedApprovalQueue (aggregated subagent approvals) │
├─────────────────────────────────────────────────────────────┤
│ State Layer │
│ AuditLogger · RunStore · MemoryCatalog · UltraworkStore │
├─────────────────────────────────────────────────────────────┤
│ Infrastructure │
│ OAuth 2.1/DPoP · MCP HTTP/stdio · OTel · Graph RAG │
│ Code Mode · LLM Conformance · Provability │
└─────────────────────────────────────────────────────────────┘
The TeaAgent governance system is hardened through a five-loop architecture. The loops cover tool registration, plan-before-write enforcement, audit, failure-memory curation, and subagent approval:
- ToolRegistry with security tier mapping: Tools now include `security_tier` annotations (Low, Medium, High, Critical) with automatic tier calculation based on annotations and capability manifests
- Enhanced tool linting: Static validation checks for write-like keywords in `read_only` tool descriptions, capability manifest validation with tier mismatch warnings
- AST-based fuzz checking: `selftest.py` includes static analysis to detect tools marked as `read_only=True` that contain write operations in their implementation
- Capability manifest enforcement: Tools must declare capabilities (filesystem_write, network) with preflight warnings for undeclared capabilities
- Strict plan-before-write enforcement: `workspace-write` mode now requires plan binding by default (user-approved strict immediate block)
- PlanContract file target validation: Plans include approved file target lists with an `allows_file_write()` method to prevent undeclared file modifications
- Validation profile integration: Fast, Standard, and Strict validation profiles wired to WorkflowEngine with automatic rollback on strict validation failure
- JIT rollback integration: Strict validation failures trigger automatic rollback via UndoJournal
- Tiered audit levels: L0 (Metrics-only), L1 (Metadata), L2 (Redacted Payload), L3 (Full Local Trace) with configurable filtering
- Audit chain integrity verification: SHA-256 hash chain validation for trace import to prevent tampering
- Per-project encryption support: L3 audits support per-project encryption keys for metadata leakage prevention
- TUI run trace surface: Interactive run store management for trace, export, and replay operations
- Confidence-based blocking: Low-confidence failure cards never block execution automatically (enforced warning thresholds)
- Enhanced CLI curation suite: `teaagent memory failures review/prune/invalidate` commands with confidence filtering
- Custom invalidation rules: Per-project automated invalidation rules (e.g., auto-pruning when target files change)
- Memory hygiene enforcement: TTL expiration rules and manual correction capabilities for memory poisoning
- Approval lineage tracing: Subagents carry parent-run IDs and inherit permission mode constraints with structured tracking
- Fail-fast approval logic: Tournament/parallel mode halts immediately if any subagent requires human permission (user-approved)
- Git worktree sandbox enforcement: Tournament runs require git worktree isolation as a hard precondition for zero-contamination guarantees
- Security-aware tournament scoring: Weighted comparator schema (tests 40%, performance 15%, lint 10%, diff size 10%, architectural fit 15%, security 10%)
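The weighted comparator in the last item can be illustrated with a short sketch. Only the six weights come from the text above; the function name, the metric keys, and the assumption that every criterion is normalized to the range 0.0 to 1.0 are hypothetical.

```python
from typing import Mapping

# Weights transcribed from the comparator schema above; they sum to 1.0.
WEIGHTS = {
    "tests": 0.40,
    "performance": 0.15,
    "lint": 0.10,
    "diff_size": 0.10,
    "architectural_fit": 0.15,
    "security": 0.10,
}


def tournament_score(metrics: Mapping[str, float]) -> float:
    """Combine per-criterion scores (assumed normalized to 0.0-1.0) into one value."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= metrics[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {metrics[name]}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


# Two hypothetical tournament candidates; "b" fails half its tests and the security check.
candidate_a = {"tests": 1.0, "performance": 0.5, "lint": 1.0,
               "diff_size": 0.8, "architectural_fit": 0.6, "security": 1.0}
candidate_b = {**candidate_a, "tests": 0.5, "security": 0.0}

best = max([("a", candidate_a), ("b", candidate_b)],
           key=lambda pair: tournament_score(pair[1]))
```

Because security carries its own weight, a candidate that passes tests but fails the security check loses ground even when every other metric ties.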
Core Phase 4 (consensus) and Phase 5 (sandbox routing/execution) modules are shipped
with CLI, unit tests, and E2E acceptance. Optional hardening (async vote polling,
WASM skill execution) is shipped; docker-smoke CI is advisory (see CONTRIBUTING.md).
Remaining Beta work is native WASM modules and deeper tournament benchmarks. See backlog-priority.md.
- `ConsensusEngine`, peer registry, voting mechanisms, and attestation trail
- Swarm pre-approval gate when `ConsensusConfig.enable_pre_approval` matches task patterns
- Async vote collection via `ConsensusConfig.async_vote_collection` + `poll_until_resolved`
- CLI: `teaagent consensus` (`teaagent/cli/_handlers/_consensus.py`)
- E2E: `tests/acceptance/test_consensus_flow.py`

- Docker resource limits via `prepare_subagent_isolation` (`--cpus` / `--memory`)
- WASM runtime wrapper (`teaagent/wasm_runtime.py`) and skill routing (`teaagent/skill_router.py`)
- Skill execution: `teaagent/skill_executor.py`, `teaagent sandbox execute`
- `isolation=auto` on subagents with `skill_path` / `skill_risk_level`
- Resource monitoring CLI (`teaagent sandbox monitor`)
- E2E: `tests/acceptance/test_sandbox_enhancement_flow.py`

- `SkillWriter` publish/review pipeline (`teaagent/skill_writer.py`)
- Docker sandbox resource monitor + abort (`teaagent/docker_sandbox.py`)
- Prompt tournament fitness scoring in `SwarmManager` (`.teaagent/prompt_gene_pool.jsonl`)
- Control plane HTTP + dashboard (`teaagent/control_plane_api.py`, `teaagent/html_dashboard/`)
- CLI: `teaagent control-plane serve` (workflow/focus/JIT SSE dashboard)
See backlog-priority.md for detailed task breakdowns and implementation status.
AgentRunner is the core execution loop. It accepts a DecisionFn — any callable
that takes a context dict and returns either a ToolRequest or FinalAnswer:
- ToolRequest: name + arguments to dispatch through `ToolRegistry`.
- FinalAnswer: content + metadata to return to the caller.
The loop enforces iteration limits, tool-call limits, and cost budgets on every
iteration. Every decision and execution is recorded through AuditLogger.
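A minimal sketch of such a loop, under stated assumptions: `run_loop`, the status strings, and the in-memory `audit` list are hypothetical stand-ins for `AgentRunner.run()`, its result statuses, and `AuditLogger`. Cost budgets and approval checks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Union


@dataclass
class ToolRequest:
    name: str
    arguments: dict[str, Any]


@dataclass
class FinalAnswer:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Any callable from a context dict to a decision qualifies as a DecisionFn.
DecisionFn = Callable[[dict[str, Any]], Union[ToolRequest, FinalAnswer]]


def run_loop(decide: DecisionFn, tools: dict[str, Callable[..., Any]],
             max_iterations: int = 8, max_tool_calls: int = 4) -> dict[str, Any]:
    context: dict[str, Any] = {"observations": []}
    audit: list[dict[str, Any]] = []  # stand-in for AuditLogger
    tool_calls = 0

    def finish(status: str, iterations: int, answer: Any = None) -> dict[str, Any]:
        audit.append({"event": "run_finished", "status": status})
        return {"status": status, "final_answer": answer,
                "iterations": iterations, "tool_calls": tool_calls, "audit": audit}

    for iteration in range(1, max_iterations + 1):
        decision = decide(context)
        audit.append({"event": "decision", "iteration": iteration,
                      "kind": type(decision).__name__})
        if isinstance(decision, FinalAnswer):
            return finish("completed", iteration, decision.content)
        if tool_calls >= max_tool_calls:  # limit checked before every dispatch
            return finish("tool_call_limit", iteration)
        result = tools[decision.name](**decision.arguments)
        tool_calls += 1
        context["observations"].append({"tool": decision.name, "result": result})
        audit.append({"event": "tool_result", "tool": decision.name})
    return finish("iteration_limit", max_iterations)


def scripted(context: dict[str, Any]) -> Union[ToolRequest, FinalAnswer]:
    """A scripted DecisionFn: call one tool, then answer with its result."""
    if not context["observations"]:
        return ToolRequest("add", {"a": 2, "b": 3})
    return FinalAnswer(f"sum is {context['observations'][-1]['result']}")


outcome = run_loop(scripted, {"add": lambda a, b: a + b})
```

Because the loop only depends on the `DecisionFn` signature, a scripted function like the one above can replace the model in tests.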
ModelDecisionEngine implements the standard LLM path: it assembles a system
prompt, appends tool metadata and memory, calls the LLM adapter, and parses the
JSON response into Decision objects. ChatAgentConfig bundles all the
configuration needed for a complete model-driven agent run.
All tools are registered through ToolRegistry with:
| Property | Purpose |
|---|---|
| `name` | Unique identifier (no spaces) |
| `description` | Human-readable purpose, included in the model prompt |
| `input_schema` | JSON Schema for argument validation |
| `output_schema` | JSON Schema for result validation |
| `annotations` | `read_only`, `destructive`, `idempotent` |
ApprovalPolicy sits between the decision loop and tool execution. It checks
annotations against the active PermissionMode before any destructive tool runs:
| Mode | Read | Write | Shell Mutate | Destructive Approval |
|---|---|---|---|---|
| `read-only` | Yes | No | No | Blocked |
| `workspace-write` | Yes | Yes | No | Blocked |
| `prompt` | Yes | Yes | Conditional | Human-in-the-loop |
| `allow` | Yes | Yes | Yes | Session-scoped |
| `danger-full-access` | Yes | Yes | Yes | None |
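The table can be read as a lookup from mode and action to an outcome. The sketch below encodes it that way. The `PermissionMode` enum members, the action names, and the three outcome strings are assumptions, and so is the mapping of "Conditional" and "Session-scoped" to `"ask"` (remembering an approval for the rest of a session is not modeled).

```python
from enum import Enum


class PermissionMode(str, Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    PROMPT = "prompt"
    ALLOW = "allow"
    DANGER_FULL_ACCESS = "danger-full-access"


# One row per mode, transcribed from the table above.
# Outcomes: "allow", "deny", or "ask" (a human decision is required).
POLICY = {
    PermissionMode.READ_ONLY: {
        "read": "allow", "write": "deny", "shell_mutate": "deny", "destructive": "deny"},
    PermissionMode.WORKSPACE_WRITE: {
        "read": "allow", "write": "allow", "shell_mutate": "deny", "destructive": "deny"},
    PermissionMode.PROMPT: {
        "read": "allow", "write": "allow", "shell_mutate": "ask", "destructive": "ask"},
    PermissionMode.ALLOW: {
        "read": "allow", "write": "allow", "shell_mutate": "allow", "destructive": "ask"},
    PermissionMode.DANGER_FULL_ACCESS: {
        "read": "allow", "write": "allow", "shell_mutate": "allow", "destructive": "allow"},
}


def evaluate(mode: PermissionMode, action: str) -> str:
    """Return the outcome for an action; unknown actions are denied by default."""
    return POLICY[mode].get(action, "deny")
```

Denying unknown actions by default keeps a newly added tool category from being silently allowed before the table is updated.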
AuditLogger is the universal event sink. Every AgentRunner iteration,
tool call, approval decision, and final result produces an AuditEvent:
- Events are appended to a per-run JSONL file with `fcntl.LOCK_EX` and `fsync`.
- Sensitive keys (`api_key`, `token`, `secret`, …) and tool argument values (`content`, `command`, …) are redacted before persistence.
- String-level patterns (Bearer tokens, `sk-*` keys, query-param secrets) are also redacted.
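The three points above can be sketched as follows on a POSIX system. The key sets only contain the names listed in the text, the two regular expressions are assumed approximations of the Bearer and `sk-*` patterns, and `redact` and `append_event` are hypothetical names rather than the `AuditLogger` API.

```python
import fcntl
import json
import os
import re
import tempfile
from typing import Any

SENSITIVE_KEYS = {"api_key", "token", "secret"}   # subset named in the text
REDACTED_ARGUMENTS = {"content", "command"}       # subset named in the text
# Assumed patterns; the real redaction rules may differ.
STRING_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"\bsk-[A-Za-z0-9_-]+"),
]


def redact(value: Any, key: str = "") -> Any:
    """Redact by key name first, then by string pattern, recursing into containers."""
    if key in SENSITIVE_KEYS or key in REDACTED_ARGUMENTS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        for pattern in STRING_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
    return value


def append_event(path: str, event: dict[str, Any]) -> None:
    """Append one redacted JSON line under an exclusive lock, then fsync."""
    line = json.dumps(redact(event), sort_keys=True) + "\n"
    with open(path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # exclusive advisory lock
        try:
            handle.write(line)
            handle.flush()
            os.fsync(handle.fileno())       # force the line to disk
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run.jsonl")
    append_event(log_path, {
        "type": "tool_call",
        "api_key": "sk-live-123",
        "arguments": {"command": "rm -rf build", "path": "src/app.py"},
        "note": "sent Bearer abc.def",
    })
    with open(log_path, encoding="utf-8") as handle:
        stored = [json.loads(row) for row in handle]
```

Redaction runs before serialization, so the sensitive values never reach the file, even briefly.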
Sinks plug into AuditLogger.add_sink():
- `InMemoryMetricsSink` collects counters and histogram samples.
- `OTelAuditSink` converts events into OpenTelemetry spans.
- `OTelMetricsSink` converts events into OTel counters/histograms.
RunStore manages per-run audit files and provides listing, inspection, task
replay, and heartbeat tracking for resumable agent runs.
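This document describes the audit files as protected by a SHA-256 hash chain that is checked on trace import. The sketch below shows one common way to build and verify such a chain. The record layout (`payload`, `prev_hash`, `hash`), the all-zero genesis value, and the function names are assumptions, not the actual on-disk format.

```python
import copy
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def _digest(prev_hash: str, payload: dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def chain_events(payloads: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Link each payload to its predecessor through the predecessor's hash."""
    records, prev = [], GENESIS
    for payload in payloads:
        current = _digest(prev, payload)
        records.append({"payload": payload, "prev_hash": prev, "hash": current})
        prev = current
    return records


def verify_chain(records: list[dict[str, Any]]) -> bool:
    """Recompute every hash; any edited, removed, or reordered record breaks the chain."""
    prev = GENESIS
    for record in records:
        if record["prev_hash"] != prev:
            return False
        if record["hash"] != _digest(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True


records = chain_events([
    {"event": "decision", "iteration": 1},
    {"event": "tool_call", "tool": "read_file"},
])
tampered = copy.deepcopy(records)
tampered[0]["payload"]["iteration"] = 99  # simulate an edit after the fact

intact_ok = verify_chain(records)
tampered_ok = verify_chain(tampered)
```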
Workspace tools operate within a configurable root directory. Every tool goes through:
- Path resolution: rejects `../`, absolute paths, and symlink escapes.
- Size enforcement: `max_read_bytes`, `max_write_bytes`, `max_shell_output_bytes`.
- Shell classification: quote-aware scanning splits commands into inspect (safe: `ls`, `cat`, `git status`) and mutate (everything else).
- Shell execution: inspect commands run with `shell=False` after allowlist argv validation; `find -delete`/`-exec` and `git -c`/`--config` are blocked.
- Edit safety: `apply_patch` requires unique match; `edit_at_hash` uses CRC32 line anchors.
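The path-resolution step can be sketched with `pathlib`. This is an illustration on a POSIX system, not the workspace tool code; `resolve_in_workspace` and `PathEscapeError` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import os
import tempfile
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path would leave the workspace root."""


def resolve_in_workspace(root: Path, user_path: str) -> Path:
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise PathEscapeError(f"absolute path rejected: {user_path}")
    if ".." in candidate.parts:
        raise PathEscapeError(f"parent traversal rejected: {user_path}")
    root_resolved = root.resolve()
    # resolve() follows symlinks, so a link that points outside is caught below.
    resolved = (root_resolved / candidate).resolve()
    if not resolved.is_relative_to(root_resolved):
        raise PathEscapeError(f"symlink escape rejected: {user_path}")
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workspace"
    outside = Path(tmp) / "outside"
    root.mkdir()
    outside.mkdir()
    (root / "notes.txt").write_text("ok", encoding="utf-8")
    os.symlink(outside, root / "escape")  # symlink that leaves the workspace

    accepted = resolve_in_workspace(root, "notes.txt")
    rejected = []
    for attempt in ("../secret", "/etc/passwd", "escape/passwd"):
        try:
            resolve_in_workspace(root, attempt)
        except PathEscapeError:
            rejected.append(attempt)
```

Checking the resolved path against the resolved root, rather than comparing strings, is what catches the symlink case.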
Restricted Python execution with AST allow-list validation:
| Backend | Isolation Level |
|---|---|
| Child process (default) | RLIMIT_CPU, wall-clock timeout, advisory RLIMIT_AS |
| Container | Docker/Podman: --network none, --read-only, --cap-drop=ALL, non-root, tmpfs, CPU/memory/PID limits, streaming output cap, image digest pinning, image allowlist |
Code Mode allows only a fixed set of AST nodes and builtin functions — no imports, no attributes, no arbitrary calls.
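A minimal sketch of AST allow-list validation using the standard `ast` module. The node and builtin sets below are illustrative assumptions; the actual Code Mode allow-list is not reproduced in this document.

```python
import ast

# Assumed allow-list for illustration only.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Lt, ast.Gt, ast.Eq, ast.Call, ast.List, ast.Tuple,
)
ALLOWED_BUILTINS = {"len", "min", "max", "sum", "sorted"}


def validate(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code passes."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ALLOWED_NODES):
            # Import, Attribute, Lambda, and so on are rejected by omission.
            violations.append(f"{type(node).__name__} is not allowed")
        elif isinstance(node, ast.Call):
            func = node.func
            # Only direct calls to allow-listed builtin names pass.
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_BUILTINS):
                violations.append("call target is not an allow-listed builtin")
    return violations


ok = validate("total = sum([1, 2, 3]) + len('ab')")
bad_import = validate("import os")
bad_attribute = validate("x.__class__")
bad_call = validate("open('f')")
```

An allow-list fails closed: any node type that is not named explicitly, including ones added in future Python versions, is rejected.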
Plan-before-Write Enforcement:
- `workspace-write` mode now enforces plan-by-default for safety
- `--require-plan` flag blocks destructive tools without a bound plan artifact
- `--skip-plan-check` provides explicit override for power users
- Implemented in `teaagent/governance/plan_gate.py` with strict defaults
Automated Memory Invalidation:
- Conservative default rules prevent memory corruption:
  - `file_signature_change`: invalidate when files change
  - `test_refactor`: warn when test files are modified
  - `dependency_version_change`: warn on dependency updates
- Per-project customization via `.teaagent/config.json`
- CLI command: `teaagent memory failures auto-invalidate`
- Implemented in `teaagent/memory/failure_card.py` with signature tracking
Centralized Approval Queue:
- Aggregates destructive tool requests from multiple subagents
- Supports batch approval/deny with full lineage tracking
- Prevents approval fatigue in tournament/swarm modes
- Timeout handling and request lifecycle management
- Implemented in `teaagent/subagents/_approval_queue.py`
Governance Fuzz Tests:
- Comprehensive adversarial test suite in `tests/test_governance_fuzz.py`
- Validates plan-before-write enforcement, memory invalidation, and approval queue security
- Tests conservative defaults and path filtering
- Integrated into CI governance gate
OAuth21AuthorizationServer and OAuth21ResourceServer implement the
authorization code grant with PKCE (S256) and optional DPoP proof-of-possession:
- Authorization codes are one-time (consume-and-delete semantics).
- Access tokens are HS256 JWTs with `kid` for key rotation.
- DPoP nonces are consumed on validation (no replay).
- DPoP proof `jti` values have short-lived replay caches.
- `SQLiteOAuthStore` provides durable client/authorization-code/nonce storage with PBKDF2-SHA256 client-secret hashing.
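The PKCE S256 transform and the consume-and-delete rule for authorization codes can be sketched with the standard library. The challenge computation follows RFC 7636; the `AuthorizationCodes` class and its method names are hypothetical and much simpler than a real authorization server.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Dict


def make_code_verifier() -> str:
    # 32 random bytes encode to 43 URL-safe characters (RFC 7636 allows 43-128).
    return secrets.token_urlsafe(32)


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(code_verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)


class AuthorizationCodes:
    """Hypothetical in-memory store with one-time code semantics."""

    def __init__(self) -> None:
        self._challenges: Dict[str, str] = {}

    def issue(self, challenge: str) -> str:
        code = secrets.token_urlsafe(16)
        self._challenges[code] = challenge
        return code

    def redeem(self, code: str, verifier: str) -> bool:
        # pop() consumes the code even when verification fails afterwards.
        challenge = self._challenges.pop(code, None)
        return challenge is not None and verify_pkce(verifier, challenge)


verifier = make_code_verifier()
codes = AuthorizationCodes()
code = codes.issue(s256_challenge(verifier))
first_use = codes.redeem(code, verifier)
second_use = codes.redeem(code, verifier)  # replay of the same code
```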
Two transports share the same handle_mcp_request() dispatch:
- stdio: Standard JSON-RPC over stdin/stdout.
- Streamable HTTP: `POST /mcp` (JSON-RPC), `GET /mcp` (SSE keepalive), `DELETE /mcp` (session teardown), `OPTIONS /mcp` (CORS preflight).
The HTTP server enforces:
- Bearer token or OAuth 2.1 authentication for non-loopback binds.
- Origin allowlist for browser-initiated requests.
- `Mcp-Session-Id` session tracking.
- Body size limits with `413` for oversized payloads.
teaagent.llm provides a unified adapter layer (LLMAdapter) across 13
registered providers in PROVIDER_CONFIGS: claude, gpt, gemini,
openrouter, ollama, vllm, opencodezen, opencodezen-go, mistral,
deepseek, grok, workers-ai, and aigateway. Credential env vars are
unique per provider key except shared CLOUDFLARE_API_TOKEN (workers-ai and
aigateway) and shared OPENCODEZEN_API_KEY (both OpenCodeZen adapters). Each
adapter implements chat() returning an LLMResponse. Features include:
- Configurable exponential-backoff retry (`LLMRetryConfig`).
- Cost budget pre-flight.
- Streaming via `stream=True` and `on_chunk` callbacks.
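The retry behavior can be sketched as exponential backoff with a delay cap and a small jitter. `RetryConfig` and its fields are hypothetical and do not reflect the real `LLMRetryConfig`; the retried exception types are likewise assumptions.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryConfig:  # hypothetical fields, for illustration only
    max_attempts: int = 4
    base_delay: float = 0.5   # seconds before the first retry
    max_delay: float = 8.0    # cap on any single delay
    jitter: float = 0.1       # extra random fraction of each delay


def backoff_delays(config: RetryConfig) -> List[float]:
    """Delay before each retry, doubling every time, without jitter."""
    return [min(config.max_delay, config.base_delay * 2 ** attempt)
            for attempt in range(config.max_attempts - 1)]


def call_with_retry(fn: Callable[[], T],
                    config: RetryConfig = RetryConfig(),
                    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    delays = backoff_delays(config)
    for attempt in range(config.max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == config.max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = delays[attempt]
            sleep(delay + random.uniform(0, config.jitter * delay))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky() -> str:
    """Simulated provider call that times out twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated provider timeout")
    return "ok"


slept: List[float] = []
result = call_with_retry(flaky, sleep=slept.append)  # record delays instead of sleeping
```

Passing `sleep` as a parameter keeps the example fast and makes the delay schedule testable.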
TeaAgent treats ANP as an optional external federation surface through a bidirectional adapter boundary:
- Inbound (`ANP -> TeaAgent`): `ANPGovernedService` normalizes network requests into `AgentRunner` tool execution and must still pass `ToolRegistry`, `ApprovalPolicy`, budget enforcement, and `AuditLogger`.
- Outbound (`TeaAgent -> ANP`): selected tasks can be delegated to ANP peers through a typed client, then mapped back into internal result/audit models.
This keeps core runtime governance stable while enabling cross-organization agent interoperability. See ADR 0007 for scope and invariants.
User / CLI
│
├─ task ───────────────────────────────────────► AgentRunner.run()
│ │
│ ┌─────────────┴──────────────┐
│ │ while iter < budget: │
│ │ decision = decide(ctx) │
│ │ if FinalAnswer → return │
│ │ if ToolRequest: │
│ │ policy.assert_allowed │
│ │ result = reg.execute │
│ │ ctx.observations.add │
│ │ audit.record(every step) │
│ └────────────────────────────┘
│
└─ RunResult ◄──────────── final_answer, iterations, tool_calls, status
| Store | Medium | Locking | Purpose |
|---|---|---|---|
| `AuditLogger` | JSONL | `fcntl.LOCK_EX` + `fsync` | Per-run event log |
| `MemoryCatalog` | JSONL | `fcntl.LOCK_EX` + `fsync` | Workspace observations |
| `RunStore` | JSONL | `atomic_write_text` (lock + replace) | Run history and replay |
| `UltraworkStore` | JSONL | `atomic_write_text` | Worker lifecycle records |
| `SQLiteOAuthStore` | SQLite | WAL + `BEGIN IMMEDIATE` | OAuth clients/codes/nonces |
| `ContextBus` | SQLite | WAL; per-thread connections | Cross-agent Delta cards |
| `FederatedGraphSync` | JSON | none (single-writer file) | Graph sync state + exports |
JSONL rows assume a single writer per workspace on a local or advisory-lock-safe filesystem. NFS multi-writer shared roots are unsupported — see ADR 0008.
All state is externalized to the filesystem. In-memory runner state is temporary only — every meaningful event persists to disk before the caller sees the result.
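The write-then-replace pattern behind an atomic text write can be sketched as follows. This is not the project's `atomic_write_text`; the function name below is hypothetical, and the locking step mentioned in the table is omitted.

```python
import os
import tempfile


def atomic_write_text_sketch(path: str, text: str) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # data reaches disk before the rename
        os.replace(tmp_path, path)     # atomic within one filesystem on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)        # never leave a stray temp file behind
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "state.json")
    atomic_write_text_sketch(target, '{"status": "running"}')
    atomic_write_text_sketch(target, '{"status": "done"}')
    with open(target, encoding="utf-8") as handle:
        final_text = handle.read()
    leftovers = os.listdir(tmp)
```

Readers see either the old file or the new one, never a partial write, which is why the temp file must live on the same filesystem as the target.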
- New tool: register through `ToolRegistry` with schemas and annotations.
- New LLM provider: implement `LLMAdapter.chat()` returning `LLMResponse`.
- New OAuth store: implement `OAuthStore` protocol (SQL, Redis, …).
- New Code Mode backend: implement `CodeModeBackend` protocol.
- New audit sink: call `audit.add_sink(callback)` with any `AuditEvent → None` callable.
- New MCP transport: call `handle_mcp_request(registry, payload)`.
- New hook: register through `HookRegistry` with 8-event lifecycle.
- New plugin: add to `.teaagent/plugins/` with `plugin.json`.
TeaAgent implements a Claude Code-compatible 8-event hook system:
| Event | Trigger | Use Case |
|---|---|---|
| `SessionStart` | Before session begins | Initialize context, load configs |
| `UserPromptSubmit` | After user message | Log prompts, analyze intent |
| `PreToolUse` | Before tool execution | Permission checks, input validation |
| `PostToolUse` | After tool execution | Lint checks, test runs |
| `PreCompact` | Before context compression | Prepare for compaction |
| `Stop` | Before session stops | Save state, cleanup |
| `SubagentStop` | After subagent completes | Aggregate results |
| `SessionEnd` | After session ends | Finalize audit, memory flush |
Built-in hooks:
- `permission_check_hook` - Enforce Allow/Ask/Deny patterns
- `lint_check_hook` - Run linter after file modifications
- `run_tests_hook` - Run tests after code changes
- `mcp_tool_filter_hook` - Filter MCP tools by allow/block lists
Claude Code compatible memory hierarchy:
| Tier | Location | Git-tracked | Use Case |
|---|---|---|---|
| Project | `.teaagent/memory.jsonl` | Yes | Team-shared context |
| Personal | `~/.config/teaagent/memory.jsonl` | No | User-specific notes |
| Auto-Memory | `.claude/MEMORY.md` | No | Persistent learnings |
```python
from teaagent.memory import MemoryHierarchy

mem = MemoryHierarchy(root="/path/to/project")
mem.project.add("Found a bug in auth module", tags=("bug", "auth"))
mem.personal.add("User prefers dark mode", tags=("preference",))

# Search across all tiers
results = mem.search_all("bug", limit=10)
```

Four extension points (Claude Code compatible):
- Slash commands that add CLI functionality.
- Custom subagents with specialized prompts and tool subsets.
- Lifecycle event handlers (see Hook System above).
- External tool integrations.
Discovery order (first match wins):
- Project: `<workspace>/.teaagent/plugins/`
- User: `~/.config/teaagent/plugins/`
- Built-in: `teaagent/plugins/builtin/`
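The first-match-wins order can be sketched as a simple ordered search. The three directories come from the list above; the layout of one directory per plugin containing a `plugin.json`, and the function names, are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def plugin_search_paths(workspace: Path, home: Path, package_root: Path) -> List[Path]:
    """Project first, then user, then built-in."""
    return [
        workspace / ".teaagent" / "plugins",
        home / ".config" / "teaagent" / "plugins",
        package_root / "teaagent" / "plugins" / "builtin",
    ]


def find_plugin(name: str, search_paths: List[Path]) -> Optional[Path]:
    """Return the first directory that holds <name>/plugin.json, or None."""
    for base in search_paths:
        manifest = base / name / "plugin.json"
        if manifest.is_file():
            return manifest.parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    paths = plugin_search_paths(base / "ws", base / "home", base / "pkg")
    # The same plugin exists at user and built-in level; the user copy should win.
    for tier in paths[1:]:
        (tier / "formatter").mkdir(parents=True)
        (tier / "formatter" / "plugin.json").write_text("{}", encoding="utf-8")
    found = find_plugin("formatter", paths)
    found_tier = paths.index(found.parent)
    missing = find_plugin("unknown", paths)
```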
Automatic context compression at threshold levels (Claude Code traffic light):
| Level | Token Usage | Behavior |
|---|---|---|
| Green | 0-75% | Normal operation |
| Yellow | 75-92% | User hints for session save |
| Red | 92%+ | Auto-triggered compaction |
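The thresholds in the table map token usage to a level as sketched below. The function name is hypothetical, and assigning exactly 75% to Yellow is an assumption, since the table leaves that boundary open.

```python
def compaction_level(tokens_used: int, context_window: int) -> str:
    """Classify context usage as green, yellow, or red using integer arithmetic."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must not be negative")
    # Compare tokens_used / context_window against 92% and 75% without floats.
    if tokens_used * 100 >= 92 * context_window:
        return "red"      # auto-triggered compaction
    if tokens_used * 100 >= 75 * context_window:
        return "yellow"   # hint the user to save the session
    return "green"        # normal operation


levels = [compaction_level(used, 200_000)
          for used in (100_000, 150_000, 180_000, 184_000)]
```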
```python
from teaagent.context import CompactionManager

manager = CompactionManager()
# `context` stands for the caller's current conversation context
if manager.should_compact(token_count=180000):
    result = manager.check_and_compact(context, 180000)
    # Tokens saved: result.tokens_saved
```

Read-only exploration mode for safe codebase analysis:
```python
from teaagent.plan_mode import PlanMode, PlanModeState

plan = PlanMode()
plan.enable("Analyzing unfamiliar codebase")
# Tools blocked in plan mode:
# - workspace_write_file
# - workspace_apply_patch
# - shell
plan.add_note("Found authentication module at line 42")
plan.disable()
```

IDE integration for VS Code, Zed, and JetBrains:
CLI/TUI → ACPServer → JSON-RPC over stdio → IDE
Methods:
- `initialize` - Handshake and capability negotiation
- `tools/list` - List available tools
- `tools/call` - Execute a tool
- `completion` - Request agent completion
- `tools/cancel` - Cancel running tool
TeaAgent is a governance-first agent harness. This section documents how it compares to the four mainstream coding-agent frameworks surveyed in scripts/refresh_agent_readme_survey.md, last refreshed 2026-05-24.
| Capability | TeaAgent | Claude Code | Codex | OpenCode |
|---|---|---|---|---|
| Terminal-first CLI/TUI | ✅ | ✅ `claude` CLI | ✅ `codex` TUI | ✅ `opencode` TUI |
| Multi-provider LLM | ✅ 13 providers | ❌ Anthropic only | ❌ OpenAI only | ✅ 11+ providers |
| Permission modes | ✅ 5 modes | ✅ deny/ask/allow | ✅ 4 sandbox policies | ✅ 3 approval modes |
| Hook system (8 events) | ✅ Claude Code compatible | ✅ 8+ events | ✅ 7+ events | — |
| MCP server / client | ✅ stdio + HTTP | ✅ | ✅ codex-mcp-server | ✅ stdio + SSE |
| Skills / plugins | ✅ 4 extension points | ✅ Plugin marketplace | ✅ Skills + Plugins | — |
| Sub-agent isolation | ✅ 3 modes (shared/worktree/container) | ✅ 2 modes (shared/worktree) | ✅ Thread manager | — |
| Context compaction | ✅ 75-92% traffic light | ✅ 98% auto | ✅ History compaction | ✅ Auto-compact |
| Three-tier memory | ✅ Project/Personal/Auto | ✅ CLAUDE.md files | ✅ 2-phase extraction | — |
| Audit / governance | ✅ JSONL hash chain | — | — | — |
| Undo / rollback | ✅ Undo journal | — | — | ✅ File history |
| Read-before-write mtime guard | ✅ since v0.1.0 | — | ✅ (via Edit tool) | ✅ mtime check |
| Protected paths (.git/.teaagent) | ✅ default deny rules | — | ✅ .git/.codex/.agents | — |
| IDE integration | ✅ ACP + VS Code ext | ✅ VS Code ext | ✅ App Server (Zed/VS Code) | — |
| Session resume | ✅ RunStore JSONL | ✅ rollout files | ✅ rollout + fork | ✅ SQLite |
| OAuth / security | ✅ OAuth 2.1 + DPoP | — | ✅ OAuth for MCP | — |
| Telemetry | ✅ OTEL spans + metrics | ✅ OTEL spans | ✅ OTEL | — |
| Cloud / background tasks | ✅ Ultrawork + BackgroundRun | ✅ background sessions | ✅ Cloud Tasks | — |
| Acceptance test coverage | ✅ See acceptance.md (pytest-collected AT, P0/P1/P2 tiers) | — | ✅ test suite | — |
| Declarative agent definitions | ✅ YAML/JSON/Markdown | ✅ .claude/agents/*.md | ✅ config.toml | — |
- Governance-first: Every tool call, decision, and error is recorded in an append-only JSONL audit log with hash-chain integrity. None of the frameworks in the comparison table lists an equivalent audit trail.
- Multi-protocol surface: MCP + ACP + A2A + ANP, more integration protocols than any single mainstream framework.
- Cross-provider: 13 LLM providers vs. 1 for each vendor-specific framework (Claude Code, Codex).
- Policy-as-code: Declarative deny rules in `policy.yaml` that cannot be bypassed, even in `danger-full-access` mode.
- Built-in undo: Automatic pre-write snapshots with a user-facing `teaagent agent undo` command.
These gaps identified in the 2026-05-24 competitive analysis have been closed:
- LSP integration → Code analysis tools (`code_definition`, `code_references`, `code_diagnostics`, `code_symbols`, `code_tree_sitter_relations`) registered when `code_analysis_enabled: true`.
- Read-before-write mtime guard → `workspace_write_file` accepts `expected_mtime` and rejects overwrites on concurrent modification.
- Protected paths → `.git/*` and `.teaagent/*` are blocked by default through built-in `FilePolicy` deny rules.
- Declarative sub-agent definitions → YAML/JSON/Markdown frontmatter files in `.teaagent/subagents/` with `isolation`, `background`, `disallowed_tools`, and `effort` fields.