Disclosing affiliation upfront: I work on AgentGuard at GoPlus Security (https://github.com/GoPlusSecurity/agentguard, MIT, @goplus/agentguard). I'm filing this because AgentKit agents hold real funds, and I'd like to contribute a guarantee-level safety layer and do the work myself.
The problem. Today the model is the only thing standing between a prompt-injected agent and a signed transaction. The failure mode is specific to AgentKit's shape: an agent ingests untrusted content (a web page, a Farcaster cast, a tool output, an XMTP message) containing injected instructions, and the next action is nativeTransfer or sendTransaction to an attacker address. Text-layer safety in the LLM can't reliably catch this because the malicious step looks like a perfectly well-formed wallet action. An opt-in safety action (a tool the agent may call) doesn't close it either: an agent already following injected instructions won't volunteer to scan itself.
Concrete failure modes:
Injected instructions in fetched/social content pivot directly into a transfer or approval to an attacker address.
A tool call or agent output carries the wallet's private key or a CDP API secret in its arguments.
An unlimited ERC-20 approval, or a swap into a honeypot token the agent never thought to look up.
Data the agent has touched gets routed out through an outbound action (post, message, webhook).
Proposed feature. A GuardedWalletProvider in typescript/agentkit/src/wallet-providers/: it wraps any provider extending EvmWalletProvider (CDP, Privy, viem) and intercepts the signing/sending paths: sendTransaction, signTransaction, signMessage, signTypedData. Each intercepted call runs through AgentGuard's decision engine, including web3 transaction simulation, and returns allow | block | require_user_confirm with a machine-readable reason the agent loop can surface. Because it wraps the wallet interface rather than adding a tool, it's deterministic: the model can't decline to invoke it.
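To make the wrapper shape concrete, here is a minimal sketch. It is not the proposed implementation: `MinimalEvmWallet`, `GuardedWallet`, `DecisionEngine`, and the toy denylist engine are all hypothetical stand-ins rather than AgentKit or AgentGuard APIs, and only two of the four intercepted paths are shown.

```typescript
// Hypothetical sketch: none of these names are AgentKit or AgentGuard APIs.
interface TxRequest {
  to: string; // recipient address, hex
  data?: string; // calldata, hex
}

// Stand-in for the signing/sending surface of an EVM wallet provider.
interface MinimalEvmWallet {
  sendTransaction(tx: TxRequest): Promise<string>;
  signMessage(message: string): Promise<string>;
}

type Decision =
  | { verdict: "allow" }
  | { verdict: "block"; reason: string }
  | { verdict: "require_user_confirm"; reason: string };

type GuardedAction =
  | { kind: "sendTransaction"; tx: TxRequest }
  | { kind: "signMessage"; message: string };

// Stand-in for the decision engine: a local, synchronous policy function.
type DecisionEngine = (action: GuardedAction) => Decision;

class GuardedWallet implements MinimalEvmWallet {
  private readonly inner: MinimalEvmWallet;
  private readonly decide: DecisionEngine;

  constructor(inner: MinimalEvmWallet, decide: DecisionEngine) {
    this.inner = inner;
    this.decide = decide;
  }

  // Every call passes through the engine before reaching the wrapped wallet.
  private check(action: GuardedAction): void {
    const decision = this.decide(action);
    if (decision.verdict !== "allow") {
      // Verdict prefix plus reason, so an agent loop can surface it.
      throw new Error(`${decision.verdict}: ${decision.reason}`);
    }
  }

  async sendTransaction(tx: TxRequest): Promise<string> {
    this.check({ kind: "sendTransaction", tx });
    return this.inner.sendTransaction(tx);
  }

  async signMessage(message: string): Promise<string> {
    this.check({ kind: "signMessage", message });
    return this.inner.signMessage(message);
  }
}

// Toy engine for the demo: block one known-bad recipient, allow the rest.
const ATTACKER = "0x000000000000000000000000000000000000dead";
const toyEngine: DecisionEngine = (action) =>
  action.kind === "sendTransaction" && action.tx.to.toLowerCase() === ATTACKER
    ? { verdict: "block", reason: "recipient on local denylist" }
    : { verdict: "allow" };

// Fake wallet that records what actually reached the signing layer.
const sent: TxRequest[] = [];
const fakeWallet: MinimalEvmWallet = {
  async sendTransaction(tx) {
    sent.push(tx);
    return "0xhash";
  },
  async signMessage() {
    return "0xsig";
  },
};

const guarded = new GuardedWallet(fakeWallet, toyEngine);
```

The agent framework would receive `guarded` in place of the raw wallet, so a transfer to `ATTACKER` throws before `fakeWallet.sendTransaction` is ever called. A real wrapper would also need to map `require_user_confirm` to a confirmation flow instead of a plain error.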
Properties that matter for this repo: it runs locally and deterministically, with no API key, no hosted dependency, and no per-call vendor cost. The decision engine averages ~0.13 ms per call on an open 84-sample benchmark (https://github.com/GoPlusSecurity/agentguard), so it's viable in the critical path of every wallet call, where an LLM-judge round-trip would not be. It scans against 24 detection rules across 10 threat categories. Eight of those rules are web3-specific and map directly onto wallet-action risk: WALLET_DRAINING, UNLIMITED_APPROVAL, HIDDEN_TRANSFER, SIGNATURE_REPLAY, DANGEROUS_SELFDESTRUCT, PROXY_UPGRADE, REENTRANCY_PATTERN, FLASH_LOAN_RISK. For a GuardedWalletProvider the first four are the core checks: drain-shaped transfers and approvals on sendTransaction, unbounded ERC-20 approvals, transfers hidden inside unrelated calldata, and replay-prone payloads on signMessage/signTypedData. The remaining four matter when an agent deploys or interacts with contracts. Optionally, the wrapper can consult GoPlus address/token threat intelligence for known-malicious counterparties. That lookup is strictly opt-in; the default path stays local and deterministic with no network dependency.
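As an illustration of why a check like UNLIMITED_APPROVAL can run locally with no network call, here is a rough sketch of detecting an unbounded ERC-20 approval from raw calldata. This is not AgentGuard's rule implementation: `inspectApproval` and `ApprovalFinding` are hypothetical names, and treating only the exact max-uint256 amount as "unlimited" is a simplifying assumption. The one fixed fact it relies on is that `approve(address,uint256)` has the 4-byte selector `0x095ea7b3`.

```typescript
// approve(address,uint256) has the 4-byte selector 0x095ea7b3.
const APPROVE_SELECTOR = "095ea7b3";
// A 32-byte ABI word of all ones, i.e. max uint256, as 64 hex characters.
const MAX_UINT256_WORD = "f".repeat(64);

// Hypothetical result shape for this sketch.
interface ApprovalFinding {
  spender: string;
  unlimited: boolean;
}

// Returns null when the calldata is not a well-formed ERC-20 approve call.
function inspectApproval(calldata: string): ApprovalFinding | null {
  const hex = calldata.toLowerCase().replace(/^0x/, "");
  // 4-byte selector + two 32-byte ABI words = 8 + 64 + 64 hex characters.
  if (hex.length !== 136 || !/^[0-9a-f]+$/.test(hex)) return null;
  if (hex.slice(0, 8) !== APPROVE_SELECTOR) return null;

  const spenderWord = hex.slice(8, 72);
  const amountWord = hex.slice(72, 136);
  return {
    // The address is the low 20 bytes of its left-padded 32-byte word.
    spender: "0x" + spenderWord.slice(24),
    // Assumption: only the exact max-uint256 amount counts as unlimited.
    unlimited: amountWord === MAX_UINT256_WORD,
  };
}
```

A production rule would likely use a threshold rather than an exact match, since very large but not maximal allowances carry the same risk.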
The same decision engine already runs in production behind two other agent platforms on a shared adapter abstraction: Claude Code (PreToolUse/PostToolUse hooks) and OpenClaw (before_tool_call/after_tool_call). An AgentKit adapter is the third instance of an existing pattern, not a new design.
Relationship to existing work. This is complementary to #1258's tokenSafety action: that answers "is this token safe?" when the agent asks; this layer guarantees every wallet action gets checked whether or not the agent asks. It also pairs naturally with the spend-permissions item on the WISHLIST: spend permissions cap how much an agent can move; this checks where and why it's moving.
Scope I'm proposing. TypeScript first: one wrapper class + unit tests + a docs page + an example under typescript/examples/ showing a blocked injection attempt on base-sepolia. Python port as a follow-up PR if there's appetite. No changes to existing wallet providers, action providers, or core.
The one question I need answered: is a wrapper WalletProvider the shape you'd accept, or would you rather see this as framework-extension-level middleware? I'll bring a runnable demo and the PR to whichever answer fits.
Alternatives
An agentguard ActionProvider (a scan tool the agent can call). Rejected as the primary shape: opt-in safety fails exactly when it's needed, because a prompt-injected agent won't call its own scanner. Could still be a useful follow-up for agent-invoked scans (skill scanning, registry lookups), but it can't be the guarantee layer.
Per-action safety lookups (the tokenSafety approach from #1258, "feat(typescript): add tokenSafety action provider"). Valuable, but coverage depends on the agent choosing to ask, and it covers tokens, not transfers/approvals/signatures generally. Complementary rather than alternative.
LLM-judge on each wallet call. Adds hundreds of ms and a second model dependency to every transaction; a deterministic local engine at ~0.13 ms avoids both.
Spend permissions / session keys (WISHLIST). Caps amount at risk but is content-blind: a capped transfer to an attacker address still goes through. Works best combined with this layer.
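The "content-blind" point about spend caps can be shown with a toy example. Everything here is hypothetical and for illustration only: `checkCap`, `checkDestination`, and `checkBoth` are not AgentKit, spend-permission, or AgentGuard APIs, and amounts are plain numbers in whole token units to keep the arithmetic simple.

```typescript
// Hypothetical policy pieces for illustration; not real AgentKit/AgentGuard APIs.
interface Transfer {
  to: string; // recipient address
  amount: number; // whole token units, to keep the toy arithmetic simple
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// Amount-only policy: content-blind, as a spend cap would be.
function checkCap(t: Transfer, cap: number): Verdict {
  return t.amount <= cap ? { allowed: true } : { allowed: false, reason: "over_cap" };
}

// Destination policy: looks at where the funds go, not how much.
function checkDestination(t: Transfer, denylist: ReadonlySet<string>): Verdict {
  return denylist.has(t.to.toLowerCase())
    ? { allowed: false, reason: "denylisted_recipient" }
    : { allowed: true };
}

// Combined: a transfer must pass both checks.
function checkBoth(t: Transfer, cap: number, denylist: ReadonlySet<string>): Verdict {
  const capVerdict = checkCap(t, cap);
  return capVerdict.allowed ? checkDestination(t, denylist) : capVerdict;
}

const demoDenylist = new Set(["0x000000000000000000000000000000000000dead"]);
const smallTransferToAttacker: Transfer = {
  to: "0x000000000000000000000000000000000000dEaD",
  amount: 5,
};
```

With a cap of 10, `checkCap(smallTransferToAttacker, 10)` allows the transfer, while `checkBoth(smallTransferToAttacker, 10, demoDenylist)` rejects it on destination. The cap still limits the damage if the destination check misses, which is why the two work best together.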
Additional context
Verified against evmWalletProvider.ts on main: the abstract base class exposes all the methods the wrapper needs (sign, signMessage, signTypedData, signTransaction, sendTransaction, waitForTransactionReceipt), so this requires zero core changes.
AgentGuard is MIT-licensed, npm @goplus/agentguard. Benchmark methodology and corpus are public and reproducible in the repo.
I can have a runnable demo (LangChain + AgentKit chatbot on base-sepolia, poisoned tool output triggering a blocked transfer) attached to the eventual PR.