Proposal: pre-execution guard layer for wallet actions via a wrapper WalletProvider (prompt injection → transaction, key leak, malicious address) #1282

Description

@theyavuzarslan

Language Implementation

  • Python
  • TypeScript

Feature Type

  • Action Provider Template
  • Wallet Provider Template
  • Framework Extension
  • Core Requirements
  • Other

🚀 The feature, motivation and pitch

Disclosing affiliation upfront: I work on AgentGuard at GoPlus Security (https://github.com/GoPlusSecurity/agentguard, MIT, @goplus/agentguard). I'm filing this because AgentKit agents hold real funds, and I'd like to contribute a guarantee-level safety layer and do the work myself.

The problem. Today the model is the only thing standing between a prompt-injected agent and a signed transaction. The failure mode is specific to AgentKit's shape: an agent ingests untrusted content (a web page, a Farcaster cast, a tool output, an XMTP message) containing injected instructions, and the next action is nativeTransfer or sendTransaction to an attacker address. Text-layer safety in the LLM can't reliably catch this because the malicious step looks like a perfectly well-formed wallet action. An opt-in safety action (a tool the agent may call) doesn't close it either: an agent already following injected instructions won't volunteer to scan itself.

Concrete failure modes:

  • Injected instructions in fetched/social content pivot directly into a transfer or approval to an attacker address.
  • A tool call or agent output carries the wallet's private key or a CDP API secret in its arguments.
  • An unlimited ERC-20 approval, or a swap into a honeypot token the agent never thought to look up.
  • Data the agent has touched gets routed out through an outbound action (post, message, webhook).
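To make the unlimited-approval case concrete, here is a minimal sketch of how a pre-execution check could recognise it from raw calldata. It relies only on the standard ERC-20 ABI encoding of approve(address,uint256), whose 4-byte selector is 0x095ea7b3. The names decodeApproval and ApprovalInfo are hypothetical and are not part of AgentKit or AgentGuard, and the sketch only flags the exact maximum uint256 value; a real rule would probably also treat other very large allowances as unbounded.

```typescript
// ERC-20 approve(address,uint256) is identified by this 4-byte selector.
const APPROVE_SELECTOR = "0x095ea7b3";
// uint256 max, the value conventionally used for "unlimited" approvals.
const MAX_UINT256_HEX = "f".repeat(64);

// Hypothetical result shape, for illustration only.
interface ApprovalInfo {
  spender: string;    // 20-byte address, lowercase hex
  amountHex: string;  // 32-byte amount word, lowercase hex without 0x
  unlimited: boolean; // true when the amount is exactly uint256 max
}

// Returns decoded approval details, or null when the calldata is not a
// well-formed approve(address,uint256) call.
function decodeApproval(data: string): ApprovalInfo | null {
  const hex = data.toLowerCase();
  // "0x" (2) + selector (8) + spender word (64) + amount word (64) = 138 chars
  if (hex.length !== 138 || !/^0x[0-9a-f]+$/.test(hex)) return null;
  if (hex.slice(0, 10) !== APPROVE_SELECTOR) return null;
  const spender = "0x" + hex.slice(34, 74); // last 20 bytes of the first word
  const amountHex = hex.slice(74, 138);
  return { spender, amountHex, unlimited: amountHex === MAX_UINT256_HEX };
}
```

A guard could run this on the data field of every outgoing transaction and escalate whenever unlimited is true, regardless of what the model believed it was doing.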

Proposed feature. A GuardedWalletProvider in typescript/agentkit/src/wallet-providers/: it wraps any provider extending EvmWalletProvider (CDP, Privy, viem) and intercepts the signing/sending paths: sendTransaction, signTransaction, signMessage, signTypedData. Each intercepted call runs through AgentGuard's decision engine, including web3 transaction simulation, and returns allow | block | require_user_confirm with a machine-readable reason the agent loop can surface. Because it wraps the wallet interface rather than adding a tool, it's deterministic: the model can't decline to invoke it.

const wallet = await CdpWalletProvider.configureWithWallet(config);
const guarded = new GuardedWalletProvider(wallet, {
  mode: "block",            // or "confirm" | "log"
  maxValuePerTx: "0.1",     // optional policy knobs
});
const agentkit = await AgentKit.from({ walletProvider: guarded, actionProviders });
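For readers who want to see the wrapping idea in isolation, the following is a rough, self-contained sketch of the pattern. It is not the proposed implementation: MinimalWallet, TxRequest, Decision, Evaluate, GuardError and GuardedWallet are all hypothetical stand-ins defined here, and the real class would extend AgentKit's EvmWalletProvider and call the AgentGuard engine, neither of which is modelled. Only sendTransaction is shown; the other signing paths would be intercepted the same way.

```typescript
// Hypothetical, simplified shapes; NOT the AgentKit or AgentGuard APIs.
interface TxRequest {
  to: string;
  data?: string;
}

type Decision =
  | { verdict: "allow" }
  | { verdict: "block" | "require_user_confirm"; reason: string };

interface MinimalWallet {
  sendTransaction(tx: TxRequest): Promise<string>; // resolves to a tx hash
}

// Stand-in for the decision engine: a pure function of the request.
type Evaluate = (tx: TxRequest) => Decision;

// Carries the machine-readable reason back to the agent loop.
class GuardError extends Error {
  readonly verdict: string;
  readonly reason: string;
  constructor(verdict: string, reason: string) {
    super("wallet action " + verdict + ": " + reason);
    this.verdict = verdict;
    this.reason = reason;
  }
}

class GuardedWallet implements MinimalWallet {
  private readonly inner: MinimalWallet;
  private readonly evaluate: Evaluate;

  constructor(inner: MinimalWallet, evaluate: Evaluate) {
    this.inner = inner;
    this.evaluate = evaluate;
  }

  async sendTransaction(tx: TxRequest): Promise<string> {
    // The check runs on every call; the model has no way to skip it.
    const decision = this.evaluate(tx);
    if (decision.verdict !== "allow") {
      throw new GuardError(decision.verdict, decision.reason);
    }
    // Only allowed requests ever reach the wrapped wallet.
    return this.inner.sendTransaction(tx);
  }
}
```

The point of the design is visible in sendTransaction: the wrapped wallet is only reachable through the check, so a blocked request never gets signed even if the agent is fully under the control of injected instructions.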

Properties that matter for this repo: it runs locally and deterministically, with no API key, no hosted dependency, and no per-call vendor cost. The decision engine averages ~0.13 ms per call on an open 84-sample benchmark (https://github.com/GoPlusSecurity/agentguard), so it's viable in the critical path of every wallet call, where an LLM-judge round-trip would not be. It scans against 24 detection rules across 10 threat categories. Eight of those rules are web3-specific and map directly onto wallet-action risk: WALLET_DRAINING, UNLIMITED_APPROVAL, HIDDEN_TRANSFER, SIGNATURE_REPLAY, DANGEROUS_SELFDESTRUCT, PROXY_UPGRADE, REENTRANCY_PATTERN, FLASH_LOAN_RISK. For a GuardedWalletProvider the first four are the core checks: drain-shaped transfers and approvals on sendTransaction, unbounded ERC-20 approvals, transfers hidden inside unrelated calldata, and replay-prone payloads on signMessage/signTypedData. The remaining four matter when an agent deploys or interacts with contracts. Optionally, the wrapper can consult GoPlus address/token threat intelligence for known-malicious counterparties. That lookup is strictly opt-in; the default path stays local and deterministic with no network dependency.
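As an illustration of how the four core rules could attach to the intercepted methods and collapse into a single verdict, consider the sketch below. The rule names and the three verdicts come from this proposal; everything else is an assumption made for the example: the RULES_BY_METHOD table, the Finding shape, the combine function, the choice to give signTransaction the same rules as sendTransaction, and the "most severe verdict wins" ordering. None of it is AgentGuard's actual API.

```typescript
type Method = "sendTransaction" | "signTransaction" | "signMessage" | "signTypedData";
type Verdict = "allow" | "require_user_confirm" | "block";

// Assumed mapping of the four core rules onto the intercepted methods.
// signTransaction is assumed to share the sendTransaction rules.
const RULES_BY_METHOD: Record<Method, string[]> = {
  sendTransaction: ["WALLET_DRAINING", "UNLIMITED_APPROVAL", "HIDDEN_TRANSFER"],
  signTransaction: ["WALLET_DRAINING", "UNLIMITED_APPROVAL", "HIDDEN_TRANSFER"],
  signMessage: ["SIGNATURE_REPLAY"],
  signTypedData: ["SIGNATURE_REPLAY"],
};

// Hypothetical per-rule result.
interface Finding {
  rule: string;
  verdict: Verdict;
}

// Assumed ordering: block outranks require_user_confirm, which outranks allow.
const SEVERITY: Record<Verdict, number> = {
  allow: 0,
  require_user_confirm: 1,
  block: 2,
};

// Collapses per-rule findings into one verdict plus machine-readable reasons.
function combine(findings: Finding[]): { verdict: Verdict; reasons: string[] } {
  let worst: Verdict = "allow";
  for (const f of findings) {
    if (SEVERITY[f.verdict] > SEVERITY[worst]) worst = f.verdict;
  }
  const reasons = findings.filter((f) => f.verdict !== "allow").map((f) => f.rule);
  return { verdict: worst, reasons };
}
```

Because every step is a table lookup or a comparison, the outcome for a given request is the same on every run, which is the property that distinguishes this layer from an LLM judge.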

The same decision engine already runs in production behind two other agent platforms on a shared adapter abstraction: Claude Code (PreToolUse/PostToolUse hooks) and OpenClaw (before_tool_call/after_tool_call). An AgentKit adapter is the third instance of an existing pattern, not a new design.

Relationship to existing work. This is complementary to #1258's tokenSafety action: that answers "is this token safe?" when the agent asks; this layer guarantees every wallet action gets checked whether or not the agent asks. It also pairs naturally with the spend-permissions item on the WISHLIST: spend permissions cap how much an agent can move; this checks where and why it's moving.

Scope I'm proposing. TypeScript first: one wrapper class + unit tests + a docs page + an example under typescript/examples/ showing a blocked injection attempt on base-sepolia. Python port as a follow-up PR if there's appetite. No changes to existing wallet providers, action providers, or core.

The one question I need answered: is a wrapper WalletProvider the shape you'd accept, or would you rather see this as framework-extension-level middleware? I'll bring a runnable demo and the PR to whichever answer fits.

Alternatives

  1. An agentguard ActionProvider (a scan tool the agent can call). Rejected as the primary shape: opt-in safety fails exactly when it's needed, because a prompt-injected agent won't call its own scanner. Could still be a useful follow-up for agent-invoked scans (skill scanning, registry lookups), but it can't be the guarantee layer.
  2. Per-action safety lookups (the tokenSafety approach from #1258, "feat(typescript): add tokenSafety action provider"). Valuable, but coverage depends on the agent choosing to ask, and it covers tokens, not transfers/approvals/signatures generally. Complementary rather than alternative.
  3. LLM-judge on each wallet call. Adds hundreds of ms and a second model dependency to every transaction; a deterministic local engine at ~0.13 ms avoids both.
  4. Spend permissions / session keys (WISHLIST). Caps amount at risk but is content-blind: a capped transfer to an attacker address still goes through. Works best combined with this layer.
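The gap described in alternative 4 can be shown with a small sketch that compares a cap-only check against a cap plus a destination check. All names here (Transfer, etherToWei, capOnly, capAndDestination) are hypothetical, and the flagged-address set stands in for whatever destination intelligence the guard is configured with; the cap is written as a decimal ether string to mirror the maxValuePerTx knob shown earlier.

```typescript
// Hypothetical, simplified transfer shape for illustration.
interface Transfer {
  to: string;
  valueWei: bigint;
}

// Converts a decimal ether string such as "0.1" into wei (18 decimals).
function etherToWei(amount: string): bigint {
  const m = /^(\d+)(?:\.(\d{1,18}))?$/.exec(amount);
  if (!m) throw new Error("invalid ether amount: " + amount);
  const frac = (m[2] ?? "").padEnd(18, "0");
  return BigInt(m[1] + frac);
}

// A spend cap on its own: blind to where the funds are going.
function capOnly(tx: Transfer, maxValuePerTx: string): "allow" | "block" {
  return tx.valueWei <= etherToWei(maxValuePerTx) ? "allow" : "block";
}

// The cap combined with a destination check.
function capAndDestination(
  tx: Transfer,
  maxValuePerTx: string,
  flagged: ReadonlySet<string>,
): "allow" | "block" {
  if (flagged.has(tx.to.toLowerCase())) return "block";
  return capOnly(tx, maxValuePerTx);
}
```

With a cap of "0.1", a 0.05 ether transfer to a flagged address passes capOnly but is blocked by capAndDestination, which is why the two mechanisms are described as complementary rather than as substitutes.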

Additional context

  • Verified against evmWalletProvider.ts on main: the abstract base class exposes all the methods the wrapper needs (sign, signMessage, signTypedData, signTransaction, sendTransaction, waitForTransactionReceipt), so this requires zero core changes.
  • AgentGuard is MIT-licensed, npm @goplus/agentguard. Benchmark methodology and corpus are public and reproducible in the repo.
  • I can have a runnable demo (LangChain + AgentKit chatbot on base-sepolia, poisoned tool output triggering a blocked transfer) attached to the eventual PR.
