Revise WolframLanguageEvaluator and context tool descriptions by mtirard · Pull Request #189 · WolframResearch/AgentTools

mtirard · 2026-05-28T09:44:32Z

Motivation

This PR follows recent mailing list feedback that agents in Claude Desktop reach for wolframscript via the shell rather than the WolframLanguageEvaluator tool when computing in Wolfram Language. The MCP tool's description doesn't communicate what makes it preferable to a shell-spawned wolframscript invocation: a persistent kernel that avoids per-call startup cost.

Reviewing the four affected tool descriptions, several coupled issues compound that core gap:

1. WolframLanguageEvaluator's key advantage (a persistent kernel that avoids wolframscript's per-call startup cost) isn't stated in the description. The current text describes the mechanism ("Evaluates Wolfram Language code... in a Wolfram Language kernel") but not what makes it preferable.
2. The cross-tool nag "Always use the Wolfram context tool before using this tool" is unconditional. Unconditional "always" directives lose their force when applied to situations they don't fit, and dilute the credibility of other directives in the same prompt. The intent ("look things up first") is valuable; the unconditional mandate undermines it.
3. The three context tools have nearly identical descriptions with an "Always use at the start of new conversations" mandate that doesn't condition on situation. An agent gets effectively the same signal from all three and can't disambiguate.
4. WolframLanguageEvaluator's "read access to local files" understates the default Method -> "Session" capability, already flagged by the TODO at Kernel/Tools/WolframLanguageEvaluator.wl:33.

These issues are coupled: addressing (2) requires the context tools to self-promote properly, which is what (3) is about. This PR addresses them together.

Design principles

Four cross-cutting principles guided the redesign. These are arguments, not measurements — happy to discuss any of them:

Positive framing. Rules of the form "do not X" require the agent to recognize and suppress X, which is fragile. Rules of the form "do Y, because Z" give a positive directive plus reasoning the agent can apply contextually. Throughout the new descriptions, directives are positive, with reasoning anchored in why they hold.
No cross-tool mandates. Tool descriptions should sell their own tool, not other tools. Cross-tool coupling (Tool A's description pointing the agent at Tool B first) doesn't scale and undermines its own credibility when applied unconditionally. The "look things up first" intent is preserved, but relocated to where it belongs.
Situational triggers over blanket mandates. Concrete situational conditions ("when verifying documented behavior", "when code isn't behaving as expected") give the agent something matchable against the current task. Blanket mandates require the agent to either over-apply them or ignore them — both failure modes.
Disambiguation through positive recommendation. WolframContext is the broadest of the three context tools — a naive agent under uncertainty will default to it, doubling latency and result volume. Rather than deprecating it, the new description positions it as a fallback by positively recommending the specific tools when the domain is clear.

Changes per file

Kernel/Tools/WolframLanguageEvaluator.wl

New opening: "Evaluates Wolfram Language code in a live, persistent kernel session. Definitions, variables, and loaded packages survive across calls." States the actual differentiator from a fresh wolframscript subprocess; the original description never did.
Replaced "Do not ask permission to evaluate code" with reasoning-based framing: "The user installed this MCP server deliberately — they want Wolfram Language used where it fits (computation, symbolic math, data lookups, plotting, etc.) to get results. When a request fits, evaluate code and return the result." The intent is for the agent to contextually calibrate (lean in when WL fits, avoid forcing WL into unrelated requests) rather than apply a bare rule.
Resolved the TODO at line 33: "read access to local files" → "Read and write local files directly from code (e.g. with Import, Export)". The original understated the default Method -> "Session" capability. The examples also softly steer toward in-code file operations.
Removed "Always use the Wolfram context tool before using this tool...". The intent is preserved — relocated to the context tool descriptions themselves where it can be properly conditioned.
The \[FreeformPrompt] block is unchanged. Out of scope for this PR.
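
To make the "survive across calls" distinction in the new opening concrete, here is a toy model, assuming nothing about how the MCP server or wolframscript are implemented. Python namespaces stand in for kernel state; `PersistentSession` and `fresh_value` are hypothetical names invented for this sketch, not part of AgentTools.

```python
class PersistentSession:
    """Toy stand-in for a long-lived kernel: one namespace shared by every call."""

    def __init__(self) -> None:
        self._namespace: dict = {}

    def evaluate(self, code: str) -> None:
        # Definitions made here stay in self._namespace for later calls.
        exec(code, self._namespace)

    def value(self, expression: str):
        # Evaluate against whatever earlier calls defined.
        return eval(expression, self._namespace)


def fresh_value(expression: str):
    """Toy stand-in for a per-call process: each call starts with no prior state."""
    return eval(expression, {})


session = PersistentSession()
session.evaluate("def square(x): return x * x")  # call 1: define
print(session.value("square(3)"))                # call 2: definition is still there

try:
    fresh_value("square(3)")                     # fresh state: definition is unknown
except NameError as error:
    print("fresh call failed:", error)
```

The sketch only models state retention. The per-call startup cost that the description also mentions is a separate effect and is not measured here.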

Kernel/Tools/Context.wl

All three context tool descriptions rewritten with situational triggers and disambiguation:

The "Always use at the start of new conversations or if the topic changes" mandate is removed in favor of per-tool triggers an agent can match against current state.
The "up to 250 words" / "as much detail as possible" guidance is removed — let the agent decide.
WolframLanguageContext triggers focus on programming (function lookup, behavior verification, symbol discovery).
WolframAlphaContext triggers focus on factual lookups (real-world data, entity resolution, knowledge cross-reference).
WolframContext leads with cross-domain queries (chemistry, physics, finance, geography) and explicitly recommends the specific tools when the domain is clear.

Testing

Smoke-tested locally with Claude Desktop on representative scenarios. Behavior matches expectations — agents reach for WolframLanguageEvaluator directly for computation requests without narrating first, and the context tools differentiate appropriately by query type.

Revise WolframLanguageEvaluator and context tool descriptions

cc8cd5f

mtirard wants to merge 1 commit into WolframResearch:main from mtirard:feature/tool-description-revision
