Releases: askui/python-sdk
v0.36.0
v0.36.0
🎉 Overview
v0.36.0 lays the groundwork for using open-weights and third-party VLMs by introducing a coordinate-space abstraction that maps model-emitted coordinates back to device pixels regardless of the grid a model reasons in. Screenshot preprocessing is now provider-owned and configurable through pluggable image scalers and a image_edge_max / ASKUI_VLM_MAX_IMAGE_EDGE setting. A new experimental ComputerZoomTool lets agents inspect and interact with tiny UI elements by viewing screen regions at full resolution. The release also unifies the askui-agent-os dependency across platforms.
✨ New Features
ComputerZoomTool— experimental tool that crops and magnifies a screen region at full resolution so agents can read and interact with small elements (icons, tab titles, status-bar text, tiny checkboxes) that would otherwise be illegible after downscaling by @philipph-askui in #286- Coordinate-space abstraction for open-weights LLM support —
VlmCoordinateSpacewithPixelCoordinateSpace,NormalizedCoordinateSpace, andScaledCoordinateSpacedescribe the grid a model emits coordinates in and map them back to device pixels; non-pixel (normalized) spaces now map directly to device resolution by @philipph-askui in #282 - Provider-owned, configurable image scaling — pluggable
ImageScalerstrategies (PatchOptimizedImageScaler,ContainedImageScaler) plus aimage_edge_maxparameter andASKUI_VLM_MAX_IMAGE_EDGEenvironment variable to control the maximum pixel edge of screenshots sent to the model by @philipph-askui in #282 downscale_image()helper inaskui.utils.llm_image_utilsfor scaling of tool-result images to stay within the per-image limits (e.g. 2000×2000 px for Claude API) in long agentic loops by @philipph-askui in #283
🔧 Improvements
- Auto-scaling of cropped screenshots to the configured maximum size by @philipph-askui in #286
- Per-provider execution-cost tracking — built-in VLM providers ship default pricing, and
input_cost_per_million_tokens/output_cost_per_million_tokens(or a custompricingproperty) can override it by @philipph-askui in #282 - New BYOMP documentation covering provider slots (
vlm_provider,image_qa_provider,detection_provider), custom providers, image scaling, and theimage_edge_maxsetting by @philipph-askui in #282 - Documented the 2000×2000 px image-size limit for tool-returned images and the
downscale_image()workaround in the tools guide by @philipph-askui in #283
🐛 Bug Fixes
-
- Unified
askui-agent-osdependency to>=26.6.1across all platforms (previously split between macOS and other platforms) by @philipph-askui in #284
- Unified
Full Changelog: v0.35.0...v0.36.0
v0.35.0
v0.35.0
🎉 Overview
v0.35.0 adds support for OpenAI-compatible APIs as model providers, enabling the use of OpenAI, Ollama, vLLM, LM Studio, Together AI, RunPod, and any other service that exposes an OpenAI-compatible chat completions endpoint. Truncation strategies now preserve the first user message across summarization to retain the original task instructions, and the truncation headroom has been doubled to reduce the chance of hitting context limits immediately after truncation.
✨ New Features
OpenAIVlmProvider— VLM provider for any OpenAI-compatible API (OpenAI, vLLM, LM Studio, Together AI, etc.) by @philipph-askui in #268OpenAIImageQAProvider— image Q&A provider for any OpenAI-compatible API by @philipph-askui in #268OllamaVlmProvider— convenience wrapper for local Ollama instances with sensible defaults (base_url=http://localhost:11434/v1,model_id=qwen3.5) by @philipph-askui in #268OllamaImageQAProvider— image Q&A via local Ollama instances by @philipph-askui in #268OpenAICompatibleVlmProvider— VLM provider for endpoints that require an exact URL (e.g., RunPod, custom proxies) where the OpenAI SDK's automatic path appending would break the request by @philipph-askui in #268OpenAIMessagesApi— full translation layer between the internalMessageParamformat and OpenAI's chat completions API, handling tool calls, image content, thinking blocks, and role alternation by @philipph-askui in #268OpenAIGetModel—GetModelimplementation for OpenAI-compatible APIs with structured output support by @philipph-askui in #268- Built-in pricing data for
gpt-5.4,gpt-5.4-mini, andgpt-5.4-nanomodels by @philipph-askui in #268
🔧 Improvements
- Truncation strategies now preserve the first user message across summarization, ensuring the original task instructions are never lost when the conversation is truncated by @philipph-askui in #280
MAX_INPUT_TOKENSincreased from 100k to 200k andTRUNCATION_THRESHOLDlowered from 0.7 to 0.56, roughly doubling the headroom after truncation to reduce the chance of re-triggering truncation immediately by @philipph-askui in #280process_idparameter inlist_process_windowstool is now auto-converted toint, preventing tool errors when the agent passes it as a string by @philipph-askui in #279
🐛 Bug Fixes
AgentSpeakernow handles the case where the model returnsstop_reason='tool_use'but no actual tool call blocks in the content, preventing stopped executions by prompting the model to retry with a valid tool call by @philipph-askui in #278
Full Changelog: v0.34.0...v0.35.0
v0.34.0
v0.34.0
🎉 Overview
v0.34.0 adds new tools that let agents interact with the file system and display configuration on the automation target: ComputerGetFileTool reads files (text or image), ComputerGetFileNamesTool lists directory contents, and ComputerRemoveVirtualDisplaysTool tears down virtual displays. A new clean_virtual_displays controller setting auto-removes virtual displays on startup. The ComputerAgent docstring now documents per-call tool registration via act(..., tools=[...]).
✨ New Features
ComputerGetFileTool(experimental) — reads a file at an absolute path on the automation target, returning UTF-8 text as a string or decoded images asPIL.Image.Imageby @mlikasam-askui in #277ComputerGetFileNamesTool(experimental) — lists regular file names (not subdirectories) in a directory on the automation target by @mlikasam-askui in #277ComputerRemoveVirtualDisplaysTool(experimental) — removes all virtual displays from the controller, leaving only physical displays active by @mlikasam-askui in #277clean_virtual_displayssetting onAskUiControllerClientSettings— when enabled, automatically removes all virtual displays after the controller connects by @mlikasam-askui in #277
🔧 Improvements
ComputerAgentdocstring updated with examples for per-call tool registration viaact(..., tools=[...])by @mlikasam-askui in #277- Pinned
askui-agent-os>=26.4.1on macOS and>=26.5.1on other platforms to ensure gRPC compatibility with the new commands by @mlikasam-askui in #277
Full Changelog: v0.33.0...v0.34.0
v0.33.0
v0.33.0
🎉 Overview
v0.33.0 introduces AutomationError — a new exception type for unfixable errors that immediately terminate agent execution instead of being auto-corrected. The conversation control loop now properly cleans up via try/finally, ensuring reporters and teardown always run even when errors propagate. This release also corrects the typing speed unit documentation and fixes a bug where messages could be lost if the truncation strategy crashed.
✨ New Features
AutomationError— new exception type for unfixable errors (e.g., missing credentials, unreachable services) that propagates immediately to the caller, bypassing the agent's auto-correction retry loop. Regular exceptions remain fixable by the agent as before. by @philipph-askui in #271- Documentation for error handling in tools — added a new "Error Handling in Tools" section to the tools guide explaining the distinction between fixable errors (regular exceptions) and unfixable errors (
AutomationError) by @philipph-askui in #271
🔧 Improvements
- Conversation control loop now uses
try/finallyto guarantee_on_conversation_end()and_teardown_control_loop()execute even when anAutomationErroror other exception propagates, preventing resource leaks by @philipph-askui in #271 - Messages are now reported to the reporter before being passed to the truncation strategy, preventing data loss if truncation crashes by @philipph-askui in #274
- Truncation failures are now caught, logged, and reported to the reporter with the message
"Truncation Failed with error: {e}"before re-raising, improving observability of context-window management errors by @philipph-askui in #274
🐛 Bug Fixes
- Corrected typing speed unit in
ComputerTypeTooldescription andAgentOs.type()docstring from "characters per minute" to "characters per second" by @philipph-askui in #272
⚠️ Breaking Changes
AgentExceptionrenamed toAgentError— if you were catchingAgentExceptiondirectly, update your imports to useAgentErrorfromaskui.models.shared.tools
Full Changelog: v0.32.1...v0.33.0
v0.32.1
v0.32.1
🎉 Overview
v0.32.1 fixes a bug that led to a crash if the optional "web" dependency group was not installed.
🐛 Bug Fixes
- fix: add missing import guard for PlaywrightBaseTool by @philipph-askui in #270
Full Changelog: v0.32.0...v0.32.1
v0.32.0
v0.32.0
🎉 Overview
v0.32.0 introduces the new WebAgent, a browser automation agent with native Playwright tools for mouse, keyboard, and screenshot interactions. The release also adds numpad key support across the AgentOS keyboard abstraction.
✨ New Features
WebAgent— a new browser automation agent with a full suite of Playwright tools (screenshot,move_mouse,mouse_click,mouse_scroll,mouse_hold_down,mouse_release,type,keyboard_tap,keyboard_pressed,keyboard_release) in addition to the existing navigation tools by @philipph-askui in #267- Numpad key support — added
numpad_lock,numpad_0–numpad_9,numpad_+,numpad_-,numpad_*,numpad_/, andnumpad_.toPcKeywith corresponding Playwright key mappings by @mlikasam-askui in #269
🔧 Improvements
- Set
is_cacheableflag onlist_process_toolfor improved caching by @philipph-askui in #267
⚠️ Breaking Changes
WebVisionAgentis deprecated — useWebAgentinstead.WebVisionAgentstill works but emits aDeprecationWarningWebAgentnow extendsAgentdirectly instead ofComputerAgent, with a new constructor signature that acceptscallbacksandtruncation_strategyparameters- Playwright navigation tools (
PlaywrightGotoTool,PlaywrightBackTool, etc.) now inherit fromPlaywrightBaseToolinstead ofTooland require aPlaywrightAgentOs(or compatible) instance as their agent OS
Full Changelog: v0.31.0...v0.32.0
v0.31.0
v0.31.0
🎉 Overview
v0.31.0 substantially improves the memory efficiency of askui. The SimpleHtmlReporter has been rearchitected to stream message rows (including base64-encoded screenshots) to a temporary file on disk instead of accumulating them in memory, significantly reducing memory usage during long-running sessions. Further, reporters are now wrapped with automatic error handling so that a failure in one reporter no longer crashes the agent.
✨ New Features
ReporterErrorHandler— a decorator that wraps anyReporterwith try/except error handling; on first failure the reporter is disabled for the rest of the session, preventing reporting errors from interrupting agent execution by @mlikasam-askui in #258
🔧 Improvements
SimpleHtmlReporternow streams HTML message rows to a temporary file as they arrive instead of holding all base64 image data in memory, reducing peak memory usage for screenshot-heavy sessions by @mlikasam-askui in #258CompositeReporternow automatically wraps all reporters inReporterErrorHandler, making error resilience the default behavior by @mlikasam-askui in #258
Full Changelog: v0.30.0...v0.31.0
v0.30.0
v0.30.0
🎉 Overview
v0.30.0 introduces a new infrastructure-error handling prompt that prevents agents from entering unfixable retry loops when the underlying controller, session, or RPC connection fails. It also enriches the HTML report's conversation breakdown with per-conversation step counts, durations, and cache token statistics, and quiets noisy tool-failure logs by demoting them from WARNING to INFO.
✨ New Features
- Infrastructure / tool error prompt added to the computer, Android, and multi-device agent capabilities — instructs agents to retry infrastructure failures (connection lost, session expired, RPC errors, stream closed, service unavailable, controller timeouts) at most once and otherwise stop immediately with a
BROKENreport status instead of looping on unfixable errors by @philipph-askui in #265 - Step count,
cache_creation_input_tokens, andcache_read_input_tokensadded to the per-conversation usage breakdown inSimpleHtmlReporterby @philipph-askui in #264 - Per-conversation duration added to the HTML report breakdown —
started_at/ended_attimestamps are captured on conversation summaries and rendered in a human-readable elapsed-time format by @philipph-askui in #266
🔧 Improvements
Tool failedlogs inToolCollectiondemoted fromWARNINGtoINFOto reduce log noise during normal agent operation by @philipph-askui in #264
⚠️ Breaking Changes
UsageTrackingCallbackrenamed toConversationStatisticsCallback
Full Changelog: v0.29.0...v0.30.0
v0.29.0
v0.29.0
🎉 Overview
v0.29.0 replaces the simple message-dropping truncation strategy with a new VLM-based SummarizingTruncationStrategy that summarizes older conversation history to preserve context while staying within token limits. It also fixes mouse scroll coordinate scaling issues, improves scroll tool descriptions with OS-specific guidance, removes get and locate from the default agent tools, hardens the move_mouse tool against malformed coordinate inputs, and makes base64 image truncation in html reports more robust.
✨ New Features
SummarizingTruncationStrategy— new default truncation strategy that uses the VLM to summarize older conversation history instead of dropping messages, with prompt caching support during summarization for cost efficiency by @philipph-askui in #257SlidingImageWindowSummarizingTruncationStrategy(experimental) — extends summarization with dynamic image removal from older messages to reduce network traffic and latencies while staying compatible with prompt caching by @philipph-askui in #257truncation_strategyinit parameter onComputerAgent,AndroidAgent, andAgent— allows passing a custom truncation strategy with auto-injection of conversation dependencies (vlm_provider,reporter,callbacks) by @philipph-askui in #257
🔧 Improvements
- Mouse scroll tool description now includes OS-dependent scroll guidance (start with
dy=150/dy=-150, macOS direction info) by @programminx-askui in #260 truncate_contentin reporting replaced bytruncate_base64_images— only base64 image data is replaced with placeholders, leaving all other content (prompts, tool outputs) untouched by @philipph-askui in #259move_mousetool now robustly parses coordinates when the agent passes them as strings or comma-separated values, with clearer tool description and improved error messages by @philipph-askui in #262
🐛 Bug Fixes
- Fix incorrect coordinate scaling on mouse scroll deltas —
ComputerAgentOsFacade.mouse_scrollno longer applies display scaling to scroll amounts (SOLENG-332) by @programminx-askui in #260
⚠️ Breaking Changes
SimpleTruncationStrategyandSimpleTruncationStrategyFactoryremoved — replaced bySummarizingTruncationStrategyas the new defaultConversationconstructor parametertruncation_strategy_factoryreplaced bytruncation_strategy(a strategy instance instead of a factory)getandlocatetools removed fromAgent's default tool list — they are no longer auto-added when anagent_osis providedmouse_scrollparameters renamed fromx/ytodx/dyacross allAgentOsimplementations (AskUiControllerClient,PlaywrightAgentOs,ComputerAgentOsFacade,ComputerAgent)truncate_contentfunction inreporting.pyremoved — replaced bytruncate_base64_images
Full Changelog: v0.28.0...v0.29.0
v0.28.0
v0.28.0
🎉 Overview
v0.28.0 integrates AgentOS as a Python package dependency (no more manual installation), adds a UIAutomator hierarchy tool for Android agents, improves support for Anthropic prompt caching to reduce inference cost, introduces Tool.from_mcp_tool() for wrapping FastMCP tools, and overhauls usage tracking with per-step and per-conversation cost breakdowns including cache token costs in the HTML reports.
✨ New Features
- AgentOS shipped as Python package (
askui-agent-os) — no manual installation needed by @mlikasam-askui in #246 - Anthropic prompt caching (auto strategy) with
cache_controlparameter by @philipph-askui in #253 AndroidGetUIAutomatorHierarchyTool— accessibility hierarchy dump for Android agents, providing structured UI element data (text, resource IDs, tap centers) as an alternative to screenshot-based inference by @mlikasam-askui in #251- Hierarchical usage tracking with per-step, per-conversation, and aggregate cost breakdowns including cache token costs in HTML reports by @mlikasam-askui in #253
🔧 Improvements
Tool.from_mcp_tool()to wrap FastMCP tools as AskUI Tools by @mlikasam-askui in #250markitdownandbsonmoved to optional dependencies (office-document) andpure-python-adbpromoted to core to streamline the installation by @mlikasam-askui in #255- Documented optional install extras (
office-document,bedrock,vertex,otel,web) in README by @mlikasam-askui in #255 - Workspace ID (
askui.workspace.id) added to OTEL trace resource attributes by @philipph-askui in #256 - Improved tracing structure with
_get_next_message()span for better observability by @philipph-askui in #256
🐛 Bug Fixes
- Fix prompt caching breakpoints to improve prompt caching efficiency by @philipph-askui in #253
- Fix report formatting and cache statistics accumulation by @philipph-askui in #253
- Constrain
grpcio<1.80.0to avoid compatibility issues by @philipph-askui in #250 - Clean up OTEL tracing: remove stale
cluster_nameconfig and unnecessary SQLAlchemy instrumentation by @philipph-askui in #256
⚠️ Breaking Changes
ASKUI_COMPONENT_REGISTRY_FILE,ASKUI_INSTALLATION_DIRECTORY, andASKUI_CONTROLLER_PATHenvironment variables are no longer recognized — AgentOS is now auto-discovered via theaskui-agent-ospackageOtelSettings.cluster_namefield andASKUI__OTEL_CLUSTER_NAMEenv var removed; replaced byworkspace_id/ASKUI_WORKSPACE_ID- Minimum
anthropicSDK version bumped from>=0.72.0to>=0.86.0 androidoptional extra removed —pure-python-adbis now a core dependency; useoffice-documentextra for MarkItDown features previously bundled by defaultbsonandmarkitdownremoved from default dependencies — installaskui[office-document]if you need Office file conversion
Full Changelog: v0.27.0...v0.28.0