24 Jun 11:09

0200202

v0.36.0 Latest

Latest

v0.36.0

🎉 Overview

v0.36.0 lays the groundwork for using open-weights and third-party VLMs by introducing a coordinate-space abstraction that maps model-emitted coordinates back to device pixels regardless of the grid a model reasons in. Screenshot preprocessing is now provider-owned and configurable through pluggable image scalers and a image_edge_max / ASKUI_VLM_MAX_IMAGE_EDGE setting. A new experimental ComputerZoomTool lets agents inspect and interact with tiny UI elements by viewing screen regions at full resolution. The release also unifies the askui-agent-os dependency across platforms.

✨ New Features

ComputerZoomTool — experimental tool that crops and magnifies a screen region at full resolution so agents can read and interact with small elements (icons, tab titles, status-bar text, tiny checkboxes) that would otherwise be illegible after downscaling by @philipph-askui in #286
Coordinate-space abstraction for open-weights LLM support — VlmCoordinateSpace with PixelCoordinateSpace, NormalizedCoordinateSpace, and ScaledCoordinateSpace describe the grid a model emits coordinates in and map them back to device pixels; non-pixel (normalized) spaces now map directly to device resolution by @philipph-askui in #282
Provider-owned, configurable image scaling — pluggable ImageScaler strategies (PatchOptimizedImageScaler, ContainedImageScaler) plus a image_edge_max parameter and ASKUI_VLM_MAX_IMAGE_EDGE environment variable to control the maximum pixel edge of screenshots sent to the model by @philipph-askui in #282
downscale_image() helper in askui.utils.llm_image_utils for scaling of tool-result images to stay within the per-image limits (e.g. 2000×2000 px for Claude API) in long agentic loops by @philipph-askui in #283

🔧 Improvements

Auto-scaling of cropped screenshots to the configured maximum size by @philipph-askui in #286
Per-provider execution-cost tracking — built-in VLM providers ship default pricing, and input_cost_per_million_tokens / output_cost_per_million_tokens (or a custom pricing property) can override it by @philipph-askui in #282
New BYOMP documentation covering provider slots (vlm_provider, image_qa_provider, detection_provider), custom providers, image scaling, and the image_edge_max setting by @philipph-askui in #282
Documented the 2000×2000 px image-size limit for tool-returned images and the downscale_image() workaround in the tools guide by @philipph-askui in #283

🐛 Bug Fixes

- Unified askui-agent-os dependency to >=26.6.1 across all platforms (previously split between macOS and other platforms) by @philipph-askui in #284

Full Changelog: v0.35.0...v0.36.0

Contributors

philipph-askui

Assets 2

02 Jun 08:53

philipph-askui

v0.35.0

89a8cec

v0.35.0

🎉 Overview

v0.35.0 adds support for OpenAI-compatible APIs as model providers, enabling the use of OpenAI, Ollama, vLLM, LM Studio, Together AI, RunPod, and any other service that exposes an OpenAI-compatible chat completions endpoint. Truncation strategies now preserve the first user message across summarization to retain the original task instructions, and the truncation headroom has been doubled to reduce the chance of hitting context limits immediately after truncation.

✨ New Features

OpenAIVlmProvider — VLM provider for any OpenAI-compatible API (OpenAI, vLLM, LM Studio, Together AI, etc.) by @philipph-askui in #268
OpenAIImageQAProvider — image Q&A provider for any OpenAI-compatible API by @philipph-askui in #268
OllamaVlmProvider — convenience wrapper for local Ollama instances with sensible defaults (base_url=http://localhost:11434/v1, model_id=qwen3.5) by @philipph-askui in #268
OllamaImageQAProvider — image Q&A via local Ollama instances by @philipph-askui in #268
OpenAICompatibleVlmProvider — VLM provider for endpoints that require an exact URL (e.g., RunPod, custom proxies) where the OpenAI SDK's automatic path appending would break the request by @philipph-askui in #268
OpenAIMessagesApi — full translation layer between the internal MessageParam format and OpenAI's chat completions API, handling tool calls, image content, thinking blocks, and role alternation by @philipph-askui in #268
OpenAIGetModel — GetModel implementation for OpenAI-compatible APIs with structured output support by @philipph-askui in #268
Built-in pricing data for gpt-5.4, gpt-5.4-mini, and gpt-5.4-nano models by @philipph-askui in #268

🔧 Improvements

Truncation strategies now preserve the first user message across summarization, ensuring the original task instructions are never lost when the conversation is truncated by @philipph-askui in #280
MAX_INPUT_TOKENS increased from 100k to 200k and TRUNCATION_THRESHOLD lowered from 0.7 to 0.56, roughly doubling the headroom after truncation to reduce the chance of re-triggering truncation immediately by @philipph-askui in #280
process_id parameter in list_process_windows tool is now auto-converted to int, preventing tool errors when the agent passes it as a string by @philipph-askui in #279

🐛 Bug Fixes

AgentSpeaker now handles the case where the model returns stop_reason='tool_use' but no actual tool call blocks in the content, preventing stopped executions by prompting the model to retry with a valid tool call by @philipph-askui in #278

Full Changelog: v0.34.0...v0.35.0

Contributors

philipph-askui

Assets 2

20 May 06:10

philipph-askui

v0.34.0

12b46f6

v0.34.0

🎉 Overview

v0.34.0 adds new tools that let agents interact with the file system and display configuration on the automation target: ComputerGetFileTool reads files (text or image), ComputerGetFileNamesTool lists directory contents, and ComputerRemoveVirtualDisplaysTool tears down virtual displays. A new clean_virtual_displays controller setting auto-removes virtual displays on startup. The ComputerAgent docstring now documents per-call tool registration via act(..., tools=[...]).

✨ New Features

ComputerGetFileTool (experimental) — reads a file at an absolute path on the automation target, returning UTF-8 text as a string or decoded images as PIL.Image.Image by @mlikasam-askui in #277
ComputerGetFileNamesTool (experimental) — lists regular file names (not subdirectories) in a directory on the automation target by @mlikasam-askui in #277
ComputerRemoveVirtualDisplaysTool (experimental) — removes all virtual displays from the controller, leaving only physical displays active by @mlikasam-askui in #277
clean_virtual_displays setting on AskUiControllerClientSettings — when enabled, automatically removes all virtual displays after the controller connects by @mlikasam-askui in #277

🔧 Improvements

ComputerAgent docstring updated with examples for per-call tool registration via act(..., tools=[...]) by @mlikasam-askui in #277
Pinned askui-agent-os>=26.4.1 on macOS and >=26.5.1 on other platforms to ensure gRPC compatibility with the new commands by @mlikasam-askui in #277

Full Changelog: v0.33.0...v0.34.0

Contributors

mlikasam-askui

Assets 2

12 May 14:29

philipph-askui

v0.33.0

9b6de63

v0.33.0

🎉 Overview

v0.33.0 introduces AutomationError — a new exception type for unfixable errors that immediately terminate agent execution instead of being auto-corrected. The conversation control loop now properly cleans up via try/finally, ensuring reporters and teardown always run even when errors propagate. This release also corrects the typing speed unit documentation and fixes a bug where messages could be lost if the truncation strategy crashed.

✨ New Features

AutomationError — new exception type for unfixable errors (e.g., missing credentials, unreachable services) that propagates immediately to the caller, bypassing the agent's auto-correction retry loop. Regular exceptions remain fixable by the agent as before. by @philipph-askui in #271
Documentation for error handling in tools — added a new "Error Handling in Tools" section to the tools guide explaining the distinction between fixable errors (regular exceptions) and unfixable errors (AutomationError) by @philipph-askui in #271

🔧 Improvements

Conversation control loop now uses try/finally to guarantee _on_conversation_end() and _teardown_control_loop() execute even when an AutomationError or other exception propagates, preventing resource leaks by @philipph-askui in #271
Messages are now reported to the reporter before being passed to the truncation strategy, preventing data loss if truncation crashes by @philipph-askui in #274
Truncation failures are now caught, logged, and reported to the reporter with the message "Truncation Failed with error: {e}" before re-raising, improving observability of context-window management errors by @philipph-askui in #274

🐛 Bug Fixes

Corrected typing speed unit in ComputerTypeTool description and AgentOs.type() docstring from "characters per minute" to "characters per second" by @philipph-askui in #272

⚠️ Breaking Changes

AgentException renamed to AgentError — if you were catching AgentException directly, update your imports to use AgentError from askui.models.shared.tools

Full Changelog: v0.32.1...v0.33.0

Contributors

philipph-askui

Assets 2

30 Apr 13:54

philipph-askui

v0.32.1

de61dbf

v0.32.1

🎉 Overview

v0.32.1 fixes a bug that led to a crash if the optional "web" dependency group was not installed.

🐛 Bug Fixes

fix: add missing import guard for PlaywrightBaseTool by @philipph-askui in #270

Full Changelog: v0.32.0...v0.32.1

Contributors

philipph-askui

Assets 2

30 Apr 08:54

philipph-askui

v0.32.0

e480db5

v0.32.0

🎉 Overview

v0.32.0 introduces the new WebAgent, a browser automation agent with native Playwright tools for mouse, keyboard, and screenshot interactions. The release also adds numpad key support across the AgentOS keyboard abstraction.

✨ New Features

WebAgent — a new browser automation agent with a full suite of Playwright tools (screenshot, move_mouse, mouse_click, mouse_scroll, mouse_hold_down, mouse_release, type, keyboard_tap, keyboard_pressed, keyboard_release) in addition to the existing navigation tools by @philipph-askui in #267
Numpad key support — added numpad_lock, numpad_0–numpad_9, numpad_+, numpad_-, numpad_*, numpad_/, and numpad_. to PcKey with corresponding Playwright key mappings by @mlikasam-askui in #269

🔧 Improvements

Set is_cacheable flag on list_process_tool for improved caching by @philipph-askui in #267

⚠️ Breaking Changes

WebVisionAgent is deprecated — use WebAgent instead. WebVisionAgent still works but emits a DeprecationWarning
WebAgent now extends Agent directly instead of ComputerAgent, with a new constructor signature that accepts callbacks and truncation_strategy parameters
Playwright navigation tools (PlaywrightGotoTool, PlaywrightBackTool, etc.) now inherit from PlaywrightBaseTool instead of Tool and require a PlaywrightAgentOs (or compatible) instance as their agent OS

Full Changelog: v0.31.0...v0.32.0

Contributors

mlikasam-askui and philipph-askui

Assets 2

22 Apr 10:08

philipph-askui

v0.31.0

301dab3

v0.31.0

🎉 Overview

v0.31.0 substantially improves the memory efficiency of askui. The SimpleHtmlReporter has been rearchitected to stream message rows (including base64-encoded screenshots) to a temporary file on disk instead of accumulating them in memory, significantly reducing memory usage during long-running sessions. Further, reporters are now wrapped with automatic error handling so that a failure in one reporter no longer crashes the agent.

✨ New Features

ReporterErrorHandler — a decorator that wraps any Reporter with try/except error handling; on first failure the reporter is disabled for the rest of the session, preventing reporting errors from interrupting agent execution by @mlikasam-askui in #258

🔧 Improvements

SimpleHtmlReporter now streams HTML message rows to a temporary file as they arrive instead of holding all base64 image data in memory, reducing peak memory usage for screenshot-heavy sessions by @mlikasam-askui in #258
CompositeReporter now automatically wraps all reporters in ReporterErrorHandler, making error resilience the default behavior by @mlikasam-askui in #258

Full Changelog: v0.30.0...v0.31.0

Contributors

mlikasam-askui

Assets 2

15 Apr 12:11

philipph-askui

v0.30.0

2a08598

v0.30.0

🎉 Overview

v0.30.0 introduces a new infrastructure-error handling prompt that prevents agents from entering unfixable retry loops when the underlying controller, session, or RPC connection fails. It also enriches the HTML report's conversation breakdown with per-conversation step counts, durations, and cache token statistics, and quiets noisy tool-failure logs by demoting them from WARNING to INFO.

✨ New Features

Infrastructure / tool error prompt added to the computer, Android, and multi-device agent capabilities — instructs agents to retry infrastructure failures (connection lost, session expired, RPC errors, stream closed, service unavailable, controller timeouts) at most once and otherwise stop immediately with a BROKEN report status instead of looping on unfixable errors by @philipph-askui in #265
Step count, cache_creation_input_tokens, and cache_read_input_tokens added to the per-conversation usage breakdown in SimpleHtmlReporter by @philipph-askui in #264
Per-conversation duration added to the HTML report breakdown — started_at / ended_at timestamps are captured on conversation summaries and rendered in a human-readable elapsed-time format by @philipph-askui in #266

🔧 Improvements

Tool failed logs in ToolCollection demoted from WARNING to INFO to reduce log noise during normal agent operation by @philipph-askui in #264

⚠️ Breaking Changes

UsageTrackingCallback renamed to ConversationStatisticsCallback

Full Changelog: v0.29.0...v0.30.0

Contributors

philipph-askui

Assets 2

10 Apr 12:02

philipph-askui

v0.29.0

dfc4b51

v0.29.0

🎉 Overview

v0.29.0 replaces the simple message-dropping truncation strategy with a new VLM-based SummarizingTruncationStrategy that summarizes older conversation history to preserve context while staying within token limits. It also fixes mouse scroll coordinate scaling issues, improves scroll tool descriptions with OS-specific guidance, removes get and locate from the default agent tools, hardens the move_mouse tool against malformed coordinate inputs, and makes base64 image truncation in html reports more robust.

✨ New Features

SummarizingTruncationStrategy — new default truncation strategy that uses the VLM to summarize older conversation history instead of dropping messages, with prompt caching support during summarization for cost efficiency by @philipph-askui in #257
SlidingImageWindowSummarizingTruncationStrategy (experimental) — extends summarization with dynamic image removal from older messages to reduce network traffic and latencies while staying compatible with prompt caching by @philipph-askui in #257
truncation_strategy init parameter on ComputerAgent, AndroidAgent, and Agent — allows passing a custom truncation strategy with auto-injection of conversation dependencies (vlm_provider, reporter, callbacks) by @philipph-askui in #257

🔧 Improvements

Mouse scroll tool description now includes OS-dependent scroll guidance (start with dy=150/dy=-150, macOS direction info) by @programminx-askui in #260
truncate_content in reporting replaced by truncate_base64_images — only base64 image data is replaced with placeholders, leaving all other content (prompts, tool outputs) untouched by @philipph-askui in #259
move_mouse tool now robustly parses coordinates when the agent passes them as strings or comma-separated values, with clearer tool description and improved error messages by @philipph-askui in #262

🐛 Bug Fixes

Fix incorrect coordinate scaling on mouse scroll deltas — ComputerAgentOsFacade.mouse_scroll no longer applies display scaling to scroll amounts (SOLENG-332) by @programminx-askui in #260

⚠️ Breaking Changes

SimpleTruncationStrategy and SimpleTruncationStrategyFactory removed — replaced by SummarizingTruncationStrategy as the new default
Conversation constructor parameter truncation_strategy_factory replaced by truncation_strategy (a strategy instance instead of a factory)
get and locate tools removed from Agent's default tool list — they are no longer auto-added when an agent_os is provided
mouse_scroll parameters renamed from x/y to dx/dy across all AgentOs implementations (AskUiControllerClient, PlaywrightAgentOs, ComputerAgentOsFacade, ComputerAgent)
truncate_content function in reporting.py removed — replaced by truncate_base64_images

Full Changelog: v0.28.0...v0.29.0

Contributors

programminx-askui and philipph-askui

Assets 2

03 Apr 10:05

philipph-askui

v0.28.0

927aa6d

v0.28.0

🎉 Overview

v0.28.0 integrates AgentOS as a Python package dependency (no more manual installation), adds a UIAutomator hierarchy tool for Android agents, improves support for Anthropic prompt caching to reduce inference cost, introduces Tool.from_mcp_tool() for wrapping FastMCP tools, and overhauls usage tracking with per-step and per-conversation cost breakdowns including cache token costs in the HTML reports.

✨ New Features

AgentOS shipped as Python package (askui-agent-os) — no manual installation needed by @mlikasam-askui in #246
Anthropic prompt caching (auto strategy) with cache_control parameter by @philipph-askui in #253
AndroidGetUIAutomatorHierarchyTool — accessibility hierarchy dump for Android agents, providing structured UI element data (text, resource IDs, tap centers) as an alternative to screenshot-based inference by @mlikasam-askui in #251
Hierarchical usage tracking with per-step, per-conversation, and aggregate cost breakdowns including cache token costs in HTML reports by @mlikasam-askui in #253

🔧 Improvements

Tool.from_mcp_tool() to wrap FastMCP tools as AskUI Tools by @mlikasam-askui in #250
markitdown and bson moved to optional dependencies (office-document) and pure-python-adb promoted to core to streamline the installation by @mlikasam-askui in #255
Documented optional install extras (office-document, bedrock, vertex, otel, web) in README by @mlikasam-askui in #255
Workspace ID (askui.workspace.id) added to OTEL trace resource attributes by @philipph-askui in #256
Improved tracing structure with _get_next_message() span for better observability by @philipph-askui in #256

🐛 Bug Fixes

Fix prompt caching breakpoints to improve prompt caching efficiency by @philipph-askui in #253
Fix report formatting and cache statistics accumulation by @philipph-askui in #253
Constrain grpcio<1.80.0 to avoid compatibility issues by @philipph-askui in #250
Clean up OTEL tracing: remove stale cluster_name config and unnecessary SQLAlchemy instrumentation by @philipph-askui in #256

⚠️ Breaking Changes

ASKUI_COMPONENT_REGISTRY_FILE, ASKUI_INSTALLATION_DIRECTORY, and ASKUI_CONTROLLER_PATH environment variables are no longer recognized — AgentOS is now auto-discovered via the askui-agent-os package
OtelSettings.cluster_name field and ASKUI__OTEL_CLUSTER_NAME env var removed; replaced by workspace_id / ASKUI_WORKSPACE_ID
Minimum anthropic SDK version bumped from >=0.72.0 to >=0.86.0
android optional extra removed — pure-python-adb is now a core dependency; use office-document extra for MarkItDown features previously bundled by default
bson and markitdown removed from default dependencies — install askui[office-document] if you need Office file conversion

Full Changelog: v0.27.0...v0.28.0

Contributors

mlikasam-askui and philipph-askui

Assets 2

Uh oh!

Releases: askui/python-sdk

v0.36.0

v0.36.0

🎉 Overview

✨ New Features

🔧 Improvements

🐛 Bug Fixes

Contributors

Uh oh!

v0.35.0

v0.35.0

🎉 Overview

✨ New Features

🔧 Improvements

🐛 Bug Fixes

Contributors

Uh oh!

v0.34.0

v0.34.0

🎉 Overview

✨ New Features

🔧 Improvements

Contributors

Uh oh!

v0.33.0

v0.33.0

🎉 Overview

✨ New Features

🔧 Improvements

🐛 Bug Fixes

⚠️ Breaking Changes

Contributors

Uh oh!

v0.32.1

v0.32.1

🎉 Overview

🐛 Bug Fixes

Contributors

Uh oh!

v0.32.0

v0.32.0

🎉 Overview

✨ New Features

🔧 Improvements

⚠️ Breaking Changes

Contributors

Uh oh!

v0.31.0

v0.31.0

🎉 Overview

✨ New Features

🔧 Improvements

Contributors

Uh oh!

v0.30.0

v0.30.0

🎉 Overview

✨ New Features

🔧 Improvements

⚠️ Breaking Changes

Contributors

Uh oh!

v0.29.0

v0.29.0

🎉 Overview

✨ New Features

🔧 Improvements

🐛 Bug Fixes

⚠️ Breaking Changes

Contributors

Uh oh!

v0.28.0

v0.28.0

🎉 Overview

✨ New Features

🔧 Improvements

🐛 Bug Fixes

⚠️ Breaking Changes

Contributors