Releases: askui/python-sdk

v0.32.1

30 Apr 13:54
de61dbf

🎉 Overview

v0.32.1 fixes a bug that led to a crash if the optional "web" dependency group was not installed.
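
Crashes of this kind usually come from importing an optional package unconditionally at module load time. The following is a minimal sketch of the common guard pattern, not the fix applied in this release. The helper names `is_extra_available` and `require_extra` are hypothetical and are not part of askui.

```python
import importlib.util


def is_extra_available(module_name: str) -> bool:
    """Return True if an optional top-level module can be imported."""
    # find_spec only looks the module up; it does not import it.
    return importlib.util.find_spec(module_name) is not None


def require_extra(module_name: str, extra: str) -> None:
    """Raise a descriptive error instead of failing with a bare import crash."""
    if not is_extra_available(module_name):
        raise ImportError(
            f"'{module_name}' is missing; install the optional '{extra}' "
            "dependency group to use this feature."
        )


# 'json' ships with Python, so this check always passes.
require_extra("json", "stdlib")
# A missing module is reported as False rather than raising.
print(is_extra_available("surely_not_installed_pkg_xyz"))
```

Calling a guard like this only inside the code paths that need the extra keeps the rest of the package importable when the extra is absent.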

🐛 Bug Fixes

Full Changelog: v0.32.0...v0.32.1

v0.32.0

30 Apr 08:54
e480db5

🎉 Overview

v0.32.0 introduces the new WebAgent, a browser automation agent with native Playwright tools for mouse, keyboard, and screenshot interactions. The release also adds numpad key support across the AgentOS keyboard abstraction.

✨ New Features

  • WebAgent — a new browser automation agent with a full suite of Playwright tools (screenshot, move_mouse, mouse_click, mouse_scroll, mouse_hold_down, mouse_release, type, keyboard_tap, keyboard_pressed, keyboard_release) in addition to the existing navigation tools by @philipph-askui in #267
  • Numpad key support — added numpad_lock, numpad_0 through numpad_9, numpad_+, numpad_-, numpad_*, numpad_/, and numpad_. to PcKey with corresponding Playwright key mappings by @mlikasam-askui in #269

🔧 Improvements

⚠️ Breaking Changes

  • WebVisionAgent is deprecated — use WebAgent instead. WebVisionAgent still works but emits a DeprecationWarning
  • WebAgent now extends Agent directly instead of ComputerAgent, with a new constructor signature that accepts callbacks and truncation_strategy parameters
  • Playwright navigation tools (PlaywrightGotoTool, PlaywrightBackTool, etc.) now inherit from PlaywrightBaseTool instead of Tool and require a PlaywrightAgentOs (or compatible) instance as their agent OS
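
Python hides DeprecationWarning by default unless it is triggered directly from `__main__`, so a deprecated class used inside library or test code can go unnoticed. The sketch below shows one way to surface such warnings with the standard `warnings` module. `LegacyAgent` is a hypothetical stand-in and does not reproduce the WebVisionAgent implementation.

```python
import warnings


class LegacyAgent:
    """Hypothetical stand-in for a deprecated class; not askui code."""

    def __init__(self) -> None:
        warnings.warn(
            "LegacyAgent is deprecated; use the replacement class instead.",
            DeprecationWarning,
            stacklevel=2,
        )


# Record warnings instead of printing them, and make sure none are filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LegacyAgent()

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(len(deprecations))
```

In a test suite, `warnings.simplefilter("error", DeprecationWarning)` turns these warnings into exceptions, which makes remaining uses of a deprecated class easy to find.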

Full Changelog: v0.31.0...v0.32.0

v0.31.0

22 Apr 10:08
301dab3

🎉 Overview

v0.31.0 substantially improves the memory efficiency of askui. The SimpleHtmlReporter has been rearchitected to stream message rows (including base64-encoded screenshots) to a temporary file on disk instead of accumulating them in memory, significantly reducing memory usage during long-running sessions. Further, reporters are now wrapped with automatic error handling so that a failure in one reporter no longer crashes the agent.
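
The streaming idea can be illustrated with a minimal sketch: rows are appended to a temporary file as they arrive and copied line by line into the final report. This is an illustration under assumptions, not the SimpleHtmlReporter code; the class `StreamingHtmlWriter` and its methods are hypothetical.

```python
import os
import tempfile


class StreamingHtmlWriter:
    """Hypothetical sketch: append rows to a temp file, assemble the report at the end."""

    def __init__(self) -> None:
        # delete=False so the file survives until we remove it ourselves.
        self._rows = tempfile.NamedTemporaryFile(
            mode="w", suffix=".html", delete=False, encoding="utf-8"
        )

    def add_row(self, html_row: str) -> None:
        # Each row goes straight to the file; no list of rows is kept in memory.
        self._rows.write(html_row + "\n")

    def finish(self, output_path: str) -> None:
        self._rows.close()
        with open(output_path, "w", encoding="utf-8") as out, open(
            self._rows.name, encoding="utf-8"
        ) as rows:
            out.write("<table>\n")
            for line in rows:  # copy line by line, not all at once
                out.write(line)
            out.write("</table>\n")
        os.remove(self._rows.name)


writer = StreamingHtmlWriter()
writer.add_row("<tr><td>step 1</td></tr>")
writer.add_row("<tr><td>step 2</td></tr>")
report_path = os.path.join(tempfile.mkdtemp(), "report.html")
writer.finish(report_path)
```

Peak memory then depends on the size of a single row rather than on the length of the session.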

✨ New Features

  • ReporterErrorHandler — a decorator that wraps any Reporter with try/except error handling; on first failure the reporter is disabled for the rest of the session, preventing reporting errors from interrupting agent execution by @mlikasam-askui in #258

🔧 Improvements

  • SimpleHtmlReporter now streams HTML message rows to a temporary file as they arrive instead of holding all base64 image data in memory, reducing peak memory usage for screenshot-heavy sessions by @mlikasam-askui in #258
  • CompositeReporter now automatically wraps all reporters in ReporterErrorHandler, making error resilience the default behavior by @mlikasam-askui in #258
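
The "disable on first failure" behavior described above can be sketched as a small wrapper. This is an illustration only: `SafeReporter`, `FailingReporter`, and the `add_message` signature are hypothetical and do not reproduce the ReporterErrorHandler or Reporter interfaces.

```python
class FailingReporter:
    """Hypothetical reporter that always raises."""

    def add_message(self, role: str, content: str) -> None:
        raise RuntimeError("disk full")


class SafeReporter:
    """Hypothetical wrapper: swallow the first error, then stay disabled."""

    def __init__(self, wrapped) -> None:
        self._wrapped = wrapped
        self.disabled = False
        self.calls = 0  # how often the wrapped reporter was actually invoked

    def add_message(self, role: str, content: str) -> None:
        if self.disabled:
            return  # skip a reporter that has already failed
        self.calls += 1
        try:
            self._wrapped.add_message(role, content)
        except Exception:
            # First failure: disable for the rest of the session.
            self.disabled = True


reporter = SafeReporter(FailingReporter())
reporter.add_message("agent", "first")   # fails once, gets disabled
reporter.add_message("agent", "second")  # silently skipped
print(reporter.disabled, reporter.calls)
```

Disabling after the first error avoids paying for the same failure on every subsequent message.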

Full Changelog: v0.30.0...v0.31.0

v0.30.0

15 Apr 12:11

🎉 Overview

v0.30.0 introduces a new infrastructure-error handling prompt that prevents agents from entering unfixable retry loops when the underlying controller, session, or RPC connection fails. It also enriches the HTML report's conversation breakdown with per-conversation step counts, durations, and cache token statistics, and quiets noisy tool-failure logs by demoting them from WARNING to INFO.

✨ New Features

  • Infrastructure / tool error prompt added to the computer, Android, and multi-device agent capabilities — instructs agents to retry infrastructure failures (connection lost, session expired, RPC errors, stream closed, service unavailable, controller timeouts) at most once and otherwise stop immediately with a BROKEN report status instead of looping on unfixable errors by @philipph-askui in #265
  • Step count, cache_creation_input_tokens, and cache_read_input_tokens added to the per-conversation usage breakdown in SimpleHtmlReporter by @philipph-askui in #264
  • Per-conversation duration added to the HTML report breakdown — started_at / ended_at timestamps are captured on conversation summaries and rendered in a human-readable elapsed-time format by @philipph-askui in #266
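
As an illustration of turning a started_at / ended_at pair into a readable duration, here is a minimal sketch. The exact format used in the HTML report is not stated in these notes, so the "1h 2m 5s" style and the function name `format_elapsed` are assumptions.

```python
from datetime import datetime, timedelta


def format_elapsed(started_at: datetime, ended_at: datetime) -> str:
    """Render the time between two timestamps, e.g. '1h 2m 5s'."""
    total_seconds = int((ended_at - started_at).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{seconds}s")
    return " ".join(parts)


# Arbitrary example timestamps.
start = datetime(2025, 1, 1, 12, 0, 0)
print(format_elapsed(start, start + timedelta(minutes=2, seconds=5)))
```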

🔧 Improvements

  • Tool failed logs in ToolCollection demoted from WARNING to INFO to reduce log noise during normal agent operation by @philipph-askui in #264

⚠️ Breaking Changes

  • UsageTrackingCallback renamed to ConversationStatisticsCallback

Full Changelog: v0.29.0...v0.30.0

v0.29.0

10 Apr 12:02
dfc4b51

🎉 Overview

v0.29.0 replaces the simple message-dropping truncation strategy with a new VLM-based SummarizingTruncationStrategy that summarizes older conversation history to preserve context while staying within token limits. It also fixes mouse scroll coordinate scaling issues, improves scroll tool descriptions with OS-specific guidance, removes get and locate from the default agent tools, hardens the move_mouse tool against malformed coordinate inputs, and makes base64 image truncation in HTML reports more robust.

✨ New Features

  • SummarizingTruncationStrategy — new default truncation strategy that uses the VLM to summarize older conversation history instead of dropping messages, with prompt caching support during summarization for cost efficiency by @philipph-askui in #257
  • SlidingImageWindowSummarizingTruncationStrategy (experimental) — extends summarization with dynamic image removal from older messages to reduce network traffic and latency while staying compatible with prompt caching by @philipph-askui in #257
  • truncation_strategy init parameter on ComputerAgent, AndroidAgent, and Agent — allows passing a custom truncation strategy with auto-injection of conversation dependencies (vlm_provider, reporter, callbacks) by @philipph-askui in #257
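
The difference between dropping and summarizing older messages can be shown with a minimal sketch. It is a simplification under assumptions: messages are plain strings, the summarizer is a stand-in function instead of a VLM call, and `truncate_by_summarizing` is a hypothetical name, not the SummarizingTruncationStrategy API.

```python
from typing import Callable, List

Message = str  # simplification: real conversation messages are structured


def truncate_by_summarizing(
    messages: List[Message],
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace all but the last `keep_recent` messages with one summary."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)  # nothing to compress
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent


def fake_summarize(older: List[Message]) -> str:
    # Stand-in for a model call; a real strategy would ask the VLM.
    return f"[summary of {len(older)} earlier messages]"


history = ["open app", "click login", "type user", "type password", "submit"]
print(truncate_by_summarizing(history, keep_recent=2, summarize=fake_summarize))
```

Compared with dropping the older messages, the summary keeps a compact record of what already happened while the most recent messages stay verbatim.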

🔧 Improvements

  • Mouse scroll tool description now includes OS-dependent scroll guidance (start with dy=150/dy=-150, macOS direction info) by @programminx-askui in #260
  • truncate_content in reporting replaced by truncate_base64_images — only base64 image data is replaced with placeholders, leaving all other content (prompts, tool outputs) untouched by @philipph-askui in #259
  • move_mouse tool now robustly parses coordinates when the agent passes them as strings or comma-separated values, with clearer tool description and improved error messages by @philipph-askui in #262
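
To illustrate what tolerant coordinate parsing can look like, here is a minimal sketch that accepts numbers, numeric strings, or a single comma-separated string. The function `parse_coordinates` is hypothetical and is not the parsing code used by the move_mouse tool.

```python
from typing import Optional, Tuple, Union

Number = Union[int, float, str]


def parse_coordinates(x: Number, y: Optional[Number] = None) -> Tuple[int, int]:
    """Accept (10, 20), ("10", "20"), or a single "10, 20" string."""
    if y is None:
        if not isinstance(x, str):
            raise ValueError(f"expected an 'x,y' string, got {x!r}")
        parts = [p.strip() for p in x.split(",")]
        if len(parts) != 2:
            raise ValueError(f"expected exactly two comma-separated values, got {x!r}")
        x, y = parts
    try:
        # float() first so that inputs like "10.0" are accepted as well.
        return int(float(x)), int(float(y))
    except (TypeError, ValueError) as exc:
        raise ValueError(f"coordinates must be numeric, got x={x!r}, y={y!r}") from exc


print(parse_coordinates(10, 20))
print(parse_coordinates("10", "20"))
print(parse_coordinates("10, 20"))
```

Raising one ValueError with the offending input in the message gives the agent something concrete to correct on its next attempt.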

🐛 Bug Fixes

  • Fix incorrect coordinate scaling on mouse scroll deltas — ComputerAgentOsFacade.mouse_scroll no longer applies display scaling to scroll amounts (SOLENG-332) by @programminx-askui in #260

⚠️ Breaking Changes

  • SimpleTruncationStrategy and SimpleTruncationStrategyFactory removed — replaced by SummarizingTruncationStrategy as the new default
  • Conversation constructor parameter truncation_strategy_factory replaced by truncation_strategy (a strategy instance instead of a factory)
  • get and locate tools removed from Agent's default tool list — they are no longer auto-added when an agent_os is provided
  • mouse_scroll parameters renamed from x/y to dx/dy across all AgentOs implementations (AskUiControllerClient, PlaywrightAgentOs, ComputerAgentOsFacade, ComputerAgent)
  • truncate_content function in reporting.py removed — replaced by truncate_base64_images
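
The idea behind replacing only base64 image data can be sketched with a regular expression over data URIs. This is an assumption-laden illustration: the input format, the placeholder text, and the name `replace_base64_images` are hypothetical and do not reproduce truncate_base64_images.

```python
import re

# Matches data URIs such as "data:image/png;base64,iVBORw0...".
_BASE64_IMAGE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def replace_base64_images(text: str, placeholder: str = "[image omitted]") -> str:
    """Swap embedded base64 image payloads for a short placeholder."""
    return _BASE64_IMAGE.sub(placeholder, text)


message = 'prompt: click OK <img src="data:image/png;base64,iVBORw0KGgo=">'
print(replace_base64_images(message))
```

Everything outside the matched payload, such as the prompt text, passes through unchanged.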

Full Changelog: v0.28.0...v0.29.0

v0.28.0

03 Apr 10:05
927aa6d

🎉 Overview

v0.28.0 integrates AgentOS as a Python package dependency (no more manual installation), adds a UIAutomator hierarchy tool for Android agents, improves support for Anthropic prompt caching to reduce inference cost, introduces Tool.from_mcp_tool() for wrapping FastMCP tools, and overhauls usage tracking with per-step and per-conversation cost breakdowns including cache token costs in the HTML reports.

✨ New Features

  • AgentOS shipped as Python package (askui-agent-os) — no manual installation needed by @mlikasam-askui in #246
  • Anthropic prompt caching (auto strategy) with cache_control parameter by @philipph-askui in #253
  • AndroidGetUIAutomatorHierarchyTool — accessibility hierarchy dump for Android agents, providing structured UI element data (text, resource IDs, tap centers) as an alternative to screenshot-based inference by @mlikasam-askui in #251
  • Hierarchical usage tracking with per-step, per-conversation, and aggregate cost breakdowns including cache token costs in HTML reports by @mlikasam-askui in #253

🔧 Improvements

  • Tool.from_mcp_tool() to wrap FastMCP tools as AskUI Tools by @mlikasam-askui in #250
  • markitdown and bson moved to optional dependencies (office-document) and pure-python-adb promoted to core to streamline the installation by @mlikasam-askui in #255
  • Documented optional install extras (office-document, bedrock, vertex, otel, web) in README by @mlikasam-askui in #255
  • Workspace ID (askui.workspace.id) added to OTEL trace resource attributes by @philipph-askui in #256
  • Improved tracing structure with _get_next_message() span for better observability by @philipph-askui in #256

🐛 Bug Fixes

  • Fix prompt caching breakpoints to improve prompt caching efficiency by @philipph-askui in #253
  • Fix report formatting and cache statistics accumulation by @philipph-askui in #253
  • Constrain grpcio<1.80.0 to avoid compatibility issues by @philipph-askui in #250
  • Clean up OTEL tracing: remove stale cluster_name config and unnecessary SQLAlchemy instrumentation by @philipph-askui in #256

⚠️ Breaking Changes

  • ASKUI_COMPONENT_REGISTRY_FILE, ASKUI_INSTALLATION_DIRECTORY, and ASKUI_CONTROLLER_PATH environment variables are no longer recognized — AgentOS is now auto-discovered via the askui-agent-os package
  • OtelSettings.cluster_name field and ASKUI__OTEL_CLUSTER_NAME env var removed; replaced by workspace_id / ASKUI_WORKSPACE_ID
  • Minimum anthropic SDK version bumped from >=0.72.0 to >=0.86.0
  • android optional extra removed — pure-python-adb is now a core dependency; use office-document extra for MarkItDown features previously bundled by default
  • bson and markitdown removed from default dependencies — install askui[office-document] if you need Office file conversion
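
Because removed environment variables are ignored silently, a leftover setting can go unnoticed after an upgrade. A minimal sketch of a pre-upgrade check follows; the variable names are taken from the list above, while the helper `find_stale_env_vars` is hypothetical and not part of askui.

```python
import os
from typing import List, Mapping

# Names taken from the breaking-changes list above.
REMOVED_ENV_VARS = (
    "ASKUI_COMPONENT_REGISTRY_FILE",
    "ASKUI_INSTALLATION_DIRECTORY",
    "ASKUI_CONTROLLER_PATH",
    "ASKUI__OTEL_CLUSTER_NAME",
)


def find_stale_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the removed variables that are still set in `env`."""
    return [name for name in REMOVED_ENV_VARS if name in env]


for name in find_stale_env_vars():
    print(f"{name} is set but no longer recognized as of v0.28.0")
```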

Full Changelog: v0.27.0...v0.28.0

v0.27.0

18 Mar 09:55
6f6b83b

🎉 Overview

v0.27.0 adds a MultiDeviceAgent that can operate Android and computer devices simultaneously, improves the SDK structure by introducing default tool lists for the ComputerAgent and AndroidAgent, and fixes a bug with single-display handling on Android.

✨ New Features

🔧 Improvements

🐛 Bug Fixes

Full Changelog: v0.26.1...v0.27.0

v0.26.1

11 Mar 09:45
01b2e31

What's Changed

  • Calls to the print_to_console_tool are now skipped during cached executions to save time and tokens by @philipph-askui in #245

Full Changelog: v0.26.0...v0.26.1

v0.26.0

10 Mar 12:07
bdf15bd

🎉 Overview

v0.26.0 adds a MultiDeviceAgent that can operate Android and computer devices simultaneously, improves the SDK structure by introducing default tool lists for the ComputerAgent and AndroidAgent, and fixes a bug with single-display handling on Android.

✨ New Features

🔧 Improvements

🐛 Bug Fixes

  • refactor(android): single-display handling and SingleAndroidDisplay by @mlikasam-askui in #240

Full Changelog: v0.25.1...v0.26.0

v0.25.1

05 Mar 15:53
f13564e

What's Changed

  • The value of model_id for vlm_providers can now be set via an environment variable by @philipph-askui in #239

Full Changelog: v0.25.0...v0.25.1