Merged
103 commits
9ee2367
moving the browsergym.experiment.benchmark module to agentlab
TLSDC Apr 23, 2025
c2e2b9c
added comment for new parameter
TLSDC Apr 23, 2025
596fcd2
BaseMessages take into account 'input_text' key too (for xray)
TLSDC Apr 23, 2025
f9d7b91
convenient array to base64 function
TLSDC Apr 23, 2025
73ba428
tool agent embryo
TLSDC Apr 23, 2025
c11db49
Merge branch 'main' of github.com:ServiceNow/AgentLab into tlsdc/tool…
TLSDC Apr 24, 2025
6604dbc
added the MessageBuilder class, which should help interfacing APIs
TLSDC Apr 24, 2025
ef6f648
claude
TLSDC Apr 30, 2025
4e973ac
adding markdown display for MessageBuilder in xray
TLSDC May 1, 2025
54ec412
changed LLM structure to be more versatile
TLSDC May 1, 2025
0fc43cc
unified claude and openai response apis
TLSDC May 2, 2025
19cdaf9
i dont think this is relevant anymore
TLSDC May 2, 2025
5b3f469
backtracking from moving bgym.benchmarks etc
TLSDC May 2, 2025
087ad75
defaulting to claude bc it's better
TLSDC May 2, 2025
8a17470
kind of forced to comment this to avoid circular imports atm
TLSDC May 2, 2025
5f675ba
Merge branch 'main' of github.com:ServiceNow/AgentLab into tlsdc/tool…
TLSDC May 2, 2025
234be09
parametrized env output to agent_args
TLSDC May 2, 2025
544908e
fixing broken import in test
TLSDC May 2, 2025
16cc3cd
Add pricing tracking for Anthropic model and refactor pricing functions
recursix May 8, 2025
c674094
Enhance ToolUseAgent with token counting and improved message handlin…
recursix May 9, 2025
528b513
Update action in ClaudeResponseModel to None for improved clarity
recursix May 9, 2025
417893c
typo
recursix May 13, 2025
c676eab
typo
recursix May 13, 2025
ab2d331
Remove unnecessary import of anthropic for cleaner code
recursix May 13, 2025
bf57591
moving some utils to agent_utils.py
amanjaiswal73892 May 14, 2025
ce72b41
Fix: Formatting ang Darglint
amanjaiswal73892 May 15, 2025
fe05d75
Refactor: Simplify message builder methods and add support for chat c…
amanjaiswal73892 May 15, 2025
97a39cc
added vllm-support-for-tool-use-agent
amanjaiswal73892 May 17, 2025
7d8a08c
Moving some functions to llm utils.py
amanjaiswal73892 May 21, 2025
c45df5a
Refactor: Substitute OpenRouterAPIMessageBuilder with OpenAIChatCompl…
amanjaiswal73892 May 21, 2025
230701e
Refactor: Enhance API call retry logic and add pricing tracking for O…
amanjaiswal73892 May 22, 2025
9eaf044
Add comprehensive tests for OpenAI and Anthropic API message builders
amanjaiswal73892 May 22, 2025
fd0dc97
enclose reasoning content in <think> tags
amanjaiswal73892 May 26, 2025
1ce85a0
Added more tests and refactor tools argument.
amanjaiswal73892 May 26, 2025
2892720
Refactor response API to unify output structure and enhance message b…
amanjaiswal73892 May 27, 2025
5dda4c6
Refactor ToolUseAgentArgs and MessageBuilder for improved message han…
amanjaiswal73892 May 28, 2025
0f3f849
Add MultiToolUseAgent implementation with goal and observation handling
recursix Jun 2, 2025
abaa6ef
Refactor: Simplify retrieval of use_raw_page_output in ExpArgs
recursix Jun 2, 2025
48598fc
Refactor: Substitute OpenRouterAPIMessageBuilder with OpenAIChatCompl…
amanjaiswal73892 May 21, 2025
01cf22f
Refactor: Enhance API call retry logic and add pricing tracking for O…
amanjaiswal73892 May 22, 2025
4771ac0
Add comprehensive tests for OpenAI and Anthropic API message builders
amanjaiswal73892 May 22, 2025
85c9edd
enclose reasoning content in <think> tags
amanjaiswal73892 May 26, 2025
ce06956
Added more tests and refactor tools argument.
amanjaiswal73892 May 26, 2025
3ad97e0
Refactor response API to unify output structure and enhance message b…
amanjaiswal73892 May 27, 2025
3814c37
Refactor ToolUseAgentArgs and MessageBuilder for improved message han…
amanjaiswal73892 May 28, 2025
70e7bfb
Refactor ToolUseAgent and multi_tool_agent to improve message handlin…
recursix Jun 2, 2025
ffbccac
Refactor response assertions in tests for consistent string formatting
amanjaiswal73892 Jun 2, 2025
4defc32
Merge Multitool agent
amanjaiswal73892 Jun 2, 2025
ada020a
add defaults to LLMOutputs
amanjaiswal73892 Jun 2, 2025
c95f55c
Refactor Obs and ToolUseAgent to improve message handling and respons…
recursix Jun 2, 2025
aed851f
Enhance mouse pointer functionality in agent utilities and update met…
recursix Jun 2, 2025
d766684
merge new changes from allac/next-agent
amanjaiswal73892 Jun 2, 2025
cb585a8
Improve message formatting in MessageBuilder and add spacing in Claud…
recursix Jun 3, 2025
14f9914
Refactor Obs class to adjust webpage usage settings and add TODO for …
recursix Jun 3, 2025
caa058a
Fix anthropic error code 400
amanjaiswal73892 Jun 3, 2025
27da915
Pass system message as an API param for anthropic API
amanjaiswal73892 Jun 3, 2025
e51033c
Feature: Added generic "usage" stats tracking for API that support "u…
amanjaiswal73892 Jun 4, 2025
55da176
Refactor message formatting to include role-based markdown conversion…
recursix Jun 4, 2025
206f7c3
Update parallel backend options in Study class documentation
recursix Jun 4, 2025
21be58f
Refactor Obs and Summarizer classes: remove unused tool_calls in Obs,…
recursix Jun 4, 2025
6481fae
adding task hints
recursix Jun 5, 2025
99595a9
add simple tool caching for anthropic
amanjaiswal73892 Jun 6, 2025
b1dc69c
Added effective cost using cache per api call in stats
amanjaiswal73892 Jun 6, 2025
39084c0
Update hints in hint_db.csv for filling up forms and search results
recursix Jun 9, 2025
c5ea716
Enhance summarization functionality in ToolUseAgent: add detailed ini…
recursix Jun 9, 2025
1f83b1c
Improve error handling and calculation in global stats and summarizat…
recursix Jun 9, 2025
3fd1e59
Refactor action default value to None in LLMOutput and related models…
recursix Jun 9, 2025
3bb882f
Add cache_complete_prompt option for caching complete prompts in LLM …
amanjaiswal73892 Jun 10, 2025
a17c119
Add draw_click_indicator function to enhance image annotation with cl…
recursix Jun 10, 2025
d4780b0
Disable mouse pointer addition in Obs class; refine summarizer instru…
recursix Jun 10, 2025
dbc065c
Merge branch 'aj/tool_use_agent_chat_completion_support' into allac/n…
recursix Jun 10, 2025
60eed9e
Add StructuredDiscussion class to manage message groups and improve m…
recursix Jun 11, 2025
d70433f
Update hint_db.csv with new hints for dragging items and improve exis…
amanjaiswal73892 Jun 11, 2025
a16f024
Merge remote-tracking branch 'origin/allac/next-agent' into aj/tool_u…
amanjaiswal73892 Jun 11, 2025
cd40737
Enhance screenshot tagging with mouse drag-and-drop support and add a…
amanjaiswal73892 Jun 11, 2025
459afad
Added ability to add custom cache breakpoints for anthropic models
amanjaiswal73892 Jun 12, 2025
b813df7
Enhance StructuredDiscussion to group messages with summaries and adj…
recursix Jun 12, 2025
7f4c018
Merge branch 'aj/tool_use_agent_chat_completion_support' into allac/n…
recursix Jun 12, 2025
8ebbd7f
display "effective cost" instead of cost in xray interface if available.
amanjaiswal73892 Jun 12, 2025
b18e1e0
Merge remote-tracking branch 'origin/allac/next-agent' into aj/tool_u…
amanjaiswal73892 Jun 12, 2025
ca1c42c
Add method to retrieve last summary and update ToolUseAgent logic for…
recursix Jun 16, 2025
8e9ff18
Add loading functionality for experiment directory choices in demo
recursix Jun 16, 2025
b22ffb4
Add function to parse function call strings and extract name and argu…
amanjaiswal73892 Jun 16, 2025
f3eff52
Remove unnecessary comments and whitespace in agent.py
amanjaiswal73892 Jun 16, 2025
707462e
Remove commented-out code in MessageBuilder and response models
amanjaiswal73892 Jun 16, 2025
b53582c
updated openai tests to wrap string args in qoutes.
amanjaiswal73892 Jun 16, 2025
7c0e018
Refactor whitespace and formatting in tracking.py
amanjaiswal73892 Jun 16, 2025
a361ffb
Some clean up in preparation for merging
recursix Jun 17, 2025
ede24f9
it was comented out for cirular import, but seems like there no cirul…
recursix Jun 18, 2025
acecba0
Remove unnecessary blank line and add note about click actions in zoo…
recursix Jun 18, 2025
619626d
More hints
recursix Jun 18, 2025
b8da2e9
Reformatting, improve docstring and fixing type hints for python < 3.12
amanjaiswal73892 Jun 18, 2025
50c0811
fix qoutes in f-string
amanjaiswal73892 Jun 18, 2025
af93036
fix white space and formatting
amanjaiswal73892 Jun 18, 2025
4fb4621
Stash last commit
recursix Jun 18, 2025
f173885
Add python version compatibility check in github workflow
amanjaiswal73892 Jun 18, 2025
f6060c2
update darglint version 3.11 in github workflow
amanjaiswal73892 Jun 18, 2025
ebb5b92
refactor action string parsing to support python 3.11
amanjaiswal73892 Jun 18, 2025
422ce80
removing openai CUA from here as it will be moved to a different file
amanjaiswal73892 Jun 18, 2025
c0bf32f
Mock OpenAI client in tests to avoid dependency on OPENAI_API_KEY
amanjaiswal73892 Jun 19, 2025
0b3943f
Add docstrings for function arguments and return types in agent_utils…
amanjaiswal73892 Jun 19, 2025
a4b38a4
black refactoring and improve docstring in response_api.py
amanjaiswal73892 Jun 19, 2025
e7bb788
Update Python version in Darglint workflow to 3.12 and fix formatting…
amanjaiswal73892 Jun 19, 2025
2 changes: 1 addition & 1 deletion .github/workflows/darglint.yml
@@ -21,7 +21,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
- python-version: '3.10'
+ python-version: '3.12'
cache: 'pip' # caching pip dependencies

- name: Pip install
40 changes: 40 additions & 0 deletions .github/workflows/python_version_compatibility.yml
@@ -0,0 +1,40 @@
name: Python Compatibility (Info Only)

on:
push:
branches:
- main
pull_request:

jobs:
info-check:
runs-on: ubuntu-latest
continue-on-error: true
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4

# Optional: Cache uv for faster runs
- name: Cache uv
uses: actions/cache@v4
with:
path: ~/.cargo/bin/uv
key: uv-${{ runner.os }}

- name: Install uv
run: |
if [ ! -f ~/.cargo/bin/uv ]; then
curl -LsSf https://astral.sh/uv/install.sh | sh
fi

- name: Check Python ${{ matrix.python-version }}
continue-on-error: true
run: |
export PATH="$HOME/.cargo/bin:$PATH"
if uvx --python ${{ matrix.python-version }} --from python --with-requirements requirements.txt python -c "print('✅ Compatible')"; then
echo "✅ Python ${{ matrix.python-version }} works"
else
echo "❌ Python ${{ matrix.python-version }} incompatible"
fi
1 change: 1 addition & 0 deletions .gitignore
@@ -171,3 +171,4 @@ results/
outputs/
miniwob-plusplus/
.miniwob-server.pid
+ debugging_results/
4 changes: 2 additions & 2 deletions src/agentlab/agents/agent_args.py
@@ -1,5 +1,5 @@
import bgym
- from bgym import AbstractAgentArgs
+ from bgym import AbstractAgentArgs, Benchmark


class AgentArgs(AbstractAgentArgs):
@@ -14,7 +14,7 @@ class MyAgentArgs(AgentArgs):
Note: to work properly with AgentXRay, the arguments need to be serializable and hashable.
"""

- def set_benchmark(self, benchmark: bgym.Benchmark, demo_mode: bool):
+ def set_benchmark(self, benchmark: Benchmark, demo_mode: bool):
"""Optional method to set benchmark specific flags.

This allows the agent to have minor adjustments based on the benchmark.
267 changes: 267 additions & 0 deletions src/agentlab/agents/agent_utils.py
@@ -0,0 +1,267 @@
from logging import warning
from typing import Optional, Tuple

import numpy as np
from PIL import Image, ImageDraw
from playwright.sync_api import Page

"""
This module contains utility functions for handling observations and actions in the context of agent interactions.
"""


def tag_screenshot_with_action(screenshot: Image.Image, action: str) -> Image.Image:
"""
If action is a coordinate action, try to render it on the screenshot.

e.g. mouse_click(120, 130) -> draw a dot at (120, 130) on the screenshot

Args:
screenshot: The screenshot to tag.
action: The action to tag the screenshot with.

Returns:
The tagged screenshot.

Raises:
ValueError: If the action parsing fails.
"""
if action.startswith("mouse_click"):
try:
coords = action[action.index("(") + 1 : action.index(")")].split(",")
coords = [c.strip() for c in coords]
if len(coords) not in [2, 3]:
raise ValueError(f"Invalid coordinate format: {coords}")
if coords[0].startswith("x="):
coords[0] = coords[0][2:]
if coords[1].startswith("y="):
coords[1] = coords[1][2:]
x, y = float(coords[0].strip()), float(coords[1].strip())
draw = ImageDraw.Draw(screenshot)
radius = 5
draw.ellipse(
(x - radius, y - radius, x + radius, y + radius), fill="blue", outline="blue"
)
except (ValueError, IndexError) as e:
warning(f"Failed to parse action '{action}': {e}")

elif action.startswith("mouse_drag_and_drop"):
try:
func_name, parsed_args = parse_func_call_string(action)
if func_name == "mouse_drag_and_drop" and parsed_args is not None:
args, kwargs = parsed_args
x1, y1, x2, y2 = None, None, None, None

if args and len(args) >= 4:
# Positional arguments: mouse_drag_and_drop(x1, y1, x2, y2)
x1, y1, x2, y2 = map(float, args[:4])
elif kwargs:
# Keyword arguments: mouse_drag_and_drop(from_x=x1, from_y=y1, to_x=x2, to_y=y2)
x1 = float(kwargs.get("from_x", 0))
y1 = float(kwargs.get("from_y", 0))
x2 = float(kwargs.get("to_x", 0))
y2 = float(kwargs.get("to_y", 0))

if all(coord is not None for coord in [x1, y1, x2, y2]):
draw = ImageDraw.Draw(screenshot)
# Draw the main line
draw.line((x1, y1, x2, y2), fill="red", width=2)
# Draw arrowhead at the end point using the helper function
draw_arrowhead(draw, (x1, y1), (x2, y2))
except (ValueError, IndexError) as e:
warning(f"Failed to parse action '{action}': {e}")
return screenshot
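
The coordinate handling for `mouse_click` in `tag_screenshot_with_action` can be looked at in isolation. The following is a minimal, stdlib-only sketch and not code from this PR: `parse_click_coords` is a hypothetical helper name, and it returns `None` on failure instead of logging a warning.

```python
from typing import Optional, Tuple


def parse_click_coords(action: str) -> Optional[Tuple[float, float]]:
    """Return (x, y) from a mouse_click action string, or None if parsing fails.

    Hypothetical helper for illustration; it mirrors the slicing logic above.
    """
    if not action.startswith("mouse_click"):
        return None
    try:
        inner = action[action.index("(") + 1 : action.index(")")]
        parts = [p.strip() for p in inner.split(",")]
        if len(parts) not in (2, 3):  # x, y and an optional third argument
            return None
        # Accept both positional ("120, 130") and keyword ("x=120, y=130") forms.
        x = parts[0][2:] if parts[0].startswith("x=") else parts[0]
        y = parts[1][2:] if parts[1].startswith("y=") else parts[1]
        return float(x), float(y)
    except ValueError:
        # Raised by str.index (missing parenthesis) or float (non-numeric text).
        return None


print(parse_click_coords("mouse_click(120, 130)"))
print(parse_click_coords("mouse_click(x=12.5, y=40, button='left')"))
print(parse_click_coords("mouse_click(foo, bar)"))
```

Note that this slicing approach assumes the x and y arguments come first; keyword arguments given in another order would not be recognized.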


def add_mouse_pointer_from_action(screenshot: Image.Image, action: str) -> Image.Image:

if action.startswith("mouse_click"):
try:
coords = action[action.index("(") + 1 : action.index(")")].split(",")
coords = [c.strip() for c in coords]
if len(coords) not in [2, 3]:
raise ValueError(f"Invalid coordinate format: {coords}")
if coords[0].startswith("x="):
coords[0] = coords[0][2:]
if coords[1].startswith("y="):
coords[1] = coords[1][2:]
x, y = int(coords[0].strip()), int(coords[1].strip())
screenshot = draw_mouse_pointer(screenshot, x, y)
except (ValueError, IndexError) as e:
warning(f"Failed to parse action '{action}': {e}")
return screenshot


def draw_mouse_pointer(image: Image.Image, x: int, y: int) -> Image.Image:
"""
Draws a semi-transparent mouse pointer at (x, y) on the image.
Returns a new image with the pointer drawn.

Args:
image: The image to draw the mouse pointer on.
x: The x coordinate for the mouse pointer.
y: The y coordinate for the mouse pointer.

Returns:
A new image with the mouse pointer drawn.
"""
pointer_size = 20 # Length of the pointer
overlay = image.convert("RGBA").copy()
draw = ImageDraw.Draw(overlay)

# Define pointer shape (a simple arrow)
pointer_shape = [
(x, y),
(x + pointer_size, y + pointer_size // 2),
(x + pointer_size // 2, y + pointer_size // 2),
(x + pointer_size // 2, y + pointer_size),
]

draw.polygon(pointer_shape, fill=(0, 0, 0, 128)) # 50% transparent black

return Image.alpha_composite(image.convert("RGBA"), overlay)
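
To make the pointer shape in `draw_mouse_pointer` concrete, the sketch below computes the same four polygon vertices without touching an image. `pointer_polygon` is a hypothetical name used only for illustration; the vertex formulas are copied from the function above.

```python
from typing import List, Tuple


def pointer_polygon(x: int, y: int, pointer_size: int = 20) -> List[Tuple[int, int]]:
    """Return the four vertices of the arrow-shaped pointer whose tip is at (x, y)."""
    half = pointer_size // 2
    return [
        (x, y),                        # tip of the pointer
        (x + pointer_size, y + half),  # far right corner
        (x + half, y + half),          # inner notch
        (x + half, y + pointer_size),  # bottom corner
    ]


print(pointer_polygon(100, 50))
```

The tip is the first vertex, so the pointer extends down and to the right of the clicked coordinate.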


def draw_arrowhead(draw, start, end, arrow_length=15, arrow_angle=30):
from math import atan2, cos, radians, sin

angle = atan2(end[1] - start[1], end[0] - start[0])
left = (
end[0] - arrow_length * cos(angle - radians(arrow_angle)),
end[1] - arrow_length * sin(angle - radians(arrow_angle)),
)
right = (
end[0] - arrow_length * cos(angle + radians(arrow_angle)),
end[1] - arrow_length * sin(angle + radians(arrow_angle)),
)
draw.line([end, left], fill="red", width=4)
draw.line([end, right], fill="red", width=4)
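
The arrowhead geometry can be checked numerically without drawing anything. This is a small sketch under the same formulas as `draw_arrowhead`; `arrowhead_points` is a hypothetical name that returns the two wing endpoints instead of drawing lines.

```python
from math import atan2, cos, hypot, radians, sin


def arrowhead_points(start, end, arrow_length=15, arrow_angle=30):
    """Return the two wing endpoints of an arrowhead located at `end`."""
    # Direction of the main line, measured in image coordinates (y grows downward).
    angle = atan2(end[1] - start[1], end[0] - start[0])
    spread = radians(arrow_angle)
    left = (
        end[0] - arrow_length * cos(angle - spread),
        end[1] - arrow_length * sin(angle - spread),
    )
    right = (
        end[0] - arrow_length * cos(angle + spread),
        end[1] - arrow_length * sin(angle + spread),
    )
    return left, right


# A horizontal drag from (0, 0) to (100, 0).
left, right = arrowhead_points((0, 0), (100, 0))
print(round(left[0], 2), round(left[1], 2))
print(round(right[0], 2), round(right[1], 2))
# Both wings are exactly arrow_length away from the end point.
print(round(hypot(left[0] - 100, left[1]), 6), round(hypot(right[0] - 100, right[1]), 6))
```

Each wing is the main direction rotated by plus or minus `arrow_angle` degrees and stepped back from the end point by `arrow_length`, so the two wings are mirror images across the line.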


def draw_click_indicator(image: Image.Image, x: int, y: int) -> Image.Image:
"""
Draws a click indicator (+ shape with disconnected lines) at (x, y) on the image.
Returns a new image with the click indicator drawn.

Args:
image: The image to draw the click indicator on.
x: The x coordinate for the click indicator.
y: The y coordinate for the click indicator.

Returns:
A new image with the click indicator drawn.
"""
line_length = 10 # Length of each line segment
gap = 4 # Gap from center point
line_width = 2 # Thickness of lines

overlay = image.convert("RGBA").copy()
draw = ImageDraw.Draw(overlay)

# Draw 4 lines forming a + shape with gaps in the center
# Each line has a white outline and black center for visibility on any background

# Top line
draw.line(
[(x, y - gap - line_length), (x, y - gap)], fill=(255, 255, 255, 200), width=line_width + 2
) # White outline
draw.line(
[(x, y - gap - line_length), (x, y - gap)], fill=(0, 0, 0, 255), width=line_width
) # Black center

# Bottom line
draw.line(
[(x, y + gap), (x, y + gap + line_length)], fill=(255, 255, 255, 200), width=line_width + 2
) # White outline
draw.line(
[(x, y + gap), (x, y + gap + line_length)], fill=(0, 0, 0, 255), width=line_width
) # Black center

# Left line
draw.line(
[(x - gap - line_length, y), (x - gap, y)], fill=(255, 255, 255, 200), width=line_width + 2
) # White outline
draw.line(
[(x - gap - line_length, y), (x - gap, y)], fill=(0, 0, 0, 255), width=line_width
) # Black center

# Right line
draw.line(
[(x + gap, y), (x + gap + line_length, y)], fill=(255, 255, 255, 200), width=line_width + 2
) # White outline
draw.line(
[(x + gap, y), (x + gap + line_length, y)], fill=(0, 0, 0, 255), width=line_width
) # Black center

return Image.alpha_composite(image.convert("RGBA"), overlay)
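
The four line segments of the click indicator follow one pattern: each starts `gap` pixels from the center and runs `line_length` pixels outward. The sketch below lists the segment endpoints only; `click_indicator_segments` is a hypothetical helper, not part of this PR.

```python
def click_indicator_segments(x, y, line_length=10, gap=4):
    """Return the endpoints of the four segments forming the '+' around (x, y)."""
    near, far = gap, gap + line_length
    return {
        "top": ((x, y - far), (x, y - near)),
        "bottom": ((x, y + near), (x, y + far)),
        "left": ((x - far, y), (x - near, y)),
        "right": ((x + near, y), (x + far, y)),
    }


for name, segment in click_indicator_segments(50, 50).items():
    print(name, segment)
```

In the function above each segment is drawn twice, first in white with `line_width + 2` and then in black with `line_width`, which gives the outlined look.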


def zoom_webpage(page: Page, zoom_factor: float = 1.5):
"""
Zooms the webpage to the specified zoom factor.

NOTE: Click actions that use a bid don't work properly when zoomed in.

Args:
page: The Playwright Page object.
zoom_factor: The zoom factor to apply (default is 1.5).

Returns:
Page: The modified Playwright Page object.

Raises:
ValueError: If zoom_factor is less than or equal to 0.
"""

if zoom_factor <= 0:
raise ValueError("Zoom factor must be greater than 0.")

page.evaluate(f"document.documentElement.style.zoom='{zoom_factor*100}%'")
return page
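
The script string that `zoom_webpage` sends to the browser can be inspected without launching Playwright. The sketch below is an assumption-laden illustration: `FakePage` is a hypothetical stand-in that only records what is passed to `evaluate`, and `zoom_script` is a hypothetical helper that builds the same string as the function above.

```python
class FakePage:
    """Hypothetical stand-in for a Playwright Page; it only records scripts."""

    def __init__(self):
        self.scripts = []

    def evaluate(self, script):
        self.scripts.append(script)


def zoom_script(zoom_factor: float) -> str:
    """Build the JavaScript snippet that sets the document zoom level."""
    if zoom_factor <= 0:
        raise ValueError("Zoom factor must be greater than 0.")
    return f"document.documentElement.style.zoom='{zoom_factor * 100}%'"


page = FakePage()
page.evaluate(zoom_script(1.5))
print(page.scripts[0])  # document.documentElement.style.zoom='150.0%'
```

Because the factor is a float, the percentage is rendered with a decimal part (`150.0%`).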


def parse_func_call_string(call_str: str) -> Tuple[Optional[str], Optional[Tuple[list, dict]]]:
"""
Parse a function call string and extract the function name and arguments.

Args:
call_str (str): A string like "mouse_click(100, 200)" or "mouse_drag_and_drop(x=10, y=20)"

Returns:
Tuple (func_name, (args, kwargs)), or (None, None) if parsing fails
"""
import ast

try:
tree = ast.parse(call_str.strip(), mode="eval")
if not isinstance(tree.body, ast.Call):
return None, None

call_node = tree.body

# Function name
if isinstance(call_node.func, ast.Name):
func_name = call_node.func.id
else:
return None, None

# Positional arguments
args = []
for arg in call_node.args:
try:
args.append(ast.literal_eval(arg))
except (ValueError, TypeError):
return None, None

# Keyword arguments
kwargs = {}
for kw in call_node.keywords:
try:
kwargs[kw.arg] = ast.literal_eval(kw.value)
except (ValueError, TypeError):
return None, None

return func_name, (args, kwargs)

except (SyntaxError, ValueError, TypeError):
return None, None
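
The behaviour of `parse_func_call_string` is easiest to see on a few inputs. The sketch below is a condensed re-implementation for illustration only (`parse_call` is a hypothetical name); it relies on `ast.parse` in `eval` mode and `ast.literal_eval`, the same standard-library calls used above.

```python
import ast


def parse_call(call_str: str):
    """Return (func_name, (args, kwargs)) or (None, None) if parsing fails."""
    try:
        node = ast.parse(call_str.strip(), mode="eval").body
        # Only plain calls such as f(...) are accepted, not obj.method(...).
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            return None, None
        # literal_eval only accepts literals, so names and expressions are rejected.
        args = [ast.literal_eval(arg) for arg in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return node.func.id, (args, kwargs)
    except (SyntaxError, ValueError, TypeError):
        return None, None


print(parse_call("mouse_drag_and_drop(10, 20, to_x=30.5, to_y=40)"))
print(parse_call("mouse_click(some_variable)"))  # not a literal argument
print(parse_call("page.click('a')"))             # attribute call, rejected
print(parse_call("not a call"))                  # syntax error
```

Using `ast.literal_eval` rather than `eval` means the action string is never executed; only literal arguments such as numbers and strings are converted.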
10 changes: 3 additions & 7 deletions src/agentlab/agents/dynamic_prompting.py
@@ -9,13 +9,9 @@
from warnings import warn

import bgym
+ from bgym import HighLevelActionSetArgs
from browsergym.core.action.base import AbstractActionSet
- from browsergym.utils.obs import (
- flatten_axtree_to_str,
- flatten_dom_to_str,
- overlay_som,
- prune_html,
- )
+ from browsergym.utils.obs import flatten_axtree_to_str, flatten_dom_to_str, overlay_som, prune_html

from agentlab.llm.llm_utils import (
BaseMessage,
@@ -99,7 +95,7 @@ class ObsFlags(Flags):

@dataclass
class ActionFlags(Flags):
- action_set: bgym.HighLevelActionSetArgs = None # should be set by the set_benchmark method
+ action_set: HighLevelActionSetArgs = None # should be set by the set_benchmark method
long_description: bool = True
individual_examples: bool = False
