@amanjaiswal73892
Collaborator

@amanjaiswal73892 amanjaiswal73892 commented Aug 13, 2025

This pull request adds support for the OpenAI GPT-5 models "gpt-5-mini" and "gpt-5-nano" across the codebase and updates the agent and tool-use configurations to use them. It also moves dependency management to pyproject.toml with uv for installs, adds a minimal benchmark setup utility for MiniWob++, and switches the GitHub Actions workflows to uv for Python environment management, which is intended to make CI runs faster and more reproducible.

Major changes:

GPT-5 Model Integration

  • Added AGENT_GPT5_MINI and corresponding configuration to agent_configs.py, and registered it in __init__.py for generic agents. This agent uses the new GPT-5-mini model and sets appropriate action flags. [1] [2] [3]
  • Added GPT-5 model arguments (GPT_5_mini, GPT_5_nano) to tool_use_agent.py, and updated default tool-use agent (OAI_AGENT) to use GPT-5-mini. Also added agent configs for both GPT-5-mini and GPT-5-nano. [1] [2]
  • Extended CHAT_MODEL_ARGS_DICT in llm_configs.py to include OpenAI GPT-5-mini and GPT-5-nano with appropriate token limits and temperature settings. [1] [2]
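
As a rough illustration of the registry pattern described in the bullets above, the sketch below keeps per-model arguments in a dictionary keyed by model name. The `ModelArgs` dataclass, its field names, the dictionary keys, and every numeric value are placeholders invented for this example; they are not the actual AgentLab classes or the limits set in llm_configs.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgs:
    """Hypothetical stand-in for a chat-model argument class."""

    model_name: str
    max_total_tokens: int
    max_input_tokens: int
    max_new_tokens: int
    temperature: float


# Placeholder registry: keys and token limits are illustrative only,
# not the values registered in the real CHAT_MODEL_ARGS_DICT.
CHAT_MODEL_ARGS_SKETCH = {
    "openai/gpt-5-mini": ModelArgs(
        model_name="gpt-5-mini",
        max_total_tokens=128_000,
        max_input_tokens=120_000,
        max_new_tokens=8_000,
        temperature=1.0,
    ),
    "openai/gpt-5-nano": ModelArgs(
        model_name="gpt-5-nano",
        max_total_tokens=128_000,
        max_input_tokens=120_000,
        max_new_tokens=8_000,
        temperature=1.0,
    ),
}


def get_model_args(key: str) -> ModelArgs:
    """Look up a registered model, failing loudly on unknown keys."""
    try:
        return CHAT_MODEL_ARGS_SKETCH[key]
    except KeyError:
        known = ", ".join(sorted(CHAT_MODEL_ARGS_SKETCH))
        raise KeyError(f"Unknown model {key!r}; known models: {known}") from None
```

Keeping the registry as plain data means an agent config only needs to reference a key, so adding a model is a one-entry change.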

Dependency and Packaging Modernization

  • Moved all core and development dependencies into pyproject.toml under [project] and [project.optional-dependencies], removing dynamic dependency loading from requirements files. [1] [2] [3]
  • Updated docs/source/requirements.txt to only include documentation-specific dependencies, and updated .readthedocs.yaml to use the new dependency structure. [1] [2]

CI/CD Workflow Improvements

  • Refactored GitHub Actions workflows (unit_tests.yml, code_format.yml, python_version_compatibility.yml) to use uv for Python installation, dependency syncing, and running commands, replacing pip and improving caching and reproducibility. [1] [2] [3] [4]
  • Standardized package listing and script invocation in CI to use uv pip list and uv run .... [1] [2]

Benchmark Setup Utility

  • Added src/agentlab/benchmarks/setup_benchmark.py, a minimal helper for setting up MiniWob++ benchmarks by cloning the repo at a pinned commit and updating .env with the local URL.
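
The two steps named above (clone at a pinned commit, then record the local URL in .env) could look roughly like the sketch below. The function names are hypothetical and this is not the code in setup_benchmark.py; the repository URL, commit hash, and environment variable name are left as caller-supplied arguments because the description does not state them.

```python
import subprocess
from pathlib import Path


def clone_at_commit(repo_url: str, commit: str, dest: Path) -> None:
    """Clone repo_url into dest (unless it already exists), then check out commit."""
    if not dest.exists():
        subprocess.run(["git", "clone", repo_url, str(dest)], check=True)
    subprocess.run(["git", "-C", str(dest), "checkout", commit], check=True)


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set key=value in a .env file, replacing any existing entry for key."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    # Drop earlier assignments of the same key so the file keeps one entry.
    kept = [ln for ln in lines if not ln.strip().startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    env_path.write_text("\n".join(kept) + "\n")
```

A caller would run `clone_at_commit(...)` first and then pass a `file://` URL pointing into the clone to `upsert_env_var(Path(".env"), ...)`. Pinning the commit keeps the benchmark files identical across machines, and rewriting only the one key leaves other entries in .env (such as API keys) untouched.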

Other Notable Changes

  • Updated the OpenAI API call in chat_api.py to use max_completion_tokens (for newer OpenAI models) instead of max_tokens.
  • Minor notebook and import cleanups for consistency and kernel/version updates. [1] [2] [3] [4]
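
For the max_tokens to max_completion_tokens change, a minimal sketch of how request arguments might be assembled is shown below. `build_chat_request` and its `use_legacy_param` flag are hypothetical helpers for illustration, not functions from chat_api.py; only the two parameter names come from the OpenAI Chat Completions API.

```python
def build_chat_request(
    model: str,
    messages: list[dict],
    max_new_tokens: int,
    use_legacy_param: bool = False,
) -> dict:
    """Build keyword arguments for a chat completion call.

    Newer OpenAI models expect `max_completion_tokens`; the older
    `max_tokens` name is kept behind a flag for backends that still need it.
    """
    request = {"model": model, "messages": messages}
    if use_legacy_param:
        request["max_tokens"] = max_new_tokens
    else:
        request["max_completion_tokens"] = max_new_tokens
    return request
```

The resulting dictionary can be unpacked into the official client call, for example `client.chat.completions.create(**request)`. Sending exactly one of the two limits avoids ambiguous requests.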


Description by Korbit AI

What change is being made?

Add tutorial documentation, integrate GPT-5 models, and improve the installation process using uv for dependency management and environment setup.

Why are these changes being made?

These changes give users step-by-step tutorials for launching agents and evaluating on MiniWob, add the new GPT-5 models, and streamline dependency installation and management. uv replaces manual Python setup steps and pip commands with a simpler, standardized installation process, while the new models and security-focused tutorials expand the project's capabilities.


recursix and others added 30 commits August 8, 2025 14:00
recursix and others added 14 commits August 12, 2025 11:44
@korbit-ai

korbit-ai bot commented Aug 13, 2025

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

@amanjaiswal73892 amanjaiswal73892 merged commit 6522057 into main Aug 13, 2025
6 checks passed
@amanjaiswal73892 amanjaiswal73892 deleted the tutorial branch August 13, 2025 16:00
@amanjaiswal73892 amanjaiswal73892 restored the tutorial branch August 13, 2025 16:03
amanjaiswal73892 added a commit that referenced this pull request Aug 20, 2025
* tutorial

* Update readme to include test note

* update toml to dynamic requirements and add uv.lock file

* Add tutorial to setup python env with uv

* tutorial 2

* Update dependencies in pyproject.toml and uv.lock to allow for newer versions of torch and add anthropic

* Implement code changes to enhance functionality and improve performance

* Fix tutorial instructions by moving git clone and cd commands to the correct section

* Refactor tutorial content and remove commented-out dependencies in pyproject.toml

* add instruction to activate the env

* Add support for GPT-5 models and update tutorial instructions

* Update OpenAI API Key instructions in tutorial

* Refactor tutorial headings for consistency and clarity

* add oai oss and gpt-5 models

* Update deprecated param `max_tokens` -> `max_completion_tokens` in chat_api

* add OpenRouter versions of gpt 5 model series.

* port o3 model to openrouter

* update response api test

* remove deprecated o1-mini model from main.py

* Add GPT-5-nano in tool-use-agent

* fix GPT 5 mini and nano config

* Add litellm pricing as a backup pricing backend.

* Add GPT-5 mini agent

* Add GPT-5-Mini to agentlab-assistant.

* Add initial readme for prompt injection tutorial

* add ipykernel and dot_env to dependencies

* add notebook to setup miniwob and launch experiments.

* update formatting in launch_experiments.ipynb

* update readme in 2_eval_on_miniwob

* update readme for 2_eval_on_miniwob and grammar fix.

* grammar fix readme tutorial 2.

* Add prompt injection tutorials and update attack scenarios

- Created new attack scenario in `attack_2.txt` to simulate identity verification prompts for agents and digital assistants.
- Added detailed instructions and observations in `prompt_0.txt` for listing reviewers mentioning small ear cups.
- Introduced `prompt_2.txt` to track food-related shopping expenses for March 2023, including comprehensive action space and interaction history.

* update T1 readme with a note to install additional playwright deps.

* Update readme.md

* Update readme.md

* Update readme.md

* clear output

* add miniwob automatic install in agentlab.

* update experiment.py to include miniwob auto-install and envars export in T2

* black refactor agent-config.py

* Add cmd to checkout tutorial branch

* remove launch_experiment notebook from T2

* minor fixes in T1 readme and spell check.

* update CI/CD to use uv

* Implement code changes to enhance functionality and improve performance

* Update README and experiment script for clarity and consistency

* Fix stale tests.

* fix stale test

* add darglint as dev dependency

* update CI/CD to apply formatting only to src.

* update darglint to be run from py3.12

---------

Co-authored-by: recursix <[email protected]>