feat(sweagent): add GenAI instrumentation, tox envs, and generated CI workflows#165
Cirilla-zmh wants to merge 7 commits into alibaba:main from
Change-Id: I876d6a26e1c7dcf15fb7ef3ebd02da2c6c5e8f54 Co-developed-by: Cursor <noreply@cursor.com>
…calls Change-Id: Idb16cdc6aef2c3a8bec1a187c812d1971cdc7f43 Co-developed-by: Cursor <noreply@cursor.com>
…e_agent spans Change-Id: If5c17ac8090a245ae98c7025dcd4b4036de2a50d Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: I90919511ec19d7c980316096a7cef71705cfa612 Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ic3ed18ef1a227d00f0edaa7c54b8e772e854f01a Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ide6ef51bfed2ef26760dceca28509699cb08b65e Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ie9235740bdb0c8bec886036af3977061aca8b7a6 Co-developed-by: Cursor <noreply@cursor.com>
Pull request overview
Adds a new LoongSuite instrumentation package for SWE-agent that emits GenAI semantic spans via opentelemetry-util-genai, along with tox/CI wiring and a focused unit test suite.
Changes:
- Introduces `loongsuite-instrumentation-sweagent` with `SweagentInstrumentor` and hook/method wrappers to generate entry / invoke_agent / react step / execute_tool spans.
- Adds pytest fixtures + tests validating span names, core `gen_ai.*` attributes, and parent/child hierarchy.
- Extends `tox-loongsuite.ini` and generated GitHub Actions workflows to run lint/tests for SWE-agent across Python 3.11–3.13.
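For the `execute_tool` spans in the list above, the PR description states that the tool name and arguments come from the model's OpenAI-style `tool_calls` payload (`function.name`/`function.arguments`), with a fallback to `sweagent_bash` when there are no native tool calls. A minimal sketch of that selection rule, assuming dict-shaped tool calls and that only the first call matters; `tool_name_and_arguments` and the `view_file` tool are hypothetical names, and the sketch leaves the JSON `arguments` string undecoded:

```python
from typing import Any, Dict, List, Optional, Tuple

FALLBACK_TOOL_NAME = "sweagent_bash"


def tool_name_and_arguments(
    tool_calls: Optional[List[Dict[str, Any]]], action: str
) -> Tuple[str, Any]:
    """Choose the execute_tool span's tool name and arguments (sketch).

    Assumes OpenAI-style entries shaped like
    {"type": "function", "function": {"name": ..., "arguments": "..."}}
    and inspects only the first call. Without native tool calls, the raw
    action text is reported under the fallback tool name.
    """
    if tool_calls:
        function = tool_calls[0].get("function") or {}
        name = function.get("name") or FALLBACK_TOOL_NAME
        return name, function.get("arguments")
    return FALLBACK_TOOL_NAME, action


# Native tool call: name and arguments come from the model's payload.
calls = [
    {
        "type": "function",
        "function": {"name": "view_file", "arguments": '{"path": "/repo"}'},
    }
]
print(tool_name_and_arguments(calls, "ignored"))
# No native tool calls: the action string is reported as a bash call.
print(tool_name_and_arguments(None, "ls -la"))
```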
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tox-loongsuite.ini | Adds sweagent test/lint envs and dependency factors. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py | Verifies span emission, attributes, and nesting. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.oldest.txt | Oldest-pin test dependency set for sweagent instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.latest.txt | Latest test dependency set (PyPI-pinned OTEL) for sweagent instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py | Test fixtures + env setup for SWE-agent import/runtime expectations. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/version.py | Defines package version. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/patch.py | Implements SWE-agent hook/action wrappers using ExtendedTelemetryHandler. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/package.py | Declares instrumented dependency metadata. |
| `instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/__init__.py` | Implements SweagentInstrumentor and registers wrapper installation/removal. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/pyproject.toml | New package metadata, deps, and entrypoint. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/basic_example.py | Minimal usage example for local/manual validation. |
| `instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/__init__.py` | Marks examples as a package (empty `__init__.py`). |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/README.rst | Documents produced spans and usage/installation guidance. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md | Adds initial unreleased changelog entry for the new package. |
| .github/workflows/loongsuite_test_0.yml | Generated CI updates to run sweagent tests in matrix. |
| .github/workflows/loongsuite_lint_0.yml | Generated CI updates to lint sweagent instrumentation. |
```python
inv = getattr(instance, "_loongsuite_react_invocation", None)
if inv is None:
    return
step = kwargs.get("step")
```
`wrap_combined_agent_hook_on_step_done` only reads `step` from `kwargs`. If `on_step_done` is invoked positionally, `finish_reason` will never be set on the `ReactStepInvocation`. Please extract `step` from `args` when it is not present in `kwargs`.
Suggested change:

```python
step = kwargs.get("step")
if step is None and args:
    step = args[0]
```
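The same lookup is needed by several wrappers flagged in this review. A minimal sketch of a shared helper, assuming the `(wrapped, instance, args, kwargs)` calling convention the wrappers above receive; `hook_arg` is a hypothetical name, not part of the PR, and the caller must supply the index matching the hook's real positional order:

```python
from typing import Any, Mapping, Sequence


def hook_arg(
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    name: str,
    index: int,
    default: Any = None,
) -> Any:
    """Return a hook argument whether it was passed by keyword or position.

    Hypothetical helper: ``index`` is the argument's position in ``args``
    (``self`` excluded, because the wrapper receives it as ``instance``).
    """
    if name in kwargs:
        return kwargs[name]
    if 0 <= index < len(args):
        return args[index]
    return default


# Usage inside a wrapper such as wrap_combined_agent_hook_on_step_done:
#     step = hook_arg(args, kwargs, "step", 0)
```

Unlike `kwargs.get("step")` followed by an `is None` check, testing `name in kwargs` keeps an explicit `step=None` keyword from being overridden by an unrelated positional value.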
```python
def test_entry_run_hooks_span(instrumented_sweagent, span_exporter):
    hooks = CombinedRunHooks()
    prob = MagicMock()
    prob.id = "issue-42"
    prob.get_problem_statement.return_value = "Fix the crash"

    hooks.on_instance_start(index=0, env=MagicMock(), problem_statement=prob)
    result = AgentRunResult(
        info=AgentInfo(exit_status="Submitted"), trajectory=[]
    )
    hooks.on_instance_completed(result=result)
```
Current tests only exercise the hook wrappers with keyword arguments, so they won't catch the positional-argument cases the wrappers should support (`problem_statement` and `result` for `on_instance_start`/`on_instance_completed`, `info`/`trajectory` for `on_run_done`, `step` for `on_step_done`). Adding at least one test that calls these hook methods positionally would prevent regressions.
```python
def wrap_combined_run_hooks_on_instance_start(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    instance_id, body = _problem_statement_id_and_text(
        kwargs.get("problem_statement")
    )
    inv = EntryInvocation(
        session_id=str(instance_id) if instance_id is not None else None,
        input_messages=[
            InputMessage(role="user", parts=[Text(content=body or "(empty)")])
        ],
    )
    handler.start_entry(inv)
    setattr(instance, "_loongsuite_entry_invocation", inv)
    _instance_tls.problem_statement = kwargs.get("problem_statement")
    try:
```
`wrap_combined_run_hooks_on_instance_start` only reads `problem_statement` from `kwargs`. If SWE-agent calls this hook positionally (common for internal calls), the entry span's `session_id`/`input_messages` and the thread-local link to the agent hooks will be missing. Please extract `problem_statement` from `args` as well (similar to other wrappers in this repo that use an `args[0]` fallback).
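An `args[0]` fallback is only correct when the parameter comes first. The test excerpt above calls `on_instance_start(index=..., env=..., problem_statement=...)`, so if the signature keeps that order, `problem_statement` would be the third positional argument. A sketch that avoids hard-coding positions by binding the call against the wrapped callable's signature with `inspect.signature(...).bind_partial`; `bound_hook_arg` and `DemoRunHooks` are hypothetical, and the sketch assumes the wrapped object is an ordinary bound method whose signature `inspect` can read:

```python
import inspect
from typing import Any, Callable, Mapping, Sequence


def bound_hook_arg(
    wrapped: Callable[..., Any],
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    name: str,
    default: Any = None,
) -> Any:
    """Return argument ``name`` regardless of how the caller passed it."""
    try:
        # For a bound method, the signature already excludes ``self``.
        bound = inspect.signature(wrapped).bind_partial(*args, **kwargs)
    except (TypeError, ValueError):
        # No introspectable signature, or the call does not fit it.
        return kwargs.get(name, default)
    return bound.arguments.get(name, default)


class DemoRunHooks:
    """Hypothetical hook using the parameter order seen in the tests."""

    def on_instance_start(self, index, env, problem_statement):
        pass


hooks = DemoRunHooks()
by_keyword = bound_hook_arg(
    hooks.on_instance_start,
    (),
    {"index": 0, "env": None, "problem_statement": "ps"},
    "problem_statement",
)
by_position = bound_hook_arg(
    hooks.on_instance_start, (0, None, "ps"), {}, "problem_statement"
)
```

Binding costs one signature lookup per call; caching the signature per hook method would be a reasonable refinement if that overhead matters.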
```python
def wrap_combined_run_hooks_on_instance_completed(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    try:
        return wrapped(*args, **kwargs)
    finally:
        inv = getattr(instance, "_loongsuite_entry_invocation", None)
        if inv is None:
            return
        result = kwargs.get("result")
        summary = (
            _build_entry_output_summary(result)
            if result is not None
            else "(no result)"
        )
```
`wrap_combined_run_hooks_on_instance_completed` only reads `result` from `kwargs`. If the hook is invoked with a positional `result` argument, the entry span's output summary will incorrectly fall back to `"(no result)"`. Consider pulling `result` from `args` when `kwargs` doesn't contain it.
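Separately from argument lookup, the excerpt above returns from inside its `finally:` block when no invocation is stored on the instance. In Python, a `return` in `finally` discards any exception raised in the `try` body, so in that case a failing hook would be silently swallowed. A small self-contained demonstration, with hypothetical function names, next to a variant that only guards its bookkeeping:

```python
def swallowing_wrapper(wrapped, state):
    """Mirrors the excerpt: ``return`` inside ``finally`` when state is empty."""
    try:
        return wrapped()
    finally:
        if state.get("inv") is None:
            return  # discards any exception raised by wrapped()
        state["closed"] = True


def propagating_wrapper(wrapped, state):
    """Same bookkeeping, but the finally block never returns."""
    try:
        return wrapped()
    finally:
        if state.get("inv") is not None:
            state["closed"] = True


def failing_hook():
    raise RuntimeError("hook failed")


print(swallowing_wrapper(failing_hook, {}))  # prints None: the error is lost
try:
    propagating_wrapper(failing_hook, {})
except RuntimeError as exc:
    print(f"propagated: {exc}")
```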
```python
def wrap_combined_agent_hook_on_run_done(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    try:
        return wrapped(*args, **kwargs)
    finally:
        inv = getattr(instance, "_loongsuite_invoke_invocation", None)
        if inv is None:
            return
        # Same summary text as entry ``on_instance_completed`` (``AgentRunResult``-like).
        result_like = SimpleNamespace(
            info=kwargs.get("info"),
            trajectory=kwargs.get("trajectory"),
        )
        summary = _build_entry_output_summary(result_like)
```
`wrap_combined_agent_hook_on_run_done` assumes `info`/`trajectory` are passed via `kwargs`. If `CombinedAgentHook.on_run_done` is called positionally, the `invoke_agent` output summary and the token/`finish_reason` extraction (via `_apply_agent_info_to_invocation`) will silently miss data. Please add `args`-based fallbacks for `info`/`trajectory`.
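A minimal sketch of those fallbacks that keeps the duck-typed `SimpleNamespace` adapter from the excerpt, so one summary function can serve both `AgentRunResult` objects and loose `on_run_done` arguments. `result_like_from_call`, its `param_order` parameter, and `summarize` are hypothetical; the positional order shown is an assumption to be checked against the installed SWE-agent version, and `summarize` assumes `info` is dict-like with an `exit_status` key (as `AgentInfo(exit_status="Submitted")` in the test excerpt suggests):

```python
from types import SimpleNamespace
from typing import Any, Mapping, Sequence


def result_like_from_call(
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    param_order: Sequence[str],
) -> SimpleNamespace:
    """Collect on_run_done's info/trajectory however they were passed.

    ``param_order`` names the hook's positional parameters (``self``
    excluded); it is an assumption the caller must keep in sync with SWE-agent.
    """
    values = dict(zip(param_order, args))  # positional arguments first
    values.update(kwargs)  # keyword arguments take precedence
    return SimpleNamespace(
        info=values.get("info"), trajectory=values.get("trajectory")
    )


def summarize(result_like: Any) -> str:
    """Hypothetical stand-in for _build_entry_output_summary."""
    info = getattr(result_like, "info", None) or {}
    trajectory = getattr(result_like, "trajectory", None) or []
    status = info.get("exit_status", "unknown")
    return f"exit_status={status}, steps={len(trajectory)}"


order = ("trajectory", "info")  # hypothetical positional order
positional = result_like_from_call(
    ([{}, {}], {"exit_status": "Submitted"}), {}, order
)
print(summarize(positional))
```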
Hi @Cirilla-zmh, thanks again for this PR.
Thanks!
Description
What changed
SWE-agent instrumentation (`loongsuite-instrumentation-sweagent`)

Introduces `SweagentInstrumentor`, which uses `ExtendedTelemetryHandler` from `opentelemetry-util-genai` to emit GenAI semantic spans for SWE-agent runs: application entry (`enter_ai_application_system`), `invoke_agent`, react step, and `execute_tool` (including `sweagent_bash` via `DefaultAgent.handle_action`, so error paths stay covered).

Combined run hooks (`on_instance_start`, `on_instance_completed`) and combined agent hooks (`on_run_start`/`on_run_done`, `on_step_start`/`on_step_done`) are wrapped; thread-local state links instance/problem context where hooks do not pass it explicitly. `execute_tool` derives the tool name and arguments from the model’s `tool_calls` payload when present (OpenAI-style `function.name`/`function.arguments`), falling back to `sweagent_bash` when there are no native tool calls.

Packaging includes `pyproject.toml`, README, LICENSE, `examples/basic_example.py`, and dependency pins under `tests/requirements.*.txt`.

CI and tox
- `tox-loongsuite.ini`: envs for `py3{11,12,13}-test-loongsuite-instrumentation-sweagent-{oldest,latest}` and `lint-loongsuite-instrumentation-sweagent` (SWE-agent requires Python ≥ 3.11); `sweagent-latest` uses pinned requirements, as documented in the tox comments.
- `.github/workflows/`: `loongsuite_lint_0.yml` and `loongsuite_test_0.yml` are generated (see the file header: `tox -e generate-workflows`).

Tests
- `instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py`: span names, core `gen_ai.*` attributes, and parent hierarchy for the instrumented flows.
- `instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py`: shared test fixtures.

Documentation
- `instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md`: Unreleased notes for GenAI telemetry, dependency changes, and the initial skeleton/tox setup.

Type of change
How Has This Been Tested?
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.