feat(sweagent): add GenAI instrumentation, tox envs, and generated CI workflows by Cirilla-zmh · Pull Request #165 · alibaba/loongsuite-python-agent

Cirilla-zmh · 2026-04-14T14:44:31Z

Description

What changed

SWE-agent instrumentation (`loongsuite-instrumentation-sweagent`)

Introduces SweagentInstrumentor, which uses ExtendedTelemetryHandler from opentelemetry-util-genai to emit GenAI semantic spans for SWE-agent runs: application entry (enter_ai_application_system), invoke_agent, react step, and execute_tool (including sweagent_bash via DefaultAgent.handle_action so error paths stay covered).

Combined run hooks (on_instance_start, on_instance_completed) and combined agent hooks (on_run_start / on_run_done, on_step_start / on_step_done) are wrapped; thread-local state links instance/problem context where hooks do not pass it explicitly. execute_tool derives tool name and arguments from the model’s tool_calls payload when present (OpenAI-style function.name / function.arguments), with fallback to sweagent_bash when there are no native tool calls.
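The tool-name derivation described above can be sketched roughly as follows. This is a minimal illustration, not the code in `patch.py`: the function name `tool_name_and_arguments`, the shape of `step_output` as a plain mapping, and the `{"command": ...}` fallback payload are all assumptions made for the example. Only the OpenAI-style `function.name` / `function.arguments` layout and the `sweagent_bash` fallback name come from the description.

```python
import json
from typing import Any, Mapping, Optional, Tuple

FALLBACK_TOOL_NAME = "sweagent_bash"  # fallback name taken from the PR description


def tool_name_and_arguments(
    step_output: Mapping[str, Any], action: Optional[str] = None
) -> Tuple[str, Any]:
    """Return (tool_name, arguments) for an execute_tool span (hypothetical helper)."""
    tool_calls = step_output.get("tool_calls") or []
    if tool_calls:
        function = tool_calls[0].get("function") or {}
        name = function.get("name")
        raw_args = function.get("arguments")
        if name:
            # OpenAI-style payloads usually carry arguments as a JSON string.
            if isinstance(raw_args, str):
                try:
                    return name, json.loads(raw_args)
                except json.JSONDecodeError:
                    # Keep the raw string if it is not valid JSON.
                    return name, raw_args
            return name, raw_args
    # No native tool call: treat the raw action as a bash command (assumed shape).
    return FALLBACK_TOOL_NAME, {"command": action}
```

Calling it with an empty mapping and an action string such as `"ls -la"` takes the fallback path, while a payload with a `function` entry yields that function's name and decoded arguments.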

Packaging includes pyproject.toml, README, LICENSE, examples/basic_example.py, and dependency pins under tests/requirements.*.txt.

CI and tox

tox-loongsuite.ini: envs for py3{11,12,13}-test-loongsuite-instrumentation-sweagent-{oldest,latest} and lint-loongsuite-instrumentation-sweagent (SWE-agent requires Python ≥ 3.11); sweagent-latest uses pinned requirements as documented in tox comments.
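For reviewers who want to see which environments the generative name above expands to, here is a small sketch that enumerates them. It assumes the usual tox expansion of `py3{11,12,13}` into `py311`, `py312`, `py313`; the helper name `expected_envs` is made up for illustration and is not part of the PR.

```python
from itertools import product

PYTHON_FACTORS = ["py311", "py312", "py313"]
DEP_FACTORS = ["oldest", "latest"]
PACKAGE = "loongsuite-instrumentation-sweagent"


def expected_envs():
    """List the test and lint env names implied by the tox-loongsuite.ini entry."""
    test_envs = [
        f"{py}-test-{PACKAGE}-{dep}"
        for py, dep in product(PYTHON_FACTORS, DEP_FACTORS)
    ]
    return test_envs + [f"lint-{PACKAGE}"]


for env in expected_envs():
    print(env)
```

Any one of the printed names should be runnable locally with `tox -c tox-loongsuite.ini -e <env>`.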
.github/workflows/: loongsuite_lint_0.yml and loongsuite_test_0.yml are generated (see file header: tox -e generate-workflows).

Tests

instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py — span names, core gen_ai.* attributes, and parent hierarchy for the instrumented flows.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py — shared test fixtures.

Documentation

instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md — Unreleased notes for GenAI telemetry, dependency changes, and initial skeleton/tox.

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Unit tests

Does This PR Require a Core Repo Change?

No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated


Copilot

Pull request overview

Adds a new LoongSuite instrumentation package for SWE-agent that emits GenAI semantic spans via opentelemetry-util-genai, along with tox/CI wiring and a focused unit test suite.

Changes:

Introduces loongsuite-instrumentation-sweagent with SweagentInstrumentor and hook/method wrappers to generate entry / invoke_agent / react step / execute_tool spans.
Adds pytest fixtures + tests validating span names, core gen_ai.* attributes, and parent/child hierarchy.
Extends tox-loongsuite.ini and generated GitHub Actions workflows to run lint/tests for SWE-agent across Python 3.11–3.13.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tox-loongsuite.ini	Adds sweagent test/lint envs and dependency factors.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py	Verifies span emission, attributes, and nesting.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.oldest.txt	Oldest-pin test dependency set for sweagent instrumentation.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.latest.txt	Latest test dependency set (PyPI-pinned OTEL) for sweagent instrumentation.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py	Test fixtures + env setup for SWE-agent import/runtime expectations.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/version.py	Defines package version.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/patch.py	Implements SWE-agent hook/action wrappers using `ExtendedTelemetryHandler`.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/package.py	Declares instrumented dependency metadata.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/__init__.py	Implements `SweagentInstrumentor` and registers wrapper installation/removal.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/pyproject.toml	New package metadata, deps, and entrypoint.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/basic_example.py	Minimal usage example for local/manual validation.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/__init__.py	Marks examples as a package (empty `__init__`).
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/README.rst	Documents produced spans and usage/installation guidance.
instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md	Adds initial unreleased changelog entry for the new package.
.github/workflows/loongsuite_test_0.yml	Generated CI updates to run sweagent tests in matrix.
.github/workflows/loongsuite_lint_0.yml	Generated CI updates to lint sweagent instrumentation.


Copilot · 2026-04-17T00:34:45Z

+        inv = getattr(instance, "_loongsuite_react_invocation", None)
+        if inv is None:
+            return
+        step = kwargs.get("step")


wrap_combined_agent_hook_on_step_done only reads step from kwargs. If on_step_done is invoked positionally, finish_reason will never be set on the ReactStepInvocation. Please extract step from args when not present in kwargs.

Suggested change

-        step = kwargs.get("step")
+        step = kwargs.get("step")
+        if step is None and args:
+            step = args[0]

Copilot · 2026-04-17T00:34:46Z

+def test_entry_run_hooks_span(instrumented_sweagent, span_exporter):
+    hooks = CombinedRunHooks()
+    prob = MagicMock()
+    prob.id = "issue-42"
+    prob.get_problem_statement.return_value = "Fix the crash"
+
+    hooks.on_instance_start(index=0, env=MagicMock(), problem_statement=prob)
+    result = AgentRunResult(
+        info=AgentInfo(exit_status="Submitted"), trajectory=[]
+    )
+    hooks.on_instance_completed(result=result)
+


Current tests only exercise hook wrappers using keyword arguments, so they won't catch the positional-argument cases that the wrappers should support (on_instance_start/result, on_run_done info/trajectory, on_step_done step). Adding at least one test that calls these hook methods positionally would prevent regressions.
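One way such a test could look is sketched below. It exercises both call styles against a stand-in wrapper, because the real `CombinedRunHooks` and the wrappers in `patch.py` are not reproduced here: `FakeRunHooks`, `wrap_on_instance_completed`, and `check_both_call_styles` are hypothetical names used only for this illustration.

```python
from types import SimpleNamespace


def wrap_on_instance_completed(wrapped, instance, args, kwargs):
    """Stand-in wrapper: record the result however it was passed."""
    try:
        return wrapped(*args, **kwargs)
    finally:
        # Prefer the keyword, then fall back to the first positional argument.
        result = kwargs.get("result", args[0] if args else None)
        instance.seen.append(result)


class FakeRunHooks:
    """Hypothetical stand-in for CombinedRunHooks; not the SWE-agent class."""

    def __init__(self):
        self.seen = []

    def _original(self, result):
        return None

    def on_instance_completed(self, *args, **kwargs):
        return wrap_on_instance_completed(self._original, self, args, kwargs)


def check_both_call_styles():
    result = SimpleNamespace(info={"exit_status": "Submitted"}, trajectory=[])
    hooks = FakeRunHooks()
    hooks.on_instance_completed(result=result)  # keyword style, as in the current tests
    hooks.on_instance_completed(result)  # positional style, currently untested
    return hooks.seen == [result, result]
```

In the actual suite the same idea would probably be expressed as a parametrized pytest case over the two call styles, asserting on the exported span attributes rather than on a recorded list.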

Copilot · 2026-04-17T00:34:46Z

+def wrap_combined_run_hooks_on_instance_start(
+    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
+):
+    instance_id, body = _problem_statement_id_and_text(
+        kwargs.get("problem_statement")
+    )
+    inv = EntryInvocation(
+        session_id=str(instance_id) if instance_id is not None else None,
+        input_messages=[
+            InputMessage(role="user", parts=[Text(content=body or "(empty)")])
+        ],
+    )
+    handler.start_entry(inv)
+    setattr(instance, "_loongsuite_entry_invocation", inv)
+    _instance_tls.problem_statement = kwargs.get("problem_statement")
+    try:


wrap_combined_run_hooks_on_instance_start only reads problem_statement from kwargs. If SWE-agent calls this hook positionally (common for internal calls), the entry span session_id/input_messages and the thread-local link to agent hooks will be missing. Please extract problem_statement from args as well (similar to other wrappers in this repo that use args[0] fallback).
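A small shared helper could cover this fallback for all of the wrappers. The sketch below is an assumption about how it might be written: `arg_or_kwarg` is a made-up name, and the positional index of `problem_statement` (2, mirroring the `index`, `env`, `problem_statement` order used in the test call) would need to be checked against the real hook signature.

```python
from typing import Any, Mapping, Sequence


def arg_or_kwarg(
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    name: str,
    position: int,
    default: Any = None,
) -> Any:
    """Return a hook argument whether it was passed by keyword or by position."""
    if name in kwargs:
        return kwargs[name]
    if len(args) > position:
        return args[position]
    return default


# Keyword call: the value comes from kwargs.
by_keyword = arg_or_kwarg((), {"problem_statement": "ps"}, "problem_statement", 2)
# Positional call: the value comes from args at the assumed index.
by_position = arg_or_kwarg((0, "env", "ps"), {}, "problem_statement", 2)
```

With this in place the wrapper would call the helper once and reuse the value for both the entry invocation and the thread-local assignment, instead of calling `kwargs.get("problem_statement")` twice.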

Copilot · 2026-04-17T00:34:46Z

+def wrap_combined_run_hooks_on_instance_completed(
+    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
+):
+    try:
+        return wrapped(*args, **kwargs)
+    finally:
+        inv = getattr(instance, "_loongsuite_entry_invocation", None)
+        if inv is None:
+            return
+        result = kwargs.get("result")
+        summary = (
+            _build_entry_output_summary(result)
+            if result is not None
+            else "(no result)"
+        )


wrap_combined_run_hooks_on_instance_completed only reads result from kwargs. If the hook is invoked with a positional result argument, the entry span output summary will incorrectly fall back to "(no result)". Consider pulling result from args when kwargs doesn't contain it.

Copilot · 2026-04-17T00:34:46Z

+def wrap_combined_agent_hook_on_run_done(
+    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
+):
+    try:
+        return wrapped(*args, **kwargs)
+    finally:
+        inv = getattr(instance, "_loongsuite_invoke_invocation", None)
+        if inv is None:
+            return
+        # Same summary text as entry ``on_instance_completed`` (``AgentRunResult``-like).
+        result_like = SimpleNamespace(
+            info=kwargs.get("info"),
+            trajectory=kwargs.get("trajectory"),
+        )
+        summary = _build_entry_output_summary(result_like)


wrap_combined_agent_hook_on_run_done assumes info/trajectory are passed via kwargs. If CombinedAgentHook.on_run_done is called positionally, invoke_agent output summary and token/finish_reason extraction (via _apply_agent_info_to_invocation) will silently miss data. Please add args-based fallbacks for info/trajectory.
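When several parameters need a fallback at once, binding the call against the wrapped function's signature avoids hard-coding positions. The following is a sketch under assumptions: `bound_arguments` is a hypothetical helper, and the `on_run_done(trajectory, info)` parameter order is invented for the example rather than taken from SWE-agent.

```python
import inspect
from typing import Any, Callable, Dict, Mapping, Sequence


def bound_arguments(
    func: Callable[..., Any], args: Sequence[Any], kwargs: Mapping[str, Any]
) -> Dict[str, Any]:
    """Map a call's args/kwargs to parameter names using func's signature."""
    try:
        bound = inspect.signature(func).bind_partial(*args, **kwargs)
    except (TypeError, ValueError):
        # Signature unavailable or call does not match: fall back to kwargs only.
        return dict(kwargs)
    return dict(bound.arguments)


def on_run_done(trajectory, info):  # hypothetical signature, for illustration only
    return None


info = {"exit_status": "Submitted"}
positional = bound_arguments(on_run_done, (["step-1"], info), {})
keyword = bound_arguments(on_run_done, (), {"trajectory": ["step-1"], "info": info})
```

Both calls resolve to the same name-to-value mapping, so the wrapper can read `info` and `trajectory` from that mapping regardless of how the hook was invoked. Since wrapper functions receive the bound method as `wrapped`, `self` does not appear in the signature being bound.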

sipercai · 2026-04-24T03:17:59Z

Hi @Cirilla-zmh , thanks again for this PR.

main has moved forward and this PR is now a bit behind the latest base, so could you please rebase it onto the current main and rerun the checks when convenient?

Thanks!

Cirilla-zmh added 5 commits April 14, 2026 22:41

Initialize instrumentation of swe-agent

4a778f9

Change-Id: I876d6a26e1c7dcf15fb7ef3ebd02da2c6c5e8f54 Co-developed-by: Cursor <noreply@cursor.com>

feat(sweagent): derive execute_tool name and arguments from LLM tool_…

d54fc08

…calls Change-Id: Idb16cdc6aef2c3a8bec1a187c812d1971cdc7f43 Co-developed-by: Cursor <noreply@cursor.com>

feat(sweagent): tool_calls-aware execute_tool and entry-aligned invok…

a891150

…e_agent spans Change-Id: If5c17ac8090a245ae98c7025dcd4b4036de2a50d Co-developed-by: Cursor <noreply@cursor.com>

align latest genai util

3fe139e

Change-Id: I90919511ec19d7c980316096a7cef71705cfa612 Co-developed-by: Cursor <noreply@cursor.com>

generate workflows

e7bd71e

Change-Id: Ic3ed18ef1a227d00f0edaa7c54b8e772e854f01a Co-developed-by: Cursor <noreply@cursor.com>

Cirilla-zmh added the enhancement, instrumentaion, and genai labels Apr 14, 2026

github-actions Bot assigned 123liuziming, Cirilla-zmh and ralf0131 Apr 14, 2026

github-actions Bot requested review from 123liuziming and ralf0131 April 14, 2026 14:45

Cirilla-zmh added 2 commits April 14, 2026 22:50

add change log

84536ed

Change-Id: Ide6ef51bfed2ef26760dceca28509699cb08b65e Co-developed-by: Cursor <noreply@cursor.com>

fix version

6303cbd

Change-Id: Ie9235740bdb0c8bec886036af3977061aca8b7a6 Co-developed-by: Cursor <noreply@cursor.com>

ralf0131 requested a review from Copilot April 17, 2026 00:30

Copilot started reviewing on behalf of ralf0131 April 17, 2026 00:30

Copilot AI reviewed Apr 17, 2026

Cirilla-zmh wants to merge 7 commits into alibaba:main from Cirilla-zmh:feat/swe-agent