
feat(sweagent): add GenAI instrumentation, tox envs, and generated CI workflows #165

Open
Cirilla-zmh wants to merge 7 commits into alibaba:main from Cirilla-zmh:feat/swe-agent

Conversation

@Cirilla-zmh
Collaborator

Description

What changed

SWE-agent instrumentation (loongsuite-instrumentation-sweagent)

Introduces SweagentInstrumentor, which uses ExtendedTelemetryHandler from opentelemetry-util-genai to emit GenAI semantic spans for SWE-agent runs: application entry (enter_ai_application_system), invoke_agent, react step, and execute_tool (including sweagent_bash via DefaultAgent.handle_action so error paths stay covered).

Combined run hooks (on_instance_start, on_instance_completed) and combined agent hooks (on_run_start / on_run_done, on_step_start / on_step_done) are wrapped; thread-local state links instance/problem context where hooks do not pass it explicitly. execute_tool derives tool name and arguments from the model’s tool_calls payload when present (OpenAI-style function.name / function.arguments), with fallback to sweagent_bash when there are no native tool calls.
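The tool-name derivation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `tool_name_and_arguments`, the `step_output` dict, and its `"tool_calls"` and `"action"` keys are hypothetical stand-ins. Only the OpenAI-style `function.name` / `function.arguments` layout and the `sweagent_bash` fallback come from the description.

```python
import json
from typing import Any

# Fallback tool name, as named in the PR description.
FALLBACK_TOOL_NAME = "sweagent_bash"


def tool_name_and_arguments(step_output: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (tool name, arguments) for one agent step.

    `step_output` and its "tool_calls" / "action" keys are hypothetical;
    the real instrumentation may read these values from other objects.
    """
    tool_calls = step_output.get("tool_calls") or []
    if tool_calls:
        function = tool_calls[0].get("function", {})
        name = function.get("name") or FALLBACK_TOOL_NAME
        raw = function.get("arguments") or "{}"
        try:
            # OpenAI-style payloads carry arguments as a JSON-encoded string.
            parsed = json.loads(raw) if isinstance(raw, str) else dict(raw)
        except (TypeError, ValueError):
            parsed = None
        if not isinstance(parsed, dict):
            # Keep unparseable or non-object arguments verbatim.
            parsed = {"raw": raw}
        return name, parsed
    # No native tool calls: treat the raw action text as a bash command.
    return FALLBACK_TOOL_NAME, {"command": step_output.get("action", "")}
```

With this sketch, a step that has no `tool_calls` entry is reported under `sweagent_bash`, and malformed argument strings are preserved rather than dropped.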

Packaging includes pyproject.toml, README, LICENSE, examples/basic_example.py, and dependency pins under tests/requirements.*.txt.

CI and tox

  • tox-loongsuite.ini: envs for py3{11,12,13}-test-loongsuite-instrumentation-sweagent-{oldest,latest} and lint-loongsuite-instrumentation-sweagent (SWE-agent requires Python ≥ 3.11); sweagent-latest uses pinned requirements as documented in tox comments.
  • .github/workflows/: loongsuite_lint_0.yml and loongsuite_test_0.yml are generated (see file header: tox -e generate-workflows).

Tests

  • instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py — span names, core gen_ai.* attributes, and parent hierarchy for the instrumented flows.
  • instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py — shared test fixtures.

Documentation

  • instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md — Unreleased notes for GenAI telemetry, dependency changes, and initial skeleton/tox.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Unit tests

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Change-Id: I876d6a26e1c7dcf15fb7ef3ebd02da2c6c5e8f54
Co-developed-by: Cursor <noreply@cursor.com>
…calls

Change-Id: Idb16cdc6aef2c3a8bec1a187c812d1971cdc7f43
Co-developed-by: Cursor <noreply@cursor.com>
…e_agent spans

Change-Id: If5c17ac8090a245ae98c7025dcd4b4036de2a50d
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: I90919511ec19d7c980316096a7cef71705cfa612
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ic3ed18ef1a227d00f0edaa7c54b8e772e854f01a
Co-developed-by: Cursor <noreply@cursor.com>
@Cirilla-zmh added the enhancement, instrumentaion, and genai labels Apr 14, 2026
Change-Id: Ide6ef51bfed2ef26760dceca28509699cb08b65e
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ie9235740bdb0c8bec886036af3977061aca8b7a6
Co-developed-by: Cursor <noreply@cursor.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a new LoongSuite instrumentation package for SWE-agent that emits GenAI semantic spans via opentelemetry-util-genai, along with tox/CI wiring and a focused unit test suite.

Changes:

  • Introduces loongsuite-instrumentation-sweagent with SweagentInstrumentor and hook/method wrappers to generate entry / invoke_agent / react step / execute_tool spans.
  • Adds pytest fixtures + tests validating span names, core gen_ai.* attributes, and parent/child hierarchy.
  • Extends tox-loongsuite.ini and generated GitHub Actions workflows to run lint/tests for SWE-agent across Python 3.11–3.13.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tox-loongsuite.ini | Adds sweagent test/lint envs and dependency factors. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/test_spans.py | Verifies span emission, attributes, and nesting. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.oldest.txt | Oldest-pin test dependency set for sweagent instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/requirements.latest.txt | Latest test dependency set (PyPI-pinned OTEL) for sweagent instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/tests/conftest.py | Test fixtures + env setup for SWE-agent import/runtime expectations. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/version.py | Defines package version. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/patch.py | Implements SWE-agent hook/action wrappers using ExtendedTelemetryHandler. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/package.py | Declares instrumented dependency metadata. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/src/opentelemetry/instrumentation/sweagent/__init__.py | Implements SweagentInstrumentor and registers wrapper installation/removal. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/pyproject.toml | New package metadata, deps, and entrypoint. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/basic_example.py | Minimal usage example for local/manual validation. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/examples/__init__.py | Marks examples as a package (empty init). |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/README.rst | Documents produced spans and usage/installation guidance. |
| instrumentation-loongsuite/loongsuite-instrumentation-sweagent/CHANGELOG.md | Adds initial unreleased changelog entry for the new package. |
| .github/workflows/loongsuite_test_0.yml | Generated CI updates to run sweagent tests in matrix. |
| .github/workflows/loongsuite_lint_0.yml | Generated CI updates to lint sweagent instrumentation. |

inv = getattr(instance, "_loongsuite_react_invocation", None)
if inv is None:
    return
step = kwargs.get("step")

Copilot AI Apr 17, 2026

wrap_combined_agent_hook_on_step_done only reads step from kwargs. If on_step_done is invoked positionally, finish_reason will never be set on the ReactStepInvocation. Please extract step from args when not present in kwargs.

Suggested change
- step = kwargs.get("step")
+ step = kwargs.get("step")
+ if step is None and args:
+     step = args[0]

Comment on lines +125 to +136
def test_entry_run_hooks_span(instrumented_sweagent, span_exporter):
    hooks = CombinedRunHooks()
    prob = MagicMock()
    prob.id = "issue-42"
    prob.get_problem_statement.return_value = "Fix the crash"

    hooks.on_instance_start(index=0, env=MagicMock(), problem_statement=prob)
    result = AgentRunResult(
        info=AgentInfo(exit_status="Submitted"), trajectory=[]
    )
    hooks.on_instance_completed(result=result)

Copilot AI Apr 17, 2026

Current tests only exercise hook wrappers using keyword arguments, so they won't catch the positional-argument cases that the wrappers should support (on_instance_start/result, on_run_done info/trajectory, on_step_done step). Adding at least one test that calls these hook methods positionally would prevent regressions.
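As a rough illustration of why such a test is useful, the sketch below runs a keyword-only lookup against both call styles and reports which ones lose the argument. Everything here is a hypothetical stand-in (`kwargs_only_lookup`, `CALL_STYLES`, `styles_missing_result`); it does not import SWE-agent or the instrumentation under review.

```python
def kwargs_only_lookup(args, kwargs):
    # Mirrors the lookup style in the quoted wrappers: keyword arguments only.
    return kwargs.get("result")


# Two ways a caller might pass the same value to a hook.
CALL_STYLES = {
    "keyword": ((), {"result": "R"}),
    "positional": (("R",), {}),
}


def styles_missing_result(lookup):
    """Return the names of the call styles for which `lookup` misses the value."""
    return [
        name
        for name, (args, kwargs) in CALL_STYLES.items()
        if lookup(args, kwargs) != "R"
    ]
```

In a pytest suite the same idea would typically be expressed with `@pytest.mark.parametrize` over the call styles, asserting that the emitted span carries the expected attributes in each case.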

Comment on lines +245 to +260
def wrap_combined_run_hooks_on_instance_start(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    instance_id, body = _problem_statement_id_and_text(
        kwargs.get("problem_statement")
    )
    inv = EntryInvocation(
        session_id=str(instance_id) if instance_id is not None else None,
        input_messages=[
            InputMessage(role="user", parts=[Text(content=body or "(empty)")])
        ],
    )
    handler.start_entry(inv)
    setattr(instance, "_loongsuite_entry_invocation", inv)
    _instance_tls.problem_statement = kwargs.get("problem_statement")
    try:

Copilot AI Apr 17, 2026

wrap_combined_run_hooks_on_instance_start only reads problem_statement from kwargs. If SWE-agent calls this hook positionally (common for internal calls), the entry span session_id/input_messages and the thread-local link to agent hooks will be missing. Please extract problem_statement from args as well (similar to other wrappers in this repo that use args[0] fallback).
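One possible shape for that fallback is a small shared helper, sketched below. The name `arg_or_kwarg` and the sentinel `_MISSING` are hypothetical, and the positional index passed by each wrapper is an assumption that would have to match the real hook signature, which this sketch does not verify.

```python
from typing import Any, Mapping, Sequence

# Sentinel so that an explicit keyword value of None is still honoured.
_MISSING = object()


def arg_or_kwarg(
    args: Sequence[Any],
    kwargs: Mapping[str, Any],
    name: str,
    position: int,
    default: Any = None,
) -> Any:
    """Look up a hook argument by keyword first, then by position.

    `position` counts from the first argument after `self`, since wrapper
    functions receive the bound instance separately.
    """
    value = kwargs.get(name, _MISSING)
    if value is not _MISSING:
        return value
    if len(args) > position:
        return args[position]
    return default
```

A wrapper could then call, for example, `arg_or_kwarg(args, kwargs, "problem_statement", 2)`, where the index 2 is only a guess based on the keyword order used in the quoted test.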

Comment on lines +269 to +283
def wrap_combined_run_hooks_on_instance_completed(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    try:
        return wrapped(*args, **kwargs)
    finally:
        inv = getattr(instance, "_loongsuite_entry_invocation", None)
        if inv is None:
            return
        result = kwargs.get("result")
        summary = (
            _build_entry_output_summary(result)
            if result is not None
            else "(no result)"
        )

Copilot AI Apr 17, 2026

wrap_combined_run_hooks_on_instance_completed only reads result from kwargs. If the hook is invoked with a positional result argument, the entry span output summary will incorrectly fall back to "(no result)". Consider pulling result from args when kwargs doesn't contain it.

Comment on lines +323 to +337
def wrap_combined_agent_hook_on_run_done(
    handler: ExtendedTelemetryHandler, wrapped, instance, args, kwargs
):
    try:
        return wrapped(*args, **kwargs)
    finally:
        inv = getattr(instance, "_loongsuite_invoke_invocation", None)
        if inv is None:
            return
        # Same summary text as entry ``on_instance_completed`` (``AgentRunResult``-like).
        result_like = SimpleNamespace(
            info=kwargs.get("info"),
            trajectory=kwargs.get("trajectory"),
        )
        summary = _build_entry_output_summary(result_like)

Copilot AI Apr 17, 2026

wrap_combined_agent_hook_on_run_done assumes info/trajectory are passed via kwargs. If CombinedAgentHook.on_run_done is called positionally, invoke_agent output summary and token/finish_reason extraction (via _apply_agent_info_to_invocation) will silently miss data. Please add args-based fallbacks for info/trajectory.
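When a hook takes more than one argument, an alternative to hard-coded indices is to bind the call against the wrapped callable's signature and read parameters by name. The sketch below uses the standard-library `inspect.signature(...).bind_partial(...)`; the helper name `bound_arguments` and the `StubAgentHook` class (including its parameter order) are hypothetical and only stand in for the real hook.

```python
import inspect


def bound_arguments(wrapped, args, kwargs):
    """Map positional and keyword arguments to parameter names.

    Falls back to the keyword arguments alone if the signature cannot be
    inspected or the call does not fit it.
    """
    try:
        bound = inspect.signature(wrapped).bind_partial(*args, **kwargs)
    except (TypeError, ValueError):
        return dict(kwargs)
    return dict(bound.arguments)


class StubAgentHook:
    # Stand-in for a hook class; the parameter order here is an assumption.
    def on_run_done(self, trajectory, info):
        return None
```

A wrapper could then read `bound_arguments(wrapped, args, kwargs).get("info")` regardless of how the caller passed it. Note that if the wrapped callable only declares `*args` / `**kwargs`, the names cannot be recovered this way and an index-based fallback is still needed.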

@sipercai
Collaborator

Hi @Cirilla-zmh, thanks again for this PR.

main has moved forward and this PR is now a bit behind the latest base, so could you please rebase it onto the current main and rerun the checks when convenient?

Thanks!


Labels

enhancement New feature or request genai The genai label represents issues related to generative AI. instrumentaion The instrumentation label represents issues related to instrumentation.


5 participants