feat(api): prefer x-agenta-flags for evaluation chat detection #3623

mmabrouk · 2026-02-03T19:09:32Z

Summary

Stacked on #3622

Updates evaluation OpenAPI parsing to prefer the new x-agenta-flags.is_chat vendor extension, with fallback to legacy heuristics.

Changes

- Parse x-agenta-flags.is_chat from OpenAPI /test or /run operations
- Fall back to the legacy heuristic (check for a messages property or x-parameter: messages)
- Thread is_chat through payload construction so message parsing only runs for chat apps
- Add temporary logging to distinguish which detection path was used
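The detection order described above can be sketched roughly as follows. This is an illustration only, not the code in llm_apps_service.py: the function names `heuristic_is_chat` and `detect_is_chat` are hypothetical, and it assumes the operation object and its already-resolved request-body schema are plain dicts, with the flag under the flat `x-agenta-flags` key as first described in this PR.

```python
from typing import Any, Dict, Tuple


def heuristic_is_chat(body_schema: Dict[str, Any]) -> bool:
    """Legacy heuristic: a `messages` property, or a property tagged x-parameter: messages."""
    properties = body_schema.get("properties") or {}
    if "messages" in properties:
        return True
    return any(
        isinstance(prop, dict) and prop.get("x-parameter") == "messages"
        for prop in properties.values()
    )


def detect_is_chat(
    operation: Dict[str, Any], body_schema: Dict[str, Any]
) -> Tuple[bool, str]:
    """Return (is_chat, source), where source names the detection path that was used."""
    flags = operation.get("x-agenta-flags")
    # Prefer the explicit flag when it is present and well-typed.
    if isinstance(flags, dict) and isinstance(flags.get("is_chat"), bool):
        return flags["is_chat"], "flags"
    # Otherwise fall back to the legacy heuristic.
    return heuristic_is_chat(body_schema), "heuristic"
```

Returning the source alongside the result is one way to drive the temporary logging, since the caller can log which path produced the value.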

Logging (temporary)

Chat detection from x-agenta-flags  is_chat=True  path=/test

or

Chat detection fallback to heuristic  is_chat=True  path=/test

These logs will be removed after validation.

Files Changed

api/oss/src/services/llm_apps_service.py
api/oss/src/core/evaluations/tasks/legacy.py
docs/design/chat-interface-rfc/status.md

- Parse x-agenta-flags.is_chat from OpenAPI operations when available
- Fall back to legacy heuristic based on messages fields
- Thread is_chat into evaluation payload building and add temporary logs

vercel · 2026-02-03T19:09:38Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Feb 9, 2026 10:32pm

…detection

The SDK (PR #3622) changed the OpenAPI vendor extension from a flat 'x-agenta-flags' key to a nested 'x-agenta: {flags: {...}}' structure. Update _get_openapi_chat_flag to read from the new nested path. Also removes unused imports (common, make_hash_id) caught by ruff.
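As a rough illustration of the nested lookup described in this commit, the sketch below reads `is_chat` from `x-agenta: {flags: {...}}` and returns None when the extension is missing or malformed. The function name `read_nested_chat_flag` is hypothetical and is not claimed to match the body of `_get_openapi_chat_flag`; the operation is assumed to be a plain dict.

```python
from typing import Any, Dict, Optional


def read_nested_chat_flag(operation: Dict[str, Any]) -> Optional[bool]:
    """Read x-agenta.flags.is_chat; None means the flag is not available."""
    agenta = operation.get("x-agenta")
    if not isinstance(agenta, dict):
        return None
    flags = agenta.get("flags")
    if not isinstance(flags, dict):
        return None
    value = flags.get("is_chat")
    # Only accept a real boolean, so a caller can treat None as "use the heuristic".
    return value if isinstance(value, bool) else None
```

Returning None rather than False for a missing flag keeps "not a chat app" distinct from "flag not emitted", which is what lets the heuristic fallback apply to apps built with an older SDK.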

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

api/oss/src/services/llm_apps_service.py

…llback, restore removed variables

- Fix TODO comment: says 'messages' column, not 'chats'
- Remove datapoint.get('chat') fallback: 'chat' was the old column name and the FE now uses 'messages', so no backward compatibility is needed.
- Restore references/links variables + imports that were removed by ruff as unused; they belong to a commented-out make_hash_id call and are out of scope for this PR.

…SaveTestsetModal

SaveTestsetModal.tsx hardcodes 'chat' as the column name when re-saving evaluation results to a testset. Several FE readers (DebugSection, CHAT_ARRAY_KEYS, legacy evaluation exports) also reference 'chat'. Keep the fallback: prefer 'messages', fall back to 'chat'.
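A minimal sketch of the column preference described here, assuming a testset row is a plain dict. The helper name `pick_messages_column` is hypothetical, and whether the real code treats an empty value as missing is not stated in this PR; this version falls back only when 'messages' is absent or None.

```python
from typing import Any, Dict


def pick_messages_column(datapoint: Dict[str, Any]) -> Any:
    """Prefer the 'messages' column; fall back to the older 'chat' column."""
    value = datapoint.get("messages")
    if value is None:
        # Rows re-saved through the old column name still carry 'chat'.
        value = datapoint.get("chat")
    return value
```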

mmabrouk · 2026-02-09T23:01:52Z

Testing Results

What was tested

- Deployed feat/chat-interface-eval-detection on the dev environment (port 8180)
- Ran an evaluation on a chat app with a testset containing a messages column
- Verified in the worker logs that the x-agenta.flags path is used (not the heuristic fallback):
```
Chat detection from x-agenta.flags  is_chat=True  path=/test
```

What's confirmed working

- _get_openapi_chat_flag() correctly reads the nested x-agenta: {flags: {is_chat: true}} from OpenAPI
- is_chat=True is passed through batch_invoke → run_with_retry → invoke_app → make_payload
- make_payload gates payload["messages"] behind is_chat (no longer injected unconditionally)
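The gating behaviour can be sketched as below. This is not the actual make_payload: the name `build_payload_sketch`, its parameters, and the `inputs` key are assumptions for illustration, as is the idea that a messages cell may arrive as a JSON string that needs parsing.

```python
import json
from typing import Any, Dict, Optional


def build_payload_sketch(
    inputs: Dict[str, Any], raw_messages: Any, is_chat: Optional[bool]
) -> Dict[str, Any]:
    """Build a request payload, adding 'messages' only for chat apps."""
    payload: Dict[str, Any] = {"inputs": dict(inputs)}
    # Message parsing runs only when is_chat is truthy (False and None both skip it).
    if is_chat and raw_messages is not None:
        messages = raw_messages
        if isinstance(messages, str):
            # Assumption: a string cell holds a JSON-encoded message list.
            messages = json.loads(messages)
        payload["messages"] = messages
    return payload
```

With this shape, a completion app evaluated against a testset that happens to contain a messages column would not receive a messages field in its payload.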

Additional tests needed

- Non-chat app evaluation: run an eval on a completion (non-chat) app and confirm is_chat is False or None; payload["messages"] should NOT be injected
- Heuristic fallback: test with an app that doesn't emit x-agenta.flags (e.g. older SDK); it should fall back to the messages property/parameter heuristic and log "Chat detection fallback to heuristic"
- chat column fallback: run an eval with a testset that has a chat column (not messages) and verify that the datapoint.get("chat") fallback works

feat(api): prefer x-agenta-flags for evaluation chat detection

e6de2a6


vercel bot deployed to Preview February 3, 2026 19:10

mmabrouk added 2 commits February 9, 2026 20:24

Merge branch 'feat/new-chat-interface' into feat/chat-interface-eval-…

bc0e5c9

…detection

mmabrouk force-pushed the feat/chat-interface-eval-detection branch from 8ef59cf to a7478cd February 9, 2026 19:31

vercel bot deployed to Preview February 9, 2026 19:32

mmabrouk marked this pull request as ready for review February 9, 2026 19:37

dosubot bot added Backend Evaluation labels Feb 9, 2026

devin-ai-integration bot reviewed Feb 9, 2026

mmabrouk commented Feb 9, 2026

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 9, 2026

vercel bot deployed to Preview February 9, 2026 22:19

vercel bot deployed to Preview February 9, 2026 22:24

style(api): ruff format legacy.py

15878f6

vercel bot deployed to Preview February 9, 2026 22:32

mmabrouk requested a review from jp-agenta February 9, 2026 23:00

mmabrouk mentioned this pull request Feb 9, 2026

feat(frontend): prefer x-agenta.flags.is_chat for chat variant detection #3690

Open
