18 changes: 12 additions & 6 deletions api/oss/src/core/evaluations/service.py
@@ -123,6 +123,8 @@
},
]

+METRICS_STEP_TYPES = {"invocation", "annotation"}
+
DEFAULT_REFRESH_INTERVAL = 1 # minute(s)


@@ -887,8 +889,14 @@ async def _refresh_metrics(
log.warning("run or run.data or run.data.steps not found")
return []

+step_types_by_key: Dict[str, str] = {
+step.key: step.type
+for step in run.data.steps
+if step.type in METRICS_STEP_TYPES
+}
+
steps_metrics_keys: Dict[str, List[Dict[str, str]]] = {
-step.key: [] for step in run.data.steps if step.type == "annotation"
+step_key: [] for step_key in step_types_by_key
}

if not steps_metrics_keys:
@@ -929,6 +937,9 @@ async def _refresh_metrics(
inferred_metrics_keys_by_step: Dict[str, List[Dict[str, str]]] = {}

for step in run.data.steps:
+if step.type not in METRICS_STEP_TYPES:
+continue
+
steps_metrics_keys[step.key] = deepcopy(DEFAULT_METRICS)

if step.type == "annotation":
@@ -1051,11 +1062,6 @@ async def _refresh_metrics(
path=metric.get("path") or "*",
)
for metric in step_metrics_keys
-] + [
-MetricSpec(
-type=MetricType.JSON,
-path="attributes.ag",
-)
]
Comment on lines 1062 to 1065
Contributor


🔴 JSON MetricSpec removed for all step types, but PR intent is to keep it for annotation steps

The PR description explicitly states change #3: "Only append the JSON metric spec (attributes.ag) for annotation steps." However, the code removes the JSON MetricSpec entirely for all step types, including annotation steps.

Root Cause and Impact

The old code unconditionally appended a MetricSpec(type=MetricType.JSON, path="attributes.ag") to every step's specs list. The stated fix was to stop appending it for invocation steps (to avoid unwanted JSON output metrics) while keeping it for annotation steps.

But the new code at lines 1059–1065 simply builds specs from step_metrics_keys without any conditional JSON spec:

specs = [
    MetricSpec(
        type=MetricType(metric.get("type")),
        path=metric.get("path") or "*",
    )
    for metric in step_metrics_keys
]

The old code had:

] + [
    MetricSpec(
        type=MetricType.JSON,
        path="attributes.ag",
    )
]

This additional JSON spec is now gone for all steps. For annotation steps, this means the analytics query no longer requests JSON-type extraction of attributes.ag, which was previously used to produce JSON metric output (see api/oss/src/dbs/postgres/tracing/utils.py:983 and :1848 where MetricType.JSON drives specific CTE logic).

The step_types_by_key dict created at line 892 tracks each step's type, so the loop at line 1034 could conditionally add the JSON spec for annotation steps using step_types_by_key.get(step_key) == "annotation", but this check is missing.
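The conditional described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `MetricType` and `MetricSpec` are simplified stand-ins for the project's classes (their enum values here are placeholders), and `build_specs` is a hypothetical helper name that does not exist in `service.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class MetricType(str, Enum):
    # Stand-in for the project's MetricType; values are placeholders.
    NUMERIC = "numeric"
    JSON = "json"


@dataclass(frozen=True)
class MetricSpec:
    # Stand-in for the project's MetricSpec.
    type: MetricType
    path: str = "*"


def build_specs(
    step_key: str,
    step_metrics_keys: List[Dict[str, str]],
    step_types_by_key: Dict[str, str],
) -> List[MetricSpec]:
    """Hypothetical helper: build the specs for a single step."""
    specs = [
        MetricSpec(
            type=MetricType(metric.get("type")),
            path=metric.get("path") or "*",
        )
        for metric in step_metrics_keys
    ]
    # Append the JSON spec only for annotation steps, as the PR
    # description states; invocation steps get no JSON spec.
    if step_types_by_key.get(step_key) == "annotation":
        specs.append(MetricSpec(type=MetricType.JSON, path="attributes.ag"))
    return specs
```

With this shape, an invocation step receives only the specs derived from `step_metrics_keys`, while an annotation step receives the same specs plus the `attributes.ag` JSON spec.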

Impact: Annotation steps lose their JSON metric output that was previously computed, causing a regression in evaluation metrics for annotation steps.

(Refers to lines 1059-1065)

Member Author

@mmabrouk, Feb 13, 2026


This was explicitly removed. See reasoning @jp-agenta


# log.info(f"[METRICS] Step '{step_key}': {len(specs)} metric specs")