
ref(evals): Upgrade Slack evals to vitest-evals 0.9 #283

Merged
dcramer merged 1 commit into main from ref/junior-evals-vitest-0-9 on May 4, 2026

Conversation


@dcramer dcramer commented May 3, 2026

Upgrade the Slack eval suite to vitest-evals@0.9.0-beta.1 and cut over to the harness-first API. Eval cases now use describeEval() with direct it(..., { run }) calls, and the old slackEval(...) wrapper is gone.
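The harness-first style might look roughly like the sketch below. Note that `describeEval`, `it`, and the `run` signature here are simplified stand-ins written for illustration, not the real vitest-evals 0.9 API, and the Slack harness is replaced by an echo stub.

```typescript
// Stand-in types and registration, mimicking the describeEval()/it(..., { run })
// authoring shape described in this PR. Not the actual vitest-evals implementation.
type Run = (input: string) => Promise<string>;

interface EvalCase {
  name: string;
  run: Run;
}

const registered: EvalCase[] = [];

function describeEval(_suite: string, body: () => void): void {
  body();
}

function it(name: string, opts: { run: Run }): void {
  registered.push({ name, run: opts.run });
}

describeEval("slack behavior", () => {
  it("answers a deploy question", {
    // The real suite would drive the Slack harness here; this stub just echoes.
    run: async (input) => `reply: ${input}`,
  });
});

(async () => {
  for (const c of registered) {
    console.log(`${c.name} -> ${await c.run("is main deployed?")}`);
  }
})();
```

The point of the shape is that each case carries its own `run` function, so the suite no longer needs a wrapper like the removed `slackEval(...)`.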

Harness Judge Path

The Slack harness now owns the judge prompt seam. RubricJudge reads JudgeContext.harness.prompt(...), which keeps judging on Junior's Pi client and Vercel AI Gateway path with openai/gpt-5.4.

Dependency Cleanup

@sentry/junior-evals no longer depends directly on @ai-sdk/gateway or zod. The suite relies on Junior's existing Pi/Gateway client and a small local parser for the judge response shape.
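A zod-free parser for the judge response could be as small as the following. The `JudgeResult` shape (`score`, `rationale`) is a hypothetical example; the suite's actual response shape is not shown in this PR.

```typescript
// Hypothetical judge-response shape; the real fields may differ.
interface JudgeResult {
  score: number;
  rationale: string;
}

// Hand-rolled validation in place of a zod schema: parse JSON, then check
// each expected field's type before returning a typed result.
function parseJudgeResult(raw: string): JudgeResult {
  const value = JSON.parse(raw) as Record<string, unknown> | null;
  if (typeof value?.score !== "number" || typeof value?.rationale !== "string") {
    throw new Error("unexpected judge response shape");
  }
  return { score: value.score, rationale: value.rationale };
}
```

For a two-field object this is arguably simpler than carrying a schema library as a direct dependency.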

Eval Authoring

Eval docs and the testing spec now describe describeEval() as the canonical authoring style. The output-contract eval also narrows the heading rule to the actual contract: avoid hash-prefixed markdown headings (Slack does not render them).
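An illustrative check for the narrowed rule, i.e. flagging only hash-prefixed markdown headings rather than all heading-like formatting (the eval's real rubric wording and helper names are not shown in this PR):

```typescript
// True when any line starts with 1-6 hashes followed by whitespace,
// which is the markdown ATX heading form the contract forbids.
const hasHashHeading = (text: string): boolean =>
  text.split("\n").some((line) => /^#{1,6}\s/.test(line));
```

Under this narrower rule, bold text like `*Status*` passes while `# Status` fails.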

- Migrate Slack behavior evals to the harness-first describeEval API.
- Remove the old slackEval wrapper.
- Reuse the Slack harness prompt seam for judging through Junior's Pi client and Vercel AI Gateway.
- Drop direct eval-package dependencies on AI SDK Gateway and Zod after the clean cutover.

Co-Authored-By: GPT-5 Codex <codex@openai.com>

vercel Bot commented May 3, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| junior-docs | Ready | Preview, Comment | May 3, 2026 10:58pm |


@dcramer dcramer marked this pull request as ready for review May 4, 2026 01:22
@dcramer dcramer merged commit 58eaca4 into main May 4, 2026
13 of 14 checks passed
@dcramer dcramer deleted the ref/junior-evals-vitest-0-9 branch May 4, 2026 01:22
Comment on lines +383 to +385
```typescript
const object = parseJudgeResult(
  await harness.prompt(
    formatJudgePrompt(output, formatRubric(inputValue.criteria)),
```

Bug: RubricJudge passes the output object (Record<string, unknown>) directly to formatJudgePrompt(), which expects a string, so the object is silently coerced into the judge prompt.
Severity: CRITICAL

Suggested Fix

The HarnessRun context provides session.outputText, which is a string representation of the output. Use session.outputText instead of the output object when calling formatJudgePrompt to ensure the correct data is passed. Alternatively, serialize the output object to a string (e.g., using JSON.stringify) before the function call.
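A minimal repro of the reported coercion and the suggested fix. The `formatJudgePrompt` body here is assumed from its usage in the snippet above, and the sample output object is invented for illustration:

```typescript
// Assumed shape: takes the model output and a rubric, both as strings.
function formatJudgePrompt(output: string, rubric: string): string {
  return `Output:\n${output}\n\nRubric:\n${rubric}`;
}

const output: Record<string, unknown> = { text: "Deploy finished" };

// Bug: an object interpolated into the template literal collapses
// to the literal string "[object Object]".
const broken = formatJudgePrompt(output as unknown as string, "no hash headings");

// Fix: pass a real string instead (session.outputText in the harness,
// or a JSON serialization of the output object).
const fixed = formatJudgePrompt(JSON.stringify(output), "no hash headings");
```

Because the coercion happens inside a template literal, TypeScript only catches it if the call site is typed honestly, which is why the bug can pass the compiler when the argument arrives as `unknown`.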

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: packages/junior-evals/evals/helpers.ts#L383-L385

Potential issue: The `RubricJudge` function receives an `output` parameter from
`JudgeContext` which is typed as `Record<string, unknown>`. This object is then passed
directly to the `formatJudgePrompt` function, which expects its first argument to be a
string. Due to JavaScript's type coercion, the object is converted to the literal string
`"[object Object]"`. This results in every evaluation being judged against a
meaningless, corrupted prompt, leading to incorrect scores and silently failing evals.
The `HarnessRun` object contains both a record `output` and a string
`session.outputText`, suggesting the latter should have been used.



@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


