Fix open-source filter for mini-SWE-agent-v2 on Verified leaderboard by aorwall · Pull Request #56 · SWE-bench/swe-bench.github.io

aorwall · 2026-03-27T10:56:51Z

Summary

Fixes SWE-bench/SWE-bench#544 — selecting Verified → mini-SWE-agent-v2 → Open source only showed no results because leaderboards.json was stale.

Root cause: os_model was false for 4 open-weight models (DeepSeek V3.2, GLM-5, Kimi K2.5, MiniMax M2.5) in the generated JSON, even though their metadata.yaml files in experiments/ correctly had os_model: true. The JSON had not been regenerated.
Fix: Regenerated leaderboards.json via python -m analysis.get_leaderboard from the experiments repo.
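
A check along these lines could flag this kind of drift before it reaches the site. It is a minimal sketch, not code from either repo. It assumes the per-model metadata has already been loaded into a name → os_model mapping and that each leaderboard entry is a dict with "name" and "os_model" keys. The function name and that schema are hypothetical.

```python
from typing import Iterable, List, Mapping


def find_stale_os_model(
    metadata_by_name: Mapping[str, bool],
    leaderboard_entries: Iterable[Mapping[str, object]],
) -> List[str]:
    """Return names whose os_model in the generated JSON disagrees with metadata.

    Both argument shapes are assumptions made for illustration; the real
    leaderboards.json layout may differ.
    """
    stale = []
    for entry in leaderboard_entries:
        name = entry.get("name")
        if name in metadata_by_name and entry.get("os_model") != metadata_by_name[name]:
            stale.append(name)
    return sorted(stale)


# Illustrative data only: one entry is stale, one is in sync.
metadata = {"DeepSeek V3.2": True, "GLM-5": True}
entries = [
    {"name": "DeepSeek V3.2", "os_model": False},
    {"name": "GLM-5", "os_model": True},
]
stale = find_stale_os_model(metadata, entries)
```

A non-empty result would mean the JSON needs regenerating.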

All changes from regeneration

os_model: false → true for DeepSeek V3.2, GLM-5, Kimi K2.5, MiniMax M2.5 (+ their Verified cross-listings)
model_release_date populated (was null for all bash-only/multilingual entries)
Duplicate "GPT 5.2 Codex" entry removed from bash-only and Multilingual
S3 logs URLs added for GPT 5.2 Codex entries

Related: SWE-bench/experiments#433 (metadata name fix for GPT 5.2 Codex)

Test plan

Open https://www.swebench.com/index.html after deploy
Select Verified → mini-SWE-agent-v2 → Open source only → verify 4 models appear
Select Proprietary only → verify 9 models appear
Select All models → verify all 13 models appear
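
The expected counts can also be checked offline against the regenerated data. The sketch below is hypothetical: it assumes the filter keys off a boolean os_model field on each entry, and the proprietary entries are placeholders rather than real model names.

```python
def filter_entries(entries, mode="all"):
    """Filter leaderboard entries by openness.

    mode is one of "all", "open", "proprietary" (names chosen for this sketch).
    """
    if mode == "open":
        return [e for e in entries if e.get("os_model") is True]
    if mode == "proprietary":
        return [e for e in entries if e.get("os_model") is not True]
    if mode == "all":
        return list(entries)
    raise ValueError(f"unknown mode: {mode!r}")


# Four open-weight models from this PR plus nine placeholder proprietary entries.
open_names = ["DeepSeek V3.2", "GLM-5", "Kimi K2.5", "MiniMax M2.5"]
sample = [{"name": n, "os_model": True} for n in open_names]
sample += [{"name": f"proprietary-{i}", "os_model": False} for i in range(1, 10)]

counts = {mode: len(filter_entries(sample, mode)) for mode in ("open", "proprietary", "all")}
```

With the sample data, counts matches the 4 / 9 / 13 split in the test plan.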

ofirpress · 2026-03-28T16:28:53Z

name corrected to "GPT-5-2 Codex"
hmm it should stay "GPT 5.2 Codex" i think

Fixes SWE-bench/SWE-bench#544: the "Open source only" filter on the Verified leaderboard for mini-SWE-agent-v2 showed no results because four open-weight models had os_model incorrectly set to false.

Changes from regeneration:
- Fix os_model: false → true for DeepSeek V3.2, GLM-5, Kimi K2.5, MiniMax M2.5 (plus their Verified cross-listings)
- Populate model_release_date (was null for all bash-only/multilingual entries)
- Remove duplicate "GPT 5.2 Codex" entry, fix name to "GPT-5-2 Codex"
- Add S3 logs URLs for GPT-5-2 Codex entries

aorwall · 2026-03-29T10:19:32Z

name corrected to "GPT-5-2 Codex"
hmm it should stay "GPT 5.2 Codex" i think

Updated SWE-bench/experiments#433

aorwall force-pushed the fix/regenerate-leaderboards-os-model branch from 36b3d65 to 12dd14d on March 29, 2026 10:14

aorwall mentioned this pull request Mar 29, 2026

Fix GPT 5.2 Codex display name in metadata SWE-bench/experiments#433

Merged

1 task

aorwall wants to merge 1 commit into master from fix/regenerate-leaderboards-os-model
