Commit 51177e6

update LMArena+AA
1 parent 76ed167 commit 51177e6

2 files changed: +13 -1 lines changed

LMArena+AA.md

Lines changed: 6 additions & 0 deletions
@@ -7,13 +7,19 @@

 | Model | ReLE evaluation | | AA-Intelligence | AA-Coding | AA-Math | | LMArena-Text-overall | LMArena-Text-coding | LMArena-WebDev |
 |:---------------------------------------|:---------|:---|:------------------|:------------|:----------|:---|:-----------------------|:----------------------|:-----------------|
+| gemini-3-pro-preview(new) | / | | 72.8 | 62.3 | 95.7 | | 1495 | 1541 | 1487 |
+| gpt-5.1-high(new) | / | | 69.7 | 57.5 | 94 | | 1454 | 1496 | / |
+| gpt-5.1-medium(new) | 69.3 | | / | / | / | | / | / | / |
+| gpt-5.1(new) | 57.6 | | 42.9 | 35.7 | 38 | | 1435 | 1492 | / |
 | gpt-5-high | / | | 68.5 | 52.7 | 94.3 | | 1436 | 1470 | 1473 |
 | GPT-5 Codex (high) | / | | 68.5 | 53.5 | 98.7 | | / | / | / |
 | kimi-k2-thinking(new) | 67.93 | | 67 | 52.2 | 94.7 | | 1422 | 1473 | / |
 | gpt-5-2025-08-07 | 68.92 | | 66.4 | 49.2 | 91.7 | | / | / | / |
 | o3 | / | | 65.5 | 52.2 | 88.3 | | 1435 | 1458 | 1186 |
 | grok-4-0709 | 61.18 | | 65.3 | 55.1 | 92.7 | | 1410 | 1435 | 1174 |
 | gpt-5-mini-high | / | | 64.3 | 51.4 | 90.7 | | 1392 | 1427 | / |
+| grok-4-1-fast-reasoning(new) | / | | 64.1 | 49.7 | 89.3 | | 1481 | 1518 | / |
+| grok-4-1-fast-non-reasoning(new) | / | | / | / | / | | 1462 | 1500 | / |
 | claude-sonnet-4-5-20250929-thinking | / | | 62.7 | 49.8 | 88 | | 1448 | 1524 | 1420 |
 | MiniMax-M2(new) | 59.56 | | 61.4 | 47.6 | 78.3 | | / | / | 1405 |
 | gpt-5-mini-2025-08-07 | 63.3 | | 60.8 | 45.7 | 85 | | 1392 | / | / |

README.md

Lines changed: 7 additions & 1 deletion
@@ -970,17 +970,23 @@ BFCL-V3 is a tool-calling evaluation set released by UC Berkeley, pioneering multi

 ## 9. Combined LMArena and AA scores
 Combines leaderboard data from LMArena and Artificial Analysis (AA for short)
-
 | Model | ReLE evaluation | | AA-Intelligence | AA-Coding | AA-Math | | LMArena-Text-overall | LMArena-Text-coding | LMArena-WebDev |
 |:---------------------------------------|:---------|:---|:------------------|:------------|:----------|:---|:-----------------------|:----------------------|:-----------------|
+| gemini-3-pro-preview(new) | / | | 72.8 | 62.3 | 95.7 | | 1495 | 1541 | 1487 |
+| gpt-5.1-high(new) | / | | 69.7 | 57.5 | 94 | | 1454 | 1496 | / |
+| gpt-5.1-medium(new) | 69.3 | | / | / | / | | / | / | / |
+| gpt-5.1(new) | 57.6 | | 42.9 | 35.7 | 38 | | 1435 | 1492 | / |
 | gpt-5-high | / | | 68.5 | 52.7 | 94.3 | | 1436 | 1470 | 1473 |
 | GPT-5 Codex (high) | / | | 68.5 | 53.5 | 98.7 | | / | / | / |
 | kimi-k2-thinking(new) | 67.93 | | 67 | 52.2 | 94.7 | | 1422 | 1473 | / |
 | gpt-5-2025-08-07 | 68.92 | | 66.4 | 49.2 | 91.7 | | / | / | / |
 | o3 | / | | 65.5 | 52.2 | 88.3 | | 1435 | 1458 | 1186 |
 | grok-4-0709 | 61.18 | | 65.3 | 55.1 | 92.7 | | 1410 | 1435 | 1174 |
 | gpt-5-mini-high | / | | 64.3 | 51.4 | 90.7 | | 1392 | 1427 | / |
+| grok-4-1-fast-reasoning(new) | / | | 64.1 | 49.7 | 89.3 | | 1481 | 1518 | / |
+| grok-4-1-fast-non-reasoning(new) | / | | / | / | / | | 1462 | 1500 | / |
 | claude-sonnet-4-5-20250929-thinking | / | | 62.7 | 49.8 | 88 | | 1448 | 1524 | 1420 |
+| MiniMax-M2(new) | 59.56 | | 61.4 | 47.6 | 78.3 | | / | / | 1405 |
 | ... | ... | | ... | ... | ... | | ... | ... | ... |

 For full scores, see [LMArena+AA](LMArena+AA.md)
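
Rows in the score table appear to use `/` where a model has no score on a given leaderboard, and the unlabeled columns look like visual spacers. A minimal sketch of reading such rows for further analysis, assuming exactly that interpretation: the helper names `parse_row` and `to_score` are hypothetical and not part of the repository, and the two sample rows are copied from the diff.

```python
from typing import Optional


def parse_row(line: str) -> list[str]:
    """Split one pipe-delimited Markdown table row into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def to_score(cell: str) -> Optional[float]:
    """Convert a cell to a float. Assumption: '/' and empty spacer cells mean 'no score'."""
    return None if cell in ("/", "") else float(cell)


# Two rows copied from the table in this commit.
rows = [
    "| gemini-3-pro-preview(new) | / | | 72.8 | 62.3 | 95.7 | | 1495 | 1541 | 1487 |",
    "| gpt-5.1-medium(new) | 69.3 | | / | / | / | | / | / | / |",
]

# Map each model name to its list of scores (None where no score is listed).
parsed = {}
for row in rows:
    cells = parse_row(row)
    parsed[cells[0]] = [to_score(c) for c in cells[1:]]
```

Mapping missing scores to `None` rather than `0` keeps them from dragging down any average or ranking computed afterwards.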
