 | Model | ReLE eval | | AA-Intelligence | AA-Coding | AA-Math | | LMArena-Text-overall | LMArena-Text-coding | LMArena-WebDev |
 | :---------------------------------------| :---------| :---| :------------------| :------------| :----------| :---| :-----------------------| :----------------------| :-----------------|
+| gemini-3-pro-preview(new) | / | | 72.8 | 62.3 | 95.7 | | 1495 | 1541 | 1487 |
+| gpt-5.1-high(new) | / | | 69.7 | 57.5 | 94 | | 1454 | 1496 | / |
+| gpt-5.1-medium(new) | 69.3 | | / | / | / | | / | / | / |
+| gpt-5.1(new) | 57.6 | | 42.9 | 35.7 | 38 | | 1435 | 1492 | / |
 | gpt-5-high | / | | 68.5 | 52.7 | 94.3 | | 1436 | 1470 | 1473 |
 | GPT-5 Codex (high) | / | | 68.5 | 53.5 | 98.7 | | / | / | / |
 | kimi-k2-thinking(new) | 67.93 | | 67 | 52.2 | 94.7 | | 1422 | 1473 | / |
 | gpt-5-2025-08-07 | 68.92 | | 66.4 | 49.2 | 91.7 | | / | / | / |
 | o3 | / | | 65.5 | 52.2 | 88.3 | | 1435 | 1458 | 1186 |
 | grok-4-0709 | 61.18 | | 65.3 | 55.1 | 92.7 | | 1410 | 1435 | 1174 |
 | gpt-5-mini-high | / | | 64.3 | 51.4 | 90.7 | | 1392 | 1427 | / |
+| grok-4-1-fast-reasoning(new) | / | | 64.1 | 49.7 | 89.3 | | 1481 | 1518 | / |
+| grok-4-1-fast-non-reasoning(new) | / | | / | / | / | | 1462 | 1500 | / |
 | claude-sonnet-4-5-20250929-thinking | / | | 62.7 | 49.8 | 88 | | 1448 | 1524 | 1420 |
 | MiniMax-M2(new) | 59.56 | | 61.4 | 47.6 | 78.3 | | / | / | 1405 |
 | gpt-5-mini-2025-08-07 | 63.3 | | 60.8 | 45.7 | 85 | | 1392 | / | / |
@@ -970,17 +970,23 @@ BFCL-V3 is a tool-calling evaluation set released by UC Berkeley, pioneering multi

 ## 9. Combined LMArena and AA scores
 Combines leaderboard data from LMArena and Artificial Analysis (AA for short).
-
 | Model | ReLE eval | | AA-Intelligence | AA-Coding | AA-Math | | LMArena-Text-overall | LMArena-Text-coding | LMArena-WebDev |
 | :---------------------------------------| :---------| :---| :------------------| :------------| :----------| :---| :-----------------------| :----------------------| :-----------------|
+| gemini-3-pro-preview(new) | / | | 72.8 | 62.3 | 95.7 | | 1495 | 1541 | 1487 |
+| gpt-5.1-high(new) | / | | 69.7 | 57.5 | 94 | | 1454 | 1496 | / |
+| gpt-5.1-medium(new) | 69.3 | | / | / | / | | / | / | / |
+| gpt-5.1(new) | 57.6 | | 42.9 | 35.7 | 38 | | 1435 | 1492 | / |
 | gpt-5-high | / | | 68.5 | 52.7 | 94.3 | | 1436 | 1470 | 1473 |
 | GPT-5 Codex (high) | / | | 68.5 | 53.5 | 98.7 | | / | / | / |
 | kimi-k2-thinking(new) | 67.93 | | 67 | 52.2 | 94.7 | | 1422 | 1473 | / |
 | gpt-5-2025-08-07 | 68.92 | | 66.4 | 49.2 | 91.7 | | / | / | / |
 | o3 | / | | 65.5 | 52.2 | 88.3 | | 1435 | 1458 | 1186 |
 | grok-4-0709 | 61.18 | | 65.3 | 55.1 | 92.7 | | 1410 | 1435 | 1174 |
 | gpt-5-mini-high | / | | 64.3 | 51.4 | 90.7 | | 1392 | 1427 | / |
+| grok-4-1-fast-reasoning(new) | / | | 64.1 | 49.7 | 89.3 | | 1481 | 1518 | / |
+| grok-4-1-fast-non-reasoning(new) | / | | / | / | / | | 1462 | 1500 | / |
 | claude-sonnet-4-5-20250929-thinking | / | | 62.7 | 49.8 | 88 | | 1448 | 1524 | 1420 |
+| MiniMax-M2(new) | 59.56 | | 61.4 | 47.6 | 78.3 | | / | / | 1405 |
 | ... | ... | | ... | ... | ... | | ... | ... | ... |

 For the full scores, see [LMArena+AA](LMArena+AA.md)