Commit 7f1784d

1 parent dfd446c commit 7f1784d

7 files changed: +111 -215 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -164,7 +164,7 @@ client.chat.completions.create(
 |Category|Organization|Model|[Overall] Accuracy|Avg. time|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (accuracy)|
 |---|---|-----|-------------------|-------|-----------|-----------|-----------|
 |Commercial|google|gemini-3-pro-preview(new)|72.5%|64s|3119|247.3|1|
-|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|37s|2162|15.6|2|
+|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|27s|2162|15.6|2|


 For detailed data, see: [Overall leaderboard](leaderboard/总分.md) | [General ability leaderboard](leaderboard/通用能力.md) | [Professional ability leaderboard](leaderboard/专业能力.md)
@@ -779,8 +779,8 @@ Antonio,36,Male,Spain,182,75,PhD
 > Let the set $S=\{1, 2, 3, \cdots, 997, 998\}$. Suppose $k$ subsets $A_{1}, A_{2}, \cdots, A_{k}$ of $S$, each with $499$ elements, satisfy the following: for every two-element subset $B$ of $S$, there exists $i \in \{1, 2, \cdots, k\}$ such that $B \subset A_{i}$. Find the minimum value of $k$.
 >

-For the full leaderboard, see [High School Mathematical Olympiad](leaderboard/Math24o.md)<br>
-☛ View [High School Mathematical Olympiad: badcase](https://nonelinear.com/static/badcase/badcase-of-benchmark.html?benchmark=Math24o)
+For the full leaderboard, see [High School Mathematical Olympiad](leaderboard/高中奥数.md)<br>
+☛ View [High School Mathematical Olympiad: badcase](https://nonelinear.com/static/badcase/badcase-of-benchmark.html?benchmark=高中奥数)
 <br><br>

leaderboard/Math24o.md

Lines changed: 0 additions & 104 deletions
This file was deleted.

leaderboard/total.md

Lines changed: 11 additions & 11 deletions
@@ -2,7 +2,7 @@
 |Category|Organization|Model|[Overall] Accuracy|Avg. time|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (accuracy)|
 |---|---|-----|-------------------|-------|-----------|-----------|-----------|
 |Commercial|google|gemini-3-pro-preview(new)|72.5%|64s|3119|247.3|1|
-|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|37s|2162|15.6|2|
+|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|27s|2162|15.6|2|
 |Open-source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|70.1%|248s|2106|6.1|3|
 |Commercial|openAI|gpt-5.1-high(new)|69.7%|117s|2745|180.0|4|
 |Commercial|openAI|gpt-5.1-medium(new)|69.3%|160s|1448|87.9|5|
@@ -28,7 +28,7 @@
 |Open-source|Alibaba|qwen3-235b-a22b-thinking-2507|65.5%|143s|3422|61.2|25|
 |Open-source|Zhipu AI|GLM-4.5-Air|65.4%|89s|3215|18.0|26|
 |Open-source|Alibaba|Qwen3-30B-A3B-Thinking-2507|65.0%|106s|3303|8.8|27|
-|Commercial|Doubao|doubao-seed-1-6-250615|64.9%|90s|625|3.1|28|
+|Commercial|Doubao|doubao-seed-1-6-250615|64.9%|89s|625|3.1|28|
 |Commercial|anthropic|claude-opus-4.5(new)|64.9%|16s|1063|146.1|29|
 |Open-source|Alibaba|qwen3-next-80b-a3b-instruct|64.6%|67s|1146|3.9|30|
 |Commercial|Baidu|ERNIE-X1.1-Preview(new)|64.5%|174s|2505|9.3|31|
@@ -47,12 +47,12 @@
 |Open-source|Moonshot AI|kimi-k2-0905(new)|61.8%|80s|998|13.2|44|
 |Open-source|Zhipu AI|GLM-4.5-nothink|61.8%|68s|1263|15.3|45|
 |Commercial|XAI|grok-3-mini|61.7%|182s|1526|5.2|46|
-|Commercial|XAI|grok-4-0709|61.2%|293s|2379|241.5|47|
+|Commercial|XAI|grok-4-0709|61.2%|293s|2376|241.1|47|
 |Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.1%|66s|713|1.8|48|
 |Open-source|Baidu|ERNIE-4.5-300B-A47B|60.8%|133s|592|3.4|49|
 |Commercial|Baidu|ERNIE-X1-Turbo-32K|60.8%|288s|2609|9.7|50|
 |Commercial|google|gemini-2.5-flash|60.6%|40s|2586|43.2|51|
-|Open-source|Tencent|Hunyuan-A13B-Instruct|59.8%|119s|2088|7.7|52|
+|Open-source|Tencent|Hunyuan-A13B-Instruct|59.8%|119s|2068|7.6|52|
 |Open-source|minimax|MiniMax-M2(new)|59.6%|56s|2931|23.1|53|
 |Open-source|Alibaba|Qwen3-30B-A3B-Instruct-2507|59.2%|49s|1158|2.9|54|
 |Open-source|openAI|gpt-oss-120b|59.1%|86s|1108|2.9|55|
@@ -64,32 +64,32 @@
 |Open-source|minimax|MiniMax-M1|56.2%|226s|4392|32.0|61|
 |Open-source|Alibaba|Qwen3-32B|56.2%|110s|2769|10.4|62|
 |Open-source|Zhipu AI|GLM-4.5-Air-nothink|55.8%|64s|1920|10.4|63|
-|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|55.4%|19s|1712|2.2|64|
+|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|55.4%|9s|1712|2.2|64|
 |Open-source|Alibaba|Qwen3-8B|55.3%|262s|6511|0.0|65|
-|Commercial|iFlytek|xunfei-spark-x1-0725|55.3%|/|2060|24.6|66|
+|Commercial|iFlytek|xunfei-spark-x1-0725|55.3%|/|2057|24.6|66|
 |Commercial|Alibaba|qwen-turbo-think-2025-07-15|55.3%|/|3132|8.8|67|
 |Commercial|Zhipu AI|GLM-4.5-Flash-nothink|54.8%|34s|1680|0.0|68|
 |Commercial|anthropic|claude-haiku-4.5(new)|54.5%|13s|775|18.9|69|
-|Open-source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|54.5%|425s|3551|0.0|70|
+|Open-source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|54.5%|426s|3550|0.0|70|
 |Commercial|Doubao|Doubao-1.5-lite-32k-250115|54.2%|6s|416|0.2|71|
 |Open-source|openAI|gpt-oss-20b|54.2%|135s|1984|2.1|72|
-|Commercial|Doubao|doubao-seed-1-6-flash-250615|53.4%|7s|614|0.6|73|
+|Commercial|Doubao|doubao-seed-1-6-flash-250615|53.4%|4s|614|0.6|73|
 |Commercial|360|360zhinao2-o1|52.9%|/|/|/|74|
 |Open-source|Alibaba|Qwen3-32B-nothink|52.3%|94s|738|2.3|75|
 |Commercial|Alibaba|qwen-turbo-2025-07-15|52.3%|46s|713|0.4|76|
-|Open-source|Tencent|Hunyuan-A13B-Instruct-nothink|51.5%|392s|588|1.7|77|
+|Open-source|Tencent|Hunyuan-A13B-Instruct-nothink|51.5%|394s|588|1.7|77|
 |Open-source|Alibaba|Qwen3-4B|51.2%|71s|2340|6.4|78|
 |Open-source|Baidu|ERNIE-4.5-21B-A3B|51.1%|65s|812|0.1|79|
 |Open-source|minimax|MiniMax-Text-01|49.0%|14s|975|7.3|80|
 |Commercial|Baichuan AI|Baichuan4-Turbo|48.3%|/|/|/|81|
 |Open-source|Alibaba|Qwen3-14B-nothink|47.8%|44s|848|1.3|82|
 |Commercial|Alibaba|qwen-long-2025-01-25|47.7%|51s|475|0.7|83|
 |Commercial|XAI|grok-4-1-fast-non-reasoning(new)|47.6%|60s|685|1.6|84|
-|Commercial|Mistral|mistral-medium-2508|47.0%|159s|751|7.9|85|
+|Commercial|Mistral|mistral-medium-2508|47.0%|157s|751|7.9|85|
 |Commercial|google|gemini-2.5-flash-lite|46.8%|46s|3231|8.9|86|
 |Open-source|Alibaba|Qwen3-8B-nothink|45.3%|37s|801|0.0|87|
 |Open-source|meta|Llama-4-Scout-17B-16E-Instruct|45.3%|13s|590|1.1|88|
-|Open-source|Mistral|Magistral-Small-2507|44.8%|197s|6663|70.6|89|
+|Open-source|Mistral|Magistral-Small-2507|44.8%|197s|6657|70.5|89|
 |Open-source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|42.9%|126s|1208|2.2|90|
 |Open-source|Alibaba|Qwen3-1.7B|42.7%|62s|2903|8.1|91|
 |Commercial|Baichuan AI|Baichuan4-Air|41.6%|/|/|/|92|
