Commit 93bc3e8

v4.13
1 parent 29e1adc commit 93bc3e8

354 files changed: +14493 -15201 lines changed

README.md

Lines changed: 2 additions & 1 deletion
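@@ -52,7 +52,8 @@
 
 # Recent updates
 - [2025/8/26] v4.13 release
-- Multimodal evaluation: added the gpt-5 series and gemini-2.5 series models; see [Multimodal evaluation](README-多模态评测.md)
+- Multimodal evaluation: added qwen-vl-max-2025-08-13, qwen-vl-plus-2025-08-15, the gpt-5 series, and the gemini-2.5 series models; see [Multimodal evaluation](README-多模态评测.md)
+- Removed outdated models: chatgpt-4o-latest, gpt-4.1, gpt-4.1-mini, step-r1-v-mini
 - [2025/8/20] v4.12 release
 - Added 3 large models: DeepSeek-V3.1, DeepSeek-V3.1-Think, gemini-2.5-flash-lite; ☛ see [full model information](https://nonelinear.com/static/models.html)
 - Updated the "Arithmetic ability" and "Formula recognition" (multimodal) evaluation sets: overly simple samples were removed and some new data was added, so the related scores of each model have changed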

leaderboard/2025高考化学.md

Lines changed: 75 additions & 79 deletions
@@ -9,104 +9,100 @@
 |Commercial|腾讯|hunyuan-t1-20250711|82.9%|91s|5911|20.9|6|
 |Commercial|google|gemini-2.5-pro|82.9%|49s|4785|335.3|7|
 |Open source|深度求索|DeepSeek-V3.1-Think(new)|82.3%|113s|2377|27.1|8|
-|Open source|深度求索|deepseek-chat-v3-0324|80.5%|243s|1260|9.0|9|
-|Open source|阿里巴巴|qwen3-235b-a22b-instruct-2507(new)|80.5%|74s|2436|18.3|10|
+|Commercial|阿里巴巴|qwen-flash-think-2025-07-28(new)|80.5%|42s|4663|6.7|9|
+|Open source|月之暗面|kimi-k2-0711-preview|80.5%|129s|1528|22.3|10|
 |Open source|阿里巴巴|Qwen3-14B|80.5%|348s|15239|30.2|11|
 |Open source|智谱AI|GLM-4.5(new)|80.5%|91s|4388|59.4|12|
-|Commercial|阿里巴巴|qwen-flash-think-2025-07-28(new)|80.5%|42s|4663|6.7|13|
-|Open source|月之暗面|kimi-k2-0711-preview|80.5%|129s|1528|22.3|14|
-|Commercial|阿里巴巴|qwen-flash-2025-07-28(new)|80.5%|21s|2056|2.8|15|
+|Commercial|阿里巴巴|qwen-flash-2025-07-28(new)|80.5%|21s|2056|2.8|13|
+|Open source|阿里巴巴|qwen3-235b-a22b-instruct-2507(new)|80.5%|74s|2436|18.3|14|
+|Open source|深度求索|deepseek-chat-v3-0324|80.5%|243s|1260|9.0|15|
 |Open source|深度求索|DeepSeek-V3.1(new)|79.0%|37s|863|9.0|16|
-|Open source|深度求索|DeepSeek-R1-0528|78.0%|431s|4587|71.3|17|
-|Open source|智谱AI|GLM-Z1-32B-0414|78.0%|200s|5280|20.6|18|
+|Open source|智谱AI|GLM-Z1-32B-0414|78.0%|200s|5280|20.6|17|
+|Open source|阶跃星辰|step-3(new)|78.0%|245s|4809|18.8|18|
 |Commercial|智谱AI|GLM-4.5-Flash(new)|78.0%|67s|4515|0.0|19|
 |Commercial|豆包|doubao-seed-1-6-250615|78.0%|117s|808|4.9|20|
-|Open source|智谱AI|GLM-4.5-Air(new)|78.0%|72s|4493|26.0|21|
-|Open source|阶跃星辰|step-3(new)|78.0%|245s|4809|18.8|22|
-|Open source|华为|pangu-pro-moe|78.0%|151s|3210|12.3|23|
+|Open source|华为|pangu-pro-moe|78.0%|151s|3210|12.3|21|
+|Open source|腾讯|Hunyuan-A13B-Instruct|78.0%|140s|3076|11.7|22|
+|Open source|深度求索|DeepSeek-R1-0528|78.0%|431s|4587|71.3|23|
 |Commercial|anthropic|claude-4-sonnet|78.0%|69s|880|73.4|24|
-|Open source|腾讯|Hunyuan-A13B-Instruct|78.0%|140s|3076|11.7|25|
-|Commercial|科大讯飞|xunfei-spark-x1-0725(new)|75.6%|/|3449|41.4|26|
-|Commercial|豆包|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|27|
-|Open source|智谱AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|28|
-|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|29|
-|Commercial|anthropic|claude-4-sonnet-thinking|75.6%|100s|1942|188.0|30|
-|Open source|深度求索|DeepSeek-R1-0528-Qwen3-8B|75.6%|309s|4565|0.0|31|
-|Commercial|阿里巴巴|qwen-turbo-2025-07-15|75.6%|17s|1226|0.7|32|
-|Open source|阿里巴巴|Qwen3-30B-A3B-Thinking-2507(new)|75.6%|87s|4606|12.5|33|
-|Open source|智谱AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|34|
-|Commercial|百度|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|35|
-|Open source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|92s|957|3.6|36|
-|Open source|minimax|MiniMax-M1|73.2%|207s|8019|61.6|37|
+|Open source|智谱AI|GLM-4.5-Air(new)|78.0%|72s|4493|26.0|25|
+|Commercial|anthropic|claude-4-sonnet-thinking|75.6%|100s|1942|188.0|26|
+|Commercial|阿里巴巴|qwen-turbo-2025-07-15|75.6%|17s|1226|0.7|27|
+|Commercial|科大讯飞|xunfei-spark-x1-0725(new)|75.6%|/|3449|41.4|28|
+|Open source|阿里巴巴|Qwen3-30B-A3B-Thinking-2507(new)|75.6%|87s|4606|12.5|29|
+|Open source|智谱AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|30|
+|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|31|
+|Open source|智谱AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|32|
+|Open source|深度求索|DeepSeek-R1-0528-Qwen3-8B|75.6%|309s|4565|0.0|33|
+|Commercial|豆包|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|34|
+|Open source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|92s|957|3.6|35|
+|Open source|阿里巴巴|Qwen3-8B-nothink|73.2%|63s|1205|0.0|36|
+|Commercial|百度|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|37|
 |Commercial|豆包|doubao-seed-1-6-flash-thinking-250615|73.2%|45s|1972|2.6|38|
-|Open source|阿里巴巴|Qwen3-8B-nothink|73.2%|63s|1205|0.0|39|
-|Open source|智谱AI|GLM-4-32B-0414|70.7%|108s|1172|2.2|40|
+|Open source|minimax|MiniMax-M1|73.2%|207s|8019|61.6|39|
+|Commercial|奇虎360|360zhinao2-o1|70.7%|419s|4381|42.6|40|
 |Commercial|商汤|SenseChat-5-1202|70.7%|54s|577|9.3|41|
 |Commercial|阿里巴巴|qwq-plus-2025-03-05|70.7%|258s|6307|24.8|42|
 |Commercial|豆包|doubao-seed-1-6-flash-250615|70.7%|16s|964|1.2|43|
-|Open source|智谱AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|44|
-|Commercial|奇虎360|360zhinao2-o1|70.7%|419s|4381|42.6|45|
+|Open source|百度|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|44|
+|Open source|智谱AI|GLM-4-32B-0414|70.7%|108s|1172|2.2|45|
 |Commercial|腾讯|hunyuan-turbos-20250716(new)|70.7%|54s|2554|4.8|46|
 |Open source|阿里巴巴|Qwen3-8B|70.7%|451s|11394|0.0|47|
-|Open source|百度|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|48|
+|Open source|智谱AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|48|
 |Open source|阿里巴巴|Qwen3-32B|70.7%|375s|8893|35.0|49|
-|Open source|阿里巴巴|qwq-32b|68.3%|276s|9030|53.4|50|
+|Commercial|XAI|grok-3-mini|68.3%|120s|1963|6.8|50|
 |Commercial|智谱AI|GLM-4.5-Flash-nothink|68.3%|43s|3405|0.0|51|
-|Commercial|XAI|grok-3-mini|68.3%|120s|1963|6.8|52|
-|Open source|百度|ERNIE-4.5-21B-A3B|68.3%|53s|1206|0.0|53|
+|Open source|百度|ERNIE-4.5-21B-A3B|68.3%|53s|1206|0.0|52|
+|Open source|阿里巴巴|qwq-32b|68.3%|276s|9030|53.4|53|
 |Commercial|豆包|Doubao-1.5-lite-32k-250115|68.3%|50s|509|0.2|54|
 |Commercial|奇虎360|360gpt2-o1|68.3%|546s|5139|251.0|55|
 |Commercial|月之暗面|kimi-latest-8k|68.3%|95s|822|9.9|56|
 |Commercial|XAI|grok-4-0709|66.7%|376s|3563|373.2|57|
-|Open source|阿里巴巴|Qwen3-4B|65.9%|258s|6314|18.4|58|
-|Commercial|百度|ERNIE-3.5-8K|65.9%|109s|605|1.0|59|
-|Commercial|百川智能|Baichuan4-Turbo|65.9%|80s|632|9.5|60|
-|Open source|minimax|MiniMax-Text-01|65.9%|63s|1070|3.6|61|
-|Commercial|科大讯飞|xunfei-spark-max|65.9%|67s|729|21.9|62|
-|Commercial|阶跃星辰|step-r1-v-mini|65.9%|111s|3937|30.3|63|
-|Open source|阿里巴巴|Qwen3-14B-nothink|65.9%|24s|1343|2.4|64|
-|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|65|
-|Open source|腾讯|Hunyuan-A13B-Instruct-nothink|63.4%|782s|1035|3.6|66|
+|Commercial|科大讯飞|xunfei-spark-max|65.9%|67s|729|21.9|58|
+|Open source|阿里巴巴|Qwen3-14B-nothink|65.9%|24s|1343|2.4|59|
+|Open source|minimax|MiniMax-Text-01|65.9%|63s|1070|3.6|60|
+|Commercial|百度|ERNIE-3.5-8K|65.9%|109s|605|1.0|61|
+|Open source|阿里巴巴|Qwen3-4B|65.9%|258s|6314|18.4|62|
+|Commercial|百川智能|Baichuan4-Turbo|65.9%|80s|632|9.5|63|
+|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|64|
+|Open source|腾讯|Hunyuan-A13B-Instruct-nothink|63.4%|782s|1035|3.6|65|
+|Open source|openAI|gpt-oss-120b(new)|63.4%|28s|1931|5.6|66|
 |Open source|深度求索|DeepSeek-R1-Distill-Qwen-32B|63.4%|100s|4778|5.7|67|
-|Open source|openAI|gpt-oss-120b(new)|63.4%|28s|1931|5.6|68|
-|Commercial|google|gemini-2.5-flash-lite(new)|62.9%|18s|5837|16.5|69|
-|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|70|
-|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|71|
-|Open source|深度求索|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|72|
-|Open source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|73s|779|1.4|73|
-|Open source|智谱AI|GLM-4-9B-0414|61.0%|113s|963|0.0|74|
-|Commercial|百度|ERNIE-4.5-Turbo-32K|61.0%|90s|1582|4.6|75|
-|Commercial|openAI|gpt-4.1|61.0%|66s|866|40.9|76|
-|Open source|阿里巴巴|Qwen3-32B-nothink|61.0%|54s|1207|4.2|77|
-|Commercial|智谱AI|GLM-Z1-Flash|58.5%|135s|7708|0.0|78|
-|Open source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|79|
-|Commercial|科大讯飞|xunfei-4.0Ultra|58.5%|/|/|/|80|
-|Commercial|openAI|chatgpt-4o-latest|58.5%|52s|914|54.6|81|
-|Commercial|阿里巴巴|qwen-long-2025-01-25|56.3%|38s|846|1.4|82|
-|Open source|阿里巴巴|Qwen3-4B-nothink|56.1%|26s|977|2.4|83|
-|Open source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|84|
-|Commercial|百川智能|Baichuan4-Air|56.1%|113s|647|0.6|85|
-|Commercial|科大讯飞|xunfei-spark-pro|56.1%|60s|523|3.7|86|
-|Commercial|奇虎360|360gpt2-pro|51.2%|64s|640|2.6|87|
-|Commercial|零一万物|yi-lightning|51.2%|43s|795|0.8|88|
-|Open source|阿里巴巴|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|89|
-|Commercial|百度|ERNIE-Speed-8K|48.8%|97s|398|0.0|90|
-|Commercial|百度|ERNIE-Lite-8K|48.8%|38s|508|0.0|91|
-|Commercial|阶跃星辰|step-2-mini|46.3%|39s|563|0.9|92|
-|Open source|Google|gemma-3-12b-it|46.3%|72s|708|0.0|93|
-|Commercial|openAI|gpt-4.1-mini|43.9%|50s|963|9.3|94|
-|Open source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|95|
-|Open source|Mistral|Mistral-Small-3.1-24B-Instruct-2503|43.9%|84s|762|1.3|96|
-|Commercial|Mistral|mistral-large|43.9%|79s|864|29.7|97|
-|Open source|阿里巴巴|Qwen3-0.6B|41.5%|130s|4066|11.7|98|
-|Commercial|Mistral|mistral-small|36.6%|64s|739|1.3|99|
-|Open source|阿里巴巴|Qwen3-1.7B|36.6%|167s|6727|19.7|100|
-|Commercial|科大讯飞|xunfei-spark-lite|34.1%|27s|470|0.0|101|
-|Commercial|百度|ERNIE-Tiny-8K|29.3%|66s|411|0.0|102|
-|Open source|阿里巴巴|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|103|
-|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|104|
-|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|105|
-|Open source|百度|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|106|
+|Commercial|google|gemini-2.5-flash-lite(new)|62.9%|18s|5837|16.5|68|
+|Open source|智谱AI|GLM-4-9B-0414|61.0%|113s|963|0.0|69|
+|Open source|深度求索|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|70|
+|Open source|阿里巴巴|Qwen3-32B-nothink|61.0%|54s|1207|4.2|71|
+|Commercial|百度|ERNIE-4.5-Turbo-32K|61.0%|90s|1582|4.6|72|
+|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|73|
+|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|74|
+|Open source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|73s|779|1.4|75|
+|Commercial|科大讯飞|xunfei-4.0Ultra|58.5%|/|/|/|76|
+|Commercial|智谱AI|GLM-Z1-Flash|58.5%|135s|7708|0.0|77|
+|Open source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|78|
+|Commercial|阿里巴巴|qwen-long-2025-01-25|56.3%|38s|846|1.4|79|
+|Open source|阿里巴巴|Qwen3-4B-nothink|56.1%|26s|977|2.4|80|
+|Open source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|81|
+|Commercial|百川智能|Baichuan4-Air|56.1%|113s|647|0.6|82|
+|Commercial|科大讯飞|xunfei-spark-pro|56.1%|60s|523|3.7|83|
+|Commercial|零一万物|yi-lightning|51.2%|43s|795|0.8|84|
+|Commercial|奇虎360|360gpt2-pro|51.2%|64s|640|2.6|85|
+|Open source|阿里巴巴|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|86|
+|Commercial|百度|ERNIE-Lite-8K|48.8%|38s|508|0.0|87|
+|Commercial|百度|ERNIE-Speed-8K|48.8%|97s|398|0.0|88|
+|Commercial|阶跃星辰|step-2-mini|46.3%|39s|563|0.9|89|
+|Open source|Google|gemma-3-12b-it|46.3%|72s|708|0.0|90|
+|Open source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|91|
+|Open source|Mistral|Mistral-Small-3.1-24B-Instruct-2503|43.9%|84s|762|1.3|92|
+|Commercial|Mistral|mistral-large|43.9%|79s|864|29.7|93|
+|Open source|阿里巴巴|Qwen3-0.6B|41.5%|130s|4066|11.7|94|
+|Commercial|Mistral|mistral-small|36.6%|64s|739|1.3|95|
+|Open source|阿里巴巴|Qwen3-1.7B|36.6%|167s|6727|19.7|96|
+|Commercial|科大讯飞|xunfei-spark-lite|34.1%|27s|470|0.0|97|
+|Commercial|百度|ERNIE-Tiny-8K|29.3%|66s|411|0.0|98|
+|Open source|阿里巴巴|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|99|
+|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|100|
+|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|101|
+|Open source|百度|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|102|
 
 
 ![lin](../pic/2025高考化学.png)
