Commit 89b97eb

Commit message: v5.5
1 parent: ad58c08

373 files changed, with 14882 lines added and 14503 lines deleted.

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
 ## Recent Updates
+- [2025/9/30] v5.5 release
+- New models: open-source DeepSeek-V3.2-Exp and DeepSeek-V3.2-Exp-Think. ☛ See [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/22] v5.4 release
 - Added the BFCL-V3 leaderboard to the “Agents and tool calling” domain; see [link](#82-BFCL-V3)
 - [2025/9/14] v5.3 release

README.md

Lines changed: 9 additions & 7 deletions
@@ -46,14 +46,16 @@
 - [7.13 Sentence comprehension TODO](#713-句子理解TODO)  |  [7.14 Punctuation TODO](#714-标点符号TODO)  |  [7.15 Traditional/Simplified Chinese conversion TODO](#715-汉字繁简转换TODO)
 - [7.16 Language identification TODO](#716-语种识别TODO)
 - [8. Agent and tool-calling leaderboards](#8agent与工具调用排行榜)
-- [8.1 TAU-airline](#81-TAU-airline)
+- [8.1 TAU](#81-TAU)
 - [8.2 BFCL-V3](#82-BFCL-V3)
 
 - [🌐 Scores by capability](#🌐各项能力评分)
 - [Why build this leaderboard?](#为什么做榜单)
 - [LLM selection and evaluation discussion group](#大模型评测交流群)
 
 # Recent Updates
+- [2025/9/30] v5.5 release
+- New models: open-source DeepSeek-V3.2-Exp and DeepSeek-V3.2-Exp-Think. ☛ See [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/22] v5.4 release
 - Added the BFCL-V3 leaderboard to the “Agents and tool calling” domain; see [link](#82-BFCL-V3)
 - Removed outdated models: xunfei-4.0Ultra, xunfei-spark-pro, xunfei-spark-max, yi-lightning, 360gpt2-pro, 360gpt2-o1, ERNIE-3.5-8K
@@ -197,7 +199,7 @@ client.chat.completions.create(
 |Category|Organization|Model|[Overall] Accuracy|Avg. latency|Avg. tokens consumed|Cost per 1,000 calls (yuan)|Rank (by accuracy)|
 |---|---|-----|-------------------|-------|-----------|-----------|-----------|
 |Commercial|Doubao|doubao-seed-1-6-thinking-250715|88.0%|37s|2144|15.5|1|
-|Commercial|Tencent|hunyuan-t1-20250711|85.5%|40s|2693|9.9|2|
+|Open-source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|85.6%|255s|2094|6.1|2|
 
 
 Detailed data:
@@ -211,7 +213,7 @@ client.chat.completions.create(
 |Rank|Model|Organization|Output price|Overall| |Education|Healthcare & mental health|Finance|Law & public administration|Reasoning & math|Language & instruction following|
 |---|-----|---|-------|---|-|---|-----------|----|-----------|------------|-----------|
 |1|doubao-seed-1-6-thinking-250715☛[Try it](https://nonelinear.com/static/modelcompare.html?type=proprietary)|Doubao|8.0 yuan|88.0%| | 89.8%|87.8%|84.1%| 85.0%|90.0%|88.5%|
-|2|hunyuan-t1-20250711[Try it](https://nonelinear.com/static/modelcompare.html?type=proprietary)|Tencent|4.0 yuan|85.5%| | 89.3%|82.9%|83.6%| 76.5%|87.0%|89.0%|
+|2|DeepSeek-V3.2-Exp-Think(new)[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|DeepSeek|3.0 yuan|85.6%| | 84.3%|80.9%|82.5%| 82.0%|88.1%|89.4%|
 
 Full leaderboard: [Reasoning model leaderboard](leaderboard/reasonmodel.md)<br>
 <br>
@@ -229,8 +231,8 @@ client.chat.completions.create(
 
 |Rank|Model|Organization|Output price|Overall| |Education|Healthcare & mental health|Finance|Law & public administration|Reasoning & math|Language & instruction following|
 |---|-----|---|-------|---|-|---|-----------|----|-----------|------------|-----------|
-|1|hunyuan-t1-20250711[Try it](https://nonelinear.com/static/modelcompare.html?type=proprietary)|Tencent|4.0 yuan|85.5%| | 89.3%|82.9%|83.6%| 76.5%|87.0%|89.0%|
-|2|Seed-OSS-36B-Instruct(new)[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|Doubao|4.0 yuan|85.2%| | 89.6%|82.5%|75.9%| 81.0%|90.2%|86.0%|
+|1|DeepSeek-V3.2-Exp-Think(new)[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|DeepSeek|3.0 yuan|85.6%| | 84.3%|80.9%|82.5%| 82.0%|88.1%|89.4%|
+|2|hunyuan-t1-20250711[Try it](https://nonelinear.com/static/modelcompare.html?type=proprietary)|Tencent|4.0 yuan|85.5%| | 89.3%|82.9%|83.6%| 76.5%|87.0%|89.0%|
 
 Full leaderboard: [Commercial models priced 1-5 yuan](leaderboard/commerce2.md)<br><br>
 
@@ -269,8 +271,8 @@ DIY leaderboard filtered by custom dimensions: ☛ [link](https://nonelinear.com/static/benchm
 
 |Rank|Model|Organization|Output price|Overall| |Education|Healthcare & mental health|Finance|Law & public administration|Reasoning & math|Language & instruction following|
 |---|-----|---|-------|---|-|---|-----------|----|-----------|------------|-----------|
-|1|Seed-OSS-36B-Instruct(new)☛[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|Doubao|4.0 yuan|85.2%| | 89.6%|82.5%|75.9%| 81.0%|90.2%|86.0%|
-|2|DeepSeek-R1-0528[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|DeepSeek|16.0 yuan|84.4%| | 82.6%|80.6%|79.0%| 81.0%|88.5%|87.6%|
+|1|DeepSeek-V3.2-Exp-Think(new)☛[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|DeepSeek|3.0 yuan|85.6%| | 84.3%|80.9%|82.5%| 82.0%|88.1%|89.4%|
+|2|Seed-OSS-36B-Instruct(new)[Try it](https://nonelinear.com/static/modelcompare.html?type=open-source)|Doubao|4.0 yuan|85.2%| | 89.6%|82.5%|75.9%| 81.0%|90.2%|86.0%|
 
 Full leaderboard: [Open-source models above 20B](leaderboard/opensource3.md)<br><br>
 

leaderboard/2025高考化学.md

Lines changed: 84 additions & 82 deletions
@@ -2,100 +2,102 @@
 |Category|Organization|Model|[2025 Gaokao Chemistry] Accuracy|Avg. latency|Avg. tokens consumed|Cost per 1,000 calls (yuan)|Rank (by accuracy)|
 |---|---|-----|-------------------|-------|-----------|-----------|-----------|
 |Open-source|Alibaba|qwen3-next-80b-a3b-instruct(new)|88.7%|21s|1644|6.0|1|
-|Commercial|Alibaba|qwen-plus-think-2025-07-28(new)|87.1%|/|4614|35.5|2|
-|Commercial|Alibaba|qwen-plus-2025-07-28(new)|85.5%|50s|1825|3.4|3|
+|Commercial|Alibaba|qwen-plus-think-2025-07-28|87.1%|/|4614|35.5|2|
+|Commercial|Alibaba|qwen-plus-2025-07-28|85.5%|50s|1825|3.4|3|
 |Open-source|Doubao|Seed-OSS-36B-Instruct(new)|85.5%|234s|3569|13.8|4|
-|Open-source|Alibaba|Qwen3-30B-A3B-Instruct-2507|85.4%|19s|2210|6.2|5|
-|Commercial|Doubao|doubao-seed-1-6-thinking-250715|85.4%|50s|3292|25.0|6|
-|Open-source|Alibaba|qwen3-235b-a22b-thinking-2507|85.4%|134s|5202|100.5|7|
-|Commercial|google|gemini-2.5-pro|82.9%|49s|4785|335.3|8|
+|Open-source|DeepSeek|DeepSeek-V3.2-Exp(new)|85.5%|236s|824|2.3|5|
+|Open-source|Alibaba|qwen3-235b-a22b-thinking-2507|85.4%|134s|5202|100.5|6|
+|Open-source|Alibaba|Qwen3-30B-A3B-Instruct-2507|85.4%|19s|2210|6.2|7|
+|Commercial|Doubao|doubao-seed-1-6-thinking-250715|85.4%|50s|3292|25.0|8|
 |Commercial|Tencent|hunyuan-t1-20250711|82.9%|91s|5911|20.9|9|
-|Commercial|google|gemini-2.5-flash|82.9%|24s|4326|75.5|10|
-|Open-source|DeepSeek|DeepSeek-V3.1-Think(new)|82.3%|113s|2377|27.1|11|
-|Commercial|Alibaba|qwen3-max-preview(new)|80.6%|28s|1260|26.6|12|
-|Open-source|DeepSeek|deepseek-chat-v3-0324|80.5%|243s|1065|7.9|13|
-|Open-source|Alibaba|Qwen3-14B|80.5%|348s|15239|30.2|14|
-|Open-source|Zhipu AI|GLM-4.5|80.5%|91s|4388|59.4|15|
-|Commercial|Alibaba|qwen-flash-think-2025-07-28|80.5%|42s|4663|6.7|16|
-|Open-source|Moonshot AI|kimi-k2-0711-preview|80.5%|129s|1528|22.3|17|
-|Open-source|Alibaba|qwen3-235b-a22b-instruct-2507|80.5%|74s|2436|18.3|18|
-|Commercial|Alibaba|qwen-flash-2025-07-28|80.5%|21s|2056|2.8|19|
-|Open-source|DeepSeek|DeepSeek-V3.1(new)|79.0%|37s|863|9.0|20|
-|Open-source|Huawei|pangu-pro-moe|78.0%|151s|3210|12.3|21|
-|Open-source|StepFun|step-3|78.0%|245s|4809|18.8|22|
+|Commercial|google|gemini-2.5-pro|82.9%|49s|4785|335.3|10|
+|Commercial|google|gemini-2.5-flash|82.9%|24s|4326|75.5|11|
+|Open-source|DeepSeek|DeepSeek-V3.1-Think(new)|82.3%|113s|2377|27.1|12|
+|Open-source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|82.3%|122s|3473|10.2|13|
+|Commercial|Alibaba|qwen3-max-preview(new)|80.6%|28s|1260|26.6|14|
+|Open-source|Alibaba|qwen3-235b-a22b-instruct-2507|80.5%|74s|2436|18.3|15|
+|Open-source|DeepSeek|deepseek-chat-v3-0324|80.5%|243s|1065|7.9|16|
+|Commercial|Alibaba|qwen-flash-2025-07-28|80.5%|21s|2056|2.8|17|
+|Open-source|Moonshot AI|kimi-k2-0711-preview|80.5%|129s|1528|22.3|18|
+|Open-source|Zhipu AI|GLM-4.5|80.5%|91s|4388|59.4|19|
+|Commercial|Alibaba|qwen-flash-think-2025-07-28|80.5%|42s|4663|6.7|20|
+|Open-source|Alibaba|Qwen3-14B|80.5%|348s|15239|30.2|21|
+|Open-source|DeepSeek|DeepSeek-V3.1(new)|79.0%|37s|863|9.0|22|
 |Open-source|Zhipu AI|GLM-Z1-32B-0414|78.0%|529s|4474|17.6|23|
-|Open-source|DeepSeek|DeepSeek-R1-0528|78.0%|431s|4587|71.3|24|
-|Open-source|Zhipu AI|GLM-4.5-Air|78.0%|72s|4493|26.0|25|
-|Commercial|anthropic|claude-4-sonnet|78.0%|69s|880|73.4|26|
-|Commercial|Zhipu AI|GLM-4.5-Flash|78.0%|67s|4515|0.0|27|
-|Commercial|Doubao|doubao-seed-1-6-250615|78.0%|117s|808|4.9|28|
+|Open-source|Zhipu AI|GLM-4.5-Air|78.0%|72s|4493|26.0|24|
+|Commercial|Zhipu AI|GLM-4.5-Flash|78.0%|67s|4515|0.0|25|
+|Open-source|DeepSeek|DeepSeek-R1-0528|78.0%|431s|4587|71.3|26|
+|Open-source|StepFun|step-3|78.0%|245s|4809|18.8|27|
+|Open-source|Huawei|pangu-pro-moe|78.0%|151s|3210|12.3|28|
 |Open-source|Tencent|Hunyuan-A13B-Instruct|78.0%|140s|3076|11.7|29|
-|Commercial|Alibaba|qwen-turbo-think-2025-07-15(new)|77.4%|/|4905|14.2|30|
-|Commercial|Doubao|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|31|
-|Open-source|Zhipu AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|32|
-|Open-source|Zhipu AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|33|
-|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|34|
-|Open-source|Alibaba|Qwen3-30B-A3B-Thinking-2507|75.6%|87s|4606|12.5|35|
-|Open-source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|75.6%|309s|4565|0.0|36|
-|Commercial|anthropic|claude-4-sonnet-thinking|75.6%|100s|1942|188.0|37|
-|Commercial|iFlytek|xunfei-spark-x1-0725|75.6%|/|3449|41.4|38|
+|Commercial|anthropic|claude-4-sonnet|78.0%|69s|880|73.4|30|
+|Commercial|Doubao|doubao-seed-1-6-250615|78.0%|117s|808|4.9|31|
+|Commercial|Alibaba|qwen-turbo-think-2025-07-15|77.4%|/|4905|14.2|32|
+|Commercial|Doubao|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|33|
+|Commercial|iFlytek|xunfei-spark-x1-0725|75.6%|/|3449|41.4|34|
+|Open-source|Zhipu AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|35|
+|Open-source|Alibaba|Qwen3-30B-A3B-Thinking-2507|75.6%|87s|4606|12.5|36|
+|Open-source|Zhipu AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|37|
+|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|38|
 |Commercial|Alibaba|qwen-turbo-2025-07-15|75.6%|17s|1226|0.7|39|
-|Commercial|Baidu|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|40|
-|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|73.2%|45s|1972|2.6|41|
-|Open-source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|332s|827|3.3|42|
-|Open-source|Alibaba|Qwen3-8B-nothink|73.2%|63s|1205|0.0|43|
-|Open-source|minimax|MiniMax-M1|73.2%|274s|5441|40.0|44|
-|Commercial|Mistral|mistral-medium-2508(new)|71.0%|35s|1024|12.2|45|
-|Commercial|Doubao|doubao-seed-1-6-flash-250615|70.7%|16s|964|1.2|46|
-|Commercial|Tencent|hunyuan-turbos-20250716|70.7%|54s|2554|4.8|47|
-|Open-source|Zhipu AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|48|
-|Commercial|Qihoo 360|360zhinao2-o1|70.7%|419s|4381|42.6|49|
-|Open-source|Baidu|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|50|
+|Open-source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|75.6%|309s|4565|0.0|40|
+|Commercial|anthropic|claude-4-sonnet-thinking|75.6%|100s|1942|188.0|41|
+|Commercial|Baidu|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|42|
+|Open-source|minimax|MiniMax-M1|73.2%|274s|5441|40.0|43|
+|Open-source|Alibaba|Qwen3-8B-nothink|73.2%|63s|1205|0.0|44|
+|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|73.2%|45s|1972|2.6|45|
+|Open-source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|332s|827|3.3|46|
+|Commercial|Mistral|mistral-medium-2508(new)|71.0%|35s|1024|12.2|47|
+|Open-source|Baidu|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|48|
+|Commercial|Doubao|doubao-seed-1-6-flash-250615|70.7%|16s|964|1.2|49|
+|Open-source|Alibaba|Qwen3-8B|70.7%|600s|18695|0.0|50|
 |Open-source|Alibaba|Qwen3-32B|70.7%|375s|8893|35.0|51|
-|Open-source|Alibaba|Qwen3-8B|70.7%|600s|18695|0.0|52|
+|Commercial|Qihoo 360|360zhinao2-o1|70.7%|419s|4381|42.6|52|
 |Open-source|Zhipu AI|GLM-4-32B-0414|70.7%|108s|1172|2.2|53|
-|Commercial|Moonshot AI|kimi-latest-8k|68.3%|408s|643|7.7|54|
-|Commercial|Doubao|Doubao-1.5-lite-32k-250115|68.3%|50s|509|0.2|55|
-|Commercial|Zhipu AI|GLM-4.5-Flash-nothink|68.3%|43s|3405|0.0|56|
-|Commercial|XAI|grok-3-mini|68.3%|120s|1963|6.8|57|
+|Commercial|Tencent|hunyuan-turbos-20250716|70.7%|54s|2554|4.8|54|
+|Open-source|Zhipu AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|55|
+|Commercial|Moonshot AI|kimi-latest-8k|68.3%|408s|643|7.7|56|
+|Commercial|Doubao|Doubao-1.5-lite-32k-250115|68.3%|50s|509|0.2|57|
 |Open-source|Baidu|ERNIE-4.5-21B-A3B|68.3%|53s|1206|0.0|58|
-|Commercial|XAI|grok-4-0709|66.7%|376s|3563|373.2|59|
-|Open-source|Alibaba|Qwen3-4B|65.9%|258s|6314|18.4|60|
-|Open-source|minimax|MiniMax-Text-01|65.9%|303s|986|3.3|61|
+|Commercial|XAI|grok-3-mini|68.3%|120s|1963|6.8|59|
+|Commercial|Zhipu AI|GLM-4.5-Flash-nothink|68.3%|43s|3405|0.0|60|
+|Commercial|XAI|grok-4-0709|66.7%|376s|3563|373.2|61|
 |Commercial|Baichuan AI|Baichuan4-Turbo|65.9%|80s|632|9.5|62|
 |Open-source|Alibaba|Qwen3-14B-nothink|65.9%|24s|1343|2.4|63|
-|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|64|
-|Open-source|openAI|gpt-oss-120b(new)|63.4%|28s|1931|5.6|65|
+|Open-source|minimax|MiniMax-Text-01|65.9%|303s|986|3.3|64|
+|Open-source|Alibaba|Qwen3-4B|65.9%|258s|6314|18.4|65|
 |Open-source|Tencent|Hunyuan-A13B-Instruct-nothink|63.4%|782s|1035|3.6|66|
 |Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|63.4%|102s|3389|4.3|67|
-|Commercial|google|gemini-2.5-flash-lite|62.9%|18s|5837|16.5|68|
-|Open-source|Mistral|Magistral-Small-2507|61.3%|197s|9200|98.4|69|
-|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|70|
-|Open-source|Alibaba|Qwen3-32B-nothink|61.0%|54s|1207|4.2|71|
-|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|72|
-|Open-source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|360s|621|1.2|73|
-|Open-source|Zhipu AI|GLM-4-9B-0414|61.0%|113s|963|0.0|74|
-|Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.0%|20s|569|1.5|75|
-|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|76|
-|Open-source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|77|
-|Commercial|Alibaba|qwen-long-2025-01-25|56.3%|21s|770|1.4|78|
-|Commercial|Baichuan AI|Baichuan4-Air|56.1%|113s|647|0.6|79|
-|Open-source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|80|
-|Open-source|Alibaba|Qwen3-4B-nothink|56.1%|26s|977|2.4|81|
-|Open-source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|53.2%|21s|2094|4.2|82|
-|Open-source|Alibaba|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|83|
-|Commercial|Baidu|ERNIE-Speed-8K|48.8%|97s|398|0.0|84|
-|Commercial|Baidu|ERNIE-Lite-8K|48.8%|38s|508|0.0|85|
-|Open-source|Google|gemma-3-12b-it|46.3%|72s|708|0.0|86|
-|Commercial|StepFun|step-2-mini|46.3%|251s|408|0.7|87|
-|Open-source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|88|
-|Open-source|Alibaba|Qwen3-0.6B|41.5%|130s|4066|11.7|89|
-|Open-source|Alibaba|Qwen3-1.7B|36.6%|167s|6727|19.7|90|
-|Commercial|iFlytek|xunfei-spark-lite|34.1%|27s|470|0.0|91|
-|Open-source|Alibaba|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|92|
-|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|93|
-|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|94|
-|Open-source|Baidu|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|95|
+|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|68|
+|Open-source|openAI|gpt-oss-120b(new)|63.4%|28s|1931|5.6|69|
+|Commercial|google|gemini-2.5-flash-lite|62.9%|18s|5837|16.5|70|
+|Open-source|Mistral|Magistral-Small-2507|61.3%|197s|9200|98.4|71|
+|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|72|
+|Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.0%|20s|569|1.5|73|
+|Open-source|Alibaba|Qwen3-32B-nothink|61.0%|54s|1207|4.2|74|
+|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|75|
+|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|76|
+|Open-source|Zhipu AI|GLM-4-9B-0414|61.0%|113s|963|0.0|77|
+|Open-source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|360s|621|1.2|78|
+|Open-source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|79|
+|Commercial|Alibaba|qwen-long-2025-01-25|56.3%|21s|770|1.4|80|
+|Commercial|Baichuan AI|Baichuan4-Air|56.1%|113s|647|0.6|81|
+|Open-source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|82|
+|Open-source|Alibaba|Qwen3-4B-nothink|56.1%|26s|977|2.4|83|
+|Open-source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|53.2%|21s|2094|4.2|84|
+|Open-source|Alibaba|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|85|
+|Commercial|Baidu|ERNIE-Speed-8K|48.8%|97s|398|0.0|86|
+|Commercial|Baidu|ERNIE-Lite-8K|48.8%|38s|508|0.0|87|
+|Commercial|StepFun|step-2-mini|46.3%|251s|408|0.7|88|
+|Open-source|Google|gemma-3-12b-it|46.3%|72s|708|0.0|89|
+|Open-source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|90|
+|Open-source|Alibaba|Qwen3-0.6B|41.5%|130s|4066|11.7|91|
+|Open-source|Alibaba|Qwen3-1.7B|36.6%|167s|6727|19.7|92|
+|Commercial|iFlytek|xunfei-spark-lite|34.1%|27s|470|0.0|93|
+|Open-source|Alibaba|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|94|
+|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|95|
+|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|96|
+|Open-source|Baidu|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|97|
 
 
 ![lin](../pic/2025高考化学.png)
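The "Rank (by accuracy)" column implies that every row in the 2025 Gaokao Chemistry table should be numbered consecutively with non-increasing accuracy. This commit inserts two new models, which shifts and renumbers most of the table, so a quick consistency check is useful. Below is a minimal Python sketch, assuming the eight-column row layout of `leaderboard/2025高考化学.md`. `Row`, `parse_row`, and `check_ranking` are hypothetical helpers and are not part of the repository. The sample rows are copied from the post-commit table, with category and organization names in English. The source does not say how models with equal accuracy are ordered, so the sketch checks only ordering and numbering, not tie-breaking.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row (hypothetical helper type, not from the repo)."""
    category: str
    org: str
    model: str
    accuracy: float           # percent, e.g. 88.7
    avg_time: str             # e.g. "21s", or "/" when not reported
    avg_tokens: int
    cost_per_1k_calls: float  # yuan
    rank: int


def parse_row(line: str) -> Row:
    """Split a pipe-delimited Markdown table row into its eight columns."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 8:
        raise ValueError(f"expected 8 columns, got {len(cells)}: {line!r}")
    cat, org, model, acc, avg_time, tokens, cost, rank = cells
    return Row(cat, org, model, float(acc.rstrip("%")), avg_time,
               int(tokens), float(cost), int(rank))


def check_ranking(rows: list[Row]) -> list[str]:
    """Return problems found; an empty list means ranks run 1, 2, 3, ...
    and accuracy never increases from one row to the next."""
    problems = []
    for i, row in enumerate(rows):
        if row.rank != i + 1:
            problems.append(f"{row.model}: rank {row.rank}, expected {i + 1}")
        if i > 0 and row.accuracy > rows[i - 1].accuracy:
            problems.append(f"{row.model}: {row.accuracy}% is higher than the row above")
    return problems


# First five rows of the post-commit table (category/org names in English).
TABLE = """\
|Open-source|Alibaba|qwen3-next-80b-a3b-instruct(new)|88.7%|21s|1644|6.0|1|
|Commercial|Alibaba|qwen-plus-think-2025-07-28|87.1%|/|4614|35.5|2|
|Commercial|Alibaba|qwen-plus-2025-07-28|85.5%|50s|1825|3.4|3|
|Open-source|Doubao|Seed-OSS-36B-Instruct(new)|85.5%|234s|3569|13.8|4|
|Open-source|DeepSeek|DeepSeek-V3.2-Exp(new)|85.5%|236s|824|2.3|5|
"""

rows = [parse_row(line) for line in TABLE.splitlines() if line.strip()]
print(check_ranking(rows))  # prints []
```

Running the same check over all 97 rows on the new side of the diff is a cheap way to confirm that inserting DeepSeek-V3.2-Exp(new) at rank 5 was followed by renumbering every row below it.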
