Commit 9fb5647

v5.5
1 parent 3ae816c commit 9fb5647

367 files changed, +14148 -13963 lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ## Recent updates
 - [2025/9/30] v5.5 release
-- New models: open-source DeepSeek-V3.2-Exp, DeepSeek-V3.2-Exp-Think; ☛ see [full model information](https://nonelinear.com/static/models.html)
+- New models: open-source DeepSeek-V3.2-Exp, DeepSeek-V3.2-Exp-Think, hunyuan-turbos-20250926; ☛ see [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/22] v5.4 release
 - Added the BFCL-V3 leaderboard to the "Agent and Tool Calling" domain; see [link](#82-BFCL-V3)
 - [2025/9/14] v5.3 release

GitHub热门评测repo.md

Lines changed: 51 additions & 51 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 2 additions & 2 deletions
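@@ -1,7 +1,7 @@
 
 # ReLE Chinese LLM Capability Evaluation (continuously updated)
 - ReLE (**R**eally R**e**liable **L**ive **E**valuation for LLM), formerly CLiB
-- Currently includes 300 LLMs, covering commercial models such as chatgpt, gpt-5, o4-mini, Google gemini-2.5, Claude4, Zhipu GLM-Z1, ERNIE Bot, qwen3-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax,
+- Currently includes 301 LLMs, covering commercial models such as chatgpt, gpt-5, o4-mini, Google gemini-2.5, Claude4, Zhipu GLM-Z1, ERNIE Bot, qwen3-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax,
 as well as open-source LLMs such as kimi-k2, ernie4.5, minimax-M1, DeepSeek-R1-0528, deepseek-v3.2, qwen3-2507, llama4, GLM4.5, gemma3, and mistral.
 - Supports multi-dimensional capability evaluation across 6 domains (education, healthcare and mental health, finance, law and public administration, reasoning and mathematical computation, language and instruction following) and about 300 fine-grained dimensions (for example dentistry, high-school Chinese, …).
 - Provides not only leaderboards but also a **library of more than 2 million LLM defects**, so the community can study, analyze, and improve LLMs.
@@ -55,7 +55,7 @@
 
 # Recent updates
 - [2025/9/30] v5.5 release
-- New models: open-source DeepSeek-V3.2-Exp, DeepSeek-V3.2-Exp-Think; ☛ see [full model information](https://nonelinear.com/static/models.html)
+- New models: open-source DeepSeek-V3.2-Exp, DeepSeek-V3.2-Exp-Think, hunyuan-turbos-20250926; ☛ see [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/22] v5.4 release
 - Added the BFCL-V3 leaderboard to the "Agent and Tool Calling" domain; see [link](#82-BFCL-V3)
 - Removed outdated models: xunfei-4.0Ultra, xunfei-spark-pro, xunfei-spark-max, yi-lightning, 360gpt2-pro, 360gpt2-o1, ERNIE-3.5-8K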

leaderboard/2025高考化学.md

Lines changed: 79 additions & 78 deletions
@@ -3,101 +3,102 @@
 |---|---|-----|-------------------|-------|-----------|-----------|-----------|
 |Open-source|Alibaba|qwen3-next-80b-a3b-instruct(new)|88.7%|21s|1644|6.0|1|
 |Commercial|Alibaba|qwen-plus-think-2025-07-28|87.1%|/|4614|35.5|2|
-|Commercial|Alibaba|qwen-plus-2025-07-28|85.5%|50s|1825|3.4|3|
-|Open-source|Doubao|Seed-OSS-36B-Instruct(new)|85.5%|234s|3569|13.8|4|
+|Commercial|Tencent|hunyuan-turbos-20250926(new)|87.1%|37s|1712|3.2|3|
+|Commercial|Alibaba|qwen-plus-2025-07-28|85.5%|50s|1825|3.4|4|
 |Open-source|DeepSeek|DeepSeek-V3.2-Exp(new)|85.5%|236s|824|2.3|5|
-|Open-source|Alibaba|qwen3-235b-a22b-thinking-2507|85.4%|134s|5202|100.5|6|
-|Open-source|Alibaba|Qwen3-30B-A3B-Instruct-2507|85.4%|19s|2210|6.2|7|
-|Commercial|Doubao|doubao-seed-1-6-thinking-250715|85.4%|50s|3292|25.0|8|
-|Commercial|Tencent|hunyuan-t1-20250711|82.9%|91s|5911|20.9|9|
+|Open-source|Doubao|Seed-OSS-36B-Instruct(new)|85.5%|234s|3569|13.8|6|
+|Commercial|Doubao|doubao-seed-1-6-thinking-250715|85.4%|50s|3292|25.0|7|
+|Open-source|Alibaba|qwen3-235b-a22b-thinking-2507|85.4%|134s|5202|100.5|8|
+|Open-source|Alibaba|Qwen3-30B-A3B-Instruct-2507|85.4%|19s|2210|6.2|9|
 |Commercial|google|gemini-2.5-pro|82.9%|49s|4785|335.3|10|
-|Commercial|google|gemini-2.5-flash|82.9%|24s|4326|75.5|11|
-|Open-source|DeepSeek|DeepSeek-V3.1-Think(new)|82.3%|113s|2377|27.1|12|
-|Open-source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|82.3%|122s|3473|10.2|13|
-|Commercial|Alibaba|qwen3-max-preview(new)|80.6%|28s|1260|26.6|14|
-|Open-source|Alibaba|qwen3-235b-a22b-instruct-2507|80.5%|74s|2436|18.3|15|
-|Open-source|DeepSeek|deepseek-chat-v3-0324|80.5%|243s|1065|7.9|16|
+|Commercial|Tencent|hunyuan-t1-20250711|82.9%|91s|5911|20.9|11|
+|Commercial|google|gemini-2.5-flash|82.9%|24s|4326|75.5|12|
+|Open-source|DeepSeek|DeepSeek-V3.1-Think(new)|82.3%|113s|2377|27.1|13|
+|Open-source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|82.3%|122s|3473|10.2|14|
+|Commercial|Alibaba|qwen3-max-preview(new)|80.6%|28s|1260|26.6|15|
+|Open-source|Alibaba|qwen3-235b-a22b-instruct-2507|80.5%|74s|2436|18.3|16|
 |Commercial|Alibaba|qwen-flash-2025-07-28|80.5%|21s|2056|2.8|17|
-|Open-source|Moonshot AI|kimi-k2-0711-preview|80.5%|129s|1528|22.3|18|
-|Open-source|Zhipu AI|GLM-4.5|80.5%|91s|4388|59.4|19|
-|Commercial|Alibaba|qwen-flash-think-2025-07-28|80.5%|42s|4663|6.7|20|
-|Open-source|Alibaba|Qwen3-14B|80.5%|348s|15239|30.2|21|
-|Open-source|DeepSeek|DeepSeek-V3.1(new)|79.0%|37s|863|9.0|22|
-|Open-source|Zhipu AI|GLM-Z1-32B-0414|78.0%|529s|4474|17.6|23|
-|Open-source|Zhipu AI|GLM-4.5-Air|78.0%|72s|4493|26.0|24|
-|Commercial|Zhipu AI|GLM-4.5-Flash|78.0%|67s|4515|0.0|25|
+|Open-source|Zhipu AI|GLM-4.5|80.5%|91s|4388|59.4|18|
+|Commercial|Alibaba|qwen-flash-think-2025-07-28|80.5%|42s|4663|6.7|19|
+|Open-source|Moonshot AI|kimi-k2-0711-preview|80.5%|129s|1528|22.3|20|
+|Open-source|DeepSeek|deepseek-chat-v3-0324|80.5%|243s|1065|7.9|21|
+|Open-source|Alibaba|Qwen3-14B|80.5%|348s|15239|30.2|22|
+|Open-source|DeepSeek|DeepSeek-V3.1(new)|79.0%|37s|863|9.0|23|
+|Open-source|Zhipu AI|GLM-Z1-32B-0414|78.0%|529s|4474|17.6|24|
+|Open-source|StepFun|step-3|78.0%|245s|4809|18.8|25|
 |Open-source|DeepSeek|DeepSeek-R1-0528|78.0%|431s|4587|71.3|26|
-|Open-source|StepFun|step-3|78.0%|245s|4809|18.8|27|
-|Open-source|Huawei|pangu-pro-moe|78.0%|151s|3210|12.3|28|
-|Open-source|Tencent|Hunyuan-A13B-Instruct|78.0%|140s|3076|11.7|29|
-|Commercial|anthropic|claude-4-sonnet|78.0%|69s|880|73.4|30|
-|Commercial|Doubao|doubao-seed-1-6-250615|78.0%|117s|808|4.9|31|
-|Commercial|Alibaba|qwen-turbo-think-2025-07-15|77.4%|/|4905|14.2|32|
-|Commercial|Doubao|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|33|
-|Commercial|iFlytek|xunfei-spark-x1-0725|75.6%|/|3449|41.4|34|
-|Open-source|Zhipu AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|35|
-|Open-source|Alibaba|Qwen3-30B-A3B-Thinking-2507|75.6%|87s|4606|12.5|36|
-|Open-source|Zhipu AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|37|
-|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|38|
-|Commercial|Alibaba|qwen-turbo-2025-07-15|75.6%|17s|1226|0.7|39|
+|Commercial|anthropic|claude-4-sonnet|78.0%|69s|880|73.4|27|
+|Open-source|Zhipu AI|GLM-4.5-Air|78.0%|72s|4493|26.0|28|
+|Open-source|Huawei|pangu-pro-moe|78.0%|151s|3210|12.3|29|
+|Open-source|Tencent|Hunyuan-A13B-Instruct|78.0%|140s|3076|11.7|30|
+|Commercial|Zhipu AI|GLM-4.5-Flash|78.0%|67s|4515|0.0|31|
+|Commercial|Doubao|doubao-seed-1-6-250615|78.0%|117s|808|4.9|32|
+|Commercial|Alibaba|qwen-turbo-think-2025-07-15|77.4%|/|4905|14.2|33|
+|Commercial|Doubao|Doubao-1.5-pro-32k-250115|75.6%|99s|797|1.4|34|
+|Commercial|openAI|gpt-5-2025-08-07(new)|75.6%|63s|923|53.8|35|
+|Commercial|iFlytek|xunfei-spark-x1-0725|75.6%|/|3449|41.4|36|
+|Open-source|Alibaba|Qwen3-30B-A3B-Thinking-2507|75.6%|87s|4606|12.5|37|
+|Open-source|Zhipu AI|GLM-4.5-nothink|75.6%|99s|3242|43.3|38|
+|Open-source|Zhipu AI|GLM-4.5-Air-nothink|75.6%|49s|3311|18.9|39|
 |Open-source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|75.6%|309s|4565|0.0|40|
 |Commercial|anthropic|claude-4-sonnet-thinking|75.6%|100s|1942|188.0|41|
-|Commercial|Baidu|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|42|
+|Commercial|Alibaba|qwen-turbo-2025-07-15|75.6%|17s|1226|0.7|42|
 |Open-source|minimax|MiniMax-M1|73.2%|274s|5441|40.0|43|
-|Open-source|Alibaba|Qwen3-8B-nothink|73.2%|63s|1205|0.0|44|
-|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|73.2%|45s|1972|2.6|45|
-|Open-source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|332s|827|3.3|46|
-|Commercial|Mistral|mistral-medium-2508(new)|71.0%|35s|1024|12.2|47|
-|Open-source|Baidu|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|48|
+|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|73.2%|45s|1972|2.6|44|
+|Open-source|Alibaba|Qwen3-8B-nothink|73.2%|63s|1205|0.0|45|
+|Commercial|Baidu|ERNIE-X1-Turbo-32K|73.2%|390s|6974|26.8|46|
+|Open-source|meta|Llama-4-Maverick-17B-128E-Instruct-FP8|73.2%|332s|827|3.3|47|
+|Commercial|Mistral|mistral-medium-2508(new)|71.0%|35s|1024|12.2|48|
 |Commercial|Doubao|doubao-seed-1-6-flash-250615|70.7%|16s|964|1.2|49|
-|Open-source|Alibaba|Qwen3-8B|70.7%|600s|18695|0.0|50|
-|Open-source|Alibaba|Qwen3-32B|70.7%|375s|8893|35.0|51|
-|Commercial|Qihoo 360|360zhinao2-o1|70.7%|419s|4381|42.6|52|
+|Open-source|Baidu|ERNIE-4.5-300B-A47B|70.7%|290s|1268|9.0|50|
+|Commercial|Qihoo 360|360zhinao2-o1|70.7%|419s|4381|42.6|51|
+|Open-source|Alibaba|Qwen3-8B|70.7%|600s|18695|0.0|52|
 |Open-source|Zhipu AI|GLM-4-32B-0414|70.7%|108s|1172|2.2|53|
-|Commercial|Tencent|hunyuan-turbos-20250716|70.7%|54s|2554|4.8|54|
-|Open-source|Zhipu AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|55|
-|Commercial|Moonshot AI|kimi-latest-8k|68.3%|408s|643|7.7|56|
+|Open-source|Zhipu AI|GLM-Z1-9B-0414|70.7%|213s|6725|0.0|54|
+|Commercial|Tencent|hunyuan-turbos-20250716|70.7%|54s|2554|4.8|55|
+|Open-source|Alibaba|Qwen3-32B|70.7%|375s|8893|35.0|56|
 |Commercial|Doubao|Doubao-1.5-lite-32k-250115|68.3%|50s|509|0.2|57|
-|Open-source|Baidu|ERNIE-4.5-21B-A3B|68.3%|53s|1206|0.0|58|
+|Commercial|Moonshot AI|kimi-latest-8k|68.3%|408s|643|7.7|58|
 |Commercial|XAI|grok-3-mini|68.3%|120s|1963|6.8|59|
 |Commercial|Zhipu AI|GLM-4.5-Flash-nothink|68.3%|43s|3405|0.0|60|
-|Commercial|XAI|grok-4-0709|66.7%|376s|3563|373.2|61|
-|Commercial|Baichuan AI|Baichuan4-Turbo|65.9%|80s|632|9.5|62|
-|Open-source|Alibaba|Qwen3-14B-nothink|65.9%|24s|1343|2.4|63|
-|Open-source|minimax|MiniMax-Text-01|65.9%|303s|986|3.3|64|
-|Open-source|Alibaba|Qwen3-4B|65.9%|258s|6314|18.4|65|
-|Open-source|Tencent|Hunyuan-A13B-Instruct-nothink|63.4%|782s|1035|3.6|66|
-|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|63.4%|102s|3389|4.3|67|
-|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|68|
+|Open-source|Baidu|ERNIE-4.5-21B-A3B|68.3%|53s|1206|0.0|61|
+|Commercial|XAI|grok-4-0709|66.7%|376s|3563|373.2|62|
+|Commercial|Baichuan AI|Baichuan4-Turbo|65.9%|80s|632|9.5|63|
+|Open-source|Alibaba|Qwen3-14B-nothink|65.9%|24s|1343|2.4|64|
+|Open-source|minimax|MiniMax-Text-01|65.9%|303s|986|3.3|65|
+|Open-source|Alibaba|Qwen3-4B|65.9%|258s|6314|18.4|66|
+|Open-source|Tencent|Hunyuan-A13B-Instruct-nothink|63.4%|782s|1035|3.6|67|
+|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|63.4%|102s|3389|4.3|68|
 |Open-source|openAI|gpt-oss-120b(new)|63.4%|28s|1931|5.6|69|
-|Commercial|google|gemini-2.5-flash-lite|62.9%|18s|5837|16.5|70|
-|Open-source|Mistral|Magistral-Small-2507|61.3%|197s|9200|98.4|71|
-|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|72|
-|Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.0%|20s|569|1.5|73|
+|Commercial|openAI|gpt-5-nano-2025-08-07(new)|63.4%|59s|4963|13.9|70|
+|Commercial|google|gemini-2.5-flash-lite|62.9%|18s|5837|16.5|71|
+|Open-source|Mistral|Magistral-Small-2507|61.3%|197s|9200|98.4|72|
+|Open-source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|61.0%|118s|4449|2.7|73|
 |Open-source|Alibaba|Qwen3-32B-nothink|61.0%|54s|1207|4.2|74|
-|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|75|
-|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|76|
+|Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.0%|20s|569|1.5|75|
+|Commercial|openAI|gpt-5-mini-2025-08-07(new)|61.0%|125s|2115|28.0|76|
 |Open-source|Zhipu AI|GLM-4-9B-0414|61.0%|113s|963|0.0|77|
-|Open-source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|360s|621|1.2|78|
-|Open-source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|79|
-|Commercial|Alibaba|qwen-long-2025-01-25|56.3%|21s|770|1.4|80|
-|Commercial|Baichuan AI|Baichuan4-Air|56.1%|113s|647|0.6|81|
-|Open-source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|82|
-|Open-source|Alibaba|Qwen3-4B-nothink|56.1%|26s|977|2.4|83|
-|Open-source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|53.2%|21s|2094|4.2|84|
-|Open-source|Alibaba|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|85|
-|Commercial|Baidu|ERNIE-Speed-8K|48.8%|97s|398|0.0|86|
+|Commercial|openAI|o4-mini|61.0%|78s|1887|55.1|78|
+|Open-source|meta|Llama-4-Scout-17B-16E-Instruct|61.0%|360s|621|1.2|79|
+|Open-source|Google|gemma-3-27b-it|58.5%|70s|675|0.8|80|
+|Commercial|Alibaba|qwen-long-2025-01-25|56.3%|21s|770|1.4|81|
+|Commercial|Baichuan AI|Baichuan4-Air|56.1%|113s|647|0.6|82|
+|Open-source|openAI|gpt-oss-20b(new)|56.1%|27s|4594|5.1|83|
+|Open-source|Alibaba|Qwen3-4B-nothink|56.1%|26s|977|2.4|84|
+|Open-source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|53.2%|21s|2094|4.2|85|
+|Open-source|Alibaba|Qwen3-1.7B-nothink|51.2%|13s|1026|2.5|86|
 |Commercial|Baidu|ERNIE-Lite-8K|48.8%|38s|508|0.0|87|
-|Commercial|StepFun|step-2-mini|46.3%|251s|408|0.7|88|
+|Commercial|Baidu|ERNIE-Speed-8K|48.8%|97s|398|0.0|88|
 |Open-source|Google|gemma-3-12b-it|46.3%|72s|708|0.0|89|
-|Open-source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|90|
-|Open-source|Alibaba|Qwen3-0.6B|41.5%|130s|4066|11.7|91|
-|Open-source|Alibaba|Qwen3-1.7B|36.6%|167s|6727|19.7|92|
-|Commercial|iFlytek|xunfei-spark-lite|34.1%|27s|470|0.0|93|
-|Open-source|Alibaba|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|94|
-|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|95|
-|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|96|
-|Open-source|Baidu|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|97|
+|Commercial|StepFun|step-2-mini|46.3%|251s|408|0.7|90|
+|Open-source|Google|gemma-3-4b-it|43.9%|50s|705|0.0|91|
+|Open-source|Alibaba|Qwen3-0.6B|41.5%|130s|4066|11.7|92|
+|Open-source|Alibaba|Qwen3-1.7B|36.6%|167s|6727|19.7|93|
+|Commercial|iFlytek|xunfei-spark-lite|34.1%|27s|470|0.0|94|
+|Open-source|Alibaba|Qwen3-0.6B-nothink|29.3%|8s|613|1.3|95|
+|Commercial|Mistral|ministral-8b|26.8%|87s|669|0.5|96|
+|Commercial|Mistral|ministral-3b|24.4%|51s|676|0.2|97|
+|Open-source|Baidu|ERNIE-4.5-0.3B|24.4%|65s|689|0.0|98|
 
 
 ![lin](../pic/2025高考化学.png)
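
Inserting one row near the top of this table shifts the rank column on every row that follows, which is why most of the hunk is rewritten. The following is a minimal sketch, not part of the commit, of how one might sanity-check such a table after an edit. It assumes the fourth cell is a percentage score and the last cell is the rank; the header row lies outside the hunk, so these column meanings are inferred. The helper names `parse_row` and `check_leaderboard` are hypothetical.

```python
def parse_row(line):
    """Split one pipe-delimited leaderboard row into its cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def check_leaderboard(rows):
    """Return a list of problems found in the ordered rows (empty if none)."""
    problems = []
    prev_score = None
    prev_rank = None
    for line in rows:
        cells = parse_row(line)
        score = float(cells[3].rstrip("%"))  # assumption: 4th cell is the score
        rank = int(cells[-1])                # assumption: last cell is the rank
        if prev_score is not None and score > prev_score:
            problems.append(f"rank {rank}: score {score} exceeds previous {prev_score}")
        if prev_rank is not None and rank != prev_rank + 1:
            problems.append(f"rank {rank}: expected {prev_rank + 1}")
        prev_score, prev_rank = score, rank
    return problems


# Three consecutive rows from the updated side of the diff (labels in English).
sample = [
    "|Commercial|Alibaba|qwen-plus-think-2025-07-28|87.1%|/|4614|35.5|2|",
    "|Commercial|Tencent|hunyuan-turbos-20250926(new)|87.1%|37s|1712|3.2|3|",
    "|Commercial|Alibaba|qwen-plus-2025-07-28|85.5%|50s|1825|3.4|4|",
]
print(check_leaderboard(sample))  # → []
```

The check only confirms that scores never increase and ranks are consecutive. It says nothing about how ties are ordered, which the diff does not specify.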
