Commit 16ef578

Add the "agent and tool calling" top-level domain leaderboard
1 parent 33b1df7 commit 16ef578

8 files changed, +340 -0 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ## Recent updates
 - [2025/9/14] v5.3
+- Added the "agent and tool calling" top-level domain leaderboard; see [link](#8agent与工具调用排行榜)
 - New model: Alibaba's open-source qwen3-next-80b-a3b-instruct. ☛ See [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/10] v5.2
 - New model: Doubao's open-source Seed-OSS-36B-Instruct. ☛ See [full model information](https://nonelinear.com/static/models.html)

README.md

Lines changed: 15 additions & 0 deletions
@@ -45,13 +45,18 @@
 - [7.10 Chinese character glyphs](#710-汉字字形)&nbsp;&nbsp;|&nbsp;&nbsp;[7.11 Hanyu Pinyin TODO](#711-汉语拼音TODO)&nbsp;&nbsp;|&nbsp;&nbsp;[7.12 Typo detection TODO](#712-找错别字TODO)
 - [7.13 Sentence comprehension TODO](#713-句子理解TODO)&nbsp;&nbsp;|&nbsp;&nbsp;[7.14 Punctuation TODO](#714-标点符号TODO)&nbsp;&nbsp;|&nbsp;&nbsp;[7.15 Traditional/Simplified Chinese conversion TODO](#715-汉字繁简转换TODO)
 - [7.16 Language identification TODO](#716-语种识别TODO)
+- [8. Agent and tool calling leaderboard](#8agent与工具调用排行榜)
+- [8.1 TAU-airline](#81-TAU-airline)
+- [8.2 TAU-retail](#82-TAU-retail)
+
 - [🌐 Scores by capability](#🌐各项能力评分)
 - [⚖️ Raw evaluation data](#⚖️原始评测数据)
 - [Why build a leaderboard?](#为什么做榜单)
 - [LLM selection and evaluation discussion group](#大模型评测交流群)

 # Recent updates
 - [2025/9/14] v5.3
+- Added the "agent and tool calling" top-level domain leaderboard; see [link](#8agent与工具调用排行榜)
 - New model: Alibaba's open-source qwen3-next-80b-a3b-instruct. ☛ See [full model information](https://nonelinear.com/static/models.html)
 - [2025/9/10] v5.2
 - New model: Doubao's open-source Seed-OSS-36B-Instruct. ☛ See [full model information](https://nonelinear.com/static/models.html)
@@ -1025,6 +1030,16 @@ Antonio,36,male,Spain,182,75,PhD
 <br><br><br>


+## 8. Agent and tool calling leaderboard
+☛☛ For the full leaderboard, see [Agent and tool calling leaderboard](leaderboard/agent与工具调用.md)<br>
+
+### 8.1 TAU-airline
+For the full leaderboard, see [TAU-airline](leaderboard/TAU-airline.md)<br>
+
+### 8.2 TAU-retail
+For the full leaderboard, see [TAU-retail](leaderboard/TAU-retail.md)<br>
+<br><br><br>
+

 ## 🌐 Scores by capability
 Scoring method: each LLM is scored along several dimensions. Each dimension corresponds to one evaluation dataset containing a number of questions.

leaderboard/TAU-airline.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@

|Category|Organization|Model|[TAU-airline] Accuracy|Avg. latency|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (by accuracy)|
|---|---|-----|-------------------|-------|-----------|-----------|-----------|
|Commercial|Tencent|hunyuan-turbos-20250716|80.0%|/|/|/|1|
|Open source|Zhipu AI|GLM-4.5|75.0%|/|/|/|2|
|Open source|Zhipu AI|GLM-4.5-Air|65.0%|/|/|/|3|
|Commercial|Google|gemini-2.5-flash|65.0%|/|/|/|4|
|Commercial|01.AI|yi-lightning|65.0%|/|/|/|5|
|Commercial|Zhipu AI|GLM-4.5-Flash|65.0%|/|/|/|6|
|Commercial|Google|gemini-2.5-pro|60.0%|/|/|/|7|
|Commercial|xAI|grok-3-mini|60.0%|/|/|/|8|
|Open source|Moonshot AI|kimi-k2-0711-preview|60.0%|/|/|/|9|
|Open source|Zhipu AI|GLM-4.5-Air-nothink|60.0%|/|/|/|10|
|Open source|DeepSeek|DeepSeek-V3.1(new)|55.0%|/|/|/|11|
|Open source|Zhipu AI|GLM-4.5-nothink|55.0%|/|/|/|12|
|Commercial|OpenAI|o4-mini|55.0%|/|/|/|13|
|Open source|Alibaba|qwen3-235b-a22b-instruct-2507|55.0%|/|/|/|14|
|Commercial|Doubao|doubao-seed-1-6-250615|55.0%|/|/|/|15|
|Commercial|Zhipu AI|GLM-4.5-Flash-nothink|50.0%|/|/|/|16|
|Open source|MiniMax|MiniMax-M1|50.0%|/|/|/|17|
|Commercial|Alibaba|qwen-flash-think-2025-07-28|50.0%|/|/|/|18|
|Open source|Alibaba|Qwen3-30B-A3B-Thinking-2507|50.0%|/|/|/|19|
|Open source|OpenAI|gpt-oss-120b(new)|50.0%|/|/|/|20|
|Commercial|Tencent|hunyuan-t1-20250711|45.0%|/|/|/|21|
|Open source|DeepSeek|DeepSeek-V3.1-Think(new)|45.0%|/|/|/|22|
|Open source|Google|gemma-3-4b-it|45.0%|/|/|/|23|
|Open source|Google|gemma-3-12b-it|45.0%|/|/|/|24|
|Commercial|Doubao|doubao-seed-1-6-thinking-250715|45.0%|/|/|/|25|
|Open source|Meta|Llama-4-Scout-17B-16E-Instruct|45.0%|/|/|/|26|
|Open source|Alibaba|Qwen3-4B|45.0%|/|/|/|27|
|Commercial|Doubao|Doubao-1.5-pro-32k-250115|45.0%|/|/|/|28|
|Open source|MiniMax|MiniMax-Text-01|45.0%|/|/|/|29|
|Open source|DeepSeek|deepseek-chat-v3-0324|45.0%|/|/|/|30|
|Commercial|Baidu|ERNIE-X1-Turbo-32K|45.0%|/|/|/|31|
|Open source|OpenAI|gpt-oss-20b(new)|40.0%|/|/|/|32|
|Open source|Alibaba|Qwen3-1.7B|40.0%|/|/|/|33|
|Open source|Zhipu AI|GLM-4-32B-0414|40.0%|/|/|/|34|
|Open source|Google|gemma-3-27b-it|40.0%|/|/|/|35|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|40.0%|/|/|/|36|
|Open source|Alibaba|qwen3-235b-a22b-thinking-2507|40.0%|/|/|/|37|
|Commercial|Alibaba|qwen-flash-2025-07-28|40.0%|/|/|/|38|
|Open source|Zhipu AI|GLM-Z1-9B-0414|35.0%|/|/|/|39|
|Commercial|Baichuan AI|Baichuan4-Turbo|35.0%|/|/|/|40|
|Commercial|OpenAI|gpt-5-2025-08-07(new)|35.0%|/|/|/|41|
|Commercial|OpenAI|gpt-5-mini-2025-08-07(new)|35.0%|/|/|/|42|
|Open source|Alibaba|Qwen3-0.6B|35.0%|/|/|/|43|
|Commercial|Doubao|doubao-seed-1-6-flash-250615|35.0%|/|/|/|44|
|Commercial|Alibaba|qwen-turbo-2025-07-15|35.0%|/|/|/|45|
|Open source|Alibaba|Qwen3-14B|30.0%|/|/|/|46|
|Open source|Alibaba|Qwen3-32B|30.0%|/|/|/|47|
|Open source|Alibaba|Qwen3-30B-A3B-Instruct-2507|30.0%|/|/|/|48|
|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|25.0%|/|/|/|49|
|Commercial|OpenAI|gpt-5-nano-2025-08-07(new)|25.0%|/|/|/|50|
|Commercial|StepFun|step-2-mini|25.0%|/|/|/|51|
|Commercial|Doubao|Doubao-1.5-lite-32k-250115|25.0%|/|/|/|52|
|Open source|Alibaba|Qwen3-32B-nothink|25.0%|/|/|/|53|
|Commercial|Baidu|ERNIE-4.5-Turbo-32K|25.0%|/|/|/|54|
|Open source|Zhipu AI|GLM-4-9B-0414|25.0%|/|/|/|55|
|Commercial|Google|gemini-2.5-flash-lite|20.0%|/|/|/|56|
|Commercial|Baidu|ERNIE-3.5-8K|20.0%|/|/|/|57|
|Open source|Alibaba|Qwen3-1.7B-nothink|20.0%|/|/|/|58|
|Open source|Alibaba|Qwen3-0.6B-nothink|20.0%|/|/|/|59|
|Open source|Alibaba|Qwen3-14B-nothink|20.0%|/|/|/|60|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|15.0%|/|/|/|61|
|Commercial|Moonshot AI|kimi-latest-8k|10.0%|/|/|/|62|
|Open source|Alibaba|Qwen3-4B-nothink|10.0%|/|/|/|63|
|Commercial|Baichuan AI|Baichuan4-Air|10.0%|/|/|/|64|
|Commercial|Alibaba|qwen-long-2025-01-25|5.0%|/|/|/|65|
|Open source|Alibaba|Qwen3-8B-nothink|5.0%|/|/|/|66|
|Commercial|Baidu|ERNIE-Speed-8K|/|/|/|/|67|
|Commercial|iFlytek|xunfei-4.0Ultra|/|/|/|/|68|
|Commercial|iFlytek|xunfei-spark-pro|/|/|/|/|69|
|Commercial|iFlytek|xunfei-spark-max|/|/|/|/|70|
|Open source|Alibaba|Qwen3-8B|/|/|/|/|71|
|Open source|Zhipu AI|GLM-Z1-32B-0414|/|/|/|/|72|
|Commercial|iFlytek|xunfei-spark-lite|/|/|/|/|73|
|Open source|Meta|Llama-4-Maverick-17B-128E-Instruct-FP8|/|/|/|/|74|
|Commercial|Qihoo 360|360gpt2-o1|/|/|/|/|75|
|Commercial|Mistral|ministral-8b|/|/|/|/|76|
|Commercial|Baidu|ERNIE-Lite-8K|/|/|/|/|77|
|Commercial|Qihoo 360|360zhinao2-o1|/|/|/|/|78|
|Commercial|Mistral|ministral-3b|/|/|/|/|79|
|Commercial|Qihoo 360|360gpt2-pro|/|/|/|/|80|
|Open source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|/|/|/|/|81|
|Open source|DeepSeek|DeepSeek-R1-0528|/|/|/|/|82|
|Commercial|Anthropic|claude-4-sonnet-thinking|/|/|/|/|83|
|Commercial|Anthropic|claude-4-sonnet|/|/|/|/|84|
|Open source|Baidu|ERNIE-4.5-300B-A47B|/|/|/|/|85|
|Open source|Tencent|Hunyuan-A13B-Instruct|/|/|/|/|86|
|Commercial|iFlytek|xunfei-spark-x1-0725|/|/|/|/|87|
|Open source|Tencent|Hunyuan-A13B-Instruct-nothink|/|/|/|/|88|
|Commercial|xAI|grok-4-0709|/|/|/|/|89|
|Open source|Huawei|pangu-pro-moe|/|/|/|/|90|
|Open source|Baidu|ERNIE-4.5-0.3B|/|/|/|/|91|
|Open source|Baidu|ERNIE-4.5-21B-A3B|/|/|/|/|92|
|Open source|StepFun|step-3|/|/|/|/|93|
|Commercial|Mistral|mistral-medium-2508(new)|/|/|/|/|94|
|Open source|Mistral|Magistral-Small-2507(new)|/|/|/|/|95|
|Open source|Mistral|Mistral-Small-3.2-24B-Instruct-2506(new)|/|/|/|/|96|
|Commercial|Alibaba|qwen-plus-2025-07-28(new)|/|/|/|/|97|
|Commercial|Alibaba|qwen-plus-think-2025-07-28(new)|/|/|/|/|98|
|Commercial|Alibaba|qwen-turbo-think-2025-07-15(new)|/|/|/|/|99|
|Commercial|Alibaba|qwen3-max-preview(new)|/|/|/|/|100|
|Open source|Doubao|Seed-OSS-36B-Instruct(new)|/|/|/|/|101|
|Open source|Alibaba|qwen3-next-80b-a3b-instruct(new)|/|/|/|/|102|


![lin](../pic/TAU-airline.png)

leaderboard/TAU-retail.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@

|Category|Organization|Model|[TAU-retail] Accuracy|Avg. latency|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (by accuracy)|
|---|---|-----|-------------------|-------|-----------|-----------|-----------|
|Commercial|Zhipu AI|GLM-4.5-Flash|85.0%|/|/|/|1|
|Open source|Zhipu AI|GLM-4.5-Air|85.0%|/|/|/|2|
|Commercial|Tencent|hunyuan-turbos-20250716|80.0%|/|/|/|3|
|Open source|Moonshot AI|kimi-k2-0711-preview|80.0%|/|/|/|4|
|Commercial|01.AI|yi-lightning|80.0%|/|/|/|5|
|Open source|Zhipu AI|GLM-4.5-nothink|75.0%|/|/|/|6|
|Commercial|OpenAI|o4-mini|75.0%|/|/|/|7|
|Commercial|Baidu|ERNIE-X1-Turbo-32K|75.0%|/|/|/|8|
|Commercial|Google|gemini-2.5-pro|70.0%|/|/|/|9|
|Open source|Alibaba|qwen3-235b-a22b-thinking-2507|70.0%|/|/|/|10|
|Commercial|OpenAI|gpt-5-2025-08-07(new)|70.0%|/|/|/|11|
|Commercial|Alibaba|qwen-flash-think-2025-07-28|70.0%|/|/|/|12|
|Commercial|OpenAI|gpt-5-nano-2025-08-07(new)|70.0%|/|/|/|13|
|Commercial|OpenAI|gpt-5-mini-2025-08-07(new)|70.0%|/|/|/|14|
|Open source|DeepSeek|DeepSeek-V3.1-Think(new)|70.0%|/|/|/|15|
|Open source|DeepSeek|deepseek-chat-v3-0324|70.0%|/|/|/|16|
|Commercial|Doubao|doubao-seed-1-6-250615|65.0%|/|/|/|17|
|Open source|Zhipu AI|GLM-4.5-Air-nothink|65.0%|/|/|/|18|
|Open source|Zhipu AI|GLM-4.5|65.0%|/|/|/|19|
|Commercial|Zhipu AI|GLM-4.5-Flash-nothink|65.0%|/|/|/|20|
|Open source|DeepSeek|DeepSeek-V3.1(new)|65.0%|/|/|/|21|
|Open source|Alibaba|Qwen3-30B-A3B-Thinking-2507|60.0%|/|/|/|22|
|Commercial|Baidu|ERNIE-4.5-Turbo-32K|60.0%|/|/|/|23|
|Open source|Alibaba|qwen3-235b-a22b-instruct-2507|60.0%|/|/|/|24|
|Commercial|Doubao|doubao-seed-1-6-thinking-250715|60.0%|/|/|/|25|
|Open source|MiniMax|MiniMax-M1|55.0%|/|/|/|26|
|Commercial|Doubao|Doubao-1.5-pro-32k-250115|55.0%|/|/|/|27|
|Commercial|xAI|grok-3-mini|55.0%|/|/|/|28|
|Commercial|Google|gemini-2.5-flash|55.0%|/|/|/|29|
|Commercial|Tencent|hunyuan-t1-20250711|50.0%|/|/|/|30|
|Open source|Alibaba|Qwen3-14B|50.0%|/|/|/|31|
|Open source|OpenAI|gpt-oss-120b(new)|45.0%|/|/|/|32|
|Open source|OpenAI|gpt-oss-20b(new)|45.0%|/|/|/|33|
|Open source|Alibaba|Qwen3-32B|40.0%|/|/|/|34|
|Open source|Alibaba|Qwen3-14B-nothink|35.0%|/|/|/|35|
|Commercial|Google|gemini-2.5-flash-lite|35.0%|/|/|/|36|
|Open source|Alibaba|Qwen3-30B-A3B-Instruct-2507|35.0%|/|/|/|37|
|Commercial|Baidu|ERNIE-3.5-8K|35.0%|/|/|/|38|
|Open source|Alibaba|Qwen3-32B-nothink|30.0%|/|/|/|39|
|Open source|Alibaba|Qwen3-4B|30.0%|/|/|/|40|
|Commercial|Alibaba|qwen-turbo-2025-07-15|25.0%|/|/|/|41|
|Commercial|Alibaba|qwen-flash-2025-07-28|25.0%|/|/|/|42|
|Commercial|Doubao|Doubao-1.5-lite-32k-250115|25.0%|/|/|/|43|
|Commercial|Moonshot AI|kimi-latest-8k|25.0%|/|/|/|44|
|Open source|Alibaba|Qwen3-8B|25.0%|/|/|/|45|
|Commercial|Alibaba|qwen-long-2025-01-25|25.0%|/|/|/|46|
|Open source|Alibaba|Qwen3-8B-nothink|20.0%|/|/|/|47|
|Commercial|Baichuan AI|Baichuan4-Air|10.0%|/|/|/|48|
|Open source|Zhipu AI|GLM-4-9B-0414|5.0%|/|/|/|49|
|Open source|Google|gemma-3-12b-it|5.0%|/|/|/|50|
|Open source|Alibaba|Qwen3-4B-nothink|5.0%|/|/|/|51|
|Commercial|Baichuan AI|Baichuan4-Turbo|5.0%|/|/|/|52|
|Open source|MiniMax|MiniMax-Text-01|5.0%|/|/|/|53|
|Open source|Google|gemma-3-4b-it|5.0%|/|/|/|54|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|5.0%|/|/|/|55|
|Commercial|iFlytek|xunfei-4.0Ultra|/|/|/|/|56|
|Commercial|iFlytek|xunfei-spark-pro|/|/|/|/|57|
|Commercial|iFlytek|xunfei-spark-max|/|/|/|/|58|
|Commercial|Baidu|ERNIE-Speed-8K|/|/|/|/|59|
|Open source|Google|gemma-3-27b-it|/|/|/|/|60|
|Open source|Meta|Llama-4-Scout-17B-16E-Instruct|/|/|/|/|61|
|Open source|Meta|Llama-4-Maverick-17B-128E-Instruct-FP8|/|/|/|/|62|
|Commercial|Qihoo 360|360gpt2-o1|/|/|/|/|63|
|Commercial|Mistral|ministral-8b|/|/|/|/|64|
|Commercial|Mistral|ministral-3b|/|/|/|/|65|
|Commercial|Qihoo 360|360gpt2-pro|/|/|/|/|66|
|Commercial|Baidu|ERNIE-Lite-8K|/|/|/|/|67|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|/|/|/|/|68|
|Commercial|Qihoo 360|360zhinao2-o1|/|/|/|/|69|
|Commercial|StepFun|step-2-mini|/|/|/|/|70|
|Open source|Alibaba|Qwen3-1.7B|/|/|/|/|71|
|Open source|Zhipu AI|GLM-Z1-9B-0414|/|/|/|/|72|
|Open source|Zhipu AI|GLM-4-32B-0414|/|/|/|/|73|
|Open source|Zhipu AI|GLM-Z1-32B-0414|/|/|/|/|74|
|Commercial|iFlytek|xunfei-spark-lite|/|/|/|/|75|
|Open source|Alibaba|Qwen3-1.7B-nothink|/|/|/|/|76|
|Open source|DeepSeek|DeepSeek-R1-0528|/|/|/|/|77|
|Open source|Alibaba|Qwen3-0.6B|/|/|/|/|78|
|Commercial|Anthropic|claude-4-sonnet-thinking|/|/|/|/|79|
|Commercial|Anthropic|claude-4-sonnet|/|/|/|/|80|
|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|/|/|/|/|81|
|Commercial|Doubao|doubao-seed-1-6-flash-250615|/|/|/|/|82|
|Open source|Baidu|ERNIE-4.5-0.3B|/|/|/|/|83|
|Open source|Baidu|ERNIE-4.5-21B-A3B|/|/|/|/|84|
|Open source|Baidu|ERNIE-4.5-300B-A47B|/|/|/|/|85|
|Open source|Tencent|Hunyuan-A13B-Instruct|/|/|/|/|86|
|Open source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|/|/|/|/|87|
|Open source|Alibaba|Qwen3-0.6B-nothink|/|/|/|/|88|
|Commercial|iFlytek|xunfei-spark-x1-0725|/|/|/|/|89|
|Open source|Tencent|Hunyuan-A13B-Instruct-nothink|/|/|/|/|90|
|Commercial|xAI|grok-4-0709|/|/|/|/|91|
|Open source|Huawei|pangu-pro-moe|/|/|/|/|92|
|Open source|StepFun|step-3|/|/|/|/|93|
|Commercial|Mistral|mistral-medium-2508(new)|/|/|/|/|94|
|Open source|Mistral|Magistral-Small-2507(new)|/|/|/|/|95|
|Open source|Mistral|Mistral-Small-3.2-24B-Instruct-2506(new)|/|/|/|/|96|
|Commercial|Alibaba|qwen-plus-2025-07-28(new)|/|/|/|/|97|
|Commercial|Alibaba|qwen-plus-think-2025-07-28(new)|/|/|/|/|98|
|Commercial|Alibaba|qwen-turbo-think-2025-07-15(new)|/|/|/|/|99|
|Commercial|Alibaba|qwen3-max-preview(new)|/|/|/|/|100|
|Open source|Doubao|Seed-OSS-36B-Instruct(new)|/|/|/|/|101|
|Open source|Alibaba|qwen3-next-80b-a3b-instruct(new)|/|/|/|/|102|


![lin](../pic/TAU-retail.png)

leaderboard/agent与工具调用.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@

|Category|Organization|Model|[Agent and tool calling] Accuracy|Avg. latency|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (by accuracy)|
|---|---|-----|-------------------|-------|-----------|-----------|-----------|
|Commercial|Tencent|hunyuan-turbos-20250716|80.0%|/|/|/|1|
|Commercial|Zhipu AI|GLM-4.5-Flash|75.0%|/|/|/|2|
|Open source|Zhipu AI|GLM-4.5-Air|75.0%|/|/|/|3|
|Commercial|01.AI|yi-lightning|72.5%|/|/|/|4|
|Open source|Moonshot AI|kimi-k2-0711-preview|70.0%|/|/|/|5|
|Open source|Zhipu AI|GLM-4.5|70.0%|/|/|/|6|
|Open source|Zhipu AI|GLM-4.5-nothink|65.0%|/|/|/|7|
|Commercial|Google|gemini-2.5-pro|65.0%|/|/|/|8|
|Commercial|OpenAI|o4-mini|65.0%|/|/|/|9|
|Open source|Zhipu AI|GLM-4.5-Air-nothink|62.5%|/|/|/|10|
|Commercial|Baidu|ERNIE-X1-Turbo-32K|60.0%|/|/|/|11|
|Open source|DeepSeek|DeepSeek-V3.1(new)|60.0%|/|/|/|12|
|Commercial|Alibaba|qwen-flash-think-2025-07-28|60.0%|/|/|/|13|
|Commercial|Doubao|doubao-seed-1-6-250615|60.0%|/|/|/|14|
|Commercial|Google|gemini-2.5-flash|60.0%|/|/|/|15|
|Commercial|xAI|grok-3-mini|57.5%|/|/|/|16|
|Open source|Alibaba|qwen3-235b-a22b-instruct-2507|57.5%|/|/|/|17|
|Open source|DeepSeek|deepseek-chat-v3-0324|57.5%|/|/|/|18|
|Open source|DeepSeek|DeepSeek-V3.1-Think(new)|57.5%|/|/|/|19|
|Commercial|Zhipu AI|GLM-4.5-Flash-nothink|57.5%|/|/|/|20|
|Open source|Alibaba|Qwen3-30B-A3B-Thinking-2507|55.0%|/|/|/|21|
|Open source|Alibaba|qwen3-235b-a22b-thinking-2507|55.0%|/|/|/|22|
|Commercial|OpenAI|gpt-5-2025-08-07(new)|52.5%|/|/|/|23|
|Open source|MiniMax|MiniMax-M1|52.5%|/|/|/|24|
|Commercial|Doubao|doubao-seed-1-6-thinking-250715|52.5%|/|/|/|25|
|Commercial|OpenAI|gpt-5-mini-2025-08-07(new)|52.5%|/|/|/|26|
|Commercial|Doubao|Doubao-1.5-pro-32k-250115|50.0%|/|/|/|27|
|Commercial|Tencent|hunyuan-t1-20250711|47.5%|/|/|/|28|
|Commercial|OpenAI|gpt-5-nano-2025-08-07(new)|47.5%|/|/|/|29|
|Open source|OpenAI|gpt-oss-120b(new)|47.5%|/|/|/|30|
|Open source|OpenAI|gpt-oss-20b(new)|42.5%|/|/|/|31|
|Commercial|Baidu|ERNIE-4.5-Turbo-32K|42.5%|/|/|/|32|
|Open source|Alibaba|Qwen3-14B|40.0%|/|/|/|33|
|Open source|Zhipu AI|GLM-4-32B-0414|40.0%|/|/|/|34|
|Open source|Alibaba|Qwen3-4B|37.5%|/|/|/|35|
|Open source|Alibaba|Qwen3-32B|35.0%|/|/|/|36|
|Open source|Alibaba|Qwen3-30B-A3B-Instruct-2507|32.5%|/|/|/|37|
|Commercial|Alibaba|qwen-flash-2025-07-28|32.5%|/|/|/|38|
|Commercial|Alibaba|qwen-turbo-2025-07-15|30.0%|/|/|/|39|
|Commercial|Google|gemini-2.5-flash-lite|27.5%|/|/|/|40|
|Open source|Alibaba|Qwen3-32B-nothink|27.5%|/|/|/|41|
|Open source|Alibaba|Qwen3-14B-nothink|27.5%|/|/|/|42|
|Commercial|Baidu|ERNIE-3.5-8K|27.5%|/|/|/|43|
|Open source|Alibaba|Qwen3-8B|25.0%|/|/|/|44|
|Commercial|Doubao|Doubao-1.5-lite-32k-250115|25.0%|/|/|/|45|
|Commercial|StepFun|step-2-mini|25.0%|/|/|/|46|
|Open source|MiniMax|MiniMax-Text-01|25.0%|/|/|/|47|
|Open source|Google|gemma-3-12b-it|25.0%|/|/|/|48|
|Open source|Google|gemma-3-4b-it|25.0%|/|/|/|49|
|Open source|Meta|Llama-4-Scout-17B-16E-Instruct|22.5%|/|/|/|50|
|Commercial|Baichuan AI|Baichuan4-Turbo|20.0%|/|/|/|51|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-32B|20.0%|/|/|/|52|
|Open source|Google|gemma-3-27b-it|20.0%|/|/|/|53|
|Open source|Alibaba|Qwen3-1.7B|20.0%|/|/|/|54|
|Commercial|Doubao|doubao-seed-1-6-flash-250615|17.5%|/|/|/|55|
|Open source|Zhipu AI|GLM-Z1-9B-0414|17.5%|/|/|/|56|
|Commercial|Moonshot AI|kimi-latest-8k|17.5%|/|/|/|57|
|Open source|Alibaba|Qwen3-0.6B|17.5%|/|/|/|58|
|Commercial|Alibaba|qwen-long-2025-01-25|15.0%|/|/|/|59|
|Open source|Zhipu AI|GLM-4-9B-0414|15.0%|/|/|/|60|
|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|12.5%|/|/|/|61|
|Open source|Alibaba|Qwen3-8B-nothink|12.5%|/|/|/|62|
|Open source|Alibaba|Qwen3-1.7B-nothink|10.0%|/|/|/|63|
|Commercial|Baichuan AI|Baichuan4-Air|10.0%|/|/|/|64|
|Open source|DeepSeek|DeepSeek-R1-Distill-Qwen-14B|10.0%|/|/|/|65|
|Open source|Alibaba|Qwen3-0.6B-nothink|10.0%|/|/|/|66|
|Open source|Alibaba|Qwen3-4B-nothink|7.5%|/|/|/|67|
|Commercial|Baidu|ERNIE-Speed-8K|/|/|/|/|68|
|Commercial|iFlytek|xunfei-spark-pro|/|/|/|/|69|
|Commercial|iFlytek|xunfei-spark-max|/|/|/|/|70|
|Open source|Zhipu AI|GLM-Z1-32B-0414|/|/|/|/|71|
|Open source|Meta|Llama-4-Maverick-17B-128E-Instruct-FP8|/|/|/|/|72|
|Commercial|iFlytek|xunfei-spark-lite|/|/|/|/|73|
|Commercial|Qihoo 360|360gpt2-pro|/|/|/|/|74|
|Commercial|Qihoo 360|360gpt2-o1|/|/|/|/|75|
|Commercial|Mistral|ministral-8b|/|/|/|/|76|
|Commercial|Baidu|ERNIE-Lite-8K|/|/|/|/|77|
|Commercial|Qihoo 360|360zhinao2-o1|/|/|/|/|78|
|Commercial|Mistral|ministral-3b|/|/|/|/|79|
|Commercial|iFlytek|xunfei-4.0Ultra|/|/|/|/|80|
|Open source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|/|/|/|/|81|
|Open source|DeepSeek|DeepSeek-R1-0528|/|/|/|/|82|
|Commercial|Anthropic|claude-4-sonnet-thinking|/|/|/|/|83|
|Commercial|Anthropic|claude-4-sonnet|/|/|/|/|84|
|Open source|Baidu|ERNIE-4.5-300B-A47B|/|/|/|/|85|
|Open source|Tencent|Hunyuan-A13B-Instruct|/|/|/|/|86|
|Commercial|iFlytek|xunfei-spark-x1-0725|/|/|/|/|87|
|Open source|Tencent|Hunyuan-A13B-Instruct-nothink|/|/|/|/|88|
|Commercial|xAI|grok-4-0709|/|/|/|/|89|
|Open source|Huawei|pangu-pro-moe|/|/|/|/|90|
|Open source|Baidu|ERNIE-4.5-0.3B|/|/|/|/|91|
|Open source|Baidu|ERNIE-4.5-21B-A3B|/|/|/|/|92|
|Open source|StepFun|step-3|/|/|/|/|93|
|Commercial|Mistral|mistral-medium-2508(new)|/|/|/|/|94|
|Open source|Mistral|Magistral-Small-2507(new)|/|/|/|/|95|
|Open source|Mistral|Mistral-Small-3.2-24B-Instruct-2506(new)|/|/|/|/|96|
|Commercial|Alibaba|qwen-plus-2025-07-28(new)|/|/|/|/|97|
|Commercial|Alibaba|qwen-plus-think-2025-07-28(new)|/|/|/|/|98|
|Commercial|Alibaba|qwen-turbo-think-2025-07-15(new)|/|/|/|/|99|
|Commercial|Alibaba|qwen3-max-preview(new)|/|/|/|/|100|
|Open source|Doubao|Seed-OSS-36B-Instruct(new)|/|/|/|/|101|
|Open source|Alibaba|qwen3-next-80b-a3b-instruct(new)|/|/|/|/|102|


![lin](../pic/agent与工具调用.png)
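
For models that have a score on both sub-benchmarks, the domain-level accuracy in this table looks consistent with a plain average of the TAU-airline and TAU-retail accuracies (for example 65.0% and 85.0% give 75.0%). The commit does not state the aggregation rule, and rows with a missing sub-score do not all follow one pattern, so the following is only a minimal sketch under that assumption. The helper name `mean_accuracy` is hypothetical and not part of the repository.

```python
from typing import Optional, Sequence


def mean_accuracy(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average sub-benchmark accuracies given in percent.

    Assumption: the domain score is a simple mean of its sub-benchmarks.
    Returns None when any sub-score is missing, because the source does
    not say how missing scores are handled.
    """
    if not scores or any(s is None for s in scores):
        return None
    return sum(scores) / len(scores)


# (TAU-airline, TAU-retail) accuracies copied from the two tables above.
tau_scores = {
    "hunyuan-turbos-20250716": (80.0, 80.0),
    "GLM-4.5-Flash": (65.0, 85.0),
    "yi-lightning": (65.0, 80.0),
}

for model, pair in tau_scores.items():
    print(f"{model}: {mean_accuracy(pair):.1f}%")
```

For these three models the sketch reproduces the domain-level values listed above (80.0%, 75.0%, 72.5%).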

pic/TAU-airline.png

1.41 MB

pic/TAU-retail.png

1.39 MB

pic/agent与工具调用.png

1.45 MB
