|
2 | 2 | |Category|Organization|Model|Overall accuracy|Avg. time taken|Avg. tokens used|Cost per 1,000 calls (CNY)|Rank (by accuracy)| |
3 | 3 | |---|---|-----|-------------------|-------|-----------|-----------|-----------| |
4 | 4 | |Commercial|google|gemini-3-pro-preview(new)|72.5%|64s|3119|247.3|1| |
5 | | -|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|37s|2162|15.6|2| |
| 5 | +|Commercial|Doubao|doubao-seed-1-6-thinking-250715|71.7%|27s|2162|15.6|2| |
6 | 6 | |Open source|DeepSeek|DeepSeek-V3.2-Exp-Think(new)|70.1%|248s|2106|6.1|3| |
7 | 7 | |Commercial|openAI|gpt-5.1-high(new)|69.7%|117s|2745|180.0|4| |
8 | 8 | |Commercial|openAI|gpt-5.1-medium(new)|69.3%|160s|1448|87.9|5| |
|
28 | 28 | |Open source|Alibaba|qwen3-235b-a22b-thinking-2507|65.5%|143s|3422|61.2|25| |
29 | 29 | |Open source|Zhipu AI|GLM-4.5-Air|65.4%|89s|3215|18.0|26| |
30 | 30 | |Open source|Alibaba|Qwen3-30B-A3B-Thinking-2507|65.0%|106s|3303|8.8|27| |
31 | | -|Commercial|Doubao|doubao-seed-1-6-250615|64.9%|90s|625|3.1|28| |
| 31 | +|Commercial|Doubao|doubao-seed-1-6-250615|64.9%|89s|625|3.1|28| |
32 | 32 | |Commercial|anthropic|claude-opus-4.5(new)|64.9%|16s|1063|146.1|29| |
33 | 33 | |Open source|Alibaba|qwen3-next-80b-a3b-instruct|64.6%|67s|1146|3.9|30| |
34 | 34 | |Commercial|Baidu|ERNIE-X1.1-Preview(new)|64.5%|174s|2505|9.3|31| |
|
47 | 47 | |Open source|Moonshot AI|kimi-k2-0905(new)|61.8%|80s|998|13.2|44| |
48 | 48 | |Open source|Zhipu AI|GLM-4.5-nothink|61.8%|68s|1263|15.3|45| |
49 | 49 | |Commercial|XAI|grok-3-mini|61.7%|182s|1526|5.2|46| |
50 | | -|Commercial|XAI|grok-4-0709|61.2%|293s|2379|241.5|47| |
| 50 | +|Commercial|XAI|grok-4-0709|61.2%|293s|2376|241.1|47| |
51 | 51 | |Commercial|Baidu|ERNIE-4.5-Turbo-32K|61.1%|66s|713|1.8|48| |
52 | 52 | |Open source|Baidu|ERNIE-4.5-300B-A47B|60.8%|133s|592|3.4|49| |
53 | 53 | |Commercial|Baidu|ERNIE-X1-Turbo-32K|60.8%|288s|2609|9.7|50| |
54 | 54 | |Commercial|google|gemini-2.5-flash|60.6%|40s|2586|43.2|51| |
55 | | -|Open source|Tencent|Hunyuan-A13B-Instruct|59.8%|119s|2088|7.7|52| |
| 55 | +|Open source|Tencent|Hunyuan-A13B-Instruct|59.8%|119s|2068|7.6|52| |
56 | 56 | |Open source|minimax|MiniMax-M2(new)|59.6%|56s|2931|23.1|53| |
57 | 57 | |Open source|Alibaba|Qwen3-30B-A3B-Instruct-2507|59.2%|49s|1158|2.9|54| |
58 | 58 | |Open source|openAI|gpt-oss-120b|59.1%|86s|1108|2.9|55| |
|
64 | 64 | |Open source|minimax|MiniMax-M1|56.2%|226s|4392|32.0|61| |
65 | 65 | |Open source|Alibaba|Qwen3-32B|56.2%|110s|2769|10.4|62| |
66 | 66 | |Open source|Zhipu AI|GLM-4.5-Air-nothink|55.8%|64s|1920|10.4|63| |
67 | | -|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|55.4%|19s|1712|2.2|64| |
| 67 | +|Commercial|Doubao|doubao-seed-1-6-flash-thinking-250615|55.4%|9s|1712|2.2|64| |
68 | 68 | |Open source|Alibaba|Qwen3-8B|55.3%|262s|6511|0.0|65| |
69 | | -|Commercial|iFlytek|xunfei-spark-x1-0725|55.3%|/|2060|24.6|66| |
| 69 | +|Commercial|iFlytek|xunfei-spark-x1-0725|55.3%|/|2057|24.6|66| |
70 | 70 | |Commercial|Alibaba|qwen-turbo-think-2025-07-15|55.3%|/|3132|8.8|67| |
71 | 71 | |Commercial|Zhipu AI|GLM-4.5-Flash-nothink|54.8%|34s|1680|0.0|68| |
72 | 72 | |Commercial|anthropic|claude-haiku-4.5(new)|54.5%|13s|775|18.9|69| |
73 | | -|Open source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|54.5%|425s|3551|0.0|70| |
| 73 | +|Open source|DeepSeek|DeepSeek-R1-0528-Qwen3-8B|54.5%|426s|3550|0.0|70| |
74 | 74 | |Commercial|Doubao|Doubao-1.5-lite-32k-250115|54.2%|6s|416|0.2|71| |
75 | 75 | |Open source|openAI|gpt-oss-20b|54.2%|135s|1984|2.1|72| |
76 | | -|Commercial|Doubao|doubao-seed-1-6-flash-250615|53.4%|7s|614|0.6|73| |
| 76 | +|Commercial|Doubao|doubao-seed-1-6-flash-250615|53.4%|4s|614|0.6|73| |
77 | 77 | |Commercial|360|360zhinao2-o1|52.9%|/|/|/|74| |
78 | 78 | |Open source|Alibaba|Qwen3-32B-nothink|52.3%|94s|738|2.3|75| |
79 | 79 | |Commercial|Alibaba|qwen-turbo-2025-07-15|52.3%|46s|713|0.4|76| |
80 | | -|Open source|Tencent|Hunyuan-A13B-Instruct-nothink|51.5%|392s|588|1.7|77| |
| 80 | +|Open source|Tencent|Hunyuan-A13B-Instruct-nothink|51.5%|394s|588|1.7|77| |
81 | 81 | |Open source|Alibaba|Qwen3-4B|51.2%|71s|2340|6.4|78| |
82 | 82 | |Open source|Baidu|ERNIE-4.5-21B-A3B|51.1%|65s|812|0.1|79| |
83 | 83 | |Open source|minimax|MiniMax-Text-01|49.0%|14s|975|7.3|80| |
84 | 84 | |Commercial|Baichuan AI|Baichuan4-Turbo|48.3%|/|/|/|81| |
85 | 85 | |Open source|Alibaba|Qwen3-14B-nothink|47.8%|44s|848|1.3|82| |
86 | 86 | |Commercial|Alibaba|qwen-long-2025-01-25|47.7%|51s|475|0.7|83| |
87 | 87 | |Commercial|XAI|grok-4-1-fast-non-reasoning(new)|47.6%|60s|685|1.6|84| |
88 | | -|Commercial|Mistral|mistral-medium-2508|47.0%|159s|751|7.9|85| |
| 88 | +|Commercial|Mistral|mistral-medium-2508|47.0%|157s|751|7.9|85| |
89 | 89 | |Commercial|google|gemini-2.5-flash-lite|46.8%|46s|3231|8.9|86| |
90 | 90 | |Open source|Alibaba|Qwen3-8B-nothink|45.3%|37s|801|0.0|87| |
91 | 91 | |Open source|meta|Llama-4-Scout-17B-16E-Instruct|45.3%|13s|590|1.1|88| |
92 | | -|Open source|Mistral|Magistral-Small-2507|44.8%|197s|6663|70.6|89| |
| 92 | +|Open source|Mistral|Magistral-Small-2507|44.8%|197s|6657|70.5|89| |
93 | 93 | |Open source|Mistral|Mistral-Small-3.2-24B-Instruct-2506|42.9%|126s|1208|2.2|90| |
94 | 94 | |Open source|Alibaba|Qwen3-1.7B|42.7%|62s|2903|8.1|91| |
95 | 95 | |Commercial|Baichuan AI|Baichuan4-Air|41.6%|/|/|/|92| |
|