Update the weekly latest-models list (每周最新模型)

jeinlee1991 · jeinlee1991 · commit 3ae816c49ec4 · 2025-09-30T15:14:13.000+08:00
diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
 
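 # ReLE Chinese LLM Capability Evaluation (continuously updated)
 - ReLE (**R**eally R**e**liable **L**ive **E**valuation for LLM), formerly CLiB
-- Currently covers 298 LLMs, including commercial models such as chatgpt, gpt-5, o4-mini, Google gemini-2.5, Claude4, Zhipu GLM-Z1, ERNIE Bot (文心一言), qwen3-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax,
-as well as open-source models such as kimi-k2, ernie4.5, minimax-M1, DeepSeek-R1-0528, deepseek-v3.1, qwen3-2507, llama4, phi-4, GLM4.5, gemma3, and mistral.
+- Currently covers 300 LLMs, including commercial models such as chatgpt, gpt-5, o4-mini, Google gemini-2.5, Claude4, Zhipu GLM-Z1, ERNIE Bot (文心一言), qwen3-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax,
+as well as open-source models such as kimi-k2, ernie4.5, minimax-M1, DeepSeek-R1-0528, deepseek-v3.2, qwen3-2507, llama4, GLM4.5, gemma3, and mistral.
 - Supports multi-dimensional capability evaluation across 6 domains (education, medicine and mental health, finance, law and public administration, reasoning and mathematical computation, language and instruction following) and roughly 300 fine-grained dimensions (for example dentistry, high-school Chinese…).
 - Provides not only leaderboards but also an **LLM defect library of more than 2 million items**, so the community can study, analyze, and improve LLMs.
 - Free evaluation service for your private LLM. Contact us: [add us on WeChat](#联系我们)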
@@ -96,36 +96,7 @@ Qwen3-235B-A22B、Qwen3-235B-A22B-nothink、Qwen3-30B-A3B、Qwen3-30B-A3B-nothin
   - Added several LLMs: Alibaba's open-source Qwen3-30B-A3B-Thinking-2507, StepFun's open-source step-3, and the GLM4.5-nothink series (thinking disabled)
   - Removed outdated models: doubao-seed-1-6-thinking-250615, xunfei-spark-x1, SenseChat-5-beta, SenseChat-Turbo-120,
   GLM-4-Flash, GLM-4-Air, qwen-plus-2025-04-28, qwen-turbo-2025-04-28
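-- [2025/7/29] v4.7
-  - Added several LLMs: the GLM4.5 series, Alibaba's open-source Qwen3-30B-A3B-Instruct-2507, and the Qwen3-nothink series (thinking disabled)
-- [2025/7/26] v4.6
-  - Added 2 language models: Alibaba's open-source qwen3-235b-a22b-thinking-2507 and iFlytek's closed-source xunfei-spark-x1-0725
-  - Removed outdated model: hunyuan-t1-20250529
-- [2025/7/23] v4.5
-  - Added 4 language models: Alibaba's open-source qwen3-235b-a22b-instruct-2507, Alibaba's closed-source qwen-turbo-2025-07-15, Alibaba's closed-source qwen-plus-2025-07-14, and Doubao's closed-source doubao-seed-1-6-thinking-250715; ☛ see [full model information](https://nonelinear.com/static/models.html)
-  - Removed outdated model: Doubao-1.5-thinking-pro
-- [2025/7/17] v4.4
-  - Added cost information for each model on each evaluation dimension; see the per-dimension leaderboards
-  - Added 2 language models: Huawei's open-source pangu-pro-moe and Tencent's closed-source reasoning model hunyuan-t1-20250711
-  - Removed outdated models: moonshot-v1-8k, hunyuan-turbo
-- [2025/7/13] v4.3
-  - Added 2 language models: kimi-k2-0711-preview, the first open-source model with a trillion parameters, and Qwen3-235B-A22B-nothink (thinking disabled); ☛ see [full model information](https://nonelinear.com/static/models.html)
-  - Removed outdated models: gemini-2.5-flash-preview-05-20, gemini-2.5-pro-preview-05-06
-- [2025/7/12] v4.2
-  - Added "2025 Gaokao (figure-based questions)" to the multimodal evaluation; see [multimodal evaluation](README-多模态评测.md)<br>
-  - Added 2 language models: grok-4-0709 and grok-3-mini from Elon Musk's xAI; ☛ see [full model information](https://nonelinear.com/static/models.html)
-  - Removed outdated model: DeepSeek-R1 (0120)
-- [2025/7/9] v4.1
-  - Consolidated the 8 evaluation domains into 6: "mental health" merged into "medicine and mental health", and "public administration" merged into "law and public administration"; overall model rankings changed as a result
-  - Added latency and token-consumption information for each model on each evaluation dimension; see the per-dimension leaderboards
-  - Added 3 language models: the Gemini2.5 series (gemini-2.5-pro stable, gemini-2.5-flash stable, gemini-2.5-flash-lite-preview-06-17); ☛ see [full model information](https://nonelinear.com/static/models.html)
-  - Added 3 multimodal models: GLM-4.1V-Thinking-FlashX, GLM-4.1V-Thinking-Flash, GLM-4.1V-9B-Thinking; ☛ see [full model information](https://nonelinear.com/static/models.html)
-- [2025/7/2] v4.0
-  - Added the first multimodal evaluation, "formula recognition", covering common math, physics, and chemistry formulas; see [link](leaderboard/公式识别.md)
-  - Added 4 language models: Tencent's first hybrid-reasoning model Hunyuan-A13B-Instruct, and Baidu's open-source ERNIE4.5 series (ERNIE-4.5-0.3B, ERNIE-4.5-21B-A3B, ERNIE-4.5-300B-A47B); ☛ see [full model information](https://nonelinear.com/static/models.html)
-  - Data update: added and updated some evaluation data in each dimension; the affected model scores changed accordingly
-  - Removed outdated models: hunyuan-turbos-20250313, hunyuan-t1-20250321, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Llama-70B, qwen-turbo-2025-02-11, qwen-plus-2025-01-25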
-- [2025/6/23] v3.33, [2025/6/18] v3.32, [2025/6/16] v3.31, [2025/6/13] v3.30, [2025/6/9] v3.29, [2025/6/4] v3.28, [2025/5/29] v3.27, [2025/5/23] v3.26, [2025/5/18] v3.25, [2025/5/15] v3.24, [2025/5/10] v3.23, [2025/5/5] v3.22, [2025/5/2] v3.21, [2025/4/30] v3.20, [2025/4/28] v3.19, [2025/4/22] v3.18, [2025/4/17] v3.17, [2025/4/9] v3.16, [2025/4/5] v3.15, [2025/4/3] v3.14, [2025/3/31] v3.13, [2025/3/29] v3.12, [2025/3/27] v3.11, [2025/3/25] v3.10, [2025/3/23] v3.9, [2025/3/21] v3.8, [2025/3/19] v3.7, [2025/3/17] v3.6, [2025/3/15] v3.5, [2025/3/13] v3.4, [2025/3/11] v3.3, [2025/3/10] v3.2, [2025/3/7] v3.1, [2025/3/4] v3.0, [2025/3/3] v2.22, [2025/2/28] v2.21, [2025/2/24] v2.20, [2025/2/22] v2.19, [2025/2/18] v2.18, [2025/2/14] v2.17, [2025/2/13] v2.16, [2025/2/12] v2.15, [2025/2/10] v2.14, [2025/1/29] v2.13, [2025/1/25] v2.12, [2025/1/23] v2.11, [2025/1/22] v2.10, [2025/1/20] v2.9, [2025/1/17] v2.8, [2025/1/7] v2.7
+- [2025/7/29] v4.7, [2025/7/26] v4.6, [2025/7/23] v4.5, [2025/7/17] v4.4, [2025/7/13] v4.3, [2025/7/12] v4.2, [2025/7/9] v4.1, [2025/7/2] v4.0, [2025/6/23] v3.33, [2025/6/18] v3.32, [2025/6/16] v3.31, [2025/6/13] v3.30, [2025/6/9] v3.29, [2025/6/4] v3.28, [2025/5/29] v3.27, [2025/5/23] v3.26, [2025/5/18] v3.25, [2025/5/15] v3.24, [2025/5/10] v3.23, [2025/5/5] v3.22, [2025/5/2] v3.21, [2025/4/30] v3.20, [2025/4/28] v3.19, [2025/4/22] v3.18, [2025/4/17] v3.17, [2025/4/9] v3.16, [2025/4/5] v3.15, [2025/4/3] v3.14, [2025/3/31] v3.13, [2025/3/29] v3.12, [2025/3/27] v3.11, [2025/3/25] v3.10, [2025/3/23] v3.9, [2025/3/21] v3.8, [2025/3/19] v3.7, [2025/3/17] v3.6, [2025/3/15] v3.5, [2025/3/13] v3.4, [2025/3/11] v3.3, [2025/3/10] v3.2, [2025/3/7] v3.1, [2025/3/4] v3.0, [2025/3/3] v2.22, [2025/2/28] v2.21, [2025/2/24] v2.20, [2025/2/22] v2.19, [2025/2/18] v2.18, [2025/2/14] v2.17, [2025/2/13] v2.16, [2025/2/12] v2.15, [2025/2/10] v2.14, [2025/1/29] v2.13, [2025/1/25] v2.12, [2025/1/23] v2.11, [2025/1/22] v2.10, [2025/1/20] v2.9, [2025/1/17] v2.8, [2025/1/7] v2.7
 - 2024: [2024/12/28] v2.6, [2024/12/27] v2.5, [2024/12/25] v2.4, [2024/10/20] v2.3, [2024/9/29] v2.2, [2024/8/27] v2.1, [2024/8/7] v2.0, [2024/7/26] v1.21, [2024/7/15] v1.20, [2024/6/29] v1.19, [2024/6/2] v1.18, [2024/5/8] v1.17, [2024/4/13] v1.16, [2024/3/20] v1.15, [2024/2/28] v1.14, [2024/1/29] v1.13
 - 2023: [2023/12/10] v1.12, [2023/11/22] v1.11, [2023/11/5] v1.10, [2023/10/11] v1.9, [2023/9/13] v1.8, [2023/8/29] v1.7, [2023/8/13] v1.6, [2023/7/26] v1.5, [2023/7/18] v1.4, [2023/7/2] v1.3, [2023/6/17] v1.2, [2023/6/10] v1.1, [2023/6/4] v1
 
@@ -138,9 +109,9 @@ Qwen3-235B-A22B、Qwen3-235B-A22B-nothink、Qwen3-30B-A3B、Qwen3-30B-A3B-nothin
 |------------------------------------------------------------------------------------|-------|--------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [langfuse](https://github.com/langfuse/langfuse)                                   | 14.9k | Overseas | Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23                                                                     |
 | [opik](https://github.com/comet-ml/opik)                                           | 12.5k | Overseas | Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.                                                                                              |
-| [ragas](https://github.com/explodinggradients/ragas)                               | 10.3k | Overseas | Supercharge Your LLM Application Evaluations 🚀                                                                                                                                                                                                                         |
+| [deepeval](https://github.com/confident-ai/deepeval)                      | 11.3k | Overseas | The LLM Evaluation Framework                                                                                                                                                                                                                                            |
 |……|……|……|……|
-| [⭐chinese-llm-benchmark (us)](https://github.com/jeinlee1991/chinese-llm-benchmark) | 4.7k  | **China** | ReLE Chinese LLM Capability Evaluation (continuously updated) |                                                                                               |
+| [⭐chinese-llm-benchmark (us)](https://github.com/jeinlee1991/chinese-llm-benchmark) | 4.9k  | **China** | ReLE Chinese LLM Capability Evaluation (continuously updated) |                                                                                               |
 |……|……|……|……|
 
 For details, see [hot50](GitHub热门评测repo.md)
@@ -149,9 +120,9 @@ Qwen3-235B-A22B、Qwen3-235B-A22B-nothink、Qwen3-30B-A3B、Qwen3-30B-A3B-nothin
 
 # Basic LLM Information
 - [Weekly latest models](每周最新模型.md)
+  - [Sep 22 to Sep 28](每周最新模型.md#9月229月28)
   - [Sep 15 to Sep 21](每周最新模型.md#9月159月21)
   - [Sep 8 to Sep 14](每周最新模型.md#9月89月14)
-  - [Sep 1 to Sep 7](每周最新模型.md#9月19月7)
 - For more information, see the [model list](https://nonelinear.com/static/models.html)
 <br><br>
 
diff --git a/每周最新模型.md b/每周最新模型.md
@@ -1,4 +1,5 @@
 ## Table of Contents
+- [Sep 22 to Sep 28](#9月229月28)
 - [Sep 15 to Sep 21](#9月159月21)
 - [Sep 8 to Sep 14](#9月89月14)
 - [Sep 1 to Sep 7](#9月19月7)
@@ -17,6 +18,40 @@
 - [Jun 2 to Jun 8](#6月26月8)
 <br><br>
 
+
+## 9月22~9月28
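+### Sep 28
+- [Open source] Tencent Hunyuan released HunyuanImage 3.0 (混元图像3.0), described as the first industrial-grade native multimodal image-generation model; at 80B parameters it is billed as the best-performing and largest open-source image-generation model to date. Details: https://modelscope.cn/models/Tencent-Hunyuan/HunyuanImage-3.0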
+
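+### Sep 26
+- [Closed source] Tencent Hunyuan released hunyuan-turbos-20250926: STEM tasks improve by 10.9% on average (math by 13.8%, logical reasoning by 12.3%), while liberal-arts writing, knowledge Q&A, and Agent tasks improve by about 2%. Details: https://cloud.tencent.com/document/product/1729/104753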
+
+### Sep 25
+- [Closed source] Google released the Gemini Robotics-ER 1.5 preview model, designed for robotics applications. Details: https://ai.google.dev/gemini-api/docs/robotics-overview
+- [Closed source] Google released two preview models, gemini-2.5-flash-preview-09-2025 and gemini-2.5-flash-lite-preview-09-2025. Details: https://ai.google.dev/gemini-api/docs/models
+
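+### Sep 23
+- [Closed source | Speech recognition] Alibaba released the speech-recognition model fun-asr-realtime, which integrates RAG and supports large-scale custom hotwords, ITN (inverse text normalization), and punctuation prediction to improve recognition accuracy and contextual fit; it supports free switching between Chinese and English and is more robust to noise. Details: https://help.aliyun.com/zh/model-studio/real-time-speech-recognition
+- [Closed source | Multimodal embedding] Alibaba released the multimodal embedding models tongyi-embedding-vision-plus and tongyi-embedding-vision-flash, built on the Qwen family of LLMs, with stronger visual embedding and support for three modalities: text, image, and video. Details: https://help.aliyun.com/zh/model-studio/embedding
+- [Closed source | Code] Alibaba released the code model qwen3-coder-plus-2025-09-23, based on Qwen3, with improvements over the previous version in downstream task performance, tool-calling robustness, and code security. Details: https://help.aliyun.com/zh/model-studio/qwen-coder
+- [Closed source | Text-to-image] Alibaba released the text-to-image model qwen-image-plus, which is strong at complex text rendering, supports Chinese, English, and complex mixed text-image layouts, and is priced more favorably than qwen-image. Details: https://help.aliyun.com/zh/model-studio/qwen-image-api
+- [Closed source | Text generation] Alibaba released the text-generation models qwen3-max and qwen3-max-2025-09-23, with targeted upgrades over the preview version in agentic coding and tool calling, claimed to reach SOTA in those areas. Details: https://help.aliyun.com/zh/model-studio/models
+- [Closed source | Visual reasoning] Alibaba released the visual-reasoning models qwen3-vl-plus and qwen3-vl-plus-2025-09-23, which combine thinking and non-thinking modes; visual-agent capability is described as world-leading, with across-the-board upgrades to visual coding, spatial perception, and multimodal thinking. Details: https://help.aliyun.com/zh/model-studio/vision
+- [Closed source | Text-to-image] Alibaba released the text-to-image model wan2.5-t2i-preview, which removes the per-side length limit so that any size can be chosen within the total pixel-area and aspect-ratio constraints. Details: https://help.aliyun.com/zh/model-studio/text-to-image-v2-api-reference
+- [Closed source | Image editing] Alibaba released the image-editing model wan2.5-i2i-preview, which accepts text, single-image, or multi-image input and supports subject-consistent editing, multi-image fusion, and image-set generation. Details: https://help.aliyun.com/zh/model-studio/wan2-5-image-edit-api-reference
+- [Closed source | Text-to-video] Alibaba released the text-to-video model wan2.5-t2v-preview, which adds audio: it supports automatic dubbing or a custom audio file, with synchronized audio and video. Details: https://help.aliyun.com/zh/model-studio/text-to-video-api-reference
+- [Closed source | Image-to-video] Alibaba released the image-to-video model wan2.5-i2v-preview, which adds audio: it supports automatic dubbing or a custom audio file, with synchronized audio and video. Details: https://help.aliyun.com/zh/model-studio/image-to-video-api-reference
+- [Closed source] Google released gemini-2.5-flash-native-audio-preview-09-2025, a native-audio model for the Live API with improved function calling and handling of speech cut-offs. Details: https://ai.google.dev/gemini-api/docs/live-guide
+- [Open source] Alibaba open-sourced Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking, described as the strongest Qwen vision-language models to date, with a 256K context window extensible to 1M and new visual-agent and visual-coding capabilities. Details: https://modelscope.cn/models/Qwen/Qwen3-VL-235B-A22B-Instruct and https://modelscope.cn/models/Qwen/Qwen3-VL-235B-A22B-Thinking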
+
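+### Sep 22
+- [Closed source | Omni-modal] Alibaba released qwen3-omni-flash and qwen3-omni-flash-realtime, multimodal models in the Qwen3 series that understand text, images, audio, and video, support text interaction in 119 languages, and offer strong instruction following and system-prompt customization; use cases include voice assistants, multimedia analysis, and content creation. Details: https://help.aliyun.com/zh/model-studio/qwen-omni
+- [Closed source | Speech synthesis] Alibaba's Tongyi team released qwen3-tts-flash and qwen3-tts-flash-realtime, its latest offline speech-synthesis models, with 17 highly expressive human-like voices, low-latency and stable synthesis, and support for multiple languages and dialects. Details: https://help.aliyun.com/zh/model-studio/qwen-tts
+- [Closed source | Audio/video translation] Alibaba released the real-time audio/video translation model qwen3-livetranslate-flash-realtime-2025-09-22, which recognizes 18 languages and translates them in real time into audio in 10 languages. Details: https://help.aliyun.com/zh/model-studio/qwen3-livetranslate-flash-realtime
+- [Open source] Meituan released LongCat-Flash-Thinking, an efficient reasoning model claimed to reach global open-source SOTA in logic, math, code, and agent tasks, and described as the first model in China to combine "deep thinking + tool calling" with "informal + formal" reasoning. Details: https://github.com/meituan-longcat/LongCat-Flash-Thinking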
+<br><br>
+
+
 ## 9月15~9月21
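 ### Sep 19
 - [Closed source] Baidu released ERNIE-4.5-21B-A3B-Thinking, a lightweight deep-thinking model focused on improving reasoning quality and depth, with significantly better performance on logical reasoning, math, science, coding, and text generation. Details: https://cloud.baidu.com/doc/WENXINWORKSHOP/s/flxu4ej5u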