[BugFix] Enable moe_gate_fp32 using FD_ENABLE_RL #7130

Sunny-bot1 wants to merge 2 commits into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
fastdeploy-bot left a comment

🤖 AI Code Review
2026-04-01 19:35 UTC
📋 Review Summary

PR overview: use the FD_ENABLE_RL environment variable to control whether the moe gate computes in fp32 in RL scenarios.
Scope of change: engine/args_utils.py, envs.py, model_executor/models/
Impact tags: [RL] [Models] [Engine]

Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Compatibility | glm4_moe.py:153 | Removing the dynamic_load_weight condition changes behavior for existing RL users |
| 🟡 Suggestion | envs.py:269 | The comment mentions RoPE precision alignment, but the code does not implement it |

Overall assessment

Switching to an environment variable is reasonable, but removing the dynamic_load_weight condition is a breaking change: users who set dynamic_load_weight=True previously got fp32 behavior with no extra configuration, and now must explicitly set FD_ENABLE_RL=1. Consider also checking dynamic_load_weight in __post_init__ to preserve backward compatibility.
```diff
-weight_dtype=(
-    "float32" if fd_config.load_config.dynamic_load_weight or fd_config.model_config.moe_gate_fp32 else ""
-),
+weight_dtype=("float32" if fd_config.model_config.moe_gate_fp32 else ""),
```
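To see the behavior change in isolation, here is a minimal, self-contained sketch of the old and new conditions. The `LoadConfig`/`ModelConfig`/`FDConfig` dataclasses are hypothetical stand-ins for FastDeploy's config objects; only the field names mirror the diff.

```python
from dataclasses import dataclass

@dataclass
class LoadConfig:
    dynamic_load_weight: bool = False

@dataclass
class ModelConfig:
    moe_gate_fp32: bool = False

@dataclass
class FDConfig:
    load_config: LoadConfig
    model_config: ModelConfig

def weight_dtype_old(cfg: FDConfig) -> str:
    # Before this PR: fp32 when either dynamic weight loading or the flag is on.
    return "float32" if cfg.load_config.dynamic_load_weight or cfg.model_config.moe_gate_fp32 else ""

def weight_dtype_new(cfg: FDConfig) -> str:
    # After this PR: only the explicit moe_gate_fp32 flag matters.
    return "float32" if cfg.model_config.moe_gate_fp32 else ""
```

A user with `dynamic_load_weight=True` but `moe_gate_fp32=False` gets `"float32"` from the old condition and `""` from the new one, which is exactly the compatibility gap flagged below.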
🔴 Compatibility: removing the dynamic_load_weight condition is a breaking change.

Under the old logic, fp32 was enabled automatically when load_config.dynamic_load_weight=True. After this change, those users must explicitly set FD_ENABLE_RL=1 to keep the old behavior, which may introduce precision inconsistencies into existing RL training pipelines.

Suggestion: check both conditions in __post_init__ in args_utils.py:

```python
if envs.FD_ENABLE_RL == 1 or self.dynamic_load_weight:
    self.moe_gate_fp32 = True
```

Alternatively, state explicitly in the PR description that this is an intentional breaking change and document the migration path.
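The suggested backward-compatible check can be sketched as a runnable snippet. This is a minimal stand-in, assuming the module-level env-var read and the `EngineArgs` dataclass shape; it is not FastDeploy's actual args_utils.py.

```python
import os
from dataclasses import dataclass

# Stand-in for envs.FD_ENABLE_RL: read once at import time, defaulting to 0.
FD_ENABLE_RL = int(os.getenv("FD_ENABLE_RL", "0"))

@dataclass
class EngineArgs:
    dynamic_load_weight: bool = False
    moe_gate_fp32: bool = False

    def __post_init__(self):
        # Either the new env var or the legacy dynamic_load_weight path
        # enables the fp32 moe gate, preserving pre-PR behavior.
        if FD_ENABLE_RL == 1 or self.dynamic_load_weight:
            self.moe_gate_fp32 = True
```

With this shape, existing deployments that rely on dynamic_load_weight=True keep fp32 gating without setting the new environment variable.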
```diff
 "FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST": lambda: bool(
     int(os.getenv("FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST", "1"))
 ),
+# Whether to align RoPE and moe gate precision with training
```
🟡 Suggestion: the comment says "Whether to align RoPE and moe gate precision with training", but the code only handles moe gate precision; RoPE is not addressed.

If RoPE-related logic will be added later, keeping the comment is fine; otherwise, change it to a more accurate description:

```python
# Whether to enable RL mode (moe gate uses fp32 precision)
```
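For context, the registration style shown in the diff can be reproduced as a small sketch. The dict name `environment_variables` and the int type for FD_ENABLE_RL are assumptions (the review compares `envs.FD_ENABLE_RL == 1`), not verified against FastDeploy's actual envs.py.

```python
import os

# Each entry is a lazy getter so the env var is read at access time.
environment_variables = {
    "FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST": lambda: bool(
        int(os.getenv("FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST", "1"))
    ),
    # Whether to enable RL mode (moe gate uses fp32 precision)
    "FD_ENABLE_RL": lambda: int(os.getenv("FD_ENABLE_RL", "0")),
}
```

The lambda-per-variable pattern means later `os.environ` changes are picked up, unlike a value captured once at import.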
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## develop #7130 +/- ##
==========================================
Coverage ? 73.63%
==========================================
Files ? 376
Lines ? 52861
Branches ? 8250
==========================================
Hits ? 38926
Misses ? 11211
Partials ? 2724
Flags with carried forward coverage won't be shown.
Motivation
Fix the inconsistency in the moe gate's loaded weight dtype under RL introduced by #6457.
Modifications
FD_ENABLE_RL=1: reuse the existing FastDeploy environment variable that already aligns RoPE precision with training.
When FD_ENABLE_RL is set, the moe gate of GLM & Qwen models computes in fp32; this can also be enabled at deployment time with --moe-gate-fp32. If neither is set, the gate computes in bf16.

Usage or Command
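The precedence described above can be sketched as a small helper. `moe_gate_compute_dtype` is a hypothetical function for illustration, not FastDeploy's actual API; its argument models the `--moe-gate-fp32` deployment flag.

```python
import os

def moe_gate_compute_dtype(moe_gate_fp32_flag: bool) -> str:
    # FD_ENABLE_RL=1 or the --moe-gate-fp32 flag selects fp32;
    # otherwise the gate computes in bf16.
    if int(os.getenv("FD_ENABLE_RL", "0")) == 1 or moe_gate_fp32_flag:
        return "float32"
    return "bfloat16"
```

So either knob independently forces fp32, and bf16 remains the default when both are left unset.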
Accuracy Tests
Checklist
- Choose at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Use pre-commit before commit.
- For a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.