[MLPerf 6.1][GRPO] Add RLVR GRPO reference #890

Open
jepio wants to merge 24 commits into mlcommons:master from CarlosGomes98:swe_grpo

jepio commented Jun 26, 2026

Add a new GRPO MLPerf Training reference benchmark: a Qwen 3.5 397B A17B MoE model trained using RLVR (Reinforcement Learning with Verifiable Rewards) and Group Relative Policy Optimization (GRPO) on the software engineering R2E-Gym dataset using the OpenHands agent environment. The reference implementation is carried as a git submodule under llm_moe_grpo/RL.
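
For readers unfamiliar with GRPO, the "group relative" part can be illustrated with a minimal sketch. It assumes the commonly described formulation, where each rollout's reward is normalized by the mean and standard deviation of the rewards sampled for the same prompt. The function name and the `eps` value are illustrative, and the reference implementation under llm_moe_grpo/RL may compute advantages differently.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    Assumption: the standard GRPO-style advantage, (r - mean) / (std + eps),
    computed over all rollouts sampled for a single prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# One task, four sampled rollouts, binary verifiable rewards (tests pass or fail).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative, while a group with identical rewards yields all-zero advantages.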

What this adds

  • Model / framework: Qwen 3.5 397B A17B / NeMo RL + NeMo Gym
  • Dataset pipeline: the dataset consists of JSONL files specifying test cases, plus a Singularity container file for each test case (its environment). The README includes instructions for building the dataset. Singularity container files are CPU-architecture-specific, so they must be built for the target platform.
  • Hardware support: validated on NVIDIA GB200. Includes a multi-node SLURM launcher, which requires Enroot/Pyxis.
  • MLPerf compliance: mllog is integrated in the code; RCPs and target accuracy still need to be agreed upon.
  • Reference packaging: the README and all helper scripts are contained within the reference submodule.
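
As a rough illustration of the dataset layout described above, the sketch below parses JSONL test cases and resolves an architecture-specific container path for each. The `instance_id` field, the `<root>/<arch>/<id>.sif` directory layout, and both function names are hypothetical and used only for illustration; the actual schema and layout are documented in the reference README.

```python
import json
import platform
from pathlib import PurePosixPath


def parse_test_cases(lines):
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def container_path(case, containers_root, arch=None):
    """Resolve the container image for a test case.

    Hypothetical layout: <root>/<arch>/<instance_id>.sif, with one directory
    per CPU architecture because the images are architecture-specific.
    """
    arch = arch or platform.machine()  # e.g. "aarch64" or "x86_64"
    return PurePosixPath(containers_root) / arch / f"{case['instance_id']}.sif"


# Two illustrative records; real files carry the fields defined by the reference.
sample = [
    '{"instance_id": "demo-1"}',
    "",
    '{"instance_id": "demo-2"}',
]
cases = parse_test_cases(sample)
paths = [str(container_path(c, "/data/containers", arch="aarch64")) for c in cases]
```

Keying the path on the CPU architecture makes a mismatch between the built containers and the host visible as a missing file rather than a runtime failure inside the container.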

jepio requested a review from a team as a code owner on June 26, 2026, 13:06
github-actions Bot commented Jun 26, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

jepio commented Jun 26, 2026

Author

recheck

3 participants