[MLPerf 6.1][GRPO] Add RLVR GRPO reference by jepio · Pull Request #890 · mlcommons/training

jepio · 2026-06-26T13:06:07Z

Add a new GRPO MLPerf Training reference benchmark: a Qwen 3.5 397B A17B MoE model trained using RLVR (Reinforcement Learning with Verifiable Rewards) and Group Relative Policy Optimization (GRPO) on the software engineering R2E-Gym dataset using the OpenHands agent environment. The reference implementation is carried as a git submodule under llm_moe_grpo/RL.
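For readers new to GRPO, the "group relative" part can be sketched as follows: several rollouts are sampled for the same task, each gets a verifiable reward (for example, whether the test cases pass), and each rollout's advantage is its reward normalized against the group's mean and standard deviation. This is a minimal illustration of the general technique, not code from the reference submodule; the function name, the `eps` value, and the sample rewards are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (hypothetical helper).

    `rewards` holds one scalar reward per rollout of the same task.
    `eps` is an assumed small constant that avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four rollouts: 1.0 = tests passed, 0.0 = tests failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Rollouts that beat the group average get a positive advantage,
# the others a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

When every rollout in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.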

What this adds

Model / framework: Qwen 3.5 397B A17B / NeMo RL + NeMo Gym
Dataset pipeline: the dataset consists of JSONL files specifying the test cases, plus a Singularity container file for each test case (its environment). The README includes instructions for building the dataset. Singularity container files are CPU-architecture specific.
Hardware support: validated on NVIDIA GB200. Multi-node SLURM launcher; requires Enroot/Pyxis.
MLPerf compliance: mllog is integrated in the code; the RCP and target accuracy still need to be agreed.
Reference packaging: the README and all helper scripts are contained within the reference submodule.
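As a rough illustration of the dataset layout described above, the sketch below parses JSONL test-case records, one JSON object per line. The field names `instance_id` and `container` are hypothetical placeholders chosen for the example; the actual schema is defined by the README in the reference submodule and may differ.

```python
import io
import json


def parse_jsonl(lines):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Two made-up records; the keys are assumptions, not the real schema.
sample = io.StringIO(
    '{"instance_id": "example-0001", "container": "example-0001.sif"}\n'
    '\n'
    '{"instance_id": "example-0002", "container": "example-0002.sif"}\n'
)

cases = list(parse_jsonl(sample))
for case in cases:
    print(case["instance_id"], "->", case["container"])
```

With a real file, the same function can be called on an open file handle, for example `parse_jsonl(open(path))`, where `path` points at one of the dataset's JSONL files.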

github-actions · 2026-06-26T13:06:17Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

jepio · 2026-06-26T13:25:16Z

recheck

CarlosGomes98 and others added 22 commits June 17, 2026 17:36

Add reasoning benchmark scaffold

0985c39

Remove setting seed

0a27166

Update reasoning submodule pointer

ab73c22

Document GPUs per node for reasoning

743c959

add data processing instructions

148377c

update readme

811010c

update readme

5612b77

update readme

32cc9b9

Merge pull request #1 from filaretov/add-data-processing-instructions

6a70c41

Add data instructions

update submodule

7734f53

remove unnecessary parts of readme

3d527bc

update data instructions

be8e059

Merge pull request #2 from CarlosGomes98/hfilaretov/update-data-instr…

14cbdca

…uctions update data instructions

Update SWE GRPO README

ebdcd09

Use placeholders for some parameters that still need to be fixed.

Update submodule to mlperf-training-qwen35 branch

35bd44b

Update SWE GRPO paths for Qwen3.5 branch

ae9b0c3

Add software versions

a8e0066

Rephrase first paragraph

f3c219e

* explicitly mention RLVR * expand GRPO * remove commit

Address review comments and add note on reasoning

34fc67d

Rename benchmark to llm_moe_grpo

a852491

Merge pull request #4 from jepio/swe_grpo_update1

89275ad

update swe rl grpo benchmark definition for Qwen 3.5

Add index placeholder for GRPO benchmark

2c38036

jepio requested a review from a team as a code owner June 26, 2026 13:06

jepio added 2 commits June 26, 2026 15:53

llm_moe_grpo: Update reference and sync README with it

dbaeb7e

llm_moe_grpo: Update submodule reference to NVIDIA-NeMo org

fc2542a

jepio wants to merge 24 commits into mlcommons:master from CarlosGomes98:swe_grpo


3 participants
