feat: per-dataset max_new_tokens override by roborluo · Pull Request #356 · mlcommons/endpoints

roborluo · 2026-06-13T05:10:13Z

What does this PR do?

When running a combined performance + accuracy benchmark in a single --mode both invocation, the two phases want opposite generation caps, but today the harness only exposes one global model_params.max_new_tokens:

- Performance phase needs a small cap. max_new_tokens is sent to the server as the per-request max_tokens, and a disaggregated decode scheduler reserves/plans decode KV for that declared upper bound, even though generation actually stops at EOS far sooner. A large cap (e.g. 32768) over-reserves decode KV (~3.2× the reservation for a 10240 cap), starves admittable decode slots at high concurrency, and triggers KV-transfer-timeout storms on the context→gen path. A realistic small cap avoids this.
- Accuracy phase needs a large cap; otherwise long reasoning outputs get truncated and scores are artificially deflated. This matches the MLPerf Inference gpt-oss-120b reference, where the performance and accuracy workloads use different token settings (see language/gpt-oss-120b → Model and Dataset download).

Without a per-dataset override, you cannot satisfy both in one --mode both run.
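The ~3.2× figure above is plain arithmetic on the two caps. The sketch below reproduces it and shows how the declared cap could limit admitted requests. It is illustrative only: the KV budget and prompt length are hypothetical numbers, and the rule that each request reserves `prompt + max_new_tokens` tokens is a simplifying assumption, not a description of any particular scheduler.

```python
def reservation_ratio(large_cap: int, small_cap: int) -> float:
    """How many times more decode-KV tokens the large cap declares per request."""
    return large_cap / small_cap


def admittable_requests(kv_budget_tokens: int, prompt_tokens: int, max_new_tokens: int) -> int:
    """Requests that fit when each one reserves prompt + declared cap (simplified model)."""
    per_request = prompt_tokens + max_new_tokens
    return kv_budget_tokens // per_request


# Caps taken from the PR description.
ratio = reservation_ratio(32768, 10240)

# Hypothetical numbers, chosen only to make the effect visible.
KV_BUDGET_TOKENS = 4_000_000
PROMPT_TOKENS = 2_048

slots_small = admittable_requests(KV_BUDGET_TOKENS, PROMPT_TOKENS, 10240)
slots_large = admittable_requests(KV_BUDGET_TOKENS, PROMPT_TOKENS, 32768)

print(f"reservation ratio: {ratio:.1f}x")
print(f"admittable requests: small cap={slots_small}, large cap={slots_large}")
```

Under these assumed numbers the large cap admits roughly a third as many concurrent requests, which is the slot starvation the description refers to.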

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

N/A

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

github-actions · 2026-06-13T05:10:21Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request introduces a per-dataset max_new_tokens override capability to allow performance and accuracy datasets to use different token limits, falling back to the global model_params when unset. The feedback suggests encapsulating the override logic into a helper method get_model_params on the Dataset class to eliminate code duplication across the accuracy and performance dataset loading paths, and adding corresponding unit tests for this helper.
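A minimal sketch of the helper described in this review, for readers who want to see the fallback behavior in isolation. The diff in this PR uses pydantic's `model_copy(update=...)`; to stay dependency-free, the sketch swaps in frozen stdlib dataclasses and `dataclasses.replace`. The field set, defaults, and dataset names are assumptions made for illustration and are not taken from the project's schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ModelParams:
    # Stand-in for the project's pydantic model; only the relevant field is real.
    max_new_tokens: int = 1024
    temperature: float = 0.0  # hypothetical field, shown to prove other settings carry over


@dataclass(frozen=True)
class Dataset:
    name: str
    max_new_tokens: Optional[int] = None  # None means "use the global value"

    def get_model_params(self, model_params: ModelParams) -> ModelParams:
        """Return model_params with this dataset's max_new_tokens applied, if set."""
        if self.max_new_tokens is None:
            return model_params
        # replace() builds a new instance, so the shared global object is never mutated.
        return replace(model_params, max_new_tokens=self.max_new_tokens)


global_params = ModelParams(max_new_tokens=32768, temperature=0.7)

# Performance dataset overrides the cap; accuracy dataset inherits the global one.
perf_params = Dataset("perf", max_new_tokens=10240).get_model_params(global_params)
acc_params = Dataset("accuracy").get_model_params(global_params)

print(perf_params.max_new_tokens, acc_params.max_new_tokens, global_params.max_new_tokens)
```

Both call sites then reduce to a single line each, and the override rule lives in one place.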

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-13T05:11:07Z

+        # Per-dataset max_new_tokens override (falls back to global model_params).
+        acc_model_params = (
+            config.model_params
+            if acc_cfg.max_new_tokens is None
+            else config.model_params.model_copy(
+                update={"max_new_tokens": acc_cfg.max_new_tokens}
+            )
+        )


Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.

Suggested change

-        # Per-dataset max_new_tokens override (falls back to global model_params).
-        acc_model_params = (
-            config.model_params
-            if acc_cfg.max_new_tokens is None
-            else config.model_params.model_copy(
-                update={"max_new_tokens": acc_cfg.max_new_tokens}
-            )
-        )
+        acc_model_params = acc_cfg.get_model_params(config.model_params)

gemini-code-assist · 2026-06-13T05:11:07Z

+        # Per-dataset max_new_tokens override (falls back to global model_params).
+        perf_model_params = (
+            config.model_params
+            if perf_cfg.max_new_tokens is None
+            else config.model_params.model_copy(
+                update={"max_new_tokens": perf_cfg.max_new_tokens}
+            )
+        )


Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.

Suggested change

-        # Per-dataset max_new_tokens override (falls back to global model_params).
-        perf_model_params = (
-            config.model_params
-            if perf_cfg.max_new_tokens is None
-            else config.model_params.model_copy(
-                update={"max_new_tokens": perf_cfg.max_new_tokens}
-            )
-        )
+        perf_model_params = perf_cfg.get_model_params(config.model_params)

arekay-nv · 2026-06-13T13:54:19Z

@roborluo Can you look at #344, which addresses the same issue? We can consolidate the two here and merge this one. I think the other one also has the templates correctly populated, which is the check you are failing in CI.

…get_model_params

Address review feedback on PR mlcommons#356:

- Add Dataset.get_model_params(model_params) helper that applies the per-dataset max_new_tokens override (falls back to the global model_params when unset), removing the duplicated override logic from both call sites in benchmark/execute.py.
- Add unit tests for the helper (fallback + override + frozen-source preservation).
- Regenerate *_template_full.yaml (the new Dataset.max_new_tokens field now appears as `max_new_tokens: null` in the dataset block; the model_params comment drops because the field name now collides across ModelParams/Dataset and the comment generator skips ambiguous names).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add an optional per-dataset `max_new_tokens` that overrides the global `model_params.max_new_tokens` (sent as the per-request max_tokens). Lets a performance dataset use a small cap (avoiding server-side KV over-reservation/overload at high concurrency) while accuracy datasets use a larger cap (avoiding truncation of long reasoning output). Falls back to the global value when unset.

- schema: add Dataset.max_new_tokens (gt=0) and a Dataset.get_model_params() helper that applies the override, keeping the logic in one place.
- benchmark/execute: both the accuracy and performance load paths use the helper instead of duplicating the override.
- tests: per-dataset field validation + get_model_params() fallback/override.
- templates: regenerate *_template_full.yaml for the new field.
- chore: bump aiohttp 3.14.0 -> 3.14.1 to clear pip-audit CVEs (CVE-2026-54273..54280).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
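The `gt=0` constraint in the schema bullet presumably means the field accepts either no value or a strictly positive integer. A rough sketch of that rule follows, written with a stdlib dataclass and a manual check rather than the project's actual schema code. The class name `DatasetConfig` and the helper `is_valid` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetConfig:  # hypothetical name, not the project's class
    name: str
    max_new_tokens: Optional[int] = None  # None keeps the global model_params value

    def __post_init__(self) -> None:
        # Mirror a "greater than zero" constraint: unset is allowed, 0 and negatives are not.
        if self.max_new_tokens is not None and self.max_new_tokens <= 0:
            raise ValueError(f"max_new_tokens must be > 0, got {self.max_new_tokens}")


def is_valid(value: Optional[int]) -> bool:
    """Report whether a max_new_tokens value passes the constraint."""
    try:
        DatasetConfig("example", max_new_tokens=value)
    except ValueError:
        return False
    return True


for candidate in (None, 1, 10240, 0, -5):
    print(candidate, is_valid(candidate))
```

Rejecting zero at load time keeps a mistyped config from silently producing empty generations in either phase.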

roborluo requested a review from a team as a code owner June 13, 2026 05:10

gemini-code-assist Bot reviewed Jun 13, 2026

roborluo force-pushed the dev-bofengl-per-dataset-max-new-tokens branch from 83e8461 to 458b8fa Compare June 16, 2026 18:17

nvzhihanj merged commit bbc8697 into mlcommons:release/v0.5 Jun 16, 2026
7 checks passed

github-actions Bot locked and limited conversation to collaborators Jun 16, 2026


