feat: per-dataset max_new_tokens override#356
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces a per-dataset max_new_tokens override capability to allow performance and accuracy datasets to use different token limits, falling back to the global model_params when unset. The feedback suggests encapsulating the override logic into a helper method get_model_params on the Dataset class to eliminate code duplication across the accuracy and performance dataset loading paths, and adding corresponding unit tests for this helper.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| # Per-dataset max_new_tokens override (falls back to global model_params). | ||
| acc_model_params = ( | ||
| config.model_params | ||
| if acc_cfg.max_new_tokens is None | ||
| else config.model_params.model_copy( | ||
| update={"max_new_tokens": acc_cfg.max_new_tokens} | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.
| # Per-dataset max_new_tokens override (falls back to global model_params). | |
| acc_model_params = ( | |
| config.model_params | |
| if acc_cfg.max_new_tokens is None | |
| else config.model_params.model_copy( | |
| update={"max_new_tokens": acc_cfg.max_new_tokens} | |
| ) | |
| ) | |
| acc_model_params = acc_cfg.get_model_params(config.model_params) |
| # Per-dataset max_new_tokens override (falls back to global model_params). | ||
| perf_model_params = ( | ||
| config.model_params | ||
| if perf_cfg.max_new_tokens is None | ||
| else config.model_params.model_copy( | ||
| update={"max_new_tokens": perf_cfg.max_new_tokens} | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.
| # Per-dataset max_new_tokens override (falls back to global model_params). | |
| perf_model_params = ( | |
| config.model_params | |
| if perf_cfg.max_new_tokens is None | |
| else config.model_params.model_copy( | |
| update={"max_new_tokens": perf_cfg.max_new_tokens} | |
| ) | |
| ) | |
| perf_model_params = perf_cfg.get_model_params(config.model_params) |
…get_model_params Address review feedback on PR mlcommons#356: - Add Dataset.get_model_params(model_params) helper that applies the per-dataset max_new_tokens override (falls back to the global model_params when unset), removing the duplicated override logic from both call sites in benchmark/execute.py. - Add unit tests for the helper (fallback + override + frozen-source preservation). - Regenerate *_template_full.yaml (the new Dataset.max_new_tokens field now appears as `max_new_tokens: null` in the dataset block; the model_params comment drops because the field name now collides across ModelParams/Dataset and the comment generator skips ambiguous names). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional per-dataset `max_new_tokens` that overrides the global `model_params.max_new_tokens` (sent as the per-request max_tokens). Lets a performance dataset use a small cap (avoiding server-side KV over-reservation/overload at high concurrency) while accuracy datasets use a larger cap (avoiding truncation of long reasoning output). Falls back to the global value when unset. - schema: add Dataset.max_new_tokens (gt=0) and a Dataset.get_model_params() helper that applies the override, keeping the logic in one place. - benchmark/execute: both the accuracy and performance load paths use the helper instead of duplicating the override. - tests: per-dataset field validation + get_model_params() fallback/override. - templates: regenerate *_template_full.yaml for the new field. - chore: bump aiohttp 3.14.0 -> 3.14.1 to clear pip-audit CVEs (CVE-2026-54273..54280). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
83e8461 to
458b8fa
Compare
What does this PR do?
When running a combined performance + accuracy benchmark in a single --mode both invocation, the two phases want opposite generation caps, but today the harness only exposes one global model_params.max_new_tokens:
Type of change
Related issues
N/A
Testing
Checklist