feat: add truncate-results command #354
Perf+accuracy runs store every query's full response text under `responses` in results.json, which can reach gigabytes. `inference-endpoint truncate-results <results.json>` shrinks it: keep the first --keep-n (default 5) responses verbatim and replace the rest with a `truncation` block holding a sha256 of every response (proof of work) plus counts. Writes <name>.truncated.json by default, or --output PATH / --in-place. A perf-only results.json (no `responses`) passes through unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces a new truncate-results command to shrink large benchmark results.json files by keeping a specified number of full responses and replacing the rest with SHA-256 hashes. The changes include the command implementation, CLI registration, documentation updates, and unit tests. The review feedback suggests improving memory efficiency by using file-like objects with json.load and json.dump instead of reading the entire file into memory, and enhancing robustness by validating that the responses section is a dictionary and handling potential negative values for the keep count.
    data = json.loads(config.results.read_text())
    truncated = truncate_results_dict(data, keep_n=config.keep_n)

    if config.in_place:
        out_path = config.results
    elif config.output is not None:
        out_path = config.output
    else:
        out_path = config.results.with_name(config.results.stem + ".truncated.json")

    out_path.write_text(json.dumps(truncated, indent=2))
Since this command is specifically designed to handle potentially gigabyte-sized results.json files, reading the entire file into memory as a string with read_text() and then parsing/serializing with json.loads()/json.dumps() can lead to high memory usage or Out-Of-Memory (OOM) errors.
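Using json.dump() with a file object writes the output in chunks instead of building the whole serialized document as a single Python string, which reduces peak memory on the write side. Using json.load() with a file object removes the explicit read_text() call, although CPython still reads the full stream before parsing, so the parsed data must fit in memory either way.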
Suggested change:

    - data = json.loads(config.results.read_text())
    - truncated = truncate_results_dict(data, keep_n=config.keep_n)
    - if config.in_place:
    -     out_path = config.results
    - elif config.output is not None:
    -     out_path = config.output
    - else:
    -     out_path = config.results.with_name(config.results.stem + ".truncated.json")
    - out_path.write_text(json.dumps(truncated, indent=2))
    + with open(config.results, "r", encoding="utf-8") as f:
    +     data = json.load(f)
    + truncated = truncate_results_dict(data, keep_n=config.keep_n)
    + if config.in_place:
    +     out_path = config.results
    + elif config.output is not None:
    +     out_path = config.output
    + else:
    +     out_path = config.results.with_name(config.results.stem + ".truncated.json")
    + with open(out_path, "w", encoding="utf-8") as f:
    +     json.dump(truncated, f, indent=2)
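A minimal sketch of the file-object pattern suggested above, written as a standalone helper so it can be run in isolation. The name `rewrite_json` and its `transform` parameter are hypothetical and not part of the PR. Note one caveat: CPython's `json.load` still reads the whole stream before parsing, so the main memory saving comes from `json.dump`, which writes the encoder's output chunk by chunk.

```python
import json
from pathlib import Path
from typing import Any, Callable


def rewrite_json(
    src: Path,
    dst: Path,
    transform: Callable[[dict[str, Any]], dict[str, Any]],
) -> None:
    """Load src, apply transform, write the result to dst (hypothetical helper)."""
    # json.load accepts a file object. The parsed dict must still fit in
    # memory, because the stream is read in full before parsing.
    with open(src, "r", encoding="utf-8") as f:
        data = json.load(f)

    result = transform(data)

    # json.dump writes the encoder's chunks straight to the file, so the
    # serialized output is never held as one large string.
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)
```

Because the input file is closed before the output file is opened, passing the same path as `src` and `dst` behaves like an in-place rewrite in this sketch.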
    responses = results.get("responses")
    if not responses:
        return dict(results)

    uuids = list(responses.keys())
    kept = uuids[:keep_n]
To make `truncate_results_dict` more robust and prevent potential runtime exceptions:
- Ensure `responses` is actually a dictionary before calling `.keys()` or `.items()`. If it's of an unexpected type (e.g., a list or string due to malformed input), calling `.keys()` would raise an `AttributeError`.
- Guard against negative values of `keep_n`. If `keep_n` is negative, Python's slice notation `uuids[:keep_n]` will slice from the end of the list (e.g., `keep_n = -1` would keep all but the last response), which is likely unintended.
Suggested change:

    - responses = results.get("responses")
    - if not responses:
    -     return dict(results)
    - uuids = list(responses.keys())
    - kept = uuids[:keep_n]
    + responses = results.get("responses")
    + if not isinstance(responses, dict) or not responses:
    +     return dict(results)
    + uuids = list(responses.keys())
    + kept = uuids[:max(0, keep_n)]
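To illustrate why both guards matter, here is a small sketch that isolates the selection step. The helper name `select_kept_uuids` is hypothetical and is not a function in the PR; it only mirrors the two checks suggested above.

```python
from typing import Any


def select_kept_uuids(results: dict[str, Any], keep_n: int) -> list[str]:
    """Return the sample UUIDs whose responses stay verbatim (hypothetical helper)."""
    responses = results.get("responses")
    # A missing, empty, or malformed (non-dict) section yields nothing to keep.
    if not isinstance(responses, dict) or not responses:
        return []
    # max(0, keep_n) clamps negatives so the slice never counts from the end.
    return list(responses)[: max(0, keep_n)]


uuids = ["a", "b", "c"]
print(uuids[:-1])           # ['a', 'b'] : an unguarded negative keeps all but the last
print(uuids[: max(0, -1)])  # [] : the clamped slice keeps nothing
```

An alternative design would be to reject a negative `--keep-n` at the CLI layer with an error rather than silently clamping it; which is preferable depends on how strict the command is meant to be.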
Converting to draft for now; we need to finalize the accuracy format before adding truncation.
Split out from #353 for easier review. Targets `release/v0.5`.

What & why

In perf+accuracy mode, `results.json` stores every query's full response text under `responses` and can reach gigabytes, which is too large to attach to a submission or share as proof of work.

Changes

New command: `inference-endpoint truncate-results <results.json> [--keep-n N] [--output PATH | --in-place]`.

- Keeps the first `--keep-n` (default 5) responses verbatim.
- Replaces the rest with a `truncation` block: a `sha256` of every response keyed by `sample_uuid`, plus `responses_truncated`, `hash_algorithm`, `n_responses_total`, `n_responses_kept`. This proves which outputs were produced without the bulk.
- `config` / `results` / `accuracy_scores` / `errors` are preserved verbatim.
- A perf-only `results.json` (no `responses`) passes through unchanged.
- Writes `<name>.truncated.json` by default; `--output PATH` or `--in-place` to choose otherwise. The pure transform never mutates its input.

Tests

Unit tests cover: first-N-full + every-response-hashed + metadata; non-response sections preserved; input not mutated; `keep_n` exceeding total; perf-only passthrough; and the CLI writing a truncated copy (leaving the original intact) vs `--in-place`. `pre-commit` green.

🤖 Generated with Claude Code
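The transform described under Changes could be sketched roughly as follows. This is an illustration, not the PR's implementation: the function name `truncate_results_sketch`, the nested `"sha256"` key that holds the per-UUID hashes, the `True` value of `responses_truncated`, and the assumption that each response is a plain string are all guesses made for the example.

```python
import hashlib
from typing import Any


def truncate_results_sketch(results: dict[str, Any], keep_n: int = 5) -> dict[str, Any]:
    """Keep the first keep_n responses and hash every response (illustrative only)."""
    responses = results.get("responses")
    # Perf-only or malformed input passes through as a shallow copy.
    if not isinstance(responses, dict) or not responses:
        return dict(results)

    kept = list(responses)[: max(0, keep_n)]

    # Build a new top-level dict so the caller's input is never mutated.
    out = {key: value for key, value in results.items() if key != "responses"}
    out["responses"] = {uuid: responses[uuid] for uuid in kept}
    out["truncation"] = {
        "responses_truncated": True,  # assumed value
        "hash_algorithm": "sha256",
        "n_responses_total": len(responses),
        "n_responses_kept": len(kept),
        # Assumed layout: one hex digest per sample_uuid, covering every
        # response, including the ones kept verbatim.
        "sha256": {
            uuid: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for uuid, text in responses.items()
        },
    }
    return out
```

A reader holding the original response for a given `sample_uuid` could recompute its digest the same way and compare it against the stored value, which is the sense in which the block serves as proof of work.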