Skip to content

feat: add truncate-results command#354

Draft
nvzhihanj wants to merge 1 commit into
release/v0.5from
feat/truncate-results
Draft

feat: add truncate-results command#354
nvzhihanj wants to merge 1 commit into
release/v0.5from
feat/truncate-results

Conversation

@nvzhihanj

Copy link
Copy Markdown
Collaborator

Split out from #353 for easier review. Targets release/v0.5.

What & why

In perf+accuracy mode, results.json stores every query's full response text under responses and can reach gigabytes — too large to attach to a submission or share as proof of work.

Changes

New command: inference-endpoint truncate-results <results.json> [--keep-n N] [--output PATH | --in-place].

  • Keeps the first --keep-n (default 5) responses verbatim.
  • Adds a truncation block: a sha256 of every response keyed by sample_uuid, plus responses_truncated, hash_algorithm, n_responses_total, n_responses_kept. This proves which outputs were produced without the bulk.
  • config / results / accuracy_scores / errors are preserved verbatim.
  • A perf-only results.json (no responses) passes through unchanged.
  • Output: <name>.truncated.json by default; --output PATH or --in-place to choose otherwise. The pure transform never mutates its input.

Tests

Unit tests cover: first-N-full + every-response-hashed + metadata; non-response sections preserved; input not mutated; keep_n exceeding total; perf-only passthrough; and the CLI writing a truncated copy (leaving the original intact) vs --in-place.

pre-commit green.

🤖 Generated with Claude Code

Perf+accuracy runs store every query's full response text under `responses`
in results.json, which can reach gigabytes. `inference-endpoint
truncate-results <results.json>` shrinks it: keep the first --keep-n
(default 5) responses verbatim and replace the rest with a `truncation` block
holding a sha256 of every response (proof of work) plus counts. Writes
<name>.truncated.json by default, or --output PATH / --in-place. A perf-only
results.json (no `responses`) passes through unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@nvzhihanj nvzhihanj requested a review from a team as a code owner June 12, 2026 06:46
@github-actions

Copy link
Copy Markdown

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested a review from arekay-nv June 12, 2026 06:46

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new truncate-results command to shrink large benchmark results.json files by keeping a specified number of full responses and replacing the rest with SHA-256 hashes. The changes include the command implementation, CLI registration, documentation updates, and unit tests. The review feedback suggests improving memory efficiency by using file-like objects with json.load and json.dump instead of reading the entire file into memory, and enhancing robustness by validating that the responses section is a dictionary and handling potential negative values for the keep count.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +104 to +114
data = json.loads(config.results.read_text())
truncated = truncate_results_dict(data, keep_n=config.keep_n)

if config.in_place:
out_path = config.results
elif config.output is not None:
out_path = config.output
else:
out_path = config.results.with_name(config.results.stem + ".truncated.json")

out_path.write_text(json.dumps(truncated, indent=2))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Since this command is specifically designed to handle potentially gigabyte-sized results.json files, reading the entire file into memory as a string with read_text() and then parsing/serializing with json.loads()/json.dumps() can lead to high memory usage or Out-Of-Memory (OOM) errors.

Using json.load() and json.dump() with file objects avoids loading the entire file as a single Python string, significantly reducing the memory footprint.

Suggested change
data = json.loads(config.results.read_text())
truncated = truncate_results_dict(data, keep_n=config.keep_n)
if config.in_place:
out_path = config.results
elif config.output is not None:
out_path = config.output
else:
out_path = config.results.with_name(config.results.stem + ".truncated.json")
out_path.write_text(json.dumps(truncated, indent=2))
with open(config.results, "r", encoding="utf-8") as f:
data = json.load(f)
truncated = truncate_results_dict(data, keep_n=config.keep_n)
if config.in_place:
out_path = config.results
elif config.output is not None:
out_path = config.output
else:
out_path = config.results.with_name(config.results.stem + ".truncated.json")
with open(out_path, "w", encoding="utf-8") as f:
json.dump(truncated, f, indent=2)

Comment on lines +52 to +57
responses = results.get("responses")
if not responses:
return dict(results)

uuids = list(responses.keys())
kept = uuids[:keep_n]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To make truncate_results_dict more robust and prevent potential runtime exceptions:

  1. Ensure responses is actually a dictionary before calling .keys() or .items(). If it's of an unexpected type (e.g., a list or string due to malformed input), calling .keys() would raise an AttributeError.
  2. Guard against negative values of keep_n. If keep_n is negative, Python's slice notation uuids[:keep_n] will slice from the end of the list (e.g., keep_n = -1 would keep all but the last response), which is likely unintended.
Suggested change
responses = results.get("responses")
if not responses:
return dict(results)
uuids = list(responses.keys())
kept = uuids[:keep_n]
responses = results.get("responses")
if not isinstance(responses, dict) or not responses:
return dict(results)
uuids = list(responses.keys())
kept = uuids[:max(0, keep_n)]

@nvzhihanj nvzhihanj marked this pull request as draft June 22, 2026 05:10
@nvzhihanj

Copy link
Copy Markdown
Collaborator Author

Converting to draft right now, we need to finalize the accuracy format before adding the truncation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants