Loadgen++/endpoints integration with submission checker#2601

Open
pgmpablo157321 wants to merge 7 commits into master from endpoints_integration

@pgmpablo157321

Contributor

Documentation pending

@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@pgmpablo157321 pgmpablo157321 marked this pull request as ready for review June 16, 2026 15:43
@pgmpablo157321 pgmpablo157321 requested a review from a team as a code owner June 16, 2026 15:43
@pgmpablo157321 pgmpablo157321 force-pushed the endpoints_integration branch from 7eeb1dc to 2d143ff Compare June 16, 2026 16:07
@pgmpablo157321 pgmpablo157321 force-pushed the endpoints_integration branch from 2d143ff to 788e9ed Compare June 16, 2026 16:11
system=system, benchmark=benchmark, scenario=scenario)

def load_single_log(self, path, log_type: Literal["Performance", "Accuracy",
"AccuracyResult", "AccuracyJSON", "Test", "System", "Measurements"]):

Contributor

How about adding "Endpoints" here as well?

Contributor Author

Loading the endpoints result is very different, so it was moved to another function. Please let me know if you think the two should be integrated.

"""
log = None
if os.path.exists(path):
if log_type in ["Endpoints"]:

Contributor

Would it be better to add a check that the path exists, as we have in the elif branch?

Contributor Author

This was moved to another function as well.

acc_json_path, "AccuracyJSON")
measurements_json = self.load_single_log(
measurements_path, "Measurements")
if perf_log is None and acc_log is None:

@anandhu-eng anandhu-eng Jun 16, 2026

Contributor

My concern here is that if a user did an inference run and supplied a wrong path, this would print the Endpoints-specific error log and also set is_endpoints_submittion to true:

Could not load Endpoints log from path/supplied, log type not recognized
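
A minimal sketch of one way to keep the two failure modes apart, so that a mistyped path in a regular inference run is reported as a missing path rather than as an unrecognized Endpoints log. The helper name `describe_load_failure`, the `KNOWN_LOG_TYPES` set, and the "path does not exist" wording are hypothetical and not taken from the PR; the log type names are the ones listed in the `load_single_log` signature.

```python
import os

# Log types listed in the load_single_log signature; "Endpoints" is
# deliberately absent because it is loaded by a separate function.
KNOWN_LOG_TYPES = {
    "Performance", "Accuracy", "AccuracyResult", "AccuracyJSON",
    "Test", "System", "Measurements",
}


def describe_load_failure(path, log_type):
    """Return an error message, or None when the log looks loadable.

    Hypothetical helper: the path is checked first, so a wrong path is
    never reported as an unrecognized log type.
    """
    if not os.path.exists(path):
        return f"Could not load {log_type} log from {path}, path does not exist"
    if log_type not in KNOWN_LOG_TYPES:
        return f"Could not load {log_type} log from {path}, log type not recognized"
    return None
```

Under this ordering, any endpoints-specific handling would only be reached once the path is known to exist.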

Comment thread tools/submission/submission_checker/constants.py Outdated
Comment thread tools/submission/submission_checker/parsers/endpoints_parser.py Outdated
anandhu-eng and others added 4 commits June 17, 2026 10:56
- constants.py: Fix percentile key format in ENDPOINTS_MAPPINGS — the
  endpoints JSON uses float-format keys (e.g. "99.0") but the mappings
  used integer strings ("99"), causing latency_check to receive None and
  crash on the comparison. Updated all latency/ttft/tpot percentile keys
  to use the .0 suffix (50.0, 90.0, 95.0, 99.0).

- performance_check.py: Fix llm_check for endpoints — the check gated
  on the loadgen use_token_latencies flag which does not exist in
  endpoints submissions, causing all LLM models to fail. Added an
  endpoints-specific branch that checks TTFT/TPOT p99 values directly
  from the result JSON. Also fix get_performance_metric_check to skip
  RESULT_FIELD_BENCHMARK_OVERWRITE for endpoints (the tokens/sec field
  is not present in endpoints result files; use QPS instead).

- endpoints_parser.py: Fix inferred QPS unit — the fallback QPS
  calculation divided n_samples_issued by duration_ns directly, giving
  ~1e-8 instead of the correct value. Convert duration to seconds first.
  Also guard against overwriting an already-resolved QPS value.

- README.md: Update submission checker documentation with current
  version numbers, endpoints directory structure, endpoints-specific
  checks, and accuracy_scores requirement.
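
The percentile key mismatch described in the constants.py item can be reproduced in isolation. The sketch below assumes a flat dictionary mapping percentile strings to latencies; the sample values and the helper names `percentile_key` and `get_percentile` are illustrative and do not reflect the actual `ENDPOINTS_MAPPINGS` structure.

```python
def percentile_key(p):
    """Format a percentile as a float-style string key, e.g. 99 -> "99.0"."""
    return str(float(p))


def get_percentile(latencies, p):
    """Look up a percentile, failing loudly instead of returning None."""
    key = percentile_key(p)
    if key not in latencies:
        raise KeyError(f"percentile {key!r} not found in result")
    return latencies[key]


# Hypothetical result fragment using float-format keys.
sample_latencies = {"50.0": 120.0, "90.0": 180.0, "95.0": 210.0, "99.0": 340.0}

# An integer-string key silently misses and yields None, which would then
# raise TypeError when compared against a numeric limit.
missed = sample_latencies.get("99")

# Normalizing the key first finds the value.
p99 = get_percentile(sample_latencies, 99)
```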

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
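
The inferred QPS fix in the endpoints_parser.py item comes down to a unit conversion. A minimal sketch, assuming `duration_ns` is in nanoseconds as its name suggests; the function name `infer_qps` and the `resolved_qps` parameter are hypothetical, while `n_samples_issued` and `duration_ns` are the names used in the commit message.

```python
NS_PER_SECOND = 1_000_000_000


def infer_qps(n_samples_issued, duration_ns, resolved_qps=None):
    """Return an already-resolved QPS if present, else infer it.

    The duration is converted from nanoseconds to seconds before dividing,
    so the result is in samples per second.
    """
    if resolved_qps is not None:
        # Do not overwrite a QPS value that was already resolved.
        return resolved_qps
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    duration_s = duration_ns / NS_PER_SECOND
    return n_samples_issued / duration_s


# Dividing by nanoseconds directly is off by a factor of 1e9.
wrong = 1000 / 10_000_000_000
right = infer_qps(1000, 10_000_000_000)
```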