Loadgen++/endpoints integration with submission checker by pgmpablo157321 · Pull Request #2601 · mlcommons/inference

pgmpablo157321 · 2026-06-15T17:02:09Z

Documentation pending

github-actions · 2026-06-15T17:02:24Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

anandhu-eng · 2026-06-16T16:59:10Z

                           system=system, benchmark=benchmark, scenario=scenario)

    def load_single_log(self, path, log_type: Literal["Performance", "Accuracy",
                        "AccuracyResult", "AccuracyJSON", "Test", "System", "Measurements"]):


How about adding Endpoints here as well?

Loading the endpoints result works very differently, so it was moved to a separate function. Please let me know if you think we should integrate both.
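One way to integrate both without merging their parsing logic is a thin dispatcher: "Endpoints" goes to its own loader, and every other log type goes to the existing load_single_log. This is a minimal sketch, not the PR's code. The names make_log_loader and load_endpoints_log are hypothetical, and both loaders are passed in as callables.

```python
from typing import Any, Callable, Optional


def make_log_loader(
    load_single_log: Callable[[str, str], Any],
    load_endpoints_log: Callable[[str], Optional[dict]],
) -> Callable[[str, str], Any]:
    """Hypothetical facade: one entry point, two independent parsers."""

    def load(path: str, log_type: str) -> Any:
        if log_type == "Endpoints":
            # Endpoints results are parsed by their own function.
            return load_endpoints_log(path)
        # All loadgen log types keep using the existing loader unchanged.
        return load_single_log(path, log_type)

    return load
```

Callers get a single entry point, and the two formats remain separately testable. The Literal type on load_single_log can also stay limited to the loadgen log types.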

anandhu-eng · 2026-06-16T17:01:56Z

        """
        log = None
-        if os.path.exists(path):
+        if log_type in ["Endpoints"]:


Would it be better to add a check that the path exists, as we have in the elif branch?

This was also moved to another function.
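The path-existence check suggested in the review could sit at the top of the separate endpoints loader. A missing file then produces a "not found" message and is not reported as an unrecognized log type. This is a minimal sketch that assumes the endpoints result is a single JSON file. The function name and logger name are hypothetical.

```python
import json
import logging
import os
from typing import Any, Dict, Optional

logger = logging.getLogger("submission_checker")  # hypothetical logger name


def load_endpoints_log(path: str) -> Optional[Dict[str, Any]]:
    """Hypothetical endpoints loader that checks the path before parsing."""
    if not os.path.exists(path):
        # A missing file gets its own message, not "log type not recognized".
        logger.error("Endpoints log not found at %s", path)
        return None
    try:
        with open(path) as f:
            return json.load(f)
    except json.JSONDecodeError as exc:
        # The file exists but is not valid JSON.
        logger.error("Could not parse Endpoints log %s: %s", path, exc)
        return None
```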

anandhu-eng · 2026-06-16T17:55:25Z

                                acc_json_path, "AccuracyJSON")
                            measurements_json = self.load_single_log(
                                measurements_path, "Measurements")
+                            if perf_log is None and acc_log is None:


My concern here is that if a user did an inference run and supplied a wrong path, the checker would print the endpoints-specific error log and also set is_endpoints_submittion to true:

Could not load Endpoints log from path/supplied, log type not recognized
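One way to address this concern is to classify a run as an endpoints submission only when an endpoints result file actually exists. Missing loadgen logs alone would not be enough. This is a minimal sketch. The helper name and its parameters are hypothetical, and the caller is assumed to know the expected endpoints result path.

```python
import os
from typing import Any, Optional


def detect_submission_type(
    perf_log: Optional[Any],
    acc_log: Optional[Any],
    endpoints_result_path: str,
) -> Optional[str]:
    """Hypothetical helper: classify a run from the files actually present."""
    if perf_log is not None or acc_log is not None:
        return "loadgen"
    if os.path.isfile(endpoints_result_path):
        # Only claim an endpoints submission when its result file exists.
        return "endpoints"
    # Neither format found: report missing logs instead of assuming endpoints.
    return None
```

With this split, a wrong path in a loadgen run produces a generic "logs not found" outcome rather than the endpoints-specific error.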

- constants.py: Fix the percentile key format in ENDPOINTS_MAPPINGS. The endpoints JSON uses float-format keys (e.g. "99.0"), but the mappings used integer strings ("99"). As a result, latency_check received None and crashed on the comparison. Updated all latency/TTFT/TPOT percentile keys to use the .0 suffix (50.0, 90.0, 95.0, 99.0).
- performance_check.py: Fix llm_check for endpoints. The check was gated on the loadgen use_token_latencies flag, which does not exist in endpoints submissions, so all LLM models failed. Added an endpoints-specific branch that checks the TTFT/TPOT p99 values directly from the result JSON. Also fixed get_performance_metric_check to skip RESULT_FIELD_BENCHMARK_OVERWRITE for endpoints: the tokens/sec field is not present in endpoints result files, so QPS is used instead.
- endpoints_parser.py: Fix the unit of the inferred QPS. The fallback QPS calculation divided n_samples_issued by duration_ns directly, giving ~1e-8 instead of the correct value. The duration is now converted to seconds first. Also guard against overwriting an already-resolved QPS value.
- README.md: Update the submission checker documentation with current version numbers, the endpoints directory structure, endpoints-specific checks, and the accuracy_scores requirement.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
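The percentile-key and QPS fixes described in the commit message can be sketched as two small helpers. This is an illustration, not the code in constants.py or endpoints_parser.py. The helper names, and the assumption that percentiles arrive as a dict keyed by strings like "99.0", are hypothetical. The field names n_samples_issued and duration_ns come from the commit message.

```python
from typing import Dict, Optional


def get_percentile(percentiles: Dict[str, float], p: float) -> float:
    """Look up a percentile using float-style keys such as "99.0"."""
    key = str(float(p))  # 99 -> "99.0", 99.9 -> "99.9"
    if key not in percentiles:
        # Fail loudly instead of passing None on to a latency comparison.
        raise KeyError(f"percentile {key!r} missing; have {sorted(percentiles)}")
    return percentiles[key]


def resolve_qps(
    existing_qps: Optional[float], n_samples_issued: int, duration_ns: int
) -> Optional[float]:
    """Infer QPS from sample count and duration unless it is already resolved."""
    if existing_qps is not None:
        return existing_qps  # never overwrite an already-resolved value
    if duration_ns <= 0:
        return None
    duration_s = duration_ns / 1e9  # nanoseconds -> seconds before dividing
    return n_samples_issued / duration_s
```

For example, 1,000 samples over 10 s gives 100 QPS. Dividing by duration_ns directly would give 1e-7, which is off by a factor of 1e9.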

Loadgen++/endpoints integration with submission checker

325ae0d

Handle checks: remove checks for endpoints submissions

5808d0a
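The commit message above describes an endpoints-specific branch in llm_check. That branch reads the TTFT/TPOT p99 values directly because endpoints results have no use_token_latencies flag. This is a minimal sketch of that branching, not the checker's actual function. The parameter names are hypothetical, the caller is assumed to supply the limits, and the loadgen branch is a simplified stand-in.

```python
from typing import Optional


def llm_check(
    is_endpoints: bool,
    use_token_latencies: Optional[bool],
    ttft_p99: Optional[float],
    tpot_p99: Optional[float],
    ttft_limit: float,
    tpot_limit: float,
) -> bool:
    if is_endpoints:
        # Endpoints results carry no use_token_latencies flag, so compare
        # the p99 TTFT/TPOT values from the result JSON against the limits.
        if ttft_p99 is None or tpot_p99 is None:
            return False  # missing metrics fail the check instead of crashing
        return ttft_p99 <= ttft_limit and tpot_p99 <= tpot_limit
    # Simplified stand-in for the loadgen path: the run must have enabled
    # token latencies.
    return bool(use_token_latencies)
```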

pgmpablo157321 marked this pull request as ready for review June 16, 2026 15:43

pgmpablo157321 requested a review from a team as a code owner June 16, 2026 15:43

pgmpablo157321 force-pushed the endpoints_integration branch from 7eeb1dc to 2d143ff June 16, 2026 16:07

Split string to avoid error in automatic testing

788e9ed

pgmpablo157321 force-pushed the endpoints_integration branch from 2d143ff to 788e9ed June 16, 2026 16:11