Skip to content

launcher: package as modelopt_launcher; mcp: call console script directly#1766

Open
ChenhanYu wants to merge 2 commits into
mainfrom
chenhany/launcher-package-modelopt-launcher
Open

launcher: package as modelopt_launcher; mcp: call console script directly#1766
ChenhanYu wants to merge 2 commits into
mainfrom
chenhany/launcher-package-modelopt-launcher

Conversation

@ChenhanYu

@ChenhanYu ChenhanYu commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • tools/launcher/__init__.py gains PACKAGE_DIR, making the directory an importable package named modelopt_launcher via a package-dir mapping in pyproject.toml. No file movescommon/ and examples/ stay exactly where they are.
  • pyproject.toml adds the packages/package-dir/package-data declarations and a modelopt-launcher console script entry point.
  • launch.py switches imports to modelopt_launcher.{core,slurm_config} (works in both uv run launch.py via editable install and the installed console script), adds _has_modelopt_src to skip packaging modelopt source when running installed (cluster container already has it), and adds main().
  • bridge.py deletes the 75-line _find_launcher_dir() filesystem walker and _launcher_dir_not_found_response(); simplifies _find_launcher_examples_dir() to 2 strategies (env override → import modelopt_launcher); switches submit_job subprocesses from ["uv", "run", "launch.py"] with cwd=launcher_dir to ["modelopt-launcher"] with no cwd.
  • tools/mcp/pyproject.toml declares modelopt-launcher as a proper dependency (dev: editable ../launcher; published: PyPI), replacing a lengthy comment explaining why it could not be declared.
  • 7 tests for the deleted functions removed; all 38 remaining tests pass.

Test plan

  • cd tools/launcher && uv run python3 -m pytest tests/ -v — all 65 tests pass
  • cd tools/mcp && uv run python3 -m pytest tests/ -v — 38 tests pass
  • uv run launch.py --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml --dryrun --yes -v — dry-run resolves correctly
  • modelopt-launcher --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml --dryrun --yes — console script works after pip install -e tools/launcher
  • cd tools/mcp && uvx modelopt-mcp — MCP server starts without FileNotFoundError

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a modelopt-launcher CLI command as the standardized entry point for launching optimization jobs.
  • Bug Fixes

    • Improved launcher detection and error reporting when the launcher isn’t installed.
    • Simplified experiment-directory discovery and standardized environment handling for job submission and log retrieval.
  • Chores

    • Updated launcher packaging and example/resource discovery to work reliably from installed distributions.
    • Added dev-mode symlink and cleanup safeguards.
  • Tests

    • Adjusted expectations for draft PR failure behavior.

@ChenhanYu ChenhanYu requested review from a team as code owners June 17, 2026 19:44
@ChenhanYu ChenhanYu requested a review from kevalmorabia97 June 17, 2026 19:44
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6593889b-719c-4e81-ba74-fc8133df09d0

📥 Commits

Reviewing files that changed from the base of the PR and between 00b2c13 and 3aafbc8.

📒 Files selected for processing (3)
  • tools/launcher/__init__.py
  • tools/launcher/launch.py
  • tools/mcp/modelopt_mcp/bridge.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/launcher/launch.py

📝 Walkthrough

Walkthrough

Converts modelopt_launcher into a proper installable Python package by adding a PACKAGE_DIR constant, a main() entrypoint in launch.py, a CLI script entry in pyproject.toml, and full packaging metadata. The MCP bridge is updated to discover launcher examples and invoke jobs via the installed modelopt-launcher console script instead of uv run launch.py, removing all launcher-directory cwd dependencies.

Changes

modelopt_launcher package formalization and MCP bridge integration

Layer / File(s) Summary
modelopt_launcher package contract and metadata
tools/launcher/__init__.py, tools/launcher/.gitignore, tools/launcher/pyproject.toml
Adds PACKAGE_DIR constant to __init__.py, a .gitignore rule for the auto-created modules/ symlink, a modelopt-launcher CLI entry point, and full setuptools packaging metadata (package list, package-dir mapping, data globs for common/**/* and examples/**/*.yaml/.jinja).
launch.py: installed-package path derivation and main() entrypoint
tools/launcher/launch.py
Imports modelopt_launcher as _pkg, derives LAUNCHER_DIR from _pkg.PACKAGE_DIR, conditionally creates the modules/Model-Optimizer symlink and builds PatternPackager paths only when the local source checkout exists, sets MODELOPT_SRC_PATH to None when absent, adds a guard so launch(clean=True) raises ValueError when dev checkout unavailable, and adds main() delegating to run.cli.main(launch).
bridge.py: launcher examples discovery and environment propagation
tools/mcp/modelopt_mcp/bridge.py
Rewrites _find_launcher_examples_dir() to use MODELOPT_LAUNCHER_EXAMPLES_DIR env override or modelopt_launcher.PACKAGE_DIR/examples/. Updates submit_job_impl to invoke modelopt-launcher console script with --yaml and --yes flags, standardizes NEMORUN_HOME propagation across submit and later status/log reads, and sets SLURM_HOST via environment.
bridge.py: dry-run and Docker/Slurm submission without launcher-dir cwd
tools/mcp/modelopt_mcp/bridge.py
Changes _submit_job_dry_run to invoke modelopt-launcher --dryrun --yes without launcher-directory cwd, propagates NEMORUN_HOME and SLURM_HOST via environment, and wraps subprocess in FileNotFoundError handling. Docker and Slurm submit paths remove cwd dependency while preserving detached/capture semantics and add launcher-missing error responses.
bridge.py: experiment directory and log retrieval simplification
tools/mcp/modelopt_mcp/bridge.py
Simplifies _resolve_experiment_dir() to check only $NEMORUN_HOME/experiments/<id>, ./experiments/<id>, and ./local_experiments/<id>, removing the launcher-package experiments-roots fallback. Updates read_cluster_artifact_impl logs mode to wrap the nemo experiment logs subprocess invocation in try and remove launcher-directory cwd.
MCP dependency declaration and test adjustment
tools/mcp/pyproject.toml, tools/mcp/tests/test_bridge.py
Adds modelopt-launcher to project.dependencies with a [tool.uv.sources] editable mapping to ../launcher. Removes a stale branch_pushed == True assertion from the test_open_draft_pr_gh_failed_but_branch_pushed test case.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1731: Overlaps with the same submit_job_impl, _submit_job_dry_run, _resolve_experiment_dir, and read_cluster_artifact_impl codepaths that this PR refactors for launcher-aware job submission and experiment tracking.
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: packaging the launcher as a Python package and updating MCP to call the console script directly, which are the core objectives of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No security anti-patterns detected. No torch.load/numpy.load/trust_remote_code/eval/exec violations. All # nosec comments pre-existed in parent commit and are justified. New dependencies (nemo-run,...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chenhany/launcher-package-modelopt-launcher

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/mcp/modelopt_mcp/bridge.py (1)

554-560: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle missing modelopt-launcher binary as a structured failure.

After switching to the console script, these subprocess calls no longer catch FileNotFoundError. If modelopt-launcher is not on PATH, the MCP flow raises and can take down the request path instead of returning an ok=false diagnostic.

Suggested patch
+def _launcher_not_found(argv: list[str]) -> dict:
+    return {
+        "ok": False,
+        "reason": "launcher_not_installed",
+        "diagnostic": (
+            "`modelopt-launcher` was not found on PATH. "
+            "Install modelopt-launcher (or fix the environment) and retry."
+        ),
+        "argv": argv,
+    }
@@
-        proc = subprocess.Popen(  # nosec B603
-            argv,
-            env=child_env,
-            stdout=subprocess.DEVNULL,
-            stderr=subprocess.DEVNULL,
-            start_new_session=True,
-        )
+        try:
+            proc = subprocess.Popen(
+                argv,
+                env=child_env,
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+                start_new_session=True,
+            )
+        except FileNotFoundError:
+            return _launcher_not_found(argv)
@@
-    except subprocess.TimeoutExpired as e:
+    except FileNotFoundError:
+        return _launcher_not_found(argv)
+    except subprocess.TimeoutExpired as e:
@@
-    except subprocess.TimeoutExpired as e:
+    except FileNotFoundError:
+        return _launcher_not_found(argv)
+    except subprocess.TimeoutExpired as e:

Also applies to: 580-587, 749-756

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/mcp/modelopt_mcp/bridge.py` around lines 554 - 560, The
subprocess.Popen calls that execute modelopt-launcher (at lines 554-560,
580-587, and 749-756) do not catch FileNotFoundError when the binary is missing
from PATH, causing the process to crash instead of returning a structured
failure. Wrap each of these subprocess.Popen calls in a try-except block that
catches FileNotFoundError and handles it by returning a diagnostic response with
ok=false rather than allowing the exception to propagate and crash the request
handler.
🧹 Nitpick comments (1)
tools/launcher/modelopt_launcher/__init__.py (1)

23-29: ⚡ Quick win

Define explicit exports in package __init__.py.

PACKAGE_DIR is part of the package contract, but this module does not declare __all__. Please make exports explicit so the public API is curated.

Suggested patch
+__all__ = ["PACKAGE_DIR"]
+
 import os as _os
@@
 PACKAGE_DIR: str = _os.path.dirname(_os.path.abspath(__file__))

As per coding guidelines: “Each module declares its public surface with __all__ = [...] at the top of the file.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/modelopt_launcher/__init__.py` around lines 23 - 29, The
module is missing an explicit `__all__` declaration to curate the public API.
Add an `__all__` list to the module that includes the `PACKAGE_DIR` variable as
an explicit public export. This should be declared near the top of the file
after the imports and comments, making it clear that `PACKAGE_DIR` is part of
the package's public contract.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/launch.py`:
- Around line 63-70: The variable _mo_symlink is only initialized inside the
_has_modelopt_src conditional block, but it is dereferenced later in the code
(around line 119 and related lines 118-121) without being guarded by the same
condition. In installed mode where _has_modelopt_src is False, this causes a
NameError when --clean is used. Guard all references to _mo_symlink (and related
cleanup operations at lines 118-121) with an if _has_modelopt_src check to
ensure the variable exists before being accessed, or initialize _mo_symlink to
None before the conditional and check for None before dereferencing it.

In `@tools/launcher/modules/Megatron-LM`:
- Line 1: The Megatron-LM submodule in tools/launcher/modules/Megatron-LM is
pinned to commit 35d5c653e38a0b5b3772627a7454b059c7bca932 which does not exist
in the upstream repository. Verify the correct commit hash by checking the
upstream Megatron-LM repository for a valid commit that matches the intended
version or functionality needed, then update the submodule pin with the correct
commit hash to ensure the submodule can be properly cloned and initialized.

---

Outside diff comments:
In `@tools/mcp/modelopt_mcp/bridge.py`:
- Around line 554-560: The subprocess.Popen calls that execute modelopt-launcher
(at lines 554-560, 580-587, and 749-756) do not catch FileNotFoundError when the
binary is missing from PATH, causing the process to crash instead of returning a
structured failure. Wrap each of these subprocess.Popen calls in a try-except
block that catches FileNotFoundError and handles it by returning a diagnostic
response with ok=false rather than allowing the exception to propagate and crash
the request handler.

---

Nitpick comments:
In `@tools/launcher/modelopt_launcher/__init__.py`:
- Around line 23-29: The module is missing an explicit `__all__` declaration to
curate the public API. Add an `__all__` list to the module that includes the
`PACKAGE_DIR` variable as an explicit public export. This should be declared
near the top of the file after the imports and comments, making it clear that
`PACKAGE_DIR` is part of the package's public contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a9ab2ce-afb4-4c6e-96be-ee31b982aad5

📥 Commits

Reviewing files that changed from the base of the PR and between 769ea5f and 6320002.

📒 Files selected for processing (86)
  • .agents/skills/eagle3-new-model/SKILL.md
  • .agents/skills/ptq/references/launcher-guide.md
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yaml
  • tools/launcher/launch.py
  • tools/launcher/modelopt_launcher/.gitignore
  • tools/launcher/modelopt_launcher/__init__.py
  • tools/launcher/modelopt_launcher/common/check_regression.py
  • tools/launcher/modelopt_launcher/common/eagle3/dump_offline_data.sh
  • tools/launcher/modelopt_launcher/common/eagle3/dump_offline_data_hf.sh
  • tools/launcher/modelopt_launcher/common/eagle3/dump_offline_data_vllm.sh
  • tools/launcher/modelopt_launcher/common/eagle3/hf_ptq.sh
  • tools/launcher/modelopt_launcher/common/eagle3/make_dataset.sh
  • tools/launcher/modelopt_launcher/common/eagle3/train_eagle.sh
  • tools/launcher/modelopt_launcher/common/eagle3/train_eagle_streaming.sh
  • tools/launcher/modelopt_launcher/common/hf/ptq.sh
  • tools/launcher/modelopt_launcher/common/megatron_bridge/import/import.sh
  • tools/launcher/modelopt_launcher/common/megatron_lm/export/export.sh
  • tools/launcher/modelopt_launcher/common/megatron_lm/quantize/quantize.sh
  • tools/launcher/modelopt_launcher/common/megatron_lm/quantize/task.py
  • tools/launcher/modelopt_launcher/common/query.py
  • tools/launcher/modelopt_launcher/common/service_utils.sh
  • tools/launcher/modelopt_launcher/common/specdec/ar_eval_mtbench.sh
  • tools/launcher/modelopt_launcher/common/specdec/dflash_online_training.sh
  • tools/launcher/modelopt_launcher/common/specdec/vllm_smoke_test.sh
  • tools/launcher/modelopt_launcher/common/specdec_bench/quick_check.sh
  • tools/launcher/modelopt_launcher/common/specdec_bench/run.sh
  • tools/launcher/modelopt_launcher/common/specdec_bench/upload_to_s3.sh
  • tools/launcher/modelopt_launcher/common/tensorrt_llm/eval.sh
  • tools/launcher/modelopt_launcher/common/tensorrt_llm/extra_llm_api_options.yaml
  • tools/launcher/modelopt_launcher/common/tensorrt_llm/query.sh
  • tools/launcher/modelopt_launcher/common/vllm/gpqa_sample.jsonl
  • tools/launcher/modelopt_launcher/common/vllm/query.sh
  • tools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/accelerate_fsdp2_hybrid.yaml
  • tools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/chat_template_train.jinja
  • tools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/hf_offline_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/hf_online_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/specdec_bench.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/chat_template_train.jinja
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/hf_offline_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/hf_online_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_dflash_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_eagle3_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/megatron_lm_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/chat_template_train.jinja
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/eagle3_quick_check.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_eagle3_dryrun.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_online_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_dflash_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq_local.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/chat_template_train.jinja
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/hf_online_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/query_specdec_dataset.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_dflash_vllm.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_mtp.yaml
  • tools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_mtp_vllm.yaml
  • tools/launcher/modelopt_launcher/examples/google/gemma-4-E4B-it/specdec_bench_mtp_vllm.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_dflash_dryrun.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_eagle3_dryrun.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/specdec_bench.yaml
  • tools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yaml
  • tools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml
  • tools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_dflash_vllm.yaml
  • tools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml
  • tools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja
  • tools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/hf_streaming_dflash_multi_node.yaml
  • tools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/hf_streaming_eagle3_multi_node.yaml
  • tools/launcher/modules/Megatron-LM
  • tools/launcher/pyproject.toml
  • tools/mcp/modelopt_mcp/bridge.py
  • tools/mcp/pyproject.toml
  • tools/mcp/tests/test_bridge.py
💤 Files with no reviewable changes (2)
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yaml
  • tools/mcp/tests/test_bridge.py

Comment thread tools/launcher/launch.py
Comment thread tools/launcher/modules/Megatron-LM Outdated
@@ -1 +1 @@
Subproject commit c69697d0510198acddf124694dcfd5057021092f
Subproject commit 35d5c653e38a0b5b3772627a7454b059c7bca932

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

commit="35d5c653e38a0b5b3772627a7454b059c7bca932"
submodule_path="tools/launcher/modules/Megatron-LM"

echo "Checking submodule URL from .gitmodules..."
url="$(git config --file .gitmodules --get "submodule.${submodule_path}.url")"
test -n "${url}"

echo "Confirming pinned commit exists on the configured remote..."
git ls-remote "${url}" "${commit}" | grep -F "${commit}"

echo "If the submodule is already initialized locally, verify launcher-required paths..."
if git -C "${submodule_path}" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git -C "${submodule_path}" cat-file -e "${commit}^{commit}"

  for required_path in \
    "megatron" \
    "examples" \
    "examples/post_training/modelopt/quantize.sh" \
    "examples/post_training/modelopt/mmlu.sh" \
    "examples/post_training/modelopt/export.sh"
  do
    git -C "${submodule_path}" ls-tree -r --name-only "${commit}" -- "${required_path}" | grep -q .
    echo "OK: ${required_path}"
  done
else
  echo "Submodule is not initialized in this checkout; initialize it in CI/dev and run the path checks above."
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 169


The pinned Megatron-LM commit does not exist in the upstream repository.

The commit 35d5c653e38a0b5b3772627a7454b059c7bca932 was not found when querying the remote Megatron-LM repository. Cloning or initializing this submodule will fail. Verify the correct commit hash and update the pin before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/modules/Megatron-LM` at line 1, The Megatron-LM submodule in
tools/launcher/modules/Megatron-LM is pinned to commit
35d5c653e38a0b5b3772627a7454b059c7bca932 which does not exist in the upstream
repository. Verify the correct commit hash by checking the upstream Megatron-LM
repository for a valid commit that matches the intended version or functionality
needed, then update the submodule pin with the correct commit hash to ensure the
submodule can be properly cloned and initialized.

@codecov

codecov Bot commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.31%. Comparing base (769ea5f) to head (3aafbc8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1766      +/-   ##
==========================================
+ Coverage   74.28%   74.31%   +0.03%     
==========================================
  Files         511      511              
  Lines       56360    56360              
==========================================
+ Hits        41867    41884      +17     
+ Misses      14493    14476      -17     
Flag Coverage Δ
regression 14.70% <ø> (+0.06%) ⬆️
unit 54.34% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…ctly

tools/launcher/__init__.py gains PACKAGE_DIR so the directory is an
importable package named modelopt_launcher (via package-dir mapping in
pyproject.toml). common/ and examples/ stay in place — no file moves.

pyproject.toml adds:
  - packages = ["modelopt_launcher"] + package-dir = {"modelopt_launcher": "."}
  - package-data for common/**/* and examples/**/*.{yaml,jinja}
  - console_scripts: modelopt-launcher = "modelopt_launcher.launch:main"

launch.py:
  - imports from modelopt_launcher.{core,slurm_config} (works in both
    uv run and installed console-script modes via editable install)
  - adds _has_modelopt_src flag: skips packaging modelopt source when
    running as an installed console script (cluster container already
    has modelopt pre-installed)
  - adds main() entry point

tools/mcp/modelopt_mcp/bridge.py:
  - deletes _find_launcher_dir() (75-line filesystem walker) and
    _launcher_dir_not_found_response(); 7 corresponding tests removed
  - _find_launcher_examples_dir() simplified to 2 strategies:
    env override → import modelopt_launcher
  - submit_job_impl and _submit_job_dry_run switch from
    ["uv", "run", "launch.py"] with cwd=launcher_dir to
    ["modelopt-launcher"] with no cwd required
  - _resolve_experiment_dir drops the _find_launcher_dir() fallback
  - read_cluster_artifact_impl drops cwd= from nemo subprocess

tools/mcp/pyproject.toml declares modelopt-launcher as a proper
dependency (dev: editable path ../launcher; published: PyPI).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
@ChenhanYu ChenhanYu force-pushed the chenhany/launcher-package-modelopt-launcher branch from 6320002 to 00b2c13 Compare June 17, 2026 20:28
@ChenhanYu ChenhanYu changed the title launcher: package as modelopt-launcher with console_scripts entry point launcher: package as modelopt_launcher; mcp: call console script directly Jun 17, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🧹 Nitpick comments (1)
tools/launcher/__init__.py (1)

16-20: ⚡ Quick win

Declare PACKAGE_DIR in __all__.

PACKAGE_DIR is now a public package contract, but this __init__.py does not define an explicit public surface.

Suggested patch
 import os as _os
 
+__all__ = ["PACKAGE_DIR"]
+
 PACKAGE_DIR: str = _os.path.dirname(_os.path.abspath(__file__))

As per coding guidelines, "__init__.py: Define the public API with __all__ at the top of each module."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/__init__.py` around lines 16 - 20, The `__init__.py` file is
missing an explicit `__all__` declaration to define the public API surface. Add
an `__all__` list after the module docstring and include the `PACKAGE_DIR`
variable in it to explicitly declare it as a public export of the package. This
makes the public contract clear to users and follows the project's coding
guidelines.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/launch.py`:
- Around line 60-61: The issue is that the code checks for the ModelOpt source
at a symlink path (tools/launcher/modules/Model-Optimizer/modelopt) in the
_modelopt_src and _has_modelopt_src variables on lines 60-61, but this symlink
only exists after setup. In a clean dev checkout, this check will always fail,
causing _has_modelopt_src to be false and preventing proper ModelOpt
configuration in the include patterns (lines 79-89) and MODELOPT_SRC_PATH
assignment (line 96). Instead of checking the symlink destination path, detect
the actual source checkout location first by checking for an alternative path
where the ModelOpt source code would exist in the repository before the symlink
is created, then use that result for the subsequent logic.

---

Nitpick comments:
In `@tools/launcher/__init__.py`:
- Around line 16-20: The `__init__.py` file is missing an explicit `__all__`
declaration to define the public API surface. Add an `__all__` list after the
module docstring and include the `PACKAGE_DIR` variable in it to explicitly
declare it as a public export of the package. This makes the public contract
clear to users and follows the project's coding guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ffddbe8-88e5-403b-89e2-cdbcf5d6e575

📥 Commits

Reviewing files that changed from the base of the PR and between 6320002 and 00b2c13.

📒 Files selected for processing (7)
  • tools/launcher/.gitignore
  • tools/launcher/__init__.py
  • tools/launcher/launch.py
  • tools/launcher/pyproject.toml
  • tools/mcp/modelopt_mcp/bridge.py
  • tools/mcp/pyproject.toml
  • tools/mcp/tests/test_bridge.py
💤 Files with no reviewable changes (1)
  • tools/mcp/tests/test_bridge.py
✅ Files skipped from review due to trivial changes (1)
  • tools/launcher/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • tools/mcp/pyproject.toml
  • tools/mcp/modelopt_mcp/bridge.py

Comment thread tools/launcher/launch.py Outdated
@ChenhanYu

Copy link
Copy Markdown
Collaborator Author

/claude review

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review summary

Scope: This PR touches only tools/ (launcher packaging + MCP bridge integration) — no modelopt/ source, examples/, or model-optimization code. Reviewed all 7 changed files; traced the modelopt_launcher.* import switch through core.py/slurm_config.py, the _has_modelopt_src gating in launch.py, and the bridge's subprocess + env-propagation changes.

Findings by severity: CRITICAL: 1 · IMPORTANT: 0 · SUGGESTION: 0

Most impactful finding

  • [CRITICAL] _mo_symlink NameError on the installed --clean path (tools/launcher/launch.py:117). The variable is now only assigned inside the if _has_modelopt_src: block, but the clean branch references it unconditionally. In the installed console-script mode (modelopt-launcher, the feature this PR adds), _has_modelopt_src is False, so modelopt-launcher --clean ... crashes with NameError rather than running. Inline comment has a suggested gate.

Assessment

The core refactor is sound: the modules/Model-Optimizer symlink is git-tracked, so _has_modelopt_src resolves True on dev checkouts (no chicken-and-egg with symlink auto-creation), and the dev-vs-installed packager split is coherent. Test-side bare imports (from core import ...) still work via conftest.py's sys.path insertion, unaffected by the launch.py package-qualified imports. The bridge simplification (dropping _find_launcher_dir/cwd walk-up in favor of import modelopt_launcher) is a clean reduction, and NEMORUN_HOME is still pinned via child_env so the cwd removal does not break experiment-dir resolution.

Risk: low-to-moderate, isolated to the one crash on the --clean flag in installed mode. Recommend fixing that gate before merge.

…__, launcher not found

- launch.py: detect dev checkout via MODELOPT_ROOT/modelopt (actual dir)
  instead of the symlink path which doesn't exist in a clean checkout
- launch.py: initialize _mo_symlink = None before the conditional; guard
  --clean to raise ValueError instead of NameError in installed mode
- __init__.py: add __all__ = ["PACKAGE_DIR"] per coding guidelines
- bridge.py: add _launcher_not_installed() helper; catch FileNotFoundError
  on all three modelopt-launcher subprocess call sites (Docker Popen,
  Slurm run, dry-run run) and return structured ok=false instead of crash

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
@ChenhanYu ChenhanYu self-assigned this Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant