A CLI tool for automating HPC workflows: source code/data sync and scheduler job management (Slurm/PJM).
One-shot execution (no install):
uvx --from git+https://github.com/ultimatile/hpc hpc
Permanent install:
uv tool install git+https://github.com/ultimatile/hpc
# 1. Initialize project
hpc init
# 2. Edit configuration
vim hpc.toml
# 3. Sync files to cluster
hpc sync
hpc sync --dry-run # preview only
# 4. Submit job
hpc submit "python train.py"
# 5. Check status
hpc status 12345678
# 6. View job output
hpc job-output 12345678
Creates an hpc.toml configuration file in the current directory.
hpc init
Syncs local files to the remote HPC cluster using rsync.
Always syncs the entire project root (where hpc.toml is located), regardless of which subdirectory you run from.
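For intuition, the push direction of a project-wide sync can be pictured as one rsync call from the project root to workdir. The Python sketch below is hypothetical (build_rsync_argv is not part of hpc, and hpc's real flags may differ); it uses only standard rsync options and maps the [sync] ignore and compare settings onto --exclude and --checksum.

```python
from pathlib import Path


def build_rsync_argv(
    project_root: Path,
    host: str,
    workdir: str,
    ignore: list[str],
    compare: str = "checksum",
    dry_run: bool = False,
) -> list[str]:
    """Hypothetical sketch: push the whole project root to host:workdir."""
    argv = ["rsync", "-a"]  # archive mode: recurse, preserve times/permissions
    if compare == "checksum":
        argv.append("--checksum")  # compare file contents, not size + mtime
    elif compare != "timestamp":
        raise ValueError(f"unknown compare mode: {compare!r}")
    if dry_run:
        argv.append("--dry-run")  # report what would change, copy nothing
    for pattern in ignore:
        argv.append(f"--exclude={pattern}")  # one exclude per [sync] ignore entry
    # Trailing slash on the source copies the *contents* of the project root.
    argv.append(f"{project_root}/")
    argv.append(f"{host}:{workdir}/")
    return argv
```

With compare = "timestamp" the sketch simply omits --checksum, which leaves rsync on its default size-plus-modification-time check.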
hpc sync # sync files
hpc sync --dry-run # preview without syncing (-n for short)
hpc sync --workdir /scratch/user/other # override remote workdir
hpc sync --push # push only (local → remote)
hpc sync --pull # pull only (remote → local)
Executes a command directly on the login node (not via the scheduler). Useful for setup tasks that need internet access (package installs, dependency downloads).
hpc exec "julia -e 'using Pkg; Pkg.instantiate()'"
hpc exec --script setup.sh
hpc exec --workdir /scratch/user/other "cmake .."
Environment setup ([env] section) is applied automatically. The working directory follows the same CWD-relative logic as hpc submit.
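Conceptually, running a command on the login node boils down to a single ssh call whose remote shell applies the environment setup, changes to the working directory, and runs the command. The sketch below is a hypothetical illustration: build_exec_argv, the step order, and the && chaining are assumptions, not hpc's documented behavior. It expects setup_lines to be already-expanded shell lines such as module load gcc/12.2.0.

```python
import shlex


def build_exec_argv(host: str, workdir: str, setup_lines: list[str], command: str) -> list[str]:
    """Hypothetical sketch: compose one ssh invocation for the login node."""
    steps = list(setup_lines)                    # e.g. "module load gcc/12.2.0"
    steps.append(f"cd {shlex.quote(workdir)}")   # quote the path for the remote shell
    steps.append(command)                        # user command passed through as-is
    # Chain with && so a failing setup step stops before the command runs.
    return ["ssh", host, " && ".join(steps)]
```

Chaining with && is a deliberate choice in this sketch: if a module fails to load, the command never runs against a half-configured environment.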
Submits a job to the configured scheduler.
Returns both run_id (e.g., 20260109_1234, hpc's local tracking ID) and job_id (scheduler job ID, e.g., 12345678).
The job's working directory is set based on your current position relative to the project root (see the multi-setup runs example below).
hpc submit "python train.py"
hpc submit --script run.sh
hpc submit -s run.sh --wait
hpc submit --workdir /scratch/user/other "python train.py" # override remote workdir
Checks the status of a submitted job. Accepts either run_id or job_id.
hpc status 12345678
Shows the output of a submitted job. Accepts either run_id or job_id.
hpc job-output 12345678
Waits for a run to complete. Accepts either run_id or job_id.
hpc wait 12345678
hpc walks up from the current directory to find hpc.toml, similar to how git finds .git. This means you can run hpc commands from any subdirectory within your project.
Resolution order: --config / -c > $HPC_CONFIG > walk-up discovery > ./hpc.toml.
The directory containing hpc.toml is the project root. This affects:
- hpc sync: always syncs the entire project root to workdir, regardless of CWD
- hpc submit: sets the job's cd to workdir + (CWD relative to project root)
- .hpc/runs/: run metadata is always stored at the project root
hpc init does not walk up — it always creates hpc.toml in the current directory.
When running multiple benchmarks or parameter sets from a single project, use subdirectories. hpc automatically maps your local directory structure to the remote.
myproject/
  hpc.toml          # workdir = "/remote/myproject"
  src/main.py
  runs/
    setup-a/
      input.dat
    setup-b/
      input.dat
# Sync the entire project (same result from any subdirectory)
hpc sync
# Submit from a subdirectory — job runs in the matching remote path
cd runs/setup-a
hpc submit "python ../../src/main.py"
# → job cd's to /remote/myproject/runs/setup-a
cd ../setup-b
hpc submit "python ../../src/main.py"
# → job cd's to /remote/myproject/runs/setup-b
Key points:
- sync is always project-wide. The remote mirrors your local project structure exactly.
- submit uses your CWD to determine the job's working directory on the remote.
- --workdir overrides cluster.workdir for one-off use without editing hpc.toml.
- Large artifacts that shouldn't be synced are managed via [sync] ignore.
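The CWD-to-remote mapping amounts to taking the CWD relative to the project root and appending it to workdir. A minimal sketch, assuming the project root is already known (remote_job_dir is a hypothetical name, not hpc's API):

```python
from pathlib import Path, PurePosixPath


def remote_job_dir(project_root: Path, cwd: Path, workdir: str) -> str:
    """Hypothetical sketch: mirror the CWD's position under the project root onto workdir."""
    # relative_to raises ValueError if cwd is outside the project root.
    rel = cwd.resolve().relative_to(project_root.resolve())
    # Remote paths are POSIX paths regardless of the local OS.
    return str(PurePosixPath(workdir).joinpath(*rel.parts))
```

Passing the value of --workdir instead of cluster.workdir as the third argument models the one-off override.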
Edit hpc.toml:
[cluster]
host = "myhpc" # SSH host (from ~/.ssh/config)
workdir = "/scratch/user/proj" # Remote working directory; all code and data are synced here
scheduler = "slurm" # "slurm" (default) or "pjm"
[env]
modules = ["gcc/12.2.0", "cuda/12.2"] # Modules to load (shorthand for module load)
spack = ["python@3.11"] # Spack packages to load (shorthand for spack load)
setup = [ # Additional setup commands
{source = "/path/to/venv/bin/activate"},
{export = ["VAR=value"]}, # {command = [args...]} format
"some_cmd", # String: command without args
]
[sync]
ignore = ["hpc.toml", ".git"] # Patterns to exclude from sync
compare = "checksum" # File comparison: "checksum" (content-based, default) or "timestamp"
pull_dir = "~/data/myproj" # Pull destination (default: project root). Useful for keeping the git repo clean
[slurm.options]
partition = "gpu" # Example (Slurm): partition
time = "02:00:00" # Example (Slurm): time limit
mem = "32G" # Example (Slurm): memory
gpus = 1 # Example (Slurm): number of GPUs
Commands are executed in this order: modules → spack → setup.
modules and spack are shorthand syntax:
- modules = ["gcc/12.2.0"] expands to module load gcc/12.2.0
- spack = ["python@3.11"] expands to spack load python@3.11
setup accepts:
- String: command without args (e.g., "some_cmd")
- Dict: {command = args} format (e.g., {export = ["VAR=value"]} → export VAR=value)
- Special commands: module and spack in dict format expand to module load / spack load
If you need a different execution order, put everything in setup:
[env]
setup = [
{spack = "python@3.11"},
{module = "gcc/12.2.0"},
{source = "/path/to/venv/bin/activate"},
]
Shell special characters (;|&`$<>\'"\n and space) are prohibited in arguments for security.
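A minimal validator for that rule might look like the sketch below. validate_arg is hypothetical, and reading the list as ; | & ` $ < > \ ' " plus newline and space (in particular, treating the backslash as a prohibited character rather than an escape) is an interpretation, not something the rule spells out.

```python
# Assumed reading of the prohibited set: ; | & ` $ < > \ ' " newline space
FORBIDDEN = frozenset(";|&`$<>\\'\"\n ")


def validate_arg(arg: str) -> str:
    """Hypothetical sketch: reject a setup argument containing shell metacharacters."""
    bad = sorted(set(arg) & FORBIDDEN)
    if bad:
        raise ValueError(f"prohibited characters in {arg!r}: {bad!r}")
    return arg
```

Rejecting these characters outright, rather than quoting them, keeps the generated job script free of anything the shell would interpret.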
For PJM scheduler, use array format for options:
[cluster]
scheduler = "pjm"
[pjm]
options = [
["-L", "node=12"],
["-L", "rscgrp=small"],
["-L", "elapse=00:30:00"],
["--mpi", "max-proc-per-node=4"],
["-g", "laa4Hoo5"],
["-s"]
]
If $XDG_CONFIG_HOME/hpc/config.toml (default: ~/.config/hpc/config.toml) exists, hpc init copies it as hpc.toml.
- Python 3.11+
- SSH access to HPC cluster (key-based authentication recommended)
- rsync
- Slurm or PJM on the remote cluster
rsync from https://rsync.samba.org/ is recommended over macOS's built-in openrsync. When using checksum-based comparison (compare = "checksum", default), openrsync has a bug where files with sizes that are exact multiples of 64 bytes are always detected as changed, even when identical. This is due to a protocol 29 checksum boundary issue. Confirmed with macOS 15.7's openrsync (protocol version 29, rsync version 2.6.9 compatible). If concerned, use [sync] compare = "timestamp" instead.
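To gauge whether a project is exposed to this openrsync behavior, a short script can list files whose sizes are exact multiples of 64 bytes; per the description above, these are the files that would be reported as changed on every checksum-mode sync. This is a diagnostic sketch written for this README (files_at_64_byte_multiples is a hypothetical name), not part of hpc.

```python
import os
from pathlib import Path


def files_at_64_byte_multiples(root: Path) -> list[Path]:
    """List regular files under root whose size is an exact multiple of 64 bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():           # skip broken symlinks and special files
                continue
            # Note: empty files also match, since 0 is a multiple of 64.
            if path.stat().st_size % 64 == 0:
                hits.append(path.relative_to(root))
    return sorted(hits)
```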
On macOS, install rsync via Homebrew:
brew install rsync
This project includes a Claude Code skill (.claude/skills/hpc/SKILL.md) that teaches Claude how to use the hpc CLI. The CLI reference in the skill is dynamically generated via hpc --skill to stay in sync with the code.
make test # run tests
make lint # run linter
make check # run all checks