Skip to content

ultimatile/hpc

Repository files navigation

hpc

An automation CLI tool for HPC workflow: source code/data sync and scheduler job management (Slurm/PJM).

Installation

One-shot execution (no install):

uvx --from git+https://github.com/ultimatile/hpc hpc

Permanent install:

uv tool install git+https://github.com/ultimatile/hpc

Quick Start

# 1. Initialize project
hpc init

# 2. Edit configuration
vim hpc.toml

# 3. Sync files to cluster
hpc sync
hpc sync --dry-run  # preview only

# 4. Submit job
hpc submit "python train.py"

# 5. Check status
hpc status 12345678

# 6. View job output
hpc job-output 12345678

Commands

hpc init

Creates hpc.toml configuration file in the current directory.

hpc init

hpc sync

Syncs local files to the remote HPC cluster using rsync. Always syncs the entire project root (where hpc.toml is located), regardless of which subdirectory you run from.

hpc sync                # sync files
hpc sync --dry-run      # preview without syncing (-n for short)
hpc sync --workdir /scratch/user/other   # override remote workdir
hpc sync --push         # push only (local → remote)
hpc sync --pull         # pull only (remote → local)

hpc exec

Executes a command directly on the login node (not via scheduler). Useful for setup tasks that need internet access (package installs, dependency downloads).

hpc exec "julia -e 'using Pkg; Pkg.instantiate()'"
hpc exec --script setup.sh
hpc exec --workdir /scratch/user/other "cmake .."

Environment setup ([env] section) is applied automatically. The working directory follows the same CWD-relative logic as hpc submit.

hpc submit

Submits a job to the configured scheduler. Returns both run_id (e.g., 20260109_1234, hpc's local tracking ID) and job_id (scheduler job ID, e.g., 12345678).

The job's working directory is set based on your current position relative to the project root (see Multi-Setup Runs).

hpc submit "python train.py"
hpc submit --script run.sh
hpc submit -s run.sh --wait
hpc submit --workdir /scratch/user/other "python train.py"  # override remote workdir

hpc status

Checks the status of a submitted job. Accepts either run_id or job_id.

hpc status 12345678

hpc job-output

Shows the output of a submitted job. Accepts either run_id or job_id.

hpc job-output 12345678

hpc wait

Waits for a run to complete. Accepts either run_id or job_id.

hpc wait 12345678

Project Root and Config Discovery

hpc walks up from the current directory to find hpc.toml, similar to how git finds .git. This means you can run hpc commands from any subdirectory within your project.

Resolution order: --config / -c > $HPC_CONFIG > walk-up discovery > ./hpc.toml.

The directory containing hpc.toml is the project root. This affects:

  • hpc sync: always syncs the entire project root to workdir, regardless of CWD
  • hpc submit: sets the job's cd to workdir + (CWD relative to project root)
  • .hpc/runs/: run metadata is always stored at the project root

hpc init does not walk up — it always creates hpc.toml in the current directory.

Multi-Setup Runs

When running multiple benchmarks or parameter sets from a single project, use subdirectories. hpc automatically maps your local directory structure to the remote.

myproject/
  hpc.toml              # workdir = "/remote/myproject"
  src/main.py
  runs/
    setup-a/
      input.dat
    setup-b/
      input.dat
# Sync the entire project (same result from any subdirectory)
hpc sync

# Submit from a subdirectory — job runs in the matching remote path
cd runs/setup-a
hpc submit "python src/main.py"
# → job cd's to /remote/myproject/runs/setup-a

cd ../setup-b
hpc submit "python src/main.py"
# → job cd's to /remote/myproject/runs/setup-b

Key points:

  • sync is always project-wide. The remote mirrors your local project structure exactly.
  • submit uses your CWD to determine the job's working directory on the remote.
  • --workdir overrides cluster.workdir for one-off use without editing hpc.toml.
  • Large artifacts that shouldn't be synced are managed via [sync] ignore.

Configuration

Edit hpc.toml:

[cluster]
host = "myhpc"                    # SSH host (from ~/.ssh/config)
workdir = "/scratch/user/proj"    # Remote working directory; all codes and data will be synced here
scheduler = "slurm"                # "slurm" (default) or "pjm"

[env]
modules = ["gcc/12.2.0", "cuda/12.2"]  # Modules to load (shorthand for module load)
spack = ["python@3.11"]                # Spack packages to load (shorthand for spack load)
setup = [                              # Additional setup commands
    {source = "/path/to/venv/bin/activate"},
    {export = ["VAR=value"]},          # {command = [args...]} format
    "some_cmd",                        # String: command without args
]

[sync]
ignore = ["hpc.toml", ".git"]  # Patterns to exclude from sync
compare = "checksum"           # File comparison: "checksum" (content-based, default) or "timestamp"
pull_dir = "~/data/myproj"     # Pull destination (default: project root). Useful for keeping git repo clean

[slurm.options]
partition = "gpu"      # Example (Slurm): partition
time = "02:00:00"      # Example (Slurm): time limit
mem = "32G"            # Example (Slurm): memory
gpus = 1               # Example (Slurm): number of GPUs

Environment Setup

Commands are executed in this order: modulesspacksetup.

modules and spack are shorthand syntax:

  • modules = ["gcc/12.2.0"] expands to module load gcc/12.2.0
  • spack = ["python@3.11"] expands to spack load python@3.11

setup accepts:

  • String: command without args (e.g., "some_cmd")
  • Dict: {command = args} format (e.g., {export = ["VAR=value"]}export VAR=value)
  • Special commands module and spack in dict format expand to module load / spack load

If you need a different execution order, put everything in setup:

[env]
setup = [
    {spack = "python@3.11"},
    {module = "gcc/12.2.0"},
    {source = "/path/to/venv/bin/activate"},
]

Shell special characters (;|&`$<>\'"\n and space) are prohibited in arguments for security.

PJM Configuration

For PJM scheduler, use array format for options:

[cluster]
scheduler = "pjm"

[pjm]
options = [
    ["-L", "node=12"],
    ["-L", "rscgrp=small"],
    ["-L", "elapse=00:30:00"],
    ["--mpi", "max-proc-per-node=4"],
    ["-g", "laa4Hoo5"],
    ["-s"]
]

$XDG_CONFIG_HOME/hpc/config.toml (default: ~/.config/hpc/config.toml) will be copied as hpc.toml if it exists when running hpc init.

Requirements

  • Python 3.11+
  • SSH access to HPC cluster (key-based authentication recommended)
  • rsync
  • Slurm or PJM on the remote cluster

rsync Note

rsync from https://rsync.samba.org/ is recommended over macOS's built-in openrsync. When using checksum-based comparison (compare = "checksum", default), openrsync has a bug where files with sizes that are exact multiples of 64 bytes are always detected as changed, even when identical. This is due to a protocol 29 checksum boundary issue. Confirmed with macOS 15.7's openrsync (protocol version 29, rsync version 2.6.9 compatible). If concerned, use [sync] compare = "timestamp" instead.

On macOS, install rsync via Homebrew:

brew install rsync

Claude Code Integration

This project includes a Claude Code skill (.claude/skills/hpc/SKILL.md) that teaches Claude how to use the hpc CLI. The CLI reference in the skill is dynamically generated via hpc --skill to stay in sync with the code.

Development

make test      # run tests
make lint      # run linter
make check     # run all checks

About

An automation CLI tool for HPC workflow: source code/data sync and scheduler job management (Slurm/PJM)

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors