Conversation
Reproducible end-to-end training test: small model (8 channels, 2 residual blocks) on 5 MP samples, verifies final validation loss matches platform-specific expected values.

- `tests/e2e_train.py`: CLI with `--gpu`, `--epochs`, `--update-expected`
- `tests/expected_values.json`: per-platform expected losses
- `tests/test_e2e_train.py`: pytest wrapper
- `examples/e2e_training_demo.ipynb`: notebook with training plots
- Rename `tests/electrai/` → `tests/test_electrai/` to avoid shadowing the installed package when running scripts directly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
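The expected-values check described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, key scheme, and tolerance are all assumptions.

```python
import json
import math
import platform
from pathlib import Path


def platform_key() -> str:
    """Hypothetical platform key scheme for tests/expected_values.json."""
    return {"Darwin": "darwin-arm64"}.get(platform.system(), "linux")


def check_or_update(final_val_loss: float, expected_path: Path, key: str,
                    update: bool = False, rel_tol: float = 1e-6) -> None:
    """Compare the final validation loss against the stored expected value,
    or overwrite it (the --update-expected-style behavior)."""
    values = json.loads(expected_path.read_text()) if expected_path.exists() else {}
    if update:
        values[key] = final_val_loss
        expected_path.write_text(json.dumps(values, indent=2))
    else:
        assert math.isclose(final_val_loss, values[key], rel_tol=rel_tol), (
            f"{key}: got {final_val_loss}, expected {values[key]}"
        )
```

In a pytest wrapper, the first CI run on a new platform would pass `update=True` once, after which subsequent runs assert against the committed value.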
GPU CI workflows (EC2 via ec2-gha):

- `gpu-e2e.yml`: deterministic test on GPU + CPU, triggers on PRs to main
- `gpu-benchmark.yml`: larger model (32ch/16 blocks) benchmark
- `gen-expected.yml`: regenerate expected values on macOS + Linux
- `README.md`: documents workflow responsibilities and expected values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`tests/e2e_train.py`:

- `--wandb-project` (`-W`): enable WandB logging with project name, tags (platform, channels, blocks, git SHA), and CI metadata
- `--max-file-size` (`-M`): skip input files larger than N MB to avoid GPU OOM on large grids (e.g. 25 MB is safe for an L4's 24 GB VRAM)
- Deterministic mode is disabled when WandB is active (benchmark mode)

`gpu-benchmark.yml`:

- `max_file_size` input (default 25 MB): filters during S3 sync via an `s3api` query, avoiding download of oversized samples
- `wandb_project` input: passes `WANDB_API_KEY` from secrets, adds a WandB link to the step summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
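The `--max-file-size` idea can be sketched like this; the function name and log format are illustrative assumptions, not the script's actual implementation:

```python
import sys
from pathlib import Path


def filter_by_size(paths, max_mb=None):
    """Skip input files larger than max_mb megabytes, logging skips to stderr.

    Illustrative sketch of the --max-file-size behavior.
    """
    if max_mb is None:
        return [Path(p) for p in paths]
    kept = []
    for p in map(Path, paths):
        size_mb = p.stat().st_size / (1024 * 1024)
        if size_mb > max_mb:
            print(f"skipping {p}: {size_mb:.1f} MB > {max_mb} MB", file=sys.stderr)
        else:
            kept.append(p)
    return kept
```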
- Add `scripts/s3_sync.py`: standalone `uv run --script` CLI that syncs CHGCAR samples from S3, filters by file size, creates the map file, and writes `DATASET_HASH` to `$GITHUB_OUTPUT` when in CI
- Replace ~30 lines of bash in `gpu-benchmark.yml` with a script call
- Auto-compute `dataset_version` as the MD5 hash of the sorted sample IDs (in both `s3_sync.py` and `e2e_train.py` as fallback)
- Log per-sample file-size filtering to stderr in `e2e_train.py`
- Mirror S3 data under `data/s3/` (gitignored)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
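The deterministic dataset hash amounts to MD5 over the sorted sample IDs; a minimal sketch follows. The exact serialization (separator and encoding) is an assumption here and may differ from what `s3_sync.py` does.

```python
import hashlib


def dataset_hash(sample_ids):
    """MD5 of the sorted, newline-joined sample IDs.

    Sorting first makes the hash independent of listing order, so the
    same sample set always yields the same dataset_version.
    """
    joined = "\n".join(sorted(sample_ids))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```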
Pull request overview
This PR adds GPU-focused CI workflows (E2E determinism + benchmarks) and supporting tooling/tests to make model training runs reproducible and traceable across platforms/instances.
Changes:
- Added EC2-backed GPU workflows for deterministic E2E training and scheduled/dispatchable benchmarks (with optional WandB logging and a summary report).
- Added deterministic E2E training harness + expected-values artifacts, plus an S3 sync helper to build benchmark datasets reproducibly.
- Added/expanded unit tests around Zarr conversion, S3 Zarr data loading, and SRGAN model components; added a demo notebook.
Reviewed changes
Copilot reviewed 11 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `tests/test_electrai/zarr_conversion/test_convert_to_zarr.py` | New tests for CHGCAR loading and Zarr conversion. |
| `tests/test_electrai/model/test_srgan_layernorm_pbc.py` | New comprehensive unit tests for SRGAN/PBC components. |
| `tests/test_electrai/dataloader/test_mp_zarr_s3_data.py` | New unit tests for local/S3 Zarr dataset reader and splitting. |
| `tests/test_e2e_train.py` | Pytest wrapper that runs the deterministic training script in a subprocess. |
| `tests/expected_values.json` | Platform-specific expected loss values for the deterministic E2E check. |
| `tests/e2e_train.py` | Deterministic E2E training runner; supports GPU, filtering, and WandB logging. |
| `scripts/s3_sync.py` | CLI helper to sync/prepare S3 samples and produce a deterministic dataset hash. |
| `.github/workflows/gpu-e2e.yml` | EC2-backed workflow for GPU + CPU deterministic E2E training checks. |
| `.github/workflows/gpu-benchmark.yml` | EC2-backed GPU benchmark workflow (scheduled + manual) with optional CPU comparison and WandB integration. |
| `.github/workflows/gen-expected.yml` | Generates expected-values artifacts on GitHub-hosted macOS/Linux runners. |
| `.github/workflows/README.md` | Documentation for workflows and expected-values generation. |
| `specs/04-ci-wandb.md` | Spec/design doc for CI → WandB tracking and regression-detection ideas. |
| `pyproject.toml` | Adds notebook execution dependencies and updates Ruff per-file ignores. |
| `examples/e2e_training_demo.ipynb` | Demo notebook showing deterministic training and visualization. |
| `.gitignore` | Ignores the S3 mirror directory (`data/s3/`). |
- Gate WandB logging on `WANDB_API_KEY` presence in `gpu-benchmark.yml`
- Move notebook deps (`nbconvert`, `ipykernel`, `papermill`) to a `[notebooks]` optional extra
- Skip the `gpu-e2e.yml` EC2 job on fork PRs (secrets unavailable)
- Fix `get_platform()` to use the resolved accelerator, not the `--gpu` flag
- Document S3 `ListObjectsV2` lexicographic-ordering guarantee in `s3_sync.py`
- Fix stale WandB statement in `specs/04-ci-wandb.md`
- Update `.github/workflows/README.md` with current defaults and features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
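The `get_platform()` fix keys the expected-values lookup on the accelerator actually selected at runtime rather than the raw `--gpu` flag. A rough sketch, where the function name matches the commit message but the key scheme is an assumption:

```python
import platform


def get_platform(accelerator: str) -> str:
    """Map OS + *resolved* accelerator to an expected-values key.

    Using the resolved accelerator matters when --gpu is requested but no
    GPU is available and the run falls back to CPU.
    """
    base = {"Darwin": "darwin-arm64"}.get(platform.system(), "linux")
    return f"{base}-gpu" if accelerator in ("gpu", "cuda") else base
```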
Pull request overview
Copilot reviewed 10 out of 15 changed files in this pull request and generated 2 comments.
- Fix `if final_train_loss:` → `is not None` (0.0 would be falsy)
- Use the `$EPOCHS` input in the CPU baseline step instead of a hardcoded `5`
- Use env vars for S3 sync step inputs (script-injection hardening)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
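The first fix guards against Python's truthiness semantics: a perfectly valid loss of `0.0` is falsy, so `if final_train_loss:` would silently treat it as missing. A minimal illustration (function name is hypothetical):

```python
def describe_loss(final_train_loss):
    """Report the final training loss, treating only None as 'missing'.

    The buggy form `if final_train_loss:` would fall through for 0.0.
    """
    if final_train_loss is not None:
        return f"final train loss: {final_train_loss}"
    return "no training loss recorded"
```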
Re: re-review #3 — addressed in 2120663:
@claude another review pls
Claude finished @ryan-williams's task in 3m 5s: Code Review (re-review #4)
All issues from the previous reviews are confirmed fixed. I found one unresolved issue and two minor nits. Unresolved:
Pull request overview
Copilot reviewed 10 out of 15 changed files in this pull request and generated 1 comment.
Avoid script-injection risk by keeping all expression interpolations in the `env:` block rather than inline in the `run:` script. Also fix a stale `gpu-weekly.yml` reference in the spec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the re-review! Responses:
bare
@claude another review pls
Claude finished @ryan-williams's task in 2m 14s: Code Review (re-review #5)
All issues from the previous four rounds of review have been confirmed fixed. The PR is ready to merge. Full verification below.

Confirmed fixes from re-review #4
Minor: bare
Pull request overview
Copilot reviewed 10 out of 15 changed files in this pull request and generated 1 comment.
Read `tests/expected_values.json` before merging so that keys from other workflows (e.g. `linux-gpu` from `gpu-e2e.yml`) aren't dropped. Also use `with` statements for file reads. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
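The read-before-merge pattern this commit describes might be sketched as follows; the function name is illustrative:

```python
import json
from pathlib import Path


def merge_expected(path, new_values):
    """Merge new expected values into the file without dropping keys
    written by other workflows (e.g. 'linux-gpu' from gpu-e2e.yml)."""
    path = Path(path)
    merged = {}
    if path.exists():
        with path.open() as f:       # `with` ensures the handle is closed
            merged = json.load(f)
    merged.update(new_values)        # existing keys survive unless overwritten
    with path.open("w") as f:
        json.dump(merged, f, indent=2)
    return merged
```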
Pull request overview
Copilot reviewed 10 out of 15 changed files in this pull request and generated 2 comments.
Without it, `tests/expected_values.json` doesn't exist and the merge step can't preserve `linux-gpu` keys from the repo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 10 out of 15 changed files in this pull request and generated no new comments.
forklady42 left a comment
Overall, looking good but I noticed a couple of divergences from the current model config.
tests/e2e_train.py
Outdated
```python
# Hydra-style model config for LightningGenerator (uses hydra.utils.instantiate)
cfg.model = {
    "_target_": "electrai.model.srgan_layernorm_pbc.GeneratorResNet",
```
`electrai.model.resunet.ResUNet3D` is now our default model.
tests/e2e_train.py
Outdated
```python
cfg.dataset_name = "mp"
cfg.data_path = data_path or str(repo_root / "data/MP/chgcars/input")
cfg.label_path = label_path or str(repo_root / "data/MP/chgcars/label")
cfg.map_path = map_path or str(repo_root / "data/MP/map/map_sample.json.gz")
```
Is this supposed to match our current config? We've shifted patterns since this PR started 😅. We now rely on a txt filelist (similar to the `map_path`) with data and label directories under the same root path.
tests/e2e_train.py
Outdated
```python
echo(f"Train samples: {len(train_data)}, Val samples: {len(test_data)}")


def dict_collate(batch):
    """Collate tuples from MPDataset into dicts for LightningGenerator."""
```
For my own clarity, `MPDataset` isn't a class, right? I initially assumed it was and looked around for it.
`ResUNet3D` is now the default model. Update the Hydra config dict, regenerate `darwin-arm64` expected values, and remove stale `linux`/`linux-gpu` values (to be regenerated via CI). Also fix `MPDataset` → `RhoData` in the docstring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace `--data-path`/`--label-path`/`--map-path` with `--data-root`
- Use the `RhoRead` LightningDataModule (filelist + data/label dirs)
- Rename `data/MP/chgcars/input/` → `data/` to match the `RhoRead` layout
- Add `mp_filelist.txt` for sample IDs
- Update `s3_sync.py` to output a `data/` dir + filelist (not `input/` + json.gz map)
- Update the benchmark workflow's `DATA_ARGS`
- Regenerate `darwin-arm64` expected values (`linux`/`linux-gpu` pending CI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- End-to-end deterministic training test (`tests/e2e_train.py`) with platform-specific expected values
- GPU E2E workflow (`gpu-e2e.yml`) using ec2-gha on EC2 `g6.xlarge` (NVIDIA L4); runs on PRs targeting `main`
- GPU benchmark workflow (`gpu-benchmark.yml`) with configurable model size, weekly schedule, and manual dispatch
- `scripts/s3_sync.py`: reusable S3 data sync with size filtering and deterministic dataset hashing
- `gen-expected.yml` for regenerating expected values on GHA runners (macOS, Ubuntu)
- `e2e_training_demo.ipynb` notebook with training visualization
- `.github/workflows/README.md` documenting all workflows
- Renamed `tests/electrai/` → `tests/test_electrai/` to fix import shadowing
- `--gradient-checkpoint` flag for `e2e_train.py`, for large models on limited VRAM

Passing Runs
Required Setup
Secrets:
- `GH_SA_TOKEN`: GitHub PAT for runner registration
- `WANDB_API_KEY`: WandB API key (optional, for benchmark logging; currently Ryan's personal key)

IAM/OIDC:
- `ec2-gha` OIDC authentication

Screenshots
Test plan
- E2E test passes on `g6.xlarge`
- `gen-expected.yml` generates correct values on macOS and Ubuntu
- Expected values present for all platforms (`darwin-arm64`, `linux`, `linux-gpu`)
- `scripts/s3_sync.py` downloads correct samples and generates a deterministic dataset hash