
Validate run_scientific_agent + analyze_dataset on local + Improv + UCAR (#28) #41

Merged
rajeeja merged 2 commits into main from rajeeja/validate-orchestrators-issue-28
May 9, 2026

rajeeja commented May 9, 2026

Summary

Closes #28. Adds a reproducible end-to-end validation harness for both the deterministic (analyze_dataset) and autonomous (run_scientific_agent) orchestrators. Both orchestrators pass locally, and analyze_dataset also passes remotely on Improv and UCAR.

scripts/validate_orchestrators.py exercises six scenarios and reports a per-scenario pass/fail summary:

| Orchestrator | Scenario | Result |
|---|---|---|
| analyze_dataset | local healpix:2 (no data) | PASS |
| analyze_dataset | local synthetic UGRID + data | PASS |
| run_scientific_agent | local healpix:3 | PASS |
| run_scientific_agent | local synthetic UGRID + data | PASS |
| analyze_dataset | REMOTE healpix:2 on Improv | PASS |
| analyze_dataset | REMOTE healpix:2 on UCAR | PASS |
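
For readers who want the general shape of such a harness without opening the script, here is a minimal sketch of a per-scenario pass/fail runner. It is not the code in scripts/validate_orchestrators.py: `Scenario`, `run_scenarios`, and `format_summary` are hypothetical names, and each scenario's check is assumed to be a plain callable that raises on failure (in a real harness it would wrap an analyze_dataset or run_scientific_agent call).

```python
"""Minimal per-scenario pass/fail runner (hypothetical sketch)."""
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Scenario:
    name: str
    # Any callable that raises on failure; a real harness would wrap an
    # orchestrator call here (assumption, not the script's actual API).
    check: Callable[[], None]


# (scenario name, passed?, failure detail)
Result = Tuple[str, bool, str]


def run_scenarios(scenarios: List[Scenario]) -> List[Result]:
    """Run every scenario; one failure never stops the remaining ones."""
    results: List[Result] = []
    for scenario in scenarios:
        try:
            scenario.check()
            results.append((scenario.name, True, ""))
        except Exception as exc:  # record the failure and keep going
            results.append((scenario.name, False, f"{type(exc).__name__}: {exc}"))
    return results


def format_summary(results: List[Result]) -> str:
    """Render one line per scenario plus an 'N/M PASS' footer."""
    width = max(len(name) for name, _, _ in results)
    lines = []
    for name, ok, detail in results:
        line = f"{name.ljust(width)}  {'PASS' if ok else 'FAIL'}"
        if detail:
            line += f"  ({detail})"
        lines.append(line)
    passed = sum(ok for _, ok, _ in results)
    lines.append(f"{passed}/{len(results)} PASS")
    return "\n".join(lines)


if __name__ == "__main__":
    def unreachable() -> None:
        raise RuntimeError("endpoint unreachable")

    demo = [
        Scenario("local healpix:2", lambda: None),
        Scenario("remote healpix:2", unreachable),
    ]
    print(format_summary(run_scenarios(demo)))
```

Catching exceptions per scenario is what lets the harness print a full N/M summary even when a remote endpoint is down.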

For the remote scenarios, top-level provenance reads venue=hpc, and the per-stage inner provenance carries the full hpc:<endpoint_id>, so the endpoint that actually executed each stage can be verified end-to-end. For run_scientific_agent, all four stages (Analyze → Plan → Execute → Verify) appear in reasoning_trace.
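
A sketch of how these two properties could be asserted on a returned result. The result layout used here (a `provenance` dict with a `venue` key, a `stages` list whose entries carry their own `provenance`, and a `reasoning_trace` list of steps tagged with a `stage` name) is an assumption for illustration, not the orchestrators' documented return type; `check_remote_provenance` and `check_reasoning_trace` are hypothetical helpers.

```python
from typing import Any, Dict, List

# Stage names come from the PR description; the dict layout below is assumed.
EXPECTED_STAGES = ("Analyze", "Plan", "Execute", "Verify")


def check_remote_provenance(result: Dict[str, Any], endpoint_id: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical layout:
      result["provenance"]["venue"]              -> "hpc"
      result["stages"][i]["provenance"]["venue"] -> "hpc:<endpoint_id>"
    """
    problems: List[str] = []
    top_venue = result.get("provenance", {}).get("venue")
    if top_venue != "hpc":
        problems.append(f"top-level venue is {top_venue!r}, expected 'hpc'")

    stages = result.get("stages", [])
    if not stages:
        # Without this, an empty stage list would pass the loop vacuously.
        problems.append("no per-stage provenance recorded")
    expected_inner = f"hpc:{endpoint_id}"
    for stage in stages:
        venue = stage.get("provenance", {}).get("venue")
        if venue != expected_inner:
            problems.append(
                f"stage {stage.get('name')!r} ran on {venue!r}, expected {expected_inner!r}"
            )
    return problems


def check_reasoning_trace(result: Dict[str, Any]) -> List[str]:
    """Report every expected stage that never appears in reasoning_trace."""
    seen = {step.get("stage") for step in result.get("reasoning_trace", [])}
    return [
        f"reasoning_trace is missing stage {name!r}"
        for name in EXPECTED_STAGES
        if name not in seen
    ]
```

Returning a list of problems instead of raising on the first mismatch lets one run report every provenance or trace gap at once.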

Acceptance criteria from #28

  • run_scientific_agent completes all 4 stages on a local mesh
  • analyze_dataset/orchestrators round-trip on Improv with the HPC endpoint
  • Output is coherent and scientifically meaningful
  • Failures in individual stages are handled gracefully: analyze_dataset records per-stage warnings and continues; run_scientific_agent falls through to the next stage on failure
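
The record-and-continue behavior described for analyze_dataset follows a common pattern, sketched below under stated assumptions: stages are (name, callable) pairs, each callable receives the outputs gathered so far, and a failing stage contributes a warning string instead of aborting the run. `run_stages` is a hypothetical name, and this is not the project's implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage shape: (name, function of the outputs gathered so far).
Stage = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_stages(stages: List[Stage]) -> Dict[str, Any]:
    """Run stages in order, turning failures into warnings instead of aborting."""
    outputs: Dict[str, Any] = {}
    warnings: List[str] = []
    for name, fn in stages:
        try:
            # Pass a shallow copy so a stage cannot mutate earlier results.
            outputs[name] = fn(dict(outputs))
        except Exception as exc:
            # Record what went wrong and move on to the next stage.
            warnings.append(f"{name}: {type(exc).__name__}: {exc}")
    return {"outputs": outputs, "warnings": warnings}


if __name__ == "__main__":
    def plot(_outputs: Dict[str, Any]) -> Any:
        raise ValueError("variable not found")

    report = run_stages([
        ("statistics", lambda outs: {"mean": 0.0}),
        ("plot", plot),
        ("summary", lambda outs: sorted(outs)),
    ])
    print(report["warnings"])
```

Later stages can inspect which earlier outputs exist, so a summary stage still runs after a plotting failure.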

Test plan

  • uv run python scripts/validate_orchestrators.py → 6/6 PASS
  • uv run pre-commit run --all-files → clean
  • uv run pytest tests/ --ignore=tests/test_remote_agent.py → 247 passed
  • (Manual follow-up) Re-run the harness against MPAS QU/480 once a known-good remote dataset path is captured. This is out of scope for this PR; the harness already accepts any dataset path.

Files

  • scripts/validate_orchestrators.py — new harness (reads config.yaml)
  • CHANGELOG.md — Unreleased entry under Fixed
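
Because the harness reads endpoints from config.yaml, a small validation step can catch a missing or malformed endpoint UUID before any remote job is submitted. The sketch below assumes a hypothetical layout with an `endpoints` mapping keyed by site name (`improv`, `ucar`) and takes the already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load`), so it needs only the standard library; `endpoint_ids` is a hypothetical helper, not part of the script.

```python
import uuid
from typing import Any, Dict, Iterable, Mapping

# Hypothetical config.yaml layout (the real file may differ):
#   endpoints:
#     improv: <endpoint UUID>
#     ucar: <endpoint UUID>


def endpoint_ids(
    config: Mapping[str, Any], sites: Iterable[str] = ("improv", "ucar")
) -> Dict[str, str]:
    """Return {site: canonical UUID string}, failing fast on bad entries."""
    endpoints = config.get("endpoints") or {}
    resolved: Dict[str, str] = {}
    for site in sites:
        raw = endpoints.get(site)
        if raw is None:
            raise KeyError(f"no endpoint configured for {site!r}")
        # uuid.UUID raises ValueError on malformed strings and normalizes
        # valid ones to the lowercase, hyphenated form.
        resolved[site] = str(uuid.UUID(str(raw)))
    return resolved
```

Failing fast here keeps a typo in config.yaml from surfacing later as an opaque remote submission error.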

rajeeja added 2 commits May 9, 2026 15:29
Closes #28 by exercising both analyze_dataset and run_scientific_agent
against six scenarios:
- analyze_dataset: local healpix:2, local synthetic UGRID + data,
  Improv remote, UCAR remote
- run_scientific_agent: local healpix:3, local synthetic UGRID + data
  (covers all four Analyze/Plan/Execute/Verify stages)

All six scenarios pass. Top-level provenance reads venue=hpc on remote
runs; per-stage inner provenance carries the full hpc:<endpoint_id>.

The harness lives at scripts/validate_orchestrators.py and reads
endpoints from config.yaml, so it's reproducible by anyone with the
same endpoint UUIDs configured.
rajeeja merged commit a3665f4 into main May 9, 2026
8 checks passed
rajeeja deleted the rajeeja/validate-orchestrators-issue-28 branch May 9, 2026 19:52

Linked issue: run_scientific_agent: needs end-to-end validation with real multi-variable data