Validate run_scientific_agent + analyze_dataset on local + Improv + UCAR (#28) by rajeeja · Pull Request #41 · UXARRAY/uxarray-mcp-server

rajeeja · 2026-05-09T19:30:07Z

Summary

Closes #28. Adds a reproducible end-to-end validation harness for both the deterministic (analyze_dataset) and autonomous (run_scientific_agent) orchestrators, and confirms they work across local + Improv + UCAR.

scripts/validate_orchestrators.py exercises six scenarios and reports a per-scenario pass/fail summary:

Scenario	Result
`analyze_dataset` — local healpix:2 (no data)	PASS
`analyze_dataset` — local synthetic UGRID + data	PASS
`run_scientific_agent` — local healpix:3	PASS
`run_scientific_agent` — local synthetic UGRID + data	PASS
`analyze_dataset` — REMOTE healpix:2 on Improv	PASS
`analyze_dataset` — REMOTE healpix:2 on UCAR	PASS
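
The per-scenario summary above can be produced by a small runner loop. The following is a minimal sketch of that pattern, not the contents of scripts/validate_orchestrators.py: `run_scenarios`, `summarize`, and the convention that a scenario is a zero-argument callable that raises on failure are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical shape: each scenario is (label, zero-argument callable).
# The callable raises an exception when the scenario fails.
Scenario = tuple[str, Callable[[], None]]


def run_scenarios(scenarios: list[Scenario]) -> dict[str, str]:
    """Run every scenario, recording PASS or FAIL without aborting the run."""
    results: dict[str, str] = {}
    for label, check in scenarios:
        try:
            check()
            results[label] = "PASS"
        except Exception as exc:  # keep going so the summary covers every scenario
            results[label] = f"FAIL ({exc})"
    return results


def summarize(results: dict[str, str]) -> str:
    """Render one tab-separated line per scenario plus an N/M PASS footer."""
    passed = sum(1 for status in results.values() if status == "PASS")
    lines = [f"{label}\t{status}" for label, status in results.items()]
    lines.append(f"{passed}/{len(results)} PASS")
    return "\n".join(lines)
```

Catching the exception per scenario, rather than letting the first failure stop the script, is what lets a single run report on local and remote scenarios together.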

For the remote scenarios, top-level provenance reads venue=hpc and the per-stage inner provenance carries the full hpc:<endpoint_id> so the actual execution venue is verifiable end-to-end. For run_scientific_agent, all four stages (Analyze → Plan → Execute → Verify) appear in reasoning_trace.
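
The provenance and trace checks described above could look roughly like this. It is a sketch under stated assumptions: the result layout (a top-level `provenance` dict with a `venue` key, a `stages` list whose items carry their own `provenance`, and `reasoning_trace` entries with a `stage` key) is hypothetical and may not match the server's actual output schema.

```python
EXPECTED_STAGES = ["Analyze", "Plan", "Execute", "Verify"]


def check_remote_provenance(result: dict, endpoint_id: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed.

    Assumed (hypothetical) layout:
      result["provenance"]["venue"]            -> "hpc"
      result["stages"][i]["provenance"]["venue"] -> "hpc:<endpoint_id>"
    """
    problems: list[str] = []
    if result.get("provenance", {}).get("venue") != "hpc":
        problems.append("top-level venue is not 'hpc'")
    expected = f"hpc:{endpoint_id}"
    for stage in result.get("stages", []):
        venue = stage.get("provenance", {}).get("venue")
        if venue != expected:
            problems.append(
                f"stage {stage.get('name')!r} reports {venue!r}, expected {expected!r}"
            )
    return problems


def missing_stages(reasoning_trace: list[dict]) -> list[str]:
    """List the expected agent stages that never appear in the trace."""
    seen = {entry.get("stage") for entry in reasoning_trace}
    return [stage for stage in EXPECTED_STAGES if stage not in seen]
```

Comparing the inner venue against the configured endpoint ID, rather than only checking for the `hpc` prefix, is what ties a run to a specific endpoint.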

Acceptance criteria from #28

run_scientific_agent completes all 4 stages on a local mesh
analyze_dataset/orchestrators round-trip on Improv with the HPC endpoint
Output is coherent and scientifically meaningful
Failures in individual stages handled gracefully — analyze_dataset records per-stage warnings and continues; run_scientific_agent falls through to the next stage on failure
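
The "record a warning and continue" behaviour in the last criterion can be illustrated with a generic pipeline loop. This is a sketch of the pattern only, not the analyze_dataset implementation; `run_stages` and the stage signature (a callable that receives the results gathered so far) are hypothetical.

```python
from typing import Any, Callable

# Hypothetical stage signature: takes the results collected so far, returns a value.
Stage = Callable[[dict[str, Any]], Any]


def run_stages(stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Run stages in order; a failing stage adds a warning instead of aborting."""
    results: dict[str, Any] = {}
    warnings: list[str] = []
    for name, stage in stages:
        try:
            results[name] = stage(results)
        except Exception as exc:
            # Later stages still run; they simply do not see this stage's output.
            warnings.append(f"{name}: {exc}")
    return {"results": results, "warnings": warnings}
```

A later stage that depends on a failed one would raise its own error (for example a `KeyError`) and be recorded as a warning in turn, so the caller always gets a complete report.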

Test plan

uv run python scripts/validate_orchestrators.py → 6/6 PASS
uv run pre-commit run --all-files → clean
uv run pytest tests/ --ignore=tests/test_remote_agent.py → 247 passed
(Manual follow-up) re-run the harness against MPAS QU/480 once a known-good remote dataset path is captured — out of scope for this PR; the harness is set up to take any path

Files

scripts/validate_orchestrators.py — new harness (reads config.yaml)
CHANGELOG.md — Unreleased entry under Fixed

Closes #28 by exercising both analyze_dataset and run_scientific_agent against six scenarios:

- analyze_dataset: local healpix:2, local synthetic UGRID + data, Improv remote, UCAR remote
- run_scientific_agent: local healpix:3, local synthetic UGRID + data (covers all four Analyze/Plan/Execute/Verify stages)

All six scenarios pass. Top-level provenance reads venue=hpc on remote runs; per-stage inner provenance carries the full hpc:<endpoint_id>.

The harness lives at scripts/validate_orchestrators.py and reads endpoints from config.yaml, so it's reproducible by anyone with the same endpoint UUIDs configured.

rajeeja added 2 commits May 9, 2026 15:29

Apply CI ruff-format multi-line normalization

f54275e

rajeeja merged commit a3665f4 into main May 9, 2026
8 checks passed

rajeeja deleted the rajeeja/validate-orchestrators-issue-28 branch May 9, 2026 19:52

rajeeja mentioned this pull request May 9, 2026

Add one-shot analysis entry point: analyze(path) or analyze(session_id, dataset_handle) #32

Closed

rajeeja merged 2 commits into main from rajeeja/validate-orchestrators-issue-28
