feat(audit): add `ado-aw audit <build-id-or-url>` command #691
🔍 Rust PR Review
Summary: Well-structured feature addition with good error handling overall, but three actionable issues worth fixing before merge.
Findings
🐛 Bugs / Logic Issues
🔒 Security Concerns
Three issues raised by the Rust PR Reviewer on #691:

1. **Lexicographic sort wrong for multi-digit run IDs.** Previously `find_artifact_dir` / `find_verdict_path` / `top_level_dirs_with_prefix` picked the "lexicographically last" `<prefix>_<id>` directory, which sorts `_9` after `_10` (because `'9' > '1'`). On a build retry that produced both `analyzed_outputs_9` and `analyzed_outputs_10`, the older verdict would be read and the run could be mis-classified as safe. New `crate::audit::cmp_numeric_suffix` extracts the trailing token after the final `_`, parses it as `u64`, and compares numerically with a lexicographic tie-breaker for non-numeric suffixes. All three call sites now use it. Regression tests added in mod.rs, detection.rs, and cli.rs.

2. **Security: `ADO_AW_TEST_ORG_URL` was always active in production.** The override was `#[doc(hidden)]` but not gated by build mode, so a stray env var (debugging leftover, hostile CI environment) could silently redirect ADO REST calls to an attacker-controlled URL in a release binary. Gated on `cfg(debug_assertions)`: debug builds (`cargo test`, `cargo run`) keep the override AND emit a loud `warn!` on every invocation; release builds (all published artifacts via `cargo build --release`) replace the body with a no-op so a stray env var has no effect. The integration test in `tests/audit_it.rs` continues to work because `cargo test` builds in debug mode.

3. **Blocking `std::fs::read_dir` in async context.** `safe_outputs.rs` had two helpers (`top_level_dirs_with_prefix`, `collect_named_files`) using sync I/O from inside `async fn analyze_safe_outputs`. On a Tokio multi-thread runtime this blocks an executor thread for the duration of the directory walk. Both helpers converted to `async fn` using `tokio::fs::read_dir`. The recursive `collect_named_files` uses `Box::pin` to satisfy the async-recursion shape (consistent with the existing pattern in `crate::detect::scan_directory`).
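To make the first fix concrete, here is a minimal sketch of a numeric-suffix comparator. It is not the code from this PR: the function body, the `latest_dir` helper, and the handling of mixed numeric/non-numeric suffixes are assumptions made for illustration; only the name `cmp_numeric_suffix` and the described behavior come from the commit message.

```rust
use std::cmp::Ordering;

/// Illustrative stand-in for `crate::audit::cmp_numeric_suffix` (body assumed).
/// Compares two directory names by the token after the final `_`.
fn cmp_numeric_suffix(a: &str, b: &str) -> Ordering {
    // Token after the last underscore, or the whole name if there is none.
    fn suffix(name: &str) -> &str {
        name.rsplit('_').next().unwrap_or(name)
    }
    match (suffix(a).parse::<u64>(), suffix(b).parse::<u64>()) {
        // Both suffixes are numeric: compare as numbers, so 9 sorts before 10.
        // Equal numbers fall back to the full name as a tie-breaker.
        (Ok(x), Ok(y)) => x.cmp(&y).then_with(|| a.cmp(b)),
        // Assumption: any non-numeric suffix falls back to lexicographic order.
        _ => a.cmp(b),
    }
}

/// Hypothetical helper: picks the directory with the highest run ID.
fn latest_dir<'a>(names: &[&'a str]) -> Option<&'a str> {
    names.iter().copied().max_by(|a, b| cmp_numeric_suffix(a, b))
}
```

With plain string comparison `analyzed_outputs_9` sorts after `analyzed_outputs_10`; with the comparator above, `latest_dir` returns `analyzed_outputs_10`.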
Tests: 1745 unit tests + 3 integration tests pass (up from 1740; 5 new regression tests for the numeric-suffix bug).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
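The build-mode gating described in item 2 could look roughly like the sketch below. The helper names `test_org_url_override` and `effective_org_url` are hypothetical, and `eprintln!` stands in for the `warn!` macro so the snippet needs no logging crate; only the env var name and the `cfg(debug_assertions)` split come from the commit message.

```rust
/// Hypothetical helper: returns the test override for the ADO org URL, if set.
/// This body is only compiled into debug builds (`cargo test`, `cargo run`).
#[cfg(debug_assertions)]
fn test_org_url_override() -> Option<String> {
    let url = std::env::var("ADO_AW_TEST_ORG_URL").ok()?;
    // Stand-in for the loud `warn!` the commit message mentions.
    eprintln!("warning: ADO_AW_TEST_ORG_URL is set; ADO REST calls go to {url}");
    Some(url)
}

/// Release builds compile this no-op instead, so a stray env var is ignored.
#[cfg(not(debug_assertions))]
fn test_org_url_override() -> Option<String> {
    None
}

/// Hypothetical call site: the override wins only when it exists.
fn effective_org_url(configured: &str) -> String {
    test_org_url_override().unwrap_or_else(|| configured.to_string())
}
```

Because the release variant never reads the environment, there is no runtime branch left for a hostile CI environment to flip.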
🔍 Rust PR Review
Summary: Looks good overall: well-structured module, solid test coverage, good error handling patterns throughout. A few specific concerns worth addressing before merge.
Findings
🐛 Bugs / Logic Issues
Single-run audit: download a build's artifacts, run every analyzer
(firewall, MCP gateway, OTel, safe outputs, detection verdict, build
timeline, missing tools/data/noops), and emit a Markdown or JSON
report. ADO-side counterpart to `gh aw audit`.
New module tree under `src/audit/`:
- `model.rs` — `AuditData` (drift-compatible with gh-aw's top-level
contract; adds ADO-specific `detection_analysis`,
`safe_output_execution`, `rejected_safe_outputs` sections).
- `url.rs` — parses bare IDs, dev.azure.com URLs, legacy
visualstudio.com URLs, and on-prem Azure DevOps Server URLs (with
optional `&j=`/`&t=`/`&s=` job/step anchors).
- `cache.rs` — CLI-version-keyed `run-summary.json` with atomic writes.
- `analyzers/{firewall,policy,mcp,otel,safe_outputs,detection,missing,jobs}.rs`
— eight defensive NDJSON/REST analyzers.
- `findings.rs` — eight heuristic rules emitting severity-rated
findings + recommendations.
- `render/{console,json}.rs` — two renderers; JSON shape is the
public contract.
- `cli.rs` — orchestration: URL parse → auth → metadata fetch →
artifact download → analyzers → findings → cache → render.
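As an illustration of the atomic cache write mentioned for `cache.rs`, the following sketch writes to a sibling temp file and renames it into place. The function names and the `.json.tmp` temp-file suffix are assumptions, not the module's actual API; the `<output>/build-<id>/run-summary.json` layout is the one given in the PR description.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Cache location for one build: `<output>/build-<id>/run-summary.json`.
fn cache_path(output_dir: &Path, build_id: u64) -> PathBuf {
    output_dir
        .join(format!("build-{build_id}"))
        .join("run-summary.json")
}

/// Hypothetical helper: write `contents` to `dest` via temp file + rename,
/// so a reader never observes a half-written summary.
fn write_atomic(dest: &Path, contents: &[u8]) -> io::Result<()> {
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // Temp file lives in the same directory so the rename stays on one filesystem.
    let tmp = dest.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?; // flush to disk before the rename publishes the file
    }
    fs::rename(&tmp, dest)
}
```

A crash before the rename leaves at most a stray temp file; the previous `run-summary.json`, if any, stays intact.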
Unified rejection trace: when the aggregate `THREAT_DETECTION_RESULT`
has any threat flag set, every proposal lands in
`not_processed_due_to_aggregate_gate` carrying the aggregate
`reasons[]`, exactly one severity-`high` `KeyFinding` is emitted, and a
`rejected_safe_outputs` rollup appears at the top level.
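The aggregate-gate behavior can be sketched as follows. All types here are simplified stand-ins invented for the example (the real `AuditData` structures are not shown in this PR text), and the flag names used in any caller are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    NotProcessedDueToAggregateGate,
}

#[derive(Debug, Clone)]
struct Proposal {
    status: Status,
    reasons: Vec<String>,
}

/// Simplified stand-in for the aggregate `THREAT_DETECTION_RESULT`.
struct Verdict {
    threat_flags: Vec<(String, bool)>,
    reasons: Vec<String>,
}

struct KeyFinding {
    severity: &'static str,
    message: String,
}

/// If any threat flag fired, mark every proposal as gated, copy the aggregate
/// reasons onto it, and return exactly one high-severity finding.
fn apply_aggregate_gate(verdict: &Verdict, proposals: &mut [Proposal]) -> Option<KeyFinding> {
    let fired: Vec<&str> = verdict
        .threat_flags
        .iter()
        .filter(|(_, set)| *set)
        .map(|(name, _)| name.as_str())
        .collect();
    if fired.is_empty() {
        return None; // no flag set: proposals are left untouched
    }
    for p in proposals.iter_mut() {
        p.status = Status::NotProcessedDueToAggregateGate;
        p.reasons = verdict.reasons.clone();
    }
    Some(KeyFinding {
        severity: "high",
        message: format!(
            "{} proposal(s) dropped; threat flags: {}",
            proposals.len(),
            fired.join(", ")
        ),
    })
}
```

The single returned finding mirrors the "exactly one severity-`high` `KeyFinding`" rule: the gate is all-or-nothing, so one summary covers the whole batch.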
Pipeline-side runtime additions (so an `ado-aw audit` of an existing
build has the data it needs):
- `src/data/*-base.yml` (via `AdoAwMarkerExtension`): emits
`staging/aw_info.json` at runtime with engine, model, agent name,
source path, target, compiler version, and ADO build context.
- `src/execute.rs`: writes a per-item `safe-outputs-executed.ndjson`
in `<output-dir>` so the audit can show the proposed → detection →
executed trace.
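For readers unfamiliar with NDJSON, a per-item manifest is simply one JSON object per line. The sketch below hand-formats records without a JSON crate; the field names `tool` and `status` are assumptions for illustration and may not match the actual `safe-outputs-executed.ndjson` schema.

```rust
use std::io::{self, Write};

/// Hypothetical record shape; the real schema is not shown in this PR text.
struct ExecutedRecord<'a> {
    tool: &'a str,
    status: &'a str,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Writes one JSON object per line, which is all NDJSON requires.
fn write_ndjson<W: Write>(mut out: W, records: &[ExecutedRecord<'_>]) -> io::Result<()> {
    for r in records {
        writeln!(
            out,
            "{{\"tool\":\"{}\",\"status\":\"{}\"}}",
            json_escape(r.tool),
            json_escape(r.status)
        )?;
    }
    Ok(())
}
```

Line-delimited records let an analyzer parse the file defensively, skipping a malformed line without losing the rest.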
CLI surface:
  ado-aw audit <build-id-or-url>
    -o, --output <dir>   # default ./logs
    --json
    --org / --project / --pat
    --artifacts <agent,detection,safe-outputs>
    --no-cache
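The positional `<build-id-or-url>` argument accepts either a bare ID or a build-results URL. A minimal sketch of that dispatch is shown below; `BuildRef` and `parse_build_ref` are hypothetical names, the parsing is deliberately simplified (no percent-decoding, job/step anchors ignored), and it is not the logic in `src/audit/url.rs`.

```rust
/// Hypothetical result of parsing the positional argument.
#[derive(Debug, PartialEq)]
struct BuildRef {
    build_id: u64,
    /// Lower-cased host when the argument was a URL, `None` for a bare ID.
    host: Option<String>,
}

fn parse_build_ref(arg: &str) -> Option<BuildRef> {
    let arg = arg.trim();
    // Case 1: a bare numeric build ID.
    if let Ok(build_id) = arg.parse::<u64>() {
        return Some(BuildRef { build_id, host: None });
    }
    // Case 2: a URL carrying `buildId=<n>` in its query string.
    let rest = arg
        .strip_prefix("https://")
        .or_else(|| arg.strip_prefix("http://"))?;
    let (host, path_and_query) = rest.split_once('/')?;
    let (_, query) = path_and_query.split_once('?')?;
    let build_id = query.split('&').find_map(|pair| {
        let (key, value) = pair.split_once('=')?;
        if key.eq_ignore_ascii_case("buildId") {
            value.parse::<u64>().ok()
        } else {
            None
        }
    })?;
    Some(BuildRef {
        build_id,
        host: Some(host.to_ascii_lowercase()),
    })
}
```

Keeping the host in the parsed result matters later: it is the value a host-trust check would inspect before any credential is attached to a request.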
New dependencies: `zip` (artifact unpack), `wiremock` (dev only —
integration test mock server).
Tests: 80 new audit unit tests + 3 integration tests against a fake
ADO REST server (happy path, permission-denied, cache hit) using a
thin `ADO_AW_TEST_ORG_URL` test seam. 1740 total tests pass.
Docs: new `docs/audit.md`; updates to `docs/cli.md`, `README.md`,
`AGENTS.md` index, and `prompts/debug-ado-agentic-workflow.md` (Step 1
first-move + new Step 2a-prime + `AuditData` reference + jq-diff
fallback).
Out of scope (explicit follow-ups): diff mode, cross-run trends,
`--parse` log.md/firewall.md, job/step-anchored audit, MCP-exposed
audit, per-item detection verdict (upstream coordination with gh-aw),
partial-approval gating, AWF policy-manifest plumbing, AWF
token-usage.jsonl, `audit-manifest.json` build inventory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
42e5af1 to c07ea29
Two issues in src/execute.rs that made the executed-NDJSON manifest
silently mis-classify entries:
1. is_budget_exhausted parsed the human-readable message
("Skipped...maximum...already reached"). Any phrasing tweak to
check_budget would have silently downgraded budget-exhausted
records to status: "failed" in every audit log, with no
compile-time signal.
2. is_warning() entries (e.g. noop / missing-tool that succeeded
without ADO credentials) were emitted as status: "skipped",
conflating successful-but-no-op runs with the rejection bucket
used by the audit rollup.
Fixes:
- Add ExecutionResult.budget_exhausted: bool with budget_exhausted()
constructor and is_budget_exhausted() accessor. check_budget now
emits a structurally-tagged result; execution_record_status keys
off the flag. Refactor-safe.
- Map warning results to status: "warning" (distinct from "skipped").
Add SafeOutputStatus::Warning; counted toward executed_count, never
toward rejected_by_execution_count or the rejection rollup.
- Update affected unit tests; add coverage for budget_exhausted
serialization round-trip and warning-status analyzer mapping.
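The difference between string-matching and a structural flag can be sketched like this. The struct layout and the status strings `"executed"` and `"skipped"` for the success and budget-exhausted cases are assumptions; only `budget_exhausted`, `is_budget_exhausted`, `is_warning`, `execution_record_status`, and the `"warning"` / `"failed"` statuses are named in the commit message.

```rust
/// Simplified stand-in for `ExecutionResult` (field layout assumed).
#[derive(Debug, Default)]
struct ExecutionResult {
    success: bool,
    warning: bool,
    budget_exhausted: bool,
    message: String,
}

impl ExecutionResult {
    /// Constructor that tags the result structurally instead of via its message.
    fn budget_exhausted(message: impl Into<String>) -> Self {
        Self {
            budget_exhausted: true,
            message: message.into(),
            ..Default::default()
        }
    }

    fn is_budget_exhausted(&self) -> bool {
        self.budget_exhausted
    }

    fn is_warning(&self) -> bool {
        self.warning
    }
}

/// Status is derived from flags only, so rewording `message` cannot change it.
fn execution_record_status(result: &ExecutionResult) -> &'static str {
    if result.is_budget_exhausted() {
        "skipped" // assumption: the bucket budget-exhausted records land in
    } else if result.is_warning() {
        "warning" // distinct from "skipped", never counted as a rejection
    } else if result.success {
        "executed" // assumption: name of the success status
    } else {
        "failed"
    }
}
```

Since nothing reads `message` when classifying, a phrasing tweak in `check_budget` no longer risks a silent downgrade to `"failed"`.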
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR Review
Summary: Looks good overall: the audit module is well-structured, error handling is solid, and security-sensitive paths (zip extraction, cache writes) are handled correctly. Three issues worth addressing before merge.
Findings
🐛 Bugs / Logic Issues
🔒 Security Concerns
…xfil

Two issues surfaced by the adversarial review:

1. download_build_artifact discarded its `auth` argument

src/ado/mod.rs::download_build_artifact took &AdoAuth but called client.get(download_url).send() with no `auth.apply(...)` wrapper, silenced by `let _ = auth;`. For ADO artifact resource types whose downloadUrl is not pre-signed (legacy/Container artifacts and many on-prem ADO Server topologies) the request returned 401/403, and the 401 branch then misleadingly told the user to check their PAT scopes, which had never been sent.

Fix: wrap with `auth.apply(...)` to match list_build_artifacts / get_build. Pre-signed Artifact Services URLs continue to work because they ignore the Authorization header in favor of their sig= query string.

2. `audit <URL>` sent the local PAT to any host in the URL

src/audit/url.rs accepted on-prem URLs with an arbitrary host; apply_parsed_context_overrides plumbed that host straight into ctx.org_url; and AdoAuth::apply then attached the PAT via HTTP Basic Auth to whatever host the URL named. A user social-engineered into running `ado-aw audit https://attacker.example.com/Coll/Proj/_build/results?buildId=1` would silently exfiltrate their PAT.

Fix: validate_audit_url_host() now runs before any auth-bearing request. Microsoft-managed cloud hosts (dev.azure.com, *.visualstudio.com) are always trusted. Any other host must match the host in the trusted ADO context, which is itself resolved from --org or the local git remote (both explicit, locally controlled trust anchors). The check is host-suffix anchored (rejects .visualstudio.com bare-suffix and `visualstudio.com.attacker.example` lookalikes) and case-insensitive.

Adds 11 unit tests covering: cloud-host allowlist (positive and case-insensitive), bare-suffix / lookalike rejection, trusted-context match, trusted-context mismatch, no-trusted-context fallback, and host extraction edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
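A suffix-anchored, case-insensitive host check along the lines described above might look like the following sketch. The function bodies, the `is_trusted_cloud_host` helper, and the error strings are assumptions; only the name `validate_audit_url_host`, the two cloud hosts, and the rejection cases come from the commit message.

```rust
const CLOUD_HOST: &str = "dev.azure.com";
const CLOUD_SUFFIX: &str = ".visualstudio.com";

/// Hypothetical helper: true for `dev.azure.com` and `<label>.visualstudio.com`.
fn is_trusted_cloud_host(host: &str) -> bool {
    let h = host.to_ascii_lowercase();
    // The length check rejects the bare suffix `.visualstudio.com`;
    // `ends_with` rejects lookalikes such as `visualstudio.com.attacker.example`.
    h == CLOUD_HOST || (h.ends_with(CLOUD_SUFFIX) && h.len() > CLOUD_SUFFIX.len())
}

/// Sketch of the gate that runs before any auth-bearing request.
/// `trusted_host` is the host derived from `--org` or the git remote, if any.
fn validate_audit_url_host(url_host: &str, trusted_host: Option<&str>) -> Result<(), String> {
    if is_trusted_cloud_host(url_host) {
        return Ok(());
    }
    match trusted_host {
        Some(t) if t.eq_ignore_ascii_case(url_host) => Ok(()),
        Some(t) => Err(format!(
            "host `{url_host}` does not match trusted ADO host `{t}`"
        )),
        None => Err(format!(
            "host `{url_host}` is not trusted; pass --org to name the ADO server"
        )),
    }
}
```

The important property is ordering: the check fails closed before `AdoAuth::apply` can attach a PAT to a request for an unrecognized host.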
Follow-up to ea9c032. The previous fix derived the trusted host from resolve_ado_context(), which requires BOTH --org AND --project when running outside a git repo. That created a UX regression: when a user ran `ado-aw audit https://onprem.example.com/coll/proj/_build/results?buildId=42` from an arbitrary folder with --org https://onprem.example.com/coll, the trust anchor still came back as None (because --project wasn't passed), and validate_audit_url_host then rejected the URL host telling the user to "pass --org", which they already had.

Fix: introduce resolve_trusted_host(cwd, org_flag) that derives a host from --org if provided (any form normalize_org_url accepts), else from the git remote of cwd, else None. validate_audit_url_host now takes Option<&str> directly. Full-context resolution still runs afterward for its original purpose (supplying defaults the URL overrides supersede).

Adds 4 regression tests, including one for the original failure mode: running in an arbitrary folder with --org alone (no --project) must yield a usable trust anchor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
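The fallback order (`--org`, then git remote, then `None`) can be sketched as below. This simplified version takes the git remote URL as a parameter instead of reading it from `cwd`, uses a hypothetical `host_of` helper with naive host extraction (no IPv6 literal support), and assumes that an unparsable `--org` value falls through to the git remote; none of that is confirmed by the commit message.

```rust
/// Hypothetical helper: extract a lower-cased host from a URL-like string.
fn host_of(url: &str) -> Option<String> {
    // Drop the scheme if there is one.
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    // Authority ends at the first path, query, or fragment delimiter.
    let authority = rest.split(['/', '?', '#']).next()?;
    // Strip optional `user@` and `:port` parts.
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Sketch of the trust-anchor resolution: `--org` first, git remote second.
/// Unlike full context resolution, this never needs `--project`.
fn resolve_trusted_host(org_flag: Option<&str>, git_remote: Option<&str>) -> Option<String> {
    org_flag
        .and_then(host_of)
        .or_else(|| git_remote.and_then(host_of))
}
```

Decoupling the trust anchor from full context resolution is what removes the regression: `--org` alone is enough to produce a host to compare against.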
🔍 Rust PR Review
Summary: Looks good overall: well-structured feature with solid error handling and test coverage. One correctness concern worth addressing before this pattern is widely deployed.
Findings
🐛 Bugs / Logic Issues
feat(audit): add `ado-aw audit <build-id-or-url>` command

ADO-side counterpart to `gh aw audit`. Single-run audit only in this MVP: download a build's artifacts, run every analyzer (firewall, MCP gateway, OTel, safe outputs, detection verdict, build timeline, missing tools/data/noops), and emit a Markdown or JSON report.

What ships

New `src/audit/` module tree:
- `model.rs`: `AuditData`, the top-level JSON contract. Drift-compatible with gh-aw's shape; adds ADO-specific `detection_analysis`, `safe_output_execution`, `rejected_safe_outputs` sections.
- `url.rs`: … `dev.azure.com` URLs, legacy `*.visualstudio.com` URLs, on-prem Azure DevOps Server URLs (with optional `&j=`/`&t=`/`&s=` job/step anchors).
- `cache.rs`: … `<output>/build-<id>/run-summary.json` with atomic temp-file + rename writes.
- `analyzers/firewall.rs`: …
- `analyzers/policy.rs`: `policy-manifest.json` + `audit.jsonl` → rule hit counts.
- `analyzers/mcp.rs`: … (`unreliable` flagging), failures.
- `analyzers/otel.rs`: … `aw_info.json` → metrics + engine config.
- `analyzers/safe_outputs.rs`: … `context`.
- `analyzers/detection.rs`: `threat-analysis.json` → DetectionAnalysis.
- `analyzers/missing.rs`: …
- `analyzers/jobs.rs`: `/timeline` REST → `JobData[]`.
- `findings.rs`: …
- `render/console.rs`: …
- `render/json.rs`: …
- `cli.rs`: …

Pipeline-side runtime additions (so `ado-aw audit` of an existing build has the inputs it needs):
- `src/data/*-base.yml` templates emit `staging/aw_info.json` at runtime (engine, model, agent name, source, target, version, build context). Generated by an extension to `AdoAwMarkerExtension`.
- `src/execute.rs` writes per-item `safe-outputs-executed.ndjson` in `<output-dir>` so the audit can trace `proposed → detection → executed` per safe output.

CLI surface

Unified rejection trace

When the aggregate `THREAT_DETECTION_RESULT` has any threat flag set, every proposed safe output lands in `safe_output_execution[*].status = not_processed_due_to_aggregate_gate`, carries the aggregate `reasons[]` (annotated `applies_to_whole_batch: true`), and exactly one severity-`high` `KeyFinding` is emitted summarizing which threat flags fired and how many proposals were dropped. A top-level `rejected_safe_outputs` rollup mirrors the same info for `--json` consumers.

The threat-analysis prompt itself is unchanged: it's identical to gh-aw's today, and per-item verdicts will be coordinated upstream rather than forked.

Dependencies
- `zip`: unpack downloaded ADO PipelineArtifacts.
- `wiremock` (dev only): fake ADO REST server for the integration tests.

Tests
- … (`tests/audit_it.rs`) against a fake REST server: happy path, permission-denied, cache hit.

Docs
- `docs/audit.md`: accepted URL formats, flag table, output layout, `AuditData` shape, cache behavior, permission-failure UX, out-of-scope follow-ups.
- `docs/cli.md`: new `audit` subcommand block.
- `README.md`: one-line CLI entry.
- `AGENTS.md` index: pointer to `docs/audit.md` under "Compiler internals & operations".
- `prompts/debug-ado-agentic-workflow.md`: Step 1 first-move callout, new Step 2a-prime (run `ado-aw audit --json` before raw MCP timeline/log calls), `AuditData` top-level-key reference table, jq-diff fallback note. `create-`/`update-` prompts intentionally untouched (post-run inspection is debug-flavored).

Validation
- `cargo build` ✓
- `cargo test` ✓ (1740 passed, 0 failed)
- `cargo clippy --all-targets --all-features` ✓ (warnings only, all non-blocking style nits; no new errors)

Explicitly out of scope (recorded as follow-ups)
- … (`ado-aw audit <a> <b>`)
- … (`ado-aw audit --last N`)
- `--parse` log.md / firewall.md renderers (Rust-native, no JS bundle)
- … (`agentic-pipelines` MCP tool for in-pipeline self-audit)
- `audit-manifest.json` build inventory

Each is recorded in the session plan under "Out-of-scope follow-ups" so they're not lost.