feat(integrations): Add VAREK Guardrails + W&B pipeline verification …#620

Open
kwdoug63 wants to merge 2 commits into wandb:master from kwdoug63:kwdoug63/add-varek-guardrails-integration

Conversation

@kwdoug63

What this example demonstrates

This example shows how to verify that AI/ML pipeline code runs under deterministic kernel-level isolation (seccomp-bpf, cgroups v2, and a wallclock timeout), with all results logged to W&B for audit and reproducibility.

VAREK Guardrails is an open-source runtime sandbox that wraps subprocess.Popen with a seccomp-bpf filter, a cgroups v2 subtree (memory/CPU/PIDs), and a wallclock killer. This integration demonstrates three patterns that map directly to W&B-supported workflows.
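To make the wallclock-killer part of that description concrete, here is a minimal standard-library sketch. It is not VAREK's implementation: `run_with_wallclock` is a hypothetical helper, and it applies no seccomp-bpf filter and no cgroup limits.

```python
import subprocess
import sys


def run_with_wallclock(argv, timeout_s):
    """Run argv and kill it if it exceeds timeout_s seconds of wallclock time.

    Returns (returncode, timed_out). Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(argv)
    try:
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return proc.returncode, True


if __name__ == "__main__":
    # A child that would sleep for 30 s is stopped after roughly half a second.
    print(run_with_wallclock([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

A real sandbox would also need to deal with grandchildren that outlive the direct child, which is one reason a cgroup is useful alongside the timer.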

What's included

examples/varek-guardrails/
├── README.md
├── requirements.txt
├── verification_artifact.py   # 5-test verification battery → W&B Artifact
├── telemetry_stream.py        # subscribe_telemetry → W&B run logs
├── benchmark_suite.py         # 8-payload regression suite → W&B Table
└── scripts/
    └── with-cgroup.sh         # idempotent cgroup subtree wrapper
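
The tree above lists `scripts/with-cgroup.sh` as an idempotent cgroup subtree wrapper. The script itself is not reproduced here; as a rough illustration of what idempotently enabling cgroups v2 subtree controls involves, here is a Python sketch. The function name and the default controller list are assumptions, not taken from the script.

```python
from pathlib import Path


def enable_controllers(cgroup_dir, wanted=("memory", "cpu", "pids")):
    """Enable any wanted-but-not-yet-enabled controllers for child cgroups.

    Returns the controllers this call asked the kernel to enable.
    Hypothetical helper; not the contents of with-cgroup.sh.
    """
    cgroup_dir = Path(cgroup_dir)
    available = (cgroup_dir / "cgroup.controllers").read_text().split()
    control = cgroup_dir / "cgroup.subtree_control"
    enabled = control.read_text().split()

    unavailable = [c for c in wanted if c not in available]
    if unavailable:
        raise RuntimeError(f"controllers not available here: {unavailable}")

    missing = [c for c in wanted if c not in enabled]
    if missing:
        # cgroups v2 syntax: "+name" enables a controller for child cgroups.
        control.write_text(" ".join(f"+{c}" for c in missing))
    return missing
```

On a real system `cgroup_dir` would be a directory under `/sys/fs/cgroup` that the caller may write to. The kernel can reject the write, for example when a non-root cgroup still has member processes of its own.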

Why W&B users benefit

For teams running untrusted-code execution (LLM agents, eval harnesses, code-as-data pipelines), this provides a verifiable audit trail: every claim about isolation is backed by a logged run, and the runs are reproducible by any reviewer with the same versions installed. The Artifact pattern lets downstream stages depend on the verification result; the Table pattern catches policy regressions automatically.
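
As a sketch of the Artifact pattern described above, the following shows one way a verification result could be written to a report, logged as a W&B Artifact, and checked by a downstream stage. It is not the code in `verification_artifact.py`: the helper names, the report layout, and the artifact name and type are assumptions for illustration. Only `wandb.init`, `wandb.Artifact`, `Artifact.add_file`, `Run.log_artifact`, and `Run.finish` are real W&B calls.

```python
import json
from pathlib import Path


def build_report(results):
    """results: mapping of test name -> bool (True means the test passed)."""
    passed = sum(1 for ok in results.values() if ok)
    return {
        "tests": dict(results),
        "passed": passed,
        "total": len(results),
        "all_passed": bool(results) and passed == len(results),
    }


def write_report(report, path):
    """Serialize the report as JSON and return the path it was written to."""
    path = Path(path)
    path.write_text(json.dumps(report, indent=2, sort_keys=True))
    return path


def log_report_artifact(report_path, project):
    """Log the report file as a W&B Artifact (needs wandb and a logged-in session)."""
    import wandb  # imported lazily so the helpers above work without wandb

    run = wandb.init(project=project, job_type="verification")
    # Artifact name and type are hypothetical placeholders.
    artifact = wandb.Artifact(name="isolation-verification", type="verification")
    artifact.add_file(str(report_path))
    run.log_artifact(artifact)
    run.finish()


def gate(report_path):
    """Downstream stage: refuse to proceed unless every test passed."""
    report = json.loads(Path(report_path).read_text())
    if not report["all_passed"]:
        raise RuntimeError(
            f"isolation verification failed: {report['passed']}/{report['total']}"
        )
    return report
```

A downstream job would fetch the artifact, read the report, and call something like `gate` before running untrusted code.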

Tested against

  • varek-guardrails 1.1.1
  • wandb 0.26.1
  • weave 0.50+
  • Ubuntu 24.04 (Linux 6.x, cgroups v2)

Live verification runs

End-to-end runs from the development environment (sober-agents entity):

| Script | Run | Result |
| --- | --- | --- |
| verification_artifact.py | fast-jazz-2 | 5/5 PASS, artifact logged |
| telemetry_stream.py | skilled-bird-2 | 3 payloads, ~2475 audit events captured |
| benchmark_suite.py | fluent-butterfly-2 | 8/8 OK, match_rate=100% |
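
For readers unfamiliar with the Table pattern, here is a hedged sketch of how a `match_rate` like the one reported for `benchmark_suite.py` could be computed and logged. The payload names, outcome labels, and column names are hypothetical, and this is not the suite's actual code. `wandb.Table(columns=..., data=...)` and `Run.log` are real W&B calls.

```python
COLUMNS = ["payload", "category", "expected", "observed", "match"]


def compare(cases, observed):
    """Compare expected and observed outcomes.

    cases: list of (payload, category, expected) tuples.
    observed: mapping of payload -> observed outcome.
    Returns (rows, match_rate), where rows follow COLUMNS.
    """
    rows = []
    for payload, category, expected in cases:
        got = observed.get(payload, "missing")
        rows.append([payload, category, expected, got, expected == got])
    match_rate = sum(1 for row in rows if row[4]) / len(rows) if rows else 0.0
    return rows, match_rate


def log_table(rows, match_rate, project):
    """Log the comparison as a W&B Table (needs wandb and a logged-in session)."""
    import wandb  # imported lazily so compare() works without wandb

    run = wandb.init(project=project, job_type="benchmark")
    table = wandb.Table(columns=COLUMNS, data=rows)
    run.log({"benchmark": table, "match_rate": match_rate})
    run.finish()
```

Any row whose `match` value is false points at a payload where the sandbox policy no longer behaves as expected, which is how a table like this can surface regressions.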

Notes for reviewers

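  • Requires Linux with cgroups v2 mounted as the unified hierarchy (cgroups v2 first shipped in kernel 4.5 and its CPU controller in 4.15; it is the default on Fedora 31+, Debian 11+, Ubuntu 21.10+, and RHEL 9+)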
  • The wrapper script (scripts/with-cgroup.sh) is required because systemd's Delegate= directive does not apply to .slice units, so cgroup subtree controls must be set explicitly before invocation
  • The benchmark suite includes a known_allowance category for os.system passthrough — preserved deliberately as a regression test against future policy tightening in varek_guardrails
  • The telemetry callback uses a PID guard + queue.Queue buffer to remain safe in post-fork-pre-exec contexts (audit hooks fire in forked children; calling wandb.log directly there breaks the seccomp arming)
  • Directory placement (examples/varek-guardrails/) is a suggestion — happy to move under a different parent (e.g. examples/security/ or examples/integrations/) based on maintainer preference
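
The PID guard and queue buffer mentioned in the notes above can be sketched as follows. This is a minimal illustration, not the code in `telemetry_stream.py`: the class name, the `sink` parameter, and the drop counter are assumptions, and how the callback is registered with `subscribe_telemetry` is not shown because its signature is not given here.

```python
import os
import queue


class BufferedTelemetry:
    """Buffer events in the parent process; ignore them in forked children."""

    def __init__(self, sink, maxsize=10000):
        self._owner_pid = os.getpid()  # PID of the process that created the buffer
        self._events = queue.Queue(maxsize=maxsize)
        self._sink = sink              # e.g. wandb.log, called only from drain()
        self.dropped = 0

    def callback(self, event):
        # After fork() the child sees a different PID, so it returns
        # immediately and never touches the queue or the sink.
        if os.getpid() != self._owner_pid:
            return
        try:
            self._events.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # never block inside a hook

    def drain(self):
        """Forward buffered events to the sink; returns how many were sent."""
        sent = 0
        while True:
            try:
                event = self._events.get_nowait()
            except queue.Empty:
                return sent
            self._sink(event)
            sent += 1
```

The parent would call `drain()` periodically or after each payload finishes, so the sink only ever runs in the parent process and outside the hook.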

Happy to iterate on naming, scope, or any of the patterns.

…example

Adds a script-based integration example demonstrating how to use
VAREK Guardrails (open-source seccomp-bpf + cgroup runtime sandbox)
with W&B Runs, Artifacts, Tables, and Weave for verifiable execution
of untrusted ML pipeline code.

Three demonstration scripts under examples/varek-guardrails/:
- verification_artifact.py: 5-test verification battery, logs result
  as a W&B Artifact for downstream consumption.
- telemetry_stream.py: bridges varek_guardrails subscribe_telemetry
  audit-hook events into W&B run logs, with PID-guarded callback for
  fork-safety.
- benchmark_suite.py: 8-payload regression suite (benign, malicious,
  resource, edge, known_allowance) with W&B Table for sortable
  comparison of expectation vs. observed outcome.

Tested end-to-end against varek_guardrails 1.1.1, wandb 0.26.1,
weave 0.50+ on Ubuntu 24.04 (cgroups v2). Live runs linked in PR
description.
@socket-security

socket-security Bot commented Apr 26, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | weave@0.52.37 | 95 | 100 | 100 | 100 | 100 |

View full report

Removes em-dashes, smart quotes from .py and .sh files to clear
GitHub's bidirectional-Unicode warning banner on the PR. README
content unchanged.