Creation of a new SIG - GPU-Based Model Integrity SIG at Sandbox stage #41

@sandlbn

Together with @marcelamelara and Zahra Ghodsi, we would like to propose a new SIG focused on GPU-based model integrity. We are seeking feedback and interested participants.

GPU-Based Model Integrity SIG

Creation of a new Special Interest Group (SIG) at Sandbox stage

Proposed focus, intent, goals, and/or deliverables

Focus / Mission

As ML models grow in size and complexity, ensuring their integrity throughout the supply chain becomes increasingly critical. Traditional CPU-based integrity verification approaches face significant challenges:

  • Scale: Modern foundation models can exceed hundreds of gigabytes, making CPU-based hashing prohibitively slow. For example, hashing a 100GB model on CPU can take 10+ minutes, creating bottlenecks in CI/CD pipelines and deployment workflows.
  • Provenance: Organizations need to verify not just that a model is unchanged, but also its complete lineage from training through deployment.
  • Verification granularity: Different use cases require different levels of verification, from full-model validation to selective layer verification.

This SIG addresses these challenges by leveraging GPU acceleration for model integrity operations (hashing, signing, attestation). Model integrity is one component of comprehensive model provenance; this SIG's work will integrate with and enable broader provenance frameworks such as Model Transparency and Atlas.
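
To illustrate why chunked hashing suits parallel hardware, here is a minimal CPU-only sketch. It is not the SIG's API or the method of any existing library: the names `chunk_digests` and `model_digest`, the tiny chunk size, and the two-level combination scheme are all assumptions made for illustration, and a thread pool stands in for GPU threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; deliberately tiny here, real chunks would be far larger


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash every fixed-size chunk independently of the others."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # No chunk depends on another, so the work can be spread across workers.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: hashlib.sha256(c).digest(), chunks))


def model_digest(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Combine the per-chunk digests, in order, into one top-level digest."""
    outer = hashlib.sha256()
    for digest in chunk_digests(data, chunk_size):
        outer.update(digest)
    return outer.hexdigest()
```

Note that the result differs from a plain SHA-256 over the whole file and depends on the chunk size, so producers and consumers would have to agree on both the scheme and its parameters.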

Goals

  • Establish a hardware-agnostic API and workflow for GPU-based ML model hashing and signing, with reference implementations for major GPU vendors.
  • Enable ML model producers to generate trustworthy GPU-based model hashes and signatures, and model consumers to verify GPU-signed models.
  • Evaluate, standardize, and implement GPU-accelerated versions of the following integrity algorithm families:

Algorithm | Properties | GPU suitability
SHA-256/512 | Widely adopted, FIPS-compliant | Moderate parallelization
SHA-3 (Keccak) | Quantum-resistant design, NIST standard | Good parallelization
Lattice Hash | Efficient update | Good parallelization
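
The "efficient update" property of the lattice hash family can be sketched as follows. This is a toy additive hash loosely in the style of lattice-based homomorphic hashing, not the construction the SIG will standardize: the vector length, the 16-bit modulus, the use of SHAKE-256 for expansion, and the function names are all assumptions, and the parameters have not been chosen for security.

```python
import hashlib

N_WORDS = 1024   # vector length (assumed parameter)
MOD = 1 << 16    # each component is reduced modulo 2^16 (assumed parameter)


def chunk_vector(index: int, chunk: bytes) -> list[int]:
    """Expand (index, chunk) into N_WORDS 16-bit words using SHAKE-256."""
    stream = hashlib.shake_256(index.to_bytes(8, "big") + chunk).digest(2 * N_WORDS)
    return [int.from_bytes(stream[i:i + 2], "big") for i in range(0, len(stream), 2)]


def lattice_hash(chunks: list[bytes]) -> list[int]:
    """Sum the per-chunk vectors component-wise modulo MOD."""
    acc = [0] * N_WORDS
    for i, chunk in enumerate(chunks):
        acc = [(a + v) % MOD for a, v in zip(acc, chunk_vector(i, chunk))]
    return acc


def update(acc: list[int], index: int, old: bytes, new: bytes) -> list[int]:
    """Replace one chunk by subtracting its old vector and adding the new one."""
    old_v = chunk_vector(index, old)
    new_v = chunk_vector(index, new)
    return [(a - o + n) % MOD for a, o, n in zip(acc, old_v, new_v)]
```

Because the combination step is a component-wise sum, changing one layer costs two chunk expansions rather than a rehash of the whole model, and the per-chunk expansions are independent of each other.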

Deliverables

  • API specification for GPU-based ML model hashing and signing across GPU vendors
  • Libraries implementing the API and workflow for common GPU hardware
  • GPU-optimized Merkle tree implementations enabling selective layer verification
  • Talk at industry conference (target: Open Source Summit or Open Source SecurityCon, 2026)
  • Stretch goal: Peer-reviewed academic paper documenting algorithm performance and security analysis (target venue: USENIX Security, IEEE S&P, or equivalent)
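
The selective layer verification mentioned in the deliverables can be sketched with a small CPU-only Merkle tree. This is an illustrative assumption about the tree layout, not the proposed structure: the function names are hypothetical, each layer is treated as one leaf, and odd levels are padded by duplicating the last node, which is a simplification that a real design would need to review. The 0x00/0x01 prefixes separate leaf hashes from internal-node hashes, as RFC 6962 does.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(layer: bytes) -> bytes:
    return _h(b"\x00" + layer)           # prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)    # prefix marks an internal node


def build_levels(layers: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first; assumes at least one layer."""
    level = [leaf_hash(x) for x in layers]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:               # pad odd levels (simplification)
            level = level + [level[-1]]
            levels[-1] = level
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(layer: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one layer and its proof."""
    acc = leaf_hash(layer)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root
```

A consumer holding a trusted root (for example, one covered by a signature) could then check a single layer with a proof whose length grows logarithmically in the number of layers, without hashing the rest of the model.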

Success Metrics

  • API specification adopted by at least 2 downstream projects or frameworks
  • Demonstrated speedup over CPU-based hashing for models >10GB
  • Reference implementations available for at least 2 GPU vendors
  • Stretch goal: Peer-reviewed publication accepted at a recognized venue
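
A speedup claim needs a CPU baseline to compare against. The sketch below shows one possible way to measure it on a synthetic in-memory buffer; the function names and buffer size are assumptions, and a real benchmark would also need to account for disk I/O, host-to-device transfer, and repeated runs.

```python
import hashlib
import time


def cpu_hash_throughput(n_bytes: int = 8 * 1024 * 1024, algorithm: str = "sha256") -> float:
    """Return measured single-threaded hashing throughput in bytes per second."""
    buf = bytes(n_bytes)                  # zero-filled synthetic "model" data
    start = time.perf_counter()
    hashlib.new(algorithm, buf).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return n_bytes / elapsed


def estimated_seconds(model_bytes: int, throughput: float) -> float:
    """Extrapolate hashing time for a model of the given size."""
    return model_bytes / throughput
```

Calling `estimated_seconds(100 * 10**9, cpu_hash_throughput())` gives a rough single-core estimate for a 100GB model on the machine at hand, which can then be set against a GPU measurement taken under the same conditions.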

2026 Roadmap

Quarter | Milestone
Q1 2026 | API specification v0.1; Merkle tree structure proposal
Q2 2026 | Reference implementation for NVIDIA/Intel GPUs; Provenance integration spec draft; Algorithm benchmarking results published
Q3 2026 | API specification v1.0 incorporating community feedback
Q4 2026 | Academic paper submission; Conference talk delivery; Integration testing with Model Transparency framework

Future Directions

While the initial focus is on model integrity, the techniques and infrastructure developed by this SIG are directly applicable to dataset integrity. Training datasets face similar challenges:

  • Scale: Large-scale datasets (e.g., LAION, Common Crawl derivatives) can reach terabytes, making integrity verification even more demanding than for models.
  • Provenance: Tracking dataset lineage (including filtering, deduplication, and augmentation steps) is essential for reproducibility and compliance.
  • Tamper detection: Data poisoning attacks target training data; efficient integrity verification can help detect unauthorized modifications.

Pending successful delivery of model integrity milestones, the SIG may expand its scope to include GPU-accelerated dataset hashing, signing, and attestation in 2027 and beyond.

List SIG Lead(s)

The SIG must have a minimum of 1 Lead

List of interested individuals

The SIG must have a minimum of 3 members with 2 different organizational affiliations.

Governing Body

SIGs may report to an existing OpenSSF Working Group or directly to the TAC as their governing body. The SIG commits to providing the governing body quarterly updates on progress.

  • "AI/ML Security WG"

SIG References

Reference | URL
Repo | https://github.com/Andrew-Gan/sentry
Meeting Agenda | During AI/ML WG
OSSF Calendar Entry | To be added upon SIG approval
Security.md | In progress
Roadmap | See 2026 Roadmap above
code-of-conduct.md | https://github.com/ossf/ai-ml-security/blob/main/code-of-conduct.md
Demos | Planned for Q2 2026
Papers | Sentry: Authenticating Machine Learning Artifacts on the Fly; Scalable GPU-Based Integrity Verification for Large Machine Learning Models
Other |
