# Contribute PolicyEngine vocabulary patterns to TRACE / TROv (#316)

## Context

At the 2026-04-21 meeting with Lars Vilhuber, Tim Clark, and Casper of the TRACE project, they explicitly invited PolicyEngine to contribute to TRACE's vocabulary / mapping work as we implement. Direct quotes (bracketed text added for clarity):

> "If you want to enrich bare trace metadata there is already the ability to attach specific policy engine vocabulary that goes along with it that you find useful so you don't have like two different sets of metadata flying around. But again there might be additions that you think might be more generally useful where we're going to have to ponder do we next release of the trace vocabulary the trough [TROv]."

> "expressing those things and seeing exactly what is the value that you get from trace or using it in a particular way for your use case is gold. Because outside of a particular domain it all just looks like okay just you know to me just running software you've got data and so on. But what is the precise value you're getting? That's what drives the precise features of the vocabulary."

Codex's review of our post-meeting plan flagged that this invited workstream was not captured.

## What we already do

`policyengine.py` emits TROs with a `pe:` namespace (`https://policyengine.org/trace/0.1#`) carrying fields that are not part of TROv core:

- `pe:certifiedForModelVersion`
- `pe:compatibilityBasis` (one of `exact_build_model_version`, `matching_data_build_fingerprint`, `legacy_compatible_model_package`)
- `pe:builtWithModelVersion`
- `pe:dataBuildId`
- `pe:dataBuildFingerprint`
- `pe:certifiedBy`
- `pe:emittedIn` (`local` or `github-actions`)
- `pe:ciRunUrl`, `pe:ciGitSha`, `pe:ciGitRef`
- `pe:bundleFingerprint`, `pe:bundleTroUrl`

These live on the `trov:TransparentResearchPerformance` node, so the core TROv SHACL shapes are unaffected.
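To make the layering concrete, here is a minimal Python sketch of a performance node carrying `pe:*` fields in JSON-LD. The `pe:` namespace IRI and the enumerated values come from this issue; the `trov:` namespace IRI, the helper names (`performance_node`, `tro_document`, `ENUMERATED`), and all field values are assumptions for illustration, not the actual emitter in `policyengine.py`.

```python
import json
from enum import Enum

# From this issue: PolicyEngine's extension namespace.
PE_NS = "https://policyengine.org/trace/0.1#"
# ASSUMPTION: placeholder TROv namespace IRI; check it against the TROv spec.
TROV_NS = "https://w3id.org/trace/trov/0.1#"


class CompatibilityBasis(str, Enum):
    """Values of pe:compatibilityBasis listed in this issue."""
    EXACT_BUILD_MODEL_VERSION = "exact_build_model_version"
    MATCHING_DATA_BUILD_FINGERPRINT = "matching_data_build_fingerprint"
    LEGACY_COMPATIBLE_MODEL_PACKAGE = "legacy_compatible_model_package"


class EmittedIn(str, Enum):
    """Values of pe:emittedIn listed in this issue."""
    LOCAL = "local"
    GITHUB_ACTIONS = "github-actions"


# Hypothetical validation table: fields whose values come from a closed set.
ENUMERATED = {"compatibilityBasis": CompatibilityBasis, "emittedIn": EmittedIn}


def performance_node(node_id: str, pe_fields: dict) -> dict:
    """Build a trov:TransparentResearchPerformance node with pe:* fields attached."""
    node = {"@id": node_id, "@type": "trov:TransparentResearchPerformance"}
    for key, value in pe_fields.items():
        if key in ENUMERATED:
            ENUMERATED[key](value)  # raises ValueError on an unknown value
        node[f"pe:{key}"] = value
    return node


def tro_document(*nodes: dict) -> dict:
    """Wrap nodes in a JSON-LD document that declares both prefixes."""
    return {"@context": {"pe": PE_NS, "trov": TROV_NS}, "@graph": list(nodes)}


if __name__ == "__main__":
    # Placeholder values only.
    trp = performance_node(
        "_:performance",
        {
            "certifiedForModelVersion": "<model-version>",
            "compatibilityBasis": "exact_build_model_version",
            "emittedIn": "github-actions",
            "ciGitSha": "<commit-sha>",
        },
    )
    print(json.dumps(tro_document(trp), indent=2))
```

Validating the two enumerated fields at emission time keeps unknown values out of signed TROs; all other `pe:*` fields pass through unchanged.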

## What to contribute upstream

Some of the `pe:*` fields are idiosyncratic to us (`pe:bundleFingerprint` depends on our specific bundle-manifest shape). Others probably generalize:

- **Institution-backed self-attestation metadata**: `pe:certifiedBy`, `pe:emittedIn`, and a not-yet-emitted field along the lines of `pe:productionRuntime` (container image SHA, cloud region, pod/function instance at execution time). Any institution that runs computation on behalf of researchers and signs the output would need these.
- **Microdata-build provenance**: how a derived dataset was produced from licensed inputs, public code, and calibration targets. Our `DataReleaseManifest` shape may be worth generalizing into a vocabulary pattern for "this derived artifact was built by this procedure against these inputs, some of which are restricted."
- **Compatibility-basis vocabulary**: the TRO encodes *how* we chose to treat a model-vs-data version pair as compatible (exact match, fingerprint match, legacy-compatible). Other statistical-agency / microsimulation producers probably need similar vocabulary.
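The microdata-build pattern could be sketched as a record in which restricted inputs are disclosed only by name and content digest. Everything below is hypothetical: `BuildInput`, `DerivedArtifactBuild`, their field names, and the fingerprint recipe are illustrative and are not the actual `DataReleaseManifest` schema or the way `pe:dataBuildFingerprint` is computed.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BuildInput:
    """One input to a derived-data build (hypothetical shape)."""
    name: str
    sha256: str                  # content digest; all that is disclosed for restricted inputs
    restricted: bool = False     # e.g. licensed survey microdata
    url: Optional[str] = None    # location; never published for restricted inputs


@dataclass(frozen=True)
class DerivedArtifactBuild:
    """'This artifact was built by this procedure against these inputs.'"""
    artifact_sha256: str
    procedure: str               # assumption: e.g. a commit of the public build code
    inputs: Tuple[BuildInput, ...] = ()

    def fingerprint(self) -> str:
        """SHA-256 over the procedure plus (name, digest) of every input.

        Inputs are sorted first, so listing order does not change the result.
        """
        payload = {
            "procedure": self.procedure,
            "inputs": sorted([i.name, i.sha256] for i in self.inputs),
        }
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def public_record(self) -> dict:
        """Serializable record that omits the location of restricted inputs."""
        entries = []
        for i in self.inputs:
            entry = {"name": i.name, "sha256": i.sha256, "restricted": i.restricted}
            if i.url is not None and not i.restricted:
                entry["url"] = i.url
            entries.append(entry)
        return {
            "artifactSha256": self.artifact_sha256,
            "procedure": self.procedure,
            "buildFingerprint": self.fingerprint(),
            "inputs": entries,
        }
```

Because the fingerprint covers only digests, a verifier with access to a restricted file can recompute its SHA-256 and confirm it matches, while public consumers learn only that a restricted input with that digest was used.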

## Deliverables

1. **Technical memo** describing the `pe:*` fields we use, why each one exists, and which we think generalize. (See also: `policyengine.py`/`docs/trace-case-study.md`.)
2. **Proposed TROv additions** filed as issues / discussion threads on the TRACE project as we identify patterns that generalize.
3. **Worked examples**: share our actual emitted TROs (bundle and simulation) with the TRACE team as reference implementations for their next vocabulary point release.

## Timing

This work follows the implementation of webapp-run TRO emission (api#3485), since patterns generalize better once they have been stress-tested in production.

## Related

- Meeting on 2026-04-21 with Lars Vilhuber, Tim Clark, and Casper
- PolicyEngine/policyengine.py `docs/trace-case-study.md` (PR #315)

