Context
At the 2026-04-21 meeting with Lars Vilhuber, Tim Clark, and Casper of the TRACE project, the TRACE team explicitly invited PolicyEngine to contribute to TRACE's vocabulary / mapping work as we implement. Direct quotes:
"If you want to enrich bare trace metadata there is already the ability to attach specific policy engine vocabulary that goes along with it that you find useful so you don't have like two different sets of metadata flying around. But again there might be additions that you think might be more generally useful where we're going to have to ponder do we next release of the trace vocabulary the trough."
"expressing those things and seeing exactly what is the value that you get from trace or using it in a particular way for your use case is gold. Because outside of a particular domain it all just looks like okay just you know to me just running software you've got data and so on. But what is the precise value you're getting? That's what drives the precise features of the vocabulary."
Codex's review of our post-meeting plan flagged that this invited workstream was not captured.
What we already do
policyengine.py emits TROs with a pe: namespace (https://policyengine.org/trace/0.1#) carrying fields that are not part of TROv core:
- pe:certifiedForModelVersion
- pe:compatibilityBasis (one of exact_build_model_version, matching_data_build_fingerprint, legacy_compatible_model_package)
- pe:builtWithModelVersion
- pe:dataBuildId
- pe:dataBuildFingerprint
- pe:certifiedBy
- pe:emittedIn (local or github-actions)
- pe:ciRunUrl, pe:ciGitSha, pe:ciGitRef
- pe:bundleFingerprint, pe:bundleTroUrl
These live on the trov:TransparentResearchPerformance node so core SHACL shapes are unaffected.
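As a rough illustration of the shape described above, the following sketch assembles a performance node carrying pe:* fields and checks the two enumerated values listed. The function name build_performance_node and all example values are hypothetical, not taken from policyengine.py; the trov: prefix is left unbound in the context because its IRI is not given here.

```python
import json

# Namespace IRI as stated in this issue.
PE_NS = "https://policyengine.org/trace/0.1#"

# Allowed values as listed in this issue.
COMPATIBILITY_BASES = {
    "exact_build_model_version",
    "matching_data_build_fingerprint",
    "legacy_compatible_model_package",
}
EMITTED_IN = {"local", "github-actions"}


def build_performance_node(pe_fields: dict) -> dict:
    """Hypothetical helper: attach pe:* fields to a performance node.

    Only illustrates the layout; the real emitter may differ.
    """
    for key in pe_fields:
        if not key.startswith("pe:"):
            raise ValueError(f"expected a pe:-prefixed key, got {key!r}")

    basis = pe_fields.get("pe:compatibilityBasis")
    if basis is not None and basis not in COMPATIBILITY_BASES:
        raise ValueError(f"unknown pe:compatibilityBasis: {basis!r}")

    emitted_in = pe_fields.get("pe:emittedIn")
    if emitted_in is not None and emitted_in not in EMITTED_IN:
        raise ValueError(f"unknown pe:emittedIn: {emitted_in!r}")

    return {
        # The trov: prefix would also need binding here (IRI not shown).
        "@context": {"pe": PE_NS},
        "@type": "trov:TransparentResearchPerformance",
        **pe_fields,
    }


if __name__ == "__main__":
    # All values below are made up for illustration.
    node = build_performance_node(
        {
            "pe:certifiedForModelVersion": "1.0.0",
            "pe:builtWithModelVersion": "1.0.0",
            "pe:compatibilityBasis": "exact_build_model_version",
            "pe:emittedIn": "local",
        }
    )
    print(json.dumps(node, indent=2))
```

Because the extra keys sit on the performance node under their own prefix, a consumer that only understands TROv core can ignore them.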
What to contribute upstream
Some of the pe:* fields are idiosyncratic to us (pe:bundleFingerprint depends on our specific bundle-manifest shape). Others probably generalize:
- Institution-backed self-attestation metadata: pe:certifiedBy, pe:emittedIn, and something like pe:productionRuntime (container image SHA, cloud region, pod/function instance at execution time). Any institution that runs computation on behalf of researchers and signs the output would need these.
- Microdata-build provenance: how a derived dataset was produced from licensed inputs + public code + calibration targets. Our DataReleaseManifest shape might be worth generalizing as a vocabulary pattern for "this derived artifact was built by this procedure against these inputs, some of which are restricted."
- Compatibility-basis vocabulary: the TRO encodes how we chose to treat a model-vs-data version pair as compatible (exact match, fingerprint match, legacy-compatible). Other statistical-agency / microsimulation producers probably need similar vocabulary.
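To make the compatibility-basis idea concrete, here is a minimal sketch of how a producer might pick one of the three values listed under pe:compatibilityBasis. The function name, its parameters, and the precedence order (exact match first, then fingerprint match, then legacy) are assumptions for illustration only; they do not describe how policyengine.py actually decides.

```python
from typing import Optional


def choose_compatibility_basis(
    requested_model_version: str,
    built_with_model_version: str,
    data_build_fingerprint: str,
    expected_fingerprint: str,
    legacy_compatible_versions: frozenset = frozenset(),
) -> Optional[str]:
    """Hypothetical rule for treating a model-vs-data pair as compatible.

    Returns one of the three pe:compatibilityBasis values, or None when
    no basis applies. The precedence order is an assumption.
    """
    # Strongest claim: the data was built with exactly this model version.
    if requested_model_version == built_with_model_version:
        return "exact_build_model_version"
    # Versions differ, but the data build fingerprint still matches.
    if data_build_fingerprint == expected_fingerprint:
        return "matching_data_build_fingerprint"
    # Weakest claim: the version is on an explicit legacy allow-list.
    if requested_model_version in legacy_compatible_versions:
        return "legacy_compatible_model_package"
    return None
```

Recording which branch fired, rather than a bare "compatible" flag, is what lets a downstream reader judge how strong the compatibility claim is.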
Deliverables
- Technical memo describing the pe:* fields we use, why each one exists, and which we think generalize. (See also: policyengine.py/docs/trace-case-study.md.)
- Proposed TROv additions filed as issues / discussion threads on the TRACE project as we identify patterns that generalize.
- Worked examples: share our actual emitted TROs (bundle + simulation) with the TRACE team as reference implementations during their next vocabulary point release.
Timing
Follows the implementation of webapp-run TRO emission (api#3485) since patterns generalize better when they have been stress-tested in production.
Related
- docs/trace-case-study.md (PR #315, "Add TRACE case study writeup for AEA / TRACE grant team")