Draft
45 commits
7a29d76
Add calibration package checkpointing, target config, and hyperparame…
baogorek Feb 17, 2026
f42e6aa
Ignore all calibration run outputs in storage/calibration/
baogorek Feb 17, 2026
29e53f9
Add --lambda-l0 to Modal runner, fix load_dataset dict handling
baogorek Feb 18, 2026
a898ebc
Add --package-path support to Modal runner
baogorek Feb 18, 2026
0a9340b
Add --log-freq for per-epoch calibration logging, fix output dir
baogorek Feb 18, 2026
fa7ebed
Create log directory before writing calibration log
baogorek Feb 18, 2026
13ec69c
Add debug logging for CLI args and command in package path
baogorek Feb 18, 2026
b628997
Fix chunked epoch display and rename Modal output files
baogorek Feb 18, 2026
06c465b
Replace per-clone Microsimulation with per-state precomputation
baogorek Feb 18, 2026
0a0f167
Add Modal Volume support and fix CUDA OOM fragmentation
baogorek Feb 19, 2026
13f3f30
Restrict targets to age demographics only for debugging
baogorek Feb 19, 2026
0b4acf7
Add include mode to target config, switch to age-only
baogorek Feb 20, 2026
32c851b
Switch target config to finest-grain include (~18K targets)
baogorek Feb 20, 2026
5a04c9f
Fix at-large district geoid mismatch (7 districts had 0 estimates)
baogorek Feb 20, 2026
09ae440
Add CLI package validator, drop impossible roth_ira_contributions target
baogorek Feb 20, 2026
5cb6d86
Add population-based initial weights for L0 calibration
baogorek Feb 20, 2026
ba97a90
Drop inflated dollar targets, add ACA PTC, save full package
baogorek Feb 20, 2026
49a1f66
Remove redundant --puf-dataset flag, add national targets
baogorek Feb 20, 2026
40ba0f2
fixing the stacked dataset builder
baogorek Feb 20, 2026
7c38d55
Derive cds_ordered from cd_geoid array instead of database query
baogorek Feb 20, 2026
abe1038
Update notebook outputs from successful calibration pipeline run
baogorek Feb 21, 2026
819a48c
Fix takeup draw ordering mismatch between matrix builder and stacked …
baogorek Feb 24, 2026
02f8ad0
checkpoint with aca_ptc randomness working
baogorek Feb 24, 2026
28b0d63
verify script
baogorek Feb 24, 2026
c1b8f62
Prevent clone-to-CD collisions in geography assignment
baogorek Feb 24, 2026
40fb389
checkpoint
baogorek Feb 25, 2026
cb57217
Fix cross-state cache pollution in matrix builder precomputation
baogorek Feb 25, 2026
b9ed175
bens work on feb 25
baogorek Feb 26, 2026
9e53f60
Selective county-level precomputation via COUNTY_DEPENDENT_VARS
juaristi22 Feb 26, 2026
105bb4a
minor fixes
juaristi22 Feb 26, 2026
23369f3
small optimizations
juaristi22 Feb 26, 2026
c86a263
Parallelize clone loop in build_matrix() via ProcessPoolExecutor
juaristi22 Feb 26, 2026
a69d1ee
Migrate from changelog_entry.yaml to towncrier fragments (#550)
MaxGhenis Feb 24, 2026
0157140
Update package version
MaxGhenis Feb 24, 2026
0c43746
Add end-to-end test for calibration database build pipeline (#556)
MaxGhenis Feb 26, 2026
0a67899
Update package version
MaxGhenis Feb 26, 2026
da5f1eb
Add ETL process for pregnancy calibration targets and update document…
daphnehanse11 Feb 26, 2026
9a30d7c
Add changelog fragment for pregnancy imputation (#563)
daphnehanse11 Feb 26, 2026
9ef9aac
Update package version
baogorek Feb 26, 2026
94bdb47
Migrate from changelog_entry.yaml to towncrier fragments (#550)
MaxGhenis Feb 24, 2026
f543c7f
Update package version
MaxGhenis Feb 24, 2026
3eb3eda
Add end-to-end test for calibration database build pipeline (#556)
MaxGhenis Feb 26, 2026
915fec8
Update package version
MaxGhenis Feb 26, 2026
157e6af
Parallelize clone loop in build_matrix() via ProcessPoolExecutor
juaristi22 Feb 26, 2026
7937331
add target config
baogorek Feb 27, 2026
79 changes: 79 additions & 0 deletions .github/bump_version.py
@@ -0,0 +1,79 @@
"""Infer semver bump from towncrier fragment types and update version."""

import re
import sys
from pathlib import Path


def get_current_version(pyproject_path: Path) -> str:
    text = pyproject_path.read_text()
    match = re.search(r'^version\s*=\s*"(\d+\.\d+\.\d+)"', text, re.MULTILINE)
    if not match:
        print(
            "Could not find version in pyproject.toml",
            file=sys.stderr,
        )
        sys.exit(1)
    return match.group(1)


def infer_bump(changelog_dir: Path) -> str:
    fragments = [
        f
        for f in changelog_dir.iterdir()
        if f.is_file() and f.name != ".gitkeep"
    ]
    if not fragments:
        print("No changelog fragments found", file=sys.stderr)
        sys.exit(1)

    categories = {f.suffix.lstrip(".") for f in fragments}
    for f in fragments:
        parts = f.stem.split(".")
        if len(parts) >= 2:
            categories.add(parts[-1])

    if "breaking" in categories:
        return "major"
    if "added" in categories or "removed" in categories:
        return "minor"
    return "patch"


def bump_version(version: str, bump: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    elif bump == "minor":
        return f"{major}.{minor + 1}.0"
    else:
        return f"{major}.{minor}.{patch + 1}"


def update_file(path: Path, old_version: str, new_version: str):
    text = path.read_text()
    updated = text.replace(
        f'version = "{old_version}"',
        f'version = "{new_version}"',
    )
    if updated != text:
        path.write_text(updated)
        print(f"  Updated {path}")


def main():
    root = Path(__file__).resolve().parent.parent
    pyproject = root / "pyproject.toml"
    changelog_dir = root / "changelog.d"

    current = get_current_version(pyproject)
    bump = infer_bump(changelog_dir)
    new = bump_version(current, bump)

    print(f"Version: {current} -> {new} ({bump})")

    update_file(pyproject, current, new)


if __name__ == "__main__":
    main()
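
The mapping from fragment file names to a semver bump in `bump_version.py` can be exercised without touching the filesystem. The sketch below restates the same rules on plain file-name strings; `categories_from_names` and `infer_bump_from_names` are hypothetical helper names introduced only for illustration and are not part of the PR.

```python
from pathlib import PurePosixPath


def categories_from_names(names):
    """Collect fragment types from names such as 'my-branch.added.md'."""
    categories = set()
    for name in names:
        path = PurePosixPath(name)
        # The raw extension is collected too ('md' here), mirroring the script.
        categories.add(path.suffix.lstrip("."))
        # The fragment type is the last dotted part of the stem.
        parts = path.stem.split(".")
        if len(parts) >= 2:
            categories.add(parts[-1])
    return categories


def infer_bump_from_names(names):
    """Apply the same precedence as infer_bump(): breaking > added/removed > rest."""
    categories = categories_from_names(names)
    if "breaking" in categories:
        return "major"
    if "added" in categories or "removed" in categories:
        return "minor"
    return "patch"


def bump_version(version, bump):
    """Increment a 'major.minor.patch' string according to the bump kind."""
    major, minor, patch = (int(x) for x in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    names = [
        "calibration-pipeline-improvements.added.md",
        "calibration-pipeline-improvements.fixed.md",
    ]
    bump = infer_bump_from_names(names)
    print(bump, bump_version("1.69.4", bump))
```

Under these rules a single `added` or `removed` fragment is enough to produce a minor bump, regardless of how many `fixed` or `changed` fragments sit next to it.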
7 changes: 0 additions & 7 deletions .github/check-changelog-entry.sh

This file was deleted.

29 changes: 10 additions & 19 deletions .github/workflows/pr_changelog.yaml
@@ -1,30 +1,21 @@
name: Changelog entry

on:
pull_request:
branches: [main]

jobs:
check-fork:
check-changelog:
name: Check changelog fragment
runs-on: ubuntu-latest
steps:
- name: Check if PR is from fork
- uses: actions/checkout@v4
- name: Check for changelog fragment
run: |
if [ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]; then
echo "❌ ERROR: This PR is from a fork repository."
echo "PRs must be created from branches in the main PolicyEngine/policyengine-us-data repository."
echo "Please close this PR and create a new one following these steps:"
echo "1. git checkout main"
echo "2. git pull upstream main"
echo "3. git checkout -b your-branch-name"
echo "4. git push -u upstream your-branch-name"
echo "5. Create PR from the upstream branch"
FRAGMENTS=$(find changelog.d -type f ! -name '.gitkeep' | wc -l)
if [ "$FRAGMENTS" -eq 0 ]; then
echo "::error::No changelog fragment found in changelog.d/"
echo "Add one with: echo 'Description.' > changelog.d/\$(git branch --show-current).<type>.md"
echo "Types: added, changed, fixed, removed, breaking"
exit 1
fi
echo "✅ PR is from the correct repository"

require-entry:
needs: check-fork
uses: ./.github/workflows/reusable_changelog_check.yaml
with:
require_entry: true
validate_format: true
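
The new workflow step counts fragments with `find`, which descends into subdirectories, whereas `infer_bump()` in `bump_version.py` uses `iterdir()`, which looks only at the top level of `changelog.d`. A minimal Python sketch of both counts follows, assuming a local `changelog.d`-style directory; both function names are hypothetical and exist only for this comparison.

```python
from pathlib import Path


def count_fragments_recursive(changelog_dir):
    """Count regular files at any depth, ignoring '.gitkeep' (like the find call)."""
    root = Path(changelog_dir)
    return sum(
        1 for p in root.rglob("*") if p.is_file() and p.name != ".gitkeep"
    )


def count_fragments_top_level(changelog_dir):
    """Count regular files directly inside the directory (like iterdir())."""
    root = Path(changelog_dir)
    return sum(
        1 for p in root.iterdir() if p.is_file() and p.name != ".gitkeep"
    )
```

If fragments are only ever placed directly in `changelog.d/`, the two counts agree. A fragment placed in a subdirectory would satisfy the PR check but would not be seen by the version bump script.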
45 changes: 0 additions & 45 deletions .github/workflows/reusable_changelog_check.yaml

This file was deleted.

15 changes: 9 additions & 6 deletions .github/workflows/versioning.yaml
@@ -7,7 +7,7 @@ on:
- main

paths:
- changelog_entry.yaml
- "changelog.d/**"
- "!pyproject.toml"

jobs:
@@ -19,20 +19,23 @@ jobs:
uses: actions/checkout@v4
with:
token: ${{ secrets.POLICYENGINE_GITHUB }}
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.12
- name: Install uv
uses: astral-sh/setup-uv@v5
- name: Build changelog
run: pip install yaml-changelog && make changelog
- name: Install towncrier
run: pip install towncrier
- name: Bump version and build changelog
run: |
python .github/bump_version.py
towncrier build --yes --version $(python -c "import re; print(re.search(r'version = \"(.+?)\"', open('pyproject.toml').read()).group(1))")
- name: Update lockfile
run: uv lock
- name: Preview changelog update
run: ".github/get-changelog-diff.sh"
- name: Update changelog
uses: EndBug/add-and-commit@v9
with:
add: "."
message: Update package version
message: Update package version
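
The `towncrier build` step reads the new version back with an unanchored pattern, `version = "(.+?)"`, while `get_current_version()` anchors the key to the start of a line. The sketch below compares the two on a made-up `pyproject.toml` fragment; the sample text and both function names are assumptions for illustration, not contents of this repository.

```python
import re

# Hypothetical pyproject.toml content in which another key ending in
# "version" appears before the project version.
SAMPLE_TEXT = '''\
[tool.mypy]
python_version = "3.12"

[project]
name = "example-package"
version = "1.70.0"
'''


def read_version_loose(text):
    """Pattern from the workflow one-liner: the first match anywhere wins."""
    return re.search(r'version = "(.+?)"', text).group(1)


def read_version_anchored(text):
    """Pattern from get_current_version(): the key must start a line."""
    match = re.search(r'^version\s*=\s*"(\d+\.\d+\.\d+)"', text, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(read_version_loose(SAMPLE_TEXT))
    print(read_version_anchored(SAMPLE_TEXT))
```

When the project `version` key is the first such key in the file, both patterns return the same string. The anchored form is the more defensive choice if other `*version = "..."` keys could appear earlier.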
4 changes: 2 additions & 2 deletions .gitignore
@@ -30,8 +30,8 @@ docs/.ipynb_checkpoints/
## ACA PTC state-level uprating factors
!policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv

## Raw input cache for database pipeline
policyengine_us_data/storage/calibration/raw_inputs/
## Calibration run outputs (weights, diagnostics, packages, config)
policyengine_us_data/storage/calibration/

## Batch processing checkpoints
completed_*.txt
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,17 @@
## [1.70.0] - 2026-02-26

### Added

- Add end-to-end test for calibration database build pipeline.


## [1.69.4] - 2026-02-24

### Changed

- Migrated from changelog_entry.yaml to towncrier fragments to eliminate merge conflicts.


# Changelog

All notable changes to this project will be documented in this file.
21 changes: 13 additions & 8 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: all format test install download upload docker documentation data calibrate publish-local-area clean build paper clean-paper presentations database database-refresh promote-database promote-dataset
.PHONY: all format test install download upload docker documentation data calibrate calibrate-build publish-local-area clean build paper clean-paper presentations database database-refresh promote-database promote-dataset

HF_CLONE_DIR ?= $(HOME)/huggingface/policyengine-us-data

@@ -15,12 +15,8 @@ install:
pip install -e ".[dev]" --config-settings editable_mode=compat

changelog:
build-changelog changelog.yaml --output changelog.yaml --update-last-date --start-from 1.0.0 --append-file changelog_entry.yaml
build-changelog changelog.yaml --org PolicyEngine --repo policyengine-us-data --output CHANGELOG.md --template .github/changelog_template.md
bump-version changelog.yaml pyproject.toml
rm changelog_entry.yaml || true
touch changelog_entry.yaml

python .github/bump_version.py
towncrier build --yes --version $$(python -c "import re; print(re.search(r'version = \"(.+?)\"', open('pyproject.toml').read()).group(1))")
download:
python policyengine_us_data/storage/download_private_prerequisites.py

@@ -65,6 +61,7 @@ database:
python policyengine_us_data/db/etl_snap.py
python policyengine_us_data/db/etl_state_income_tax.py
python policyengine_us_data/db/etl_irs_soi.py
python policyengine_us_data/db/etl_pregnancy.py
python policyengine_us_data/db/validate_database.py

database-refresh:
@@ -99,7 +96,15 @@ data: download

calibrate: data
python -m policyengine_us_data.calibration.unified_calibration \
--puf-dataset policyengine_us_data/storage/puf_2024.h5
--target-config policyengine_us_data/calibration/target_config.yaml

calibrate-build: data
python -m policyengine_us_data.calibration.unified_calibration \
--target-config policyengine_us_data/calibration/target_config.yaml \
--build-only

validate-package:
python -m policyengine_us_data.calibration.validate_package

publish-local-area:
python policyengine_us_data/datasets/cps/local_area_calibration/publish_local_area.py
File renamed without changes.
7 changes: 7 additions & 0 deletions changelog.d/calibration-pipeline-improvements.added.md
@@ -0,0 +1,7 @@
Unified calibration pipeline with GPU-accelerated L1/L0 solver, target config YAML, and CLI package validator.
Per-state and per-county precomputation replacing per-clone Microsimulation (51 sims instead of 436).
Parallel state, county, and clone loop processing via ProcessPoolExecutor.
Block-level takeup re-randomization with deterministic seeded draws.
Hierarchical uprating with ACA PTC state-level CSV factors and CD reconciliation.
Modal remote runner with Volume support, CUDA OOM fixes, and checkpointing.
Stacked dataset builder with sparse CD subsets and calibration block propagation.
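
The fragment above mentions deterministic seeded draws at block level. One common way to get draws that do not depend on iteration order is to derive the seed from a stable hash of the identifying key. The sketch below shows that general technique only; it is not taken from the PR, and the function name, the key format, and the example variable and block identifiers are all hypothetical.

```python
import hashlib
import random


def seeded_draws(variable, block_id, n):
    """Return n uniform draws in [0, 1) that depend only on (variable, block_id)."""
    key = f"{variable}:{block_id}".encode()
    # hashlib is stable across processes, unlike the built-in hash() on strings.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    first = seeded_draws("aca_ptc", "010010201001000", 3)
    second = seeded_draws("aca_ptc", "010010201001000", 3)
    print(first == second)
```

Because the seed is a function of the key alone, two components that process blocks in different orders would still obtain identical draws for the same block.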
3 changes: 3 additions & 0 deletions changelog.d/calibration-pipeline-improvements.changed.md
@@ -0,0 +1,3 @@
Geography assignment now prevents clone-to-CD collisions.
County-dependent vars (aca_ptc) selectively precomputed per county; other vars use state-only path.
Target config switched to finest-grain include mode (~18K targets).
3 changes: 3 additions & 0 deletions changelog.d/calibration-pipeline-improvements.fixed.md
@@ -0,0 +1,3 @@
Cross-state cache pollution in matrix builder precomputation.
Takeup draw ordering mismatch between matrix builder and stacked builder.
At-large district geoid mismatch (7 districts had 0 estimates).