Add LA-level household land value targets and calibrate on them #371
vahid-ahmadi wants to merge 5 commits into main
Generalises targets/sources/mhclg_regional_land.py to local-authority level. Each LA's share of national household land is proportional to households x avg_house_price, scaled to the ONS National Balance Sheet household-land series.
Inputs (all already used elsewhere in the repo):
- storage/la_land_values.csv: 360 LAs with households (from the existing local_authority_weights.h5 matrix) and avg_house_price (HM Land Registry UK HPI Dec 2025).
- _land.HOUSEHOLD_LAND_VALUES for the national anchor.
Tests cover CSV data quality, share/target aggregation, sensible ordering (K&C > Blackpool by >3x, London boroughs in top quintile), and registry integration.
Updates test_regional_land_value_targets.py to filter by GeographicLevel.REGION now that LA targets share the same name prefix.
Closes #370
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
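The share rule in this commit message can be sketched in a few lines. This is a minimal illustration, not the code in la_land.py: the function names, the three rows, and the national total are all hypothetical, and only the formula (households times average house price, normalised and scaled to a national anchor) comes from the description above.

```python
# Hypothetical rows standing in for storage/la_land_values.csv.
# The codes and figures are illustrative only, not the real data.
ROWS = [
    {"code": "LA_A", "households": 70_000, "avg_house_price": 1_200_000.0},
    {"code": "LA_B", "households": 65_000, "avg_house_price": 130_000.0},
    {"code": "LA_C", "households": 120_000, "avg_house_price": 300_000.0},
]

# Hypothetical national household-land total for one year, in GBP.
NATIONAL_TOTAL = 4.0e12


def compute_shares(rows):
    """Each LA's share is its property wealth divided by the sum over all LAs."""
    wealth = {r["code"]: r["households"] * r["avg_house_price"] for r in rows}
    total = sum(wealth.values())
    return {code: w / total for code, w in wealth.items()}


def compute_targets(rows, national_total):
    """Scale the shares so the LA targets sum to the national anchor."""
    return {code: s * national_total for code, s in compute_shares(rows).items()}


shares = compute_shares(ROWS)
targets = compute_targets(ROWS, NATIONAL_TOTAL)
```

Because the shares are normalised before scaling, the targets sum to the national figure by construction, which is the property the aggregation tests check.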
Note for whoever picks up #357: this PR mirrors |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blocker: data bug in
Impact: IoS alone absorbs 8.6% of the national household share (
Quick verification:
Looks like a UK-HPI 'national-total-as-fallback' path leaked into one LA row. Likely two lines to fix:
Happy to approve once that's in. The methodology itself is sound — mirrors |
The E06000053 row carried households=2,492,115, roughly the South West region total, from an upstream fallback that fired during CSV generation. Real IoS has ~1,115 households per ONS mid-2023. With the bug, IoS absorbed 7.85% of the national property-wealth share, understating every other LA's 2024 target by ~8.5% (e.g. K&C moved from £42.6bn to £46.2bn after the fix).
Two new tests prevent the regression:
- test_households_within_plausible_range: bounds every LA to [500, 500_000] so any future 10x+ outlier fails immediately.
- test_isles_of_scilly_households_are_thousands_not_millions: tight [500, 5_000] bound on the specific row that leaked.
Methodology unchanged; LA targets still sum to the ONS national household-land series within 1e-6.
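A rough sketch of the two ideas in this commit: a plausibility bound on household counts, and the arithmetic linking a 7.85% spurious share to the ~8.5% uplift elsewhere. The helper names and the LA_A / LA_B rows are hypothetical; only the E06000053 code, its two household figures, the [500, 500_000] bound, and the 7.85% share come from the message above. The uplift calculation assumes the corrected row's true share is negligible.

```python
def out_of_range(households_by_code, low=500, high=500_000):
    """Return the sorted codes whose household count falls outside [low, high]."""
    return sorted(
        code for code, n in households_by_code.items() if not (low <= n <= high)
    )


def uplift_after_fix(bad_share):
    """If one row wrongly absorbs `bad_share` of the total, every other LA's
    target is scaled by (1 - bad_share). Removing the bad row therefore raises
    the others by 1 / (1 - bad_share) - 1, ignoring the row's small true share.
    """
    return 1.0 / (1.0 - bad_share) - 1.0


# LA_A and LA_B are hypothetical; the E06000053 values are quoted in the thread.
buggy = {"LA_A": 70_000, "LA_B": 65_000, "E06000053": 2_492_115}
fixed = {**buggy, "E06000053": 1_115}

flagged_before = out_of_range(buggy)
flagged_after = out_of_range(fixed)
uplift = uplift_after_fix(0.0785)
```

Under that assumption the uplift comes out at about 8.5%, in line with the K&C move from £42.6bn to £46.2bn quoted above.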
@MaxGhenis thanks, fixed in 3ed729c.
Data fix
Quantified impact of the fix
Tests added
Full suite: 20/20 pass locally via
Generation-path note: the 2,492,115 figure matches the South West regional household total, so the fallback that fired during CSV generation was a regional sum, not "national-avg" as the PR body suggested. I'll correct the PR description; worth flagging for whoever regenerates the CSV next.
The targets added in the previous commits were registered but inert —
datasets/local_areas/local_authorities/loss.py never built a column for
them, so the LA reweighter could not see them. This adds the
ons/household_land_value column to the LA target matrix:
- matrix entry: per-household household_land_value (from policyengine-uk).
- y entry: 360-vector of per-LA targets at the calibration year, taken
from la_land._compute_la_targets and reordered to match
local_authorities_2021.csv so the country mask and target indices
agree at every position.
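The reordering step for the y entry might look roughly like the following. It is a sketch under assumptions: order_targets is a hypothetical helper, the codes and values are placeholders, and the real order would come from local_authorities_2021.csv.

```python
def order_targets(targets_by_code, la_codes):
    """Return targets as a list aligned position-by-position with la_codes.

    Raises KeyError naming any code without a target, so a misalignment
    cannot pass silently.
    """
    missing = [c for c in la_codes if c not in targets_by_code]
    if missing:
        raise KeyError(f"no land target for LA codes: {missing}")
    return [targets_by_code[c] for c in la_codes]


# Hypothetical codes and values for illustration only.
la_codes = ["LA_C", "LA_A", "LA_B"]
targets_by_code = {"LA_A": 3.0, "LA_B": 1.0, "LA_C": 2.0}

y = order_targets(targets_by_code, la_codes)
```

Indexing by code rather than relying on dict or CSV row order is what keeps position i of y tied to position i of the LA code list.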
The year is selected from time_period; if it is outside
HOUSEHOLD_LAND_VALUES (defined for 2021–2026) the latest known year is
used as a fallback.
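The year selection described here can be sketched as below. pick_year and EXAMPLE_VALUES are hypothetical names with placeholder values; only the 2021–2026 range and the "latest known year" fallback come from the commit message.

```python
def pick_year(requested, known_values):
    """Use the requested year if a national value exists for it,
    otherwise fall back to the latest year that has one."""
    return requested if requested in known_values else max(known_values)


# Placeholder values keyed by the years the message says are defined.
EXAMPLE_VALUES = {year: 1.0 for year in range(2021, 2027)}

year_in_range = pick_year(2024, EXAMPLE_VALUES)
year_too_late = pick_year(2030, EXAMPLE_VALUES)
year_too_early = pick_year(2019, EXAMPLE_VALUES)
```

Read literally, the rule sends a year before 2021 to the latest year as well, not the earliest; whether that is the intended behaviour for early years is not stated in the thread.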
New tests in test_la_loss_land_value.py cover both layers:
- target dict ↔ la_codes ordering, finite-positive vector, sum-to-
national for 2024/2025/2026 (no Microsimulation needed).
- full create_local_authority_target_matrix build (gated on the
enhanced FRS fixture): column presence, length 360, sum-to-national
for the calibration year, ordering matches la_codes, all positive,
and matrix column equals sim.calculate("household_land_value").
Closes the "out of scope" follow-up flagged in the original PR body.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- ons/household_land_value/{code} calibration targets
- mhclg_regional_land.py methodology to local-authority granularity
Closes #370.
What this PR does
Extends the regional methodology to LA level
Each LA's share of national household land value is proportional to its total property wealth (households × avg_house_price), scaled so the LA totals match the ONS national household-land series. Exactly the same formula as mhclg_regional_land.py::_compute_regional_shares, one geography deeper.
Wires the targets into LA calibration
datasets/local_areas/local_authorities/loss.py now adds an ons/household_land_value column to the LA target matrix:
- matrix entry: per-household household_land_value from policyengine-uk.
- y entry: 360-vector of per-LA targets at the calibration year, taken from la_land._compute_la_targets and reordered to match local_authorities_2021.csv so the country mask and target indices agree at every position.
The year is selected from time_period; if it is outside HOUSEHOLD_LAND_VALUES (defined for 2021–2026) the latest known year is used as a fallback.
Files
New
- policyengine_uk_data/storage/la_land_values.csv: 360 rows: code, name, households, avg_house_price.
  - households from the existing local_authority_weights.h5 (sum of each LA's 2025 weight row); keeps household-count semantics aligned with the rest of the LA calibration.
  - avg_house_price from HM Land Registry UK HPI (Dec 2025). Primary match on ONS code, name-based fallback for LAs with re-allocated codes (e.g. Sheffield E08000019 → E08000039 in HPI), NI country-level HPI fallback for missing NI LGD months, national-avg fallback for Isles of Scilly.
- policyengine_uk_data/targets/sources/la_land.py: _compute_la_shares(), _compute_la_targets(), get_targets() returning 360 Target objects with geographic_level=LOCAL_AUTHORITY.
- policyengine_uk_data/tests/test_la_land_value_targets.py: 18 unit tests on the targets themselves.
- policyengine_uk_data/tests/test_la_loss_land_value.py: 9 unit tests on the loss-matrix wiring.
- changelog.d/370.md.
Modified
- policyengine_uk_data/datasets/local_areas/local_authorities/loss.py: adds the ons/household_land_value column (matrix + y) so the LA reweighter trains on the new targets.
- policyengine_uk_data/tests/test_regional_land_value_targets.py: test_target_registry_includes_regional now filters by GeographicLevel.REGION (the regional and LA targets share the ons/household_land_value/ name prefix, so filtering by prefix alone now pulls both).
Tests
Targets: test_la_land_value_targets.py (18 tests)
CSV data quality
- local_authorities_2021.csv (360)
- [£50k, £2m], households positive
- [500, 500_000] (regression test for the Isles of Scilly fallback leak)
Share / target aggregation
Registry integration
- get_targets() returns exactly 360
- ons/household_land_value/{code}; geo_code == code
- GeographicLevel.LOCAL_AUTHORITY
- HOUSEHOLD_LAND_VALUES
- get_all_targets(year=2024, geographic_level=LOCAL_AUTHORITY) returns 360 LA land targets
Loss-matrix wiring: test_la_loss_land_value.py (9 tests)
Light layer (no Microsimulation)
- local_authorities_2021.csv has an LA land target.
- la_codes order yields a finite, all-positive 360-vector.
- HOUSEHOLD_LAND_VALUES[year] for 2024 / 2025 / 2026 within 1e-6.
Full-build layer (gated on the enhanced FRS fixture)
- ons/household_land_value column present in both matrix and y.
- y vector length 360.
- y sum equals ONS national household-land for the calibration year (within 1e-6).
- y ordering matches la_codes (np.testing.assert_array_equal).
- y entries positive.
- sim.calculate("household_land_value").values.
Results of running the new tests plus adjacent suites (regional land, land targets, target DB, target registry, release manifest): 67 passed, no regressions.
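The change to test_target_registry_includes_regional (filtering by GeographicLevel.REGION because regional and LA targets share a name prefix) can be illustrated with a small sketch. The GeographicLevel and Target classes below are hypothetical stand-ins for the repo's own types, and the target names are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class GeographicLevel(Enum):  # hypothetical stand-in for the repo's enum
    REGION = "region"
    LOCAL_AUTHORITY = "local_authority"


@dataclass(frozen=True)
class Target:  # hypothetical stand-in; the real Target carries more fields
    name: str
    geographic_level: GeographicLevel


PREFIX = "ons/household_land_value/"

registry = [
    Target(PREFIX + "REGION_X", GeographicLevel.REGION),
    Target(PREFIX + "LA_A", GeographicLevel.LOCAL_AUTHORITY),
    Target(PREFIX + "LA_B", GeographicLevel.LOCAL_AUTHORITY),
]

# Prefix alone now matches both geographies.
by_prefix = [t for t in registry if t.name.startswith(PREFIX)]

# Adding the level filter recovers the regional targets only.
regional = [t for t in by_prefix if t.geographic_level is GeographicLevel.REGION]
```

Filtering on the level attribute keeps the regional test independent of how many LA targets are registered under the same prefix.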
Sanity check — top 10 LAs by avg household land value (2024)
Bottom 10 are all post-industrial / deprived areas (Inverclyde, East Ayrshire, West Dunbartonshire, Hull, Burnley, Hartlepool, Aberdeen, North Ayrshire, Hyndburn, Blackpool — all at £60–72k).
Sources
Related