Add LA-level council tax band-count targets and calibrate on them #374

Open
vahid-ahmadi wants to merge 4 commits into main from feat/la-council-tax-targets

@vahid-ahmadi vahid-ahmadi commented Apr 21, 2026

Summary

Adds 350 LA-level ons/council_tax_band_d/{code} targets (Band D amount per billing authority) and 2,541 voa/council_tax/{code}/{band} targets (dwellings per band A-H per LA) built from four public sources, covering all 296 English + 22 Welsh + 32 Scottish LAs in local_authorities_2021.csv. Wires the eight band-count targets into datasets/local_areas/local_authorities/loss.py so the LA reweighter trains on them. Follows the pattern of the regional land-value targets, one geography deeper.

What this PR does

Four public sources, one canonical CSV

storage/la_council_tax.csv (360 rows, 31 KB) joins:

| Column | Source | Coverage |
| --- | --- | --- |
| band_d_amount (England) | MHCLG Council Tax levels set by local authorities in England 2026-27, Table 10, column 17 | 296/296 English LAs |
| band_d_amount (Wales) | Welsh Government Council Tax levels April 2026 to March 2027, Table 1 "Overall average band D" | 22/22 Welsh LAs |
| band_d_amount (Scotland) | Scottish Government Council Tax Assumptions 2025, "CT by Band 2025-26" Band D column | 32/32 Scottish LAs |
| band_A..band_H + total_dwellings | VOA Council Tax: Stock of Properties, 2025, CTSOP1.0 summary table | 295 English + 22 Welsh LAs |

Column 17 in MHCLG Table 10 is the right one: "Average (Band D 2 adult equivalent) council tax for area of the billing authority including both local and major precepts", i.e. the full amount households pay, inclusive of county, police, fire and parish precepts.

Code joins reconciled

  • Post-2023 South Yorkshire E-codes (E08000038 Barnsley, E08000039 Sheffield) are remapped to the pre-2023 codes used in the reference LA list (E08000016, E08000019).
  • Scottish source name quirks normalised: Argyll & Bute → Argyll and Bute, Dumfries & Galloway → Dumfries and Galloway, Perth & Kinross → Perth and Kinross, Shetland Islands (double space) → Shetland Islands, Edinburgh, City of → City of Edinburgh.
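The two reconciliation steps above could be sketched roughly as follows. This is an illustration only: the PR's actual join code is not shown here, and `CODE_REMAP`, `NAME_OVERRIDES`, `normalise_code` and `normalise_scottish_name` are hypothetical names.

```python
import re

# Hypothetical lookup tables; only the codes and names come from the PR text.
CODE_REMAP = {
    "E08000038": "E08000016",  # Barnsley: post-2023 code -> pre-2023 code
    "E08000039": "E08000019",  # Sheffield: post-2023 code -> pre-2023 code
}

NAME_OVERRIDES = {
    "Edinburgh, City of": "City of Edinburgh",
}


def normalise_code(code: str) -> str:
    """Map a post-2023 E-code back to the code used in the reference list."""
    return CODE_REMAP.get(code, code)


def normalise_scottish_name(name: str) -> str:
    """Collapse repeated whitespace, spell out ampersands, apply overrides."""
    cleaned = re.sub(r"\s+", " ", name.strip())
    cleaned = cleaned.replace(" & ", " and ")
    return NAME_OVERRIDES.get(cleaned, cleaned)
```

Codes and names that need no fix pass through unchanged, so the same helpers can be applied to every row before the join.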

New module

targets/sources/la_council_tax.py:

  • get_targets() returns Target objects at geographic_level=LOCAL_AUTHORITY.
  • Band D targets tagged with year 2026 (England, Wales) or 2025 (Scotland) based on the source's latest release.
  • Per-country reference URLs so downstream consumers can audit each value to its publication.
  • Missing values (NI has no council tax, Scottish band counts) are filtered out of values rather than emitted as NaN so calibrators cleanly skip them.
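A minimal sketch of the "filter rather than emit NaN" behaviour described in the last bullet. `SketchTarget` is a hypothetical stand-in for the repo's `Target` class (its real fields are not shown in this PR text), and `band_count_targets` is an illustrative helper, not the module's actual function.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SketchTarget:
    """Hypothetical stand-in for the repo's Target class (fields assumed)."""
    name: str
    geo_code: str
    values: dict = field(default_factory=dict)  # year -> value


def band_count_targets(row: dict, year: int, bands: str = "ABCDEFGH") -> list:
    """Emit one target per band, skipping cells that are missing or NaN."""
    targets = []
    for band in bands:
        value = row.get(f"count_band_{band}")
        if value is None or math.isnan(value):
            continue  # emit no target at all rather than a NaN-valued one
        targets.append(
            SketchTarget(
                name=f"voa/council_tax/{row['code']}/{band}",
                geo_code=row["code"],
                values={year: float(value)},
            )
        )
    return targets
```

A row with a suppressed or absent cell then yields fewer targets, which is why the band-count total is below LAs × bands.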

Wires the band counts into LA calibration

datasets/local_areas/local_authorities/loss.py now adds eight voa/council_tax/{A..H} columns to the LA target matrix:

  • matrix entry: per-household indicator 1[council_tax_band == B] from policyengine-uk.
  • y entry: 360-vector of per-LA dwelling counts from the canonical CSV. For LAs without VOA data — Scotland (the VOA summary tables don't cover Scotland) and NI (no council tax) — the value falls back to national_count × la_household_share, matching the existing tenure block's fallback pattern.
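As a rough sketch of the two entries above, assuming plain Python lists in place of the real arrays: `band_indicator` and `band_target_vector` are hypothetical helpers, and how `national_count` is obtained is not stated in the PR, so it is passed in explicitly.

```python
import math


def band_indicator(household_bands: list, band: str) -> list:
    """Matrix column: 1 if the household sits in `band`, else 0."""
    return [1 if b == band else 0 for b in household_bands]


def band_target_vector(
    la_counts: list, la_household_share: list, national_count: float
) -> list:
    """y column: published count where available, else national x share.

    A NaN in `la_counts` marks an LA without VOA data (Scotland, NI).
    """
    return [
        national_count * share if math.isnan(count) else count
        for count, share in zip(la_counts, la_household_share)
    ]
```

For example, `band_target_vector([100.0, float("nan")], [0.75, 0.25], 1000.0)` keeps the published 100.0 and fills the gap with 1000.0 × 0.25.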

Two targets are deliberately not wired in this pass:

  • Band I — Wales-only and mostly null in the CSV.
  • Band D £ amount (ons/council_tax_band_d/{code}) — a per-rate quantity, not an aggregate. Converting it to a calibratable total (e.g. council-tax revenue) needs Scotland-specific band ratios that diverge from England/Wales after the 2017 reform, and is worth a separate PR.

Documented coverage gaps

  • Northern Ireland (10 LAs): NI uses domestic rates, not council tax. has_council_tax=False flag set; no targets emitted for NI; loss matrix uses national-share fallback.
  • Scotland band counts (32 LAs): the VOA summary doesn't cover Scotland. Scottish Assessors publishes per-LA chargeable-dwellings separately; follow-up PR.
  • City of London Band A: VOA suppresses this cell ([c]) for disclosure control; other bands populated.

Testing (the #371 lesson)

22 hermetic tests on the targets themselves plus 13 new tests on the loss-matrix wiring, all green locally:

Targets — test_la_council_tax_targets.py (22 tests)

  • CSV structure (4): row count matches the reference LA list, expected columns present, four UK countries represented, every code matches the reference.
  • Value plausibility (4): Band D amount in [£900, £3,500], total dwellings in [200, 800,000], explicit Isles of Scilly regression (total in [500, 5,000], guarding against the 2.49M outlier that slipped into #371, "Add LA-level household land value targets and calibrate on them"), band totals sum to total dwellings within 20-property slack.
  • Coverage expectations (4): every English / Welsh / Scottish LA has a Band D value; Northern Ireland is explicitly flagged as no-council-tax.
  • Spot-checks (2): Wandsworth and Westminster are the two lowest-Band-D LAs (catches row-swap bugs); Scottish average Band D is £500+ below English average.
  • Target-API invariants (8): get_targets() returns non-empty without network, Band D target count matches CSV, band count target count matches Σ non-null band columns, every target carries LOCAL_AUTHORITY geo level + geo_code, correct units and is_count=True on count targets, every target has ≥1 year of values, every band count ≤ 500k.
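The value-plausibility bounds listed above could be expressed along these lines. This is a sketch rather than the PR's test code: `plausibility_errors` is a hypothetical helper, and representing missing cells as `None` is an assumption.

```python
def plausibility_errors(row: dict, slack: int = 20) -> list:
    """Return human-readable problems for one CSV row (empty list if none)."""
    errors = []
    amount = row.get("band_d_amount")
    if amount is not None and not 900 <= amount <= 3_500:
        errors.append(f"Band D amount out of range: {amount}")
    total = row.get("total_dwellings")
    if total is not None and not 200 <= total <= 800_000:
        errors.append(f"total dwellings out of range: {total}")
    counts = [
        value
        for key, value in row.items()
        if key.startswith("count_band_") and value is not None
    ]
    if total is not None and counts and abs(sum(counts) - total) > slack:
        errors.append("band counts do not sum to total dwellings")
    return errors
```

A row carrying the 2.49M total from #371 would fail the second check, while Wandsworth's £1,028 Band D amount passes the first.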

Loss-matrix wiring — test_la_loss_council_tax.py (13 tests)

Light layer (no Microsimulation)

  • Every LA code in local_authorities_2021.csv has a row in la_council_tax.csv.
  • The eight count_band_{A..H} columns exist.
  • E/W rows have band A-H counts populated (allowing for City of London disclosure suppression).
  • Scotland band counts are uniformly null as documented.
  • NI rows have has_council_tax=False.

Full-build layer (gated on the enhanced FRS fixture)

  • All eight voa/council_tax/{A..H} columns present in matrix and y.
  • y vectors length 360.
  • All y entries finite and positive (Scotland and NI receive the fallback rather than NaN or zero).
  • Matrix entries are 0/1 indicators with rows summing to ≤1 (each household sits in at most one band).
  • y matches the CSV verbatim for an English LA (Hartlepool).
  • Scotland and NI LAs receive a positive fallback estimate.
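The 0/1-indicator invariant from the full-build layer can be checked with a few lines. This is a sketch on nested lists; the real test presumably operates on the matrix built from the enhanced FRS fixture, and `is_valid_indicator_block` is a hypothetical name.

```python
def is_valid_indicator_block(rows: list) -> bool:
    """True if every entry is 0 or 1 and each household row sums to at most 1.

    Each inner list holds one household's entries across the band columns.
    """
    return all(
        all(value in (0, 1) for value in row) and sum(row) <= 1
        for row in rows
    )
```

A row of all zeros is allowed, which matches "at most one band" rather than "exactly one".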

Sanity check — top 10 LAs by Band D amount (2026-27 where available)

| LA | Country | Band D |
| --- | --- | --- |
| Dorset | ENGLAND | £2,765 |
| Lewes | ENGLAND | £2,756 |
| Nottingham | ENGLAND | £2,755 |
| Rutland | ENGLAND | £2,738 |
| Wealden | ENGLAND | £2,728 |
| Merthyr Tydfil | WALES | £2,594 |
| Bridgend | WALES | £2,555 |
| Hartlepool | ENGLAND | £2,560 |
| Newport | WALES | £2,443 |
| Blaenau Gwent | WALES | £2,452 |

Bottom of the table is dominated by the central-London LAs with the lowest precepts: Wandsworth (£1,028), Westminster (£1,050), City of London (£1,330).

Out of scope for this PR (follow-ups)

  • Scottish Assessors per-LA chargeable-dwellings to fill the Scotland band-count gap.
  • Wiring the Band D £ amount as a total-council-tax-revenue target (needs Scotland-specific band ratios).
  • Council Tax Support caseload per LA (DWP StatXplore) — caseload-style, different target shape.
  • Single Person Discount rate per LA (CIPFA Council Tax Base) — same.

Related

Two families of LA-level targets, covering all 360 LAs in
local_authorities_2021.csv, built from four public sources:

- `ons/council_tax_band_d/{code}` (350 targets): average Band D
  council tax inclusive of all precepts per billing authority.
  Sources: MHCLG *Council Tax levels set by local authorities in
  England 2026-27*, Welsh Government *Council Tax levels April 2026
  to March 2027*, Scottish Government *Council Tax Assumptions 2025*.
  All 296 English + 22 Welsh + 32 Scottish LAs covered.
- `ons/council_tax_band_count/{code}/{band}` (2,541 targets): number
  of dwellings per band A-H per LA. Source: VOA *Council Tax: Stock
  of Properties, 2025*. Covers England + Wales (318 LAs × ~8 bands,
  minus City of London Band A which is VOA-suppressed).

NI is excluded: domestic rates, not council tax. Scotland band
counts are not in VOA; Scottish Assessors publishes them separately
and is a follow-up.

Files
-----

- `storage/la_council_tax.csv` (31 KB, 360 rows): canonical CSV
  joining MHCLG Table 10 column 17, Welsh Table 1 "Overall average
  band D", Scottish Gov "CT by Band 2025-26" Band D column, and VOA
  CTSOP1.0 bands A-H onto the reference LA list.
  - Post-2023 South Yorkshire E-codes (E08000038/39) re-mapped to
    pre-2023 codes (E08000016/19) to match the reference list.
  - Scottish ampersand/double-space naming normalised
    ("Argyll & Bute" → "Argyll and Bute", etc.).
- `targets/sources/la_council_tax.py`: reads the CSV, emits Target
  objects at geographic_level=LOCAL_AUTHORITY with per-country year
  tagging and per-country reference URL.

Testing
-------

22 hermetic tests (no network access, no baseline fixture needed):

Structure
- Row count matches local_authorities_2021.csv.
- Every expected column present.
- Four UK country codes represented.
- Every LA code matches the reference list.

Value plausibility (the #371 lesson)
- Band D amount in [£900, £3,500] for every row with a value.
- Total dwellings in [200, 800,000] for every row with a value.
- Explicit Isles of Scilly regression test: total dwellings in
  [500, 5,000], not the 2.49M outlier that slipped into #371.
- Band A-H counts sum to total dwellings within 20-property slack
  (VOA 10-property suppression allowance).
- Every band-count target value ≤ 500k (largest LA stock).

Coverage expectations
- Every English, Welsh and Scottish LA has a Band D value.
- Northern Ireland has no council tax flagged (has_council_tax=False).

Spot-checks of published facts
- Wandsworth (E09000032) and Westminster (E09000033) are the two
  lowest-Band-D English LAs (catches row-swap bugs).
- Scottish average Band D is £500+ below English average.

Target-API invariants
- get_targets() returns a non-empty list without network access.
- Band D target count matches the CSV's non-null Band D count.
- Band count target count matches Σ non-null band columns.
- Every target carries geographic_level=LOCAL_AUTHORITY and a
  geo_code.
- Band D targets use Unit.GBP; band count targets use Unit.COUNT
  with is_count=True.
- Every target has at least one year of values.

Sources
-------

- MHCLG (England 2026-27):
  https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
- Welsh Government (Wales 2026-27):
  https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
- Scottish Government (Scotland 2025-26):
  https://www.gov.scot/publications/council-tax-datasets/
- VOA (England + Wales 2025):
  https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025

Out of scope for this PR (follow-ups)
-------------------------------------

- Wiring these targets into
  datasets/local_areas/local_authorities/loss.py so the LA
  reweighting actually calibrates on them. Planned follow-up PR.
- Scottish Assessors per-LA chargeable-dwellings to fill the Scotland
  band-count gap.
- Council Tax Support caseload per LA (DWP StatXplore).
- Single Person Discount rate per LA (CIPFA).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi self-assigned this Apr 21, 2026
@vahid-ahmadi vahid-ahmadi requested a review from MaxGhenis April 21, 2026 11:42
@vahid-ahmadi
Collaborator Author

Review — items to address before merge

Blocking

1. Welsh Band I is dropped. Wales has had 9 council tax bands (A–I) since its 2005 revaluation. _BAND_COUNT_COLUMNS in la_council_tax.py only maps A–H, and the CSV has no band_I column, so every Welsh Band I dwelling is silently missing from the emitted targets. If a calibrator consumes these to set the band distribution, Welsh Band I is implicitly targeted to zero.

Fix: add band_I column to la_council_tax.csv (populated for Welsh rows, NaN elsewhere), add "I": "band_I" to _BAND_COUNT_COLUMNS, update the module docstring, and add a test asserting non-null Band I for the larger Welsh LAs (Cardiff, Swansea, Vale of Glamorgan, Monmouthshire).

2. total_dwellings is Σ(A..H) for Welsh rows, not VOA "All properties". Verified from the CSV:

  • Cardiff (W06000015): 5320 + 20420 + 34920 + 37370 + 31540 + 22250 + 10590 + 2850 = 165,260 — matches CSV exactly.
  • Monmouthshire (W06000021): 520 + 3450 + 7140 + 9460 + 7660 + 8100 + 5520 + 1790 = 43,640 — matches CSV exactly.

Because both sides of test_band_counts_sum_to_total are derived from the same A–H sum, the test passes trivially and does not validate fidelity to the published total. Fix: re-source total_dwellings from the VOA "All properties" column. Once done, the test becomes a real fidelity check with the existing 20-property slack.
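To make the self-referential problem concrete, here is a small sketch using the Cardiff figures quoted above. `sums_to_total` is a hypothetical stand-in for the check inside test_band_counts_sum_to_total, and the corrupted row is invented purely for illustration.

```python
# Cardiff (W06000015) band A-H counts as quoted in this review.
cardiff_bands = [5320, 20420, 34920, 37370, 31540, 22250, 10590, 2850]

# How total_dwellings was effectively built for Welsh rows.
derived_total = sum(cardiff_bands)


def sums_to_total(bands: list, total: int, slack: int = 20) -> bool:
    """True if the band counts sum to `total` within the allowed slack."""
    return abs(sum(bands) - total) <= slack


# Deliberately corrupt Band A to simulate a data-entry error (illustrative).
corrupted = [2 * cardiff_bands[0]] + cardiff_bands[1:]

# With a derived total the check passes for any band values, even bad ones.
passes_when_derived = sums_to_total(corrupted, sum(corrupted))

# Against an independently sourced total the same corruption is caught.
passes_when_independent = sums_to_total(corrupted, derived_total)
```

Here `passes_when_derived` is true and `passes_when_independent` is false, which is the difference re-sourcing total_dwellings is meant to buy.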

Coherence

3. variable field diverges from voa_council_tax.py. The existing regional source uses variable="council_tax_band" with no breakdown_variable; this PR uses variable="council_tax_band_count" with breakdown_variable=f"council_tax_band_{band}". Downstream loss.py wiring will then need to know two names for the same conceptual quantity. Prefer aligning — either update voa_council_tax.py to the new scheme or match its convention here.

4. CSV column asymmetry. band_A, band_B, band_C, band_D_count, band_E, band_F, band_G, band_H — the _count suffix only on D to avoid clashing with band_d_amount. Cleaner to use a count_band_X prefix scheme, or rename band_d_amount → amount_band_d so the count columns stay symmetric. This also simplifies _BAND_COUNT_COLUMNS to a plain prefix map.
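For item 4, the prefix scheme would reduce the mapping to a one-line comprehension. A sketch, assuming Band I is included as proposed in item 1; the current contents of `_BAND_COUNT_COLUMNS` are paraphrased from the column names above, not copied from the module.

```python
# Current shape (paraphrased): D needs a special case to avoid band_d_amount.
OLD_BAND_COUNT_COLUMNS = {
    band: ("band_D_count" if band == "D" else f"band_{band}")
    for band in "ABCDEFGH"
}

# Proposed shape: a plain prefix map with no special cases.
BANDS = "ABCDEFGHI"
BAND_COUNT_COLUMNS = {band: f"count_band_{band}" for band in BANDS}
```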

Nits

5. _load_table() re-reads the CSV on every get_targets() call. @lru_cache matches the pattern in voa_council_tax.py and is free.

6. has_council_tax column is derivable from country == "NORTHERN_IRELAND". Arguably self-documenting, so leave if preferred.

7. Module docstring opens with "band A–H" — update once Band I is added.
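For nit 5, a cached loader might look like the sketch below. It uses the standard-library `csv` module for self-containment; the real `_load_table()` may well use pandas, and the `load_table` name and signature here are assumptions.

```python
import csv
from functools import lru_cache


@lru_cache(maxsize=1)
def load_table(path: str) -> tuple:
    """Read the CSV once per path and reuse the parsed rows afterwards.

    Returns a tuple so callers are less tempted to mutate the cached object.
    """
    with open(path, newline="") as handle:
        return tuple(csv.DictReader(handle))
```

Repeated `get_targets()` calls would then hit the cache instead of the disk; `load_table.cache_clear()` resets it if a test needs a fresh read.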

PR description

With Band I added, the "2,541 band-count targets" figure will rise by the number of non-null Welsh Band I cells, so it is worth regenerating the description from the final CSV before merging. My earlier arithmetic (318 × 8 = 2,544, minus City of London Band A = 2,543) gives 2 more targets than the 2,541 the description claims, which implies two further null cells beyond the documented suppression; Band I coverage may resolve part of that gap too.
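The arithmetic behind that gap, spelled out as a quick sketch. The 318-LA figure is the reviewer's working assumption (296 English + 22 Welsh), not a count taken from the CSV.

```python
english_and_welsh_las = 296 + 22   # LAs assumed to have VOA band counts
bands_per_la = 8                   # bands A-H
documented_suppressions = 1        # City of London Band A

expected_targets = english_and_welsh_las * bands_per_la - documented_suppressions
claimed_targets = 2541             # figure in the PR description

unexplained_gap = expected_targets - claimed_targets
print(expected_targets, unexplained_gap)  # 2543 2
```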

Review points addressed:

- Add count_band_I column to la_council_tax.csv, populated for all 22
  Welsh LAs (Wales revalued in 2005 and introduced a 9th band). Cardiff
  1480, Monmouthshire 670, Vale of Glamorgan 1060, etc. English rows
  keep Band I null; VOA marks it [z] (not applicable).
- Re-source total_dwellings from VOA "All properties" column instead
  of deriving it as the sum of A-H. Previously Σ(A..H) was used for
  both sides of test_band_counts_sum_to_total, making the test
  self-referential; now it validates against the published total with
  a 20-property slack for VOA rounding.
- Rename count columns symmetrically: band_A..band_H + band_D_count →
  count_band_A..count_band_I. Removes the lopsided band_D_count name
  that existed only to avoid clashing with band_d_amount.
- Align band-count target names with voa_council_tax.py:
  voa/council_tax/{code}/{band} (was ons/council_tax_band_count/...);
  variable="council_tax_band" (was council_tax_band_count, which is
  not a real PolicyEngine-UK variable); drop breakdown_variable to
  match the regional VOA module.
- Cache the CSV read with @lru_cache(maxsize=1), matching voa_council_tax.
- Update module docstring: "A-H in England/Scotland, A-I in Wales".

Tests:
- New: test_welsh_las_have_band_i (all 22 Welsh LAs populated).
- New: test_english_las_have_no_band_i (guard against spurious fills).
- New: test_cardiff_band_i_matches_published_figure (~1,480 per VOA 2025).

Final target counts:
- 350 Band D amount targets (unchanged).
- 2,563 band-count targets, up from 2,541: +22 Welsh Band I plus two
  band-H rows that were null due to the earlier truncation.
@vahid-ahmadi
Collaborator Author

Pushed d2d2acf addressing the review items:

Blocking fixes

  • Welsh Band I now sourced from VOA CTSOP1.0 for all 22 Welsh LAs (Cardiff 1480, Monmouthshire 670, Vale of Glamorgan 1060, etc.); emitted as voa/council_tax/{code}/I targets. New tests: test_welsh_las_have_band_i, test_english_las_have_no_band_i, test_cardiff_band_i_matches_published_figure.
  • total_dwellings now sourced from VOA "All properties" for every English + Welsh row rather than derived from Σ(A..H). test_band_counts_sum_to_total is now a real fidelity check (max discrepancy 10, VOA rounding).

Coherence fixes

  • Band-count targets renamed to voa/council_tax/{code}/{band} with variable="council_tax_band" (aligned with the existing regional voa_council_tax.py). breakdown_variable dropped to match. council_tax_band is the actual PolicyEngine-UK enum variable and has Band I in its possible values.
  • CSV count columns renamed to the symmetric count_band_A..count_band_I scheme (old band_D_count outlier removed).

Nits

  • _load_table() now @lru_cache-ed.
  • Module docstring clarified: "A–H in England/Scotland, A–I in Wales".

Final target counts: 350 Band D amount + 2,563 band counts (up from 2,541: +22 Welsh Band I, +2 resolved [c] suppressions). 25/25 tests pass locally via uv run pytest.

vahid-ahmadi and others added 2 commits April 23, 2026 13:59
The targets registered in la_council_tax.py were inert — the LA target
matrix had no columns for them, so the reweighter could not see them.
This wires the eight VOA Council Tax Stock-of-Properties band-count
targets (A-H) into the LA loss matrix:

- matrix entry: per-household indicator 1[council_tax_band == B] from
  policyengine-uk.
- y entry: 360-vector of per-LA dwelling counts from
  storage/la_council_tax.csv. For LAs without VOA data — Scottish LAs
  (the VOA summary tables don't cover Scotland) and Northern Irish LAs
  (no council tax) — the value falls back to
  national_count × la_household_share, matching the existing tenure
  block's fallback pattern.

Two targets are deliberately not wired in this pass:

- Band I — Wales-only and mostly null in the CSV.
- The Band D £ amount (ons/council_tax_band_d/{code}) — a per-rate
  quantity that does not fit the linear matrix-times-weights
  aggregation. Wiring it as total council-tax revenue would need
  Scotland-specific band ratios (different from England/Wales after
  2017) and is worth a separate PR.

New tests in test_la_loss_council_tax.py cover both layers:

- Light: CSV joins to every LA code, the eight count_band_{X} columns
  exist, E/W rows are populated, Scotland is null as documented, and
  NI has has_council_tax=False.
- Full build (gated on enhanced FRS fixture): all eight columns present
  in matrix and y; y vectors length 360, finite and positive; matrix
  entries are 0/1 indicators with rows summing to ≤1; y matches the
  CSV verbatim for an English LA (Hartlepool); Scotland and NI LAs
  receive a positive fallback rather than NaN or zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level council tax calibration targets (Band D + band distribution) Add LA-level council tax band-count targets and calibrate on them Apr 27, 2026