Calibrate combined self-employed NICs and skip inert Class 3 target#379
Open
vahid-ahmadi wants to merge 1 commit into main
Two NI calibration gaps surfaced during issue audit (#88) and bug report #378:

1. Recent OBR EFOs (e.g. March 2026) publish a single combined "Class 4 and Class 2 Self employed NICs" line instead of two separate rows. The parser's Class 2 / Class 4 candidate labels no longer matched, so neither target was registered and self-employed NICs were silently uncalibrated against OBR.
2. ni_class_3 is an input variable in PolicyEngine UK with no formula and no dataset path that populates it. The matrix column is therefore a flat zero, calibration cannot move it, and the diagnostic that "the target is included" is misleading.

This commit:

- Adds an obr/ni_self_employed target whose values come from the combined EFO line and whose matrix column is computed via a new custom_compute that sums ni_class_2 + ni_class_4 at the household level. Smoke-build on enhanced_frs_2023_24.h5 with year=2025: 6,848 non-zero households, target £2.90bn.
- Keeps the legacy Class 2 / Class 4 candidate labels around so older or future EFOs that revert to separate rows still produce individual targets.
- Removes the ni_class_3 entry from _parse_nics with a comment pointing at #378 and the conditions for restoring it (a Class 3 imputation that addresses #88 in full).

Tests cover both layers:

- test_obr_nics.py: parser handles the combined EFO layout, the legacy separate layout, and intentionally drops Class 3 in either format.
- test_obr_nic_signal.py: the registered targets are present in the registry, the combined target carries a custom_compute callable, ni_class_3 is absent, and (gated on enhanced_frs) each underlying PE-UK NI variable produces non-zero variation while ni_class_3 returns a uniform zero, the very property that makes it inert as a calibration target.

Closes #378. Partial close of #88; Class 3 imputation remains a separate follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
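The household-level sum that the combined target relies on can be illustrated with a minimal sketch. It assumes person-level NIC arrays and a person-to-household index; the function names and the synthetic values are hypothetical, and the actual custom_compute in this PR may be written differently (for example, by asking the simulation to aggregate to households directly).

```python
import numpy as np


def household_sum(person_values, person_household_index, n_households):
    """Aggregate a person-level array to one value per household."""
    return np.bincount(
        person_household_index,
        weights=np.asarray(person_values, dtype=float),
        minlength=n_households,
    )


def self_employed_nics_by_household(
    ni_class_2, ni_class_4, person_household_index, n_households
):
    """Hypothetical stand-in for custom_compute: Class 2 + Class 4 per household."""
    combined = np.asarray(ni_class_2, dtype=float) + np.asarray(ni_class_4, dtype=float)
    return household_sum(combined, person_household_index, n_households)


# Synthetic data: five people in three households (all values are made up).
person_household_index = np.array([0, 0, 1, 2, 2])
ni_class_2 = np.array([0.0, 100.0, 0.0, 0.0, 50.0])
ni_class_4 = np.array([0.0, 900.0, 0.0, 0.0, 450.0])

column = self_employed_nics_by_household(
    ni_class_2, ni_class_4, person_household_index, n_households=3
)

# The calibrated estimate is the weighted sum of the household column.
weights = np.array([2.0, 3.0, 4.0])
estimate = float(weights @ column)
```

With a column like this, the loss-matrix builder has something to compare against the single combined OBR figure, rather than needing one target per NIC class.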
Summary
- Adds an obr/ni_self_employed calibration target for the combined Class 2 + Class 4 line that recent OBR EFOs publish (e.g. March 2026), computed via a custom_compute that sums ni_class_2 + ni_class_4 at the household level.
- Removes the ni_class_3 entry from the OBR NIC parser. ni_class_3 is an input variable in PolicyEngine UK with no formula, and no dataset populates it, so the matrix column would be a flat zero: the target is inert and its calibration diagnostic is misleading.

Closes #378. Partial close of #88 (Class 3 imputation remains a separate follow-up).
Why
Two latent bugs surfaced during an audit of NI calibration:

1. The parser looked for separate "Class 2 NICs" and "Class 4 NICs" rows in EFO Table 3.4. The March 2026 EFO publishes a single combined line, "Class 4 and Class 2 Self employed NICs", so neither legacy candidate matched and both targets dropped out of the registry. Microsimulation.calculate("ni_class_2") and Microsimulation.calculate("ni_class_4") do return signal in the dataset (~9.7k and ~7.6k non-zero people respectively, weighted totals ~£0.5bn and ~£3.5bn), so the optimiser had usable matrix columns, just nothing to aim them at.
2. ni_class_3 was registered but inert. The OBR Class 3 row was emitted as a target, but Microsimulation.calculate("ni_class_3") returns a uniform zero (no formula, no dataset path). The calibrator saw a flat-zero matrix column and an OBR target it could never match. Reported as #378 (Handle unpopulated ni_class_3 target).

What this PR does
Combined self-employed target
The parser now lists "Class 4 and Class 2 Self employed NICs" and "Class 2 and Class 4 Self employed NICs" as the primary candidate labels for a new obr/ni_self_employed target. When the parser matches this line, it attaches a custom_compute callable that the loss-matrix builder uses instead of the simple-GBP fallback. The callable sums ni_class_2 + ni_class_4 at the household level.

The legacy "Class 2 NICs" / "Class 4 NICs" candidate labels are retained so older or future EFOs that revert to separate rows continue to produce individual obr/ni_class_2 / obr/ni_class_4 targets.

Smoke build on the live target matrix
Running create_target_matrix against enhanced_frs_2023_24.h5 at time_period=2025 now yields:

- obr/ni_employee
- obr/ni_employer
- obr/ni_self_employed
- obr/salary_sacrifice_employee_ni_relief
- obr/salary_sacrifice_employer_ni_relief

obr/ni_class_3 is absent from the matrix.

Class 3 skip
ni_class_3 is removed from the parser row-spec dict with a comment explaining (a) why it is currently inert, (b) why it is too small (~£50m vs ~£150bn total NICs) to justify a heuristic imputation, and (c) what would need to change to re-enable it. obr.py is the single place to revert if a Class 3 imputation lands in the future.

Tests (28 added / changed, all passing locally)
test_obr_nics.py (parser, hermetic, no Microsimulation):

- test_parse_nics_combined_self_employed_line: current EFO layout. Emits employee / employer / ni_self_employed (with custom_compute), and the combined target's value at 2024 is the £bn figure scaled to £.
- test_parse_nics_falls_back_to_separate_classes_for_old_efo: older EFO layout. Emits the four individual classes (no combined).
- test_parse_nics_intentionally_skips_class_3_in_combined_efo: Class 3 is hidden in "Other NIC" in current EFOs and must not be emitted.
- test_parse_nics_intentionally_skips_class_3_in_separate_efo: even when an EFO publishes a Class 3 row, the parser drops it.
- test_parse_nics_tolerates_alt_label_wording: preserved. Self-Employed vs self-employed variants still match.

test_obr_nic_signal.py (signal, gated on the enhanced_frs fixture):

- test_obr_nic_target_registry_includes_active_classes: the three top-line NIC class targets are registered.
- test_obr_ni_self_employed_target_uses_custom_compute: the combined target carries a callable custom_compute; without it the loss matrix would look up a non-existent ni_self_employed PE-UK variable.
- test_obr_ni_class_3_target_is_intentionally_absent: neither the name nor the variable appears in the OBR target set.
- test_active_nic_variable_has_nonzero_variation (parametrised over ni_employee, ni_employer, ni_class_2, ni_class_4): each PE-UK NIC variable produces non-zero variation across households.
- test_self_employed_combined_compute_returns_nonzero: the sum used by custom_compute produces a non-zero household-level vector.
- test_ni_class_3_simulator_returns_uniform_zero: direct evidence for why Class 3 is excluded; if PE-UK ever adds a formula or this repo adds an imputation, restore the parser row.

Out of scope (follow-ups)
- _resolve_value won't fall back if the target's earliest year is in the future relative to the dataset (e.g. running on a 2023 dataset against a 2024+ EFO target gives None). Not introduced by this PR; flagging for a separate look.

Sources
Related
- #378: Handle unpopulated ni_class_3 target (this PR closes).
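The reason a flat-zero column is inert, as described under "Why" and exercised by test_ni_class_3_simulator_returns_uniform_zero, can be shown with a small sketch. It assumes the calibrated estimate is a weighted sum of a household column, so its derivative with respect to each weight is that household's column value; the helper names and numbers below are hypothetical and are not taken from this repository.

```python
import numpy as np


def weighted_total(weights, column):
    """Estimate compared against a target: sum of weight * household value."""
    return float(np.dot(weights, column))


def is_inert(column):
    """True when no household contributes, so reweighting cannot move the total."""
    return not np.any(column)


rng = np.random.default_rng(0)
active_column = np.array([1000.0, 0.0, 500.0, 250.0])  # made-up household values
zero_column = np.zeros(4)  # what an unpopulated input variable looks like

totals_active = []
totals_zero = []
for _ in range(5):
    # Draw a different set of positive household weights each time.
    weights = rng.uniform(0.5, 2.0, size=4) * 1000
    totals_active.append(weighted_total(weights, active_column))
    totals_zero.append(weighted_total(weights, zero_column))
```

Whatever weights are drawn, the zero column's total stays at zero while the active column's total moves, which is why a non-zero OBR target on such a column can never be matched.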