
Calibrate combined self-employed NICs and skip inert Class 3 target#379

Open
vahid-ahmadi wants to merge 1 commit into main from feat/ni-class-2-4-calibration

@vahid-ahmadi
Collaborator

Summary

  • Add an obr/ni_self_employed calibration target for the combined Class 2 + Class 4 line that recent OBR EFOs publish (e.g. March 2026), computed via a custom_compute that sums ni_class_2 + ni_class_4 at the household level.
  • Remove the ni_class_3 entry from the OBR NIC parser. ni_class_3 is an input variable in PolicyEngine UK with no formula and no dataset populates it — the matrix column would be a flat zero, so the target is inert and its calibration diagnostic is misleading.
  • Add hermetic parser tests and signal regression tests covering both EFO layouts (the combined line and the legacy separate Class 2 / Class 4 lines) and the absence of Class 3.

Closes #378. Partial close of #88 (Class 3 imputation remains a separate follow-up).

Why

Two latent bugs surfaced during an audit of NI calibration:

  1. Self-employed NICs were silently uncalibrated against OBR. The previous parser looked for "Class 2 NICs" and "Class 4 NICs" rows in EFO Table 3.4. The March 2026 EFO publishes a single combined line, "Class 4 and Class 2 Self employed NICs", so neither legacy candidate matched and both targets dropped out of the registry. Microsimulation.calculate("ni_class_2") and Microsimulation.calculate("ni_class_4") do return signal in the dataset (~9.7k and ~7.6k non-zero people respectively; weighted totals ~£0.5bn and ~£3.5bn), so the optimiser had usable matrix columns but no targets to aim them at.
  2. ni_class_3 was registered but inert. The OBR Class 3 row was emitted as a target, but Microsimulation.calculate("ni_class_3") returns a uniform zero (no formula, no dataset path). The calibrator therefore saw a flat-zero matrix column paired with an OBR target it could never match. Reported as #378 (Handle unpopulated ni_class_3 target).
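Why a flat-zero column is inert can be shown with a simplified loss term. The sketch assumes a squared relative-error loss on a single target; target_loss and target_loss_grad are hypothetical, and the repo's actual loss may be weighted or normalised differently. The conclusion holds for any loss that sees the column only through the weighted total: if every entry is zero, the estimate is zero for all weights, so the loss is constant and its gradient vanishes.

```python
import numpy as np


def target_loss(weights: np.ndarray, column: np.ndarray, target: float) -> float:
    """Squared relative error of the weighted column total against a target."""
    estimate = float(weights @ column)
    return ((estimate - target) / target) ** 2


def target_loss_grad(weights: np.ndarray, column: np.ndarray, target: float) -> np.ndarray:
    """Gradient of target_loss with respect to the weights."""
    # d/dw ((w @ x - t) / t)^2 = 2 * (w @ x - t) / t^2 * x
    estimate = float(weights @ column)
    return 2.0 * (estimate - target) / target**2 * column


rng = np.random.default_rng(0)
flat_zero = np.zeros(5)  # a column like ni_class_3: zero for every household
target = 50e6            # illustrative target in £, roughly the Class 3 size quoted in this PR

for weights in (rng.uniform(100, 2000, size=5), rng.uniform(100, 2000, size=5) * 10):
    # The loss stays at 1.0 and the gradient is all zeros, whatever the weights.
    print(target_loss(weights, flat_zero, target), target_loss_grad(weights, flat_zero, target))
```

The optimiser can therefore never reduce this term, and the diagnostic keeps reporting a 100% miss.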

What this PR does

Combined self-employed target

The parser now lists "Class 4 and Class 2 Self employed NICs" and "Class 2 and Class 4 Self employed NICs" as the primary candidate labels for a new obr/ni_self_employed target. When the parser matches this line, it attaches a custom_compute callable that the loss-matrix builder uses instead of the simple-GBP fallback:

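```python
def _compute_ni_self_employed_combined(ctx, target, year):
    # Person-level Class 2 and Class 4 NICs from the microsimulation.
    class_2 = ctx.sim.calculate("ni_class_2")
    class_4 = ctx.sim.calculate("ni_class_4")
    # Sum per person, then aggregate to households for the matrix column.
    # `target` and `year` are unused here but are part of the custom_compute signature.
    return ctx.household_from_person(class_2 + class_4)
```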
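The dispatch between custom_compute and the simple-GBP fallback can be exercised end to end with a toy context. Everything except the body of _compute_ni_self_employed_combined is hypothetical: ToySim, ToyContext, Target and build_column stand in for PolicyEngine UK's Microsimulation and the repo's loss-matrix builder, and household_from_person is approximated with numpy.bincount. The real internals may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np


@dataclass
class ToySim:
    """Hypothetical stand-in for a Microsimulation: variable -> person-level values."""
    person_values: dict

    def calculate(self, variable: str) -> np.ndarray:
        # Unknown variables raise KeyError, like asking for a variable that does not exist.
        return self.person_values[variable]


@dataclass
class ToyContext:
    """Hypothetical loss-matrix context: the sim plus a person -> household map."""
    sim: ToySim
    person_household: np.ndarray  # household index for each person
    n_households: int

    def household_from_person(self, values: np.ndarray) -> np.ndarray:
        # Sum person-level values into their households.
        return np.bincount(self.person_household, weights=values, minlength=self.n_households)


@dataclass
class Target:
    name: str
    variable: str
    custom_compute: Optional[Callable] = None


def _compute_ni_self_employed_combined(ctx, target, year):
    class_2 = ctx.sim.calculate("ni_class_2")
    class_4 = ctx.sim.calculate("ni_class_4")
    return ctx.household_from_person(class_2 + class_4)


def build_column(ctx, target, year):
    """Use the target's custom_compute if set, else the plain variable lookup."""
    if target.custom_compute is not None:
        return target.custom_compute(ctx, target, year)
    return ctx.household_from_person(ctx.sim.calculate(target.variable))


# Four people in three households; NIC amounts are illustrative only.
sim = ToySim({
    "ni_class_2": np.array([180.0, 0.0, 0.0, 180.0]),
    "ni_class_4": np.array([900.0, 0.0, 0.0, 0.0]),
})
ctx = ToyContext(sim, person_household=np.array([0, 0, 1, 2]), n_households=3)
combined = Target("obr/ni_self_employed", "ni_self_employed", _compute_ni_self_employed_combined)
print(build_column(ctx, combined, 2025))  # household totals 1080, 0, 180
```

In this toy, building the same target without custom_compute makes build_column look up ni_self_employed directly and raise KeyError, which mirrors the concern test_obr_ni_self_employed_target_uses_custom_compute guards against.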

The legacy "Class 2 NICs" / "Class 4 NICs" candidate labels are retained so older or future EFOs that revert to separate rows continue to produce individual obr/ni_class_2 / obr/ni_class_4 targets.
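The candidate-label precedence can be sketched as follows: try the combined labels first, and only fall back to the separate Class 2 / Class 4 rows if no combined line is present. parse_self_employed and its rows argument (label → £bn value) are hypothetical simplifications; the real parser reads EFO Table 3.4 and its matching rules may differ.

```python
COMBINED_LABELS = (
    "Class 4 and Class 2 Self employed NICs",
    "Class 2 and Class 4 Self employed NICs",
)
LEGACY_LABELS = {
    "ni_class_2": ("Class 2 NICs",),
    "ni_class_4": ("Class 4 NICs",),
}


def _first_match(rows: dict[str, float], candidates: tuple[str, ...]):
    """Return the value of the first candidate label present in rows, else None."""
    for label in candidates:
        if label in rows:
            return rows[label]
    return None


def parse_self_employed(rows: dict[str, float]) -> dict[str, float]:
    """Map EFO rows to self-employed NIC targets (target name -> £bn value)."""
    combined = _first_match(rows, COMBINED_LABELS)
    if combined is not None:
        # Current layout: one combined target; the separate rows are not used.
        return {"obr/ni_self_employed": combined}
    # Legacy layout: one target per class that is present.
    targets = {}
    for variable, candidates in LEGACY_LABELS.items():
        value = _first_match(rows, candidates)
        if value is not None:
            targets[f"obr/{variable}"] = value
    return targets
```

If neither layout matches, the result is empty; without the combined labels, that empty result is how the self-employed targets silently disappeared against the March 2026 EFO.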

Smoke build on the live target matrix

Running create_target_matrix against enhanced_frs_2023_24.h5 at time_period=2025 now yields:

| Target | Non-zero households | OBR target |
|---|---:|---:|
| obr/ni_employee | 52,540 | £49.54bn |
| obr/ni_employer | 57,196 | £145.32bn |
| obr/ni_self_employed | 6,848 | £2.90bn |
| obr/salary_sacrifice_employee_ni_relief | 13,578 | £1.24bn |
| obr/salary_sacrifice_employer_ni_relief | 14,300 | £2.99bn |

obr/ni_class_3 is absent from the matrix.

Class 3 skip

ni_class_3 is removed from the parser row-spec dict with a comment explaining (a) why it is currently inert, (b) why it is too small (~£50m vs ~£150bn total NICs) to justify a heuristic imputation, and (c) what would need to change to re-enable it. obr.py is the single place to revert if a Class 3 imputation lands in the future.
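A sketch of what the omission might look like, with the reasons and re-enable condition recorded next to it, followed by the magnitude argument as arithmetic. NIC_ROW_SPECS is a hypothetical excerpt (PE-UK variable → candidate EFO labels); the actual dict in obr.py may be shaped differently and contains other entries.

```python
# Hypothetical excerpt of a row-spec dict: PE-UK variable -> candidate EFO row labels.
NIC_ROW_SPECS = {
    "ni_class_2": ("Class 2 NICs",),
    "ni_class_4": ("Class 4 NICs",),
    # ni_class_3 intentionally omitted (see #378):
    #   (a) PE-UK has no formula for ni_class_3 and no dataset populates it,
    #       so its matrix column is uniformly zero and the target is inert.
    #   (b) Class 3 is ~£50m against ~£150bn total NICs, too small to justify
    #       a heuristic imputation.
    #   (c) Restore this entry once a Class 3 imputation lands (#88).
}

# The magnitude argument from (b), using the approximate figures quoted above.
CLASS_3_GBP = 50e6
TOTAL_NICS_GBP = 150e9
class_3_share = CLASS_3_GBP / TOTAL_NICS_GBP  # about 0.00033, i.e. roughly 0.03%
```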

Tests (28 added / changed, all passing locally)

test_obr_nics.py (parser, hermetic, no Microsimulation):

  • test_parse_nics_combined_self_employed_line: current EFO layout. Emits employee, employer and ni_self_employed targets (the last carrying a custom_compute), and the combined target's 2024 value is the EFO £bn figure converted to £.
  • test_parse_nics_falls_back_to_separate_classes_for_old_efo: older EFO layout. Emits four separate targets (employee, employer, Class 2, Class 4) and no combined target.
  • test_parse_nics_intentionally_skips_class_3_in_combined_efo: Class 3 is folded into "Other NIC" in current EFOs and must not be emitted.
  • test_parse_nics_intentionally_skips_class_3_in_separate_efo: even when an EFO publishes a separate Class 3 row, the parser drops it.
  • test_parse_nics_tolerates_alt_label_wording (unchanged): "Self-Employed" and "self-employed" label variants still match.
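The two parser behaviours these tests pin down, tolerance to label wording and the £bn → £ conversion, can be sketched with two small helpers. normalise_label and efo_bn_to_gbp are hypothetical; the real parser may normalise labels differently (for example with a different regex or an explicit alias list).

```python
import re


def normalise_label(label: str) -> str:
    """Case-fold, treat hyphens as spaces and collapse runs of whitespace."""
    label = label.replace("-", " ").casefold()
    return re.sub(r"\s+", " ", label).strip()


def efo_bn_to_gbp(value_bn: float) -> float:
    """EFO tables report £bn; calibration targets are held in £."""
    return value_bn * 1e9
```

With this normalisation, "Class 4 and Class 2 Self-Employed NICs", "class 4 and class 2 self-employed NICs" and "Class 4 and Class 2 Self employed NICs" all compare equal, so one candidate label covers all three spellings.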

test_obr_nic_signal.py (signal, gated on the enhanced_frs fixture):

  • test_obr_nic_target_registry_includes_active_classes: the three top-line NIC targets (employee, employer and combined self-employed) are registered.
  • test_obr_ni_self_employed_target_uses_custom_compute: the combined target carries a callable custom_compute; without it, the loss matrix would look up ni_self_employed, which is not a PE-UK variable.
  • test_obr_ni_class_3_target_is_intentionally_absent: neither the target name nor the variable appears in the OBR target set.
  • test_active_nic_variable_has_nonzero_variation (parametrised over ni_employee, ni_employer, ni_class_2, ni_class_4): each PE-UK NIC variable produces non-zero variation across households.
  • test_self_employed_combined_compute_returns_nonzero: the sum used by custom_compute produces a non-zero household-level vector.
  • test_ni_class_3_simulator_returns_uniform_zero: direct evidence for why Class 3 is excluded. If PE-UK adds a formula or this repo adds an imputation, this test should start failing, signalling that the parser row can be restored.
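The two properties the signal tests rely on, variation across households versus a uniformly zero column, reduce to simple array checks. has_variation and is_uniform_zero are hypothetical helpers shown on synthetic household vectors; the real tests compute the vectors with Microsimulation on the enhanced_frs fixture.

```python
import numpy as np


def has_variation(values: np.ndarray) -> bool:
    """True when at least two households have different values."""
    return bool(np.max(values) > np.min(values))


def is_uniform_zero(values: np.ndarray) -> bool:
    """True when every household value is exactly zero."""
    return not np.any(values)


# Synthetic household-level columns (illustrative values, not model output).
columns = {
    "ni_class_4": np.array([0.0, 0.0, 812.4, 0.0, 95.0]),
    "ni_class_3": np.zeros(5),
}
for name, column in columns.items():
    print(name, has_variation(column), is_uniform_zero(column))
```

A column that passes has_variation can move the weighted total when weights change; one that passes is_uniform_zero cannot, which is exactly the Class 3 situation.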

Out of scope (follow-ups)

  • Class 3 imputation. Voluntary contributions are paid by people topping up their state-pension record — a population that the FRS does not cleanly identify (lifetime work history isn't in the survey). At ~£50m / ~0.03% of total NICs the calibration benefit is small and the engineering risk (heuristic imputation distorting other targets) is non-trivial. Worth a focused follow-up issue if anyone needs Class 3 specifically.
  • Loss-matrix year resolver. While verifying this PR I noticed that _resolve_value does not fall back when the target's earliest year is later than the dataset year (e.g. running on a 2023 dataset against a 2024+ EFO target returns None). Not introduced by this PR; flagging for a separate look.
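The resolver gap can be illustrated with a hypothetical resolve_value that only falls back to earlier years. This is an assumption about _resolve_value's semantics inferred from the description above, not its actual code: when every published target year is later than the dataset year, there is nothing earlier to fall back to and the result is None.

```python
from typing import Optional


def resolve_value(values_by_year: dict[int, float], year: int) -> Optional[float]:
    """Exact year if present, else the latest earlier year; None otherwise."""
    if year in values_by_year:
        return values_by_year[year]
    earlier = [y for y in values_by_year if y < year]
    if not earlier:
        # Every target year is in the future relative to `year`.
        return None
    return values_by_year[max(earlier)]
```

Under this model, resolve_value({2024: 1.0, 2025: 2.0}, 2023) is None, while a 2027 request falls back to the 2025 value.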


Two NI calibration gaps surfaced during the issue audit (#88) and bug
report #378:

1. Recent OBR EFOs (e.g. March 2026) publish a single combined
   "Class 4 and Class 2 Self employed NICs" line instead of two
   separate rows. The parser's Class 2 / Class 4 candidate labels
   no longer matched, so neither target was registered and
   self-employed NICs were silently uncalibrated against OBR.
2. ni_class_3 is an input variable in PolicyEngine UK with no
   formula and no dataset path that populates it. The matrix
   column is therefore a flat zero, calibration cannot move it,
   and the diagnostic that "the target is included" is misleading.

This commit:

- Adds an obr/ni_self_employed target whose values come from the
  combined EFO line and whose matrix column is computed via a new
  custom_compute that sums ni_class_2 + ni_class_4 at the household
  level. Smoke-build on enhanced_frs_2023_24.h5 with year=2025:
  6,848 non-zero households, target £2.90bn.
- Keeps the legacy Class 2 / Class 4 candidate labels around so
  older or future EFOs that revert to separate rows still produce
  individual targets.
- Removes the ni_class_3 entry from _parse_nics with a comment
  pointing at #378 and the conditions for restoring it (a Class 3
  imputation that addresses #88 in full).

Tests cover both layers:

- test_obr_nics.py: parser handles the combined EFO layout, the
  legacy separate layout, and intentionally drops Class 3 in either
  format.
- test_obr_nic_signal.py: the registered targets are present in
  the registry, the combined target carries a custom_compute
  callable, ni_class_3 is absent, and (gated on enhanced_frs) each
  underlying PE-UK NI variable produces non-zero variation while
  ni_class_3 returns a uniform zero — the very property that makes
  it inert as a calibration target.

Closes #378. Partial close of #88 — Class 3 imputation remains a
separate follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi self-assigned this Apr 27, 2026