Skip to content

Dispatched set_input values are dropped after cache invalidation #484

@anth-volk

Description

@anth-volk

Related to #482.

Symptom

User-provided inputs that go through a custom set_input dispatcher can be lost after Simulation._invalidate_all_caches() runs, for example via apply_reform().

Minimal repro using the country template salary variable, which is monthly but accepts a yearly input through set_input_divide_by_period:

import numpy as np
from policyengine_core.country_template import CountryTaxBenefitSystem, situation_examples
from policyengine_core.model_api import Reform
from policyengine_core.simulations import SimulationBuilder

sim = SimulationBuilder().build_from_entities(
    CountryTaxBenefitSystem(),
    situation_examples.single,
)

sim.set_input("salary", "2017", np.array([12_000.0]))
assert sim.calculate("salary", "2017-01")[0] == 1_000

class NoOpReform(Reform):
    def apply(self):
        pass

sim.apply_reform(NoOpReform)

# Expected: still 1_000. Currently falls back to the default value.
assert sim.calculate("salary", "2017-01")[0] == 1_000

Observed locally while reviewing #483:

before apply_reform: [1000.]
after apply_reform:  [0.]

Suspected mechanism

This appears to be the same broad class of issue as #482: cache invalidation preserves user inputs by replaying entries from _user_input_keys, but _user_input_keys can record a different period from the one actually written to storage.

For this case:

  • Simulation.set_input("salary", "2017", ...) / Holder.set_input(...) records (salary, default, 2017) in _user_input_keys.
  • Because salary.definition_period == MONTH and salary.set_input == set_input_divide_by_period, Holder.set_input dispatches to the helper instead of storing salary@2017 directly.
  • set_input_divide_by_period writes actual monthly values like salary@2017-01, ..., salary@2017-12 via holder._set(...).
  • _invalidate_all_caches() later tries to preserve salary@2017, misses the monthly storage entries, clears the holder storage, and the monthly user inputs are lost.

#482 covers the ETERNITY version of this mismatch: requested ordinary period, stored ETERNITY. This issue covers custom set_input dispatch where requested broad period is expanded into subperiod storage writes.

Expected behavior

User-provided inputs should survive cache invalidation, regardless of whether they are stored directly, canonicalized to ETERNITY, or expanded by a custom set_input dispatcher. Formula-computed cache outputs should still be invalidated.

Possible fix direction

Track the actual storage writes made as part of a user input rather than the caller-requested period. One option is to move preservation tracking closer to Holder._set, with an internal user-input context set by Holder.set_input, so dispatched writes record their real (variable, branch, period) keys. Branch handling would need care because existing dispatcher helpers call holder._set(sub_period, array) without an explicit branch_name.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions