[Question] Bug? IntensityFree: get_dt_stats computes statistics on raw inter-event times instead of log(τ), breaking the LogNormalMixtureDistribution standardization #84

@freshjarman

Summary

The statistics fed into IntensityFree's LogNormalMixtureDistribution are computed on raw inter-event times $\tau$. However, they are stored under the names mean_log_inter_time / std_log_inter_time and consumed by an AffineTransform that, per the original IFL-TPP paper and reference implementation, expects the mean and std of $\log\tau$. This silently breaks the intended standardization of the GMM's base space and diverges from the upstream implementation (Shchur et al., ICLR 2020).

Where the issue is

1. easy_tpp/preprocess/dataset.py::TPPDataset.get_dt_stats

for dts, marks in zip(self.time_delta_seqs, self.type_seqs):
    dts = np.array(dts[1:-1 if marks[-1] == -1 else None])
    ...
    y_bar = dts.mean()    # mean of raw τ
    s_2_y = dts.var()     # var of raw τ
    ...
return x_bar, (s_2_x ** 0.5), min_dt, max_dt

Per-chunk inputs y_bar, s_2_y are computed on raw dts, not on np.log(dts).
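For comparison, here is a minimal sketch of what a log-space version could look like. The helper name `log_dt_stats` and the `eps` clamp are my own assumptions, not EasyTPP code. It reuses the slicing from the snippet above but pools log(τ) across all sequences instead of reproducing the elided incremental update.

```python
import numpy as np


def log_dt_stats(time_delta_seqs, type_seqs, eps=1e-10):
    """Hypothetical helper: mean and std of log(tau), pooled over all sequences."""
    log_dts = []
    for dts, marks in zip(time_delta_seqs, type_seqs):
        # Same slicing as get_dt_stats: skip the first delta, and drop the
        # last one when the final mark is the -1 sentinel.
        dts = np.asarray(dts[1:-1 if marks[-1] == -1 else None], dtype=float)
        # Assumption: clamp at eps so zero inter-event times do not yield -inf.
        log_dts.append(np.log(np.clip(dts, eps, None)))
    log_dts = np.concatenate(log_dts)
    # np.std defaults to the population std (ddof=0), matching dts.var() above.
    return log_dts.mean(), log_dts.std()


# Toy data (made up for illustration): the pooled log-deltas are [0, 1, 2, 0].
toy_dts = [[0.0, 1.0, np.e, 5.0], [0.0, np.e ** 2, 1.0]]
toy_marks = [[0, 1, 2, -1], [0, 1, 2]]
mean_log, std_log = log_dt_stats(toy_dts, toy_marks)
```

Note that `torch.Tensor.std()` applies Bessel's correction by default, so the reference implementation's std would differ slightly from a `ddof=0` estimate on small datasets.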

2. easy_tpp/runner/base_runner.py (around L43)

mean_log_inter_time, std_log_inter_time, min_dt, max_dt = (
    self._data_loader.train_loader().dataset.get_dt_stats())
runner_config.model_config.set("mean_log_inter_time", mean_log_inter_time)
runner_config.model_config.set("std_log_inter_time", std_log_inter_time)

The values are assigned to keys that semantically promise "log-space statistics", with no log transform in between.

Comparison with the original IFL-TPP repo

Reference: shchur/ifl-tpp (ICLR 2020 official).

dpp/data/dataset.py computes log-space statistics explicitly:

def get_inter_time_statistics(self):
    """Get the mean and std of log(inter_time)."""
    all_inter_times = torch.cat([seq.inter_times[:-1] for seq in self.sequences])
    mean_log_inter_time = all_inter_times.log().mean()
    std_log_inter_time = all_inter_times.log().std()
    return mean_log_inter_time, std_log_inter_time

dpp/models/log_norm_mix.py documents the intended modelling chain:

x ~ GaussianMixtureModel(locs, log_scales, log_weights)
y = std_log_inter_time * x + mean_log_inter_time     # <- expects log-space stats
z = exp(y)

The class structure (AffineTransform ∘ ExpTransform) in EasyTPP is the same, but the values supplied to it are computed on the wrong scale.
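To illustrate the effect on the base space, here is a small NumPy sketch on synthetic log-normal inter-event times (the data and variable names are made up for illustration; this is not EasyTPP or ifl-tpp code). It inverts the chain above, x = (log τ - mean) / std, once with log-space statistics and once with raw-scale statistics.

```python
import numpy as np

# Synthetic inter-event times; the parameters are arbitrary assumptions.
rng = np.random.default_rng(0)
tau = rng.lognormal(mean=2.0, sigma=1.5, size=100_000)
log_tau = np.log(tau)

# Intended: statistics of log(tau), as in the reference implementation.
mean_log, std_log = log_tau.mean(), log_tau.std()
x_log = (log_tau - mean_log) / std_log

# What happens when raw-scale statistics are plugged into the same transform.
mean_raw, std_raw = tau.mean(), tau.std()
x_raw = (log_tau - mean_raw) / std_raw

print(f"log-space stats: mean={x_log.mean():+.3f}, std={x_log.std():.3f}")
print(f"raw stats:       mean={x_raw.mean():+.3f}, std={x_raw.std():.3f}")
```

With log-space statistics the base variable x is standardized by construction (zero mean, unit std). With raw statistics it is shifted and compressed, so the GMM has to absorb the mismatch through its own locs and scales.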

Is this a bug, or is computing these statistics on raw τ intentional?
