Claude Opus 4.8 note (filed via Claude Code on the maintainer's behalf):
When training with a type: loss evaluator, the validation lm_head_loss is logged as exactly 0 at every eval step, while training.lm_head_loss is correct. Observed across 8 independent training runs, in both the console logs and W&B (evaluations.<name>.lm_head_loss = 0).
Likely cause
In LanguageModelHead._logits_loss_forward_backward (fast_llm/layers/language_model/head.py:203-206 on current main), the eval-mode branch computes logits only and returns None for the loss — it never runs the cross-entropy or appends to losses:
    if not self.training:
        logits, _ = self._logits_loss_forward_backward_partial(input_, kwargs, return_logits=True)
        self._debug(logits, "logits", (kwargs[LanguageModelKwargs.hidden_token_dim], self._vocab_dim), kwargs)
        return None, None
The LossEvaluator runs the model in eval mode, so the head takes this branch on every eval step, contributes nothing to losses, and the aggregated eval metric comes out as 0.
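To make the suspected mechanism concrete, here is a toy sketch in plain Python. It is not Fast-LLM code: `ToyHead`, `aggregate`, and `run` are hypothetical names, and the reduction (sum from a zero accumulator, divided by at least 1) is my assumption about how an empty loss list could surface as exactly 0 rather than NaN or an error.

```python
class ToyHead:
    """Hypothetical stand-in for a language-model head (not Fast-LLM code)."""

    def __init__(self):
        self.training = True

    def forward(self, per_token_losses, losses):
        if not self.training:
            # Mirrors the eval-mode early return: no loss computed, nothing appended.
            return None
        loss = sum(per_token_losses) / len(per_token_losses)
        losses.append(loss)
        return loss


def aggregate(losses):
    # Assumed reduction: start from zero and divide by the number of entries (at least 1).
    return sum(losses) / max(len(losses), 1)


def run(head, batches):
    # Collect one loss per batch, then reduce to a single logged metric.
    losses = []
    for batch in batches:
        head.forward(batch, losses)
    return aggregate(losses)
```

With `training = True` the metric reflects the data; with `training = False` the same batches reduce to 0, which matches the symptom. Whether the real aggregation behaves this way is not something I have verified.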
Open question
I haven't pinned down the intended fix: either the evaluator should run the forward with loss computation enabled, or the head should compute the loss in eval mode when targets are present. Either way, validation loss appears to be unusable through this path right now.
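As a rough illustration of the second option only, the sketch below computes and records the loss whenever targets are present, regardless of mode, and skips it only for pure inference. Everything here is hypothetical (`cross_entropy`, `head_step`, and the list-based inputs are mine, in plain Python); it does not reflect the actual head signature or how Fast-LLM handles the backward pass.

```python
import math


def cross_entropy(logits, target):
    # Numerically stable log-sum-exp minus the logit of the target class.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def head_step(logits_rows, targets, training, losses):
    """Hypothetical head step: loss is computed whenever targets are given."""
    if targets is None:
        # Pure inference: there is nothing to compare against, so no loss.
        return None
    loss = sum(
        cross_entropy(row, t) for row, t in zip(logits_rows, targets)
    ) / len(targets)
    losses.append(loss)
    # A real head would run the backward pass only when `training` is True;
    # this toy has no gradients, so the flag does not change the result.
    return loss
```

The point of the sketch is the branching condition: keying the early return on missing targets instead of on `self.training` would let the evaluator see a real loss without changing inference behavior.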
Repro
Any fast-llm train run configured with a loss evaluator; inspect evaluations.<name>.lm_head_loss (logs as 0).
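For checking a run after the fact, a small helper like the following could flag the symptom. It assumes the logged metrics are available as a flat dict keyed by names of the form evaluations.<name>.lm_head_loss, with no dots inside <name>; the dict shape and the function name `zero_eval_losses` are my assumptions, not part of Fast-LLM or W&B.

```python
import re

# Matches evaluations.<name>.lm_head_loss, assuming <name> contains no dots.
EVAL_LOSS_KEY = re.compile(r"^evaluations\.[^.]+\.lm_head_loss$")


def zero_eval_losses(metrics):
    """Return the eval loss keys whose logged value is exactly 0."""
    return sorted(
        key
        for key, value in metrics.items()
        if EVAL_LOSS_KEY.match(key) and value == 0
    )
```

A non-empty result for a run whose training.lm_head_loss is nonzero would be consistent with this bug.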