feat(heartbeat): write last-run.json at top of send-telemetry #148

Merged
ashishkurmi merged 3 commits into step-security:main from swarit-stepsecurity:swarit/feat/last-run-heartbeat on Jun 23, 2026

@swarit-stepsecurity

Member

Adds a local "I started" breadcrumb, written to <install-dir>/last-run.json as the very first action of the send-telemetry path: before the enterprise gate, and before telemetry.Run acquires the singleton lock.

agent.error.log and scan-state.json only appear once a run gets far enough to log a line or finish an upload. Several failure modes never get that far: a process killed mid-startup (e.g. the Windows GUI-launcher teardown), a run that fails the enterprise gate, or a run that can never acquire the lock. The heartbeat records {written_at, pid, agent_version, command, invocation_method, os} independently of all of that, so:

  • stale last-run.json -> the agent isn't being invoked at all (the scheduler isn't firing, e.g. because of a battery policy or a missing/broken task)
  • fresh last-run.json + no server-side telemetry -> the agent runs but dies or fails before the upload

invocation_method reuses telemetry's scheduler-footprint detection, so a scheduled fire can be distinguished from a manual run. The write is durable against the very abrupt termination it is meant to record: it goes to a temp sibling file, is fsynced, and is then atomically renamed into place (mirroring internal/state.Save), so a kill leaves either the previous file or the new one, never a truncated one. The write is best-effort: a failure is logged at debug level and never affects the run.

What does this PR do?

Type of change

  • Bug fix
  • Enhancement
  • Documentation

Testing

  • Tested on macOS (version: ___)
  • Binary runs without errors: ./stepsecurity-dev-machine-guard --verbose
  • JSON output is valid: ./stepsecurity-dev-machine-guard --json | python3 -m json.tool
  • No secrets or credentials included
  • Lint passes: make lint
  • Tests pass: make test

Related Issues

ashishkurmi merged commit c9b55b7 into step-security:main on Jun 23, 2026
11 checks passed

2 participants