Fix subtask annotation on multi-episode-per-file (LeRobot v3) videos: clip-relative frame timestamps#229
Conversation
There was a problem hiding this comment.
Your trial has ended. Reactivate Greptile to resume code reviews.
There was a problem hiding this comment.
Code Review
This pull request updates the video decoding logic to report frame timestamps relative to the clip start, adjusting the test assertions accordingly. The reviewer pointed out that while timestamp_s is rebased, pts remains absolute, which introduces an inconsistency. Additionally, they noted that a frame slightly before clip_from could yield a negative timestamp, and suggested clamping both timestamp_s and pts to be non-negative to prevent this.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| timestamp_s = _frame_timestamp_s(frame) | ||
| if timestamp_s is not None: | ||
| timestamp_s -= clip_from | ||
| yield DecodedVideoFrame( | ||
| index=index, | ||
| pts=None if frame.pts is None else int(frame.pts), | ||
| timestamp_s=_frame_timestamp_s(frame), | ||
| timestamp_s=timestamp_s, |
There was a problem hiding this comment.
While timestamp_s is rebased to the clip start, pts remains absolute. This creates an inconsistency between pts and timestamp_s for clipped videos, whereas other video sources (like VideoFrameSequence and VideoFrameArray) keep them in sync and relative to the clip start.
Additionally, due to the epsilon check in _iter_selected_frames (ts + _FRAME_TIMESTAMP_EPSILON_S < clip_from), a frame with a timestamp slightly less than clip_from can be yielded. Subtracting clip_from from its timestamp will result in a negative value. Clamping both timestamp_s and pts to be non-negative (at least 0.0 and 0 respectively) prevents this.
| timestamp_s = _frame_timestamp_s(frame) | |
| if timestamp_s is not None: | |
| timestamp_s -= clip_from | |
| yield DecodedVideoFrame( | |
| index=index, | |
| pts=None if frame.pts is None else int(frame.pts), | |
| timestamp_s=_frame_timestamp_s(frame), | |
| timestamp_s=timestamp_s, | |
| timestamp_s = _frame_timestamp_s(frame) | |
| pts = None if frame.pts is None else int(frame.pts) | |
| if timestamp_s is not None: | |
| timestamp_s = max(0.0, timestamp_s - clip_from) | |
| if frame.time_base is not None: | |
| pts = max(0, pts - int(round(clip_from / float(frame.time_base)))) | |
| yield DecodedVideoFrame( | |
| index=index, | |
| pts=pts, | |
| timestamp_s=timestamp_s, |
Behavior
When a
VideoFileis decoded with a clip window (from_timestamp_s/to_timestamp_s),iter_encoded_framesreturns timestamps relative to thewhole video file, not to the clip.
This is what
test_iter_frames_respects_clip_boundsasserts today: a clipstarting at
from_timestamp_s=0.2yields frame timestamps[0.2, 0.4, 0.6]instead of
[0.0, 0.2, 0.4].Impact
A LeRobot v3 dataset bundles multiple episodes into one video file, so each
episode is a clip with
from_timestamp > 0. With the current behavior, decodingan episode returns video-wide timestamps, not episode timestamps.
robotics.subtask_annotationrelies on episode-relative time, so on v3 datasetsit produces segments on the video clock (which don't match the episode's
timestampcolumn) — dropping segments for every episode except the first onein each video file.
Fix
Rebase yielded timestamps to the clip start in
iter_encoded_frames(subtractclip_fromonce)._frame_timestamp_sstays absolute. No-op for unclipped videos(
clip_from == 0), and bringsVideoFilein line with the in-memory sources.Updated
test_iter_frames_respects_clip_boundsto expect[0.0, 0.2, 0.4].Question
Is this behavior expected? That a segmented video would return absolute timestamp instead of relative timestamps?