Fix subtask annotation on multi-episode-per-file (LeRobot v3) videos: clip-relative frame timestamps #229

Open
Binh Pham (pham-tuan-binh) wants to merge 2 commits into macrodata-labs:main from pham-tuan-binh:fix/clip-relative-frame-timestamps

@pham-tuan-binh

Behavior

When a VideoFile is decoded with a clip window (from_timestamp_s /
to_timestamp_s), iter_encoded_frames returns timestamps relative to the
whole video file, not to the clip.

This is what test_iter_frames_respects_clip_bounds asserts today: a clip
starting at from_timestamp_s=0.2 yields frame timestamps [0.2, 0.4, 0.6]
instead of [0.0, 0.2, 0.4].

def test_iter_frames_respects_clip_bounds(tmp_path) -> None:
    path = tmp_path / "video.mp4"
    _write_video(path, num_frames=5, fps=5)
    video = mdr.video.VideoFile(
        DataFile.resolve(path),
        from_timestamp_s=0.2,
        to_timestamp_s=0.7,
    )

    frames = asyncio.run(_collect_frames(video))

    assert [frame.index for frame in frames] == [0, 1, 2]
    assert [frame.timestamp_s for frame in frames] == [0.2, 0.4, 0.6]

Impact

A LeRobot v3 dataset bundles multiple episodes into one video file, so every
episode after the first in a file is a clip with from_timestamp_s > 0. With the
current behavior, decoding such an episode returns video-wide timestamps, not
episode timestamps.

robotics.subtask_annotation relies on episode-relative time, so on v3 datasets
it produces segments on the video clock, which don't match the episode's
timestamp column. As a result, segments are dropped for every episode except
the first one in each video file.
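
To make the failure mode concrete, here is a minimal sketch. It assumes a hypothetical filter that keeps only segments overlapping the episode's own clock; the real matching logic in robotics.subtask_annotation is not shown in this PR, and keep_segments_in_episode and all values below are illustrative only.

```python
def keep_segments_in_episode(segments, episode_duration_s):
    """Hypothetical filter: keep (start_s, end_s) segments that overlap
    the episode's own clock, which runs from 0 to episode_duration_s."""
    return [
        (start_s, end_s)
        for start_s, end_s in segments
        if start_s < episode_duration_s and end_s > 0.0
    ]


# Second episode of a shared video file: it starts 10 s into the file
# and lasts 5 s, so its timestamp column covers 0 s to 5 s.
clip_from_s = 10.0
episode_duration_s = 5.0

# Segments computed from video-wide frame timestamps (current behavior).
video_clock_segments = [(10.0, 12.0), (12.0, 15.0)]

# The same segments after rebasing to the clip start (proposed behavior).
episode_clock_segments = [
    (start_s - clip_from_s, end_s - clip_from_s)
    for start_s, end_s in video_clock_segments
]

dropped = keep_segments_in_episode(video_clock_segments, episode_duration_s)
kept = keep_segments_in_episode(episode_clock_segments, episode_duration_s)
```

Under these assumptions, both video-clock segments fall outside the 0 s to 5 s range and are discarded, while the rebased segments survive. The first episode in a file has clip_from_s == 0, which is why it would be unaffected.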

Fix

Rebase yielded timestamps to the clip start in iter_encoded_frames (subtract
clip_from once). _frame_timestamp_s stays absolute. No-op for unclipped videos
(clip_from == 0), and brings VideoFile in line with the in-memory sources.

Updated test_iter_frames_respects_clip_bounds to expect [0.0, 0.2, 0.4].
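
As an illustration of the rebasing described above, a standalone sketch follows. It does not use the real iter_encoded_frames; rebase_to_clip_start is a hypothetical helper that only shows the arithmetic on plain floats.

```python
import math
from typing import Iterable, Iterator, Optional


def rebase_to_clip_start(
    timestamps_s: Iterable[Optional[float]], clip_from: float
) -> Iterator[Optional[float]]:
    """Yield timestamps relative to clip_from, passing None through."""
    for timestamp_s in timestamps_s:
        yield None if timestamp_s is None else timestamp_s - clip_from


# Absolute timestamps of the three frames selected by the clip in the test.
absolute = [0.2, 0.4, 0.6]
relative = list(rebase_to_clip_start(absolute, clip_from=0.2))

# An unclipped video has clip_from == 0, so nothing changes.
unchanged = list(rebase_to_clip_start(absolute, clip_from=0.0))

# Float subtraction is not guaranteed to be exact, so compare with a tolerance.
matches_expected = all(
    math.isclose(got, want, abs_tol=1e-9)
    for got, want in zip(relative, [0.0, 0.2, 0.4])
)
```

The tolerance-based comparison is a choice made for this sketch on raw floats; whether the real test can compare exactly depends on how _frame_timestamp_s computes its values.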

Question

Is the current behavior expected? That is, should a clipped video return absolute timestamps instead of clip-relative ones?

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the video decoding logic to report frame timestamps relative to the clip start, adjusting the test assertions accordingly. The reviewer pointed out that while timestamp_s is rebased, pts remains absolute, which introduces an inconsistency. Additionally, they noted that a frame slightly before clip_from could yield a negative timestamp, and suggested clamping both timestamp_s and pts to be non-negative to prevent this.

Comment on lines +124 to +130
+timestamp_s = _frame_timestamp_s(frame)
+if timestamp_s is not None:
+    timestamp_s -= clip_from
 yield DecodedVideoFrame(
     index=index,
     pts=None if frame.pts is None else int(frame.pts),
-    timestamp_s=_frame_timestamp_s(frame),
+    timestamp_s=timestamp_s,

high

While timestamp_s is rebased to the clip start, pts remains absolute. This creates an inconsistency between pts and timestamp_s for clipped videos, whereas other video sources (like VideoFrameSequence and VideoFrameArray) keep them in sync and relative to the clip start.

Additionally, due to the epsilon check in _iter_selected_frames (ts + _FRAME_TIMESTAMP_EPSILON_S < clip_from), a frame with a timestamp slightly less than clip_from can be yielded. Subtracting clip_from from its timestamp will result in a negative value. Clamping both timestamp_s and pts to be non-negative (at least 0.0 and 0 respectively) prevents this.
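
The negative-timestamp case can be reproduced with plain floats. This is a sketch only: the real value of _FRAME_TIMESTAMP_EPSILON_S is not shown in this thread, so EPSILON_S below is an assumed tolerance, and is_before_clip is a hypothetical stand-in for the quoted check.

```python
# Assumed tolerance; the real _FRAME_TIMESTAMP_EPSILON_S may differ.
EPSILON_S = 1e-6


def is_before_clip(ts: float, clip_from: float, epsilon_s: float = EPSILON_S) -> bool:
    """Mirror of the quoted check: skip a frame only if it is clearly before the clip."""
    return ts + epsilon_s < clip_from


clip_from = 0.2
ts = 0.2 - 5e-7  # half an epsilon before the clip start

selected = not is_before_clip(ts, clip_from)  # the frame is still yielded
rebased = ts - clip_from                      # slightly below zero
clamped = max(0.0, ts - clip_from)            # clamped to the clip start
```

With these assumed values the frame passes the epsilon check, its rebased timestamp is about -5e-7, and clamping maps it to 0.0.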

Suggested change
-timestamp_s = _frame_timestamp_s(frame)
-if timestamp_s is not None:
-    timestamp_s -= clip_from
-yield DecodedVideoFrame(
-    index=index,
-    pts=None if frame.pts is None else int(frame.pts),
-    timestamp_s=timestamp_s,
+timestamp_s = _frame_timestamp_s(frame)
+pts = None if frame.pts is None else int(frame.pts)
+if timestamp_s is not None:
+    timestamp_s = max(0.0, timestamp_s - clip_from)
+    if frame.time_base is not None:
+        pts = max(0, pts - int(round(clip_from / float(frame.time_base))))
+yield DecodedVideoFrame(
+    index=index,
+    pts=pts,
+    timestamp_s=timestamp_s,
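
The pts arithmetic in the suggestion can be checked in isolation. The sketch below assumes time_base is a fractions.Fraction (seconds per tick) and uses a hypothetical helper, rebase_pts. It also adds an explicit guard for a missing pts, which is an addition made here and not part of the suggested change.

```python
from fractions import Fraction
from typing import Optional


def rebase_pts(
    pts: Optional[int], time_base: Optional[Fraction], clip_from: float
) -> Optional[int]:
    """Shift pts by the clip start expressed in time_base ticks, clamped at 0.

    Returns pts unchanged when either pts or time_base is missing.
    """
    if pts is None or time_base is None:
        return pts
    # Convert the clip start from seconds to ticks, as in the suggestion.
    offset_ticks = int(round(clip_from / float(time_base)))
    return max(0, pts - offset_ticks)


# Hypothetical stream with 1000 ticks per second; the clip starts at 0.2 s.
time_base = Fraction(1, 1000)

first_pts = rebase_pts(200, time_base, clip_from=0.2)   # frame at the clip start
second_pts = rebase_pts(400, time_base, clip_from=0.2)  # frame 0.2 s into the clip
early_pts = rebase_pts(199, time_base, clip_from=0.2)   # one tick before the clip
```

Rounding the offset to whole ticks keeps pts an integer, and the clamp handles a frame that lands one tick before the clip start in the same way as the timestamp_s clamp.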
