[codex] Extract Rerun reader by guipenedo · Pull Request #219 · macrodata-labs/refiner

Guilherme Penedo (guipenedo) · 2026-06-16T15:31:31Z

This PR extracts the Rerun reader into its own branch and keeps the reader-only surface separate from the writer, cleanup, and benchmark work.

Included here:

read_rerun pipeline entrypoints and package exports
the shared _rerun_io.py helper used by the reader
reader-only docs and optional-dependency updates
the reader test suite
the minimal core pipeline fixes needed for reader projections and metadata-only batch handling

Excluded from this branch:

write_rerun
Rerun writer sink code
cleanup/reducer changes
benchmark harnesses and performance-only tuning

Validation:

uv run ruff check ... on the reader branch files
uv run ty check ... on the reader branch files
uv run pytest tests/readers/test_rerun_reader.py

gemini-code-assist

Code Review

This pull request introduces a new Rerun reader (read_rerun) to parse .rrd files into columnar recording rows or robotics episode rows, along with corresponding documentation, tests, and dependency updates. Feedback on the changes highlights two critical issues: first, a deserialization bug in LocalRrd.__setstate__ where restoring local paths for remote sources can cause FileNotFoundError in distributed environments; second, a potential crash in _iter_encoded_images when handling empty or null image slices. Both issues include actionable code suggestions to resolve them.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T15:41:18Z

+    def __setstate__(self, state: dict[str, object]) -> None:
+        self.source = cast(DataFile, state["source"])
+        self.tmpdir = None
+        path = state.get("path")
+        self.path = Path(path) if isinstance(path, str) else None


In __setstate__, restoring self.path from the serialized state when the source is a remote file (i.e., self.source.is_local is False) means that subsequent calls to open() on the deserializing machine return a local temporary path that only existed on the machine that serialized the object. This will lead to a FileNotFoundError during distributed or cloud execution. Only restore self.path if the source is local.

Suggested change

-    def __setstate__(self, state: dict[str, object]) -> None:
-        self.source = cast(DataFile, state["source"])
-        self.tmpdir = None
-        path = state.get("path")
-        self.path = Path(path) if isinstance(path, str) else None
+    def __setstate__(self, state: dict[str, object]) -> None:
+        self.source = cast(DataFile, state["source"])
+        self.tmpdir = None
+        path = state.get("path")
+        self.path = Path(path) if isinstance(path, str) and self.source.is_local else None
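The effect of the suggested guard can be illustrated with a minimal, self-contained sketch. Everything here is a stand-in: FakeDataFile, LocalRrdSketch, the __getstate__ layout, and the roundtrip helper are hypothetical and only mirror the attribute names visible in the diff; the real LocalRrd and DataFile in the PR may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeDataFile:
    """Hypothetical stand-in for DataFile; only `is_local` matters here."""
    uri: str
    is_local: bool


class LocalRrdSketch:
    """Simplified stand-in for LocalRrd, not the class from the PR."""

    def __init__(self, source, path=None):
        self.source = source
        self.tmpdir = None
        self.path = path

    def __getstate__(self):
        # Assumed state layout: the source plus the cached path as a string.
        return {
            "source": self.source,
            "path": str(self.path) if self.path is not None else None,
        }

    def __setstate__(self, state):
        self.source = state["source"]
        self.tmpdir = None
        path = state.get("path")
        # Guard from the suggestion: a cached path is only meaningful
        # on another machine when the source itself is local.
        keep = isinstance(path, str) and self.source.is_local
        self.path = Path(path) if keep else None


def roundtrip(obj):
    """Mimic what pickle does: capture state, rebuild without __init__."""
    clone = type(obj).__new__(type(obj))
    clone.__setstate__(obj.__getstate__())
    return clone


remote = LocalRrdSketch(
    FakeDataFile("s3://bucket/a.rrd", is_local=False),
    Path("/tmp/worker-a/a.rrd"),  # temp download that exists only on the sender
)
local = LocalRrdSketch(
    FakeDataFile("/data/b.rrd", is_local=True),
    Path("/data/b.rrd"),
)

remote_copy = roundtrip(remote)
local_copy = roundtrip(local)
```

With the guard in place, remote_copy.path is reset to None, so the receiving side would have to fetch the file again, while local_copy.path survives the round trip.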

gemini-code-assist · 2026-06-16T15:41:19Z

+            byte_start = int(inner_offsets[outer_start])
+            byte_end = int(inner_offsets[outer_start + 1])
+            data = np.asarray(
+                inner_values.slice(byte_start, byte_end - byte_start)
+            ).tobytes()


If the inner image list is empty or null (e.g., a missing frame on the timeline), inner_offsets[outer_start + 1] will be equal to or less than inner_offsets[outer_start]. This results in byte_end <= byte_start, which causes inner_values.slice to return an empty array, leading to a crash in PIL.Image.open with UnidentifiedImageError. Add a guard to skip empty/null image slices.

Suggested change

-            byte_start = int(inner_offsets[outer_start])
-            byte_end = int(inner_offsets[outer_start + 1])
-            data = np.asarray(
-                inner_values.slice(byte_start, byte_end - byte_start)
-            ).tobytes()
+            byte_start = int(inner_offsets[outer_start])
+            byte_end = int(inner_offsets[outer_start + 1])
+            if byte_end <= byte_start:
+                continue
+            data = np.asarray(
+                inner_values.slice(byte_start, byte_end - byte_start)
+            ).tobytes()
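The guard can be sketched in isolation as follows. This is an illustration only: it uses a plain Python list of offsets and a bytes object in place of the Arrow buffers handled by _iter_encoded_images, and the function name iter_nonempty_slices is hypothetical.

```python
def iter_nonempty_slices(offsets, values):
    """Yield the payload of each list entry, skipping empty ones.

    `offsets` has one more element than there are entries; entry i
    spans values[offsets[i]:offsets[i + 1]].
    """
    for i in range(len(offsets) - 1):
        byte_start = int(offsets[i])
        byte_end = int(offsets[i + 1])
        if byte_end <= byte_start:
            # Empty or null entry (e.g. a missing frame): nothing to decode.
            continue
        yield bytes(values[byte_start:byte_end])


# Three entries; the middle one is empty because its offsets coincide.
offsets = [0, 3, 3, 5]
values = b"abcde"
payloads = list(iter_nonempty_slices(offsets, values))
```

Here payloads holds only the two non-empty entries, so a downstream image decoder would never be handed a zero-length buffer.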

Extract Rerun reader

96c2063

gemini-code-assist Bot reviewed Jun 16, 2026


Guilherme Penedo (guipenedo) added 3 commits June 17, 2026 23:33

Emit Rerun robotics rows directly

80251ad

Clean up Rerun robotics side data

8e59397

Simplify Rerun robotics rows

210dcbe

Guilherme Penedo (guipenedo) wants to merge 4 commits into main from codex/rerun-reader-pr
