[codex] Extract Rerun reader #219
Code Review
This pull request introduces a new Rerun reader (read_rerun) to parse .rrd files into columnar recording rows or robotics episode rows, along with corresponding documentation, tests, and dependency updates. Feedback on the changes highlights two critical issues: first, a deserialization bug in LocalRrd.__setstate__ where restoring local paths for remote sources can cause FileNotFoundError in distributed environments; second, a potential crash in _iter_encoded_images when handling empty or null image slices. Both issues include actionable code suggestions to resolve them.
    def __setstate__(self, state: dict[str, object]) -> None:
        self.source = cast(DataFile, state["source"])
        self.tmpdir = None
        path = state.get("path")
        self.path = Path(path) if isinstance(path, str) else None
In __setstate__, self.path is restored from the serialized state even when the source is a remote file (i.e., self.source.is_local is False). On the deserializing machine, subsequent calls to open() will then return a local temporary path that only existed on the serializing machine, which leads to a FileNotFoundError during distributed or cloud execution. Only restore self.path if the source is local.
Suggested change:

      def __setstate__(self, state: dict[str, object]) -> None:
          self.source = cast(DataFile, state["source"])
          self.tmpdir = None
          path = state.get("path")
    -     self.path = Path(path) if isinstance(path, str) else None
    +     self.path = Path(path) if isinstance(path, str) and self.source.is_local else None
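To make the failure mode concrete, here is a minimal sketch of the round trip. FakeSource and LocalRrdSketch are hypothetical stand-ins, not the classes in this PR, and the shape of the state dict returned by __getstate__ is an assumption. The round_trip helper imitates what pickle does on the receiving side (create an empty instance, then call __setstate__), so no real serialization is needed.

```python
from pathlib import Path


class FakeSource:
    """Hypothetical stand-in for DataFile; only is_local matters here."""

    def __init__(self, uri, is_local):
        self.uri = uri
        self.is_local = is_local


class LocalRrdSketch:
    """Hypothetical stand-in for LocalRrd, reduced to its state handling."""

    def __init__(self, source, path=None):
        self.source = source
        self.tmpdir = None
        self.path = path

    def __getstate__(self):
        # Assumed state shape: the path travels as a plain string.
        return {
            "source": self.source,
            "path": str(self.path) if self.path else None,
        }

    def __setstate__(self, state):
        self.source = state["source"]
        self.tmpdir = None
        path = state.get("path")
        # Keep the path only if it names a file the receiver can also see.
        keep = isinstance(path, str) and self.source.is_local
        self.path = Path(path) if keep else None


def round_trip(obj):
    """Imitate unpickling: build an empty instance, then restore its state."""
    clone = type(obj).__new__(type(obj))
    clone.__setstate__(obj.__getstate__())
    return clone


# A remote source whose download was cached in a sender-side temp directory.
remote = LocalRrdSketch(
    FakeSource("s3://bucket/run.rrd", is_local=False),
    Path("/tmp/sender-only/run.rrd"),
)
# A genuinely local source.
local = LocalRrdSketch(
    FakeSource("/data/run.rrd", is_local=True),
    Path("/data/run.rrd"),
)

remote_copy = round_trip(remote)
local_copy = round_trip(local)
```

With the guard in place, remote_copy.path is None, so the receiver would have to fetch the file again rather than trust a stale temporary path, while local_copy.path is preserved.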
    byte_start = int(inner_offsets[outer_start])
    byte_end = int(inner_offsets[outer_start + 1])
    data = np.asarray(
        inner_values.slice(byte_start, byte_end - byte_start)
    ).tobytes()
If the inner image list is empty or null (e.g., a missing frame on the timeline), inner_offsets[outer_start + 1] will be equal to or less than inner_offsets[outer_start]. This results in byte_end <= byte_start, which causes inner_values.slice to return an empty array, leading to a crash in PIL.Image.open with UnidentifiedImageError. Add a guard to skip empty/null image slices.
Suggested change:

      byte_start = int(inner_offsets[outer_start])
      byte_end = int(inner_offsets[outer_start + 1])
    + if byte_end <= byte_start:
    +     continue
      data = np.asarray(
          inner_values.slice(byte_start, byte_end - byte_start)
      ).tobytes()
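The guard can be illustrated without Arrow or PIL. The sketch below models an Arrow-style list layout with a plain offsets list and a bytes buffer, where slot i covers values[offsets[i]:offsets[i + 1]]. The function name iter_blobs and the sample data are hypothetical and not part of _iter_encoded_images; the point is only that a slot whose end offset does not exceed its start offset carries no bytes and should be skipped before any decoding is attempted.

```python
def iter_blobs(offsets, values):
    """Yield (slot_index, payload) for every non-empty slot.

    offsets has one more entry than there are slots; slot i covers
    values[offsets[i]:offsets[i + 1]].
    """
    for i in range(len(offsets) - 1):
        start, end = offsets[i], offsets[i + 1]
        if end <= start:
            # Empty or null slot (e.g. a missing frame): nothing to decode.
            continue
        yield i, bytes(values[start:end])


# Three slots; the middle one is empty because its two offsets are equal.
values = b"\x89PNGaaaa" + b"\xff\xd8bbbb"
offsets = [0, 8, 8, 14]

blobs = list(iter_blobs(offsets, values))
slot_indices = [i for i, _ in blobs]
```

Here slot_indices is [0, 2]: the empty middle slot never reaches the decoder, which is the behavior the suggested `continue` is meant to give the real reader.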
This PR extracts the Rerun reader into its own branch and keeps the reader-only surface separate from the writer, cleanup, and benchmark work.
Included here:
- read_rerun pipeline entrypoints and package exports
- _rerun_io.py helper used by the reader

Excluded from this branch:
- write_rerun

Validation:
- uv run ruff check ... on the reader branch files
- uv run ty check ... on the reader branch files
- uv run pytest tests/readers/test_rerun_reader.py