
feat(data): migrate zarr v2 → v3 (3.1.6) with zero-dead-space itar writes#68

Draft
janickm wants to merge 1 commit into NVIDIA:main from janickm:zarr-v3


janickm commented Apr 2, 2026

Summary

Migrate the ncore data layer from zarr-python 2.x to zarr-python 3.1.6. New .zarr.itar files are written in zarr format 3; existing v2 .zarr.itar files remain fully readable (backwards compatible).

Core changes

  • IndexedTarStore rewritten as an implementation of the zarr3 Store ABC, with async methods, byte-range support, and a consolidated-metadata intercept
  • Zero dead space: all zarr.json writes are deferred in memory and flushed exactly once on close(), so the tar archive contains no duplicate records no matter how many times zarr3 rewrites metadata
  • Root zarr.json is intercepted at flush time and stored as zarr.cbor.xz (CBOR-encoded, LZMA-compressed)
  • Transparent fallback reads legacy v2 .zmetadata.cbor.xz files
  • components.py updated to zarr3 APIs: create_array(), attrs.update(), group.members()/group.groups(), LocalStore, consolidated metadata

Python 3.8 dropped

  • Removed Python 3.8 toolchain registration, pip targets, lockfiles
  • Removed sys.version_info guards in types.py, base.py, transformations.py
  • Simplified @dataclass decorators to unconditional slots=True, kw_only=True

Dependencies

  • zarr>=3.1.6, cbor2>=5.9.0, python_requires>=3.11

Files changed (14 files, +650 / -671)

| File | Change |
|---|---|
| ncore/impl/data/stores.py | Rewritten: zarr3 Store ABC, deferred zarr.json, consolidated metadata |
| ncore/impl/data/stores_test.py | Rewritten for zarr3 APIs |
| ncore/impl/data/v4/components.py | All zarr3 API updates |
| ncore/impl/data/v4/components_test.py | Updated for zarr3 |
| ncore/impl/data/types.py | Removed version guards, simplified Self import |
| ncore/impl/data_converter/base.py | Removed conditional dataclass kwargs |
| ncore/impl/common/transformations.py | Removed conditional dataclass kwargs |
| deps/pip/requirements_ncore.in | zarr>=3.1.6, cbor2>=5.9.0 |
| ncore/BUILD.bazel | python_requires>=3.11, zarr>=3.1.6 |
| MODULE.bazel | Removed Python 3.8 toolchain |
| pyproject.toml | target-version = "py311" |
| deps/pip/BUILD.bazel | Removed 3.8 pip_compile targets |
| deps/pip/requirements_3_8.in | Deleted |
| deps/pip/requirements_3_8.txt | Deleted |

Test Plan

  • stores_test.py — 3/3 pass (consolidated metadata round-trip, reserialization)
  • components_test.py — 6/6 pass (reload itar, reload directory, new component extension)
  • compat_test.py — requires Bazel + @test-data-v4 archive (cannot run outside Bazel)
  • Lockfile regeneration — blocked by numpy<2 / torch==1.13.0 constraint conflict

Known blockers

  1. Lockfile: zarr>=3.1.6 requires numpy>=2.0, but torch==1.13.0+cu116 requires numpy<2. Lockfile cannot be regenerated until torch is upgraded.
  2. compat_test.py: Integration tests with real v2 itar files require Bazel and authenticated @test-data-v4 archive from GitHub Packages.

…ites

Migrate ncore data layer from zarr-python 2.x to zarr-python 3.1.6.
New .zarr.itar files are written in zarr format 3; existing v2 .zarr.itar
files remain readable (backwards compatible).

Core changes:
- Rewrite IndexedTarStore as zarr3 Store ABC implementation with async
  methods, byte range support, and consolidated metadata intercept
- All zarr.json writes are deferred in memory and flushed once on close(),
  guaranteeing zero dead space in tar archives
- Root zarr.json is intercepted at flush time and compressed as
  zarr.cbor.xz (CBOR+LZMA consolidated metadata format)
- Transparent fallback for legacy v2 .zmetadata.cbor.xz on read
- Update components.py to zarr3 APIs (create_array, attrs.update,
  group.members/groups, LocalStore, consolidated metadata)

Cleanup:
- Drop Python 3.8 support: remove toolchain, pip targets, lockfiles
- Remove sys.version_info guards in types.py, base.py, transformations.py
- Simplify dataclass decorators to unconditional slots=True, kw_only=True

Dependencies:
- zarr>=3.1.6, cbor2>=5.9.0, python_requires>=3.11
- Lockfile regeneration blocked by numpy<2 / torch constraint conflict
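The root-metadata intercept and the legacy fallback described above can be sketched as follows. To keep the example runnable with the standard library alone, JSON stands in for CBOR (the PR depends on cbor2, so `cbor2.dumps`/`cbor2.loads` would take the place of the hypothetical `_encode`/`_decode` helpers), while `lzma` provides the `.xz` compression. The function names are hypothetical, and the assumption that non-root `zarr.json` documents pass through unchanged is not stated in the PR.

```python
import json
import lzma
from collections.abc import Mapping

# Key names from the PR description.
V3_ROOT_KEY = "zarr.cbor.xz"
V2_LEGACY_KEY = ".zmetadata.cbor.xz"


def _encode(obj: object) -> bytes:
    # Stand-in for cbor2.dumps so the sketch needs no third-party packages.
    return json.dumps(obj).encode("utf-8")


def _decode(raw: bytes) -> object:
    # Stand-in for cbor2.loads.
    return json.loads(raw.decode("utf-8"))


def finalize_metadata(pending: Mapping[str, bytes]) -> dict[str, bytes]:
    """Turn deferred zarr.json documents into the records written at flush time."""
    records: dict[str, bytes] = {}
    for key, value in pending.items():
        if key == "zarr.json":
            # Root document (zarr-python 3 keeps consolidated metadata here):
            # re-encode and compress it instead of storing plain JSON.
            records[V3_ROOT_KEY] = lzma.compress(_encode(json.loads(value)))
        else:
            # Assumption: nested zarr.json documents are stored unchanged.
            records[key] = value
    return records


def load_root_metadata(store: Mapping[str, bytes]) -> object:
    """Prefer the v3 compressed root document, else fall back to the v2 file."""
    for key in (V3_ROOT_KEY, V2_LEGACY_KEY):
        if key in store:
            return _decode(lzma.decompress(store[key]))
    raise KeyError("no compressed root or legacy consolidated metadata found")
```

Checking the v3 key first means an archive written by the new code never consults the legacy key, while an old v2 archive, which has no zarr.cbor.xz record, falls through to .zmetadata.cbor.xz transparently.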
janickm self-assigned this Apr 2, 2026