refactor: one array class#4034
…#176)

Bumps the actions group with 8 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` |
| [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` |
| [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` |
| [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` |
| [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` |
| [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` |

Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@1b2de7f...5185adf)

Updates `codecov/codecov-action` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@57e3a13...e79a696)

Updates `github/issue-metrics` from 4.2.2 to 4.2.7
- [Release notes](https://github.com/github/issue-metrics/releases)
- [Commits](github-community-projects/issue-metrics@c9e9838...1e38d5e)

Updates `j178/prek-action` from 2.0.3 to 2.0.4
- [Release notes](https://github.com/j178/prek-action/releases)
- [Commits](j178/prek-action@6ad8027...bdca6f1)

Updates `actions/upload-artifact` from 7.0.0 to 7.0.1
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v7...043fb46)

Updates `actions/download-artifact` from 7.0.0 to 8.0.1
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...3e5f45b)

Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.13.0...cef2210)

Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](zizmorcore/zizmor-action@b1d7e1f...5f14fd0)

---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
  dependency-version: 0.9.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: github/issue-metrics
  dependency-version: 4.2.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: j178/prek-action
  dependency-version: 2.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/download-artifact
  dependency-version: 8.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: zizmorcore/zizmor-action
  dependency-version: 0.5.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ocol Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Array no longer wraps an AsyncArray. It owns metadata, store_path, config, codec_pipeline, _chunk_grid, and a pluggable _runner (defaulting to SyncRunner). Adds Array._from_async_array and a deprecated async_array property. External Array(async_array) construction sites are converted to Array._from_async_array. Fixes downstream typing fallout from removing the _async_array attribute. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roperty Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…runner Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…unner Routes resize/append/update_attributes/nchunks_initialized/nbytes_stored/ info_complete through self._runner.run(self.*_async(...)), which mutate the live Array. Fixes resize/append not updating array state. Array no longer delegates to the deprecated async_array property. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hods
Adds get/set_{orthogonal,mask,coordinate,block}_selection_async to Array and
migrates tests off the deprecated async_array property where an Array async
equivalent now exists.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eliminates duplicated indexer construction and coordinate value-validation by routing each sync selection method through self._runner.run of its *_async twin. Adds get/set_basic_selection_async for a complete surface. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- update_attributes (sync) returns a fresh Array, preserving the prior contract
- from_array docstring example uses a public construction path
- align SupportsArrayState._iter_shard_keys signature with the real methods
- restore AsyncArray coverage in test_get_shape_chunks
- extract shared sharding-codec helper to dedup Array/AsyncArray properties

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #4034      +/-   ##
    ==========================================
    + Coverage   93.53%   93.56%   +0.02%
    ==========================================
      Files          88       88
      Lines       11894    12027     +133
    ==========================================
    + Hits        11125    11253     +128
    - Misses        769      774       +5
Softens the constructor break: Array(async_array) still works but emits a DeprecationWarning, constructing from the async array's metadata/store_path/ config. The new Array(metadata, store_path, ...) form is preferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The docs build guards that every python block declares exec/test; the new custom-runner example was a bare fence. Mark it exec="true" (it constructs an Array with a custom runner, which runs cleanly) and drop the unused Runner import. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…compressor/filters) Closes coverage gaps introduced by the Array unification: the store_path-required TypeError, __eq__ NotImplemented path, the sharded read_chunk_sizes/_chunk_grid_shape branch, and the v2/v3 compressor and v2 filters property branches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, iterators, async_array

- legacy Array(async_array) raises if store_path/config also supplied
- update_attributes_async returns a fresh Array, consistent with the sync form
- align Array._iter_shard_coords signature with sibling iterators
- async_array property left uncached: resize/append replace metadata so caching would be stale

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
the main breaking change I was worried about here is |
maxrjones left a comment
Why did you choose this approach over using AsyncArray as the single source of truth, and making Array a facade holding a reference to it? The regression section details the most critical issue with this PR, and would be prevented by having AsyncArray as the single source of truth. Using AsyncArray as the single source of truth would reduce the amount of code needed by a few hundred lines (see duplication section). A facade Array can still expose every *_async method and the runner-dispatched sync wrappers by delegating to the held AsyncArray, so the user-facing API of this PR is identical either way.
The regression and the code duplication below are two symptoms of the same root cause: after this PR, Array and AsyncArray each independently own array state and derive behavior from it. Caching the handle would fix the regression and delegating to the shared helpers would fix the method duplication. Even with those changes, the state and the derived properties remain mirrored across both classes, and every future change must land on both sides. The facade removes the second source of truth instead of patching its symptoms one at a time.
Also, what's the use-case for a synchronous Array having a custom runner? I would think that most people who want to bring an event loop would just use the async methods. It seems like a YAGNI case, and not worth exposing publicly in this PR since it's not usable via zarr.open_array, zarr.create_array, etc.
regression
On main, Array.async_array returned the shared backing _async_array, and all Array properties read through it — so arr.async_array.resize((N,)) updated arr.shape too. On this branch the property constructs a fresh, throwaway AsyncArray(self.metadata, self.store_path, self.config) on every access. Three consequences, all verified:
- `arr.async_array.resize((N,))`, a previously working pattern, now runs `_resize` against the detached temporary: the store metadata is updated (array.py:6683) but `arr.metadata`/`arr.shape` stay stale. Subsequent `arr[...]` reads/writes index against the old shape while the store has the new one.
- `aa = arr.async_array; arr.resize(...)` leaves `aa` holding the pre-resize metadata, since `resize` rebinds `arr.metadata` via `object.__setattr__` on `arr` only (array.py:4634, 6686).
- `arr.async_array is arr.async_array` is now `False`, breaking identity-based caching.
The changelog frames async_array as "deprecated but still works for now"; for mutating operations it does not work — it silently desynchronizes the handle from the array.
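To make the failure mode concrete, here is a minimal toy model of the two behaviours described above. It is only a sketch: `ToyAsyncArray`, `SharedHandleArray`, and `DetachedHandleArray` are hypothetical stand-ins rather than zarr classes, the metadata is a plain dict, and `resize` is synchronous to keep the example short.

```python
class ToyAsyncArray:
    """Hypothetical stand-in for the async handle; not a zarr class."""

    def __init__(self, metadata: dict) -> None:
        self.metadata = metadata

    def resize(self, new_shape: tuple) -> None:
        # Rebinds metadata on *this* object only, like the resize described above.
        self.metadata = {**self.metadata, "shape": new_shape}


class SharedHandleArray:
    """Models main: every property reads through one shared handle."""

    def __init__(self, metadata: dict) -> None:
        self._async_array = ToyAsyncArray(metadata)

    @property
    def async_array(self) -> ToyAsyncArray:
        return self._async_array

    @property
    def shape(self) -> tuple:
        return self._async_array.metadata["shape"]


class DetachedHandleArray:
    """Models the reviewed branch: the array owns state, the handle is a copy."""

    def __init__(self, metadata: dict) -> None:
        self.metadata = metadata

    @property
    def async_array(self) -> ToyAsyncArray:
        # A fresh, throwaway handle on every access.
        return ToyAsyncArray(self.metadata)

    @property
    def shape(self) -> tuple:
        return self.metadata["shape"]


shared = SharedHandleArray({"shape": (4,)})
shared.async_array.resize((8,))

detached = DetachedHandleArray({"shape": (4,)})
detached.async_array.resize((8,))

shared_shape = shared.shape      # the resize is visible through the array
detached_shape = detached.shape  # the resize landed on a temporary and is lost
stable_identity = detached.async_array is detached.async_array
```

In the toy model the shared-handle array sees the new shape, while the detached one keeps the old shape and hands out a different handle object on every access.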
code duplication avoided by using AsyncArray as the single source of truth
AsyncArray's selection methods are one-line delegations to shared module-level helpers:

    # AsyncArray.get_orthogonal_selection (array.py:1544)
    return await _get_orthogonal_selection(
        self.store_path, self.metadata, self.codec_pipeline, self.config, self._chunk_grid,
        selection, out=out, fields=fields, prototype=prototype,
    )

The new Array.get_orthogonal_selection_async re-inlines the body of that same helper instead of calling it:

    # Array.get_orthogonal_selection_async (array.py:2865)
    if prototype is None:
        prototype = default_buffer_prototype()
    indexer = OrthogonalIndexer(selection, self.shape, self._chunk_grid)
    return await self._get_selection(indexer=indexer, out=out, fields=fields, prototype=prototype)

    # _get_orthogonal_selection, the existing shared helper (array.py:6336)
    if prototype is None:
        prototype = default_buffer_prototype()
    indexer = OrthogonalIndexer(selection, metadata.shape, chunk_grid)
    return await _get_selection(..., indexer=indexer, out=out, fields=fields, prototype=prototype)

The same pattern repeats for the mask, coordinate, and block selection getters and setters. Beyond the methods:
- `Array._info` (array.py:1954) is a byte-for-byte copy of `AsyncArray._info` (array.py:1825): a 20-line `ArrayInfo` construction duplicated verbatim.
- The `__init__` state-derivation block (`parse_array_metadata` / `ChunkGrid.from_metadata` / `create_codec_pipeline`) is duplicated between `AsyncArray.__init__` (array.py:415–426) and `Array.__init__` (array.py:1892–1902).
- The derived properties (`order`, `read_only`, `filters`, `serializer`, `compressors`, `nchunks`, …) are mirrored across both classes.
With the docstrings these twins carry, that accounts for the few hundred lines. Each pair is a place where a future fix can land on one side and silently miss the other, leaving sync and async selection silently returning different results for the same call. A facade Array delegating to its held AsyncArray would have exactly one copy of each.
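A rough sketch of the facade shape argued for here, assuming toy classes and a toy helper: `ToyAsyncArray`, `ToyFacadeArray`, and the list-of-lists `data` are hypothetical, and `asyncio.run` stands in for whatever blocking mechanism the real sync wrappers use. The point is only that the selection logic exists once, in the shared helper.

```python
import asyncio


async def _get_orthogonal_selection(data, selection):
    """Toy shared helper: pick the requested rows, then the requested columns."""
    rows, cols = selection
    return [[data[r][c] for c in cols] for r in rows]


class ToyAsyncArray:
    """Hypothetical single source of truth holding the state."""

    def __init__(self, data):
        self.data = data

    async def get_orthogonal_selection(self, selection):
        # One-line delegation to the shared helper.
        return await _get_orthogonal_selection(self.data, selection)


class ToyFacadeArray:
    """Hypothetical facade: owns no state, only a reference to the async array."""

    def __init__(self, async_array):
        self._async_array = async_array

    async def get_orthogonal_selection_async(self, selection):
        return await self._async_array.get_orthogonal_selection(selection)

    def get_orthogonal_selection(self, selection):
        # Sync wrapper around the async twin; asyncio.run is an assumption here.
        return asyncio.run(self.get_orthogonal_selection_async(selection))


facade = ToyFacadeArray(ToyAsyncArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]]))
sync_result = facade.get_orthogonal_selection(([0, 2], [1, 2]))
async_result = asyncio.run(facade.get_orthogonal_selection_async(([0, 2], [1, 2])))
```

Because both entry points funnel into the same helper, a fix to the helper reaches the sync and async paths together.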
    """Return the array's sharding codec, or `None` if the array is not sharded.

    An array is considered sharded when its metadata declares exactly one codec
    and that codec is a `ShardingCodec`.
    """
    - """Return the array's sharding codec, or `None` if the array is not sharded.
    -
    - An array is considered sharded when its metadata declares exactly one codec
    - and that codec is a `ShardingCodec`.
    - """
    + """Return the array's sole sharding codec, or `None`.
    +
    + The gate used by the chunk/shard accessors: sharding is reported only
    + when the sharding codec is the only declared codec, because any other
    + codec (e.g. ``codecs=[sharding, gzip]``) makes the inner chunks not
    + independently addressable. Not a general sharded-ness predicate; see #4036.
    + """
I think it's important to document that this won't always return the sharding codec to prevent misuse.
Does the code today actually do the check you are describing?
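For reference, the check that both docstring versions describe can be sketched as follows. This is not the PR's code and does not answer whether the real helper behaves this way: `ToyShardingCodec`, `ToyGzipCodec`, and `sole_sharding_codec` are hypothetical names used only to illustrate the "exactly one codec, and it is the sharding codec" gate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToyShardingCodec:
    """Hypothetical stand-in for a sharding codec."""

    chunk_shape: tuple


@dataclass(frozen=True)
class ToyGzipCodec:
    """Hypothetical stand-in for any other codec."""

    level: int = 5


def sole_sharding_codec(codecs: Sequence[object]) -> Optional[ToyShardingCodec]:
    """Return the sharding codec only when it is the single declared codec."""
    if len(codecs) == 1 and isinstance(codecs[0], ToyShardingCodec):
        return codecs[0]
    return None


sharding = ToyShardingCodec(chunk_shape=(2, 2))

only_sharding = sole_sharding_codec([sharding])
sharding_then_gzip = sole_sharding_codec([sharding, ToyGzipCodec()])
no_sharding = sole_sharding_codec([ToyGzipCodec()])
```

Under this gate an array declaring `[sharding, gzip]` yields `None` even though a sharding codec is present, which is the case the suggested docstring calls out.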
Here's Claude's evaluation of the pros/cons of either class being the facade, which I think is the cleanest way to avoid duplicate state. It seems to come down to how much diff/compatibility risk we accept now in exchange for

Note, a third shape exists where both classes hold a shared mutable state object, with methods delegating to the module-level helpers this branch already has, but it adds a third class without changing the analysis below much.

Option A (review's direction):
…into one-array-class
…tion getters

Array.async_array previously returned a fresh, throwaway AsyncArray on every access, so mutations through the handle (e.g. resize) silently desynced from the parent Array, and handle identity was unstable.

Introduce _AsyncArrayView, a cached view whose state reads through the parent Array and whose mutations land on it via a single _rebind_state seam (replacing the raw object.__setattr__ poke in _resize). Array now fully owns its state and AsyncArray depends on it, so the eventual AsyncArray removal is deleting a shim rather than a migration.

Also addresses the duplication flagged in review:

- Array's orthogonal/mask/coordinate *_async getters now delegate to the shared _get_*_selection helpers instead of re-inlining indexer construction.
- The byte-identical AsyncArray._info / Array._info bodies collapse into one module-level _array_info helper.

Add a signature-parity test over Array's foo/foo_async pairs (with a documented exceptions table for getitem/setitem/resize) so the residual mirrored signatures are CI-enforced rather than vigilance-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
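The cached-view idea in this commit message can be illustrated with a small toy model. This is a sketch under assumptions, not the actual `_AsyncArrayView`: `ToyArray` and `ToyAsyncArrayView` are hypothetical, metadata is a dict, `resize` is synchronous, and `functools.cached_property` is just one way to get a stable handle.

```python
from functools import cached_property


class ToyArray:
    """Hypothetical owner of the state."""

    def __init__(self, metadata: dict) -> None:
        self.metadata = metadata

    def _rebind_state(self, metadata: dict) -> None:
        # The single seam through which state is replaced.
        self.metadata = metadata

    @property
    def shape(self) -> tuple:
        return self.metadata["shape"]

    def resize(self, new_shape: tuple) -> None:
        self._rebind_state({**self.metadata, "shape": new_shape})

    @cached_property
    def async_array(self) -> "ToyAsyncArrayView":
        # Built once, so handle identity is stable across accesses.
        return ToyAsyncArrayView(self)


class ToyAsyncArrayView:
    """Hypothetical view: holds no state of its own, reads through the parent."""

    def __init__(self, parent: ToyArray) -> None:
        self._parent = parent

    @property
    def metadata(self) -> dict:
        return self._parent.metadata

    def resize(self, new_shape: tuple) -> None:
        # Mutations land on the parent via the same seam.
        self._parent._rebind_state({**self._parent.metadata, "shape": new_shape})


arr = ToyArray({"shape": (4,)})
view = arr.async_array

view.resize((8,))
shape_after_view_resize = arr.shape

arr.resize((16,))
view_shape_after_parent_resize = view.metadata["shape"]
```

In this model both directions stay in sync: a resize through the view is visible on the parent, and a resize on the parent is visible through a view obtained earlier.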
- Use ZarrDeprecationWarning (not plain DeprecationWarning) for the legacy Array(async_array) constructor and the Array.async_array property, matching the rest of array.py so users filtering on the zarr-specific category catch them. (Medium)
- Sync update_attributes now returns the Array built by update_attributes_async instead of discarding it and constructing a second one. (Low)
- test_nchunks: parametrize over the real [Array, AsyncArray] classes and make the AsyncArray branch actually construct and test an AsyncArray; the previous [AnyArray, AnyAsyncArray] aliases never compared equal to Array, so both branches ran identical assertions and the async path was never exercised. (Low)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
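Why the warning category matters can be shown with the standard `warnings` module. The sketch below assumes a library-specific category that subclasses `DeprecationWarning`; `ToyZarrDeprecationWarning` and the two emitting functions are hypothetical stand-ins, not zarr's own definitions.

```python
import warnings


class ToyZarrDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for a library-specific deprecation category."""


def warn_with_library_category() -> None:
    warnings.warn("legacy constructor", ToyZarrDeprecationWarning, stacklevel=2)


def warn_with_plain_category() -> None:
    warnings.warn("legacy constructor", DeprecationWarning, stacklevel=2)


def count_caught(func) -> int:
    """Count warnings that survive a filter keyed on the library category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("ignore")
        # Added last, so it sits at the front of the filter list and wins.
        warnings.simplefilter("always", category=ToyZarrDeprecationWarning)
        func()
    return len(caught)


caught_library = count_caught(warn_with_library_category)
caught_plain = count_caught(warn_with_plain_category)
```

A user who filters on the library-specific category sees the first warning and misses the second, which is the mismatch the first bullet above removes.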
…fork

Address roborev findings (job 255), both about locking in already-correct behavior with tests rather than restructuring:

- test_sync_async_property_parity: the ~24 scalar/derived properties that are reimplemented independently on Array and AsyncArray are now asserted to agree by value across a v2/v3 x sharded/unsharded matrix, so a fix landing on one class but not the other fails CI (the bodies can no longer drift silently).
- test_async_handle_with_config_returns_detached_async_array: asserts Array.async_array.with_config(...) returns a real AsyncArray (not another view bound to the parent) and that the result is detached, so later parent mutations don't leak into it.

Verified the parity test catches drift by temporarily breaking Array.order and confirming the v3 cases failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address roborev finding (job 257): the property-parity matrix builds both the
Array and the AsyncArray from the same freshly-created array with default
config and a writable store, so `order` (v3 reads self.config.order) and
`read_only` (reads self.store_path.read_only) were always compared against
identical inputs and could never reveal drift.
Add two targeted tests that build the pair from a non-default order ("F", v3)
and a read-only store, so those config/store-derived branches are
cross-checked divergently. Verified both fail if Array.order / Array.read_only
drift from their AsyncArray twins.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address roborev finding (job 259): the docstrings cross-referenced __getitem__/ __setitem__, but those sync dunders pop_fields and route fancy/orthogonal selections to vindex/get_orthogonal_selection, whereas getitem_async/ setitem_async call _getitem/_setitem directly and support basic indexing only. A caller following the old cross-reference with a fancy selection would hit a BasicIndexer error or get different results than arr[...]. Reword both to state they are the async counterparts of get_basic_selection/ set_basic_selection (basic indexing only) and point to the orthogonal/ coordinate/mask *_async methods for advanced indexers. Doc-only; behavior unchanged (matches the long-standing AsyncArray.getitem convention). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Yes, the whole point is to have a single class that has the sync and async apis. That class should be the |
the regressions are fixed and I (via claude) added tests to ensure that the |
Array.setitem_async marked `prototype` keyword-only, but AsyncArray.setitem (the API consumers are migrating off) accepts it positionally. A caller doing `await async_array.setitem(sel, value, proto)` would hit a TypeError after switching to `arr.setitem_async(sel, value, proto)`. Drop the `*` so prototype is positional-or-keyword, matching AsyncArray.setitem exactly. Add a kind-aware, cross-class signature test asserting Array.getitem_async / setitem_async match AsyncArray.getitem / setitem including each parameter's kind (the existing within-Array parity test compares names only and excludes getitem/setitem), plus a test that setitem_async accepts a positional prototype. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
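The difference between a names-only and a kind-aware signature comparison can be sketched with `inspect.signature`. The three classes below are hypothetical toys that mirror the situation described in this commit message; they are not the real `Array` or `AsyncArray`.

```python
import inspect


class ToyAsyncArray:
    async def setitem(self, selection, value, prototype=None): ...


class KeywordOnlyArray:
    # Mirrors the pre-fix shape: prototype is keyword-only.
    async def setitem_async(self, selection, value, *, prototype=None): ...


class FixedArray:
    # Mirrors the post-fix shape: prototype is positional-or-keyword.
    async def setitem_async(self, selection, value, prototype=None): ...


def param_names(func) -> list:
    return list(inspect.signature(func).parameters)


def param_kinds(func) -> list:
    return [(p.name, p.kind) for p in inspect.signature(func).parameters.values()]


def accepts_positional_prototype(bound_method) -> bool:
    """Check the call shape without creating a coroutine."""
    try:
        inspect.signature(bound_method).bind("selection", "value", None)
    except TypeError:
        return False
    return True


names_match_before = param_names(KeywordOnlyArray.setitem_async) == param_names(ToyAsyncArray.setitem)
kinds_match_before = param_kinds(KeywordOnlyArray.setitem_async) == param_kinds(ToyAsyncArray.setitem)
kinds_match_after = param_kinds(FixedArray.setitem_async) == param_kinds(ToyAsyncArray.setitem)

positional_ok_before = accepts_positional_prototype(KeywordOnlyArray().setitem_async)
positional_ok_after = accepts_positional_prototype(FixedArray().setitem_async)
```

A names-only comparison passes for the keyword-only variant, so only the kind-aware check flags the incompatibility that would surface as a `TypeError` for callers passing `prototype` positionally.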
This PR makes 2 fundamental changes to our top-level `Array` class:

- It adds `Array._runner: Runner` to the `Array`. `Runner` is a protocol. The `runner` parameter allows a caller to provide their own event loop, which the array will use when blocking on the execution of a coroutine. If the user doesn't declare a runner, we use a house default, which is just `sync`. So if you don't request a different runner, everything is the same.
- It adds async methods for every sync method. The sync methods use `self.runner.run(self.do_thing())` to run their async counterparts.

This means the `AsyncArray` class has no use and can be phased out. NOTE: it is not removed.

The goal here is no breaking changes. Removing the `AsyncArray` class can happen at its own pace. If you find any breaking changes in this PR, we can fix them.
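As a rough illustration of the runner pattern described above, the sketch below pairs a sync method with an async twin and routes the sync call through a pluggable runner. Everything here is an assumption made for illustration: the `Runner` protocol shape is inferred from the `run(...)` call, `BlockingRunner` uses `asyncio.run` rather than zarr's `sync`, and `ToyArray` and `CountingRunner` are hypothetical.

```python
from __future__ import annotations

import asyncio
from collections.abc import Coroutine
from typing import Any, Protocol, TypeVar

T = TypeVar("T")


class Runner(Protocol):
    """Assumed shape: anything with a blocking `run` for coroutines."""

    def run(self, coro: Coroutine[Any, Any, T]) -> T: ...


class BlockingRunner:
    """Stand-in default that blocks with asyncio.run (not zarr's `sync`)."""

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        return asyncio.run(coro)


class CountingRunner:
    """Example custom runner: counts calls, then blocks the same way."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        self.calls += 1
        return asyncio.run(coro)


class ToyArray:
    """Hypothetical class with one sync/async method pair."""

    def __init__(self, shape: tuple, runner: Runner | None = None) -> None:
        self.shape = shape
        self._runner: Runner = runner if runner is not None else BlockingRunner()

    async def resize_async(self, new_shape: tuple) -> None:
        await asyncio.sleep(0)  # stands in for store I/O
        self.shape = new_shape  # mutates the live object

    def resize(self, new_shape: tuple) -> None:
        # The sync method blocks on its async twin through the runner.
        self._runner.run(self.resize_async(new_shape))


default_arr = ToyArray((4,))
default_arr.resize((8,))

counting = CountingRunner()
custom_arr = ToyArray((4,), runner=counting)
custom_arr.resize((2,))
```

With no runner supplied the toy array behaves like a plain blocking class, and a caller-supplied runner only changes how the coroutine is driven, not what it does.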