Merged
84 changes: 84 additions & 0 deletions .github/workflows/run-tests.yml
@@ -46,3 +46,87 @@ jobs:
run: uv sync --frozen --extra viz --extra dlpack
- name: Run pytest
run: uv run --frozen pytest

benchmark_canary:
runs-on: ubuntu-24.04
env:
BENCH_ENFORCE: "0"
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Python setup
uses: actions/setup-python@v6
with:
python-version: "3.11"
- name: Install the latest version of uv
uses: astral-sh/setup-uv@v7
- name: Sync dependencies
run: uv sync --frozen
- name: Run lazy tensor canary benchmark
run: |
extra_args=""
if [ "${BENCH_ENFORCE}" = "1" ]; then
extra_args="--fail-on-regression"
fi
uv run --frozen python benchmarks/benchmark_lazy_tensor.py \
--repeats 7 \
--warmup 2 \
--baseline-json benchmarks/ci-baseline.json \
--regression-factor 1.5 \
--absolute-slack-ms 5.0 \
--json-out benchmark-results.json \
${extra_args}
- name: Write benchmark summary
run: |
python - <<'PY'
import json
from pathlib import Path

payload = json.loads(Path("benchmark-results.json").read_text())
lines = [
"## Lazy Tensor Benchmark Canary",
"",
f"- repeats: `{payload['repeats']}`",
f"- warmup: `{payload['warmup']}`",
"",
"| case | median (ms) | min (ms) | max (ms) |",
"|---|---:|---:|---:|",
]
for r in payload["results"]:
lines.append(
"| "
f"{r['name']} | {r['median_ms']:.2f} "
f"| {r['min_ms']:.2f} | {r['max_ms']:.2f} |"
)
lines.extend(
[
"",
"| case | baseline (ms) | threshold (ms) | status |",
"|---|---:|---:|---|",
]
)
for c in payload["regression_checks"]:
baseline = (
"-"
if c["baseline_ms"] is None
else f"{c['baseline_ms']:.2f}"
)
threshold = (
"-"
if c["threshold_ms"] is None
else f"{c['threshold_ms']:.2f}"
)
status = "regressed" if c["regressed"] else "ok"
lines.append(
f"| {c['name']} | {baseline} | "
f"{threshold} | {status} |"
)

summary_path = Path(__import__("os").environ["GITHUB_STEP_SUMMARY"])
summary_path.write_text("\n".join(lines) + "\n")
PY
- name: Upload benchmark artifact
uses: actions/upload-artifact@v4
with:
name: lazy-tensor-benchmark-${{ github.run_id }}
path: benchmark-results.json
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -9,7 +9,7 @@ repos:
- id: check-yaml
- id: detect-private-key
- repo: https://github.com/tox-dev/pyproject-fmt
-    rev: "v2.16.1"
+    rev: "v2.16.2"
hooks:
- id: pyproject-fmt
- repo: https://github.com/citation-file-format/cffconvert
@@ -39,7 +39,7 @@ repos:
- id: yamllint
exclude: pre-commit-config.yaml
- repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: "v0.15.1"
+    rev: "v0.15.2"
hooks:
- id: ruff-format
- id: ruff-check
48 changes: 45 additions & 3 deletions README.md
@@ -86,24 +86,66 @@ from ome_arrow import OMEArrow

oa = OMEArrow("your_image.ome.parquet")

-# Spatial ROI per plane
-view = oa.tensor_view(t=0, z=0, roi=(32, 32, 128, 128), layout="CHW")
+# Spatial ROI per plane (YX convention)
+view = oa.tensor_view(t=0, z=0, roi=(32, 32, 128, 128), layout="CYX")

# Convenience 3D ROI (x, y, z, w, h, d)
-view3d = oa.tensor_view(roi3d=(32, 32, 2, 128, 128, 4), layout="TZCHW")
+view3d = oa.tensor_view(roi3d=(32, 32, 2, 128, 128, 4), layout="TZCYX")

# 3D tiled iteration over (z, y, x)
for cap in view3d.iter_tiles_3d(tile_size=(2, 64, 64), mode="numpy"):
pass
```

Lazy scan-style convention (Polars-like):

```python
from ome_arrow import OMEArrow

oa = OMEArrow.scan("your_image.ome.parquet") # deferred load
# First: queue lazy spatial/index slicing
lazy_crop = oa.slice_lazy(0, 512, 0, 512).slice_lazy(64, 256, 64, 256)
cropped = lazy_crop.collect()

# slice_lazy returns a new OMEArrow plan; collect does not mutate `oa`.
# Build tensor_view from the returned sliced object to reuse that plan.
tensor_view_result = cropped.tensor_view(t=0, z=slice(0, 4), roi=(0, 0, 192, 192))
arr = tensor_view_result.to_numpy()
```

Advanced options:

- `chunk_policy="auto" | "combine" | "keep"` controls ChunkedArray handling.
- `channel_policy="error" | "first"` controls behavior when dropping `C` from layout.
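To make the intent of the two policies concrete, here is a hedged sketch of the dispatch they imply, using plain Python lists to stand in for Arrow chunks. The real code operates on `pyarrow.ChunkedArray`; the helper names and exact rules here are assumptions for illustration, not the library's API:

```python
def apply_chunk_policy(chunks, policy="auto"):
    """Chunks are modeled as a list of lists, one inner list per chunk."""
    if policy == "keep":     # leave chunking untouched
        return chunks
    if policy == "combine":  # always flatten to one contiguous chunk
        return [[x for chunk in chunks for x in chunk]]
    if policy == "auto":     # combine only when actually chunked
        return chunks if len(chunks) <= 1 else apply_chunk_policy(chunks, "combine")
    raise ValueError(f"unknown chunk_policy: {policy!r}")


def drop_channel(num_channels, policy="error"):
    """Pick the channel index to keep when a layout omits 'C'."""
    if num_channels == 1:
        return 0             # unambiguous: single channel
    if policy == "first":
        return 0             # silently keep channel 0
    raise ValueError("layout drops 'C' but image has multiple channels")
```

The `"auto"` behavior mirrors the common trade-off: combining chunks costs a copy but yields contiguous memory for downstream tensor views.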

See full docs: [`docs/src/dlpack.md`](docs/src/dlpack.md)

## Benchmarking lazy reads

Use the lightweight benchmark utility in `benchmarks/` to compare lazy tensor
read paths (TIFF source-backed, Parquet planes, Parquet chunks):

```bash
uv run python benchmarks/benchmark_lazy_tensor.py --repeats 5 --warmup 1
```

Notes:

- This benchmark is intended for local iteration and relative comparisons.
- It does not gate CI: the dedicated `benchmark_canary` job runs with
  `BENCH_ENFORCE="0"`, so regressions are reported but do not fail the build.
- The canary job writes a results table to the job summary and uploads
  `benchmark-results.json` as a workflow artifact.
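The check behind `--regression-factor` and `--absolute-slack-ms` can be sketched as follows, producing the `baseline_ms` / `threshold_ms` / `regressed` fields the CI summary step reads. The function name and the exact way the two tolerances combine are assumptions, not the script's verbatim implementation:

```python
def regression_check(name, median_ms, baseline_ms,
                     regression_factor=1.5, absolute_slack_ms=5.0):
    # No baseline recorded for this case: nothing to compare against.
    if baseline_ms is None:
        return {"name": name, "baseline_ms": None,
                "threshold_ms": None, "regressed": False}
    # Combine a relative factor with a fixed slack so very fast cases
    # are not flagged by millisecond-level timer noise.
    threshold_ms = baseline_ms * regression_factor + absolute_slack_ms
    return {"name": name, "baseline_ms": baseline_ms,
            "threshold_ms": threshold_ms,
            "regressed": median_ms > threshold_ms}
```

For example, with a 10 ms baseline the threshold is 10 × 1.5 + 5.0 = 20 ms, so a 12 ms median passes while a 25 ms median is flagged.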

Recalibrating `benchmarks/ci-baseline.json`:

1. Run the benchmark on `main` a few times (for example 3-5 runs):
`uv run python benchmarks/benchmark_lazy_tensor.py --repeats 7 --warmup 2 --json-out benchmark-results.json`
1. For each case, collect the observed `median_ms` values.
1. Update `benchmarks/ci-baseline.json` with stable medians from those runs
(prefer a conservative value near the slower side, not the fastest sample).
1. Keep CI canary tolerance (`regression_factor` + `absolute_slack_ms`) unchanged
unless you have repeated false positives.
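Steps 1–3 above can be sketched as a small helper that keeps the slowest observed median per case. The payload shape follows `benchmark-results.json` as read by the CI summary step; the helper itself is hypothetical, not part of the repository:

```python
def conservative_baseline(run_payloads, round_digits=2):
    """Given several benchmark-results.json payloads from runs on main,
    keep the slowest observed median per case (the conservative side)."""
    medians = {}
    for payload in run_payloads:
        for result in payload["results"]:
            medians.setdefault(result["name"], []).append(result["median_ms"])
    return {name: round(max(values), round_digits)
            for name, values in medians.items()}
```

Taking the maximum rather than the mean biases the baseline toward the slower side, which trades a little sensitivity for fewer false positives in the canary.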

## Contributing, Development, and Testing

Please see our [contributing documentation](https://github.com/wayscience/ome-arrow/tree/main/CONTRIBUTING.md) for more details on contributions, development, and testing.