
Add streaming backend for convert_and_aggregate #497

Open
FabianHofmann wants to merge 7 commits into refac/type-annotation from perf/conversion

Conversation

@FabianHofmann
Contributor

Closes # (if applicable).

Changes proposed in this Pull Request

Add a chunk-based streaming execution backend for convert_and_aggregate that processes weather-to-energy conversions one time-chunk at a time, reducing peak memory for large cutouts.

Key changes:

  • atlite/aggregate.py: Extract shared helpers (resolve_matrix, normalize_aggregate_time, reduce_time, wrap_matrix_result, finalize_aggregated_result) used by both dask and streaming paths.
  • atlite/streaming.py: New streaming backend that reads storage-aligned chunks eagerly, applies the convert function, and either multiplies by a sparse matrix or accumulates into a temporal reducer — never materialising the full (time, y, x) grid.
  • atlite/convert.py: New backend parameter ("auto", "dask", "streaming") on convert_and_aggregate; body slimmed down using shared helpers.
  • atlite/cutout.py: Auto-detect storage-aligned chunk sizes when opening existing cutouts.
  • Benchmarks: On the default Europe-scale solar profile (6.2 GB, 50 clusters), streaming runs single-threaded in ~111s vs dask (4 distributed workers) in ~100s — comparable throughput with significantly lower peak memory.
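The per-chunk accumulation behind the streaming backend can be sketched roughly as follows. This is a minimal illustration with invented names (`streaming_convert`, `read_chunk`, and the signatures are hypothetical, not the PR's actual API):

```python
import numpy as np
from scipy import sparse


def streaming_convert(read_chunk, n_time, chunk_len, convert, matrix):
    """Hypothetical sketch: apply `convert` one time chunk at a time and
    accumulate the sparse aggregation product, never holding the full
    (time, y, x) array in memory."""
    out = []
    for start in range(0, n_time, chunk_len):
        stop = min(start + chunk_len, n_time)
        raw = read_chunk(start, stop)               # (t, y, x) for this chunk only
        converted = convert(raw)                    # e.g. weather -> capacity factor
        flat = converted.reshape(stop - start, -1)  # (t, y*x)
        out.append((matrix @ flat.T).T)             # (t, n_clusters)
    return np.concatenate(out, axis=0)


# toy usage: identity convert, 2 "clusters" over a 3x2 grid, 5 time steps
grid = np.arange(5 * 3 * 2, dtype=float).reshape(5, 3, 2)
matrix = sparse.random(2, 6, density=0.5, random_state=0, format="csr")
res = streaming_convert(lambda a, b: grid[a:b], 5, 2, lambda x: x, matrix)
```

Only a `(chunk_len, y, x)` slab is ever resident at once, which is what keeps the peak memory footprint low.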

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

FabianHofmann and others added 2 commits April 2, 2026 10:02
…on helpers

Extract resolve_matrix, normalize_aggregate_time, reduce_time, wrap_matrix_result
and finalize_aggregated_result into aggregate.py. Add streaming.py for chunk-based
conversion with reduced peak memory. Support storage-aligned chunking in Cutout.
@coroa
Member

coroa commented Apr 5, 2026

Why is this necessary? Didn't the dask path already process each chunk separately? We lose multiprocessing capabilities with this.

FabianHofmann and others added 3 commits April 5, 2026 21:56
…, thread pool

Replace multiprocessing.Pool(spawn) with ThreadPoolExecutor sharing a
RasterCache that pre-reads rasters into memory. Add fast_isin (LUT),
fast_dilation (distance transform), and per-shape cached availability.
Includes numpy-style docstrings and 13 new tests for all new functions.
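The `fast_isin` (LUT) mentioned in the commit message points at a lookup-table membership test. A minimal sketch of that technique, assuming small non-negative integer raster codes (name and signature here are illustrative, not the PR's actual code):

```python
import numpy as np


def fast_isin(arr, values, max_code=None):
    """Illustrative LUT-based isin for small non-negative integer codes
    (e.g. land-cover class codes): build a boolean table once and answer
    membership with a single fancy-indexing pass, instead of np.isin's
    sort/search machinery."""
    if max_code is None:
        max_code = int(arr.max())
    lut = np.zeros(max_code + 1, dtype=bool)
    lut[np.asarray(values)] = True
    return lut[arr]


# toy usage on a small raster of class codes
codes = np.array([[0, 3, 7], [2, 3, 0]])
mask = fast_isin(codes, [3, 7])
```

The trade-off is memory proportional to the largest code value, which is cheap for typical raster class codes.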
@FabianHofmann
Contributor Author

It is mostly overhead from multiple dask processes. I'll post a detailed benchmark here; in summary: pypsa-eur currently takes 23 min for all availability matrices and renewable profiles, which I always found suspicious. With these changes it takes 4-5 min in total. The main inefficiency on master and pypsa-eur is that dask chunks are not aligned with the chunks of the stored, compressed netcdf cutout. Each dask chunk (100 time steps) therefore has to decompress the much larger stored chunk (2760 time steps), leading to redundant decompressions. The streaming backend auto-aligns with the stored chunks to avoid this IO overhead.

The other part of the story is that dask generally adds significant overhead. While you can still speed up a single conversion with dask parallelization using aligned chunks, a one-core computation per conversion is just faster when running multiple conversions in parallel, as we do in pypsa-eur. Probably because we save the time spent spawning processes and parallelize at the workflow level instead.
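The decompression redundancy described above can be illustrated with a toy model (not atlite code): assume every read of a dask chunk decompresses each stored netcdf chunk it overlaps.

```python
def redundant_decompressions(n_time, stored_chunk, dask_chunk):
    """Illustrative model: count stored-chunk decompressions when reading
    n_time steps in dask_chunk-sized pieces, assuming each read must
    decompress every stored chunk it overlaps."""
    reads = 0
    for start in range(0, n_time, dask_chunk):
        stop = min(start + dask_chunk, n_time)
        first = start // stored_chunk        # first stored chunk touched
        last = (stop - 1) // stored_chunk    # last stored chunk touched
        reads += last - first + 1
    return reads


# the numbers quoted above: a year of hourly data (8760 steps),
# stored chunks of 2760 steps, dask chunks of 100 steps
misaligned = redundant_decompressions(8760, 2760, 100)   # 91 decompressions
aligned = redundant_decompressions(8760, 2760, 2760)     # 4: one per stored chunk
```

Misaligned reads decompress each stored chunk dozens of times; aligning the read chunks to the stored chunks brings this down to one decompression per stored chunk.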

…o exclude outside-geometry pixels from reproject averaging
@coroa
Member

coroa commented Apr 7, 2026

Sounds like this chunk/storage alignment would be a gain we would also want for the dask path and maybe we could experiment with the threading scheduler of dask: https://docs.dask.org/en/stable/scheduling.html#local-threads
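For reference, switching dask to the local threads scheduler mentioned here is a one-line config change (a sketch; whether it closes the gap in this benchmark would need measuring):

```python
import dask

# run dask tasks in a thread pool within one process
# instead of spawning distributed workers
dask.config.set(scheduler="threads")

# or scoped to a single computation:
# with dask.config.set(scheduler="threads"):
#     result = lazy_array.sum().compute()
```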

@FabianHofmann
Contributor Author

FabianHofmann commented Apr 7, 2026

Sounds like this chunk/storage alignment would be a gain we would also want for the dask path and maybe we could experiment with the threading scheduler of dask: https://docs.dask.org/en/stable/scheduling.html#local-threads

Yes, likely there is a gain from that as well, but this would be done solely on the pypsa-eur side when setting the dask args, and it is often a bit cumbersome trying to tweak these. At the moment I lean towards using the straightforward numpy-based calculation on chunks, as the gain is significant. There is no memory overhead even for finer resolutions, btw.

@coroa
Member

coroa commented Apr 7, 2026

Yes, but you are adding another layer of complexity that is non-trivial and needs to be maintained, and I am not convinced that the same gains could not be achieved with much simpler tweaking of the dask backend.
