Add streaming backend for convert_and_aggregate #497
FabianHofmann wants to merge 7 commits into refac/type-annotation from
Conversation
…on helpers Extract resolve_matrix, normalize_aggregate_time, reduce_time, wrap_matrix_result and finalize_aggregated_result into aggregate.py. Add streaming.py for chunk-based conversion with reduced peak memory. Support storage-aligned chunking in Cutout.
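The chunk-based streaming idea described in this commit can be sketched as follows. All names and signatures below are illustrative, not atlite's actual code: a reader callback yields one time slice at a time, each slice is converted and immediately collapsed by the sparse aggregation matrix, so only the small `(time, n_shapes)` result is ever held in memory.

```python
import numpy as np
from scipy import sparse


def stream_convert_and_aggregate(read_chunk, convert, matrix, time_chunks):
    """Illustrative streaming loop: read one time-chunk, convert it,
    multiply by the sparse aggregation matrix, and append the small
    result -- the full (time, y, x) grid is never materialised.

    read_chunk(t0, t1) -> ndarray of shape (nt, y, x)   (hypothetical)
    convert            -> pixel-wise weather-to-energy function
    matrix             -> sparse (n_shapes, y*x) aggregation matrix
    """
    results = []
    for t0, t1 in time_chunks:
        block = read_chunk(t0, t1)                        # (nt, y, x)
        converted = convert(block)                        # same shape
        flat = converted.reshape(converted.shape[0], -1)  # (nt, y*x)
        # sparse @ dense -> dense; transpose back to (nt, n_shapes)
        results.append(np.asarray(matrix @ flat.T).T)
    return np.vstack(results)
```

With an identity `convert` and a one-row matrix that sums all pixels, each output row is simply the spatial sum of that time step.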
for more information, see https://pre-commit.ci
Why is this necessary? Didn't the dask path already process each chunk separately? We lose multiprocessing capabilities with this.
…, thread pool Replace multiprocessing.Pool(spawn) with ThreadPoolExecutor sharing a RasterCache that pre-reads rasters into memory. Add fast_isin (LUT), fast_dilation (distance transform), and per-shape cached availability. Includes numpy-style docstrings and 13 new tests for all new functions.
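The two speed-ups named in this commit can be sketched roughly as below. These are one plausible shape of a LUT-based `isin` and a distance-transform dilation; the signatures are assumptions, not the PR's actual API.

```python
import numpy as np
from scipy import ndimage


def fast_isin(codes, allowed):
    """LUT-based np.isin for rasters of small non-negative integer
    codes: build a boolean lookup table indexed by code value, then
    index it with the raster (signature illustrative)."""
    lut = np.zeros(codes.max() + 1, dtype=bool)
    allowed = np.asarray(allowed)
    lut[allowed[allowed <= codes.max()]] = True  # ignore out-of-range codes
    return lut[codes]


def fast_dilation(mask, radius):
    """Binary dilation via a Euclidean distance transform: every pixel
    within `radius` of a True pixel becomes True. One pass, no
    repeated structuring-element convolutions."""
    dist = ndimage.distance_transform_edt(~mask)
    return dist <= radius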
It is mostly overhead from multiple dask processes. I'll post a detailed benchmark here; in summary: the other part of the story is that dask generally leads to significant overhead. While you can still speed up a single conversion with dask parallelization over aligned chunks, a one-core computation per conversion is just faster when running multiple conversions in parallel, as we do in pypsa-eur. Probably because we save the time spent spawning processes and parallelize at the workflow level instead.
…o exclude outside-geometry pixels from reproject averaging
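The idea in this commit, keeping outside-geometry pixels out of an averaging resample, can be illustrated with a numpy-only stand-in. The real code would mask before an actual reprojection; both helpers below are hypothetical:

```python
import numpy as np


def mask_outside_geometry(values, inside):
    """Set pixels outside the geometry mask to NaN so a subsequent
    averaging resample ignores them (illustrative helper)."""
    out = values.astype(float).copy()
    out[~inside] = np.nan
    return out


def block_mean_ignore_nan(values, factor):
    """NaN-aware block average onto a coarser grid, standing in for
    average reprojection. Assumes shape divisible by `factor`."""
    y, x = values.shape
    blocks = values.reshape(y // factor, factor, x // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))
```

Without the masking step, a single large outside-geometry value would leak into the coarse-cell average; with it, the average is taken over inside pixels only.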
Sounds like this chunk/storage alignment would be a gain we would also want for the dask path, and maybe we could experiment with dask's threading scheduler: https://docs.dask.org/en/stable/scheduling.html#local-threads
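For reference, switching dask to its threaded scheduler (shared memory, no process spawning) is standard dask configuration, not code from this PR:

```python
import dask
import dask.array as da

# A toy chunked array; real workloads would be the cutout's dask arrays.
x = da.ones((1000, 1000), chunks=(250, 250))

# Scope the threaded scheduler to a block of computations...
with dask.config.set(scheduler="threads"):
    total = x.sum().compute()

# ...or request it per call:
total_again = x.sum().compute(scheduler="threads")
```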
yes, likely there is a gain from that as well. but this would be done solely on the pypsa-eur side when setting the dask args. it is often a bit cumbersome trying to tweak these. atm I lean towards using the straight forward numpy based calculation on chunks as the gain is significant. there is no memory overhead even for finer resolutions btw |
Yes, but you are adding another layer of complexity that is non-trivial and needs to be maintained, and I am not convinced that the same gains could not be achieved with much simpler tweaking of the dask backend.
Closes # (if applicable).
Changes proposed in this Pull Request
Add a chunk-based streaming execution backend for convert_and_aggregate that processes weather-to-energy conversions one time-chunk at a time, reducing peak memory for large cutouts.

Key changes:

- atlite/aggregate.py: Extract shared helpers (resolve_matrix, normalize_aggregate_time, reduce_time, wrap_matrix_result, finalize_aggregated_result) used by both the dask and streaming paths.
- atlite/streaming.py: New streaming backend that reads storage-aligned chunks eagerly, applies the convert function, and either multiplies by a sparse matrix or accumulates into a temporal reducer, never materialising the full (time, y, x) grid.
- atlite/convert.py: New backend parameter ("auto", "dask", "streaming") on convert_and_aggregate; body slimmed down using the shared helpers.
- atlite/cutout.py: Auto-detect storage-aligned chunk sizes when opening existing cutouts.

Checklist
- … doc.
- Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
- A note in doc/release_notes.rst of the upcoming release is included.
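As an aside on the backend parameter described above, a minimal sketch of how the "auto" choice might dispatch. The logic is an assumption based on the PR description, not taken from the actual code:

```python
def select_backend(backend="auto", has_dask_chunks=True):
    """Hypothetical dispatch for convert_and_aggregate's backend
    parameter: 'auto' picks dask when the cutout data is dask-chunked,
    otherwise falls back to the streaming path."""
    if backend == "auto":
        return "dask" if has_dask_chunks else "streaming"
    if backend in ("dask", "streaming"):
        return backend
    raise ValueError(f"unknown backend: {backend!r}")
```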