Skip to content

Attempt at using pyKrige.ok3d with xarray.apply_ufunc to reduce memory issues #284

@MDTocean

Description

@MDTocean

Hiya

I am attempting to use OrdinaryKriging3D with a large set of temperature measurements. These are scattered measurements taken at (longitude,latitude,time) positions, resulting in four very long 1D vector strings. I would then like to interpolate this onto a regular 3D grid of (lon_reg,lat_reg,time_reg) using 3D kriging. This appears to require huge amounts of memory, requiring an array of size len(lon_reg)*len(lat_reg)*len(time_reg)*len(temperature).

A number of other posts have reported on memory issues. To try and get around the large memory requirements, I have been attempting to treat the data arrays using lazy operations and calling the pyKrige commands from within xarray's apply_ufunc functionality. However, I am still running into memory issues.

I have pasted some example code below using randomised data. Can anyone comment on whether I am making a mistake, either in the data creation or in the apply_ufunc call, or whether the problem is more fundamental to the kriging operation itself -- for example, are kriging methods simply incompatible with the concept of dask chunks and lazy operations? The code works on small datasets (e.g. if sample_length is set to 300), but I continue to run out of memory for longer sample lengths (e.g. of about 2000).

Note that I have also tried running the code below on a dask cluster connected to multiple CPUs, but it didn't solve the memory problems.

Thank you for any advice.


import numpy as np
import xarray as xr
from pykrige.ok3d import OrdinaryKriging3D

# create random datasets
sample_length=2000
da_time=xr.DataArray(data=np.arange(0,sample_length),coords=dict(time=np.arange(0,sample_length))).chunk(chunks={"time" : 100})
da_lat=xr.DataArray(data=np.random.uniform(-12, 3, size=sample_length),coords=dict(time=da_time)).chunk(chunks={"time" : 100})
da_lon=xr.DataArray(data=np.random.uniform(45, 62, size=sample_length),coords=dict(time=da_time)).chunk(chunks={"time" : 100})
da_temp=xr.DataArray(data=np.random.rand(sample_length)+18,coords=dict(time=da_time)).chunk(chunks={"time" : 100})

# Define the function to apply to the dataset
def kriging_3d(da_lon,da_lat,da_time,da_temp):

    xi = np.linspace(-12, 3, 91)
    yi = np.linspace(45, 62, 103)
    zi = np.arange(np.min(da_time),np.max(da_time))

    # Create the 3D Kriging object
    OK3D = OrdinaryKriging3D(da_lon, da_lat, da_time, da_temp, variogram_model='linear')
    # Execute on grid
    out, ss = OK3D.execute('grid', xi, yi, zi)

    # convert the output into an xarray object
    out = xr.DataArray(out, coords=[("zi", zi), ("yi", yi), ("xi", xi)])

    return out

out = xr.apply_ufunc(
    kriging_3d,
    da_lon,da_lat,da_time,da_temp,
    input_core_dims=[["time"],["time"],["time"],["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask = 'allowed', 
    vectorize = True,
    )

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions