FutureCancelledError (lost dependencies) during dask.compute with optimize_graph=True when chaining Dataset.assign #11329

@maneesh29s

What happened?

It appears that the High-Level Graph (HLG) optimization fails to correctly resolve dependencies when a variable (like new_weight in the example) is used both as an input for a subsequent calculation and as a replacement variable in an intermediate Dataset state.
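
To make the hypothesis concrete, here is a minimal toy sketch of what a "lost" or "missing" dependency means at the task-graph level. It uses plain dicts, not dask's HighLevelGraph or its optimizer. The key names, the `missing_dependencies` helper, and the faulty pass that drops `new_weight` are all hypothetical illustrations. The actual cause inside dask or xarray has not been established here.

```python
# Toy model only: plain dicts, NOT dask's HighLevelGraph or its optimizer.
# All key names and the "faulty pass" below are hypothetical illustrations.

def missing_dependencies(graph):
    """Map each key that is referenced but absent from the graph to its dependents."""
    missing = {}
    for key, (_description, deps) in graph.items():
        for dep in deps:
            if dep not in graph:
                missing.setdefault(dep, set()).add(key)
    return missing


# Each entry maps a task key to (description, list of dependency keys).
full_graph = {
    "gain": ("load", []),
    "weight": ("load", []),
    "new_weight": ("weight * 1.1", ["weight"]),
    "new_gain": ("gain.where(new_weight > 0.5, 0.0)", ["gain", "new_weight"]),
    "finalize": ("assemble dataset", ["new_weight", "new_gain"]),
}

# Hypothetical faulty optimization: "new_weight" is dropped even though
# "new_gain" and "finalize" still reference it.
broken_graph = {k: v for k, v in full_graph.items() if k != "new_weight"}

print(missing_dependencies(full_graph))    # no missing keys in the intact graph
print(missing_dependencies(broken_graph))  # "new_weight" is reported with its dependents
```

In the real traceback the missing key starts with mul, which would be consistent with the multiplication that produces new_weight, although that reading is an inference from the key name only.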

Exception raised in the failing scenario (when run with a distributed client):

---------------------------------------------------------------------------
FutureCancelledError                      Traceback (most recent call last)
Cell In[67], line 26
     22 output_gaintable = output_gaintable.assign(gain=new_gain)
     24 # trigger computation
---> 26 dask.compute(output_gaintable, optimize_graph=True) # Fail

File /lib/python3.11/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File /lib/python3.11/site-packages/distributed/client.py:2431, in Client._gather(self, futures, errors, direct, local_worker)
   2429     exception = st.exception
   2430     traceback = st.traceback
-> 2431     raise exception.with_traceback(traceback)
   2432 if errors == "skip":
   2433     bad_keys.add(key)

FutureCancelledError: finalize-hlgfinalizecompute-0b5dadc6527147a1bffc7006ce7c9329 cancelled for reason: lost dependencies.

What did you expect to happen?

The computation should complete successfully, even with optimize_graph=True.

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

import xarray as xr
xr.show_versions()
# your reproducer code ...

import dask
import dask.array as da

rng = da.random.default_rng(seed=1234)

# Setup small dask-backed dataset
gain = rng.random((100,), chunks=10)
weight = rng.random((100,), chunks=10)
initialtable = xr.Dataset({
    "gain": (("x"), gain),
    "weight": (("x"), weight),
})
original_chunks = initialtable.chunksizes

# Update weight
new_weight = initialtable.weight * 1.1
output_gaintable = initialtable.assign(weight=new_weight)

# Update gain, filtered based on weight
new_gain = initialtable.gain.where(new_weight > 0.5, 0.0)
output_gaintable = output_gaintable.assign(gain=new_gain)

# trigger computation, which FAILs
dask.compute(output_gaintable, optimize_graph=True)

# Other ways to compute, which PASS
dask.compute(new_weight, new_gain, optimize_graph=True)
dask.compute(output_gaintable, optimize_graph=False)[0].gain
dask.persist(output_gaintable, optimize_graph=True)[0].gain.compute()
output_gaintable.compute(optimize_graph=True).gain

Steps to reproduce

Run the script above with uv run.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/home/maneesh/Work/SKAO/ska-sdp-instrumental-calibration/compute_bug.py", line 38, in <module>
    dask.compute(output_gaintable, optimize_graph=True)
  File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/base.py", line 685, in compute
    results = schedule(expr, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/local.py", line 191, in start_state_from_dask
    raise ValueError(
ValueError: Missing dependency ('mul-e4ad8b7f030eed6eae70b41334e6993e', 6) for dependents {'finalize-hlgfinalizecompute-c00f26a73e664e208c485a28c4ea721b'}

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.12 (main, Apr 9 2025, 08:55:54) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3

xarray: 2026.4.1.dev6+g757a7d42a
pandas: 3.0.2
numpy: 2.4.4
scipy: 1.17.1
netCDF4: 1.7.4
pydap: 3.5.9
h5netcdf: 1.8.1
h5py: 3.16.0
zarr: 3.1.6
cftime: 1.6.5
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2026.3.0
distributed: 2026.3.0
matplotlib: 3.10.9
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.4
fsspec: 2026.4.0
cupy: None
pint: None
sparse: 0.18.0
flox: 0.11.2
numpy_groupies: 0.11.3
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Labels: bug, needs triage
