Cubed: an introduction#

Tom White, August 2025

Idea#

Use Zarr as the underlying intermediate persistent storage between array operations.

Cubed idea

Tasks operate on Zarr chunks.

Tasks are embarrassingly parallel, and their runtime memory can be tightly controlled.

Demo#

Cubed implements the Python Array API standard:

import cubed.array_api as xp
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))

Notice that we specify chunks, just like in Dask Array.

b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2))
c = xp.add(a, b)

Cubed uses lazy evaluation, so nothing has been computed yet.

c.compute()
array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10]])

Primitives#

  • Blockwise: applies a function to multiple blocks from multiple inputs

  • Rechunk: changes chunking, without changing shape/dtype

Dask introduced both of these operations.

Almost all array operations can be implemented using these two primitives!
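The blockwise idea can be sketched in plain Python (a toy model, not Cubed's implementation): a chunked array is represented as a dict from block index to block, and the function is applied to corresponding blocks of each input, so every output block is an independent, embarrassingly parallel task.

```python
# Toy model: a "chunked array" is a dict mapping block indices to blocks
# (each block here is a small list of lists).
def blockwise(func, *inputs):
    """Apply func to corresponding blocks of each input, one task per block."""
    return {idx: func(*(arr[idx] for arr in inputs)) for idx in inputs[0]}

def add_blocks(x, y):
    """Elementwise add of two equally shaped blocks."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

a = {(0, 0): [[1, 2], [4, 5]], (0, 1): [[3], [6]]}
b = {(0, 0): [[1, 1], [1, 1]], (0, 1): [[1], [1]]}
c = blockwise(add_blocks, a, b)
print(c[(0, 0)])  # [[2, 3], [5, 6]]
```

In Cubed the blocks are Zarr chunks read from and written to persistent storage, rather than in-memory dict entries.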

Design#

Cubed is composed of five layers, from the storage layer at the bottom to the Array API layer at the top:

Five layer diagram

Core and Primitive Operations#

Core and Primitive Operations

Example: map_selection#

Each block in the output array is read directly from one or more blocks of the input.

Selections can cross block boundaries.
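The bookkeeping behind this can be sketched for the 1-D case (illustrative only, not Cubed's actual code): given the selection an output block needs from the input, compute which input blocks it touches.

```python
def blocks_for_selection(start, stop, chunk_size):
    """Input block indices covered by the half-open selection [start, stop)."""
    return list(range(start // chunk_size, (stop - 1) // chunk_size + 1))

# A selection that stays within one block of size 3:
print(blocks_for_selection(0, 2, 3))  # [0]
# A selection that crosses a block boundary needs two input blocks:
print(blocks_for_selection(2, 5, 3))  # [0, 1]
```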

Example: reduction#

Implemented using multiple rounds of a tree reduce operation followed by a final aggregation.
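The tree reduce can be sketched in plain Python (a conceptual model, not Cubed's implementation): each chunk produces a partial result, then rounds of combining shrink the list of partials until one value remains, with a final aggregation at the end.

```python
from functools import reduce

def tree_reduce(partials, combine, fanin=2):
    """Repeatedly combine groups of `fanin` partial results until one remains."""
    while len(partials) > 1:
        partials = [
            reduce(combine, partials[i:i + fanin])
            for i in range(0, len(partials), fanin)
        ]
    return partials[0]

# Per-chunk partial sums, then rounds of pairwise combination.
partials = [sum(chunk) for chunk in [[1, 2], [3, 4], [5, 6], [7, 8]]]
total = tree_reduce(partials, lambda x, y: x + y)
print(total)  # 36
```

Each round is itself a blockwise operation, which is why reduction needs no primitive beyond the two above.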

Computation plan#

Cubed creates a computation plan, which is a directed acyclic graph (DAG).

c = xp.add(a, b)
c.visualize()

Unlike a Dask graph, which is at the task level, a Cubed graph is at the Zarr array level.

Optimization#

Cubed automatically optimizes the graph before computing it, for example by fusing consecutive blockwise (map) operations.
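A minimal sketch of map fusion (illustrative only, not Cubed's optimizer): two consecutive blockwise steps can be replaced by one step that applies the composed function, avoiding a round trip of intermediate data through storage.

```python
def fuse(f, g):
    """Replace the pipeline block -> f -> g with a single fused step."""
    return lambda block: g(f(block))

double = lambda block: [x * 2 for x in block]
inc = lambda block: [x + 1 for x in block]

fused = fuse(double, inc)
print(fused([1, 2, 3]))  # [3, 5, 7]
```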

Optimization: an advanced example#

In early 2024 we implemented more optimizations, giving a 4.8x performance improvement on the “Quadratic Means” climate workload running on Lithops with AWS Lambda: a 1.5 TB workload completed in around 100 seconds.

More details in Optimizing Cubed.

Quadratic Means timeline#

QM timeline

Memory#

Cubed models the memory used by every operation, and calculates projected_mem for each task, an upper bound on the memory it will use.

If the projected memory exceeds the user-specified allowed memory, an exception is raised during planning:

import cubed
spec = cubed.Spec(work_dir="tmp", allowed_mem=100)  # not enough memory!
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2), spec=spec)
b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2), spec=spec)
try:
    c = xp.add(a, b)
except ValueError as e:
    print(e)
Projected blockwise memory (192) exceeds allowed_mem (100), including reserved_mem (0)
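One way to arrive at the 192 bytes in the message (an illustrative accounting, not necessarily Cubed's exact model): a blockwise add task touches two input chunks and one output chunk, and we assume each chunk moved through Zarr is counted twice (e.g. a compressed copy plus an uncompressed copy in memory).

```python
chunk_bytes = 2 * 2 * 8  # one (2, 2) chunk of int64 = 32 bytes

n_chunks = 2 + 1  # two inputs and one output per blockwise task
copies_per_chunk = 2  # assumed: each chunk buffered twice during Zarr I/O

projected = n_chunks * chunk_bytes * copies_per_chunk
print(projected)  # 192
```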

Peak memory#

Cubed measures the peak amount of memory actually used during runtime.

This is used to check utilization, and to improve the memory model.

  array_name    op_name  num_tasks  peak_mem_delta_mb_max  projected_mem_mb  utilization
0  array-003    rechunk          1             103.727104         0.000064          NaN
1  array-004  blockwise          4             654.286848       800.000008     0.817859
2  array-007    rechunk          1             103.645184         0.000064          NaN
3  array-008  blockwise          4             654.364672       800.000008     0.817956
4  array-009  blockwise          4             796.954624      1200.000000     0.664129
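The utilization column is simply the measured peak divided by the projection; for the first blockwise row of the table above:

```python
peak_mem_delta_mb_max = 654.286848
projected_mem_mb = 800.000008

utilization = peak_mem_delta_mb_max / projected_mem_mb
print(round(utilization, 6))  # 0.817859
```

A utilization well below 1 suggests the projection is a safe but loose upper bound for that operation.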

Runtimes#

  • Local machine

    • Start here to process hundreds of GB on your laptop using all available cores

  • Serverless

    • Lithops: multi-cloud serverless computing framework

      • Slightly more work to get started since you have to build a runtime environment first

      • Tested on AWS Lambda and Google Cloud Functions with ~1000 workers

    • Modal: a commercial serverless platform

      • Very easy to set up since it builds the runtime automatically

      • Tested with ~300 workers

    • Coiled Functions

  • Cluster/HPC

    • Ray

    • Apache Beam (Google Cloud Dataflow)

    • Globus Compute

    • Apache Spark

Scalability and robustness#

  • Serverless scales out

  • Retries

    • Each task is tried three times before failing

  • Stragglers

    • A backup task will be launched if a task is taking significantly longer than average
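The straggler heuristic can be sketched as follows (illustrative only, with made-up threshold values rather than Cubed's actual parameters): a backup is launched when a running task's elapsed time far exceeds the average duration of tasks that have already completed.

```python
def needs_backup(elapsed, completed_durations, factor=3.0, min_completed=3):
    """Back up a task whose elapsed time far exceeds the average so far.

    `factor` and `min_completed` are hypothetical tuning parameters.
    """
    if len(completed_durations) < min_completed:
        return False  # not enough data to judge what "normal" looks like
    avg = sum(completed_durations) / len(completed_durations)
    return elapsed > factor * avg

print(needs_backup(2.0, [1.0, 1.1, 0.9]))   # False
print(needs_backup(10.0, [1.0, 1.1, 0.9]))  # True
```

Whichever of the original task and its backup finishes first wins; because tasks write independent Zarr chunks, duplicated work is harmless.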

Xarray integration#

  • Xarray can use Cubed as its computation engine instead of Dask

  • Cubed can use Flox for groupby operations

    • Examples at https://flox.readthedocs.io/en/latest/user-stories/climatology-hourly-cubed.html

Try out Cubed!#

  • Try it out on your use case

    • Get started at https://cubed-dev.github.io/cubed/

  • Some examples from the Pangeo community:

    • https://github.com/pangeo-data/distributed-array-examples