Cubed: an introduction#
Tom White, August 2025
Idea#
Use Zarr as the underlying intermediate persistent storage between array operations.
Tasks operate on Zarr chunks.
Tasks are embarrassingly parallel, and their runtime memory can be tightly controlled.
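The idea can be sketched in plain Python, with a dict standing in for a Zarr store (the store and task names here are illustrative, not Cubed's internals):

```python
# A dict stands in for a Zarr store: keys are chunk indices, values are chunks.
# Illustrative sketch only -- real Cubed uses Zarr arrays and real executors.
input_store = {0: [1, 2], 1: [3, 4], 2: [5, 6]}  # a 1-D array in chunks of 2
output_store = {}

def task(chunk_index):
    """One embarrassingly parallel task: read one chunk, compute, write one chunk.

    Memory use is bounded by the chunk size, independent of the array size.
    """
    chunk = input_store[chunk_index]      # read a chunk from persistent storage
    result = [x + 1 for x in chunk]       # elementwise work on that chunk only
    output_store[chunk_index] = result    # persist the result chunk

# Tasks share no state, so an executor could run them in any order, in parallel.
for i in input_store:
    task(i)

print(output_store)  # {0: [2, 3], 1: [4, 5], 2: [6, 7]}
```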
Demo#
Cubed implements the Python Array API standard:

import cubed.array_api as xp
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))
Notice that we specify chunks, just like in Dask Array.
b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2))
c = xp.add(a, b)
Cubed uses lazy evaluation, so nothing has been computed yet.
c.compute()
array([[ 2, 3, 4],
[ 5, 6, 7],
[ 8, 9, 10]])
Primitives#
Blockwise: applies a function to multiple blocks from multiple inputs
Rechunk: changes chunking, without changing shape/dtype
Dask introduced both of these operations.
Almost all array operations can be implemented using these two primitives!
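A minimal sketch of the two primitives on 1-D lists (illustrative only, not the real Cubed API):

```python
# Illustrative sketch of the two primitives on 1-D chunked lists.

def blockwise(func, *chunked_inputs):
    """Apply func to corresponding blocks of each input, block by block."""
    return [func(*blocks) for blocks in zip(*chunked_inputs)]

def rechunk(chunks, new_size):
    """Change the chunking without changing the underlying data."""
    flat = [x for chunk in chunks for x in chunk]
    return [flat[i:i + new_size] for i in range(0, len(flat), new_size)]

a = [[1, 2], [3, 4], [5, 6]]        # chunks of 2
b = [[10, 20], [30, 40], [50, 60]]

# Elementwise add is blockwise over pairs of corresponding blocks:
c = blockwise(lambda x, y: [i + j for i, j in zip(x, y)], a, b)
print(c)               # [[11, 22], [33, 44], [55, 66]]

# Rechunk from size-2 to size-3 chunks -- same values, new block structure:
print(rechunk(c, 3))   # [[11, 22, 33], [44, 55, 66]]
```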
Design#
Cubed is composed of five layers, from the storage layer at the bottom to the Array API layer at the top.
Core and Primitive Operations#
Example: map_selection#
Each block in the output array is read directly from one or more blocks of the input.
Selections can cross block boundaries.
Example: reduction#
Implemented using multiple rounds of a tree reduce operation followed by a final aggregation.
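The tree-reduce idea can be sketched as follows (an illustrative sketch, not Cubed's implementation): partial results are combined in rounds, a pair at a time, so no single task ever holds more than a couple of partials in memory.

```python
# Illustrative tree reduce: combine partial results in rounds, pairs at a time.
def tree_reduce(partials, combine):
    while len(partials) > 1:
        # One round: each task combines one pair (an odd one passes through).
        paired = [partials[i:i + 2] for i in range(0, len(partials), 2)]
        partials = [combine(*p) if len(p) == 2 else p[0] for p in paired]
    return partials[0]  # the final aggregation result

# Per-chunk partial sums, then the tree combine.
chunk_sums = [sum(chunk) for chunk in [[1, 2], [3, 4], [5, 6], [7, 8]]]
print(tree_reduce(chunk_sums, lambda x, y: x + y))  # 36
```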
Computation plan#
Cubed creates a computation plan, which is a directed acyclic graph (DAG).
c = xp.add(a, b)
c.visualize()
Unlike a Dask graph which is at the task level, a Cubed graph is at the Zarr array level.
Optimization#
Cubed automatically optimizes the graph before computing it, for example by fusing blockwise (map) operations.
Optimization: an advanced example#
In early 2024 we implemented further optimizations that gave a 4.8x performance improvement on the “Quadratic Means” climate workload running on Lithops with AWS Lambda, with a 1.5 TB workload completing in around 100 seconds.
More details in Optimizing Cubed.
Quadratic Means timeline#

Memory#
Cubed models the memory used by every operation, and calculates projected_mem for each task, an upper bound on its runtime memory.
If the projected memory exceeds what the user has allowed, an exception is raised during planning:
import cubed
spec = cubed.Spec(work_dir="tmp", allowed_mem=100) # not enough memory!
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2), spec=spec)
b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2), spec=spec)
try:
c = xp.add(a, b)
except ValueError as e:
print(e)
Projected blockwise memory (192) exceeds allowed_mem (100), including reserved_mem (0)
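As a rough illustration of the arithmetic behind that message (illustrative only, not Cubed's exact accounting): each (2, 2) chunk of int64 values occupies 32 bytes, and the projection budgets for several chunk-sized buffers (inputs, output, and compression buffers), so even this tiny add projects 192 bytes.

```python
# Illustrative arithmetic only -- not Cubed's exact memory accounting.
chunk_bytes = 2 * 2 * 8          # a (2, 2) chunk of int64 values: 32 bytes
projected = 6 * chunk_bytes      # several chunk-sized buffers are budgeted for
allowed_mem = 100

print(projected)                 # 192, as in the error message above
print(projected > allowed_mem)   # True, so planning fails before any task runs
```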
Peak memory#
Cubed measures the peak amount of memory actually used during runtime.
This is used to check utilization and to improve the memory model.
   array_name    op_name  num_tasks  peak_mem_delta_mb_max  projected_mem_mb  utilization
0   array-003    rechunk          1             103.727104          0.000064          NaN
1   array-004  blockwise          4             654.286848        800.000008     0.817859
2   array-007    rechunk          1             103.645184          0.000064          NaN
3   array-008  blockwise          4             654.364672        800.000008     0.817956
4   array-009  blockwise          4             796.954624       1200.000000     0.664129
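The utilization column is simply peak memory divided by projected memory. Recomputing the first blockwise row of the table above:

```python
# Utilization compares what a task actually used against what was projected.
# Values taken from the first blockwise row of the table above.
peak_mem_mb = 654.286848
projected_mem_mb = 800.000008

utilization = peak_mem_mb / projected_mem_mb
print(round(utilization, 6))  # 0.817859 -- matches the table
```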
Runtimes#
Local machine
Start here to process hundreds of GB on your laptop using all available cores
Serverless
Lithops: multi-cloud serverless computing framework
Slightly more work to get started since you have to build a runtime environment first
Tested on AWS Lambda and Google Cloud Functions with ~1000 workers
Modal: a commercial serverless platform
Very easy to set up since it builds the runtime automatically
Tested with ~300 workers
Coiled Functions
Cluster/HPC
Ray
Apache Beam (Google Cloud Dataflow)
Globus Compute
Apache Spark
Scalability and robustness#
Serverless scales out
AWS Lambda supports 1000 concurrent instances by default
PyWren paper: https://shivaram.org/publications/pywren-socc17.pdf
Retries
Each task is tried three times before failing
Stragglers
A backup task will be launched if a task is taking significantly longer than average
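A straggler heuristic of this kind can be sketched as follows (the function name and threshold are hypothetical, not Cubed's internals):

```python
# Illustrative straggler heuristic -- names and threshold are made up.
def needs_backup(elapsed, completed_durations, slowdown_factor=3.0):
    """Return True if a running task looks like a straggler."""
    if not completed_durations:
        return False  # nothing to compare against yet
    average = sum(completed_durations) / len(completed_durations)
    return elapsed > slowdown_factor * average

completed = [1.0, 1.2, 0.9, 1.1]      # durations of finished tasks, in seconds
print(needs_backup(2.0, completed))   # False -- within normal variation
print(needs_backup(10.0, completed))  # True -- launch a backup; first to finish wins
```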
Xarray integration#
Xarray can use Cubed as its computation engine instead of Dask
Just install the cubed-xarray integration package
Cubed can use Flox for groupby operations.
Examples at https://flox.readthedocs.io/en/latest/user-stories/climatology-hourly-cubed.html
Try out Cubed!#
Try it out on your use case
Get started at https://cubed-dev.github.io/cubed/
Some examples from the Pangeo community:
https://github.com/pangeo-data/distributed-array-examples