Zarr
Cubed was designed to work seamlessly with Zarr data. The examples below demonstrate using cubed.from_zarr(), cubed.to_zarr() and cubed.store() to read and write Zarr data.
Write to Zarr
We’ll start by creating a small chunked array of random data in Cubed and writing it to Zarr with cubed.to_zarr(). Note that to_zarr executes eagerly: the array is computed and written as soon as the call is made.
import cubed
import cubed.random
# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
# write to Zarr
cubed.to_zarr(a, "a.zarr")
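The "# 2MB chunks" comment follows from the chunk shape and the dtype: cubed.random.random produces float64 values, which take 8 bytes each. A minimal plain-Python sketch of that arithmetic is below; the helper names `chunk_nbytes` and `num_chunks` are hypothetical and not part of Cubed.

```python
import math


def chunk_nbytes(chunks, itemsize):
    """Bytes in one full chunk: number of elements times bytes per element."""
    return math.prod(chunks) * itemsize


def num_chunks(shape, chunks):
    """Total chunk count: ceil(dim / chunk) per dimension, multiplied together."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# float64 has an itemsize of 8 bytes
print(chunk_nbytes((500, 500), itemsize=8))  # 2000000 bytes, i.e. 2 MB
print(num_chunks((5000, 5000), (500, 500)))  # 100
```

By the same arithmetic, the full (5000, 5000) array is 200 MB spread across 100 chunks.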
Read from Zarr
We can check that the Zarr array was written by loading it from disk with cubed.from_zarr():
cubed.from_zarr("a.zarr")
cubed.Array<array-005, shape=(5000, 5000), dtype=float64, chunks=((500, 500, 500, 500, 500, 500, 500, 500, 500, 500), (500, 500, 500, 500, 500, 500, 500, 500, 500, 500))>
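The repr lists chunks as one tuple of block sizes per dimension, rather than the (500, 500) passed when the array was created. A minimal pure-Python sketch of that expansion follows, assuming regular chunking where only the last block in each dimension may be smaller. The helper name `expand_chunks` is hypothetical; Cubed performs its own normalization internally.

```python
def expand_chunks(shape, chunks):
    """Expand a per-dimension chunk size into explicit block sizes.

    Each dimension is split into full blocks of the requested size,
    plus one smaller trailing block if the size does not divide evenly.
    """
    result = []
    for size, chunk in zip(shape, chunks):
        full, remainder = divmod(size, chunk)
        blocks = (chunk,) * full
        if remainder:
            blocks += (remainder,)
        result.append(blocks)
    return tuple(result)


print(expand_chunks((5000, 5000), (500, 500)))  # ten 500s in each dimension
print(expand_chunks((1200,), (500,)))  # ((500, 500, 200),)
```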
Multiple arrays
To write multiple arrays in a single computation, use cubed.store():
import cubed
import cubed.random
# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
b = cubed.random.random((5000, 5000), chunks=(500, 500))
# write to Zarr
arrays = [a, b]
paths = ["a.zarr", "b.zarr"]
cubed.store(arrays, paths)
To read the arrays back, we call cubed.from_zarr() on each Zarr store and then apply whatever array operations we like to them. These operations are lazy: the whole computation is executed only when to_zarr is called.
import cubed
import cubed.array_api as xp
# read from Zarr
a = cubed.from_zarr("a.zarr")
b = cubed.from_zarr("b.zarr")
# perform operation
c = xp.add(a, b)
# write to Zarr
cubed.to_zarr(c, store="c.zarr")