Zarr

Cubed was designed to work seamlessly with Zarr data. The examples below demonstrate using cubed.from_zarr(), cubed.to_zarr() and cubed.store() to read and write Zarr data.

Write to Zarr

We’ll start by creating a small chunked array of random data in Cubed and writing it to Zarr using cubed.to_zarr(). Note that cubed.to_zarr() executes eagerly: the array is computed and written as soon as the call is made.

import cubed
import cubed.random

# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))

# write to Zarr
cubed.to_zarr(a, "a.zarr")
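
The "2MB chunks" comment above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, assuming 8-byte float64 elements (the dtype shown when the array is read back); the variable names are illustrative and not part of Cubed.

```python
import math

shape = (5000, 5000)
chunks = (500, 500)
itemsize = 8  # assumption: float64, 8 bytes per element

# Size of one chunk in bytes
bytes_per_chunk = math.prod(chunks) * itemsize

# Number of chunks along each dimension, and in total
chunks_per_dim = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
num_chunks = math.prod(chunks_per_dim)

# Uncompressed size of the whole array in bytes
total_bytes = math.prod(shape) * itemsize

print(bytes_per_chunk)  # 2000000
print(num_chunks)       # 100
print(total_bytes)      # 200000000
```

So the array consists of 100 chunks of 2 MB each, or 200 MB uncompressed. The size on disk may differ if the Zarr store applies compression.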

Read from Zarr

We can check that the Zarr store was created by loading it from disk using cubed.from_zarr():

cubed.from_zarr("a.zarr")
cubed.Array<array-005, shape=(5000, 5000), dtype=float64, chunks=((500, 500, 500, 500, 500, 500, 500, 500, 500, 500), (500, 500, 500, 500, 500, 500, 500, 500, 500, 500))>
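
The repr above lists the chunk sizes along each dimension explicitly, rather than the single (500, 500) chunk shape passed when the array was created. The sketch below shows how a regular chunk shape expands into that per-dimension form. `expand_chunks` is a hypothetical helper written for illustration, not a Cubed function, and it assumes that any leftover elements go into a smaller final chunk.

```python
def expand_chunks(shape, chunks):
    """Expand a regular chunk shape into explicit per-dimension chunk sizes.

    Hypothetical helper for illustration only; not part of Cubed.
    """
    expanded = []
    for size, chunk in zip(shape, chunks):
        full, remainder = divmod(size, chunk)
        # `full` chunks of the regular size, plus one smaller chunk if needed
        dim = (chunk,) * full + ((remainder,) if remainder else ())
        expanded.append(dim)
    return tuple(expanded)


result = expand_chunks((5000, 5000), (500, 500))
print(len(result[0]), len(result[1]))  # 10 10
print(expand_chunks((1200,), (500,)))  # ((500, 500, 200),)
```

For the array above, each dimension expands to ten chunks of 500 elements, matching the repr.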

Multiple arrays

To write multiple arrays in a single computation, use cubed.store():

import cubed
import cubed.random

# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
b = cubed.random.random((5000, 5000), chunks=(500, 500))

# write to Zarr
arrays = [a, b]
paths = ["a.zarr", "b.zarr"]
cubed.store(arrays, paths)

To read the arrays back, we call cubed.from_zarr() on each Zarr store and perform whatever array operations we like on the results. These steps are lazy: the whole computation is executed only when we call cubed.to_zarr().

import cubed.array_api as xp

# read from Zarr
a = cubed.from_zarr("a.zarr")
b = cubed.from_zarr("b.zarr")

# perform operation
c = xp.add(a, b)

# write to Zarr
cubed.to_zarr(c, store="c.zarr")
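
The read, add, write sequence above relies on deferred execution: building `c` only describes the work, and the final write triggers it. The following toy sketch illustrates that idea in plain Python. It is not how Cubed is implemented; `LazyValue`, `load`, and `add` are hypothetical names invented for this example, and small lists stand in for arrays.

```python
class LazyValue:
    """Records a function and its inputs; runs nothing until compute()."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs

    def compute(self):
        # Evaluate lazy inputs first, then apply this node's function
        args = [i.compute() if isinstance(i, LazyValue) else i for i in self.inputs]
        return self.func(*args)


calls = []  # log of the work that has actually been performed


def load(name):
    calls.append(f"load {name}")
    return [1.0, 2.0, 3.0]


def add(x, y):
    calls.append("add")
    return [p + q for p, q in zip(x, y)]


# Building the expression performs no work
a = LazyValue(load, "a")
b = LazyValue(load, "b")
c = LazyValue(add, a, b)
print(calls)  # []

# Asking for the result runs the whole chain
result = c.compute()
print(calls)   # ['load a', 'load b', 'add']
print(result)  # [2.0, 4.0, 6.0]
```

In the Cubed example, cubed.to_zarr() plays the role of `compute()` here: it is the step that causes the reads and the addition to run.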