Icechunk

This example shows how to perform large-scale distributed writes to Icechunk using Cubed (based on the examples for using Icechunk with Dask).

Install the prerequisites by running the following:

pip install cubed icechunk

Start by creating an Icechunk store.

import icechunk
import tempfile

# initialize the icechunk store in a fresh temporary directory
# (mkdtemp keeps the directory on disk until you remove it yourself)
storage = icechunk.local_filesystem_storage(tempfile.mkdtemp())
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")
  2025-12-12T12:06:18.931979Z  WARN icechunk::storage::object_store: The LocalFileSystem storage is not safe for concurrent commits. If more than one thread/process will attempt to commit at the same time, prefer using object stores.
    at icechunk/src/storage/object_store.rs:80

Write to Icechunk

Use cubed.icechunk.store_icechunk to write a Cubed array to an Icechunk store. The API follows that of cubed.store().

First create a Cubed array to write:

import cubed
shape = (100, 100)
cubed_chunks = (20, 20)
cubed_array = cubed.random.random(shape, chunks=cubed_chunks)

Now create the Zarr array you will write to.

import zarr

zarr_chunks = (10, 10)
group = zarr.group(store=session.store, overwrite=True)

zarray = group.create_array(
    "array",
    shape=shape,
    chunks=zarr_chunks,
    dtype="f8",
    fill_value=float("nan"),
)
session.commit("initialize array")
'6QPN1475EP905XN9MDYG'

Note that the Zarr chunk shape in the store, (10, 10), evenly divides the Cubed chunk shape, (20, 20). Each write task therefore covers whole Zarr chunks that no other task touches, so the tasks are independent and will not conflict. It is your responsibility to ensure that such conflicts are avoided.
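The alignment condition can be checked before writing. The following is a minimal sketch in plain Python, not part of the Cubed or Icechunk APIs: the helper names `chunks_are_aligned` and `zarr_chunks_per_task` are hypothetical, and the sketch assumes regular (uniform) chunking on every axis. It tests divisibility and then counts which Zarr chunks each Cubed block would touch.

```python
from itertools import product
from math import ceil


def chunks_are_aligned(zarr_chunks, cubed_chunks):
    """Return True if each Cubed chunk edge is a whole multiple of the Zarr chunk edge."""
    return all(c % z == 0 for z, c in zip(zarr_chunks, cubed_chunks))


def zarr_chunks_per_task(shape, zarr_chunks, cubed_chunks):
    """Map each Cubed block index to the set of Zarr chunk indices it overlaps."""
    blocks_per_axis = [ceil(s / c) for s, c in zip(shape, cubed_chunks)]
    touched = {}
    for block in product(*(range(n) for n in blocks_per_axis)):
        axis_ranges = []
        for b, s, z, c in zip(block, shape, zarr_chunks, cubed_chunks):
            start = b * c
            stop = min(start + c, s)  # the last block on an axis may be shorter
            axis_ranges.append(range(start // z, (stop - 1) // z + 1))
        touched[block] = set(product(*axis_ranges))
    return touched


# Values taken from this example
shape, zarr_chunks, cubed_chunks = (100, 100), (10, 10), (20, 20)

aligned = chunks_are_aligned(zarr_chunks, cubed_chunks)
touched = zarr_chunks_per_task(shape, zarr_chunks, cubed_chunks)

# If no Zarr chunk is shared between tasks, these two counts are equal
total_touched = sum(len(v) for v in touched.values())
distinct_touched = len(set().union(*touched.values()))
print(aligned, len(touched), total_touched, distinct_touched)
```

When the total and distinct counts differ, at least one Zarr chunk would be written by more than one task, which is the situation to avoid.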

Remember to fork the session before re-opening the Zarr array. store_icechunk merges all the remote write sessions on the cluster and returns a single merged ForkSession.

from cubed.icechunk import store_icechunk

session = repo.writable_session("main")
fork = session.fork()
zarray = zarr.open_array(fork.store, path="array")
remote_session = store_icechunk(
    sources=[cubed_array],
    targets=[zarray]
)

Merge the remote session into the local session:

session.merge(remote_session)

Finally commit your changes!

print(session.commit("wrote a cubed array!"))
7Z5WJV09HMP6KH633SQ0

Read from Icechunk

Use cubed.from_zarr() to read from Icechunk. No Icechunk-specific function is needed in this case.

cubed.from_zarr(store=session.store, path="array")
cubed.Array<array-005, shape=(100, 100), dtype=float64, chunks=((10, 10, 10, 10, 10, 10, 10, 10, 10, 10), (10, 10, 10, 10, 10, 10, 10, 10, 10, 10))>
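The `chunks` value in the output above lists the length of every chunk along each axis, and here it reflects the (10, 10) Zarr chunks of the stored array. As a rough sketch of how that notation relates to a shape and a per-axis chunk size (the helper `expand_chunks` is hypothetical and is not how Cubed computes it internally):

```python
def expand_chunks(shape, chunk_shape):
    """Expand a per-axis chunk size into explicit per-axis tuples of chunk lengths."""
    expanded = []
    for size, chunk in zip(shape, chunk_shape):
        full, remainder = divmod(size, chunk)
        # A trailing shorter chunk appears only when the axis is not evenly divided
        lengths = (chunk,) * full + ((remainder,) if remainder else ())
        expanded.append(lengths)
    return tuple(expanded)


expanded = expand_chunks((100, 100), (10, 10))
print(expanded)
```

For the array in this example, each axis expands to ten chunks of length 10, matching the representation shown above.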