Basic array operations

The following examples show how to run a few basic Array API operations on Cubed arrays.

Adding two small arrays

The first example adds two small 4x4 arrays together. It is a quick way to check that the runtime is working.

import cubed.array_api as xp

if __name__ == "__main__":
    a = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    b = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    c = xp.add(a, b)
    res = c.compute()
    print(res)

Paste the code into a file called add-asarray.py (or download it from GitHub), then run it with:

python add-asarray.py

If the run succeeds, it prints the following 4x4 array:

[[ 2  4  6  8]
 [10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]]
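
To make the chunks=(2, 2) argument concrete, the sketch below shows how a 4x4 array divides into four 2x2 blocks. It is a conceptual illustration in plain Python, not Cubed's internal code; the helper name chunk_slices is hypothetical.

```python
# Conceptual sketch only: plain Python lists, not Cubed's internals.
# chunk_slices is a hypothetical helper introduced for illustration.
def chunk_slices(shape, chunks):
    """Return the (row_slice, col_slice) pairs that tile a 2-D array."""
    rows, cols = shape
    chunk_rows, chunk_cols = chunks
    return [
        (slice(r, min(r + chunk_rows, rows)), slice(c, min(c + chunk_cols, cols)))
        for r in range(0, rows, chunk_rows)
        for c in range(0, cols, chunk_cols)
    ]


data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# Cut the nested list into blocks, one per chunk, in row-major order.
blocks = [
    [row[col_slice] for row in data[row_slice]]
    for row_slice, col_slice in chunk_slices((4, 4), (2, 2))
]

for block in blocks:
    print(block)
```

Each block can be processed independently for an elementwise operation such as add, which is why the chunk shape matters more as the arrays grow.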

Adding two larger arrays

The next example generates two random 20GB arrays and then adds them together.

import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics import ProgressBar
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    b = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    c = xp.add(a, b)

    # use store=None to write to temporary zarr
    with ProgressBar(), HistoryCallback(), TimelineVisualizationCallback():
        cubed.to_zarr(c, store=None)

Paste the code into a file called add-random.py (or download it from GitHub), then run it with:

python add-random.py

This example demonstrates how callbacks can gather information about the computation.

ProgressBar (RichProgressBar in the Matmul example below) shows a progress bar for the computation while it is running.
TimelineVisualizationCallback produces a plot, after the computation has completed, showing the timeline of events in the task lifecycle.
HistoryCallback produces various stats about the computation once it has completed.

The plots and stats are written to a timestamped subdirectory of the history directory. You can open the latest plot with the following command (open is the macOS command; on Linux, xdg-open is the usual equivalent):

open $(ls -d history/compute-* | tail -1)/timeline.svg
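
The sizes quoted for add-random.py (20GB arrays, 200MB chunks) follow from the array and chunk shapes. The sketch below reproduces that arithmetic using only the standard library. It assumes 8-byte (float64) elements, and the helper names nbytes and num_chunks are hypothetical.

```python
from math import ceil, prod

ITEMSIZE = 8  # bytes per element; assumes float64 values (an assumption)


def nbytes(shape, itemsize=ITEMSIZE):
    """Size in bytes of a dense array (or a single chunk) with this shape."""
    return prod(shape) * itemsize


def num_chunks(shape, chunks):
    """Number of chunks needed to tile an array of the given shape."""
    return prod(ceil(s / c) for s, c in zip(shape, chunks))


shape, chunks = (50000, 50000), (5000, 5000)

array_gb = nbytes(shape) / 1e9   # whole array, in decimal gigabytes
chunk_mb = nbytes(chunks) / 1e6  # one chunk, in decimal megabytes
n_chunks = num_chunks(shape, chunks)

print(f"array: {array_gb:.0f} GB, chunk: {chunk_mb:.0f} MB, chunks: {n_chunks}")
```

Running the same functions with shape (25000, 25000) gives the 5GB figure used for the Matmul example.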

Matmul

The next example generates two random 5GB arrays and then multiplies them together. This is a more intensive computation than addition, and will take a few minutes to run locally.

import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.rich import RichProgressBar
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    b = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    c = xp.matmul(a, b)

    progress = RichProgressBar()
    hist = HistoryCallback()
    timeline_viz = TimelineVisualizationCallback()
    # use store=None to write to temporary zarr
    cubed.to_zarr(
        c,
        store=None,
        callbacks=[progress, hist, timeline_viz],
    )

Paste the code into a file called matmul-random.py (or download it from GitHub), then run it with:

python matmul-random.py
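
As a rough illustration of why matmul is more intensive than addition, the sketch below compares textbook arithmetic operation counts for the two examples. It assumes the standard dense algorithm (n multiplications and n - 1 additions per output element) and ignores I/O and scheduling, so it indicates scale only and does not predict runtime.

```python
def add_ops(n):
    """Elementwise addition of two n x n arrays: one addition per element."""
    return n * n


def matmul_ops(n):
    """Textbook dense n x n matmul: each of the n*n outputs needs
    n multiplications and n - 1 additions."""
    return n * n * (2 * n - 1)


add_total = add_ops(50000)        # the add-random.py example
matmul_total = matmul_ops(25000)  # the matmul-random.py example
ratio = matmul_total / add_total

print(f"add: {add_total:.3e} ops, matmul: {matmul_total:.3e} ops")
print(f"matmul needs roughly {ratio:,.0f}x more arithmetic")
```

Although the matmul inputs are a quarter of the size of the addition inputs, the arithmetic grows with the cube of the side length, not the square.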

Trying different executors

You can run these scripts using different executors by setting environment variables to control the Cubed configuration.

For example, this will use the processes executor to run the example:

CUBED_SPEC__EXECUTOR_NAME=processes python add-random.py
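
The variable name encodes a nested configuration key. The sketch below illustrates the naming convention this appears to follow: a CUBED_ prefix, with a double underscore separating nesting levels, so that CUBED_SPEC__EXECUTOR_NAME corresponds to spec.executor_name. It is a standalone illustration of that assumed convention, not the code Cubed uses to load its configuration, and env_to_config is a hypothetical helper.

```python
def env_to_config(environ, prefix="CUBED_"):
    """Turn PREFIX_A__B=value style variables into a nested dict.

    Illustrative only: real configuration loading may special-case some
    variables (for example, one that points at a config file).
    """
    config = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        # Strip the prefix, lowercase, and split on "__" to get the key path.
        keys = name[len(prefix):].lower().split("__")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


example_env = {"CUBED_SPEC__EXECUTOR_NAME": "processes", "PATH": "/usr/bin"}
config = env_to_config(example_env)
print(config)
```

Under this convention, other settings in the same spec group would be set with variables that share the CUBED_SPEC__ prefix.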

For cloud executors, it’s usually best to put all of the configuration in one YAML file, and set the CUBED_CONFIG environment variable to point to it:

export CUBED_CONFIG=/path/to/lithops/aws/cubed.yaml
python add-random.py

The Cubed documentation explains how configuration works in general, and gives detailed steps for running on a particular cloud service.