Basic array operations

The following examples show how to run a few basic Array API operations on Cubed arrays.

Adding two small arrays

The first example adds two small 4x4 arrays together, and is useful for checking that the runtime is working.

import cubed.array_api as xp

if __name__ == "__main__":
    a = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    b = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    c = xp.add(a, b)
    res = c.compute()
    print(res)

Paste the code into a file called add-asarray.py, or download from GitHub, then run with:

python add-asarray.py

If successful, it will print a 4x4 array:

[[ 2  4  6  8]
 [10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]]
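
To make the role of chunks=(2, 2) more concrete, the following pure-Python sketch splits a 4x4 nested list into 2x2 blocks and adds two arrays one block at a time. It only illustrates the idea of chunked elementwise addition and is not how Cubed is implemented; the helper names chunk_slices and chunked_add are hypothetical.

```python
from itertools import product


def chunk_slices(shape, chunks):
    """Yield one (row_slice, col_slice) pair per chunk of a 2-D array."""
    rows, cols = shape
    crows, ccols = chunks
    for r, c in product(range(0, rows, crows), range(0, cols, ccols)):
        yield slice(r, min(r + crows, rows)), slice(c, min(c + ccols, cols))


def chunked_add(a, b, chunks):
    """Add two equally shaped nested lists one chunk at a time."""
    shape = (len(a), len(a[0]))
    out = [[0] * shape[1] for _ in range(shape[0])]
    for rs, cs in chunk_slices(shape, chunks):
        # Each output chunk depends only on the matching chunks of a and b,
        # so the chunks could be processed independently of one another.
        for i in range(rs.start, rs.stop):
            for j in range(cs.start, cs.stop):
                out[i][j] = a[i][j] + b[i][j]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
result = chunked_add(a, a, chunks=(2, 2))
num_chunks = len(list(chunk_slices((4, 4), (2, 2))))
print(num_chunks)  # 4
print(result)
```

The result holds the same values as the array printed above, computed from four independent 2x2 blocks.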

Adding two larger arrays

The next example generates two random 20GB arrays and then adds them together.

import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.rich import RichProgressBar
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    b = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    c = xp.add(a, b)

    progress = RichProgressBar()
    hist = HistoryCallback()
    timeline_viz = TimelineVisualizationCallback()
    # use store=None to write to temporary zarr
    cubed.to_zarr(
        c,
        store=None,
        callbacks=[progress, hist, timeline_viz],
    )

Paste the code into a file called add-random.py, or download from GitHub, then run with:

python add-random.py
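
The 20GB and 200MB figures quoted for this example can be checked with a little arithmetic. The sketch below assumes 8-byte (float64) elements, which is an assumption rather than something stated in the script, and uses decimal units (1 GB = 10^9 bytes).

```python
import math

ITEMSIZE = 8  # bytes per element, assuming float64 values

shape = (50000, 50000)
chunks = (5000, 5000)

array_bytes = math.prod(shape) * ITEMSIZE
chunk_bytes = math.prod(chunks) * ITEMSIZE
num_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f"array: {array_bytes / 1e9:.0f} GB")  # array: 20 GB
print(f"chunk: {chunk_bytes / 1e6:.0f} MB")  # chunk: 200 MB
print(f"chunks per array: {num_chunks}")     # chunks per array: 100
```

Each array is therefore a 10x10 grid of chunks, and no single chunk is larger than 200MB.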

This example demonstrates how we can use callbacks to gather information about the computation.

  • RichProgressBar shows a progress bar for the computation as it is running.

  • TimelineVisualizationCallback produces a plot (after the computation has completed) showing the timeline of events in the task lifecycle.

  • HistoryCallback produces various stats about the computation once it has completed.
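
As a rough picture of the callback pattern itself, here is a toy sketch in plain Python: a runner notifies each callback as tasks finish, and a callback accumulates statistics that can be read once the run has completed. The class, the method names, and the run_tasks function are all hypothetical and written for illustration only; they are not Cubed's callback API.

```python
import time


class TaskCounter:
    """Hypothetical callback that counts tasks and records elapsed time."""

    def __init__(self):
        self.tasks_done = 0
        self.elapsed = None

    def on_compute_start(self):
        self._start = time.perf_counter()

    def on_task_end(self):
        self.tasks_done += 1

    def on_compute_end(self):
        self.elapsed = time.perf_counter() - self._start


def run_tasks(tasks, callbacks):
    """Toy runner: call each task and notify every callback along the way."""
    for cb in callbacks:
        cb.on_compute_start()
    results = []
    for task in tasks:
        results.append(task())
        for cb in callbacks:
            cb.on_task_end()
    for cb in callbacks:
        cb.on_compute_end()
    return results


counter = TaskCounter()
results = run_tasks([lambda i=i: i * i for i in range(5)], callbacks=[counter])
print(results)             # [0, 1, 4, 9, 16]
print(counter.tasks_done)  # 5
```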

The plots and stats are written to a timestamped subdirectory of the history directory. You can open the latest plot with the following command (open is the macOS launcher; on Linux, xdg-open plays the same role):

open $(ls -d history/compute-* | tail -1)/timeline.svg
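
If you would rather locate the latest plot from Python, a minimal sketch with pathlib follows. Like the shell command, it assumes that the compute-* directory names sort chronologically; the function name latest_timeline is hypothetical.

```python
from pathlib import Path


def latest_timeline(history_dir="history"):
    """Return the timeline.svg path in the last compute-* directory, or None."""
    runs = sorted(p for p in Path(history_dir).glob("compute-*") if p.is_dir())
    if not runs:
        return None
    return runs[-1] / "timeline.svg"


if __name__ == "__main__":
    path = latest_timeline()
    print(path if path is not None else "no compute-* directories found")
```

The function only builds the path; it does not check that timeline.svg exists, so a run that did not use TimelineVisualizationCallback would still return a path.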

Matmul

The next example generates two random 5GB arrays and then multiplies them together. This is a more intensive computation than addition, and will take a few minutes to run locally.

import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.rich import RichProgressBar
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    b = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    c = xp.matmul(a, b)

    progress = RichProgressBar()
    hist = HistoryCallback()
    timeline_viz = TimelineVisualizationCallback()
    # use store=None to write to temporary zarr
    cubed.to_zarr(
        c,
        store=None,
        callbacks=[progress, hist, timeline_viz],
    )

Paste the code into a file called matmul-random.py, or download from GitHub, then run with:

python matmul-random.py
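
To see why this is more intensive than addition even though the arrays are smaller, compare rough operation counts. This is a back-of-the-envelope estimate for dense matrix multiplication using the standard 2n^3 count; it ignores I/O and any details of how Cubed schedules the work.

```python
n_add = 50000  # side length of the arrays in the addition example
n_mm = 25000   # side length of the arrays in the matmul example

# Elementwise addition: one floating-point addition per output element.
add_ops = n_add * n_add

# Dense n x n matmul: each of the n*n outputs needs n multiplies and
# n - 1 additions, which is roughly 2 * n**3 operations in total.
matmul_ops = 2 * n_mm**3

print(f"addition: {add_ops:.3e} operations")     # addition: 2.500e+09 operations
print(f"matmul:   {matmul_ops:.3e} operations")  # matmul:   3.125e+13 operations
print(f"ratio:    {matmul_ops / add_ops:,.0f}x")  # ratio:    12,500x
```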

Trying different executors

You can run these scripts using different executors by setting environment variables to control the Cubed configuration.

For example, this will use the processes executor to run the example:

CUBED_SPEC__EXECUTOR_NAME=processes python add-random.py
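
The same thing can be done from a small Python launcher, which may be convenient when comparing several executors in a loop. This is a sketch using only the standard library; the helper names env_for_executor and run_example are hypothetical, and the only Cubed-specific detail is the CUBED_SPEC__EXECUTOR_NAME variable shown above.

```python
import os
import subprocess
import sys


def env_for_executor(executor_name):
    """Return a copy of the current environment with the executor set."""
    env = os.environ.copy()
    env["CUBED_SPEC__EXECUTOR_NAME"] = executor_name
    return env


def run_example(script, executor_name):
    """Run a script in a child process using the given executor name."""
    return subprocess.run(
        [sys.executable, script],
        env=env_for_executor(executor_name),
        check=True,  # raise CalledProcessError if the script fails
    )


# Example call (not executed here):
# run_example("add-random.py", "processes")
```

Because the variable is set only on a copy of the environment, the launcher's own process and any later runs are unaffected.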

For cloud executors, it’s usually best to put all of the configuration in one YAML file, and set the CUBED_CONFIG environment variable to point to it:

export CUBED_CONFIG=/path/to/lithops/aws/cubed.yaml
python add-random.py

The Cubed documentation explains how configuration works in general and gives detailed steps for running on a particular cloud service.