Basic array operations#
The following examples show how to run a few basic Array API operations on Cubed arrays.
Adding two small arrays#
The first example adds two small 4x4 arrays together, and is useful for checking that the runtime is working.
import cubed.array_api as xp

if __name__ == "__main__":
    a = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    b = xp.asarray(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        chunks=(2, 2),
    )
    c = xp.add(a, b)
    res = c.compute()
    print(res)
Paste the code into a file called add-asarray.py, or download from GitHub, then run with:
python add-asarray.py
If successful it will print a 4x4 array:
[[ 2  4  6  8]
 [10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]]
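To make the role of chunks=(2, 2) more concrete, here is a conceptual sketch in plain Python. It does not use Cubed and is not how Cubed is implemented; the helper names chunk_slices and add_chunked are hypothetical. It only illustrates that a 4x4 array with 2x2 chunks splits into four blocks, and that an elementwise addition can be carried out one block at a time.

```python
# Conceptual sketch only: plain-Python illustration of chunk-wise addition.
# It does not call Cubed; chunk_slices and add_chunked are hypothetical helpers.

def chunk_slices(shape, chunks):
    """Yield (row_slice, col_slice) for each chunk of a 2-D array."""
    for r in range(0, shape[0], chunks[0]):
        for c in range(0, shape[1], chunks[1]):
            yield (
                slice(r, min(r + chunks[0], shape[0])),
                slice(c, min(c + chunks[1], shape[1])),
            )


def add_chunked(a, b, chunks):
    """Add two equally shaped nested lists, visiting one chunk at a time."""
    shape = (len(a), len(a[0]))
    out = [[None] * shape[1] for _ in range(shape[0])]
    for rows, cols in chunk_slices(shape, chunks):
        # Each chunk depends only on the matching chunks of a and b.
        for i in range(rows.start, rows.stop):
            for j in range(cols.start, cols.stop):
                out[i][j] = a[i][j] + b[i][j]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [row[:] for row in a]

blocks = list(chunk_slices((4, 4), (2, 2)))
result = add_chunked(a, b, (2, 2))
print(len(blocks))  # number of 2x2 chunks in a 4x4 array
print(result)
```

Because each output block depends only on the corresponding input blocks, the blocks can in principle be processed independently of one another.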
Adding two larger arrays#
The next example generates two random 20GB arrays and then adds them together.
import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.rich import RichProgressBar
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    b = cubed.random.random((50000, 50000), chunks=(5000, 5000))
    c = xp.add(a, b)

    progress = RichProgressBar()
    hist = HistoryCallback()
    timeline_viz = TimelineVisualizationCallback()
    # use store=None to write to temporary zarr
    cubed.to_zarr(
        c,
        store=None,
        callbacks=[progress, hist, timeline_viz],
    )
Paste the code into a file called add-random.py, or download from GitHub, then run with:
python add-random.py
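The sizes quoted for this example (20GB arrays, 200MB chunks) can be checked with a little arithmetic. The sketch below assumes 8-byte (float64) elements and decimal units (1GB = 10^9 bytes); both are assumptions rather than something stated above, and the helper names are hypothetical.

```python
import math

# Assumption: each element is an 8-byte float64 value.
ITEMSIZE = 8


def array_nbytes(shape, itemsize=ITEMSIZE):
    """Total bytes needed for an array (or a single chunk) of this shape."""
    return math.prod(shape) * itemsize


def num_chunks(shape, chunks):
    """Number of chunks needed to cover the whole array."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


shape = (50000, 50000)
chunks = (5000, 5000)

total_bytes = array_nbytes(shape)   # whole array
chunk_bytes = array_nbytes(chunks)  # one chunk
n_chunks = num_chunks(shape, chunks)

print(f"array: {total_bytes / 1e9:.0f} GB")
print(f"chunk: {chunk_bytes / 1e6:.0f} MB")
print(f"chunks per array: {n_chunks}")
```

Under these assumptions each array is split into a 10 by 10 grid of chunks.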
This example demonstrates how we can use callbacks to gather information about the computation.
RichProgressBar shows a progress bar for the computation as it is running.
TimelineVisualizationCallback produces a plot (after the computation has completed) showing the timeline of events in the task lifecycle.
HistoryCallback produces various stats about the computation once it has completed.
The plots and stats are written to a timestamped subdirectory of the history directory. You can open the latest plot with:
open $(ls -d history/compute-* | tail -1)/timeline.svg
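If the open command is not available on your platform, the same lookup can be sketched in Python. This assumes the layout implied by the shell command above (subdirectories of history named compute-*, each containing timeline.svg) and that sorting the directory names alphabetically puts the most recent run last. The function name latest_timeline is hypothetical.

```python
from pathlib import Path


def latest_timeline(history_dir="history"):
    """Return the timeline.svg path in the last compute-* directory, or None.

    Assumes that sorting the directory names alphabetically puts the most
    recent run last, which is what `ls -d ... | tail -1` relies on as well.
    """
    runs = sorted(p for p in Path(history_dir).glob("compute-*") if p.is_dir())
    if not runs:
        return None
    return runs[-1] / "timeline.svg"


if __name__ == "__main__":
    path = latest_timeline()
    print(path if path is not None else "no compute-* directories found")
```

The returned path can then be opened in a browser or any SVG viewer.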
Matmul#
The next example generates two random 5GB arrays and then multiplies them together. This is a more intensive computation than addition, and will take a few minutes to run locally.
import logging

import cubed
import cubed.array_api as xp
import cubed.random
from cubed.diagnostics.history import HistoryCallback
from cubed.diagnostics.rich import RichProgressBar
from cubed.diagnostics.timeline import TimelineVisualizationCallback

# suppress harmless connection pool warnings
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)

if __name__ == "__main__":
    # 200MB chunks
    a = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    b = cubed.random.random((25000, 25000), chunks=(5000, 5000))
    c = xp.matmul(a, b)

    progress = RichProgressBar()
    hist = HistoryCallback()
    timeline_viz = TimelineVisualizationCallback()
    # use store=None to write to temporary zarr
    cubed.to_zarr(
        c,
        store=None,
        callbacks=[progress, hist, timeline_viz],
    )
Paste the code into a file called matmul-random.py, or download from GitHub, then run with:
python matmul-random.py
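As a rough illustration of why the matrix multiplication is so much heavier than the addition, the following back-of-the-envelope sketch counts arithmetic operations for square n x n inputs. It assumes a naive dense algorithm (n multiplications and n - 1 additions per output element) and ignores I/O and memory traffic, so it indicates scale only and does not predict runtime. The function names are hypothetical.

```python
def add_ops(n):
    """Elementwise addition of two n x n arrays: one addition per element."""
    return n * n


def matmul_flops(n):
    """Naive dense n x n matmul: n multiplies and n - 1 adds per output element."""
    return n * n * (2 * n - 1)


n = 25000
ratio = matmul_flops(n) // add_ops(n)

print(f"addition: {add_ops(n):.3e} operations")
print(f"matmul:   {matmul_flops(n):.3e} operations")
print(f"matmul needs about {ratio} times as many operations")
```

Even though the matmul inputs are a quarter of the size of the arrays in the addition example, the operation count grows with the cube of n rather than the square.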
Trying different executors#
You can run these scripts using different executors by setting environment variables to control the Cubed configuration.
For example, this will use the processes executor to run the example:
CUBED_SPEC__EXECUTOR_NAME=processes python add-random.py
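The variable name above appears to follow a common convention: a CUBED_ prefix, lower-cased keys, and a double underscore marking one level of nesting. The sketch below illustrates that reading with hypothetical helpers; it is an assumption about the naming scheme, not Cubed's own parsing code, and it is meant only for setting names such as the one above.

```python
# Illustration of the assumed naming convention only; not Cubed's parser.

def env_to_config_path(name, prefix="CUBED_"):
    """Split an environment variable name into a list of nested config keys."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return name[len(prefix):].lower().split("__")


def nest(path, value):
    """Build a nested dict, e.g. {'spec': {'executor_name': value}}."""
    result = value
    for key in reversed(path):
        result = {key: result}
    return result


path = env_to_config_path("CUBED_SPEC__EXECUTOR_NAME")
config = nest(path, "processes")
print(path)
print(config)
```

Read this way, the command above sets the executor_name entry under spec to processes for that single run.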
For cloud executors, it’s usually best to put all of the configuration in one YAML file, and set the CUBED_CONFIG environment variable to point to it:
export CUBED_CONFIG=/path/to/lithops/aws/cubed.yaml
python add-random.py
You can read more about how configuration works in Cubed in general, and find detailed steps to run on a particular cloud service here.