Operations#
Here we look in more depth at the core and primitive operations in Cubed.
Dependency Tree#
The following diagram shows the dependencies between operations. Array API functions are shown at the top (in white), the core operations are in the middle (in orange), and the primitive operations are at the bottom (blockwise in pink and rechunk in green). Not all Array API functions are shown, just a representative selection.
Note how fundamental blockwise is - all array API functions depend on it.
elemwise#
The simplest core operation is elemwise, which maps input array elements to output array elements, a block at a time.
Preserves:
shape,chunks,numblocksMultiple inputs, single output
It is a core operation that is implemented using blockwise.
Here’s an example with two inputs, such as for add. (Arrows are only shown for two input blocks, in order to avoid cluttering the diagram.)
(Note that elemwise supports broadcasting, but that is not shown here.)
map_blocks#
Like elemwise, map_blocks operates on corresponding input blocks, but they need not match in shape, and the output block shape may differ too.
Preserves:
numblocksMultiple inputs, single output
It is a core operation that is implemented using blockwise.
This example shows how the squeeze operation is implemented using map_blocks (with a single input). Note that although the number of blocks is preserved, the shape and chunk size of the output is different to the input since the second dimension is dropped.
map_selection#
The map_selection operation selects subsets of an input array using standard NumPy indexing notation.
No input array attributes are preserved in general
Single input, single output
It is a core operation that is implemented using blockwise on the output’s blocks. It works by converting indexing selections, such as slices, to keys that refer to blocks in the input array, then retrieving these blocks, slicing them, and assembling them into the final output block.
This example shows how index is implemented using map_selection. Each block in the output array is read directly from one or more blocks from the inputs.
Note: previously, operations that now use map_selection were written using map_direct, which allowed input arrays to be read directly. The main difference between the two operations is that map_selection tracks which blocks are read by the operation, which allows the optimizer to fuse operations. This is unlike map_direct, which does not provide information about block inputs, and therefore cannot be fused by the optimizer.
blockwise#
The blockwise operation is a primitive operation that operates on input array blocks, while allowing an input block to be sent to multiple output blocks.
No input array attributes are preserved in general
Multiple inputs, single output
This example shows how outer is implemented using blockwise. Each block from the input is sent to three blocks in the output. (Arrows are only shown for two input blocks, in order to avoid cluttering the diagram.)
Note: the general_blockwise operation is a more general form of blockwise that uses a function to specify the block mapping, rather than an index notation, and which supports multiple outputs.
No input array attributes are preserved in general
Multiple inputs, multiple outputs
For multiple outputs, all output arrays must have matching numblocks.
rechunk#
The rechunk operation is a primitive operation for changing the chunking of an array, without changing its shape or dtype.
Preserves:
shape,dtypeSingle input, single output
This example shows how there is no one-to-one correspondence between blocks in the input and output. In general, an output block is composed of parts of many input blocks. Consult this page on the rechunker algorithm for details of how it is implemented.
reduction and arg_reduction#
The reduction operation reduces an array along one or more axes.
No input array attributes are preserved in general
Single input, single output
It is a core operation that is implemented using a blockwise operation called partial_reduce that reads multiple blocks and performs the reduction operation on them.
The partial_reduce operations are arranged in a tree (tree_reduce) with multiple rounds until there’s a single block in each reduction axis. Finally an aggregate blockwise operation is applied to the results.
Here is an example of reducing over the first axis, with two rounds of partial_reduce operations:
The arg_reduction works similarly, but uses different functions to return indexes rather than values.