The evolution of array computing in Python
Ralf Gommers

PyData Amsterdam 2019
whoami
Maintainer of NumPy & SciPy (2010 -- )
NumFOCUS board member (2012 -- 2018)
Director of Quansight Labs (2019 -- )
You can find me at:
https://github.com/rgommers
rgommers@quansight.com
Quansight Labs: a public benefit division of Quansight, providing a home for a “PyData Core Team” and growing the community
A very brief history of array computing in Python
Timeline figure (logos not preserved). Labels: Numeric, Numarray, Sparse, … ? Years: 1995, 2003, 2006, 2008, 2012, 2015 -- today.
A motivating example
Another motivating example
Do try this at home
$ conda create -n pydata-ams
$ conda activate pydata-ams
$ conda install numpy dask jupyterlab
$ # If you have an NVIDIA GPU (requires CUDA to be installed):
$ # CuPy 6.0.0 will be out soon, then conda-installable.
$ pip install --pre cupy-cuda100 # or cupy-cuda90 for CUDA 9.0
$ python -c "import numpy; print(numpy.__version__)"
1.16.3
$ python -c "import cupy; print(cupy.__version__)"
6.0.0rc1
$ python -c "import dask; print(dask.__version__)"
1.2.0
$ export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 # enables __array_function__ in NumPy 1.16 (on by default from 1.17)
The NumPy array protocols - goals
Separate the NumPy API from the NumPy “execution engine”
Allow other libraries (Dask, CuPy, PyTorch, …) to reuse the NumPy API
Bigger picture: avoid or reduce ecosystem fragmentation (we don’t want to see a reimplementation of SciPy for PyTorch, another for TensorFlow, etc.)
The NumPy array protocols - goals
Current state of N-dimensional arrays in Python
The NumPy array protocols - goals
What we’re aiming for
The NumPy array protocols - concept
Figure: a function consists of a function signature (the API) and a function body (the “implementation”, which defines the semantics). Example for one function; this can be a ufunc or a regular function.
The NumPy array protocols - concept
Figure: NumPy's function signature (the API) checks: if an input argument has __array_function__, execute other_function instead of NumPy's own function body. The other library's function signature == NumPy's, while its function body is its own.
In short: use the NumPy API, bring your own implementation.
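The dispatch step in the figure can be modeled in a few lines of plain Python. This is a toy sketch of the idea only, not NumPy's internal code; the names `dispatching`, `total` and `OtherArray` are hypothetical. The `__array_function__(self, func, types, args, kwargs)` hook signature mirrors the one NumPy uses.

```python
import functools

# Toy model of __array_function__-style dispatch in plain Python.
# dispatching, total and OtherArray are hypothetical names, not NumPy code.

def dispatching(default_body):
    """Give a function a public signature that other array types can override."""
    @functools.wraps(default_body)
    def public_api(*args, **kwargs):
        # Collect arguments whose type defines __array_function__
        overriding = [a for a in args if hasattr(type(a), "__array_function__")]
        types = tuple(type(a) for a in overriding)
        for arg in overriding:
            # "if input arg has __array_function__: execute other_function"
            result = type(arg).__array_function__(arg, public_api, types, args, kwargs)
            if result is not NotImplemented:
                return result
        if overriding:
            raise TypeError("no implementation found for " + public_api.__name__)
        # Nobody overrides: run the default function body
        return default_body(*args, **kwargs)
    return public_api

@dispatching
def total(values):
    # Default body (playing the role of NumPy's implementation)
    return sum(values)

class OtherArray:
    """Stand-in for another library's array type."""
    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        # Same API, different body
        if func is total:
            return f"OtherArray.total({sum(self.values)})"
        return NotImplemented

print(total([1, 2, 3]))              # 6
print(total(OtherArray([1, 2, 3])))  # OtherArray.total(6)
```

Returning `NotImplemented` lets a type decline a call it does not understand, so the next overriding argument (or an error) takes over.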
Using array protocols in your own code
Suitable for code that uses NumPy functions and ndarray methods.
Try CuPy if you need more performance on large arrays (it runs on NVIDIA GPUs), and Dask if you want a distributed array.
Let’s play with this in a notebook!
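A minimal sketch of what "use the NumPy API in your own code" looks like with the real protocol, assuming NumPy >= 1.17 (on 1.16, set `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1` first). `LoggingArray` is a hypothetical duck array, not part of NumPy, CuPy or Dask; it records each NumPy function called on it and then reuses NumPy's own implementation on the unwrapped data.

```python
import numpy as np

# Assumes NumPy >= 1.17. LoggingArray is a hypothetical example type.

class LoggingArray:
    """Wraps an ndarray and records which NumPy API functions are called on it."""
    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        # Only claim the call if every overriding type is one we understand
        if not all(issubclass(t, LoggingArray) for t in types):
            return NotImplemented
        self.calls.append(func.__name__)
        # "Bring your own implementation": here we simply unwrap and reuse NumPy's
        unwrapped = [a.data if isinstance(a, LoggingArray) else a for a in args]
        return func(*unwrapped, **kwargs)

def summarize(x):
    # Ordinary library code written against the NumPy API only
    return np.sum(x), np.mean(x)

arr = LoggingArray([1.0, 2.0, 3.0])
s, m = summarize(arr)
print(float(s), float(m))  # 6.0 2.0
print(arr.calls)           # ['sum', 'mean']
```

`summarize` never mentions `LoggingArray`; an array from another library that implements the protocol would have the same calls routed to its own implementations.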
Limits to these array protocols
Only functions can be overridden, and not even all functions: only the ones with an array_like parameter. Important exceptions:
np.array, np.asarray, np.linspace, np.concatenate
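The difference between a dispatched function and a converting function like `np.asarray` can be seen with a small, hypothetical `MyArray` type, assuming NumPy >= 1.17. `np.sum` hands control to `__array_function__`, while `np.asarray` ignores it and converts through the `__array__` hook, so the result is a plain `ndarray`.

```python
import numpy as np

# Assumes NumPy >= 1.17. MyArray is hypothetical.
class MyArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Conversion hook: used when NumPy needs a real ndarray
        return self.data if dtype is None else self.data.astype(dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Report which API function was routed to us
        return ("handled by MyArray", func.__name__)

x = MyArray([1, 2, 3])
print(np.sum(x))                  # ('handled by MyArray', 'sum')
converted = np.asarray(x)         # not dispatched: converted via __array__
print(type(converted).__name__)   # ndarray
```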
NumPy’s roadmap - what’s next?
Interoperability: roll out __array_function__, handle subclasses better, new protocols?
Extensibility: easier custom dtypes
Performance: ufunc optimizations, more SIMD instructions, ...
np.random rewrite: is about to be merged
Indexing: NEP 21 (oindex/vindex), for more intuitive behavior
Type annotations: PEP 484 / mypy compatible annotations, see the numpy-stubs repo
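The np.random rewrite shipped in NumPy 1.17 as the `Generator` API. A minimal usage sketch, assuming NumPy >= 1.17; the seed value is arbitrary. State lives in the `Generator` object rather than in a hidden global, which makes reproducible streams explicit.

```python
import numpy as np

# Assumes NumPy >= 1.17, where the rewritten np.random shipped.
rng = np.random.default_rng(seed=12345)      # Generator using the PCG64 bit generator
samples = rng.standard_normal(size=5)        # 5 draws from N(0, 1)
ints = rng.integers(low=0, high=10, size=3)  # integers in [0, 10)

# Reproducibility comes from the Generator object, not from global state
again = np.random.default_rng(seed=12345).standard_normal(size=5)
print(np.array_equal(samples, again))        # True
```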
The array computing landscape
Figure: projects arranged by maturity, for n-D arrays (a.k.a. tensors) and dataframes, on CPU and GPU. Projects shown: NumPy, XND, CuPy, PyTorch, xtensor, Pandas, MXNet, DyND, cuDF, Apache Arrow, TensorFlow, Uarray, TensorLy, xframe, Xarray.
XND
Recreates the foundations of NumPy as a number of
smaller libraries. Plus:
Variable length strings
Ragged arrays
Categorical type
Missing data support
Easy custom dtypes
Automatic multi-threading
JIT compilation (via Numba)
XND
Examples (figure not preserved): variable length strings; ragged arrays
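Ragged arrays hold rows of different lengths without padding. A minimal NumPy sketch of one common layout for such data, a flat buffer of values plus an offsets array; it illustrates the data structure only and is not XND's API or necessarily its internal layout.

```python
import numpy as np

# Illustration only: flat values + offsets, one common ragged layout. Not XND's API.
rows = [[1, 2, 3], [4], [], [5, 6]]

lengths = np.array([len(r) for r in rows])
offsets = np.concatenate(([0], np.cumsum(lengths)))  # row i spans offsets[i]:offsets[i+1]
values = np.array([v for r in rows for v in r])      # all elements, no padding

def row(i):
    """Return row i as a view into the flat buffer (no copy)."""
    return values[offsets[i]:offsets[i + 1]]

# Per-row sums without padding: label each element with its row index
row_ids = np.repeat(np.arange(len(rows)), lengths)
row_sums = np.bincount(row_ids, weights=values, minlength=len(rows))

print(offsets.tolist())   # [0, 3, 4, 4, 6]
print(row(3).tolist())    # [5, 6]
print(row_sums.tolist())  # [6.0, 4.0, 0.0, 11.0]
```

Empty rows cost nothing but one repeated offset, which is why this layout is popular for variable-length data.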
xtensor
A C++ library for n-D arrays. Plus:
Lazy evaluation
Performance - very fast
Can operate on NumPy arrays
Python, Julia and R bindings
JIT compilation (via Pythran)
Built on top: rray (NumPy-like arrays for R)
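Lazy evaluation means an expression such as `a + b * b` builds a description of the computation instead of a result; elements are computed only when accessed. xtensor does this in C++ with expression templates and its API looks different. A minimal Python sketch of the idea, with hypothetical classes `Expr`, `Leaf` and `BinOp`:

```python
class Expr:
    """Hypothetical lazy expression node (illustration, not xtensor's API)."""
    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __mul__(self, other):
        return BinOp(self, other, lambda a, b: a * b)

    def evaluate(self):
        # Materialize the full result only when explicitly asked
        return [self[i] for i in range(len(self))]

class Leaf(Expr):
    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        return self.values[i]

class BinOp(Expr):
    def __init__(self, lhs, rhs, op):
        self.lhs, self.rhs, self.op = lhs, rhs, op

    def __len__(self):
        return len(self.lhs)

    def __getitem__(self, i):
        # Element i is computed on demand; no temporary arrays are built
        return self.op(self.lhs[i], self.rhs[i])

a = Leaf([1, 2, 3])
b = Leaf([10, 20, 30])
expr = a + b * b         # builds a tree, computes nothing yet
print(expr[2])           # 903: only element 2 is computed
print(expr.evaluate())   # [101, 402, 903]
```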
xtensor
Uarray
A more general solution than the NumPy array protocols
for building APIs with multiple backends.
Override any object: functions, classes, ufuncs, dtypes,
context managers, and more
Uses multiple dispatch rather than protocols.
Uarray
Thanks!
Any questions?