2. Hello
• I am David Cournapeau (ダビド): @cournape (twitter/github)
• NumPy/SciPy user since 2005
• Former core contributor to
NumPy, SciPy
• Started the learn project, which
would later become scikit-learn
• Currently leading ML
Engineering team at Cogent
Labs
4. Why would you care about
NumPy ?
• Used as a fundamental piece in many higher-level Machine Learning
libraries (scikit-learn/scikit-image, pandas, TensorFlow/Chainer/PyTorch)
• Required to understand the source code of those libraries
• Historically: a key enabler of Python for ML and Data Science
• NumPy is a library for array computing
• Long history in computing (APL, J, K, Matlab, etc…): see e.g.
http://jsoftware.com/
• Both about efficiency and expressivity
5. A bit of history
• Early work for array computing in Python (matrix-sig mailing
list):
• 1995: Jim Fulton, Jim Hugunin. Became Numeric
• 1995-2000s: Paul Dubois, Konrad Hinsen, David Ascher,
Travis Oliphant and others contributed later
• 2001: Numarray: Perry Greenfield, Rick White and Todd
Miller
• 2005: “grand unification” into NumPy, led by Travis
Oliphant
6. Array Computing for speed
• You want to compute some math operations:
• In NumPy:
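As an illustrative sketch, the same math can be written as a pure Python loop and as a single NumPy expression. The computation below (sin² + cos² over 1000 values) is a stand-in chosen for illustration, not necessarily the one used in the talk.

```python
import math

import numpy as np

# Hypothetical input data, chosen only for illustration.
values = [0.1 * i for i in range(1000)]

# Pure Python: one interpreted loop iteration per element.
python_result = [math.sin(v) ** 2 + math.cos(v) ** 2 for v in values]

# NumPy: the loop over elements runs in compiled code.
x = np.array(values)
numpy_result = np.sin(x) ** 2 + np.cos(x) ** 2

# Both versions compute the same numbers.
print(np.allclose(python_result, numpy_result))
```

Timing both versions (for example with the standard timeit module) on a large input is a simple way to see the speed gap on your own machine.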
7. Why the difference ?
• Why (C)Python is slow for computation: boxing
From Python Data Science Handbook by Jake Vanderplas
8. Why the difference ?
• Why (C)Python is slow for
computation: genericity
• E.g. lists can contain arbitrary
Python values
• You need to follow pointers to
access values
• Note: accessing an arbitrary
value in RAM costs ~ 100
cycles (as much as computing
the exponential of a double in
C !)
From Python Data Science Handbook by Jake Vanderplas
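Boxing and genericity can be observed from Python itself. The sketch below assumes CPython (the exact object size is an implementation detail) and compares a boxed Python float with the 8 bytes NumPy stores per float64 value.

```python
import sys

import numpy as np

# A Python float is a full object: type pointer, reference count, then the value.
boxed = 1.0
print(sys.getsizeof(boxed))  # size in bytes of the boxed float (CPython-specific)

# A list stores references, so its items can be of any type.
items = [1, "two", 3.0]
print([type(item).__name__ for item in items])

# A float64 array stores the raw values back to back.
arr = np.arange(1000, dtype=np.float64)
print(arr.itemsize, arr.nbytes)
```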
9. Array computing for
expressivity
• One simple softmax layer in a neural network for a 1d vector x:
logits = W @ x + b
output = softmax(logits)
print(logits.shape)
• Maps more directly to many scientific domains
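A runnable version of the layer above could look like the following sketch. The softmax helper, the layer sizes, and the random weights are assumptions added for illustration; the slide does not define them.

```python
import numpy as np


def softmax(z):
    """Softmax of a 1d vector (hypothetical helper, not from the slides)."""
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()


rng = np.random.default_rng(0)
n_in, n_out = 4, 3                  # illustrative layer sizes
W = rng.normal(size=(n_out, n_in))  # weight matrix
b = np.zeros(n_out)                 # bias vector
x = rng.normal(size=n_in)           # 1d input vector

logits = W @ x + b
output = softmax(logits)
print(logits.shape)
print(output.sum())
```

The two lines computing logits and output read almost exactly like the mathematical definition, which is the expressivity argument.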
10. Structure of NumPy arrays
• A NumPy array is essentially:
• A single block of memory
• A dtype to describe how to interpret single values in the
memory block
• Metadata such as shape, strides, etc.
• A NumPy array costs the same memory as the equivalent C array, plus a constant overhead
11. Structure of NumPy arrays
• Data is like a C array
• Dtype is a python object with
information about values in
the array (size, endianness,
etc.)
• dimensions, strides and dtype
are used for multidimensional
indexing
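The role of dtype, shape and strides can be sketched by locating an element by hand. The example assumes a C-contiguous int32 array; the array contents and the chosen index are arbitrary.

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
print(x.dtype, x.itemsize)  # each value takes 4 bytes
print(x.shape, x.strides)   # bytes to step along each axis
print(x.nbytes)             # 12 values * 4 bytes

# Byte offset of x[i, j] inside the data block.
i, j = 2, 1
offset = i * x.strides[0] + j * x.strides[1]

# Read one value at that offset from a raw copy of the data block.
raw = x.tobytes()
value = np.frombuffer(raw, dtype=x.dtype, count=1, offset=offset)[0]
print(value == x[i, j])
```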
13. Broadcasting 1/4
• Linear Algebra defines most basic NumPy operations
• We do not always want to be as strict as mathematics:
• We want to add scalar to arrays without having to
create arrays with the duplicated scalar
• We sometimes do not care about row vs column vector
• We sometimes want to save memory and avoid
temporaries
14. Broadcasting 2/4
• Broadcasting: rules to work with arrays (and scalars) with
non conforming shapes
• NumPy provides powerful broadcasting capabilities
import numpy as np
# np.newaxis creates a new dimension, but array has the same size
x = np.arange(5)[:, np.newaxis]
y = np.arange(5)
print(x + y)
15. Broadcasting 3/4
• Broadcasting rules:
• If arrays have different number
of dimensions, insert new axes
on the left until arrays have
same number of dimensions
• For each axis i, if arrays
dimension[i] do not match,
“stretch” the arrays where
dimension[i] = 1 to match
other array(s)
• (if no match and dimension[i] != 1 -> error)
From Python Data Science Handbook by Jake Vanderplas
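The rules above can be checked on small arrays. This is a minimal sketch; the shapes are chosen only for illustration.

```python
import numpy as np

a = np.ones((3, 1))  # shape (3, 1)
b = np.arange(4)     # shape (4,), treated as (1, 4) after inserting an axis on the left

# Axis 0: 3 vs 1 -> stretch b. Axis 1: 1 vs 4 -> stretch a.
result = a + b
print(result.shape)
print(np.broadcast(a, b).shape)

# Shapes (3, 2) and (3,) -> (1, 3): the last axis has 2 vs 3, neither is 1.
failed = False
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as exc:
    failed = True
    print("error:", exc)
```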
16. Broadcasting 4/4
• A few notes:
• Broadcasting is done “logically”, and the temporary arrays
are not created in memory
• Integrated in the ufunc and multi-dimensional indexing
infrastructure in NumPy code (see later)
• Indices are broadcast as well in fancy indexing (see
later)
• You can use np.broadcast_arrays to explicitly build arrays
as if they were broadcast
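As a small sketch of the "logical" stretching, np.broadcast_arrays returns views of the inputs with the common shape. A stride of 0 along an axis means the same memory is reread instead of being duplicated.

```python
import numpy as np

x = np.arange(3)[:, np.newaxis]  # shape (3, 1)
y = np.arange(4)                 # shape (4,)

bx, by = np.broadcast_arrays(x, y)
print(bx.shape, by.shape)

# Stretched axes have stride 0: no data is duplicated.
print(bx.strides, by.strides)
print(np.shares_memory(by, y))
```

The broadcast results can alias the same element many times, so they are best treated as read-only.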
17. Indexing: views
• One can use slices any time
one needs to extract “regular”
subarrays
• If arrays are solely indexed
through slices, the returned
array is a view (no data
copied)
import numpy as np
x = np.arange(6).reshape(2, 3)
print(x)
print(x[:, ::2])
print(x[::2, ::2])
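That a slice is a view can be verified directly. In this sketch, writing through the sliced array changes the original.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
view = x[:, ::2]  # slices only, so no data is copied

print(np.shares_memory(x, view))

# Writing through the view modifies the original array.
view[0, 0] = 100
print(x[0, 0])
```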
19. Indexing: fancy indexing
• As soon as you index an array with an array, you are using
fancy indexing
• Fancy indexing always returns a copy (why ?)
• 2 main cases of fancy indexing:
• Use an array of boolean (aka mask)
• Use an array of integers
• Fancy indexing can get too fancy…
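The copy behaviour can be checked with a short sketch. One plausible answer to the "why": arbitrary selected elements generally cannot be described by a single offset plus regular strides over the original memory block, so NumPy has to gather them into a new array.

```python
import numpy as np

x = np.arange(10)
picked = x[np.array([1, 3, 5])]  # indexing with an array: fancy indexing

print(np.shares_memory(x, picked))

# Modifying the result leaves the original untouched.
picked[0] = -1
print(x[1])
```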
20. Fancy indexing with masks
• Indexing with array of booleans
• Appears naturally with comparison
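A minimal sketch of mask indexing, with made-up data: a comparison produces a boolean array, which can then select or assign values.

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5, -9])

# A comparison yields a boolean array of the same shape.
mask = x > 0
print(mask)

# Selecting with the mask keeps only the entries where it is True.
positives = x[mask]
print(positives)

# A mask on the left-hand side assigns in place.
x[x < 0] = 0
print(x)
```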
21. Fancy indexing with integer
arrays
• Indexing with array of integers
• Appears naturally to select specific values from their
indices
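A minimal sketch of integer-array indexing, with made-up data. In the 2d case the index arrays are broadcast against each other, so their shapes decide the shape of the result.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])
idx = np.array([4, 0, 0, 2])
selected = x[idx]  # indices may repeat and come in any order
print(selected)

m = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Same-shape index arrays are paired up: picks m[0, 1] and m[2, 3].
pairs = m[rows, cols]
print(pairs)

# A (2, 1) index against a (2,) index broadcasts to (2, 2): a sub-block.
block = m[rows[:, np.newaxis], cols]
print(block)
```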
23. See Jaime Fernández - The Future of NumPy Indexing presentation
Sebastian Berg: new fancy indexing NEP
24. How to go further
• From Python to NumPy by Nicolas Rougier:
http://www.labri.fr/perso/nrougier/from-python-to-numpy
• 100 NumPy exercises by Nicolas Rougier:
https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md
• Guide to NumPy:
http://web.mit.edu/dvp/Public/numpybook.pdf
• “New” ND index by Mark Wiebe, with notes about speeding
up indexing, etc.:
https://github.com/numpy/numpy/blob/master/doc/neps/nep-0010-new-iterator-ufunc.rst