2. What is NumPy?
NumPy (Numerical Python) is a package for scientific computing. It provides tools for handling n-dimensional arrays (especially vectors and matrices).
All elements of a NumPy array have the same type.
The package offers a large number of routines for fast access to data (e.g. search, extraction), for various manipulations (e.g. sorting), for calculations (e.g. statistical computing), etc.
3. Overview
Broadcasting
Array Broadcasting
Broadcasting rules
Fancy indexing and index tricks
Indexing with Arrays of indices
Indexing with Boolean Arrays
The ix_() function
Indexing with strings
Linear Algebra
Simple Array Operations
Tricks and Tips
“Automatic” Reshaping
Vector Stacking
Histograms
5. Broadcasting
Broadcasting allows us to deal with inputs that do not have exactly the same shape.
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[2, 2], [2, 2]])
>>> a * b
array([[2, 4],
       [6, 8]])
NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain conditions. The simplest case is combining an array with a scalar:
>>> a = np.array([[1, 2], [3, 4]])
>>> b = 2
>>> a * b
array([[2, 4],
       [6, 8]])
[Figure: multiplying two arrays of the same shape (2x2 * 2x2 = 2x2) next to multiplying an array by a scalar (2x2 * 1x1 = 2x2). In the second case broadcasting occurs: the scalar 2 is stretched into a 2x2 array of 2s, so both products are identical.]
6. Broadcasting
[Figure: a 4x1 column (0, 10, 20, 30) plus a 1x3 row (0, 1, 2). Both are stretched to 4x3, giving the rows 0 1 2 / 10 11 12 / 20 21 22 / 30 31 32.]
>>> A = np.array([0,10,20,30])
>>> B = np.array([0,1,2])
>>> y = A[:, None] + B
Here A[:, None] has shape (4, 1) and B has shape (3,), which is treated as (1, 3). Both arrays have axes with length one that are expanded to a larger size during the broadcast operation: the smaller array is “broadcast” across the larger array so that they have compatible shapes.
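As a quick check of the shapes involved, the following sketch repeats the example and prints the intermediate shapes. The variable name `column` is introduced here for illustration only.

```python
import numpy as np

A = np.array([0, 10, 20, 30])
B = np.array([0, 1, 2])

column = A[:, None]   # insert a new axis: shape (4,) becomes (4, 1)
y = column + B        # (4, 1) + (3,) broadcasts to (4, 3)

print(column.shape, B.shape, y.shape)
print(y)
```

Neither A nor B is physically copied into a 4x3 array before the addition; only the result y has that shape.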
8. Broadcasting
[Figure: a 4x3 array plus a 1-D array of length 4. The trailing dimensions 3 and 4 differ: mismatch!]
When operating on two arrays, NumPy compares their shapes element-wise, starting with the trailing (rightmost) dimensions and working backwards.
Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError ("operands could not be broadcast together") is raised, indicating that the arrays have incompatible shapes.
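The rule above can be sketched in a few lines. The helper `can_broadcast` is hypothetical (it is not a NumPy function); it only mirrors the comparison described on this slide, and the try/except shows what NumPy itself does with the mismatching pair.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the compatibility rule to two shapes."""
    # Compare dimensions from the last one backwards; dimensions that are
    # missing on the shorter shape are treated as compatible.
    for m, n in zip(reversed(shape_a), reversed(shape_b)):
        if m != n and m != 1 and n != 1:
            return False
    return True

x = np.zeros((4, 3))
ok = np.zeros(3)    # trailing dimensions 3 and 3 match
bad = np.zeros(4)   # trailing dimensions 3 and 4 do not match

print(can_broadcast(x.shape, ok.shape))
print(can_broadcast(x.shape, bad.shape))

try:
    x + bad
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```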
10. Indexing with Arrays of Indices
NumPy offers more indexing facilities than regular Python
sequences.
In addition to indexing by integers and slices, arrays can be indexed by
arrays of integers and arrays of booleans.
>>> a = np.arange(6)**2 # the first 6 square numbers
>>> i = np.array([0, 0, 1, 3]) # an array of indices
>>> a[i] # the elements of a at the positions i
array([0, 0, 1, 9])
>>> j = np.array([[1, 2], [0, 4]]) # a two-dimensional array of indices
>>> a[j] # the same shape as j
array([[ 1,  4],
       [ 0, 16]])
[Figure: a[i] picks 0, 0, 1, 9 out of 0 1 4 9 16 25; a[j] picks 1, 4, 0, 16 into a 2x2 result.]
Unlike slicing, fancy indexing returns a copy rather than a view into the original array.
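The difference between a view and a copy can be checked directly. This is a small sketch; the names `s` and `f` are chosen here for illustration.

```python
import numpy as np

a = np.arange(6) ** 2

s = a[1:4]         # basic slice: a view on a's data
f = a[[1, 2, 3]]   # fancy indexing: a new array with its own data

s[0] = -1          # writes through to a
f[1] = -2          # does not affect a

print(a)
print(np.shares_memory(a, s), np.shares_memory(a, f))
```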
11. Indexing with Arrays of Indices
When the indexed array a is multidimensional, a single array of indices
refers to the first dimension of a.
The following example shows this behavior by converting an image of labels into a color
image using a palette.
>>> palette = np.array([[0, 0, 0],         # black
...                     [255, 0, 0],       # red
...                     [0, 255, 0],       # green
...                     [0, 0, 255],       # blue
...                     [255, 255, 255]])  # white
>>> image = np.array([[0, 1, 2, 0],   # each value corresponds to a color in the palette
...                   [0, 3, 4, 0]])
>>> palette[image] # the (2,4,3) color image
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])
[Figure: the 5x3 palette indexed by the 2x4 label image gives a 2x4x3 color image.]
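A short sketch of the shape arithmetic: the result takes the shape of the index array, followed by the remaining dimensions of the indexed array. The name `colored` is introduced here for illustration.

```python
import numpy as np

palette = np.array([[0, 0, 0],         # black
                    [255, 0, 0],       # red
                    [0, 255, 0],       # green
                    [0, 0, 255],       # blue
                    [255, 255, 255]])  # white
image = np.array([[0, 1, 2, 0],
                  [0, 3, 4, 0]])

colored = palette[image]

# image.shape + palette.shape[1:] gives the shape of the result
print(image.shape, palette.shape, colored.shape)
print(colored[1, 2])   # the pixel labelled 4, i.e. white
```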
12. Indexing with Arrays of Indices
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape (or be broadcastable to a common shape).
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i = np.array([[0, 1],   # indices for the first dim of a
...               [1, 2]])
>>> j = np.array([[2, 1],   # indices for the second dim of a
...               [3, 3]])
>>> a[i,j] # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])
[Figure: indexing arrays for multiple dimensions: a[i,j] picks 2, 5, 7 and 11 from the 3x4 array into a 2x2 result. Indexing with a fixed index: a[i,2] picks 2, 6, 6 and 10 from column j = 2.]
Indexing with complete row slices:
>>> a[:,j] # i.e., a[ : , j]
array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])
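In a[i,2] the scalar 2 is broadcast against i, and the same mechanism applies to index arrays of different but compatible shapes. A minimal sketch, with `rows`, `cols` and `picked` as illustrative names:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

rows = np.array([[0], [2]])   # shape (2, 1)
cols = np.array([1, 3])       # shape (2,)

# The two index arrays broadcast to shape (2, 2), so the result holds
# a[0,1], a[0,3] in its first row and a[2,1], a[2,3] in its second row.
picked = a[rows, cols]
print(picked)
```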
13. Indexing with Arrays of Indices
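Since a[i,j] is exactly the same as a[(i,j)], we can put i and j in a tuple and then do the indexing with the tuple. A list does not work here: recent NumPy versions interpret a list of arrays as a single index array rather than as one index per dimension.
>>> l = (i, j)
>>> a[l] # equivalent to a[i,j]
array([[ 2,  5],
       [ 7, 11]])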
14. Indexing with Arrays of Indices
Another common use of indexing with arrays is the search of the maximum
value of time-dependent series :
>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
>>> ind = data.argmax(axis=0) # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>> time_max = time[ ind ] # times corresponding to the maxima
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
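The same index array can also be used to pull the maximum values themselves out of data. The sketch below pairs each row index in ind with its column number; `data_max` is a name introduced here.

```python
import numpy as np

time = np.linspace(20, 145, 5)
data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)      # row index of the maximum in each column
time_max = time[ind]           # times at which the maxima occur

# Pair each row index with its column number to read the maxima.
data_max = data[ind, np.arange(data.shape[1])]

print(time_max)
print(np.all(data_max == data.max(axis=0)))
```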
15. Indexing with Arrays of Indices
You can also use indexing with arrays as a target to assign to:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
[Figure: the array 0 1 2 3 4 becomes 0 0 2 0 0 after the assignment.]
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])
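Repeated indices also matter for in-place operators such as +=, which is a common surprise. The sketch below contrasts += with `np.add.at`, NumPy's unbuffered variant; the array names are illustrative.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1             # index 0 appears twice but is incremented once

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)    # unbuffered: index 0 is incremented twice

print(a)
print(b)
```

With +=, the right-hand side is computed first and then assigned, so the repeated index simply receives the same value twice.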
16. Boolean or “mask” Index Arrays
When we index arrays with arrays of (integer) indices we are providing the
list of indices to pick.
With boolean indices the approach is different; we explicitly choose which
items in the array we want and which ones we don’t.
The most natural way one can think of for boolean indexing is to use boolean arrays that
have the same shape as the original array:
>>> a = np.arange(12).reshape(3,4)
>>> mask = a > 4
>>> mask # mask is a boolean array with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[mask] # 1d array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])
[Figure: the 3x4 array with the elements greater than 4 highlighted.]
Unlike in the case of integer index arrays, in the boolean case the result is a 1-D array containing all the elements of the indexed array that correspond to the True elements of the boolean array.
17. Boolean or “mask” Index Arrays
This property can be very useful in assignments:
>>> a[mask] = 0 # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])
[Figure: the 3x4 array after the assignment; every element greater than 4 is now 0.]
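When the original array should stay untouched, `np.where` offers a non-mutating alternative to assigning through a mask. This is a sketch of that option, not something the slides themselves use; `b` is an illustrative name.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Build a new array: 0 where the condition holds, a's value elsewhere.
b = np.where(a > 4, 0, a)

print(b)
print(a.max())   # a itself is unchanged
```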
18. Indexing with Boolean Arrays
The second way of indexing with booleans is more similar to integer
indexing; for each dimension of the array we give a 1D boolean array
selecting the slices we want.
>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>> a[b1,:] # selecting rows
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
[Figure: the row mask b1 (F, T, T) applied to the 3x4 array selects the rows 4 5 6 7 and 8 9 10 11.]
19. Indexing with Boolean Arrays
>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>> a[:,b2] # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>> a[b1,b2]
array([ 4, 10])
>>> a[np.ix_(b1,b2)]
array([[ 4,  6],
       [ 8, 10]])
[Figure: mask1 (b1) selects rows 1 and 2 and mask2 (b2) selects columns 0 and 2 of the 3x4 array. a[b1,b2] gives 4 and 10; a[np.ix_(b1,b2)] gives the 2x2 block 4 6 / 8 10.]
Without the np.ix_ call, only the diagonal elements would be selected! np.ix_ takes advantage of NumPy’s broadcasting facilities.
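To see why a[b1,b2] returns only two elements, it helps to convert the masks to integer indices first. The sketch below uses `np.nonzero` for that; `rows`, `cols`, `paired`, `block` and `two_step` are illustrative names.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

rows = np.nonzero(b1)[0]   # positions of True in b1: 1 and 2
cols = np.nonzero(b2)[0]   # positions of True in b2: 0 and 2

paired = a[rows, cols]       # pairs the indices up: a[1,0] and a[2,2]
block = a[np.ix_(b1, b2)]    # every selected row with every selected column
two_step = a[b1][:, b2]      # same block, selecting rows first, then columns

print(paired)
print(block)
```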
20. The ix_() function: Special indexing field
Basic slicing uses start:stop:step notation inside brackets, for example a[1:3, :], a[2,3], a[5:].
>>> a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[1:7:2]
array([1, 3, 5])
The numpy.ix_ function generates indexes for irregular slices: picking out rows and columns!
>>> a = np.arange(42).reshape(6,7)
>>> a[np.ix_([1,3,4],[2,5])]
array([[ 9, 12],
       [23, 26],
       [30, 33]])
MATLAB                 NumPy                          Description
a( [2,4,5],[3,6] )     a[ np.ix_( [1,3,4],[2,5] ) ]   rows 2, 4 and 5 and columns 3 and 6 (counting from 1)
[Figure: the 6x7 array 0..41 with rows 1, 3, 4 and columns 2, 5 highlighted, giving the 3x2 result 9 12 / 23 26 / 30 33.]
23. Simple Array Operations
All functions we know by now operate element-wise on arrays. For linear algebra we need scalar, matrix-vector and matrix-matrix products.
>>> A = np.array([[3, 4], [2, 3]])
>>> print(A)
[[3 4]
 [2 3]]
>>> A.transpose() # the same as A.T (A' in MATLAB)
array([[3, 2],
       [4, 3]])
>>> np.linalg.inv(A) # compute the (multiplicative) inverse of a matrix; the result is floating point, so tiny rounding errors may appear
array([[ 3., -4.],
       [-2.,  3.]])
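The products mentioned above can be written with the `@` operator (or `np.dot`). The following sketch contrasts them with the element-wise `*`; the vector `v` and the result names are introduced here for illustration.

```python
import numpy as np

A = np.array([[3, 4], [2, 3]])
v = np.array([1, 2])

elementwise = A * A     # element-by-element product, not a matrix product
matmat = A @ A          # matrix-matrix product
matvec = A @ v          # matrix-vector product
scalar = np.dot(v, v)   # scalar (inner) product

A_inv = np.linalg.inv(A)
print(matmat)
print(matvec, scalar)
print(np.allclose(A @ A_inv, np.eye(2)))   # A times its inverse is the identity
```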
24. Simple Array Operations: INVERSE 3x3 ARRAY
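INVERSE OF 3x3 MATRIX
The inverse $A^{-1}$ of an $n \times n$ matrix $A$ satisfies $A A^{-1} = A^{-1} A = I_n$, where $I_n$ is the IDENTITY MATRIX.
$\det A = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{11}a_{32}a_{23} - a_{31}a_{22}a_{13} - a_{21}a_{12}a_{33}$
If the determinant of A is zero, then A inverse does not exist.
DETERMINANT OF a ARRAY
[Figure: the 3x3 array a with rows 0 1 2 / 3 4 5 / 6 7 8.]
det(a) = 0*4*8 + 3*7*2 + 6*1*5 - 0*7*5 - 6*4*2 - 3*1*8 = 0
$a^{-1}$ is undefined!
PYTHON TEST
>>> a = np.arange(9).reshape(3,3)
>>> np.linalg.det(a)
0.0
>>> np.linalg.inv(a)
numpy.linalg.LinAlgError: Singular matrix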
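In code, comparing a floating-point determinant with zero is fragile, so a rank check is often used instead. The helper `safe_inverse` below is hypothetical (not part of NumPy) and is only one possible way to guard the call.

```python
import numpy as np

def safe_inverse(m):
    """Hypothetical helper: return the inverse of m, or None if m is singular."""
    # A square matrix is invertible only if it has full rank.
    if np.linalg.matrix_rank(m) < m.shape[0]:
        return None
    try:
        return np.linalg.inv(m)
    except np.linalg.LinAlgError:
        return None

a = np.arange(9).reshape(3, 3)    # singular: its determinant is 0
b = np.array([[3, 4], [2, 3]])    # invertible: its determinant is 1

print(safe_inverse(a))
print(safe_inverse(b))
```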
27. “Automatic” Reshaping
To change the dimensions of an array, you can omit one of the sizes, which will then be deduced automatically:
2 x A x 3 → 2*A*3 = 30 → A = 5
CONSTRUCT 3D ARRAY
>>> a = np.arange(30)
>>> a.shape = 2,-1,3 # -1 means "whatever is needed"
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])
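The same -1 placeholder works with the `reshape` method, which returns a reshaped array instead of modifying a in place. A small sketch, including the case where no size fits:

```python
import numpy as np

a = np.arange(30)

b = a.reshape(2, -1, 3)   # 2 * A * 3 = 30, so A = 5
c = a.reshape(-1, 6)      # A * 6 = 30, so A = 5

try:
    a.reshape(4, -1)      # 30 is not divisible by 4
    failed = False
except ValueError:
    failed = True

print(b.shape, c.shape, failed)
```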
28. Vector Stacking
How do we construct a 2D array from a list of equally-sized row vectors?
In MATLAB this is quite easy: if x and y are two vectors of the same length you only need to do m=[x;y].
In NumPy this works via the functions column_stack, dstack, hstack and vstack. For example:
>>> x = np.arange(0,10,2) # x=([0,2,4,6,8])
>>> y = np.arange(5) # y=([0,1,2,3,4])
>>> m = np.vstack([x,y]) # m=([[0,2,4,6,8],
...                      #     [0,1,2,3,4]])
>>> xy = np.hstack([x,y]) # xy =([0,2,4,6,8,0,1,2,3,4])
[Figure: x and y stacked row-wise into the 2x5 array m.]
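For comparison, here is what each of the four functions returns for the same two vectors. The result names are chosen for illustration.

```python
import numpy as np

x = np.arange(0, 10, 2)
y = np.arange(5)

rows = np.vstack([x, y])          # x and y become the rows: shape (2, 5)
cols = np.column_stack([x, y])    # x and y become the columns: shape (5, 2)
flat = np.hstack([x, y])          # concatenated end to end: shape (10,)
depth = np.dstack([x, y])         # stacked along a third axis: shape (1, 5, 2)

print(rows.shape, cols.shape, flat.shape, depth.shape)
```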
29. Numpy Histogram vs Hist matplotlib
PLOT A HISTOGRAM WITH MATPLOTLIB HIST
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)
>>> plt.show()
numpy.random.normal(mean, std, size) draws random samples from a normal (Gaussian) distribution.
density=True (called normed=1 in old Matplotlib releases) means that the histogram is normalized so that its area is 1.
• hist (Matplotlib) plots the histogram automatically.
31. Numpy Histogram vs Hist matplotlib
>>> mu, sigma = 2, 0.5
>>> a = np.random.normal(mu,sigma,10000)
>>> (v, bins) = np.histogram(a, bins=5, density=True) # density replaces the old normed argument
>>> plt.plot(bins[1:], v) # bins has one more entry than v, so the first edge is dropped
Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in NumPy.
The main difference is that hist plots the histogram automatically, while numpy.histogram only generates the data.
[Figure: v plotted against bins[1:].]
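The data returned by numpy.histogram can be inspected without plotting. The sketch below uses a seeded generator (`np.random.default_rng`, an assumption made here for reproducibility; the slides use np.random.normal) and computes bin centers, a common alternative to bins[1:] as the x-axis.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded generator, chosen here for repeatability
a = rng.normal(2, 0.5, 10000)

v, bins = np.histogram(a, bins=50, density=True)

centers = 0.5 * (bins[1:] + bins[:-1])   # midpoint of each bin
area = np.sum(v * np.diff(bins))         # density=True normalizes the area to 1

print(len(v), len(bins))
print(area)
```

Passing `centers` and `v` to plt.plot would draw the curve with each value placed in the middle of its bin.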
33. The ix_() function: Special indexing field
numpy.ix_(*args)
>>> ax, bx = np.ix_([1,3,4],[2,5])
>>> ax, bx
(array([[1],
       [3],
       [4]]), array([[2, 5]]))
>>> ax.shape, bx.shape
((3, 1), (1, 2))
The way it works is by taking advantage of NumPy’s broadcasting facilities.
You can see that the two arrays used as row and column indices have different shapes; NumPy’s broadcasting repeats each along the too-short axis so that they conform.
[Figure: the row indices 1, 3, 4 and the column indices 2, 5 broadcast over the 6x7 array 0..41.]
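Because np.ix_ only reshapes its arguments, the same selection can be written by hand with a new axis. A minimal sketch comparing the two forms (`via_ix` and `manual` are illustrative names):

```python
import numpy as np

a = np.arange(42).reshape(6, 7)
rows = np.array([1, 3, 4])
cols = np.array([2, 5])

ax, bx = np.ix_(rows, cols)       # shapes (3, 1) and (1, 2)
via_ix = a[ax, bx]

manual = a[rows[:, None], cols]   # (3, 1) against (2,) broadcasts to (3, 2)

print(via_ix)
print(np.array_equal(via_ix, manual))
```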
34. Backup: numpy.linspace
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
Parameters:
start : scalar
The starting value of the sequence.
stop : scalar
The end value of the sequence.
num : int, optional
Number of samples to generate.
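A short sketch of the three most commonly used options: the default closed interval, `endpoint=False`, and `retstep=True`. The variable names are illustrative.

```python
import numpy as np

closed = np.linspace(20, 145, 5)                     # both ends included
half_open = np.linspace(0, 1, 4, endpoint=False)     # stop value excluded
samples, step = np.linspace(0, 1, 5, retstep=True)   # also return the spacing

print(closed)
print(half_open)
print(step)
```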
Editor's Notes
The result of multiplying by the scalar is equivalent to the example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a.
The new elements in b are simply copies of the original scalar. The stretching is only conceptual: NumPy is smart enough to use the original scalar value without actually making copies, so that broadcasting operations are as memory and computationally efficient as possible.
The scalar version is more efficient than the array version because broadcasting moves less memory around during the multiplication (b is a scalar rather than an array).
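One way to see that no copies are made is `np.broadcast_to`, which returns a read-only view of the stretched value. In the sketch below the strides of that view are zero, meaning every element refers to the same single value in memory.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

b = np.broadcast_to(2, a.shape)   # the scalar "stretched" to a's shape

print(b)
print(b.strides)                  # zero strides: one stored value, reused
print(b.flags.writeable)          # the view is read-only
print(np.array_equal(a * b, a * 2))
```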