14078956.ppt

Python NumPy
AILab
Batselem Jagvaral
2016 March

What is NumPy?
• NumPy (Numerical Python) is a package for scientific computing. It provides tools for handling n-dimensional arrays (especially vectors and matrices).
• All elements stored in a NumPy array have the same type.
• The package offers a large number of routines for fast access to data (e.g. search, extraction), for various manipulations (e.g. sorting), for calculations (e.g. statistical computing), etc.

Overview
• Broadcasting
  • Array Broadcasting
  • Broadcasting rules
• Fancy indexing and index tricks
  • Indexing with Arrays of indices
  • Indexing with Boolean Arrays
  • The ix_() function
  • Indexing with strings
• Linear Algebra
  • Simple Array Operations
• Tricks and Tips
  • “Automatic” Reshaping
  • Vector Stacking
  • Histograms

Broadcasting
• Broadcasting allows us to deal with inputs that do not have exactly the same shape.
• NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the first example below.
• NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain conditions. The simplest case is combining an array with a scalar, as in the second example below.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[2, 2], [2, 2]])
>>> a * b  # same shape: element-wise product
array([[2, 4],
       [6, 8]])
>>> b = 2
>>> a * b  # the scalar b is broadcast across a
array([[2, 4],
       [6, 8]])
Figure: multiplying a multiarray by a multiarray of the same shape (2x2 * 2x2 = 2x2), and multiplying a multiarray by a scalar (2x2 * 1x1 = 2x2, broadcasting occurs!). In both cases [[1, 2], [3, 4]] times 2 gives [[2, 4], [6, 8]].

Broadcasting
Figure: a 4x1 column [0, 10, 20, 30] plus a 1x3 row [0, 1, 2]. Each operand is stretched to 4x3, and the sum is the 4x3 array [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]].
>>> A = np.array([0, 10, 20, 30])
>>> B = np.array([0, 1, 2])
>>> y = A[:, None] + B  # A[:, None] has shape (4, 1)
• Both operands, A[:, None] with shape (4, 1) and B treated as shape (1, 3), have an axis of length one that is expanded to a larger size during the broadcast operation.
• The smaller array is “broadcast” across the larger array so that they have compatible shapes.
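The stretching described above can be checked directly by printing the shapes involved. This is a minimal sketch using the same A and B as the slide; the name `col` is introduced here only for illustration.

```python
import numpy as np

A = np.array([0, 10, 20, 30])   # shape (4,)
B = np.array([0, 1, 2])         # shape (3,)

col = A[:, None]                # shape (4, 1); None inserts a new axis
y = col + B                     # (4, 1) + (3,) broadcasts to (4, 3)

print(col.shape, B.shape, y.shape)
print(y)
```

No data is physically repeated for the stretched axes; NumPy only behaves as if the length-one axes had been copied.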

Broadcasting Arrays
Figure: three additions that all give the 4x3 result [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]]:
1. 4x3 + 4x3: the shapes match, so nothing is stretched.
2. 4x3 + 3: the length-3 row [0, 1, 2] is stretched down the four rows.
3. 4x1 + 3: the column [0, 10, 20, 30] and the row [0, 1, 2] are both stretched.
The result is equivalent to the previous examples.

Broadcasting
Figure: adding a 4x3 array and a length-4 array fails (mismatch!), because the trailing dimensions 3 and 4 differ.
• When operating on two arrays, NumPy compares their shapes dimension by dimension, starting with the trailing (rightmost) dimension.
• Two dimensions are compatible when
  • they are equal, or
  • one of them is 1
If these conditions are not met, a ValueError (“operands could not be broadcast together”) is raised, indicating that the arrays have incompatible shapes.
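The compatibility rule can be written out as a short check. This is only an illustrative sketch: `broadcast_compatible` is a hypothetical helper written for this example, not a NumPy function, and it assumes that missing leading dimensions count as compatible.

```python
import numpy as np

def broadcast_compatible(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes can be broadcast together."""
    # Walk both shapes from the last axis backwards; zip stops at the
    # shorter shape, so missing leading dimensions are never compared.
    for m, n in zip(reversed(shape_a), reversed(shape_b)):
        if m != n and m != 1 and n != 1:
            return False
    return True

x = np.zeros((4, 3))
print(broadcast_compatible(x.shape, (3,)))    # trailing 3 matches 3
print(broadcast_compatible(x.shape, (4,)))    # trailing 3 vs 4: mismatch
print(broadcast_compatible(x.shape, (4, 1)))  # 1 stretches to 3

# NumPy itself raises ValueError for the mismatching case.
try:
    x + np.arange(4)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```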

Fancy indexing and index tricks

Indexing with Arrays of Indices
• NumPy offers more indexing facilities than regular Python sequences.
• In addition to indexing by integers and slices, arrays can be indexed by arrays of integers and arrays of booleans.
>>> a = np.arange(6)**2  # the first 6 square numbers: 0, 1, 4, 9, 16, 25
>>> i = np.array([0, 0, 1, 3])  # an array of indices
>>> a[i]  # the elements of a at the positions i
array([0, 0, 1, 9])
>>> j = np.array([[1, 2], [0, 4]])  # a bidimensional array of indices
>>> a[j]  # the same shape as j
array([[ 1,  4],
       [ 0, 16]])
Figure: a[i] picks 0, 0, 1, 9 from [0, 1, 4, 9, 16, 25]; a[j] picks elements the same way and returns them as the 2x2 array [[1, 4], [0, 16]].
Unlike slicing, fancy indexing creates a copy instead of a view into the original array.
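The copy-versus-view difference can be observed by writing into each result. The sketch below uses np.shares_memory to check whether the result still refers to the original buffer; the variable names are chosen for this example only.

```python
import numpy as np

a = np.arange(6) ** 2            # [0, 1, 4, 9, 16, 25]

view = a[1:4]                    # basic slice: a view into a
copy = a[np.array([1, 2, 3])]    # fancy index: an independent copy

view[0] = -1                     # writes through to a[1]
copy[1] = -2                     # leaves a untouched

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```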

• When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.
• The following example shows this behavior by converting an image of labels into a color image using a palette.
>>> palette = np.array( [ [0,0,0], # black
[255,0,0], # red
[0,255,0], # green
[0,0,255], # blue
[255,255,255] ] ) # white
>>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to
# a color in the palette
[ 0, 3, 4, 0 ] ] )
>>> palette[image] # the (2,4,3) color image
array([[[ 0, 0, 0],
[255, 0, 0],
[ 0, 255, 0],
[ 0, 0, 0]],
[[ 0, 0, 0],
[ 0, 0, 255],
[255, 255, 255],
[ 0, 0, 0]]])
Figure: each label in the 2x4 image selects one row of the 5x3 palette, giving a 2x4x3 color image.

• We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape (or be broadcastable to a common shape, as in a[i,2] below).
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i = np.array( [ [0,1], # indices for the first dim of a
[1,2] ] )
>>> j = np.array( [ [2,1], # indices for the second dim of a
[3,3] ] )
>>> a[i,j] # i and j must have equal shape
array([[ 2, 5],
[ 7, 11]])
>>> a[i,2]
array([[ 2, 6 ],
[ 6, 10 ]])
Figure (indexing arrays for multiple dimensions): a[i,j] picks the elements 2, 5, 7 and 11 of the 3x4 array into a 2x2 result.
Figure (indexing with a fixed index): a[i,2] takes rows i from the fixed column j = 2, i.e. the elements 2, 6, 6 and 10.
>>> a[:,j] # i.e., a[ : , j]
array([[[ 2, 1],
[ 3, 3]],
[[ 6, 5],
[ 7, 7]],
[[10, 9],
[11, 11]]])
Indexing for the complete row slices

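• Naturally, we can put i and j in a sequence and then do the indexing with it. The sequence must be a tuple: recent NumPy versions interpret a list of index arrays as a single index array, which fails here.
>>> l = (i, j)
>>> a[l]  # equivalent to a[i,j]
array([[ 2,  5],
       [ 7, 11]])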
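A short sketch of multi-dimensional index arrays follows. It repeats the a, i and j from the slides and adds one case, with the illustrative names `rows` and `cols`, where the two index arrays have different shapes and are broadcast against each other.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])   # row indices
j = np.array([[2, 1], [3, 3]])   # column indices

pair = (i, j)                    # a tuple of index arrays
same = a[pair]                   # identical to a[i, j]

# Index arrays of different shapes are broadcast to a common shape:
rows = np.array([0, 2])          # shape (2,)
cols = np.array([[1], [3]])      # shape (2, 1)
picked = a[rows, cols]           # result has shape (2, 2)

print(same)
print(picked)
```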

• Another common use of indexing with arrays is finding the maximum value of time-dependent series:
>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
>>> ind = data.argmax(axis=0)
>>> ind
array([2, 0, 3, 1])
>>> time_max = time[ ind ] # times corresponding to the maxima
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
Callouts: data holds the time-dependent series (one per column); ind holds the index of the maximum of each series.
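The same index array can also pull the maximum values themselves out of data. This is a small sketch in the spirit of the example above; `data_max` is a name introduced here.

```python
import numpy as np

time = np.linspace(20, 145, 5)                 # time scale
data = np.sin(np.arange(20)).reshape(5, 4)     # 4 time-dependent series

ind = data.argmax(axis=0)                      # row index of each column's maximum
time_max = time[ind]                           # times at which the maxima occur
data_max = data[ind, np.arange(data.shape[1])] # pair each row index with its column

print(time_max)
print(np.array_equal(data_max, data.max(axis=0)))
```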

• You can also use indexing with arrays as a target to assign to:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
• However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>> a = np.arange(5)
>>> a[[0,0,2]] = [1,2,3]
>>> a
array([2, 1, 3, 3, 4])
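Repeated indices also matter for in-place arithmetic. The sketch below goes one step beyond the slide: with `+=`, a repeated index is incremented only once, while np.add.at (an existing NumPy ufunc method) applies the operation for every occurrence.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1              # index 0 appears twice but is incremented once
print(a)                       # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)     # unbuffered: every occurrence is applied
print(b)                       # [2 1 3 3 4]
```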

Boolean or “mask” Index Arrays
• When we index arrays with arrays of (integer) indices we are providing the list of indices to pick.
• With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don’t.
• The most natural way to use boolean indexing is with a boolean array that has the same shape as the original array:
>>> a = np.arange(12).reshape(3,4)
>>> mask = a > 4
>>> mask  # mask is a boolean array with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[mask]  # 1d array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])
Unlike in the case of integer index arrays, in the boolean case the result is a 1-D array containing all the elements of the indexed array that correspond to the True elements of the boolean array.

Boolean or “mask” Index Arrays
• This property can be very useful in assignments:
>>> a[mask] = 0 # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])
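Masks can be combined before they are used for selection or assignment. The following sketch uses an illustrative condition (even values greater than 4) and also shows np.where as a non-destructive alternative to assigning through the mask.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Parentheses are required: & binds more tightly than the comparisons.
mask = (a > 4) & (a % 2 == 0)
selected = a[mask]               # 1-D array of the matching elements

b = a.copy()
b[mask] = 0                      # in-place assignment through the mask

clipped = np.where(a > 4, 0, a)  # new array; a itself is unchanged

print(selected)
print(b)
print(clipped)
```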

Indexing with Boolean Arrays
• The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a 1D boolean array selecting the slices we want.
>>> a = np.arange(12).reshape(3,4)  # restore a after the assignment above
>>> b1 = np.array([False,True,True])  # first dim selection
>>> b2 = np.array([True,False,True,False])  # second dim selection
>>> a[b1,:]  # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Figure: the row mask b1 = [F, T, T] keeps rows 1 and 2 of the 3x4 array a.

Indexing with Boolean Arrays
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>> a[:,b2] # selecting columns
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>> a[b1,b2]
array([4, 10])
>>> a[np.ix_(b1,b2)]
array([[4, 6],
[8, 10]])
Figure: a[b1,b2] pairs up the True positions of mask1 (rows) and mask2 (columns), so only the elements 4 and 10 are selected; a[np.ix_(b1,b2)] selects the full 2x2 block [[4, 6], [8, 10]].
Without the np.ix_ call, only the diagonal elements would be selected! np.ix_ takes advantage of NumPy’s broadcasting facilities.
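The difference between pairing the masks and taking the full block can be made explicit by converting the masks to integer indices. This is a sketch with names (`rows`, `cols`, `paired`, `block`) chosen for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])           # row selection
b2 = np.array([True, False, True, False])    # column selection

rows = np.nonzero(b1)[0]                     # [1 2]
cols = np.nonzero(b2)[0]                     # [0 2]

paired = a[b1, b2]                           # same as a[rows, cols]: (1,0) and (2,2)
block = a[np.ix_(b1, b2)]                    # every selected row with every selected column
chained = a[b1][:, b2]                       # same block, built in two steps

print(paired)
print(block)
```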

The ix_() function: Special indexing field
• Basic slicing is constructed by start:stop:step notation inside of brackets, for example a[1:3, :], a[2,3], a[5:].
>>> a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[1:7:2]
array([1, 3, 5])
• The numpy.ix_ function generates indexes for irregular slices, picking out rows and columns!
>>> a = np.arange(42).reshape(6,7)  # the 6x7 array 0..41
>>> a[np.ix_([1,3,4],[2,5])]  # 3x2 result
array([[ 9, 12],
       [23, 26],
       [30, 33]])
MATLAB                 NumPy                         Description
a( [2,4,5],[3,6] )     a[ np.ix_( [1,3,4],[2,5] ) ]  rows 2, 4 and 5 and columns 3 and 6 (counting from 1, as in MATLAB)

Reduce Operation
• A reduce operation reduces a’s dimension by one by applying an arithmetic function (here, addition) along one axis.
SUMMING UP EACH ROW
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.add.reduce(a, 1)
array([ 6, 22, 38])
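The reduce method is available on other ufuncs as well, and the common cases have shorthand array methods. A minimal sketch on the same 3x4 array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row_sums = np.add.reduce(a, 1)       # sum along axis 1; same as a.sum(axis=1)
col_sums = np.add.reduce(a, 0)       # sum along axis 0; same as a.sum(axis=0)
row_max = np.maximum.reduce(a, 1)    # maximum of each row; same as a.max(axis=1)

print(row_sums, col_sums, row_max)
```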

Simple Array Operations
• All functions we know by now operate element-wise on arrays. For linear algebra we need scalar, matrix-vector and matrix-matrix products.
>>> A = np.array([[3, 4], [2, 3]])
>>> print(A)
[[3 4]
 [2 3]]
>>> A.transpose()  # the same as A' in MATLAB
array([[3, 2],
       [4, 3]])
>>> np.linalg.inv(A)  # compute the (multiplicative) inverse of a matrix
array([[ 3., -4.],
       [-2.,  3.]])

Simple Array Operations: INVERSE 3x3 ARRAY
INVERSE OF 3x3 MATRIX
A A^-1 = A^-1 A = I_n, where I_n is the IDENTITY MATRIX.
If the determinant of A is zero, then the inverse of A does not exist.
det(A) = a_11 a_22 a_33 + a_21 a_32 a_13 + a_31 a_12 a_23 - a_11 a_32 a_23 - a_31 a_22 a_13 - a_21 a_12 a_33
DETERMINANT OF THE ARRAY a
a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]] (3x3)
det(a) = 0*4*8 + 3*7*2 + 6*1*5 - 0*7*5 - 6*4*2 - 3*1*8 = 0
a^-1 is undefined!
PYTHON TEST
>>> a = np.arange(9).reshape(3,3)
>>> np.linalg.det(a)
0.0
>>> np.linalg.inv(a)
LinAlgError: Singular matrix
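The singular case can be handled in code by catching the exception. The sketch below assumes the 3x3 array from the slide and, for contrast, the invertible 2x2 matrix used earlier; because floating-point determinants are not always exactly zero, the rank is checked as well.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)       # rows are linearly dependent
det = np.linalg.det(a)
rank = np.linalg.matrix_rank(a)      # 2 < 3, so a is singular

try:
    np.linalg.inv(a)
    outcome = "inverse returned (values would be unreliable)"
except np.linalg.LinAlgError as err:
    outcome = f"LinAlgError: {err}"
print(det, rank, outcome)

# An invertible matrix for comparison: A times its inverse is the identity.
A = np.array([[3, 4], [2, 3]])
A_inv = np.linalg.inv(A)
ok = np.allclose(A @ A_inv, np.eye(2))
print(ok)
```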

Simple Array Operations
EYE ARRAY (IDENTITY MATRIX)
>>> u = np.eye(2)  # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[1., 0.],
       [0., 1.]])
DOT PRODUCT
>>> a = np.array([[1, 2, 3],
...               [4, 5, 6]])
>>> b = np.array([[7, 8],
...               [9, 10],
...               [11, 12]])
>>> np.dot(a,b)
array([[ 58,  64],
       [139, 154]])
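For 2-D arrays, np.dot and the @ operator compute the same matrix product, while * stays element-wise. A short sketch with the arrays from the slide:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])          # 2x3
b = np.array([[7, 8], [9, 10], [11, 12]])     # 3x2

p1 = np.dot(a, b)        # matrix product, shape (2, 2)
p2 = a @ b               # same product written with the @ operator
elementwise = a * a      # element-wise square, shape (2, 3)

# Multiplying by the identity matrix leaves the product unchanged.
identity_ok = np.array_equal(np.eye(2) @ p1, p1)

print(p1)
print(elementwise)
```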

“Automatic” Reshaping
• To change the dimensions of an array, you can omit one of the sizes, which will then be deduced automatically:
CONSTRUCT 3D ARRAY
>>> a = np.arange(30)
>>> a.shape = 2,-1,3  # -1 means "whatever is needed"
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],
       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])
2 x A x 3 = 30, so A = 5
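The same deduction works with the reshape method, which returns a new array object instead of modifying the shape in place. A minimal sketch, including the failure case where the omitted size cannot be deduced:

```python
import numpy as np

a = np.arange(30)

b = a.reshape(2, -1, 3)      # -1 is deduced as 30 / (2 * 3) = 5
c = a.reshape(-1, 6)         # -1 is deduced as 30 / 6 = 5
print(b.shape, c.shape, a.shape)

try:
    a.reshape(4, -1)         # 30 is not divisible by 4
    bad = False
except ValueError:
    bad = True
print(bad)
```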

Vector Stacking
• How do we construct a 2D array from a list of equally-sized row vectors?
• In MATLAB this is quite easy: if x and y are two vectors of the same length you only need to do m=[x;y].
• In NumPy this works via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example:
>>> x = np.arange(0,10,2)  # x = [0, 2, 4, 6, 8]
>>> y = np.arange(5)  # y = [0, 1, 2, 3, 4]
>>> m = np.vstack([x,y])  # m = [[0, 2, 4, 6, 8],
                          #      [0, 1, 2, 3, 4]]
>>> xy = np.hstack([x,y])  # xy = [0, 2, 4, 6, 8, 0, 1, 2, 3, 4]
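The four stacking functions differ mainly in the shape of the result. The sketch below applies each of them to the same two vectors; the names `cols` and `depth` are introduced here for illustration.

```python
import numpy as np

x = np.arange(0, 10, 2)            # [0 2 4 6 8]
y = np.arange(5)                   # [0 1 2 3 4]

m = np.vstack([x, y])              # vectors become rows: shape (2, 5)
xy = np.hstack([x, y])             # vectors are joined end to end: shape (10,)
cols = np.column_stack([x, y])     # vectors become columns: shape (5, 2)
depth = np.dstack([x, y])          # stacked along a third axis: shape (1, 5, 2)

print(m.shape, xy.shape, cols.shape, depth.shape)
```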

Numpy Histogram vs Hist matplotlib
PLOT A HISTOGRAM WITH MATPLOTLIB HIST
• hist (Matplotlib) plots the histogram automatically.
• numpy.random.normal(mean, std, size) draws random samples from a normal (Gaussian) distribution.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu, sigma, 10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)
>>> plt.show()
• density=True (written normed=1 in old Matplotlib versions) means that the histogram is normalized so that its total area is 1.

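• numpy.histogram(a, bins, density=True) returns the histogram values v and the bin edges bins, with len(bins) = len(v) + 1. (density=True replaces the normed=True argument of old NumPy versions.)
>>> a = np.random.normal(mu, sigma, 10)
>>> (v, bins) = np.histogram(a, bins=5, density=True)
>>> bins  # 6 bin edges; the values vary from run to run
array([ 1.51704794, 1.80359328, 2.09013862, 2.37668396,
2.6632293, 2.94977464])
>>> v  # 5 density values
array([ 1.39593965, 0.34898491, 1.04695474, 0.34898491,
0.34898491])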

• Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in NumPy.
• The main difference is that hist plots the histogram automatically, while numpy.histogram only generates the data.
>>> mu, sigma = 2, 0.5
>>> a = np.random.normal(mu, sigma, 10000)
>>> (v, bins) = np.histogram(a, bins=5, density=True)
>>> plt.plot(bins[1:], v)  # plot v against the right bin edges
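The relationship between v and bins can be checked numerically. The sketch below is an illustration only: it uses a seeded generator (np.random.default_rng, chosen here for reproducibility), 50 bins, and bin centers instead of right edges, and it leaves out the plotting call so that it runs without Matplotlib.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded generator, an assumption of this sketch
mu, sigma = 2, 0.5
a = rng.normal(mu, sigma, 10000)

v, bins = np.histogram(a, bins=50, density=True)

widths = np.diff(bins)                       # width of each bin
centers = 0.5 * (bins[1:] + bins[:-1])       # midpoint of each bin
area = np.sum(v * widths)                    # total area under the histogram

print(len(v), len(bins), area)
```

Plotting v against centers, for example with plt.plot(centers, v), places each value in the middle of its bin.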

The ix_() function: Special indexing field
numpy.ix_(*args)
>>> ax, bx = np.ix_([1,3,4],[2,5])
>>> ax, bx
(array([[1],
        [3],
        [4]]), array([[2, 5]]))
>>> ax.shape, bx.shape
((3, 1), (1, 2))
• The way it works is by taking advantage of NumPy’s broadcasting facilities.
• You can see that the two arrays used as row and column indices have different shapes; NumPy’s broadcasting repeats each along the too-short axis so that they conform.
Figure: in the 6x7 array 0..41, the row indices [1, 3, 4] and the column indices [2, 5] are each repeated to shape 3x2, selecting the elements [[9, 12], [23, 26], [30, 33]].
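The repetition performed by broadcasting can be made visible with np.broadcast_arrays. This is a minimal sketch on the 6x7 array from the figure; `rows_b` and `cols_b` are names introduced for illustration.

```python
import numpy as np

a = np.arange(42).reshape(6, 7)
ax, bx = np.ix_([1, 3, 4], [2, 5])
print(ax.shape, bx.shape)                        # (3, 1) (1, 2)

# Broadcasting repeats each index array to the common shape (3, 2).
rows_b, cols_b = np.broadcast_arrays(ax, bx)
print(rows_b)
print(cols_b)

sub = a[ax, bx]                                  # same as a[np.ix_([1, 3, 4], [2, 5])]
print(sub)
```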

Backup: numpy.linspace
• numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
• Return evenly spaced numbers over a specified interval.
• Returns num evenly spaced samples, calculated over the interval [start, stop].
Parameters:
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence.
num : int, optional
    Number of samples to generate.
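A few calls illustrate the parameters listed above. The first one repeats the time scale used earlier in the deck; the others are illustrative values.

```python
import numpy as np

t = np.linspace(20, 145, 5)                          # both endpoints included
half_open = np.linspace(0, 1, 5, endpoint=False)     # stop value excluded
samples, step = np.linspace(0, 1, 5, retstep=True)   # also return the spacing
default_len = len(np.linspace(0, 1))                 # num defaults to 50

print(t)
print(half_open)
print(step, default_len)
```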
