Number Crunching in Python
"Number Crunching in Python": slides presented at EuroPython 2012, Florence, Italy

Slides have been authored by me and by Dr. Enrico Franchi.

The slides cover Scientific and Engineering Computing, the Numpy NDArray implementation, and some working case studies.

Published in: Technology, Education
Transcript of "Number Crunching in Python"

  1. 1. NUMBER CRUNCHING IN PYTHON Enrico Franchi (efranchi@ce.unipr.it) & Valerio Maggio (valerio.maggio@unina.it)
  2. 2. OUTLINE • Scientific and Engineering Computing • Common FP pitfalls • Numpy NDArray (Memory and Indexing) • Case Studies
  7. 7. number-crunching: n. [common] Computations of a numerical nature, esp. those that make extensive use of floating-point numbers. This term is in widespread informal use outside hackerdom and even in mainstream slang, but has additional hackish connotations: namely, that the computations are mindless and involve massive use of brute force. This is not always evil, esp. if it involves ray tracing or fractals or some other use that makes pretty pictures, esp. if such pictures can be used as screen backgrounds. See also crunch. We are not evil. Just chaotic neutral.
  8. 8. ALTERNATIVES • Matlab (IDE, oriented to numeric computations, high-quality algorithms, lots of packages, poor general-purpose (GP) programming support, commercial) • Octave (Matlab clone) • R (stats oriented, poor general-purpose programming support) • Fortran/C++ (very low level, very fast, more complex to use) • In general, these tools are either low-level GP languages or high-level DSLs
  9. 9. PYTHON • Numpy (low-level numerical computations) + Scipy (lots of additional packages) • IPython (wonderful command-line interpreter) + IPython Notebook (“Mathematica-like” interactive documents) • HDF5 (PyTables, H5Py), Databases • Specific libraries for machine learning, etc. • General-Purpose Object-Oriented Programming
  10. 10. TOOLS
  13. 13. NUMPY STACK Our Code, Numpy, Atlas/MKL (Improvements) • Algorithms are fast because of highly optimized C/Fortran code • Bytecode for c = a · b:
      4          30 LOAD_GLOBAL              1 (dot)
                 33 LOAD_FAST                0 (a)
                 36 LOAD_FAST                1 (b)
                 39 CALL_FUNCTION            2
                 42 STORE_FAST               2 (c)
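A listing like the one on this slide can be reproduced with the standard `dis` module. The following is a minimal sketch, not the speakers' code: the wrapper `product` and the pure-Python `dot` stand-in are hypothetical names introduced here, and opcode names and offsets differ between Python versions, so the output will not match the slide exactly.

```python
import dis

def dot(a, b):
    # Hypothetical pure-Python stand-in; the slide calls NumPy's dot instead.
    return sum(x * y for x, y in zip(a, b))

def product(a, b):
    # The statement disassembled on the slide: c = a · b
    c = dot(a, b)
    return c

# Collect the instructions the interpreter runs for `product`.
instructions = list(dis.get_instructions(product))
global_names = [ins.argval for ins in instructions if ins.opname == "LOAD_GLOBAL"]

# Print the human-readable listing.
dis.dis(product)
```

The point of the slide survives across versions: the Python side is only a handful of bytecodes (look up `dot`, load the arguments, call, store), so nearly all the time is spent inside the compiled routine being called.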
  14. 14. ndarray (Memory; behavior; shape, stride, flags) Index map: $(i_0, \dots, i_{n-1}) \to I$ Shape: $(d_0, \dots, d_{n-1})$, e.g., 4x3 • An n-dimensional array references some (usually contiguous) memory area • An n-dimensional array has properties such as its shape and the data type of the elements it contains • It is an object, so it has some behavior, e.g., the definition of __add__ and similar methods • N-dimensional arrays are homogeneous
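The properties listed on this slide can be inspected directly on an array. This is a small illustrative sketch; the array `a` and its 4x3 `float64` layout are chosen here only to mirror the slide's 4x3 example.

```python
import numpy as np

# A 4x3 array of 64-bit floats: 12 homogeneous elements in one memory block.
a = np.arange(12, dtype=np.float64).reshape(4, 3)

print(a.shape)    # dimensions (d0, d1)
print(a.dtype)    # the single element type shared by all elements
print(a.strides)  # bytes to step along each axis
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])

# Behavior: `+` dispatches to ndarray.__add__ and works element by element.
b = a + a
```

Note that NumPy reports strides in bytes, so each value is the element stride multiplied by the item size (8 bytes for `float64`).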
  15. 15. Element Layout in Memory: $(i_0, \dots, i_{n-1}) \to I$ for an array of shape $(d_0, \dots, d_k, \dots, d_{n-1})$
      C-contiguous: $I_C = \sum_{k=0}^{n-1} i_k \prod_{j=k+1}^{n-1} d_j$
      F-contiguous: $I_F = \sum_{k=0}^{n-1} i_k \prod_{j=0}^{k-1} d_j$
      Two-dimensional case (e.g., 4x3): $I_C = i_0 \cdot d_1 + i_1$ and $I_F = i_0 + i_1 \cdot d_0$
  16. 16. Stride
      C-contiguous: $s_C(k) = \prod_{j=k+1}^{n-1} d_j$, so $I_C = \sum_{k=0}^{n-1} i_k \cdot s_C(k)$
      F-contiguous: $s_F(k) = \prod_{j=0}^{k-1} d_j$, so $I_F = \sum_{k=0}^{n-1} i_k \cdot s_F(k)$
      Two-dimensional case: C-contiguous $(s_0 = d_1, s_1 = 1)$, F-contiguous $(s_0 = 1, s_1 = d_0)$
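The stride and flat-index formulas can be checked with a few lines of plain Python. This is a sketch under the assumption that strides are counted in elements (NumPy's `strides` attribute counts bytes instead); the helper names `c_strides`, `f_strides`, and `flat_index` are hypothetical.

```python
from math import prod

def c_strides(shape):
    # s_C(k) = product of d_j for j = k+1 .. n-1 (last index varies fastest)
    return tuple(prod(shape[k + 1:]) for k in range(len(shape)))

def f_strides(shape):
    # s_F(k) = product of d_j for j = 0 .. k-1 (first index varies fastest)
    return tuple(prod(shape[:k]) for k in range(len(shape)))

def flat_index(index, strides):
    # I = sum over k of i_k * s(k)
    return sum(i * s for i, s in zip(index, strides))

shape = (4, 3)
print(c_strides(shape), f_strides(shape))
print(flat_index((2, 1), c_strides(shape)), flat_index((2, 1), f_strides(shape)))
```

For the 4x3 shape this gives element strides `(3, 1)` in C order and `(1, 4)` in Fortran order, so the element at index `(2, 1)` sits at offset 7 in the first layout and at offset 6 in the second.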
  17. 17. Views • ndarray: Memory, behavior, (shape, stride, flags) • View: an ndarray with its own behavior and (shape, stride, flags) that references the same Memory
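A short sketch of what a view means in practice: basic slicing returns a new ndarray object with its own shape and strides, backed by the same memory. The arrays `base` and `view` are illustrative names.

```python
import numpy as np

base = np.arange(10)
view = base[::2]   # basic slicing returns a view, not a copy

view[0] = 99       # writing through the view changes the base array

print(base[0])
print(np.shares_memory(base, view))
print(view.base is base)
print(base.strides, view.strides)  # the view steps twice as far per element
```

Only the metadata differs between the two objects; no element is copied until a copy is requested explicitly, for example with `view.copy()`.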
  18. 18. C-contiguous ndarray behavior (1,4) Memory
  20. 20. ndarray and matrix: each has Memory, behavior, and (shape, stride, flags)
  21. 21. Basic Indexing
  22. 22. Advanced Indexing: Broadcasting!
  26. 26. Advanced Indexing 2
  32. 32. Vectorize! Don’t use explicit for loops unless you have to!
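A minimal sketch of what the advice means, assuming an illustrative array `x` of 1000 sample points: the explicit loop and the vectorized call compute the same values, but the second runs its loop in compiled code.

```python
import math
import numpy as np

x = np.linspace(0.0, 2.0 * math.pi, 1000)

# Explicit loop: one Python-level function call per element.
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = math.sin(x[i])

# Vectorized: a single call that processes the whole array.
vectorized = np.sin(x)

print(np.allclose(looped, vectorized))
```

Timing the two variants (for instance with the standard `timeit` module) is left out here because the numbers depend on the machine and the NumPy build.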
  33. 33. PART II: NUMBER CRUNCHING IN ACTION
  35. 35. Awful Maths • General Disclaimer: All the Maths appearing in the next slides is only intended to better introduce the considered case studies. Speakers are not responsible for any possible disease or “brain consumption” caused by too many formulas. So BEWARE; use this information at your own risk! Its intention is solely educational. We would strongly encourage you to use this information in cooperation with a medical or health professional.
  36. 36. BEFORE STARTING What do you need to get started: • A handy Unix command-line tool: • Linux / Mac OS X users: You’re done. • Windows users: It might be time to change your OS :-) • [I]Python (You say?!) • A DBMS: • Relational: e.g., SQLite3, PostgreSQL • No-SQL: e.g., MongoDB
  37. 37. BENCHMARKING
  38. 38. CASE STUDIES ON NUMERICAL EFFICIENCY • Vectorization (NumPy vs. “pure” Python) • Loops and Math functions (e.g., sin(x)) • Matrix-Vector Product • Different implementations of Matrix-Vector Product
  39. 39. HwInfo
  40. 40. Vectorization: sin(x)
  46. 46. sin(x): Results • NumPy wins. Fatality.
  50. 50. Matrix-Vector Product
  51. 51. dot
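The code shown on these slides did not survive in the transcript. As a stand-in, here is a sketch of the comparison the section describes: a matrix-vector product written with explicit Python loops next to the same product computed with `numpy.dot`. The function `matvec_loops` and the sample data are hypothetical.

```python
import numpy as np

def matvec_loops(A, x):
    # Pure-Python reference: y_i = sum over j of A[i][j] * x[j]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]

y_loops = matvec_loops(A, x)
y_numpy = np.dot(np.array(A), np.array(x))

print(y_loops, y_numpy)
```

Both variants return the same vector; the difference the benchmark measures is only where the inner loop runs.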
  57. 57. dot: Results • NumPy wins. Fatality.
  59. 59. NUMBER CRUNCHING APPLICATIONS
  60. 60. MACHINE LEARNING • Machine Learning = Learning by Machine(s) • Algorithms and Techniques to gain insights from data or a dataset • Supervised or Unsupervised Learning • Machine Learning is actively being used today, perhaps in many more places than you’d expect • Mail Spam Filtering • Search Engine Results Ranking • Preference Selection • e.g., Amazon “Customers Who Bought This Item Also Bought”
  61. 61. CLUSTERING: BRIEF INTRODUCTION • Clustering is a type of unsupervised learning that automatically forms clusters (groups) of similar things. It’s like automatic classification. You can cluster almost anything, and the more similar the items are in the cluster, the better your clusters are. • k-means is an algorithm that will find k clusters for a given dataset. • The number of clusters k is user defined. • Each cluster is described by a single point known as the centroid. • Centroid means it’s at the center of all the points in the cluster.
  62. 62. K-means • from scipy.cluster.vq import kmeans, vq
  67. 67. K-means plot • from scipy.cluster.vq import kmeans, vq
  69. 69. EXAMPLE: CLUSTERING POINTS ON A MAP Here’s the situation: your friend <NAME> wants you to take him out in the greater Portland, Oregon, area (US) for his birthday. A number of other friends are going to come also, so you need to provide a plan that everyone can follow. Your friend has given you a list of places he wants to go. This list is long; it has 70 establishments in it.
  70. 70. Yahoo API: geoGrab
  71. 71. Spherical Distance Measure
      $\phi_s, \lambda_s;\ \phi_f, \lambda_f$: latitude and longitude coordinates of two points (s and f)
      $\Delta\phi, \Delta\lambda$: corresponding differences
      $\Delta\hat{\sigma} = \arccos(\sin\phi_s \sin\phi_f + \cos\phi_s \cos\phi_f \cos\Delta\lambda)$
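The spherical distance measure above translates almost line by line into code. This is a sketch using only the standard library; the function name `spherical_distance`, the degree-based inputs, and the default mean Earth radius of 6371 km are assumptions made here, not taken from the slides.

```python
import math

def spherical_distance(lat_s, lon_s, lat_f, lon_f, radius=6371.0):
    """Great-circle distance via the spherical law of cosines.

    Inputs are in degrees; `radius` (in km) is an assumed default.
    """
    phi_s, phi_f = math.radians(lat_s), math.radians(lat_f)
    delta_lambda = math.radians(lon_f - lon_s)
    cos_sigma = (math.sin(phi_s) * math.sin(phi_f)
                 + math.cos(phi_s) * math.cos(phi_f) * math.cos(delta_lambda))
    # Clamp against rounding error so acos never sees a value outside [-1, 1].
    central_angle = math.acos(max(-1.0, min(1.0, cos_sigma)))
    return radius * central_angle

# A quarter of the equator: from (0°, 0°) to (0°, 90°).
quarter = spherical_distance(0.0, 0.0, 0.0, 90.0)
print(quarter)
```

The clamp matters for nearly identical points, where floating-point rounding can push the cosine slightly above 1 and make `math.acos` raise `ValueError`.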
  72. 72. kmeans with distLSC
  73. 73. TRIVIAL EXAMPLE: INVERSE MATRIX • Problem: Given an input matrix A, calculate, if possible, its inverse matrix. • Definition: In linear algebra, an n-by-n (square) matrix A is invertible (a.k.a. nonsingular or nondegenerate) if there exists an n-by-n matrix B ($A^{-1}$) such that $AB = BA = I_n$
  74. 74. Solution(s)
      ✓ Eigendecomposition: $A^{-1} = Q \Lambda^{-1} Q^{-1}$ • If A can be eigendecomposed as $A = Q \Lambda Q^{-1}$ and none of its eigenvalues is equal to zero
      ✓ Cholesky Decomposition: $A^{-1} = (L^*)^{-1} L^{-1}$ • If A is positive definite, so that $A = L L^*$, where L is a lower triangular matrix and $L^*$ is the conjugate transpose of L
      ✓ LU Factorization: $A = LU$ (with L and U lower and upper triangular matrices, respectively)
      ✓ Analytic Solution (writing the matrix of cofactors C), a.k.a. Cramer's rule:
      $$A^{-1} = \frac{1}{\det(A)} (C^T)_{i,j} = \frac{1}{\det(A)} (C_{j,i}) = \frac{1}{\det(A)} \begin{pmatrix} C_{1,1} & C_{2,1} & \cdots & C_{n,1} \\ C_{1,2} & C_{2,2} & \cdots & C_{n,2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1,n} & C_{2,n} & \cdots & C_{n,n} \end{pmatrix}$$
  77. 77. Example
      $$C = \begin{pmatrix} C_{1,1} & C_{1,2} & C_{1,3} \\ C_{2,1} & C_{2,2} & C_{2,3} \\ C_{3,1} & C_{3,2} & C_{3,3} \end{pmatrix}$$
      $$\det(C) = C_{1,1}(C_{2,2}C_{3,3} - C_{2,3}C_{3,2}) + C_{2,1}(C_{1,3}C_{3,2} - C_{1,2}C_{3,3}) + C_{3,1}(C_{1,2}C_{2,3} - C_{1,3}C_{2,2})$$
      $$C^{-1} = \frac{1}{\det(C)} \begin{pmatrix} (C_{2,2}C_{3,3} - C_{2,3}C_{3,2}) & (C_{1,3}C_{3,2} - C_{1,2}C_{3,3}) & (C_{1,2}C_{2,3} - C_{1,3}C_{2,2}) \\ (C_{2,3}C_{3,1} - C_{2,1}C_{3,3}) & (C_{1,1}C_{3,3} - C_{1,3}C_{3,1}) & (C_{1,3}C_{2,1} - C_{1,1}C_{2,3}) \\ (C_{2,1}C_{3,2} - C_{2,2}C_{3,1}) & (C_{3,1}C_{1,2} - C_{1,1}C_{3,2}) & (C_{1,1}C_{2,2} - C_{1,2}C_{2,1}) \end{pmatrix}$$
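The analytic 3x3 inverse can be written out by hand in plain Python. This is a sketch of the "home made" approach, not the speakers' implementation; the names `det3` and `inverse3` are hypothetical, and the determinant uses the standard cofactor expansion along the first row.

```python
def det3(C):
    # Cofactor expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = C
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def inverse3(C):
    (a, b, c), (d, e, f), (g, h, i) = C
    det = det3(C)
    if det == 0:
        raise ValueError("matrix is singular")
    # Adjugate (transposed cofactor matrix), entry by entry.
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, g * b - a * h, a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adj]

C = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]
C_inv = inverse3(C)
print(C_inv)
```

Comparing the determinant with exactly zero is only reasonable for small integer examples like this one; with floating-point input a nearly singular matrix passes the check and produces a badly conditioned result.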
  81. 81. HomeMade • Duplicated Code • Template Method Pattern • However, we still have to implement computational functions from scratch! Reinventing the wheel!
  82. 82. Numpy • from numpy import linalg
  83. 83. Under the hood
      Type: function
      String Form:<function inv at 0x105f72b90>
      File: /Library/Python/2.7/site-packages/numpy/linalg/linalg.py
      Definition: linalg.inv(a)
      Source:
      def inv(a):
          """
          Compute the (multiplicative) inverse of a matrix.
          [...]
          Parameters
          ----------
          a : array_like, shape (M, M)
              Matrix to be inverted.
          Returns
          -------
          ainv : ndarray or matrix, shape (M, M)
              (Multiplicative) inverse of the matrix `a`.
          Raises
          ------
          LinAlgError
              If `a` is singular or not square.
          [...]
          """
          a, wrap = _makearray(a)
          return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
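The source listing above shows `inv` delegating to `solve` with an identity right-hand side. The sketch below checks that equivalence on a small example; the 2x2 matrix `A` is illustrative, and the internals of current NumPy releases may differ from the Python 2.7 era source quoted on the slide.

```python
import numpy as np
from numpy import linalg

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = linalg.inv(A)

# Same idea as the quoted source: solve A X = I for X.
A_inv_via_solve = linalg.solve(A, np.identity(A.shape[0], dtype=A.dtype))

print(np.allclose(A @ A_inv, np.identity(2)))
print(np.allclose(A_inv, A_inv_via_solve))

# A singular matrix raises LinAlgError, as the docstring states.
try:
    linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]]))
    singular_raised = False
except linalg.LinAlgError:
    singular_raised = True
```

When the goal is to solve a linear system rather than to obtain the inverse itself, calling `linalg.solve` with the actual right-hand side avoids forming the inverse at all.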
  84. 84. Numpy Alternatives • Alternative built-in solutions to the same problem:
  85. 85. Thanks for your kind attention.
  86. 86. Vectorization: i+=2
  90. 90. i+=2: Results • NumPy wins. Fatality.
  92. 92. K-means
      Create k points for starting centroids (often randomly)
      While any point has changed cluster assignment
          for every point in dataset:
              for every centroid:
                  d = distance(centroid, point)
              assign(point, nearest(cluster))
          for each cluster:
              mean = average(cluster)
              centroid[cluster] = mean
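The pseudocode above can be turned into a small runnable function. This is a sketch under stated assumptions rather than the `scipy.cluster.vq` implementation: the name `kmeans_sketch`, the squared Euclidean distance, the seeded random choice of starting centroids, and the `max_iter` safety cap are all choices made here for illustration.

```python
import random

def kmeans_sketch(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # starting centroids: k random data points
    assignment = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        for idx, point in enumerate(points):
            # Squared Euclidean distance to every centroid; keep the nearest.
            dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                     for centroid in centroids]
            nearest = dists.index(min(dists))
            if nearest != assignment[idx]:
                assignment[idx] = nearest
                changed = True
        if not changed:
            break   # no point changed cluster: converged
        for cluster in range(k):
            members = [p for p, a in zip(points, assignment) if a == cluster]
            if members:   # an empty cluster keeps its previous centroid
                centroids[cluster] = tuple(sum(col) / len(members)
                                           for col in zip(*members))
    return centroids, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans_sketch(points, k=2)
print(centroids, assignment)
```

On this toy dataset the two well-separated groups end up in different clusters; which group receives label 0 depends on the random starting centroids.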