Slides have been authored by me and by Dr. Enrico Franchi.

The slides cover scientific and engineering computing, the NumPy ndarray implementation, and some worked case studies.

License: CC Attribution-ShareAlike License

- 1. NUMBER CRUNCHING IN PYTHON. Enrico Franchi (efranchi@ce.unipr.it) & Valerio Maggio (valerio.maggio@unina.it)
- 2. OUTLINE • Scientific and Engineering Computing • Common FP pitfalls • Numpy NDArray (Memory and Indexing) • Case Studies
- 5. number-crunching: n. [common] Computations of a numerical nature, esp. those that make extensive use of floating-point numbers. This term is in widespread informal use outside hackerdom and even in mainstream slang, but has additional hackish connotations: namely, that the computations are mindless and involve massive use of brute force. This is not always evil, esp. if it involves ray tracing or fractals or some other use that makes pretty pictures, esp. if such pictures can be used as screen backgrounds. See also crunch. We are not evil. Just chaotic neutral.
- 8. ALTERNATIVES • Matlab (IDE, oriented to numeric computation, high-quality algorithms, lots of packages, poor general-purpose programming support, commercial) • Octave (Matlab clone) • R (stats oriented, poor general-purpose programming support) • Fortran/C++ (very low level, very fast, more complex to use) • In general, these tools are either low-level general-purpose languages or high-level DSLs
- 9. PYTHON • Numpy (low-level numerical computations) + Scipy (lots of additional packages) • IPython (wonderful command-line interpreter) + IPython Notebook (“Mathematica-like” interactive documents) • HDF5 (PyTables, H5Py), Databases • Specific libraries for machine learning, etc. • General Purpose Object Oriented Programming
- 10. TOOLS
- 13. NUMPY STACK: Our Code, Numpy, Atlas/MKL (diagram labels: Improvements, Improvements). Algorithms are fast because of highly optimized C/Fortran code. Bytecode for c = a · b (source line 4): 30 LOAD_GLOBAL 1 (dot); 33 LOAD_FAST 0 (a); 36 LOAD_FAST 1 (b); 39 CALL_FUNCTION 2; 42 STORE_FAST 2 (c)
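A bytecode listing like the one on this slide can be reproduced with the standard `dis` module. The sketch below is only illustrative: `dot` is a hypothetical pure-Python stand-in for `numpy.dot`, `multiply` is an assumed wrapper name, and the opcode names and offsets printed depend on the Python version (the slide shows Python 2.7 output).

```python
import dis


def dot(a, b):
    """Hypothetical stand-in for numpy.dot on plain sequences."""
    return sum(x * y for x, y in zip(a, b))


def multiply(a, b):
    # The whole product is a single call: the interpreter only loads
    # the function and its arguments, calls it, and stores the result.
    c = dot(a, b)
    return c


# Print the bytecode of multiply; exact opcodes vary across Python versions.
dis.dis(multiply)
```

With NumPy, the work behind that single call happens in compiled code, which is the point the slide makes about the stack.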
- 14. ndarray: memory plus behavior (shape, stride, flags). An index tuple maps to a memory offset, $(i_0, \dots, i_{n-1}) \mapsto I$, for an array of shape $(d_0, \dots, d_{n-1})$, e.g. 4x3. • An n-dimensional array references some (usually contiguous) memory area • An n-dimensional array has properties such as its shape or the data type of the elements it contains • It is an object, so it has some behavior, e.g., the definition of __add__ and similar methods • N-dimensional arrays are homogeneous
- 15. Element layout in memory: $(i_0, \dots, i_{n-1}) \mapsto I$ for shape $(d_0, \dots, d_k, \dots, d_{n-1})$. C-contiguous: $I_C = \sum_{k=0}^{n-1} i_k \prod_{j=k+1}^{n-1} d_j$. F-contiguous: $I_F = \sum_{k=0}^{n-1} i_k \prod_{j=0}^{k-1} d_j$. For a two-dimensional array (e.g. 4x3) these reduce to $I_C = i_0 \cdot d_1 + i_1$ and $I_F = i_0 + i_1 \cdot d_0$.
- 16. Stride. C-contiguous: $s_C(k) = \prod_{j=k+1}^{n-1} d_j$, so $I_C = \sum_{k=0}^{n-1} i_k \cdot s_C(k)$. F-contiguous: $s_F(k) = \prod_{j=0}^{k-1} d_j$, so $I_F = \sum_{k=0}^{n-1} i_k \cdot s_F(k)$. In two dimensions: C-contiguous $(s_0 = d_1, s_1 = 1)$, F-contiguous $(s_0 = 1, s_1 = d_0)$.
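The stride formulas can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `c_strides`, `f_strides` and `offset` are made up for illustration, and strides are counted in elements (NumPy's `ndarray.strides` reports the same quantities in bytes, i.e. multiplied by the item size).

```python
from itertools import product
from math import prod


def c_strides(shape):
    """Row-major (C) strides in elements: s_C(k) = product of d_j for j > k."""
    return tuple(prod(shape[k + 1:]) for k in range(len(shape)))


def f_strides(shape):
    """Column-major (Fortran) strides in elements: s_F(k) = product of d_j for j < k."""
    return tuple(prod(shape[:k]) for k in range(len(shape)))


def offset(index, strides):
    """Flat offset I = sum over k of i_k * s(k)."""
    return sum(i * s for i, s in zip(index, strides))


shape = (4, 3)
print(c_strides(shape))  # (3, 1)
print(f_strides(shape))  # (1, 4)

# Every index tuple maps to a distinct offset in range(12) under both layouts.
c_offsets = [offset(ix, c_strides(shape)) for ix in product(range(4), range(3))]
f_offsets = [offset(ix, f_strides(shape)) for ix in product(range(4), range(3))]
```

For the 4x3 case the C strides are `(d1, 1)` and the F strides are `(1, d0)`, matching the two-dimensional special case of the general formulas.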
- 17. Views: ndarray (memory; behavior: shape, stride, flags) with several views, each with its own behavior (shape, stride, flags)
- 18. C-contiguous ndarray: behavior (1,4), memory
- 20. ndarray and matrix: each has memory and behavior (shape, stride, flags)
- 21. Basic Indexing
- 22. Advanced Indexing: Broadcasting!
- 26. Advanced Indexing 2
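The code on the indexing slides did not survive in this transcript, so the following is only a sketch of the concepts the titles name, not a reconstruction of the authors' examples. It assumes standard NumPy semantics: basic indexing (slices) returns a view on the same memory, advanced indexing (integer lists or boolean masks) returns a copy, and index arrays are broadcast against each other.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

view = a[1:3, ::2]        # basic indexing (slices): a view on a's memory
fancy = a[[0, 2], :]      # advanced indexing (integer list): a copy
evens = a[a % 2 == 0]     # advanced indexing (boolean mask): a 1-D copy

view[0, 0] = -1           # writes through to a[1, 0]
fancy[0, 0] = 99          # leaves a untouched

# Broadcasting of index arrays: shapes (2, 1) and (1, 2) combine to (2, 2),
# selecting a[0, 0], a[0, 2], a[3, 0] and a[3, 2].
rows = np.array([[0], [3]])
cols = np.array([[0, 2]])
corners = a[rows, cols]

print(np.shares_memory(a, view), np.shares_memory(a, fancy))
```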
- 32. Vectorize! Don’t use explicit for loops unless you have to!
- 33. PART II: NUMBER CRUNCHING IN ACTION
- 35. Awful Maths. General Disclaimer: All the maths appearing in the next slides is only intended to better introduce the considered case studies. Speakers are not responsible for any possible disease or “brain consumption” caused by too many formulas. So BEWARE; use this information at your own risk! Its intention is solely educational. We would strongly encourage you to use this information in cooperation with a medical or health professional.
- 36. BEFORE STARTING What you need to get started: • A handy Unix command-line tool: • Linux / Mac OS X users: You’re done. • Windows users: It may be time to change your OS :-) • [I]Python (You say?!) • A DBMS: • Relational: e.g., SQLite3, PostgreSQL • NoSQL: e.g., MongoDB
- 37. BENCHMARKING
- 38. CASE STUDIES ON NUMERICAL EFFICIENCY • Vectorization (NumPy vs. “pure” Python) • Loops and math functions (e.g., sin(x)) • Matrix-Vector Product • Different implementations of Matrix-Vector Product
- 39. HwInfo
- 40. Vectorization: sin(x)
- 46. sin(x): Results. NumPy Wins. Fatality.
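The benchmark code itself is not in the transcript. A comparison in the same spirit might look like the sketch below, where the array size, the repeat count and the function names `sin_loop` and `sin_vectorized` are all assumptions; timings depend on the machine, so none are quoted here.

```python
import math
import timeit

import numpy as np

x = np.linspace(0.0, 2.0 * math.pi, 100_000)


def sin_loop(values):
    """Pure-Python version: one math.sin call per element."""
    return [math.sin(v) for v in values]


def sin_vectorized(values):
    """NumPy version: a single call, the loop runs in compiled code."""
    return np.sin(values)


# Both versions compute the same values.
same = np.allclose(sin_loop(x), sin_vectorized(x))

t_loop = timeit.timeit(lambda: sin_loop(x), number=5)
t_vec = timeit.timeit(lambda: sin_vectorized(x), number=5)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s  same result: {same}")
```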
- 50. Matrix-Vector Product
- 51. dot
- 57. dot: Results. NumPy Wins. Fatality.
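The different matrix-vector implementations compared on these slides are not preserved in the transcript. As a rough sketch of what such a comparison can look like, here are three assumed variants (the function names are hypothetical): explicit double loop, one `np.dot` per row, and a single `np.dot` call.

```python
import numpy as np


def matvec_loops(A, x):
    """Explicit double loop over rows and columns."""
    n_rows, n_cols = A.shape
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(n_cols):
            y[i] += A[i, j] * x[j]
    return y


def matvec_rows(A, x):
    """One vectorized dot product per row."""
    return np.array([np.dot(row, x) for row in A])


def matvec_dot(A, x):
    """Single call; the whole product is computed in compiled code."""
    return np.dot(A, x)


rng = np.random.default_rng(0)
A = rng.random((50, 40))
x = rng.random(40)

y_loops, y_rows, y_dot = matvec_loops(A, x), matvec_rows(A, x), matvec_dot(A, x)
print(np.allclose(y_loops, y_dot), np.allclose(y_rows, y_dot))
```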
- 59. NUMBER CRUNCHING APPLICATIONS
- 60. MACHINE LEARNING • Machine Learning = Learning by Machine(s) • Algorithms and techniques to gain insights from data or a dataset • Supervised or Unsupervised Learning • Machine Learning is actively being used today, perhaps in many more places than you’d expect • Mail Spam Filtering • Search Engine Results Ranking • Preference Selection • e.g., Amazon “Customers Who Bought This Item Also Bought”
- 61. CLUSTERING: BRIEF INTRODUCTION • Clustering is a type of unsupervised learning that automatically forms clusters (groups) of similar things. It’s like automatic classification. You can cluster almost anything, and the more similar the items are in the cluster, the better your clusters are. • k-means is an algorithm that will find k clusters for a given dataset. • The number of clusters k is user defined. • Each cluster is described by a single point known as the centroid. • Centroid means it’s at the center of all the points in the cluster.
- 62. K-means: from scipy.cluster.vq import kmeans, vq
- 67. K-means plot: from scipy.cluster.vq import kmeans, vq
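Only the import line of the K-means slides survives. A minimal usage sketch of the two imported functions follows; the synthetic two-blob data set and all variable names are assumptions, not the authors' example. `kmeans` returns the centroids (the "codebook") and the mean distortion, and `vq` assigns each observation to its nearest centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Synthetic data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b])

# Find k = 2 centroids, then label every point with its nearest centroid.
centroids, distortion = kmeans(points, 2)
labels, distances = vq(points, centroids)

print(centroids)
print(np.bincount(labels))
```

A plot like the one on the following slide could then color each point by its entry in `labels`.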
- 69. EXAMPLE: CLUSTERING POINTS ON A MAP Here’s the situation: your friend <NAME> wants you to take him out in the greater Portland, Oregon, area (US) for his birthday. A number of other friends are going to come also, so you need to provide a plan that everyone can follow. Your friend has given you a list of places he wants to go. This list is long; it has 70 establishments in it.
- 70. Yahoo API: geoGrab
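- 71. Spherical Distance Measure: $(\phi_s, \lambda_s)$ and $(\phi_f, \lambda_f)$ are the latitude and longitude coordinates of two points (s and f), and $\Delta\phi$, $\Delta\lambda$ the corresponding differences. $\Delta\hat{\sigma} = \arccos(\sin\phi_s \sin\phi_f + \cos\phi_s \cos\phi_f \cos\Delta\lambda)$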
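The spherical distance measure can be sketched in plain Python as below. The function name `spherical_distance`, the use of degrees for input, and the Earth radius constant are assumptions made for this illustration; the slide only gives the formula for the central angle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed constant, not from the slides


def spherical_distance(lat_s, lon_s, lat_f, lon_f, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees
    (spherical law of cosines)."""
    phi_s, phi_f = math.radians(lat_s), math.radians(lat_f)
    delta_lambda = math.radians(lon_f - lon_s)
    cos_angle = (math.sin(phi_s) * math.sin(phi_f)
                 + math.cos(phi_s) * math.cos(phi_f) * math.cos(delta_lambda))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


# A quarter of the equator: the central angle is pi / 2.
quarter = spherical_distance(0.0, 0.0, 0.0, 90.0)
print(quarter)
```

A function of this kind can be used as the distance measure in a k-means implementation in place of the Euclidean distance when the points are locations on a map.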
- 72. kmeans with distLSC
- 73. TRIVIAL EXAMPLE: INVERSE MATRIX • Problem: Given an input matrix A, calculate, if possible, its inverse matrix. • Definition: In linear algebra, an n-by-n (square) matrix A is invertible (a.k.a. nonsingular or nondegenerate) if there exists an n-by-n matrix $B$ ($= A^{-1}$) such that $AB = BA = I_n$
- 74. Solution(s) ✓ Eigendecomposition: if A can be eigendecomposed and none of its eigenvalues is equal to zero, then A is nonsingular and $A^{-1} = Q \Lambda^{-1} Q^{-1}$ ✓ Cholesky Decomposition: if A is positive definite, $A^{-1} = (L^*)^{-1} L^{-1}$, where $L^*$ is the conjugate transpose of $L$ (and $L$ is a lower triangular matrix) ✓ LU Factorization: $A = LU$ (with L and U lower and upper triangular matrices, respectively) ✓ Analytic Solution (writing the matrix of cofactors), a.k.a. Cramer's method: $A^{-1} = \frac{1}{\det(A)} C^T$ with $(C^T)_{i,j} = C_{j,i}$, i.e. $A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} C_{1,1} & C_{2,1} & \cdots & C_{n,1} \\ C_{1,2} & C_{2,2} & \cdots & C_{n,2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1,n} & C_{2,n} & \cdots & C_{n,n} \end{pmatrix}$
- 75. Example: $C = \begin{pmatrix} C_{1,1} & C_{1,2} & C_{1,3} \\ C_{2,1} & C_{2,2} & C_{2,3} \\ C_{3,1} & C_{3,2} & C_{3,3} \end{pmatrix}$, $\det(C) = C_{1,1}(C_{2,2}C_{3,3} - C_{2,3}C_{3,2}) + C_{1,2}(C_{2,3}C_{3,1} - C_{2,1}C_{3,3}) + C_{1,3}(C_{2,1}C_{3,2} - C_{2,2}C_{3,1})$, $C^{-1} = \frac{1}{\det(C)} \begin{pmatrix} (C_{2,2}C_{3,3} - C_{2,3}C_{3,2}) & (C_{1,3}C_{3,2} - C_{1,2}C_{3,3}) & (C_{1,2}C_{2,3} - C_{1,3}C_{2,2}) \\ (C_{2,3}C_{3,1} - C_{2,1}C_{3,3}) & (C_{1,1}C_{3,3} - C_{1,3}C_{3,1}) & (C_{1,3}C_{2,1} - C_{1,1}C_{2,3}) \\ (C_{2,1}C_{3,2} - C_{2,2}C_{3,1}) & (C_{3,1}C_{1,2} - C_{1,1}C_{3,2}) & (C_{1,1}C_{2,2} - C_{1,2}C_{2,1}) \end{pmatrix}$
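The 3x3 analytic solution translates directly into code. The sketch below is a plain-Python illustration with assumed function names `det3` and `inv3`; it is not the "home made" implementation from the following slides, which is not in the transcript.

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) + b * (f * g - d * i) + c * (d * h - e * g)


def inv3(m):
    """Inverse of a 3x3 matrix as the transposed cofactor matrix over det."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    if det == 0:
        raise ValueError("matrix is singular")
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adjugate]


m = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
print(det3(m))   # 1
print(inv3(m))   # [[-24.0, 18.0, 5.0], [20.0, -15.0, -4.0], [-5.0, 4.0, 1.0]]
```

The closed form is fine for a 3x3 example, but it does not scale: for larger matrices a factorization-based routine is the usual choice.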
- 78. Home Made: Duplicated Code. Template Method Pattern. However, we still have to implement the computational functions from scratch! Reinventing the wheel!
- 82. Numpy: from numpy import linalg
- 83. Under the hood:

      Type: function
      String Form: <function inv at 0x105f72b90>
      File: /Library/Python/2.7/site-packages/numpy/linalg/linalg.py
      Definition: linalg.inv(a)
      Source:
      def inv(a):
          """
          Compute the (multiplicative) inverse of a matrix.
          [...]
          Parameters
          ----------
          a : array_like, shape (M, M)
              Matrix to be inverted.
          Returns
          -------
          ainv : ndarray or matrix, shape (M, M)
              (Multiplicative) inverse of the matrix `a`.
          Raises
          ------
          LinAlgError
              If `a` is singular or not square.
          [...]
          """
          a, wrap = _makearray(a)
          return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
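The source shown on this slide computes the inverse by solving A X = I. The same idea can be tried directly with public NumPy functions, as in this small sketch (the example matrix is arbitrary, and more recent NumPy releases may implement `inv` differently from the version shown on the slide):

```python
import numpy as np

a = np.array([[4.0, 7.0],
              [2.0, 6.0]])

# Inverse via the linear system A X = I, as in the source shown on the slide.
a_inv_solve = np.linalg.solve(a, np.identity(a.shape[0], dtype=a.dtype))

# The library routine gives the same result.
a_inv = np.linalg.inv(a)

print(a_inv)
print(np.allclose(a_inv_solve, a_inv))
print(np.allclose(a @ a_inv, np.identity(2)))
```

For a singular or non-square input, `np.linalg.inv` raises `numpy.linalg.LinAlgError`, as the docstring states.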
- 84. Numpy Alternatives • Alternative built-in solutions to the same problem:
- 85. Thanks for your kind attention.
- 86. Vectorization: i += 2
- 90. i += 2: Results. NumPy Wins. Fatality.
- 92. K-means:

      Create k points for starting centroids (often randomly)
      While any point has changed cluster assignment:
          for every point in dataset:
              for every centroid:
                  d = distance(centroid, point)
              assign(point, nearest(cluster))
          for each cluster:
              mean = average(cluster)
              centroid[cluster] = mean
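The pseudocode above can be turned into a short NumPy function. This is a sketch under stated assumptions rather than the speakers' implementation: the name `kmeans_sketch`, the `max_iter` and `seed` parameters, the choice of Euclidean distance, and picking the starting centroids among the data points are all illustrative choices.

```python
import numpy as np


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Plain k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Create k starting centroids by picking k distinct data points at random.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when no point has changed cluster assignment.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, labels


data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, assignment = kmeans_sketch(data, k=2)
print(centers)
print(assignment)
```

The two nested loops of the pseudocode (every point, every centroid) are replaced by one broadcasted distance computation, in line with the "vectorize" advice earlier in the deck.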
