Numpy Talk at SIAM

This is similar to the NYC talk on NumPy, but the ending slides are different.

Published in: Technology
  • Speaker note on array views. Good: (1) more efficient, because it doesn't force an array copy for every operation; (2) it is often convenient to work with a renamed view of an array (for example, views of the odd and even elements). Bad: (1) can cause unexpected side effects that are hard to track down; (2) when you would rather have a copy, it requires some ugliness.

    1. SciPy and NumPy. Travis Oliphant, SIAM 2011, Mar 2, 2011
    2. Magnetic Resonance Elastography, 1997. Richard Ehman, Armando Manduca, Raja Muthupillai.
        $\rho_0 (2\pi f)^2 U_i(a, f) = [C_{ijkl}(a, f)\, U_{k,l}(a, f)]_{,j}$
    3. Finding Derivatives of 5-d data: $U_X(a, f)$, $U_Y(a, f)$, $U_Z(a, f)$
    4. Finding Derivatives of 5-d data: $\Xi = \nabla \times U$, with components $\Xi_X(a, f)$, $\Xi_Y(a, f)$, $\Xi_Z(a, f)$
    5. NumPy Array. A NumPy array is an N-dimensional homogeneous collection of “items” of the same kind. The kind can be any arbitrary structure of bytes and is specified using the data-type.
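As a small illustration of the definition above (a sketch only; the variable name `grid` is made up for this example), every item of an array shares one data-type, and the data-type fixes how many bytes each item occupies:

```python
import numpy as np

# A 2-D array whose items are all 32-bit floats.
grid = np.zeros((3, 4), dtype=np.float32)

print(grid.ndim)      # number of dimensions
print(grid.shape)     # length along each dimension
print(grid.dtype)     # the single data-type shared by every item
print(grid.itemsize)  # bytes per item
print(grid.nbytes)    # total bytes: 3 * 4 * itemsize
```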
    6. SciPy [Scientific Algorithms]: linalg, stats, interpolate, cluster, special, maxentropy, io, fftpack, odr, ndimage, sparse, integrate, signal, optimize, weave. NumPy [Data Structure Core]: fft, random, linalg; NDArray (multi-dimensional array object); UFunc (fast array math operations). Both are built on Python.
    7. SciPy is a community project. A great way to get involved with Python.
    8. Community effort: Chuck Harris, Pauli Virtanen, Mark Wiebe, David Cournapeau, Stefan van der Walt, Jarrod Millman, Josef Perktold, Anne Archibald, Dag Sverre Seljebotn, Robert Kern, Matthew Brett, Warren Weckesser, Ralf Gommers, Joe Harrington (documentation effort), Andrew Straw (www.scipy.org), and many, many others (forgive me!)
    9. Optimization: Data Fitting
        NONLINEAR LEAST SQUARES CURVE FITTING
            >>> from numpy import exp, linspace, pi, sin
            >>> from scipy.optimize import curve_fit
            >>> from scipy.stats import norm
            # Define the function to fit.
            >>> def function(x, a, b, f, phi):
            ...     result = a * exp(-b * sin(f * x + phi))
            ...     return result
            # Create a noisy data set.
            >>> actual_params = [3, 2, 1, pi/4]
            >>> x = linspace(0, 2*pi, 25)
            >>> exact = function(x, *actual_params)
            >>> noisy = exact + 0.3*norm.rvs(size=len(x))
            # Use curve_fit to estimate the function parameters from the noisy data.
            >>> initial_guess = [1, 1, 1, 1]
            >>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
            >>> estimated_params
            array([3.1705, 1.9501, 1.0206, 0.7034])
            # err_est is an estimate of the covariance matrix of the estimates
            # (i.e. how good of a fit is it)
    10. 2D RBF Interpolation
        EXAMPLE
            >>> from scipy.interpolate import Rbf
            >>> from numpy import hypot, mgrid
            >>> from scipy.special import j0
            >>> x, y = mgrid[-5:6, -5:6]
            >>> z = j0(hypot(x, y))
            >>> newfunc = Rbf(x, y, z)
            >>> xx, yy = mgrid[-5:5:100j, -5:5:100j]
            # xx and yy are both 2-d; the result is evaluated element-by-element
            >>> zz = newfunc(xx, yy)
            >>> from mayavi import mlab  # was enthought.mayavi when the talk was given
            >>> mlab.surf(x, y, z*5)
            >>> mlab.figure()
            >>> mlab.surf(xx, yy, zz*5)
            >>> mlab.points3d(x, y, z*5, scale_factor=0.5)
    11. Brownian Motion
        Brownian motion (Wiener process): $X(t + dt) = X(t) + N(0, \sigma^2 dt, t, t + dt)$, where $N(a, b, t_1, t_2)$ is normal with mean $a$ and variance $b$, and independent on disjoint time intervals.
            >>> from numpy import linspace, sqrt
            >>> from matplotlib.pyplot import plot
            >>> from scipy.stats import norm
            >>> x0 = 100.0
            >>> dt = 0.5
            >>> sigma = 1.5
            >>> n = 100
            >>> steps = norm.rvs(size=n, scale=sigma*sqrt(dt))
            >>> steps[0] = x0  # Make i.c. work
            >>> x = steps.cumsum()
            >>> t = linspace(0, (n-1)*dt, n)
            >>> plot(t, x)
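    12. Image Processing
            # Edge detection using Sobel filter
            >>> from scipy.ndimage import sobel
            >>> from matplotlib.pyplot import imshow
            # lena is a 2-D grayscale image array loaded beforehand
            >>> imshow(lena)
            >>> edges = sobel(lena)
            >>> imshow(edges)
        [Figure panels: LENA IMAGE, FILTERED IMAGE]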
    13. NumPy: so what? (speed and expressiveness)
        • Data: the array object
            – slicing
            – shapes and strides
            – data-type generality
        • Fast Math:
            – vectorization
            – broadcasting
            – aggregations
    14. Array Slicing
        SLICING WORKS MUCH LIKE STANDARD PYTHON SLICING
            # a is the 6x6 array from the slide figure, with a[i, j] == 10*i + j
            >>> a[0,3:5]
            array([3, 4])
            >>> a[4:,4:]
            array([[44, 45],
                   [54, 55]])
            >>> a[:,2]
            array([2,12,22,32,42,52])
        STRIDES ARE ALSO POSSIBLE
            >>> a[2::2,::2]
            array([[20, 22, 24],
                   [40, 42, 44]])
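A runnable sketch of the slicing examples, under the assumption that the slide's array satisfies a[i, j] == 10*i + j (the figure is not in the transcript, but that layout reproduces every output shown). It also shows that a slice is a view onto the same memory; the names `b`, `view`, and `safe` are made up for this example.

```python
import numpy as np

# Assumed reconstruction of the slide's array: a[i, j] == 10*i + j.
a = 10 * np.arange(6)[:, np.newaxis] + np.arange(6)

row_part = a[0, 3:5]     # two items from the first row
corner = a[4:, 4:]       # bottom-right 2x2 block
column = a[:, 2]         # third column
strided = a[2::2, ::2]   # every other row (from row 2) and every other column

# Slices are views: writing through the view changes the array it came from.
b = a.copy()
view = b[4:, 4:]
view[0, 0] = -1
print(b[4, 4])                     # the write is visible in b
print(np.shares_memory(b, view))   # view and b use the same buffer

# Ask for a copy explicitly when independence is wanted.
safe = b[4:, 4:].copy()
safe[0, 0] = 99
print(b[4, 4])                     # unchanged by the write to safe
```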
    15. Fancy Indexing in 2-D
            >>> a[(0,1,2,3,4),(1,2,3,4,5)]
            array([ 1, 12, 23, 34, 45])
            >>> a[3:,[0, 2, 5]]
            array([[30, 32, 35],
                   [40, 42, 45],
                   [50, 52, 55]])
            >>> mask = array([1,0,1,0,0,1], dtype=bool)
            >>> a[mask,2]
            array([2,22,52])
        Unlike slicing, fancy indexing creates a copy instead of a view into the original array.
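The copy behaviour can be checked directly. This is a minimal sketch that again assumes the 6x6 array with a[i, j] == 10*i + j; `diag` and `picked` are illustrative names.

```python
import numpy as np

# Assumed 6x6 array with a[i, j] == 10*i + j.
a = 10 * np.arange(6)[:, np.newaxis] + np.arange(6)

# Index tuples are paired up: (0, 1), (1, 2), (2, 3), ...
diag = a[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5)]

# A boolean mask selects rows 0, 2 and 5; the 2 selects the third column.
mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)
picked = a[mask, 2]

# The fancy-indexed result owns its data, so edits do not reach a.
picked[0] = -1
print(a[0, 2])                       # still the original value
print(np.shares_memory(a, picked))   # no shared buffer
```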
    16. Fancy Indexing with Indices
        INDEXING BY POSITION
            # create an Nx3 colormap
            # grayscale map -- R G B
            >>> cmap = array([[1.0,1.0,1.0],
            ...               [0.9,0.9,0.9],
            ...               ...
            ...               [0.0,0.0,0.0]])
            >>> cmap.shape
            (10,3)
            # indices must lie in 0..9 for a 10-row colormap
            >>> img = array([[0,9],
            ...              [5,1]])
            >>> img.shape
            (2,2)
            # use the image as an index into the colormap
            >>> rgb_img = cmap[img]
            >>> rgb_img.shape
            (2,2,3)
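A self-contained version of the colormap lookup might look like the following. The ten evenly spaced gray levels are an assumption made here to fill in the rows the slide elides; only the shapes come from the slide.

```python
import numpy as np

# Hypothetical 10-level grayscale colormap, white (1.0) down to black (0.0).
levels = np.linspace(1.0, 0.0, 10)
cmap = np.column_stack([levels, levels, levels])   # shape (10, 3)

# Each pixel holds a row index into cmap; valid indices are 0..9.
img = np.array([[0, 9],
                [5, 1]])

# Indexing with a 2x2 integer array returns one RGB triple per pixel.
rgb_img = cmap[img]
print(rgb_img.shape)
print(rgb_img[0, 1])   # the colormap row selected by index 9
```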
    17. Array Data Structure
    18. NumPy dtypes
        Basic Type | Available NumPy types | Comments
        Boolean | bool | Elements are 1 byte in size.
        Integer | int8, int16, int32, int64, int128, int | int defaults to the size of int in C for the platform.
        Unsigned Integer | uint8, uint16, uint32, uint64, uint128, uint | uint defaults to the size of unsigned int in C for the platform.
        Float | float32, float64, float, longfloat | float is always a double precision floating point value (64 bits). longfloat represents large precision floats. Its size is platform dependent.
        Complex | complex64, complex128, complex, longcomplex | The real and imaginary parts of a complex64 are each represented by a single precision (32 bit) value for a total size of 64 bits.
        Strings | str, unicode |
        Object | object | Represent items in array as Python objects.
        Records | void | Used for arbitrary data structures.
    19. “Structured” Arrays
        Elements of an array can be any fixed-size data structure!
        Record layout: name char[10], age int, weight double.
        Data (3x4 array of records):
            Brad, 33, 135.0  | Jane, 25, 105.0   | John, 47, 225.0    | Fred, 54, 140.0
            Henry, 29, 154.0 | George, 61, 202.0 | Brian, 32, 137.0   | Amy, 27, 187.0
            Ron, 19, 188.0   | Susan, 33, 135.0  | Jennifer, 18, 88.0 | Jill, 54, 145.0
        EXAMPLE
            >>> from numpy import dtype, empty
            # structured data format
            >>> fmt = dtype([('name', 'S10'),
            ...              ('age', int),
            ...              ('weight', float)])
            >>> a = empty((3,4), dtype=fmt)
            >>> a.itemsize  # 22 with a 4-byte int; 26 where int is 8 bytes
            22
            >>> a['name'] = [['Brad', … ,'Jill']]
            >>> a['age'] = [[33, … , 54]]
            >>> a['weight'] = [[135, … , 145]]
            >>> print(a)
            [[('Brad', 33, 135.0)
              …
              ('Jill', 54, 145.0)]]
    20. Nested Datatype
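    21. Nested Datatype
            >>> import numpy as np
            >>> dt = np.dtype([('time', np.uint64),
            ...                ('size', np.uint32),
            ...                ('position', [('az', np.float32),
            ...                              ('el', np.float32),
            ...                              ('region_type', np.uint8),
            ...                              ('region_ID', np.uint16)]),
            ...                ('gain', np.uint8),
            ...                ('samples', np.int16, 2048)])
            # f is a file name or an open binary file of such records
            >>> data = np.fromfile(f, dtype=dt)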
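To see a nested data-type survive a round trip through a raw binary file, here is a reduced sketch. The three-field layout, the 8-sample buffer, and the temporary file name are stand-ins chosen for this example, not the layout from the slide.

```python
import os
import tempfile

import numpy as np

# Smaller stand-in for a nested record layout (8 samples instead of 2048).
dt = np.dtype([
    ("time", np.uint64),
    ("position", [("az", np.float32), ("el", np.float32)]),
    ("samples", np.int16, (8,)),
])

records = np.zeros(3, dtype=dt)
records["time"] = [10, 20, 30]
records["position"]["az"] = [0.5, 1.5, 2.5]
records["samples"][1] = np.arange(8)

# Write the raw bytes, then read them back with the same dtype.
path = os.path.join(tempfile.mkdtemp(), "records.dat")
records.tofile(path)
data = np.fromfile(path, dtype=dt)

print(data["position"]["az"])   # nested field access
print(data["samples"].shape)    # one row of samples per record
```

Because the file holds nothing but the packed records, the reader has to supply exactly the same dtype that the writer used.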
    22. Memory Mapped Arrays
        • Methods for Creating:
            – memmap: subclass of ndarray that manages the memory mapping details.
            – frombuffer: Create an array from a memory mapped buffer object.
            – ndarray constructor: Use the buffer keyword to pass in a memory mapped buffer.
        • Limitations:
            – Files must be < 2GB on Python 2.4 and before.
            – Files must be < 2GB on 32-bit machines.
            – Python 2.5 on 64 bit machines is theoretically "limited" to 17.2 billion GB (17 Exabytes).
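The first creation method can be sketched as follows. The file name, shape, and values are illustrative assumptions; the point is only that `np.memmap` behaves like an ordinary array while the data stays on disk.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "volume.dat")   # hypothetical file name
shape = (4, 5, 6)

# Create the file and write through the mapping.
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
mm[:] = np.arange(np.prod(shape)).reshape(shape)
mm.flush()
del mm

# Reopen read-only; data is paged in from disk only when it is touched.
ro = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
x_slice = np.array(ro[2])   # copy one slab into ordinary memory

print(isinstance(ro, np.ndarray))   # memmap is an ndarray subclass
print(x_slice.shape)
```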
    23. Memmap Timings (3D arrays, 500x500x1000)
        Operation        Linux                        OS X
                         In Memory   Memory Mapped    In Memory   Memory Mapped
        read             2103 ms     11 ms            3505 ms     27 ms
        x slice          1.8 ms      4.8 ms           1.8 ms      8.3 ms
        y slice          2.8 ms      4.6 ms           4.4 ms      7.4 ms
        z slice          9.2 ms      13.8 ms          10 ms       18.7 ms
        downsample 4x4   0.02 ms     125 ms           0.02 ms     198.7 ms
        All times in milliseconds (ms).
        Linux: Ubuntu 4.1, Dell Precision 690, Dual Quad Core Xeon X5355 2.6 GHz, 8 GB memory.
        OS X: OS X 10.5, MacBook Pro laptop, 2.6 GHz Core Duo, 4 GB memory.
    24. Structured Arrays
        Elements of array can be any fixed-size data structure!
        Name (char[12]) | Time (int64) | Value (float32)
        MSFT_profit     | 10           | 6.20
        GOOG_profit     | 12           | -1.08
        MSFT_profit     | 18           | 8.40
        INTC_profit     | 25           | -0.20
        …
        GOOG_profit     | 1000325      | 3.20
        GOOG_profit     | 1000350      | 4.50
        INTC_profit     | 1000385      | -1.05
        MSFT_profit     | 1000390      | 5.60
        Example
            >>> import numpy as np
            >>> fmt = np.dtype([('name', 'S12'),
            ...                 ('time', np.int64),
            ...                 ('value', np.float32)])
            >>> vals = [('MSFT_profit', 10, 6.20),
            ...         ('GOOG_profit', 12, -1.08),
            ...         …
            ...         ('INTC_profit', 1000385, -1.05),
            ...         ('MSFT_profit', 1000390, 5.60)]
            >>> arr = np.array(vals, dtype=fmt)
            # or
            >>> arr = np.fromfile('db.dat', dtype=fmt)
            # or
            >>> arr = np.memmap('db.dat', dtype=fmt, mode='c')
    25. Mathematical Binary Operators
        a + b → add(a,b)          a * b → multiply(a,b)
        a - b → subtract(a,b)     a / b → divide(a,b)
        a % b → remainder(a,b)    a ** b → power(a,b)
        MULTIPLY BY A SCALAR
            >>> from numpy import add, array
            >>> a = array((1,2))
            >>> a*3.
            array([3., 6.])
        ELEMENT BY ELEMENT ADDITION
            >>> a = array([1,2])
            >>> b = array([3,4])
            >>> a + b
            array([4, 6])
        ADDITION USING AN OPERATOR FUNCTION
            >>> add(a,b)
            array([4, 6])
        IN-PLACE OPERATION
            # Overwrite contents of a.
            # Saves array creation overhead.
            >>> add(a,b,a) # a += b
            array([4, 6])
            >>> a
            array([4, 6])
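One way to confirm that the in-place form reuses the existing buffer is to compare the data address before and after. This sketch uses the `out=` keyword, which is equivalent to passing the output array as the third positional argument; `before`, `after`, and `same` are illustrative names.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

# The operator and the function form give the same result.
same = np.array_equal(a + b, np.add(a, b))

# Write the sum into a's existing buffer instead of allocating a new array.
before = a.__array_interface__["data"][0]   # address of a's data
np.add(a, b, out=a)                          # same effect as a += b
after = a.__array_interface__["data"][0]

print(a)
print(before == after)   # the buffer was reused
```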
    26. Comparison and Logical Operators
        equal (==)            not_equal (!=)    greater (>)
        greater_equal (>=)    less (<)          less_equal (<=)
        logical_and           logical_or        logical_xor
        logical_not
        2-D EXAMPLE
            >>> from numpy import array, equal
            >>> a = array(((1,2,3,4),(2,3,4,5)))
            >>> b = array(((1,2,5,4),(1,3,4,5)))
            >>> a == b
            array([[True, True, False, True],
                   [False, True, True, True]])
            # functional equivalent
            >>> equal(a,b)
            array([[True, True, False, True],
                   [False, True, True, True]])
        Be careful with if statements involving numpy arrays. To test for equality of arrays, don't do:
            if a == b:
        Rather, do:
            if all(a==b):
        For floating point,
            if allclose(a,b):
        is even better.
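The warning about `if` statements can be reproduced in a few lines. This is a sketch using the same two arrays; the flag names are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])
b = np.array([[1, 2, 5, 4], [1, 3, 4, 5]])

elementwise = (a == b)   # boolean array, one entry per element

# `if a == b:` asks for the truth value of a multi-element array,
# which NumPy refuses to guess and reports as a ValueError.
try:
    bool(a == b)
    ambiguous = False
except ValueError:
    ambiguous = True

# Reduce the boolean array to a single answer explicitly.
all_equal = bool(np.all(a == b))

# Floating point: exact comparison fails where allclose succeeds.
exact = (0.1 + 0.2 == 0.3)
close = bool(np.allclose(0.1 + 0.2, 0.3))
```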
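    27. Array Broadcasting
        [Figure: three cases. A 4x3 array with a 4x3 array (no stretching); a 4x3 array with a length-3 array (the second operand is stretched); a 4x1 array with a length-3 array (both operands are stretched).]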
    28. Broadcasting Rules
        [Slide text not recovered.] Mismatch example: combining a 4x3 array with a length-4 array fails with "ValueError: shape mismatch: objects cannot be broadcast to a single shape".
    29. Broadcasting in Action
            >>> from numpy import array, newaxis
            >>> a = array((0,10,20,30))
            >>> b = array((0,1,2))
            >>> y = a[:, newaxis] + b
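Run as a script, the example looks like this. The comments summarise NumPy's broadcasting rule as documented (shapes are compared from the trailing dimension, and a dimension of size 1 or a missing dimension is stretched); the `failed` flag is an illustrative name.

```python
import numpy as np

a = np.array((0, 10, 20, 30))
b = np.array((0, 1, 2))

# a[:, np.newaxis] has shape (4, 1) and b has shape (3,).
# Compared from the trailing dimension: 1 vs 3 stretches a's column,
# and b's missing leading dimension is stretched to 4.
y = a[:, np.newaxis] + b
print(y.shape)
print(y)

# Without the new axis the trailing dimensions are 4 and 3, which do not match.
try:
    a + b
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```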
    30. Universal Function Methods
        [Slide text not recovered.] Methods listed: op.reduce(a,axis=0), op.accumulate(a,axis=0), op.outer(a,b), op.reduceat(a,indices)
    31. op.reduce()
        For multidimensional arrays, op.reduce(a,axis) applies op to the elements of a along the specified axis. The resulting array has dimensionality one less than a. The default value for axis is 0.
        SUM COLUMNS BY DEFAULT
            >>> add.reduce(a)
            array([60, 64, 68])
        SUMMING UP EACH ROW
            >>> add.reduce(a,1)
            array([ 3, 33, 63, 93])
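The four ufunc methods can be tried together. In this sketch the input array is an assumption: a 4x3 array chosen because it reproduces the column and row sums shown for op.reduce. The inputs to `outer` and `reduceat` are made up for illustration.

```python
import numpy as np

# Assumed input: a 4x3 array consistent with the sums shown for op.reduce.
a = np.array([0, 10, 20, 30])[:, np.newaxis] + np.array([0, 1, 2])

col_sums = np.add.reduce(a)       # axis=0 by default: one sum per column
row_sums = np.add.reduce(a, 1)    # one sum per row

# accumulate keeps the intermediate results of the reduction.
running = np.add.accumulate(a, axis=0)

# outer applies the operation to every pair of elements.
table = np.multiply.outer([1, 2, 3], [10, 20])

# reduceat sums the slices [0:2], [2:5] and [5:] of the 1-D input.
pieces = np.add.reduceat(np.arange(8), [0, 2, 5])
```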
    32. Array Calculation Methods
        SUM FUNCTION
            >>> from numpy import array, prod, sum
            >>> a = array([[1,2,3],
            ...            [4,5,6]], float)
            # sum() defaults to adding up all the values in an array.
            >>> sum(a)
            21.
            # supply the keyword axis to sum along the 0th axis
            >>> sum(a, axis=0)
            array([5., 7., 9.])
            # supply the keyword axis to sum along the last axis
            >>> sum(a, axis=-1)
            array([6., 15.])
        SUM ARRAY METHOD
            # a.sum() defaults to adding up all values in an array.
            >>> a.sum()
            21.
            # supply an axis argument to sum along a specific axis
            >>> a.sum(axis=0)
            array([5., 7., 9.])
        PRODUCT
            # product along columns
            >>> a.prod(axis=0)
            array([ 4., 10., 18.])
            # functional form
            >>> prod(a, axis=0)
            array([ 4., 10., 18.])
    33. Min/Max
        MIN
            >>> from numpy import amax, amin, argmax, argmin, array
            >>> a = array([2.,3.,0.,1.])
            >>> a.min(axis=0)
            0.
            # Use NumPy's amin() instead of Python's built-in min()
            # for speedy operations on multi-dimensional arrays.
            >>> amin(a, axis=0)
            0.
        MAX
            >>> a = array([2.,3.,0.,1.])
            >>> a.max(axis=0)
            3.
            # functional form
            >>> amax(a, axis=0)
            3.
        ARGMIN
            # Find index of minimum value.
            >>> a.argmin(axis=0)
            2
            # functional form
            >>> argmin(a, axis=0)
            2
        ARGMAX
            # Find index of maximum value.
            >>> a.argmax(axis=0)
            1
            # functional form
            >>> argmax(a, axis=0)
            1
    34. Statistics Array Methods
        MEAN
            >>> from numpy import array, average, mean, var
            >>> a = array([[1,2,3],
            ...            [4,5,6]], float)
            # mean value of each column
            >>> a.mean(axis=0)
            array([ 2.5, 3.5, 4.5])
            >>> mean(a, axis=0)
            array([ 2.5, 3.5, 4.5])
            >>> average(a, axis=0)
            array([ 2.5, 3.5, 4.5])
            # average can also calculate a weighted average
            >>> average(a, weights=[1,2],
            ...         axis=0)
            array([ 3., 4., 5.])
        STANDARD DEV./VARIANCE
            # Standard Deviation
            >>> a.std(axis=0)
            array([ 1.5, 1.5, 1.5])
            # variance
            >>> a.var(axis=0)
            array([2.25, 2.25, 2.25])
            >>> var(a, axis=0)
            array([2.25, 2.25, 2.25])
    35. Zen of NumPy (version 0.1)
        • strided is better than scattered
        • contiguous is better than strided
        • descriptive is better than imperative (e.g. data-types)
        • array-oriented is better than object-oriented
        • broadcasting is a great idea: use where possible
        • vectorized is better than an explicit loop
        • unless it’s complicated, then use Cython or numexpr
        • think in higher dimensions
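As one illustration of "vectorized is better than an explicit loop", the two versions below compute the same values. The polynomial and array size are arbitrary choices for this sketch.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)

# Explicit Python loop: one interpreted iteration per element.
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = 3.0 * x[i] ** 2 + 1.0

# Array-oriented version: one expression over the whole array,
# with the per-element work done inside NumPy's compiled loops.
vec_result = 3.0 * x ** 2 + 1.0

print(np.allclose(loop_result, vec_result))
```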
    36. Recent Developments in NumPy / SciPy
        • Community growth (github)
        • Addition of .NET (IronPython) support
            – NumPy as a core C-library
            – NumPy and SciPy using Cython to build all extension modules
            – better tests and bugs closed
        • Re-factoring of the ufunc-implementation as iterators (Mark Wiebe)
            – expose the pipeline that was only previously used in “worst-case” scenario.
            – first stages of calculation structure refactoring
    37. NumPy next steps (1.6, 2.0 and beyond)
        • Calculation Frame-work
            – basic generic function mechanism needs to be extended to allow other objects to participate more seamlessly
            – test on distributed arrays, generated arrays, masked arrays, etc.
            – add better support for run-time code-generation
            – more optimized algorithms (e.g. add special-case versions of low-level loops)
        • Addition of an “indirect array”
            – could subsume sparse arrays, distributed arrays, etc.
            – add compressed arrays, lazy-evaluation arrays, generator arrays
        • Data-base integration
            – data-base integration: please give me a structured array from a fetchall command
            – SQL front end for NumPy table
    38. NumPy next steps (1.6, 2.0, ...)
        • Catching up
            – Dropping support for <2.5 (in NumPy 2.0)
            – Addition of “with-statement” contexts (error handling, print-options, lazy-evaluation, etc.)
            – Finish datetime NEP
            – Finish reduce-by NEP
            – Add “geometry” information to NumPy (dimension and index labels).
        • Core library growth
            – Eliminate extra indirection where possible
            – Optimizations for small-arrays
            – Addition of additional front-ends: Jython, Ruby, Javascript
    39. SciPy next steps
        • Roadmap generation (move to github)
        • Finish Cython migration
        • Module cleanup continues
        • Lots of work happening...
            – errorbars in polyfit
            – spectral algorithms in signal
            – improvements to ode and sparse
            – addition of pre-conditioners to sparse
            – ..., ...
