Python for Science and Engineering: a presentation to A*STAR and the Singapore Computational Sciences Club, Edward Schofield, Python Charmers, June 2011

An introduction to Python in science and engineering.

The presentation was given by Dr Edward Schofield of Python Charmers (www.pythoncharmers.com) to A*STAR and the Singapore Computational Sciences Club in June 2011.

  1. 1. Python for Science and Engineering Dr Edward Schofield A*STAR / Singapore Computational Sciences Club Seminar June 14, 2011
  2. 2. Scientific programming in 2011. Most scientists and engineers are: programming for 50+% of their work time (and rising); self-taught programmers; using inefficient programming practices; using the wrong programming languages: C++, FORTRAN, C#, PHP, Java, ...
  3. 3. Scientific programming needs: rapid prototyping; efficiency for computational kernels; pre-written packages (vectors, matrices, modelling, simulations, visualisation); extensibility; web front-ends; database backends; ...
  4. 4. Ed’s story: how I found Python. PhD in statistical pattern recognition: 2001-2006. Needed good tools for my research! Discovered Python in 2002 after frustration with C++, Matlab, Java, Perl. Contributed to NumPy and SciPy: maxent, sparse matrices, optimization, Monte Carlo, etc. Managed six releases of SciPy in 2005-6.
  5. 5. 1. Why Python?
  6. 6. Introducing Python: What is it? What is it good for? Who uses it?
  7. 7. What is Python? Interpreted; strongly but dynamically typed; object-oriented; intuitive, readable; open source, free; ‘batteries included’.
  8. 8. ‘Batteries included’: Python’s standard library is very large, well supported and well documented.
  9. 9. Python’s standard library: data types, strings, networking, threads, operating system, compression, GUI, arguments, complex numbers, CGI, FTP, cryptography, testing, multimedia, databases, CSV files, calendar, email, XML, serialization.
  10. 10. What is an efficient programming language? Native Python code executes 10x more slowly than C and FORTRAN.
  11. 11. Would you build a racing car ... to get to Kuala Lumpur ASAP?
  12. 12. Cost per GFLOPS over time (source: Wikipedia, “FLOPS”):
      Date         Cost per GFLOPS (US $)    Technology
      1961         $1.1 trillion             17 million IBM 1620s
      1984         $15,000,000               Cray X-MP
      1997         $30,000                   Two 16-CPU clusters of Pentiums
      2000, Apr    $1000                     Bunyip Beowulf cluster
      2003, Aug    $82                       KASY0
      2007, Mar    $0.42                     Ambric AM2045
      2009, Sep    $0.13                     ATI Radeon R800
  13. 13. Unit labor cost growth: a proxy for the cost of programmer time.
  14. 14. Efficiency: When FORTRAN was invented, computer time was more expensive than programmer time. In the 1980s and 1990s that reversed.
  15. 15. Efficient programming: Python code is 10x faster to write than C and FORTRAN.
  16. 16. What if ... you now need to reach Sydney?
  17. 17. Advantages of Python: easy to write; easy to maintain; great standard libraries; thriving ecosystem of third-party packages; open source.
  18. 18. ‘Batteries included’: Python’s standard library is very large, well supported and well documented.
  19. 19. Python’s standard library: data types, strings, networking, threads, operating system, compression, GUI, arguments, complex numbers, CGI, FTP, cryptography, testing, multimedia, databases, CSV files, calendar, email, XML, serialization.
  20. 20. Question: What is the date 177 days from now?
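The question on slide 20 can be answered with the standard library's `datetime` module alone. The following is a minimal sketch; the helper name `days_from` and the fixed example date are illustrative choices, not part of the talk, and the fixed date is used only so that the result is reproducible.

```python
from datetime import date, timedelta


def days_from(start, days):
    """Return the calendar date that falls `days` days after `start`."""
    return start + timedelta(days=days)


# The seminar date from slide 1, used here as a fixed stand-in for "now".
seminar = date(2011, 6, 14)
print(days_from(seminar, 177))

# For a live answer, start from today's date instead.
print(days_from(date.today(), 177))
```

Date arithmetic with `timedelta` handles month lengths and leap years, so no calendar table is needed.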
  21. 21. Natural applications of Python: rapid prototyping; plotting, visualisation, 3D; numerical computing; web and database programming; all-purpose glue.
  22. 22. Python vs other languages
  23. 23. Languages used at CSIRO: Python, Fortran, Java, Matlab, C, VB.net, IDL, C++, R, Perl, C#, + 5-10 others!
  24. 24. Which language do I choose? A different language for each task? A language you know? A language others in your team are using: support and help?
  25. 25. Python vs Matlab
                                            Python      Matlab
      Interpreted                           Yes         Yes
      Powerful data input/output            Yes         Yes
      Great plotting                        Yes         Yes
      General-purpose language              Powerful    Limited
      Cost                                  Free        $$$
      Open source                           Yes         No
  26. 26. Python vs C++
                                            Python      C++
      Powerful                              Yes         Yes
      Portable                              Yes         In theory
      Standard libraries                    Vast        Limited
      Easy to write and maintain            Yes         No
      Easy to learn                         Yes         No
  27. 27. Python vs C
                                            Python      C
      Fast to write                         Yes         No
      Good for embedded systems, device
        drivers and operating systems       No          Yes
      Good for most other high-level tasks  Yes         No
      Standard library                      Vast        Limited
  28. 28. Python vs Java
                                            Python      Java
      Powerful, well-designed language      Yes         Yes
      Standard libraries                    Vast        Vast
      Easy to learn                         Yes         No
      Code brevity                          Short       Verbose
      Easy to write and maintain            Yes         Okay
  29. 29. Open source: Python is open source software. Benefits: no vendor lock-in; cross-platform; insurance against bugs in the platform; free.
  30. 30. Python success stories. Computer graphics: Industrial Light & Magic. Web: Google (News, Groups, Maps, Gmail). Legacy system integration: AstraZeneca (collaborative drug discovery).
  31. 31. Python success stories (2). Aerospace: NASA. Research: universities worldwide ... Others: YouTube, Reddit, BitTorrent, Civilization IV.
  32. 32. Industrial Light & Magic: Python spread from scripting to the entire production pipeline. Numerous reviews since 1996: Python is still the best tool for them.
  33. 33. United Space Alliance. A common sentiment: “We achieve immediate functioning code so much faster in Python than in any other language that it’s staggering.” - Robin Friedrich, Senior Project Engineer
  34. 34. Case study: air-traffic control. Eric Newton, “Python for Critical Applications” (http://metaslash.com/brochure/recall.html). Metaslash, Inc: 1999 to 2001. Mission-critical system for air-traffic control. Replicated, fault-tolerant data storage.
  35. 35. Case study: air-traffic control. Python prototype -> C++ implementation -> Python again. Why? C++ dependencies were buggy; C++ threads and STL were not portable enough. Python’s advantages over C++: more portable; 75% less code (more productivity, fewer bugs).
  36. 36. More case studies See http://www.python.org/about/success/ for lots more case studies and success stories
  37. 37. 2. The scientific Python ecosystem
  38. 38. Scientific software development: small beginnings; piecemeal growth, quirky interfaces ...; large, cumbersome systems.
  39. 39. NumPy: an n-dimensional array/matrix package.
  40. 40. NumPy: centre of Python’s numerical computing ecosystem.
  41. 41. NumPy: the most fundamental tool for numerical computing in Python. Fast multi-dimensional array capability.
  42. 42. What NumPy defines: two fundamental objects (1. n-dimensional array; 2. universal function); a rich set of numerical data types; nearly 400 functions and methods on arrays (type conversions, mathematical, logical).
  43. 43. NumPy’s features: Fast; written in C with BLAS/LAPACK hooks. Rich set of data types. Linear algebra: matrix inversion, decompositions, … Discrete Fourier transforms. Random number generation. Trig, hypergeometric functions, etc.
  44. 44. Elementwise array operations: loops are mostly unnecessary. Operate on entire arrays!
      >>> a = numpy.array([20, 30, 40, 50])
      >>> a < 35
      array([True, True, False, False], dtype=bool)
      >>> b = numpy.arange(4)
      >>> a - b
      array([20, 29, 38, 47])
      >>> b**2
      array([0, 1, 4, 9])
  45. 45. Universal functions: NumPy defines ufuncs that operate on entire arrays and other sequences (hence universal). Example: sin()
      >>> a = numpy.array([20, 30, 40, 50])
      >>> c = 10 * numpy.sin(a)
      >>> c
      array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
  46. 46. Array slicing: arrays can be sliced and indexed powerfully:
      >>> a = numpy.arange(10)**3
      >>> a
      array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
      >>> a[2:5]
      array([ 8, 27, 64])
  47. 47. Fancy indexing: arrays can be used as indices into other arrays:
      >>> a = numpy.arange(12)**2
      >>> ind = numpy.array([1, 1, 3, 8, 5])
      >>> a[ind]
      array([ 1,  1,  9, 64, 25])
  48. 48. Other linear algebra features. Matrix inversion: mat(A).I or linalg.inv(A). Linear solvers: linalg.solve(A, x). Pseudoinverse: linalg.pinv(A).
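The linear algebra calls named on slide 48 can be tried on a small system. This is a sketch only: the 2x2 matrix and right-hand side are made-up illustrative values, and plain arrays are used with `numpy.linalg` rather than the `mat(A).I` form from the slide.

```python
import numpy as np

# A small, well-conditioned system chosen purely for illustration.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)    # solve A x = b without forming the inverse
A_inv = np.linalg.inv(A)     # explicit matrix inverse
A_pinv = np.linalg.pinv(A)   # Moore-Penrose pseudoinverse

print(x)
print(np.allclose(A @ x, b))        # the solution satisfies the system
print(np.allclose(A_inv, A_pinv))   # for an invertible matrix the two agree
```

For solving a system, `solve` is usually preferable to multiplying by an explicit inverse, both for speed and for numerical accuracy.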
  49. 49. What is SciPy? A community; a conference; a package of scientific libraries.
  50. 50. Python for scientific software. Back-end: computational work. Front-end: input / output, visualization, GUIs. Dozens of great scientific packages exist.
  51. 51. Python in science (2). NumPy: numerical / array module. Matplotlib: great 2D and 3D plotting library. IPython: nice interactive Python shell. SciPy: set of scientific libraries: sparse matrices, signal processing, … RPy: integration with the R statistical environment.
  52. 52. Python in science (3). Cython: C language extensions. Mayavi: 3D graphics, volumetric rendering. Nitime, Nipype: Python tools for neuroimaging. SymPy: symbolic mathematics library.
  53. 53. Python in science (4). VPython: easy, real-time 3D programming. UCSF Chimera, PyMOL, VMD: molecular graphics. PyRAF: Hubble Space Telescope interface to IRAF astronomical data. BioPython: computational molecular biology. Natural language toolkit: symbolic + statistical NLP. Physics: PyROOT.
  54. 54. The SciPy package: BSD-licensed software for maths, science, engineering. Integration, signal processing, sparse matrices, optimization, linear algebra, maximum entropy, interpolation, ODEs, statistics, n-dim image processing, FFTs, scientific constants, C/C++ and Fortran integration, clustering.
  55. 55. SciPy optimisation example. Fit a model to noisy data: y = (a / x^b) sin(cx) + ε
  56. 56. Example: fitting a model with scipy.optimize. Task: fit a model of the form y = (a / x^b) sin(cx) + ε to noisy data. Spec: 1. Generate noisy data. 2. Choose parameters (a, b, c) to minimize the sum of squared errors. 3. Plot the data and fitted model (next session).
  57. 57. SciPy optimisation example
      import numpy
      import pylab
      from scipy.optimize import leastsq

      def myfunc(params, x):
          (a, b, c) = params
          return a / (x**b) * numpy.sin(c * x)

      true_params = [1.5, 0.1, 2.]

      def f(x):
          return myfunc(true_params, x)

      def err(params, x, y):  # error function: residuals between model and data
          return myfunc(params, x) - y
  58. 58. SciPy optimisation example
      # Generate noisy data to fit
      n = 30; xmin = 0.1; xmax = 5
      x = numpy.linspace(xmin, xmax, n)
      y = f(x)
      y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())  # uniform noise
      v0 = [3., 1., 4.]  # initial param estimate
      # Fitting
      v, success = leastsq(err, v0, args=(x, y), maxfev=10000)
      print("Estimated parameters:", v)
      print("True parameters:", true_params)
      X = numpy.linspace(xmin, xmax, 5 * n)
      pylab.plot(x, y, 'ro', X, myfunc(v, X))
      pylab.show()
  59. 59. SciPy optimisation example. Fit a model to noisy data: y = (a / x^b) sin(cx) + ε
  60. 60. Ingredients for this example: numpy.linspace; numpy.random.rand for the noise model (uniform); scipy.optimize.leastsq.
  61. 61. Sparse matrix example: construct and solve a sparse linear system.
  62. 62. Sparse matrices are mostly zeros. They can be symmetric or asymmetric. Sparsity patterns vary: block sparse, band matrices, ... They can be huge! Only non-zeros are stored.
  63. 63. Sparse matrices in SciPy: SciPy supports seven sparse storage schemes ... and sparse solvers in Fortran.
  64. 64. Sparse matrix creation. To construct a 1000x1000 lil_matrix and add values:
      >>> from scipy.sparse import lil_matrix
      >>> from numpy.random import rand
      >>> from scipy.sparse.linalg import spsolve
      >>> A = lil_matrix((1000, 1000))
      >>> A[0, :100] = rand(100)
      >>> A[1, 100:200] = A[0, :100]
      >>> A.setdiag(rand(1000))
  65. 65. Solving sparse matrix systems. Now convert the matrix to CSR format and solve Ax=b:
      >>> A = A.tocsr()
      >>> b = rand(1000)
      >>> x = spsolve(A, b)
      # Convert it to a dense matrix and solve, and check that the result is the same:
      >>> from numpy.linalg import solve, norm
      >>> x_ = solve(A.todense(), b)
      # Compute norm of the error:
      >>> err = norm(x - x_)
      >>> err < 1e-10
      True
  66. 66. Matplotlib: great plotting package in Python. Matlab-like syntax. Great rendering: anti-aliasing etc. Many ‘backends’: Cairo, GTK, Cocoa, PDF. Flexible output: to EPS, PS, PDF, TIFF, PNG, ...
  67. 67. Matplotlib: worked examples. Search the web for “Matplotlib gallery”.
  68. 68. Example: NumPy vectorization. 1. Use a Monte Carlo algorithm to estimate π: (a) generate uniform random variates (x, y) over [0, 1]; (b) estimate π from the proportion p that land in the unit circle. 2. Time two ways of doing this: (a) using for loops; (b) using array operations (vectorized).
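The exercise on slide 68 might be sketched as follows. Points drawn uniformly from the unit square fall inside the quarter of the unit circle with probability π/4, so π is estimated as 4p. The function names, the sample size and the fixed seed are illustrative assumptions, not taken from the talk, and timings will differ from machine to machine.

```python
import time

import numpy as np


def pi_loop(n, rng):
    """Estimate pi with an explicit Python loop over n random points."""
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def pi_vectorized(n, rng):
    """Estimate pi with whole-array operations on n random points."""
    xy = rng.random((n, 2))                                   # n points in [0, 1)^2
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)   # points in the quarter circle
    return 4.0 * inside / n


rng = np.random.default_rng(0)   # fixed seed so that runs are repeatable
n = 200_000
for estimator in (pi_loop, pi_vectorized):
    start = time.perf_counter()
    estimate = estimator(n, rng)
    elapsed = time.perf_counter() - start
    print(estimator.__name__, estimate, "%.3f s" % elapsed)
```

Both functions compute the same estimator; only the vectorized one moves the per-point work out of the interpreter and into NumPy's compiled loops.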
  69. 69. 3. Scaling
  70. 70. HPC: high-performance computing
  71. 71. Aspects to HPC: supercomputers; distributed clusters / grids; parallel programming; scripting; caches, shared memory; job control; code porting; specialized hardware.
  72. 72. Python for HPC. Advantages: portability; easy scripting, glue; maintainability; profiling to identify hotspots; vectorization with NumPy. Disadvantages: global interpreter lock; less control than C; native loops are slow.
  73. 73. Large data sets. Useful Python language features: generators, iterators. Useful packages: great HDF5 support from PyTables!
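Generators help with large data sets because they yield one item at a time instead of building the whole collection in memory. A minimal sketch using only the standard library follows; the function names and the synthetic "file" are hypothetical stand-ins for a real data source.

```python
def read_values(lines):
    """Lazily yield one float per non-empty line of text."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values):
    """Consume an iterable once and return its mean using constant memory."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else float("nan")


# Stand-in for a large file: any iterable of text lines works here,
# including an open file object, which is itself a lazy iterator over lines.
fake_file = ("%d\n" % i for i in range(1, 100001))
print(running_mean(read_values(fake_file)))
```

Because each stage pulls items from the previous one on demand, the pipeline never holds more than one line in memory at a time.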
  74. 74. Hierarchical data: databases without the relational baggage.
  75. 75. Great interface for HDF5 data. Efficient support for massive data sets.
  76. 76. Applications of PyTables: aeronautics, telecommunications, drug discovery, data mining, financial analysis, statistical analysis, climate prediction, etc.
  77. 77. Breaking news, June 2011: PyTables Pro is now being open sourced. Indexed searches for speed. Merging with PyTables. Working project name: NewPyTables.
  78. 78. PyTables performance. OPSI indexing engine speed: querying 10 billion rows can take hundredths of a second! Target use-case: mostly read-only or append-only data.
  79. 79. Principles for efficient code
  80. 80. Important principles: 1. "Premature optimization is the root of all evil": don't write cryptic code just to make it more efficient! 2. 1-5% of the code takes up the vast majority of the computing time ... and it might not be the 1-5% that you think!
  81. 81. Checklist for efficient code, from most to least important: 1. Check: Do you really need to make it more efficient? 2. Check: Are you using the right algorithms and data structures? 3. Check: Are you reusing pre-written libraries wherever possible? 4. Check: Which parts of the code are expensive? Measure, don't guess!
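The last item on the checklist, measuring rather than guessing, can be done with the standard library's `cProfile` and `pstats` modules. This is only a sketch: the two sum-of-squares functions and the `workload` wrapper are hypothetical examples invented to give the profiler something to compare.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result from the closed form (n-1) n (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    return slow_sum_of_squares(200000), fast_sum_of_squares(200000)


profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report attributes nearly all of the time to the loop-based function, which is the kind of hotspot worth optimising first.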
  82. 82. Relative efficiency gains: Exponential-order and polynomial-order speedups are possible by choosing the right algorithm for a task. These require the right data structures! These dwarf the 10-25x linear-order speedups from using lower-level languages or using different language constructs.
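As one illustration of an algorithmic speedup that depends on the data structure, consider checking a list for duplicates. Comparing every pair takes O(n^2) comparisons, while remembering the items already seen in a set takes O(n) expected time. The example below is a hypothetical sketch, not from the talk; the input size is arbitrary and the timings it prints will vary by machine.

```python
import timeit


def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember seen items in a set: O(n) expected time."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(1500))   # no duplicates, so both functions scan everything
for func in (has_duplicates_quadratic, has_duplicates_linear):
    seconds = timeit.timeit(lambda: func(data), number=1)
    print(func.__name__, "%.4f s" % seconds)
```

The gap between the two grows with the input size, which is why a change of this kind can outweigh a constant-factor gain from switching languages.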
  83. 83. 4. About Python Charmers
  84. 84. The largest Python training provider in South-East Asia. Delighted customers include:
  85. 85. Most popular course topics:
      Python for Programmers                  3 days
      Python for Scientists and Engineers     4 days
      Python for Geoscientists                4 days
      Python for Bioinformaticians            4 days
      New courses:
      Python for Financial Engineers          4 days
      Python for IT Security Professionals    3 days
  86. 86. Python Charmers: topics of expertise. Python: beginners, advanced. Scientific data processing with Python. Software engineering with Python. Large-scale problems: HPC, huge data sets, grids. Statistics and Monte Carlo problems.
  87. 87. Python Charmers: topics of expertise (2). Spatial data analysis / GIS. General scripting, job control, glue. GUIs with PyQt. Integrating with other languages: R, C, C++, Fortran, ... Web development in Django.
  88. 88. How to get in touch See PythonCharmers.com or email us at: info@pythoncharmers.com
