Python for Science and Engineering
 Dr Edward Schofield

 A*STAR / Singapore Computational Sciences Club Seminar
 June 14, 2011
Scientific programming in 2011

 Most scientists and engineers are:
   programming for 50+% of their work time (and rising)
   self-taught programmers
   using inefficient programming practices
   using the wrong programming languages: C++,
   FORTRAN, C#, PHP, Java, ...
Scientific programming needs

 Rapid prototyping
 Efficiency for computational kernels
 Pre-written packages!
   Vectors, matrices, modelling, simulations, visualisation
 Extensibility; web front-ends; database backends; ...
Ed's story:
How I found Python
 PhD in statistical pattern recognition: 2001-2006

 Needed good tools for my research!

 Discovered Python in 2002 after frustration with C++, Matlab,
 Java, Perl

 Contributed to NumPy and SciPy:

   maxent, sparse matrices, optimization, Monte Carlo, etc.

   Managed six releases of SciPy in 2005-6
1. Why Python?
Introducing Python

 What is it?

 What is it good for?

 Who uses it?
What is Python?

 interpreted
 strongly but dynamically typed
 object-oriented
 intuitive, readable
 open source, free
 ‘batteries included’
‘batteries included’

 Python’s standard library is:
   very large
   well supported
   well documented
Python’s standard library
 data types         strings        networking    threads
 operating system   compression    GUI           arguments
 complex numbers    CGI            FTP           cryptography
 testing            multimedia     databases     CSV files
 calendar           email          XML           serialization
What is an efficient
programming language?

Native Python code
executes 10x more slowly
than C and FORTRAN
Would you build a racing car ...
... to get to Kuala Lumpur ASAP?
 Date         Cost per GFLOPS (US $)   Technology

 1961         $1.1 trillion            17 million IBM 1620s
 1984         $15,000,000              Cray X-MP
 1997         $30,000                  Two 16-CPU clusters of Pentiums
 2000, Apr    $1,000                   Bunyip Beowulf cluster
 2003, Aug    $82                      KASY0
 2007, Mar    $0.42                    Ambric AM2045
 2009, Sep    $0.13                    ATI Radeon R800

 Source: Wikipedia: “FLOPS”
Unit labor cost growth
Proxy for cost of programmer time
Efficiency

 When FORTRAN was invented, computer time was more
 expensive than programmer time.

 In the 1980s and 1990s that reversed.
Efficient programming

 Python code is 10x faster
 to write than C and
 FORTRAN
What if ...
... you now need to reach Sydney?
Advantages of Python

 Easy to write

 Easy to maintain

 Great standard libraries

 Thriving ecosystem of
 third-party packages

 Open source
‘Batteries included’

 Python’s standard library is:

   very large

   well supported

   well documented
Python’s standard library
 data types         strings        networking    threads
 operating system   compression    GUI           arguments
 complex numbers    CGI            FTP           cryptography
 testing            multimedia     databases     CSV files
 calendar           email          XML           serialization
What is the date 177 days from now?
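One way to answer this with the standard library alone is sketched below. It assumes "now" means the date of the talk (14 June 2011); the helper name `days_from` is made up for illustration.

```python
from datetime import date, timedelta

def days_from(start, days):
    """Return the calendar date `days` days after `start`."""
    return start + timedelta(days=days)

# Assumption: "now" is the date of the seminar.
talk_date = date(2011, 6, 14)
answer = days_from(talk_date, 177)
print(answer.isoformat())  # 2011-12-08
```

Replacing `talk_date` with `date.today()` gives the answer relative to the current day.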
Natural applications of Python

 Rapid prototyping

 Plotting, visualisation, 3D

 Numerical computing

 Web and database programming

 All-purpose glue
Python vs other languages
Languages used at CSIRO

   Python   Fortran       Java

   Matlab     C

    IDL      C++           R

    Perl      C#      +5-10 others!
Which language do I choose?

 A different language for each task?

 A language you know?

 A language others in your team are using: support and help?
                                Python     Matlab

 Interpreted                    Yes        Yes
 Powerful data input/output     Yes        Yes
 Great plotting                 Yes        Yes
 General-purpose language       Powerful   Limited
 Cost                           Free       $$$
 Open source                    Yes        No
                                Python     C++

 Powerful                       Yes        Yes
 Portable                       Yes        In theory
 Standard libraries             Vast       Limited
 Easy to write and maintain     Yes        No
 Easy to learn                  Yes        No
                                        Python    C

 Fast to write                          Yes       No
 Good for embedded systems, device
   drivers and operating systems        No        Yes
 Good for most other high-level tasks   Yes       No
 Standard library                       Vast      Limited
                                    Python   Java

 Powerful, well-designed language   Yes      Yes
 Standard libraries                 Vast     Vast
 Easy to learn                      Yes      No
 Code brevity                       Short    Verbose
 Easy to write and maintain         Yes      Okay
Open source

Python is open source software

 Benefits:

  No vendor lock-in

  Cross-platform

  Insurance against bugs in the platform

  Free

Python success stories

 Computer graphics:

   Industrial Light & Magic

 Web:

   Google: News, Groups, Maps, Gmail

 Legacy system integration:

   AstraZeneca - collaborative drug discovery
Python success stories (2)

 Aerospace:

   NASA

 Research:

   universities worldwide ...

 Others:

   YouTube, Reddit, BitTorrent, Civilization IV,
Industrial Light & Magic

 Python spread from
 scripting to the entire
 production pipeline

 Numerous reviews since
 1996: Python is still the
 best tool for them
United Space Alliance

 A common sentiment:

 “We achieve immediate functioning code so much faster in
 Python than in any other language that it’s staggering.”

                       - Robin Friedrich, Senior Project Engineer
Case study: air-traffic control

 Eric Newton, “Python for
 Critical Applications”: http://

 Metaslash, Inc: 1999 to 2001

 Mission-critical system for
 air-traffic control

 Replicated, fault-tolerant
 data storage
Case study: air-traffic control
 Python prototype -> C++ implementation -> Python again

 Why?

   C++ dependencies were buggy

   C++ threads, STL were not portable enough

 Python’s advantages over C++

   More portable

   75% less code: more productivity, fewer bugs
More case studies

 See for lots more case
 studies and success stories
2. The scientific Python ecosystem
Scientific software development

 Small beginnings

 Piecemeal growth, quirky interfaces

 ... Large, cumbersome systems
NumPy
An n-dimensional array/matrix package
Centre of Python’s numerical computing ecosystem

The most fundamental tool for numerical computing in Python
Fast multi-dimensional array capability
What NumPy defines:

 Two fundamental objects:
 1. n-dimensional array
 2. universal function

 a rich set of numerical data types
 nearly 400 functions and methods on arrays:
   type conversions
   mathematical
   logical
NumPy's features

 Fast. Written in C with BLAS/LAPACK hooks.
 Rich set of data types
 Linear algebra: matrix inversion, decompositions, …
 Discrete Fourier transforms
 Random number generation
 Trig, hypergeometric functions, etc.
Elementwise array operations

 Loops are mostly unnecessary
 Operate on entire arrays!
>>> import numpy
>>> a = numpy.array([20, 30, 40, 50])
>>> a < 35
array([ True,  True, False, False])
>>> b = numpy.arange(4)
>>> a - b
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
Universal functions

 NumPy defines 'ufuncs' that operate on entire arrays
 and other sequences (hence 'universal')
 Example: sin()
>>> a = numpy.array([20, 30, 40, 50])
>>> c = 10 * numpy.sin(a)
>>> c
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
Array slicing

 Arrays can be sliced and indexed powerfully:
>>> a = numpy.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2:5]
array([ 8, 27, 64])
Fancy indexing

 Arrays can be used as indices into other arrays:
>>> a = numpy.arange(12)**2
>>> ind = numpy.array([ 1, 1, 3, 8, 5 ])
>>> a[ind]
array([ 1, 1, 9, 64, 25])
Other linear algebra features

 Matrix inversion: mat(A).I

 Or: linalg.inv(A)
 Linear solvers: linalg.solve(A, b) solves Ax = b
 Pseudoinverse: linalg.pinv(A)
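The calls listed above can be tried on a small system. This is a minimal sketch with a made-up 2x2 matrix `A` and right-hand side `b`; it uses plain arrays rather than `mat(A).I`, which is a choice made for this illustration.

```python
import numpy as np

# Hypothetical 2x2 system A x = b, chosen so the answer is easy to check by hand.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)     # solve A x = b without forming the inverse
A_inv = np.linalg.inv(A)      # explicit inverse
A_pinv = np.linalg.pinv(A)    # pseudoinverse; equals the inverse for invertible A

print(x)
print(np.allclose(A_inv @ b, x))
print(np.allclose(A_pinv, A_inv))
```

For solving a linear system, `solve` is generally preferable to multiplying by an explicit inverse: it does less work and is usually more accurate.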
What is SciPy?

 A community
 A conference
 A package of scientific libraries
Python for scientific software

 Back-end: computational work

 Front-end: input / output, visualization, GUIs

 Dozens of great scientific packages exist
Python in science (2)

 NumPy: numerical / array module
 Matplotlib: great 2D and 3D plotting library
 IPython: nice interactive Python shell
 SciPy: set of scientific libraries: sparse matrices, signal
 processing, …
 RPy: integration with the R statistical environment
Python in science (3)

 Cython: C language extensions
 Mayavi: 3D graphics, volumetric rendering
 Nitime, Nipype: Python tools for neuroimaging
 SymPy: symbolic mathematics library
Python in science (4)

 VPython: easy, real-time 3D programming

 UCSF Chimera, PyMOL, VMD: molecular graphics

 PyRAF: Hubble Space Telescope interface to IRAF astronomical data

 BioPython: computational molecular biology

 Natural language toolkit: symbolic + statistical NLP

 Physics: 	 PyROOT
The SciPy package
BSD-licensed software for maths, science, engineering

  integration      signal processing        sparse matrices
  optimization     linear algebra           maximum entropy
  interpolation    ODEs                     statistics
  FFTs             n-dim image processing   scientific constants
  clustering       C/C++ and Fortran integration
SciPy optimisation example
Fit a model to noisy data:
$y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$
Example: fitting a model with scipy.optimize

 Task: Fit a model of the form $y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$
 to noisy data.
 Spec:
 1. Generate noisy data
 2. Choose parameters (a, b, c) to minimize sum squared errors
 3. Plot the data and fitted model (next session)
SciPy optimisation example
import numpy
import pylab
from scipy.optimize import leastsq

def myfunc(params, x):
    (a, b, c) = params
    return a / (x**b) * numpy.sin(c * x)

true_params = [1.5, 0.1, 2.]
def f(x):
    return myfunc(true_params, x)

def err(params, x, y): # error function
    return myfunc(params, x) - y
SciPy optimisation example
# Generate noisy data to fit
n = 30; xmin = 0.1; xmax = 5
x = numpy.linspace(xmin, xmax, n)
y = f(x)
# Add uniform noise scaled to 20% of the data range
y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())

v0 = [3., 1., 4.] # initial param estimate
# Fitting
v, success = leastsq(err, v0, args=(x, y), maxfev=10000)

print('Estimated parameters: ', v)
print('True parameters: ', true_params)
X = numpy.linspace(xmin, xmax, 5 * n)
pylab.plot(x, y, 'ro', X, myfunc(v, X))
SciPy optimisation example
Fit a model to noisy data:
$y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$
Ingredients for this example

 numpy.linspace

 numpy.random.rand for the noise model (uniform)

 scipy.optimize.leastsq

Sparse matrix example
Construct and solve a sparse linear system
Sparse matrices
Sparse matrices are mostly zeros.
They can be symmetric or asymmetric.
Sparsity patterns vary:
  block sparse, band matrices, ...
They can be huge!
Only non-zeros are stored.
Sparse matrices in SciPy

 SciPy supports seven sparse storage schemes
 ... and sparse solvers in Fortran.
Sparse matrix creation

 To construct a 1000x1000 lil_matrix and add values:
>>> from scipy.sparse import lil_matrix
>>> from numpy.random import rand
>>> from scipy.sparse.linalg import spsolve

>>>   A = lil_matrix((1000, 1000))
>>>   A[0, :100] = rand(100)
>>>   A[1, 100:200] = A[0, :100]
>>>   A.setdiag(rand(1000))
Solving sparse matrix systems
 Now convert the matrix to CSR format and solve Ax=b:
>>> A = A.tocsr()
>>> b = rand(1000)
>>> x = spsolve(A, b)

# Convert it to a dense matrix and solve, and
# check that the result is the same:
>>> from numpy.linalg import solve, norm
>>> x_ = solve(A.todense(), b)
# Compute norm of the error:
>>> err = norm(x - x_)
>>> err < 1e-10
True
Matplotlib
 Great plotting package in Python
 Matlab-like syntax
 Great rendering: anti-aliasing etc.
 Many ‘backends’: Cairo, GTK, Cocoa, PDF
 Flexible output: to EPS, PS, PDF, TIFF, PNG, ...
Matplotlib: worked examples
Search the web for 'Matplotlib gallery'
Example: NumPy vectorization
1. Use a Monte Carlo algorithm to
   estimate π:

   1. Generate uniform random variates (x, y) over [0, 1].

   2. Estimate π from the proportion p that land in the unit
      circle (p ≈ π/4).
2. Time two ways of doing this:
   1. Using for loops

   2. Using array operations (vectorized)
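The exercise above might be sketched as follows. The function names, the sample size and the fixed seeds are assumptions made for this illustration, not part of the original exercise. Points are drawn in the unit square, so the fraction landing inside the quarter circle approximates π/4.

```python
import random
import time

import numpy as np

def estimate_pi_loop(n, seed=0):
    """Estimate pi with a plain Python for loop."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

def estimate_pi_vectorized(n, seed=0):
    """Estimate pi with NumPy array operations (no explicit loop)."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))                       # n points in the unit square
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return 4.0 * inside / n

n = 200_000
for fn in (estimate_pi_loop, estimate_pi_vectorized):
    start = time.perf_counter()
    value = fn(n)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: pi ~ {value:.4f} in {elapsed:.3f} s")
```

The timings depend on the machine, but the vectorized version is typically much faster because the per-point work runs in compiled code.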
3. Scaling
High-performance computing
Aspects to HPC
   Supercomputers       Distributed clusters / grids

 Parallel programming            Scripting

Caches, shared memory          Job control

    Code porting          Specialized hardware
Python for HPC
       Advantages                 Disadvantages

         Portability            Global interpreter lock

    Easy scripting, glue         Less control than C

       Maintainability          Native loops are slow

Profiling to identify hotspots

 Vectorization with NumPy
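Profiling to find hotspots can be done with the standard library's `cProfile` and `pstats` modules. The sketch below uses three made-up functions (`slow_sum_of_squares`, `cheap_setup`, `workload`) purely as a stand-in for real code.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately slow: a native Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    return list(range(10))

def workload():
    cheap_setup()
    return slow_sum_of_squares(200_000)

# Profile one run of the workload.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the five most expensive entries by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, the loop function should account for nearly all of the time, which marks it as the candidate for vectorization with NumPy.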
Large data sets

 Useful Python language features:
   Generators, iterators
 Useful packages:
   Great HDF5 support from PyTables!
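Generators help with large data sets because they process one item at a time instead of loading everything into memory. A minimal sketch, with hypothetical helper names and a small in-memory buffer standing in for a large file:

```python
import io

def read_values(lines):
    """Yield one float per non-empty line, without building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)

def streaming_mean(values):
    """Compute the mean in a single pass over any iterable."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
    return total / count if count else float("nan")

# Stand-in for a large file; a real file object opened with open() works the same way.
fake_file = io.StringIO("1.0\n2.0\n\n3.0\n4.0\n")
mean = streaming_mean(read_values(fake_file))
print(mean)  # 2.5
```

Memory use stays constant however long the input is, since only one line is held at a time.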
Hierarchical data
Databases without the relational baggage
Great interface for HDF5 data
Efficient support for massive data sets
Applications of PyTables

     aeronautics       telecommunications

   drug discovery          data mining

  financial analysis     statistical analysis

  climate prediction           etc.
Breaking news: June 2011

PyTables Pro is now being open sourced.
  Indexed searches for speed
Merging with PyTables
Working project name: NewPyTables
PyTables performance

OPSI indexing engine speed:
  Querying 10 billion rows can take hundredths of a second!
Target use-case:
  mostly read-only or append-only data
Principles for efficient code
Important principles

1. "Premature optimization is the root of all evil"
      Don't write cryptic code just to make it more efficient!

2. 1-5% of the code takes up the vast majority of the
   computing time!
      ... and it might not be the 1-5% that you think!
Checklist for efficient code
 From most to least important:

 1. Check: Do you really need to make it more efficient?
 2. Check: Are you using the right algorithms and data
    structures?
 3. Check: Are you reusing pre-written libraries wherever
    possible?
 4. Check: Which parts of the code are expensive?
    Measure, don't guess!
Relative efficiency gains

 Exponential-order and polynomial-order speedups are
 possible by choosing the right algorithm for a task.
   These require the right data structures!
 These dwarf 10-25x linear-order speedups from:
   using lower-level languages
   using different language constructs.
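A small illustration of how a data-structure choice changes the order of growth: membership tests scan a list element by element, but use hashing on a set. The sizes and names below are arbitrary choices for this sketch.

```python
import timeit

n = 20_000
items_list = list(range(n))
items_set = set(items_list)
queries = range(n - 200, n + 200)   # 200 values present, 200 absent

def count_hits(container):
    """Count how many queries are found in the container."""
    return sum(1 for q in queries if q in container)

# Same answer from both containers; only the cost differs.
hits_list = count_hits(items_list)
hits_set = count_hits(items_set)

list_time = timeit.timeit(lambda: count_hits(items_list), number=3)
set_time = timeit.timeit(lambda: count_hits(items_set), number=3)
print(f"hits: {hits_list} (list), {hits_set} (set)")
print(f"list: {list_time:.4f} s, set: {set_time:.4f} s")
```

The gap widens as `n` grows, which is the kind of gain that rewriting the list version in a lower-level language would not deliver.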
4. About Python Charmers
The largest Python training provider in South-East Asia
Delighted customers include:
Most popular course topics
         Python for Programmers            3 days

    Python for Scientists and Engineers    4 days

         Python for Geoscientists          4 days

        Python for Bioinformaticians       4 days

New courses:

       Python for Financial Engineers      4 days
    Python for IT Security Professionals   3 days
Python Charmers:
Topics of expertise
 Python: beginners, advanced
 Scientific data processing with Python
 Software engineering with Python
 Large-scale problems: HPC, huge data sets, grids
 Statistics and Monte Carlo problems
Python Charmers:
Topics of expertise (2)
 Spatial data analysis / GIS
 General scripting, job control, glue
 GUIs with PyQt
 Integrating with other languages: R, C, C++, Fortran, ...
 Web development in Django
How to get in touch

 or email us at:
