The Joy of SciPy

Presentation to Python Users Berlin on 2/14/2013

Published in: Technology

  1. 1. The Joy of SciPy • David Kammeyer • PUB • February 14, 2013
  2. 2. Brief History (Person, Package, Year) • Jim Fulton, Matrix Object in Python, 1994 • Jim Hugunin, Numeric, 1995 • Perry Greenfield, Rick White, Todd Miller, Numarray, 2001 • Travis Oliphant, NumPy, 2005
  3. 3. SciPy (2001) • Travis Oliphant: optimize, sparse, interpolate, integrate, special, signal, stats, fftpack, misc • Eric Jones: weave, cluster, GA* • Pearu Peterson: linalg, interpolate, f2py • Founded in 2001 with Travis Vaught
  4. 4. SciPy Ecosystem
  5. 5. Community effort • Chuck Harris • Pauli Virtanen • David Cournapeau • Stefan van der Walt • Dag Sverre Seljebotn • Robert Kern • Warren Weckesser • Ralf Gommers • Mark Wiebe • Nathaniel Smith
  6. 6. Why Python for Technical Computing • Syntax (it gets out of your way) • Overloadable operators • Complex numbers built in early • Just enough language support for arrays • “Occasional” programmers can grok it • Supports multiple programming styles • Expert programmers can also use it effectively • Has a simple, extensible implementation • General-purpose language --- can build a system • Critical mass
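Two of the points above, overloadable operators and built-in complex numbers, can be seen in a few lines of plain Python. This is only a sketch: the `Vec2` class is a hypothetical type invented for illustration and does not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2-D vector, used only to illustrate operator overloading."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scale: float) -> "Vec2":
        return Vec2(self.x * scale, self.y * scale)


# Overloaded operators let user-defined types read like math.
total = Vec2(1.0, 2.0) + Vec2(3.0, 4.0) * 2.0

# Complex numbers are a built-in type with literal syntax (the j suffix).
z = (1 + 2j) * (3 - 1j)
```

The same `__add__` / `__mul__` hooks are what allow an array library to give `a + b` an element-wise meaning.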
  7. 7. Putting Science back in Comp Sci • Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web - Complex numbers? - Vectorized primitives? • Array-oriented programming has been supplanted by Object-oriented programming • Software stack for scientists is not as helpful as it should be • Fortran is still where many scientists end up
  8. 8. NumPy: an Array-Oriented Extension • Data: the array object – slicing and shaping – data-types map to bytes • Fast Math: – vectorization – broadcasting – aggregations
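The items on this slide map onto a handful of NumPy calls. The following is a minimal sketch; the array contents and variable names are arbitrary choices, not taken from the talk.

```python
import numpy as np

a = np.arange(12, dtype=np.int32)      # one block of 12 elements, 4 bytes each
grid = a.reshape(3, 4)                 # shaping: the same bytes viewed as 3x4
evens = a[::2]                         # slicing: a strided view, not a copy

# The data-type fixes how many bytes each element occupies.
bytes_per_item = grid.dtype.itemsize
total_bytes = grid.nbytes

# Vectorization: one expression applies to every element.
squared = grid ** 2

# Broadcasting: a (3, 4) array combined with a (4,) row of offsets.
offsets = np.array([0, 10, 20, 30], dtype=np.int32)
shifted = grid + offsets

# Aggregation: reduce along an axis (here, sum each column).
column_sums = grid.sum(axis=0)
```

Note that `evens` and `grid` share memory with `a`; only the arithmetic expressions allocate new arrays.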
  9. 9. Zen of NumPy • strided is better than scattered • contiguous is better than strided • descriptive is better than imperative • array-oriented is better than object-oriented • broadcasting is a great idea • vectorized is better than an explicit loop • unless it’s too complicated --- then use Cython/Numba • think in higher dimensions
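The "strided" and "contiguous" lines refer to how an array's elements are laid out in memory. A small sketch of how to inspect that layout, assuming a 2x3 `float64` array chosen only for illustration:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# C-contiguous: rows are laid out back to back, 8 bytes per float64.
row_major_strides = a.strides

# The transpose is a strided view of the same memory, not a copy.
t = a.T
transposed_strides = t.strides
is_contiguous = bool(t.flags["C_CONTIGUOUS"])

# An explicit copy restores a contiguous layout when a routine needs one.
packed = np.ascontiguousarray(t)
```

Strides are expressed in bytes per step along each axis, which is why transposing only swaps them instead of moving data.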
  10. 10. Memory using Object-oriented [Figure: several separate objects, each holding its own Attr1, Attr2, Attr3]
  11. 11. Array-oriented (Table) approach [Figure: one table with columns Attr1, Attr2, Attr3 and rows Object1 through Object6]
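The contrast between the two layouts can be sketched in code. The `Particle` class and its attributes are hypothetical stand-ins for the slides' generic Object/Attr labels.

```python
import numpy as np


# Object-oriented layout: one Python object per record.
class Particle:
    def __init__(self, mass, x, v):
        self.mass, self.x, self.v = mass, x, v


particles = [Particle(1.0, 0.0, 2.0), Particle(2.0, 1.0, 0.5), Particle(4.0, 3.0, 1.0)]
momentum_loop = [p.mass * p.v for p in particles]

# Array-oriented layout: one contiguous column per attribute.
mass = np.array([1.0, 2.0, 4.0])
x = np.array([0.0, 1.0, 3.0])
v = np.array([2.0, 0.5, 1.0])

momentum = mass * v          # whole-column arithmetic, no Python-level loop
x_next = x + v * 0.5         # advance every position by half a time unit
```

Both versions compute the same numbers; the array version keeps each attribute in one block of memory, which is the table picture on the slide.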
  12. 12. Benefits of Array-oriented • Many technical problems are naturally array-oriented (easy to vectorize) • Algorithms can be expressed at a high level • These algorithms can be parallelized more simply (quite often much information is lost in the translation to typical “compiled” languages) • Array-oriented algorithms map to modern hardware caches and pipelines
  13. 13. We need more focus on compiled array-oriented languages with fast compilers!
  14. 14. What is good about NumPy? • Array-oriented • Extensive Dtype System (including structures) • C-API • Simple to understand data-structure • Memory mapping • Syntax support from Python • Large community of users • Broadcasting • Easy to interface C/C++/Fortran code
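Two items from this list, structured dtypes and memory mapping, combine naturally. A minimal sketch, assuming a made-up two-field record and a temporary file whose name is arbitrary:

```python
import os
import tempfile

import numpy as np

# A structured dtype: each record packs an int32 id and a float64 value.
record = np.dtype([("id", np.int32), ("value", np.float64)])

table = np.zeros(3, dtype=record)
table["id"] = [1, 2, 3]
table["value"] = [0.5, 1.5, 2.5]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.bin")   # hypothetical file name
    table.tofile(path)

    # Memory mapping: the file is paged in on demand rather than read up front.
    mapped = np.memmap(path, dtype=record, mode="r")
    n_records = mapped.shape[0]
    mapped_total = float(mapped["value"].sum())
    del mapped  # release the mapping before the directory is removed
```

Field access by name (`table["value"]`) works the same way on the in-memory array and on the mapped file.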
  15. 15. New Project • NumPy • Blaze • Next Generation NumPy • Out-of-core • Distributed Tables
  16. 16. Overview [Figure: a Main Script sending Code to several Processing Nodes]
  17. 17. Timeline (Available on GitHub Now!) • Date: Milestone • July 2012: Pre-alpha release • December 2012: Early Beta Release • June 2013: Version 1.0
  18. 18. Spectrogram Demo
  19. 19. Introducing Numba (lots of kernels to write)
  20. 20. NumPy Users • Want to be able to write Python to get fast code that works on arrays and scalars • Need access to a boat-load of C-extensions (NumPy is just the beginning) • PyPy doesn’t cut it for us!
  21. 21. Dynamic compilation [Figure labels: Python Function, Dynamic Compilation, function pointer, NumPy Runtime; runtime hooks: Ufuncs, Generalized UFuncs, Window Kernel Funcs, Function-based Indexing, Memory Filters, I/O Filters, Reduction Filters, Computed Columns]
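The idea of turning a scalar Python function into a ufunc can be tried today with `np.frompyfunc`. This sketch is not the compiled path the slide describes: it still calls Python once per element, so it shows the interface only. The `clip_gain` kernel is a hypothetical example.

```python
import numpy as np


def clip_gain(sample, gain):
    """Hypothetical scalar kernel: apply a gain, then clip to [-1, 1]."""
    return max(-1.0, min(1.0, sample * gain))


# np.frompyfunc wraps a scalar function as a ufunc (2 inputs, 1 output).
clip_gain_ufunc = np.frompyfunc(clip_gain, 2, 1)

samples = np.array([-0.8, 0.1, 0.6])

# The wrapped ufunc broadcasts like any other; its result has dtype=object,
# so convert back to float64 before further numeric work.
out = clip_gain_ufunc(samples, 2.0).astype(np.float64)
```

A compiler such as Numba aims to supply the same kind of kernel as a machine-code function pointer instead of a Python callable.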
  22. 22. SciPy needs a Python compiler • optimize • integrate • special • ode • writing more of SciPy at a high level
  23. 23. Numba -- a Python compiler • Replays byte-code on a stack with simple type-inference • Translates to LLVM (using LLVM-py) • Uses LLVM for code-gen • Resulting C-level function-pointer can be inserted into NumPy run-time • Understands NumPy arrays • Is NumPy / SciPy aware
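A kernel of the kind Numba targets might look like the sketch below. It uses the `njit` decorator from current Numba releases, which is an assumption on my part and may differ from the API shown in the 2013 talk; if Numba is not installed, the fallback simply runs the function as ordinary Python.

```python
import numpy as np

try:
    from numba import njit          # compiles through LLVM when available
except ImportError:                 # assumption: fall back to plain Python
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Explicit loop over a 1-D array; types are inferred from the argument."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(5, dtype=np.float64)
result = sum_of_squares(data)       # matches (data ** 2).sum()
```

The loop is written the "slow" way on purpose: this is the style that is hard to vectorize in general and that a compiler is meant to make fast.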
  24. 24. NumPy + Mamba = Numba [Figure: Python Function → LLVM-PY → LLVM 3.1 → Machine Code; other labels: ISPC, OpenCL, OpenMP, CUDA, CLANG; vendors: Intel, AMD, Nvidia, Apple]
  25. 25. Examples
  26. 26. Software Stack Future? • Plateaus of code re-use + DSLs [Figure labels: SQL, R, TDPL, Matlab, Python, OBJC, C, FORTRAN, C++, LLVM]
  27. 27. Wakari/Numba Demo
  28. 28. How to pay for all this?
  29. 29. Dual strategy Blaze
  30. 30. NumFOCUS: Num(Py) Foundation for Open Code for Usable Science
  31. 31. NumFOCUS • Mission • To initiate and support educational programs furthering the use of open source software in science • To promote the use of high-level languages and open source in science, engineering, and math research • To encourage reproducible scientific research • To provide infrastructure and support for open source projects for technical computing
  32. 32. NumFOCUS • Core Projects: NumPy, SciPy, IPython, Matplotlib • Other Projects (seeking more --- need representatives): Scikits Image
  33. 33. • Large-scale data analysis products • Anaconda, SciPy in a Box • Wakari.io -- Cloud Hosted SciPy • Python training (data analysis and development) • NumPy support and consulting • Blaze, Numba, and More Development