The High Performance Python Landscape by Ian Ozsvald
Transcript

  • 1. www.morconsulting.c The High Performance Python Landscape - profiling and fast calculation Ian Ozsvald @IanOzsvald MorConsulting.com
  • 2. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014
    What is “high performance”?
      ● Profiling to understand system behaviour
      ● We often ignore this step...
      ● Speeding up the bottleneck
      ● Keeps you on 1 machine (if possible)
      ● Keeping team speed high
  • 3. “High Performance Python”
      • “Practical Performant Programming for Humans”
      • Please join the mailing list via IanOzsvald.com
  • 4. cProfile
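The slide gives only the tool name. As a minimal sketch of function-level profiling with cProfile, the block below profiles the pure-Python Julia function that appears in the later profiler listings. The helper `build_inputs`, the grid size, `maxiter=300`, and the constant used for `cs` are illustrative assumptions, not values taken from the talk.

```python
import cProfile
import io
import pstats


def calculate_z_serial_purepython(maxiter, zs, cs):
    """Julia update rule, as in the profiler listings in this talk."""
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while abs(z) < 2 and n < maxiter:
            z = z * z + c
            n += 1
        output[i] = n
    return output


def build_inputs(width=50):
    """Hypothetical helper: a small square grid of complex start points."""
    step = 3.6 / width
    coords = [-1.8 + step * k for k in range(width)]
    zs = [complex(x, y) for y in coords for x in coords]
    cs = [complex(-0.62772, -0.42193)] * len(zs)  # illustrative constant
    return zs, cs


zs, cs = build_inputs()

# Profile only the call of interest, not the input construction.
profiler = cProfile.Profile()
profiler.enable()
output = calculate_z_serial_purepython(300, zs, cs)
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

cProfile reports per function, so it shows that the time is spent inside `calculate_z_serial_purepython` and its `abs` calls, but not which line dominates; that is what the line-by-line tools on the following slides address.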
  • 5. line_profiler

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         9                                           @profile
        10                                           def calculate_z_serial_purepython(
                                                             maxiter, zs, cs):
        12         1         6870   6870.0      0.0      output = [0] * len(zs)
        13   1000001       781959      0.8      0.8      for i in range(len(zs)):
        14   1000000       767224      0.8      0.8          n = 0
        15   1000000       843432      0.8      0.8          z = zs[i]
        16   1000000       786013      0.8      0.8          c = cs[i]
        17  34219980     36492596      1.1     36.2          while abs(z) < 2 and n < maxiter:
        18  33219980     32869046      1.0     32.6              z = z * z + c
        19  33219980     27371730      0.8     27.2              n += 1
        20   1000000       890837      0.9      0.9          output[i] = n
        21         1            4      4.0      0.0      return output
  • 6. memory_profiler

    Line #    Mem usage    Increment   Line Contents
    ================================================
         9   89.934 MiB    0.000 MiB   @profile
        10                             def calculate_z_serial_purepython(
                                               maxiter, zs, cs):
        12   97.566 MiB    7.633 MiB       output = [0] * len(zs)
        13  130.215 MiB   32.648 MiB       for i in range(len(zs)):
        14  130.215 MiB    0.000 MiB           n = 0
        15  130.215 MiB    0.000 MiB           z = zs[i]
        16  130.215 MiB    0.000 MiB           c = cs[i]
        17  130.215 MiB    0.000 MiB           while n < maxiter and abs(z) < 2:
        18  130.215 MiB    0.000 MiB               z = z * z + c
        19  130.215 MiB    0.000 MiB               n += 1
        20  130.215 MiB    0.000 MiB           output[i] = n
        21  122.582 MiB   -7.633 MiB       return output
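Both listings above come from a function decorated with `@profile`, a name that `kernprof -l` (line_profiler) and `python -m memory_profiler` add to builtins when they run a script. The sketch below shows one way to write such a script so it also runs under plain Python; the no-op fallback decorator and the file name `julia.py` are assumptions for illustration, not something shown in the talk.

```python
import builtins

# `kernprof -l` and `python -m memory_profiler` inject `profile` into
# builtins. Fall back to a no-op decorator so the script runs unprofiled too.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def calculate_z_serial_purepython(maxiter, zs, cs):
    """Julia update rule, with the loop condition ordered as on slide 6."""
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        output[i] = n
    return output


# Tiny illustrative input: one point that never escapes, one that starts outside.
result = calculate_z_serial_purepython(10, [0j, 2 + 2j], [0j, 0j])
print(result)

# Assuming the file is saved as julia.py (hypothetical name):
#   kernprof -l -v julia.py            # line-by-line CPU time
#   python -m memory_profiler julia.py # line-by-line memory usage
```

Keeping the fallback in the script means the decorator does not have to be removed between profiling runs and normal runs.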
  • 7. memory_profiler mprof
    https://github.com/scikit-learn/scikit-learn/pull/2248
    Before & After an improvement
  • 8. Transforming memory_profiler into a resource profiler?
  • 9. Profiling possibilities
      ● CPU (line by line or by function)
      ● Memory (line by line)
      ● Disk read/write (with some hacking)
      ● Network read/write (with some hacking)
      ● mmaps
      ● File handles
      ● Network connections
      ● Cache utilisation via libperf?
  • 10. Cython 0.20 (pyx annotations)

        #cython: boundscheck=False
        def calculate_z(int maxiter, zs, cs):
            """Calculate output list using Julia update rule"""
            cdef unsigned int i, n
            cdef double complex z, c
            output = [0] * len(zs)
            for i in range(len(zs)):
                n = 0
                z = zs[i]
                c = cs[i]
                while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
                    z = z * z + c
                    n += 1
                output[i] = n
            return output

    Pure CPython lists code 12s
    Cython lists runtime 0.19s
    Cython numpy runtime 0.16s
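The Cython version replaces the `abs(z) < 2` test from the pure-Python listings with `z.real * z.real + z.imag * z.imag < 4`, which is the same inequality with both sides squared. In compiled code this avoids a square root and a Python-level call; no speed claim is made here for plain CPython. The sketch below only checks that the two tests agree; the sample grid is an illustrative assumption, chosen so that every squared magnitude is exactly representable in floating point.

```python
def inside_abs(z):
    """Escape test as written in the pure-Python version."""
    return abs(z) < 2


def inside_squared(z):
    """Same test with both sides squared, so no square root is needed."""
    return z.real * z.real + z.imag * z.imag < 4


# Coarse grid covering [-3, 3] on both axes in steps of 0.25 (illustrative).
points = [complex(x * 0.25, y * 0.25)
          for x in range(-12, 13) for y in range(-12, 13)]

# Collect any point where the two formulations disagree.
mismatches = [z for z in points if inside_abs(z) != inside_squared(z)]
print(len(points), "points checked,", len(mismatches), "mismatches")
```

For arbitrary floating-point inputs very close to the boundary, rounding could in principle make the two tests differ by one iteration, which is normally irrelevant for this kind of calculation.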
  • 11. Cython + numpy + OMP nogil

        #cython: boundscheck=False
        from cython.parallel import parallel, prange
        import numpy as np
        cimport numpy as np

        def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):
            cdef unsigned int i, length, n
            cdef double complex z, c
            cdef int[:] output = np.empty(len(zs), dtype=np.int32)
            length = len(zs)
            with nogil, parallel():
                for i in prange(length, schedule="guided"):
                    z = zs[i]
                    c = cs[i]
                    n = 0
                    while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
                        z = z * z + c
                        n = n + 1
                    output[i] = n
            return output

    Runtime 0.05s
  • 12. ShedSkin 0.9.4 annotations

        def calculate_z(maxiter, zs, cs):        # maxiter: [int], zs: [list(complex)], cs: [list(complex)]
            output = [0] * len(zs)               # [list(int)]
            for i in range(len(zs)):             # [__iter(int)]
                n = 0                            # [int]
                z = zs[i]                        # [complex]
                c = cs[i]                        # [complex]
                while n < maxiter and (… <4):    # [complex]
                    z = z * z + c                # [complex]
                    n += 1                       # [int]
                output[i] = n                    # [int]
            return output                        # [list(int)]

    Couldn't we generate Cython pyx?
    Runtime 0.22s
  • 13. Pythran (0.40)

        #pythran export calculate_z_serial_purepython(int, complex list, complex list)
        def calculate_z_serial_purepython(maxiter, zs, cs):
            …

    List runtime 0.4s

        #pythran export calculate_z(int, complex[], complex[], int[])
        …
        #omp parallel for schedule(dynamic)

    OMP numpy runtime 0.10s

      ● Support for OpenMP on numpy arrays
      ● Author Serge made an overnight fix – superb support!
  • 14. PyPy nightly (and numpypy)
      ● “It just works” on Python 2.7 code
      ● Clever list strategies (e.g. unboxed, uniform)
      ● Little support for pre-existing C extensions (e.g. the existing numpy)
      ● multiprocessing, IPython etc all work fine
      ● Python list code runtime: 0.3s
      ● (pypy)numpy support is incomplete, bugs are tackled (numpy runtime 5s [CPython+numpy 56s])
  • 15. Numba 0.12

        from numba import jit

        @jit(nopython=True)
        def calculate_z_serial_purepython(maxiter, zs, cs, output):
            # couldn't create output, had to pass it in
            # output = numpy.zeros(len(zs), dtype=np.int32)
            for i in xrange(len(zs)):
                n = 0
                z = zs[i]
                c = cs[i]
                #while n < maxiter and abs(z) < 2:  # abs unrecognised
                while n < maxiter and z.real * z.real + z.imag * z.imag < 4:
                    z = z * z + c
                    n += 1
                output[i] = n
            #return output

    Runtime 0.4s
      ● Some Python 3 support, some GPU
      ● prange support missing (was in 0.11)?
      ● 0.12 introduces temp limitations
  • 16. Tool Tradeoffs
      ● PyPy: no learning curve (pure Py only), easy win?
      ● ShedSkin: easy (pure Py only) but fairly rare
      ● Cython pure Py: hours to learn – team cost low (and lots of online help)
      ● Cython numpy OMP: days+ to learn – heavy team cost?
      ● Numba/Pythran: hours to learn, install a bit tricky (Anaconda easiest for Numba)
      ● Pythran OMP: very impressive result for little effort
      ● Numba: big toolchain which might hurt productivity?
      ● (numexpr not covered – great for numpy and easy to use)
  • 17. Wrap up
      ● Our profiling options should be richer
      ● 4-12 physical CPU cores commonplace
      ● Cost of hand-annotating code is reduced agility
      ● JITs/AST compilers are getting fairly good, manual intervention still gives best results
    BUT! CONSIDER:
      ● Automation should (probably) be embraced ($CPUs < $humans) as team velocity is probably higher
  • 18. Thank You
      • Ian@IanOzsvald.com
      • @IanOzsvald
      • MorConsulting.com
      • Annotate.io
      • GitHub/IanOzsvald
