SlideShare a Scribd company logo
1 of 18
Download to read offline
www.morconsulting.c
The High Performance Python
Landscape
- profiling and fast calculation
Ian Ozsvald @IanOzsvald MorConsulting.com
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
What is “high performance”?
●
Profiling to understand system behaviour
●
We often ignore this step...
●
Speeding up the bottleneck
●
Keeps you on 1 machine (if possible)
●
Keeping team speed high
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
“High Performance Python”
• “Practical Performant
Programming
for Humans”
• Please join the mailing
list via IanOzsvald.com
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
cProfile
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
line_profiler
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @profile
    10                                           def calculate_z_serial_purepython(
                                                      maxiter, zs, cs):
    12         1         6870   6870.0      0.0      output = [0] * len(zs)
    13   1000001       781959      0.8      0.8      for i in range(len(zs)):
    14   1000000       767224      0.8      0.8          n = 0
    15   1000000       843432      0.8      0.8          z = zs[i]
    16   1000000       786013      0.8      0.8          c = cs[i]
    17  34219980     36492596      1.1     36.2          while abs(z) < 2 
                                                               and n < maxiter:
    18  33219980     32869046      1.0     32.6              z = z * z + c
    19  33219980     27371730      0.8     27.2              n += 1
    20   1000000       890837      0.9      0.9          output[i] = n
    21         1            4      4.0      0.0      return output
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
memory_profiler
Line #    Mem usage    Increment   Line Contents
================================================
     9   89.934 MiB    0.000 MiB   @profile
    10                             def calculate_z_serial_purepython(
                                                     maxiter, zs, cs):            
                     
    12   97.566 MiB    7.633 MiB       output = [0] * len(zs)
    13  130.215 MiB   32.648 MiB       for i in range(len(zs)):
    14  130.215 MiB    0.000 MiB           n = 0
    15  130.215 MiB    0.000 MiB           z = zs[i]
    16  130.215 MiB    0.000 MiB           c = cs[i]
    17  130.215 MiB    0.000 MiB           while n < maxiter and abs(z) < 2:
    18  130.215 MiB    0.000 MiB               z = z * z + c
    19  130.215 MiB    0.000 MiB               n += 1
    20  130.215 MiB    0.000 MiB           output[i] = n
    21  122.582 MiB   ­7.633 MiB       return output
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
memory_profiler mprof
https://github.com/scikit-learn/scikit-l
earn/pull/2248
Before & After an improvement
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Transforming memory_profiler
into a resource profiler?
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Profiling possibilities
●
CPU (line by line or by function)
●
Memory (line by line)
●
Disk read/write (with some hacking)
●
Network read/write (with some hacking)
●
mmaps
●
File handles
●
Network connections
●
Cache utilisation via libperf?
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Cython 0.20 (pyx annotations)
#cython: boundscheck=False
def calculate_z(int maxiter, zs, cs):
    """Calculate output list using Julia update rule"""
    cdef unsigned int i, n
    cdef double complex z, c
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
            z = z * z + c
            n += 1
        output[i] = n
    return output
Pure CPython lists code 12s
Cython lists runtime 0.19s
Cython numpy runtime 0.16s
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Cython + numpy + OMP nogil
#cython: boundscheck=False
from cython.parallel import parallel, prange
import numpy as np
cimport numpy as np
def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):
    cdef unsigned int i, length, n
    cdef double complex z, c
    cdef int[:] output = np.empty(len(zs), dtype=np.int32)
    length = len(zs)
    with nogil, parallel():
        for i in prange(length, schedule="guided"):
            z = zs[i]
            c = cs[i]
            n = 0
            while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
                z = z * z + c
                n = n + 1
            output[i] = n
    return output
Runtime 0.05s
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
ShedSkin 0.9.4 annotations
def calculate_z(maxiter, zs, cs):        # maxiter: [int], zs: 
                           [list(complex)], cs: [list(complex)]
    output = [0] * len(zs)               # [list(int)]
    for i in range(len(zs)):             # [__iter(int)]
        n = 0                            # [int]
        z = zs[i]                        # [complex]
        c = cs[i]                        # [complex]
        while n < maxiter and (… <4):    # [complex]
            z = z * z + c                # [complex]
            n += 1                       # [int]
        output[i] = n                    # [int]
    return output                        # [list(int)]
Couldn't we generate Cython pyx? Runtime 0.22s
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Pythran (0.40)
#pythran export calculate_z_serial_purepython(int, 
complex list, complex list)
def calculate_z_serial_purepython(maxiter, zs, cs):
 … 
Support for OpenMP on numpy arrays
Author Serge made an overnight fix – superb
support!
List Runtime 0.4s
#pythran export calculate_z(int, complex[], complex[], int[])
… 
#omp parallel for schedule(dynamic)
OMP numpy Runtime 0.10s
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
PyPy nightly (and numpypy)
●
“It just works” on Python 2.7 code
●
Clever list strategies (e.g. unboxed, uniform)
●
Little support for pre-existing C extensions (e.g.
the existing numpy)
●
multiprocessing, IPython etc all work fine
●
Python list code runtime: 0.3s
●
(pypy)numpy support is incomplete, bugs are
tackled (numpy runtime 5s [CPython+numpy 56s])
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Numba 0.12
from numba import jit
@jit(nopython=True)
def calculate_z_serial_purepython(maxiter, zs, cs, output):
    # couldn't create output, had to pass it in
    # output = numpy.zeros(len(zs), dtype=np.int32)
    for i in xrange(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        #while n < maxiter and abs(z) < 2:  # abs unrecognised
        while n < maxiter and z.real * z.real + z.imag * z.imag < 4:
            z = z * z + c
            n += 1
        output[i] = n
    #return output
Runtime 0.4s
Some Python 3 support, some GPU
prange support missing (was in 0.11)?
0.12 introduces temp limitations
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Tool Tradeoffs
●
PyPy no learning curve (pure Py only) easy win?
●
ShedSkin easy (pure Py only) but fairly rare
●
Cython pure Py hours to learn – team cost low (and lots of
online help)
●
Cython numpy OMP days+ to learn – heavy team cost?
●
Numba/Pythran hours to learn, install a bit tricky (Anaconda
easiest for Numba)
●
Pythran OMP very impressive result for little effort
●
Numba big toolchain which might hurt productivity?
●
(numexpr not covered – great for numpy and easy to use)
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Wrap up
●
Our profiling options should be richer
●
4-12 physical CPU cores commonplace
●
Cost of hand-annotating code is reduced agility
●
JITs/AST compilers are getting fairly good, manual
intervention still gives best results
BUT! CONSIDER:
●
Automation should (probably) be embraced ($CPUs
< $humans) as team velocity is probably higher
Ian@MorConsulting.com @IanOzsvald
PyDataLondon February 2014
Thank You
• Ian@IanOzsvald.com
• @IanOzsvald
• MorConsulting.com
• Annotate.io
• GitHub/IanOzsvald

More Related Content

Viewers also liked

The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkSpark Summit
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profilingJon Haddad
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scalesparktc
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterSpark Summit
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...Spark Summit
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Spark Summit
 
Making your code faster cython and parallel processing in the jupyter notebook
Making your code faster   cython and parallel processing in the jupyter notebookMaking your code faster   cython and parallel processing in the jupyter notebook
Making your code faster cython and parallel processing in the jupyter notebookPyData
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleSpark Summit
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesAhsan Javed Awan
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesKrishna Sankar
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityWes McKinney
 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Alexander Korbonits
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Spark Summit
 
Python Performance Profiling: The Guts And The Glory
Python Performance Profiling: The Guts And The GloryPython Performance Profiling: The Guts And The Glory
Python Performance Profiling: The Guts And The Gloryemptysquare
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache SparkWes McKinney
 

Viewers also liked (17)

The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
 
Making your code faster cython and parallel processing in the jupyter notebook
Making your code faster   cython and parallel processing in the jupyter notebookMaking your code faster   cython and parallel processing in the jupyter notebook
Making your code faster cython and parallel processing in the jupyter notebook
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of Techniques
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 
Python Performance Profiling: The Guts And The Glory
Python Performance Profiling: The Guts And The GloryPython Performance Profiling: The Guts And The Glory
Python Performance Profiling: The Guts And The Glory
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 

Similar to The High Performance Python Landscape by Ian Ozsvald

Open Source Monitoring in 2015
Open Source Monitoring in 2015Open Source Monitoring in 2015
Open Source Monitoring in 2015Kris Buytaert
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
Python for scientific computing
Python for scientific computingPython for scientific computing
Python for scientific computingGo Asgard
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CSteffen Wenz
 
Open Source Monitoring in 2019
Open Source Monitoring in 2019 Open Source Monitoring in 2019
Open Source Monitoring in 2019 Kris Buytaert
 
Tutorial on-python-programming
Tutorial on-python-programmingTutorial on-python-programming
Tutorial on-python-programmingChetan Giridhar
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to pythonMohamed Hegazy
 
From MonitoringSucks to Monitoring Love , 2016 Edition
From MonitoringSucks to Monitoring Love , 2016 EditionFrom MonitoringSucks to Monitoring Love , 2016 Edition
From MonitoringSucks to Monitoring Love , 2016 EditionKris Buytaert
 
Introduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOIntroduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOKris Findlay
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)wesley chun
 
Python 101 - Indonesia AI Society.pdf
Python 101 - Indonesia AI Society.pdfPython 101 - Indonesia AI Society.pdf
Python 101 - Indonesia AI Society.pdfHendri Karisma
 
PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimeNational Cheng Kung University
 
Just-In-Time Compiler in PHP 8
Just-In-Time Compiler in PHP 8Just-In-Time Compiler in PHP 8
Just-In-Time Compiler in PHP 8Nikita Popov
 
On the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonOn the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonTakeshi Akutsu
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of pythonYung-Yu Chen
 
Monitoring Drupal In an Infrastructure as Code Age
Monitoring Drupal In an Infrastructure as Code AgeMonitoring Drupal In an Infrastructure as Code Age
Monitoring Drupal In an Infrastructure as Code AgeKris Buytaert
 
web programming UNIT VIII python by Bhavsingh Maloth
web programming UNIT VIII python by Bhavsingh Malothweb programming UNIT VIII python by Bhavsingh Maloth
web programming UNIT VIII python by Bhavsingh MalothBhavsingh Maloth
 
Python at yhat (august 2013)
Python at yhat (august 2013)Python at yhat (august 2013)
Python at yhat (august 2013)Austin Ogilvie
 
PYTHON by kunal.pptx
PYTHON by kunal.pptxPYTHON by kunal.pptx
PYTHON by kunal.pptxkunalGoyal89
 
PythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesPythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesTatiana Al-Chueyr
 

Similar to The High Performance Python Landscape by Ian Ozsvald (20)

Open Source Monitoring in 2015
Open Source Monitoring in 2015Open Source Monitoring in 2015
Open Source Monitoring in 2015
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Python for scientific computing
Python for scientific computingPython for scientific computing
Python for scientific computing
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
Open Source Monitoring in 2019
Open Source Monitoring in 2019 Open Source Monitoring in 2019
Open Source Monitoring in 2019
 
Tutorial on-python-programming
Tutorial on-python-programmingTutorial on-python-programming
Tutorial on-python-programming
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
 
From MonitoringSucks to Monitoring Love , 2016 Edition
From MonitoringSucks to Monitoring Love , 2016 EditionFrom MonitoringSucks to Monitoring Love , 2016 Edition
From MonitoringSucks to Monitoring Love , 2016 Edition
 
Introduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOIntroduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIO
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)
 
Python 101 - Indonesia AI Society.pdf
Python 101 - Indonesia AI Society.pdfPython 101 - Indonesia AI Society.pdf
Python 101 - Indonesia AI Society.pdf
 
PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtime
 
Just-In-Time Compiler in PHP 8
Just-In-Time Compiler in PHP 8Just-In-Time Compiler in PHP 8
Just-In-Time Compiler in PHP 8
 
On the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of PythonOn the Necessity and Inapplicability of Python
On the Necessity and Inapplicability of Python
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Monitoring Drupal In an Infrastructure as Code Age
Monitoring Drupal In an Infrastructure as Code AgeMonitoring Drupal In an Infrastructure as Code Age
Monitoring Drupal In an Infrastructure as Code Age
 
web programming UNIT VIII python by Bhavsingh Maloth
web programming UNIT VIII python by Bhavsingh Malothweb programming UNIT VIII python by Bhavsingh Maloth
web programming UNIT VIII python by Bhavsingh Maloth
 
Python at yhat (august 2013)
Python at yhat (august 2013)Python at yhat (august 2013)
Python at yhat (august 2013)
 
PYTHON by kunal.pptx
PYTHON by kunal.pptxPYTHON by kunal.pptx
PYTHON by kunal.pptx
 
PythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesPythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummies
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

The High Performance Python Landscape by Ian Ozsvald

  • 1. www.morconsulting.c The High Performance Python Landscape - profiling and fast calculation Ian Ozsvald @IanOzsvald MorConsulting.com
  • 2. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 What is “high performance”? ● Profiling to understand system behaviour ● We often ignore this step... ● Speeding up the bottleneck ● Keeps you on 1 machine (if possible) ● Keeping team speed high
  • 3. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 “High Performance Python” • “Practical Performant Programming for Humans” • Please join the mailing list via IanOzsvald.com
  • 5. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 line_profiler Line #      Hits         Time  Per Hit   % Time  Line Contents ==============================================================      9                                           @profile     10                                           def calculate_z_serial_purepython(                                                       maxiter, zs, cs):     12         1         6870   6870.0      0.0      output = [0] * len(zs)     13   1000001       781959      0.8      0.8      for i in range(len(zs)):     14   1000000       767224      0.8      0.8          n = 0     15   1000000       843432      0.8      0.8          z = zs[i]     16   1000000       786013      0.8      0.8          c = cs[i]     17  34219980     36492596      1.1     36.2          while abs(z) < 2                                                                 and n < maxiter:     18  33219980     32869046      1.0     32.6              z = z * z + c     19  33219980     27371730      0.8     27.2              n += 1     20   1000000       890837      0.9      0.9          output[i] = n     21         1            4      4.0      0.0      return output
  • 6. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 memory_profiler Line #    Mem usage    Increment   Line Contents ================================================      9   89.934 MiB    0.000 MiB   @profile     10                             def calculate_z_serial_purepython(                                                      maxiter, zs, cs):                                       12   97.566 MiB    7.633 MiB       output = [0] * len(zs)     13  130.215 MiB   32.648 MiB       for i in range(len(zs)):     14  130.215 MiB    0.000 MiB           n = 0     15  130.215 MiB    0.000 MiB           z = zs[i]     16  130.215 MiB    0.000 MiB           c = cs[i]     17  130.215 MiB    0.000 MiB           while n < maxiter and abs(z) < 2:     18  130.215 MiB    0.000 MiB               z = z * z + c     19  130.215 MiB    0.000 MiB               n += 1     20  130.215 MiB    0.000 MiB           output[i] = n     21  122.582 MiB   ­7.633 MiB       return output
  • 7. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 memory_profiler mprof https://github.com/scikit-learn/scikit-l earn/pull/2248 Before & After an improvement
  • 8. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Transforming memory_profiler into a resource profiler?
  • 9. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Profiling possibilities ● CPU (line by line or by function) ● Memory (line by line) ● Disk read/write (with some hacking) ● Network read/write (with some hacking) ● mmaps ● File handles ● Network connections ● Cache utilisation via libperf?
  • 10. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Cython 0.20 (pyx annotations) #cython: boundscheck=False def calculate_z(int maxiter, zs, cs):     """Calculate output list using Julia update rule"""     cdef unsigned int i, n     cdef double complex z, c     output = [0] * len(zs)     for i in range(len(zs)):         n = 0         z = zs[i]         c = cs[i]         while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:             z = z * z + c             n += 1         output[i] = n     return output Pure CPython lists code 12s Cython lists runtime 0.19s Cython numpy runtime 0.16s
  • 11. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Cython + numpy + OMP nogil #cython: boundscheck=False from cython.parallel import parallel, prange import numpy as np cimport numpy as np def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):     cdef unsigned int i, length, n     cdef double complex z, c     cdef int[:] output = np.empty(len(zs), dtype=np.int32)     length = len(zs)     with nogil, parallel():         for i in prange(length, schedule="guided"):             z = zs[i]             c = cs[i]             n = 0             while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:                 z = z * z + c                 n = n + 1             output[i] = n     return output Runtime 0.05s
  • 12. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 ShedSkin 0.9.4 annotations def calculate_z(maxiter, zs, cs):        # maxiter: [int], zs:                             [list(complex)], cs: [list(complex)]     output = [0] * len(zs)               # [list(int)]     for i in range(len(zs)):             # [__iter(int)]         n = 0                            # [int]         z = zs[i]                        # [complex]         c = cs[i]                        # [complex]         while n < maxiter and (… <4):    # [complex]             z = z * z + c                # [complex]             n += 1                       # [int]         output[i] = n                    # [int]     return output                        # [list(int)] Couldn't we generate Cython pyx? Runtime 0.22s
  • 13. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Pythran (0.40) #pythran export calculate_z_serial_purepython(int,  complex list, complex list) def calculate_z_serial_purepython(maxiter, zs, cs):  …  Support for OpenMP on numpy arrays Author Serge made an overnight fix – superb support! List Runtime 0.4s #pythran export calculate_z(int, complex[], complex[], int[]) …  #omp parallel for schedule(dynamic) OMP numpy Runtime 0.10s
  • 14. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 PyPy nightly (and numpypy) ● “It just works” on Python 2.7 code ● Clever list strategies (e.g. unboxed, uniform) ● Little support for pre-existing C extensions (e.g. the existing numpy) ● multiprocessing, IPython etc all work fine ● Python list code runtime: 0.3s ● (pypy)numpy support is incomplete, bugs are tackled (numpy runtime 5s [CPython+numpy 56s])
  • 15. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Numba 0.12 from numba import jit @jit(nopython=True) def calculate_z_serial_purepython(maxiter, zs, cs, output):     # couldn't create output, had to pass it in     # output = numpy.zeros(len(zs), dtype=np.int32)     for i in xrange(len(zs)):         n = 0         z = zs[i]         c = cs[i]         #while n < maxiter and abs(z) < 2:  # abs unrecognised         while n < maxiter and z.real * z.real + z.imag * z.imag < 4:             z = z * z + c             n += 1         output[i] = n     #return output Runtime 0.4s Some Python 3 support, some GPU prange support missing (was in 0.11)? 0.12 introduces temp limitations
  • 16. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Tool Tradeoffs ● PyPy no learning curve (pure Py only) easy win? ● ShedSkin easy (pure Py only) but fairly rare ● Cython pure Py hours to learn – team cost low (and lots of online help) ● Cython numpy OMP days+ to learn – heavy team cost? ● Numba/Pythran hours to learn, install a bit tricky (Anaconda easiest for Numba) ● Pythran OMP very impressive result for little effort ● Numba big toolchain which might hurt productivity? ● (numexpr not covered – great for numpy and easy to use)
  • 17. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Wrap up ● Our profiling options should be richer ● 4-12 physical CPU cores commonplace ● Cost of hand-annotating code is reduced agility ● JITs/AST compilers are getting fairly good, manual intervention still gives best results BUT! CONSIDER: ● Automation should (probably) be embraced ($CPUs < $humans) as team velocity is probably higher
  • 18. Ian@MorConsulting.com @IanOzsvald PyDataLondon February 2014 Thank You • Ian@IanOzsvald.com • @IanOzsvald • MorConsulting.com • Annotate.io • GitHub/IanOzsvald