Python for Speed, Scale, and Science
Travis E. Oliphant
@teoliphant
Quansight and OpenTeams
• MS/BS degrees in Electrical and Computer Engineering
• PhD in Biomedical Engineering (Ultrasound and MRI)
• Creator and Maintainer of SciPy (1998-2009)
• Professor of EE (2001-2007), Inverse Problems
• Creator and Developer of NumPy (2005-2013)
• Started Numba and Conda (2012)
• Founder of NumFOCUS / PyData
• Python Dev and Foundation Director
• CEO/Founder (2012) of Continuum Analytics / Anaconda, Inc.
• CEO/Founder (2018) of Quansight
• CEO/Founder (2019) of OpenTeams
Started my career in computational science
Satellites measure backscatter; computer algorithms produce estimates of Earth features:
• Wind Speed
• Ice Cover
• Vegetation
• (and more)
1996-2001: Analyze 12.0 (https://analyzedirect.com/)
Richard Robb (retired in 2015): bringing "SciFi" medicine to life since 1971
More science led to Python
Raja Muthupillai, Armando Manduca, Richard Ehman (1997)
As a professor: Scanning Impedance Imaging
My little “side projects” became my life
SciPy, NumPy, and PyData Time-Line
[Timeline graphic spanning 1991-2019, marking project milestones in 1998, 2001, 2003, 2005, 2006, 2008, 2009, 2010, 2012, 2014, 2015, 2016, and 2018]
Making Python for Science Popular
[Diagram: Continuum Analytics renamed to Anaconda, Inc. (~20 million (Ana)conda users); spun-out ventures followed]
But… Python is Slow…
But… Python Can’t Scale…
[Google Search Trends chart (May 2020) comparing Java, JavaScript, and Python]
Used by some of the Brightest Minds…
• LIGO: Gravitational Waves
• Higgs Boson Discovery
• Black Hole Imaging
Scientists have big data and compute needs
Data Size: 60 million GB
Compute Power: 2,000 TeraFLOPS (about 30,000 of my laptops)
How does Python scale to that?
GTC Europe: NVIDIA CEO Jensen Huang describes Python as the de facto data-science platform, with GPU support getting better over the coming months and years.
Reasons for Python’s success in Science
1) Python is expressive and easy to read.
2) Python (in particular CPython) is straightforward to extend, and with Cython it has become a glue language for many other runtimes.
When you use Python for speed, scale, and science you are almost always actually running machine instructions compiled from another language, "glued" together with high-level and expressive Python.
Pythonic code helps me think better. For a scientist, it gets out of your way.
3) An engaging Open Source Community
Not all open-source is the same!
Community-Driven Open Source Software (CDOSS)
• Anyone can become the leader.
• Multiple stakeholders.
• Can look at community size for health.
• Users become contributors more often.
• Examples: Python, Jupyter, NumPy, SciPy, Pandas

Company-Backed Open Source Software (CBOSS)
• Need to work at the company to be the leader.
• Many users, fewer developers.
• Need to understand the company's incentives to understand health.
• Examples: Swift, TensorFlow, PyTorch, Conda
Both can be valuable, but have different implications!
Governance models
[Screenshots of the governance documents of NumPy, TensorFlow, Scikit-Learn, PyTorch, and Pandas]
My First Big Project
Started as Multipack in 1998; became SciPy in 2001 with the help of other colleagues.
115 releases, 854 contributors
Used by: 187,386
SciPy
“Distribution of Python Numerical Tools masquerading as one Library”
cluster - KMeans and Vector Quantization
fftpack - Discrete Fourier Transform
integrate - Numerical Integration
interpolate - Interpolation routines
io - Data Input and Output
linalg - Fast Linear Algebra
misc - Utilities
ndimage - N-dimensional Image Processing
odr - Orthogonal Distance Regression
optimize - Constrained and Unconstrained Optimization
signal - Signal Processing Tools
sparse - Sparse Matrices and Algebra
spatial - Spatial Data Structures and Algorithms
special - Special Functions (e.g. Bessel)
stats - Statistical Functions and Distributions
Published: February 3, 2020. Project started: 1998.
Patience and Persistence and Grit
My Open Source addiction continued…
Gave up my chance at a tenured academic position in 2005-2006 to bring together the diverging array community in Python by merging Numeric and Numarray.
170 releases, 923 contributors
Used by: 378,828
Without NumPy

functions.py:

from math import sin, pi

def sinc(x):
    if x == 0:
        return 1.0
    else:
        pix = pi*x
        return sin(pix)/pix

def step(x):
    if x > 0:
        return 1.0
    elif x < 0:
        return 0.0
    else:
        return 0.5

>>> import functions as f
>>> xval = [x/3.0 for x in range(-10,10)]
>>> yval1 = [f.sinc(x) for x in xval]
>>> yval2 = [f.step(x) for x in xval]

Python is a great language but needed a way to operate quickly and cleanly over multi-dimensional arrays.
With NumPy

functions2.py:

from numpy import sin, pi, vectorize
import functions as f

# one-line approach: wrap the scalar functions
vsinc = vectorize(f.sinc)
vstep = vectorize(f.step)

# hand-vectorized versions
def sinc(x):
    pix = pi*x
    val = sin(pix)/pix   # x == 0 gives a harmless divide-by-zero warning...
    val[x==0] = 1.0      # ...fixed up here
    return val

def step(x):
    y = x*0.0
    y[x>0] = 1
    y[x==0] = 0.5
    return y

>>> import functions2 as f
>>> from numpy import *
>>> x = r_[-10:10]/3.0
>>> y1 = f.sinc(x)
>>> y2 = f.step(x)

Offers an N-D array, element-by-element functions, and basic random numbers, linear algebra, and FFT capability for Python.
http://numpy.org
Fiscally sponsored by NumFOCUS
NumPy: an Array Extension of Python
• Data: the array object
– slicing and shaping
– data-type map to bytes
• Fast Math (ufuncs):
– vectorization
– broadcasting
– aggregations
NumPy Array Key Attributes (illustrated below)
• dtype
• shape
• ndim
• strides
• data
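
A quick illustration of these attributes on a small array (my own example, not from the slide):

import numpy as np

a = np.zeros((3, 4), dtype=np.float64)
print(a.dtype)     # float64
print(a.shape)     # (3, 4)
print(a.ndim)      # 2
print(a.strides)   # (32, 8): bytes to step per dimension
print(a.data)      # the memory buffer backing the array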
NumPy Examples
[Screenshots of 2-D and 3-D array creation with aggregations such as axis sums and random-array statistics; see the sketch below]
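
Since the original examples are screenshots, here is a minimal stand-in sketch of creating 2-D and 3-D arrays and aggregating over them (the arrays and printed values are illustrative, not the slide's):

import numpy as np

a = np.arange(12).reshape(3, 4)       # 2-D array
b = np.arange(24).reshape(2, 3, 4)    # 3-D array
print(a.sum(axis=0))                  # column sums of the 2-D array
print(b.sum(axis=(1, 2)))             # per-block sums of the 3-D array

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(x.mean(), x.std())              # basic statistics over an array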
NumPy Slicing (Selection)
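The examples below assume a 6x6 array whose entry at row i, column j is 10*i + j (an assumption consistent with the outputs shown; the construction is mine):

>>> import numpy as np
>>> a = np.arange(6).reshape(-1, 1) * 10 + np.arange(6)   # a[i, j] == 10*i + j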
>>> a[0,3:5]
array([3, 4])
>>> a[4:,4:]
array([[44, 45],
[54, 55]])
>>> a[:,2]
array([2,12,22,32,42,52])
>>> a[2::2,::2]
array([[20, 22, 24],
[40, 42, 44]])
Summary
• Provides foundational N-dimensional array composed of
homogeneous elements of a particular “dtype”
• The dtype of the elements is extensive (but difficult to extend)
• Arrays can be sliced and diced with simple syntax to provide
easy manipulation and selection.
• Provides fast and powerful math, statistics, and linear algebra
functions that operate over arrays.
• Utilities for sorting and for reading and writing data are also provided.
[PyData ecosystem diagram, including Bokeh; adapted from Jake Vanderplas's PyCon 2017 keynote]
Scale Up vs Scale Out
• Scale Up (bigger nodes): a big-memory, many-core / GPU box. Tool: Numba
• Scale Out (more nodes): many commodity nodes in a cluster. Tool: Dask
• Best of both (e.g. a GPU cluster): Dask with Numba
A Compiler for Python? Combining Productivity and Performance
• Python is one of the most popular languages for data science
• Python integrates well with compiled, accelerated libraries (MKL, TensorFlow, etc.)
• But what about custom algorithms and data processing tasks?
• Our goal was to make a compiler that:
  • works within the standard Python interpreter rather than replacing it
  • integrates tightly with NumPy
  • is compatible with both multithreaded and distributed computing paradigms
Numba: A JIT Compiler for Python
• An open-source, function-at-a-time compiler library for Python
• A compiler toolbox for different targets and execution models:
  • single-threaded CPU, multi-threaded CPU, GPU
  • regular functions, "universal functions" (array functions), etc.
• Speedups from 2x (compared to basic NumPy code) to 200x (compared to pure Python)
• Combines the ease of writing Python with speeds approaching C/FORTRAN
• Empowers data scientists who make tools for themselves and other data scientists
7 things about Numba you may not know
1. Numba is 100% open source
2. Numba + Jupyter = rapid CUDA prototyping
3. Numba can compile for the CPU and the GPU at the same time
4. Numba makes array processing easy with @vectorize and @guvectorize
5. Numba comes with a CUDA simulator
6. You can send Numba functions over the network
7. Numba developers are contributing to NVIDIA's new rapids.ai work
Numba (compile Python to CPUs and GPUs)
conda install numba
[Architecture diagram: Numba is the parsing frontend, compiling Python into LLVM intermediate representation (IR); LLVM is the code-generation backend, emitting machine code for x86, ARM, and PTX]
How does Numba work?

@jit
def do_math(a, b):
    …

>>> do_math(x, y)

[Pipeline diagram: Python function (bytecode) -> bytecode analysis -> Numba IR, with type inference over the function arguments -> rewrite IR -> lowering -> LLVM IR -> LLVM/NVVM JIT -> machine code (cached) -> execute!]
Supported Platforms and Hardware

OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA and HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4-3.7; NumPy 1.10 and later
Basic Example
[Code screenshot with callouts: the Numba decorator (nopython=True not required), array allocation, looping over ndarray x as an iterator, using numpy math functions, and returning a slice of the array. Result: a 2.7x speedup! A sketch with the same ingredients follows.]
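
The slide's code is a screenshot, so here is a minimal sketch with the same ingredients; the function body, names, and sizes are my assumptions, not the slide's exact code:

import numpy as np
from numba import jit

@jit(nopython=True)                 # the Numba decorator (nopython=True not required)
def smooth(x):
    out = np.empty_like(x)          # array allocation
    for i, v in enumerate(x):       # looping over ndarray x as an iterator
        out[i] = np.sqrt(v) + 1.0   # using numpy math functions
    return out[1:-1]                # returning a slice of the array

x = np.arange(1_000_000, dtype=np.float64)
y = smooth(x)   # first call compiles; subsequent calls run at machine speed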
Numba Features
• Detects the CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations of the same function (see the sketch below)
• Call out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• The compiler is extensible with new data types and functions
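
A small illustrative sketch of the type-specialization dispatch (the function and inputs are mine, not from the talk): each new argument-type combination triggers a fresh compilation, and the compiled signatures are visible on the function object.

from numba import jit

@jit(nopython=True)
def add(a, b):
    return a + b

add(1, 2)               # compiles an (int64, int64) specialization
add(1.0, 2.5)           # compiles a (float64, float64) specialization
print(add.signatures)   # lists the compiled type signatures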
SIMD: Single Instruction, Multiple Data
• Numba's CPU detection lets LLVM autovectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• This will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
Manual Multithreading: Release the GIL
[Chart: speedup ratio (0 to 3.5) vs. number of threads (1, 2, 4)]
Numba has an option to release the GIL, enabling multithreading with Python's concurrent.futures; a sketch follows.
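
A minimal sketch of the pattern (the worker function and sizes are assumptions): with nogil=True the compiled function releases the GIL, so threads can execute it in parallel.

import numpy as np
from concurrent.futures import ThreadPoolExecutor
from numba import jit

@jit(nopython=True, nogil=True)   # nogil=True releases the GIL inside the compiled code
def total(x):
    s = 0.0
    for v in x:
        s += v
    return s

chunks = np.array_split(np.ones(4_000_000), 4)    # split the work across 4 threads
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(total, chunks))
print(sum(partial_sums))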
Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented computing.
• A function with scalar inputs is broadcast across the elements of the input arrays:
  • np.add([1,2,3], 3) == [4, 5, 6]
  • np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
• Parallelism is present by construction. Numba will generate the loops and can automatically multi-thread if requested.
• Before Numba, creating fast ufuncs required writing C. No longer!
Universal Functions (Ufuncs)
[Code screenshot: a different decorator, @vectorize, gives a 1.8x speedup! A sketch follows.]
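
The slide's code is a screenshot; a minimal sketch of the decorator it highlights (the function body is an assumption): @vectorize turns a scalar function into a NumPy ufunc.

import numpy as np
from numba import vectorize

@vectorize                         # compiles a scalar function into a NumPy ufunc
def rel_diff(x, y):
    return 2.0 * (x - y) / (x + y)

a = np.arange(1.0, 1e6)
b = a + 1.0
print(rel_diff(a, b)[:3])          # broadcasts element-by-element, like np.add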
Multi-threaded Ufuncs
Specify a type signature and select the parallel target, and the compiled ufunc automatically uses all CPU cores! target="cuda" (and target="hsa") are also available for easily using the many cores of a GPU; a sketch follows.
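
A minimal sketch (the signature and body are assumptions, not the slide's code): give an explicit type signature and target='parallel', and Numba compiles a multi-threaded ufunc.

import numpy as np
from numba import vectorize

@vectorize(['float64(float64, float64)'], target='parallel')
def rel_diff_mt(x, y):
    return 2.0 * (x - y) / (x + y)

a = np.arange(1.0, 1e7)
b = a + 1.0
out = rel_diff_mt(a, b)   # automatically uses all CPU cores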
Other Numba topics
CUDA Python — write general NVIDIA GPU kernels with Python (sketch below)
Device Arrays — manage memory transfer from host to GPU
Streaming — manage asynchronous and parallel GPU compute streams
CUDA Simulator in Python — to help debug your kernels
HSA — support for AMD ROCm GPUs and APUs
Pyculib — access to cuFFT, cuBLAS, cuSPARSE, cuRAND, CUDA Sorting
https://github.com/ContinuumIO/gtc2017-numba
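
For flavor, a minimal CUDA Python kernel sketch (my own illustration, requiring an NVIDIA GPU and the CUDA toolkit; the kernel and sizes are assumptions):

import numpy as np
from numba import cuda

@cuda.jit
def scale(out, x, factor):
    i = cuda.grid(1)            # absolute index of this thread
    if i < x.size:              # guard against out-of-range threads
        out[i] = x[i] * factor

x = np.arange(1024, dtype=np.float32)
out = np.zeros_like(x)
threads = 128
blocks = (x.size + threads - 1) // threads
scale[blocks, threads](out, x, 2.0)   # launch configuration: [blocks, threads]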
Dask
• Designed to parallelize the Python ecosystem
• Handles complex algorithms
• Co-developed with Pandas/SKLearn/Jupyter teams
• Familiar APIs for Python users
• Scales
• Scales from multicore to 1000-node clusters
• Resilient, responsive, and real-time
• Parallelizes NumPy, Pandas, SKLearn
• Satisfies subset of these APIs
• Uses these libraries internally
• Co-developed with these teams
• Task scheduler supports custom algorithms
• Parallelize existing code
• Build novel real-time systems
• Arbitrary task graphs with data dependencies
• Same scalability
[Diagram: tools across the small-data to big-data spectrum, including Numba]
Dask: From User Interaction to Execution
[Diagram: user code builds a task graph via delayed, which a scheduler then executes; a sketch follows]
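
A minimal sketch of dask.delayed (the functions and values are illustrative): wrapped calls build a task graph lazily, and compute() executes it in parallel.

from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(a, b):
    return a + b

total = add(inc(1), inc(2))   # builds a task graph; nothing has run yet
print(total.compute())        # executes the graph in parallel -> 5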
>>> import pandas as pd
>>> df = pd.read_csv('iris.csv')
>>> df.head()
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
>>> max_sepal_length_setosa = df[df.species == 'Iris-setosa'].sepal_length.max()
>>> max_sepal_length_setosa
5.7999999999999998

>>> import dask.dataframe as dd
>>> ddf = dd.read_csv('*.csv')
>>> ddf.head()
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
…
>>> d_max_sepal_length_setosa = ddf[ddf.species == 'Iris-setosa'].sepal_length.max()
>>> d_max_sepal_length_setosa.compute()
5.7999999999999998

Dask DataFrame is like Pandas
Example 1: Using Dask DataFrames on a cluster with CSV data
• Built from Pandas DataFrames
• Matches the Pandas interface
• Access data from HDFS, S3, local disk, etc.
• Fast, low latency
• Responsive user interface (a sketch of the cluster pattern follows)
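
A minimal sketch of that pattern (the scheduler address and S3 path are placeholders, not values from the talk):

import dask.dataframe as dd
from dask.distributed import Client

client = Client('scheduler-address:8786')        # connect to the cluster's scheduler
ddf = dd.read_csv('s3://my-bucket/iris-*.csv')   # one logical DataFrame over many files
result = ddf.groupby('species').sepal_length.max()
print(result.compute())                          # runs across the cluster's workers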
>>> import numpy as np
>>> np_ones = np.ones((5000, 1000))
>>> np_ones
array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
       ...,
       [ 1.,  1.,  1., ...,  1.,  1.,  1.]])
>>> np_y = np.log(np_ones + 1)[:5].sum(axis=1)
>>> np_y
array([ 693.14718056,  693.14718056,  693.14718056,
        693.14718056,  693.14718056])

>>> import dask.array as da
>>> da_ones = da.ones((5000000, 1000000), chunks=(1000, 1000))   # lazy; nothing computed yet
>>> da_y = da.log(da_ones + 1)[:5].sum(axis=1)
>>> np_da_y = np.array(da_y)   # computes; fine when the result fits in memory
>>> np_da_y
array([ 693147.18055995,  693147.18055995,  693147.18055995,
        693147.18055995,  693147.18055995])

# If the result doesn't fit in memory, stream it to disk instead:
>>> da_y.to_hdf5('myfile.hdf5', 'result')

Dask Array is like NumPy
Example 3: Using Dask Arrays with global temperature data
• Built from NumPy n-dimensional arrays
• Matches the NumPy interface (a subset)
• Solves medium-to-large problems
• Complex algorithms
Dask Schedulers: Distributed Scheduler
• Scheduling arbitrary graphs is hard.
• Optimal graph scheduling is NP-hard.
• Scalable scheduling requires linear-time solutions.
• Fortunately, Dask does well with a lot of heuristics…
• …and a lot of monitoring and data about sizes and how long functions take.
Cluster Architecture Diagram
[Diagram: a client machine talks to the head node (scheduler), which dispatches work to multiple compute nodes (workers); a sketch follows]
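
A minimal sketch of this architecture in code (LocalCluster stands in for the head node plus compute nodes; on a real cluster you would pass the scheduler's address instead):

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4)   # scheduler ("head node") + 4 workers, locally
client = Client(cluster)              # the "client machine" side of the diagram

futures = client.map(lambda x: x ** 2, range(100))   # fan work out to the workers
print(sum(client.gather(futures)))                   # gather results back to the client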
Beautiful Diagnostic Dashboards
• Fast responsive dashboards
• Provide users performance insight
• Powered by Bokeh
Reasons not to use Dask: its limitations
• Dask is not a SQL database. It does Pandas well, but won't optimize complex queries.
• Dask is not MPI. It is very fast, but leaves some performance on the table: ~200us of task overhead and a couple of copies in the network stack.
• Dask is not a JVM technology. It's a Python library (although Julia bindings are available).
• Dask is not always necessary. You may not need parallelism.
Python has taken over!
Thanks to the 1000s of my "closest" friends who worked on all the libraries.
We won! (sort of)
Development began in 2003:
• Downloads: 49 Million
• Estimated Cost: $7.57 Million
• Contributors: 866
• Estimated Effort: 76 person-years
• Current Maintainers: 4

Development began in 2005:
• Downloads: 27.7 Million
• Estimated Cost: $7 Million
• Contributors: 1,666
• Estimated Effort: 70 person-years
• Current Maintainers: 3

Development began in 2008:
• Downloads: 13.8 Million
• Estimated Cost: $6.63 Million
• Contributors: 860
• Estimated Effort: 64 person-years
• Current Maintainers: 2

The original developers were not paid to work on or improve these libraries!
Improving with QLabs
What is next for me? What am I working on for the next few years…
High-Level APIs for Arrays (Tensors), DataFrames, and DataTypes (Quansight Labs)
The CPython C-API extensions are an anchor on Python runtime progress!
What will work!
• Create a statically typed subset of Python that is then used to extend Python — EPython
• Port NumPy, SciPy, and scikits to EPython (borrowing heavily from Cython ideas but using mypy-style typing instead of new syntax).
Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve the connection of NumPy to ML frameworks
• GPU support for the NumPy ecosystem
• Improve the foundations of array computing
• JupyterLab and JupyterHub
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
• PySparse - sparse n-d arrays
• Ibis - a Pandas-like front-end to SQL
• uarray - a unified array interface for the SciPy refactor
• xnd - a re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
Collaborating with NumFOCUS!
[PyData ecosystem diagram, including Bokeh; adapted from Jake Vanderplas's PyCon 2017 keynote]
Problem
Open Source Teams:
● Burned out
● Underrepresented
● Underpaid
Organizations:
● Disconnected from the community
● Lack support and maintenance
There's no easy way to connect the community with organizations.
Marketplace for Open Source Services
Partners:
● Provide open source services
● Training / support
● Feature development / fixes
● Hire open source devs
Clients:
● Pay for support
● Pay for training and mentoring
● Get the support they need to build effectively on open source
Open-source contributors create profiles for themselves, and manage their reputation, to get hired by or work with both!
FairOSS: A Public Benefit Company (its goal is growing the amount of freely available software)
• Owned by open-source contributors (a public fund-raise is planned for later this year)
• Those shareholders govern the organization (elect the board)
• The board appoints management and decides what is "fair"
Holds companies accountable:
• Allows usage of its trademarks only by companies that contribute back "fairly"
• Think "Kosher" or "Organic" labeling
• Companies give back via equity, revenue, and "in-kind" agreements with FairOSS
FairOSS is custodian of revenue and equity agreements:
• Equity agreements mean that FairOSS holds shares, options, or warrants of the company (most companies are missing the open-source community from their 'cap table')
• Revenue agreements mean that companies pay FairOSS a portion of their revenue
• FairOSS distributes almost all of the proceeds from these agreements to the open-source communities
If successful, this would make open source investable and make available >$45 trillion of investment capital to open-source communities.
You can really change the world…
With Open Source Communities…
Let’s do more of that!
Thank You!