Python for Speed, Scale, and Science
Travis E. Oliphant
@teoliphant
Quansight and OpenTeams
• MS/BS degrees in Electrical and Computer Engineering
• PhD in Biomedical Engineering (Ultrasound and MRI)
• Creator and Maintainer of SciPy (1998-2009)
• Professor of EE (2001-2007), Inverse Problems
• Creator and Developer of NumPy (2005-2013)
• Started Numba and Conda (2012)
• Founder of NumFOCUS / PyData
• Python Dev and Foundation Director
• CEO/Founder (2012) of Continuum Analytics / Anaconda, Inc.
• CEO/Founder (2018) of Quansight
• CEO/Founder (2019) of OpenTeams
Started my career in computational science
Satellites measure backscatter; computer algorithms produce estimates of Earth features:
• Wind Speed
• Ice Cover
• Vegetation
• (and more)
1996-2001: Analyze 12.0 (https://analyzedirect.com/)
Richard Robb (retired in 2015): bringing "SciFi" medicine to life since 1971
More science led to Python
Raja Muthupillai, Armando Manduca, Richard Ehman (1997)
As a professor: Scanning Impedance Imaging
My little “side projects” became my life
SciPy, NumPy, and PyData Time-Line
[Timeline graphic spanning 1991-2019, marking project milestones in 1998, 2001, 2003, 2005, 2006, 2008, 2009, 2010, 2012, 2014, 2015, 2016, and 2018]
Making Python for Science Popular
[Diagram: Continuum Analytics renamed to Anaconda, Inc. (~20 million (Ana)conda users); spun-out ventures followed]
But… Python is Slow…
But… Python Can’t Scale…
[Google Search Trends chart (May 2020) comparing Java, JavaScript, and Python]
Used by some of the Brightest Minds…
• LIGO: Gravitational Waves
• Higgs Boson Discovery
• Black Hole Imaging
Scientists have big data and compute needs
Data Size: 60 million GB
Compute Power: 2,000 TeraFLOPS (about 30,000 of my laptops)
How does Python scale to that?
GTC Europe: NVIDIA CEO Jensen Huang describes Python as the de facto data-science platform, with GPU support getting better over the coming months and years.
Reasons for Python’s success in Science
1) Python is expressive and easy to read.
2) Python (in particular CPython) is straightforward to extend, and with Cython it has become a glue language for many other runtimes.
When you use Python for speed, scale, and science you are almost always actually running machine instructions compiled from another language, "glued" together with high-level and expressive Python.
Pythonic code helps me think better. For a scientist, it gets out of your way.
3) An engaging Open Source Community
Not all open-source is the same!
Community-Driven Open Source Software (CDOSS)
• Anyone can become the leader.
• Multiple stakeholders.
• Can look at community size for health.
• Users become contributors more often.
• Examples: Python, Jupyter, NumPy, SciPy, Pandas

Company-Backed Open Source Software (CBOSS)
• Need to work at the company to be the leader.
• Many users, fewer developers.
• Need to understand the company's incentives to understand health.
• Examples: Swift, TensorFlow, PyTorch, Conda
Both can be valuable, but have different implications!
Governance models
[Screenshots of the governance documents of NumPy, TensorFlow, Scikit-Learn, PyTorch, and Pandas]
My First Big Project
Started as Multipack in 1998; became SciPy in 2001 with the help of other colleagues.
115 releases, 854 contributors
Used by: 187,386
SciPy
“Distribution of Python Numerical Tools masquerading as one Library”
cluster - KMeans and Vector Quantization
fftpack - Discrete Fourier Transform
integrate - Numerical Integration
interpolate - Interpolation routines
io - Data Input and Output
linalg - Fast Linear Algebra
misc - Utilities
ndimage - N-dimensional Image Processing
odr - Orthogonal Distance Regression
optimize - Constrained and Unconstrained Optimization
signal - Signal Processing Tools
sparse - Sparse Matrices and Algebra
spatial - Spatial Data Structures and Algorithms
special - Special Functions (e.g. Bessel)
stats - Statistical Functions and Distributions
Published: February 3, 2020. Project started: 1998.
Patience and Persistence and Grit
My Open Source addiction continued…
Gave up my chance at a tenured academic position in 2005-2006 to bring together the diverging array community in Python by merging Numeric and Numarray.
170 releases, 923 contributors
Used by: 378,828
Without NumPy

functions.py:

from math import sin, pi

def sinc(x):
    if x == 0:
        return 1.0
    else:
        pix = pi*x
        return sin(pix)/pix

def step(x):
    if x > 0:
        return 1.0
    elif x < 0:
        return 0.0
    else:
        return 0.5

>>> import functions as f
>>> xval = [x/3.0 for x in range(-10,10)]
>>> yval1 = [f.sinc(x) for x in xval]
>>> yval2 = [f.step(x) for x in xval]

Python is a great language but needed a way to operate quickly and cleanly over multi-dimensional arrays.
With NumPy

functions2.py:

from numpy import sin, pi, vectorize
import functions as f

# one-line approach: wrap the scalar functions
vsinc = vectorize(f.sinc)
vstep = vectorize(f.step)

# hand-vectorized versions
def sinc(x):
    pix = pi*x
    val = sin(pix)/pix   # x == 0 gives a harmless divide-by-zero warning...
    val[x==0] = 1.0      # ...fixed up here
    return val

def step(x):
    y = x*0.0
    y[x>0] = 1
    y[x==0] = 0.5
    return y

>>> import functions2 as f
>>> from numpy import *
>>> x = r_[-10:10]/3.0
>>> y1 = f.sinc(x)
>>> y2 = f.step(x)

Offers an N-D array, element-by-element functions, and basic random numbers, linear algebra, and FFT capability for Python.
http://numpy.org
Fiscally sponsored by NumFOCUS
NumPy: an Array Extension of Python
• Data: the array object
– slicing and shaping
– data-type map to bytes
• Fast Math (ufuncs):
– vectorization
– broadcasting
– aggregations
NumPy Array Key Attributes (illustrated below)
• dtype
• shape
• ndim
• strides
• data
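
A quick illustration of these attributes on a small array (my own example, not from the slide):

import numpy as np

a = np.zeros((3, 4), dtype=np.float64)
print(a.dtype)     # float64
print(a.shape)     # (3, 4)
print(a.ndim)      # 2
print(a.strides)   # (32, 8): bytes to step per dimension
print(a.data)      # the memory buffer backing the array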
NumPy Examples
[Screenshots of 2-D and 3-D array creation with aggregations such as axis sums and random-array statistics; see the sketch below]
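
Since the original examples are screenshots, here is a minimal stand-in sketch of creating 2-D and 3-D arrays and aggregating over them (the arrays and printed values are illustrative, not the slide's):

import numpy as np

a = np.arange(12).reshape(3, 4)       # 2-D array
b = np.arange(24).reshape(2, 3, 4)    # 3-D array
print(a.sum(axis=0))                  # column sums of the 2-D array
print(b.sum(axis=(1, 2)))             # per-block sums of the 3-D array

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(x.mean(), x.std())              # basic statistics over an array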
NumPy Slicing (Selection)
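The examples below assume a 6x6 array whose entry at row i, column j is 10*i + j (an assumption consistent with the outputs shown; the construction is mine):

>>> import numpy as np
>>> a = np.arange(6).reshape(-1, 1) * 10 + np.arange(6)   # a[i, j] == 10*i + j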
>>> a[0,3:5]
array([3, 4])
>>> a[4:,4:]
array([[44, 45],
[54, 55]])
>>> a[:,2]
array([2,12,22,32,42,52])
>>> a[2::2,::2]
array([[20, 22, 24],
[40, 42, 44]])
Summary
• Provides foundational N-dimensional array composed of
homogeneous elements of a particular “dtype”
• The dtype of the elements is extensive (but difficult to extend)
• Arrays can be sliced and diced with simple syntax to provide
easy manipulation and selection.
• Provides fast and powerful math, statistics, and linear algebra
functions that operate over arrays.
• Utilities for sorting and for reading and writing data are also provided.
[PyData ecosystem diagram, including Bokeh; adapted from Jake Vanderplas's PyCon 2017 keynote]
Scale Up vs Scale Out
• Scale Up (bigger nodes): a big-memory, many-core / GPU box. Tool: Numba
• Scale Out (more nodes): many commodity nodes in a cluster. Tool: Dask
• Best of both (e.g. a GPU cluster): Dask with Numba
A Compiler for Python? Combining Productivity and Performance
• Python is one of the most popular languages for data science
• Python integrates well with compiled, accelerated libraries (MKL, TensorFlow, etc.)
• But what about custom algorithms and data processing tasks?
• Our goal was to make a compiler that:
  • works within the standard Python interpreter rather than replacing it
  • integrates tightly with NumPy
  • is compatible with both multithreaded and distributed computing paradigms
Numba: A JIT Compiler for Python
• An open-source, function-at-a-time compiler library for Python
• A compiler toolbox for different targets and execution models:
  • single-threaded CPU, multi-threaded CPU, GPU
  • regular functions, "universal functions" (array functions), etc.
• Speedups from 2x (compared to basic NumPy code) to 200x (compared to pure Python)
• Combines the ease of writing Python with speeds approaching C/FORTRAN
• Empowers data scientists who make tools for themselves and other data scientists
7 things about Numba you may not know
1. Numba is 100% open source
2. Numba + Jupyter = rapid CUDA prototyping
3. Numba can compile for the CPU and the GPU at the same time
4. Numba makes array processing easy with @vectorize and @guvectorize
5. Numba comes with a CUDA simulator
6. You can send Numba functions over the network
7. Numba developers are contributing to NVIDIA's new rapids.ai work
Numba (compile Python to CPUs and GPUs)
conda install numba
[Architecture diagram: Numba is the parsing frontend, compiling Python into LLVM intermediate representation (IR); LLVM is the code-generation backend, emitting machine code for x86, ARM, and PTX]
How does Numba work?

@jit
def do_math(a, b):
    …

>>> do_math(x, y)

[Pipeline diagram: Python function (bytecode) -> bytecode analysis -> Numba IR, with type inference over the function arguments -> rewrite IR -> lowering -> LLVM IR -> LLVM/NVVM JIT -> machine code (cached) -> execute!]
Supported Platforms and Hardware

OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA and HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4-3.7; NumPy 1.10 and later
Basic Example
[Code screenshot with callouts: the Numba decorator (nopython=True not required), array allocation, looping over ndarray x as an iterator, using numpy math functions, and returning a slice of the array. Result: a 2.7x speedup! A sketch with the same ingredients follows.]
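
The slide's code is a screenshot, so here is a minimal sketch with the same ingredients; the function body, names, and sizes are my assumptions, not the slide's exact code:

import numpy as np
from numba import jit

@jit(nopython=True)                 # the Numba decorator (nopython=True not required)
def smooth(x):
    out = np.empty_like(x)          # array allocation
    for i, v in enumerate(x):       # looping over ndarray x as an iterator
        out[i] = np.sqrt(v) + 1.0   # using numpy math functions
    return out[1:-1]                # returning a slice of the array

x = np.arange(1_000_000, dtype=np.float64)
y = smooth(x)   # first call compiles; subsequent calls run at machine speed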
Numba Features
• Detects the CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations of the same function (see the sketch below)
• Call out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• The compiler is extensible with new data types and functions
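
A small illustrative sketch of the type-specialization dispatch (the function and inputs are mine, not from the talk): each new argument-type combination triggers a fresh compilation, and the compiled signatures are visible on the function object.

from numba import jit

@jit(nopython=True)
def add(a, b):
    return a + b

add(1, 2)               # compiles an (int64, int64) specialization
add(1.0, 2.5)           # compiles a (float64, float64) specialization
print(add.signatures)   # lists the compiled type signatures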
SIMD: Single Instruction, Multiple Data
• Numba's CPU detection lets LLVM autovectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• This will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
Manual Multithreading: Release the GIL
[Chart: speedup ratio (0 to 3.5) vs. number of threads (1, 2, 4)]
Numba has an option to release the GIL, enabling multithreading with Python's concurrent.futures; a sketch follows.
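
A minimal sketch of the pattern (the worker function and sizes are assumptions): with nogil=True the compiled function releases the GIL, so threads can execute it in parallel.

import numpy as np
from concurrent.futures import ThreadPoolExecutor
from numba import jit

@jit(nopython=True, nogil=True)   # nogil=True releases the GIL inside the compiled code
def total(x):
    s = 0.0
    for v in x:
        s += v
    return s

chunks = np.array_split(np.ones(4_000_000), 4)    # split the work across 4 threads
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(total, chunks))
print(sum(partial_sums))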
Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented computing.
• A function with scalar inputs is broadcast across the elements of the input arrays:
  • np.add([1,2,3], 3) == [4, 5, 6]
  • np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
• Parallelism is present by construction. Numba will generate the loops and can automatically multi-thread if requested.
• Before Numba, creating fast ufuncs required writing C. No longer!
Universal Functions (Ufuncs)
[Code screenshot: a different decorator, @vectorize, gives a 1.8x speedup! A sketch follows.]
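
The slide's code is a screenshot; a minimal sketch of the decorator it highlights (the function body is an assumption): @vectorize turns a scalar function into a NumPy ufunc.

import numpy as np
from numba import vectorize

@vectorize                         # compiles a scalar function into a NumPy ufunc
def rel_diff(x, y):
    return 2.0 * (x - y) / (x + y)

a = np.arange(1.0, 1e6)
b = a + 1.0
print(rel_diff(a, b)[:3])          # broadcasts element-by-element, like np.add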
Multi-threaded Ufuncs
Specify a type signature and select the parallel target, and the compiled ufunc automatically uses all CPU cores! target="cuda" (and target="hsa") are also available for easily using the many cores of a GPU; a sketch follows.
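
A minimal sketch (the signature and body are assumptions, not the slide's code): give an explicit type signature and target='parallel', and Numba compiles a multi-threaded ufunc.

import numpy as np
from numba import vectorize

@vectorize(['float64(float64, float64)'], target='parallel')
def rel_diff_mt(x, y):
    return 2.0 * (x - y) / (x + y)

a = np.arange(1.0, 1e7)
b = a + 1.0
out = rel_diff_mt(a, b)   # automatically uses all CPU cores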
Other Numba topics
CUDA Python — write general NVIDIA GPU kernels with Python (sketch below)
Device Arrays — manage memory transfer from host to GPU
Streaming — manage asynchronous and parallel GPU compute streams
CUDA Simulator in Python — to help debug your kernels
HSA — support for AMD ROCm GPUs and APUs
Pyculib — access to cuFFT, cuBLAS, cuSPARSE, cuRAND, CUDA Sorting
https://github.com/ContinuumIO/gtc2017-numba
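
For flavor, a minimal CUDA Python kernel sketch (my own illustration, requiring an NVIDIA GPU and the CUDA toolkit; the kernel and sizes are assumptions):

import numpy as np
from numba import cuda

@cuda.jit
def scale(out, x, factor):
    i = cuda.grid(1)            # absolute index of this thread
    if i < x.size:              # guard against out-of-range threads
        out[i] = x[i] * factor

x = np.arange(1024, dtype=np.float32)
out = np.zeros_like(x)
threads = 128
blocks = (x.size + threads - 1) // threads
scale[blocks, threads](out, x, 2.0)   # launch configuration: [blocks, threads]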
Dask
• Designed to parallelize the Python ecosystem
• Handles complex algorithms
• Co-developed with Pandas/SKLearn/Jupyter teams
• Familiar APIs for Python users
• Scales
• Scales from multicore to 1000-node clusters
• Resilient, responsive, and real-time
• Parallelizes NumPy, Pandas, SKLearn
• Satisfies subset of these APIs
• Uses these libraries internally
• Co-developed with these teams
• Task scheduler supports custom algorithms
• Parallelize existing code
• Build novel real-time systems
• Arbitrary task graphs with data dependencies
• Same scalability
[Diagram: tools across the small-data to big-data spectrum, including Numba]
Dask: From User Interaction to Execution
[Diagram: user code builds a task graph via delayed, which a scheduler then executes; a sketch follows]
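
A minimal sketch of dask.delayed (the functions and values are illustrative): wrapped calls build a task graph lazily, and compute() executes it in parallel.

from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(a, b):
    return a + b

total = add(inc(1), inc(2))   # builds a task graph; nothing has run yet
print(total.compute())        # executes the graph in parallel -> 5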
>>> import pandas as pd
>>> df = pd.read_csv('iris.csv')
>>> df.head()
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
>>> max_sepal_length_setosa = df[df.species == 'Iris-setosa'].sepal_length.max()
>>> max_sepal_length_setosa
5.7999999999999998

>>> import dask.dataframe as dd
>>> ddf = dd.read_csv('*.csv')
>>> ddf.head()
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
…
>>> d_max_sepal_length_setosa = ddf[ddf.species == 'Iris-setosa'].sepal_length.max()
>>> d_max_sepal_length_setosa.compute()
5.7999999999999998

Dask DataFrame is like Pandas
Example 1: Using Dask DataFrames on a cluster with CSV data
• Built from Pandas DataFrames
• Matches the Pandas interface
• Access data from HDFS, S3, local disk, etc.
• Fast, low latency
• Responsive user interface (a sketch of the cluster pattern follows)
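
A minimal sketch of that pattern (the scheduler address and S3 path are placeholders, not values from the talk):

import dask.dataframe as dd
from dask.distributed import Client

client = Client('scheduler-address:8786')        # connect to the cluster's scheduler
ddf = dd.read_csv('s3://my-bucket/iris-*.csv')   # one logical DataFrame over many files
result = ddf.groupby('species').sepal_length.max()
print(result.compute())                          # runs across the cluster's workers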
>>> import numpy as np
>>> np_ones = np.ones((5000, 1000))
>>> np_ones
array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
       ...,
       [ 1.,  1.,  1., ...,  1.,  1.,  1.]])
>>> np_y = np.log(np_ones + 1)[:5].sum(axis=1)
>>> np_y
array([ 693.14718056,  693.14718056,  693.14718056,
        693.14718056,  693.14718056])

>>> import dask.array as da
>>> da_ones = da.ones((5000000, 1000000), chunks=(1000, 1000))   # lazy; nothing computed yet
>>> da_y = da.log(da_ones + 1)[:5].sum(axis=1)
>>> np_da_y = np.array(da_y)   # computes; fine when the result fits in memory
>>> np_da_y
array([ 693147.18055995,  693147.18055995,  693147.18055995,
        693147.18055995,  693147.18055995])

# If the result doesn't fit in memory, stream it to disk instead:
>>> da_y.to_hdf5('myfile.hdf5', 'result')

Dask Array is like NumPy
Example 3: Using Dask Arrays with global temperature data
• Built from NumPy n-dimensional arrays
• Matches the NumPy interface (a subset)
• Solves medium-to-large problems
• Complex algorithms
Dask Schedulers: Distributed Scheduler
• Scheduling arbitrary graphs is hard.
• Optimal graph scheduling is NP-hard.
• Scalable scheduling requires linear-time solutions.
• Fortunately, Dask does well with a lot of heuristics…
• …and a lot of monitoring and data about sizes and how long functions take.
Cluster Architecture Diagram
[Diagram: a client machine talks to the head node (scheduler), which dispatches work to multiple compute nodes (workers); a sketch follows]
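
A minimal sketch of this architecture in code (LocalCluster stands in for the head node plus compute nodes; on a real cluster you would pass the scheduler's address instead):

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4)   # scheduler ("head node") + 4 workers, locally
client = Client(cluster)              # the "client machine" side of the diagram

futures = client.map(lambda x: x ** 2, range(100))   # fan work out to the workers
print(sum(client.gather(futures)))                   # gather results back to the client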
Beautiful Diagnostic Dashboards
• Fast responsive dashboards
• Provide users performance insight
• Powered by Bokeh
Reasons not to use Dask: its limitations
• Dask is not a SQL database. It does Pandas well, but won't optimize complex queries.
• Dask is not MPI. It is very fast, but leaves some performance on the table: ~200us of task overhead and a couple of copies in the network stack.
• Dask is not a JVM technology. It's a Python library (although Julia bindings are available).
• Dask is not always necessary. You may not need parallelism.
Python has taken over!
Thanks to the 1000s of my "closest" friends who worked on all the libraries.
We won! (sort of)
Development began in 2003:
• Downloads: 49 Million
• Estimated Cost: $7.57 Million
• Contributors: 866
• Estimated Effort: 76 person-years
• Current Maintainers: 4

Development began in 2005:
• Downloads: 27.7 Million
• Estimated Cost: $7 Million
• Contributors: 1,666
• Estimated Effort: 70 person-years
• Current Maintainers: 3

Development began in 2008:
• Downloads: 13.8 Million
• Estimated Cost: $6.63 Million
• Contributors: 860
• Estimated Effort: 64 person-years
• Current Maintainers: 2

The original developers were not paid to work on or improve these libraries!
Improving with QLabs
What is next for me? What am I working on for the next few years…
High-Level APIs for Arrays (Tensors), DataFrames, and DataTypes (Quansight Labs)
The CPython C-API extensions are an anchor on Python runtime progress!
What will work!
• Create a statically typed subset of Python that is then used to extend Python — EPython
• Port NumPy, SciPy, and scikits to EPython (borrowing heavily from Cython ideas but using mypy-style typing instead of new syntax).
Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve the connection of NumPy to ML frameworks
• GPU support for the NumPy ecosystem
• Improve the foundations of array computing
• JupyterLab and JupyterHub
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
• PySparse - sparse n-d arrays
• Ibis - a Pandas-like front-end to SQL
• uarray - a unified array interface for the SciPy refactor
• xnd - a re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
Collaborating with NumFOCUS!
[PyData ecosystem diagram, including Bokeh; adapted from Jake Vanderplas's PyCon 2017 keynote]
Problem
Open Source Teams:
● Burned out
● Underrepresented
● Underpaid
Organizations:
● Disconnected from the community
● Lack support and maintenance
There's no easy way to connect the community with organizations.
Marketplace for Open Source Services
Partners:
● Provide open source services
● Training / support
● Feature development / fixes
● Hire open source devs
Clients:
● Pay for support
● Pay for training and mentoring
● Get the support they need to build effectively on open source
Open-source contributors create profiles for themselves, and manage their reputation, to get hired by or work with both!
FairOSS: A Public Benefit Company (its goal is growing the amount of freely available software)
• Owned by open-source contributors (a public fund-raise is planned for later this year)
• Those shareholders govern the organization (elect the board)
• The board appoints management and decides what is "fair"
Holds companies accountable:
• Allows usage of its trademarks only by companies that contribute back "fairly"
• Think "Kosher" or "Organic" labeling
• Companies give back via equity, revenue, and "in-kind" agreements with FairOSS
FairOSS is custodian of revenue and equity agreements:
• Equity agreements mean that FairOSS holds shares, options, or warrants of the company (most companies are missing the open-source community from their 'cap table')
• Revenue agreements mean that companies pay FairOSS a portion of their revenue
• FairOSS distributes almost all of the proceeds from these agreements to the open-source communities
If successful, this would make open source investable and make available >$45 trillion of investment capital to open-source communities.
You can really change the world…
With Open Source Communities…
Let’s do more of that!
Thank You!