© 2017 Continuum Analytics / © 2018 Quansight - Confidential & Proprietary
Extending Python: Past, Present, and Future
Quansight
travis@quansight.com
@quansightai
@teoliphant
PyCon EE
September 2019
Python and in particular PyData keeps Growing
Google Search Trends (1998–2018): Python now most popular
A brief history of [my] time [with Python]
(timeline of milestones, 1991–2016)
Where I started
Started as my graduate student
“procrastination project” (as Multipack)
in 1998 and became SciPy in 2001 with
the help of colleagues.
108 releases, 766 contributors
Used by: 128,495
Pearu Peterson of Estonia was critical to both SciPy and NumPy.
Where it led for me
Gave up my chance at a tenured academic
position in 2005-2006 to bring together the
diverging array community in Python and unify
Numeric and Numarray.
159 releases, 827 contributors
Used by: 254,856
What amplified data science
Created by Wes McKinney at AQR, which agreed to release the data-frame library he started there (while dozens of other data-frames at hedge funds and investment banks were never open-sourced).
106 releases, 1601 contributors
Used by: 139,133
Why Python for ML?
Created by David Cournapeau as a Google Summer of Code project, then quickly extended by hundreds of researchers around the world. Supported by INRIA.
100 releases, 1433 contributors
Used by: 70,287
First DL Framework in Python
Built at Université de Montréal by Frédéric
Bastien and his students. Many contributors.
Forms foundation for PyMC3 and other libraries.
33 releases, 332 contributors
Used by: 6,194
Python’s Scientific Ecosystem (Bokeh and many other packages)
Adapted from Jake Vanderplas, PyCon 2017 Keynote
Keys to Python Success
Modular Extensibility
New Types and Functions
Protocol Overloading (i.e. “dunder” methods)
Interoperability
Modular Extensibility
Modules Packages
>>> import numpy
>>> numpy.__file__
{path-prefix}numpy/__init__.py
>>> numpy.__path__
{path-prefix}numpy
>>> numpy.linalg.__file__
{path-prefix}numpy/linalg/__init__.py
>>> import math
>>> math.__file__
{path}math{platform}.so
>>> import os
>>> os.__file__
{path}os.py
.pyd or .so
# my_module.py
a = 3
b = 4
def cross(x, y):
    return a*x + b*y
>>> import my_module
>>> my_module.__file__
{path}my_module.py
>>> ks = my_module.__dict__.keys()
>>> [y for y in ks
if not y.startswith('__')]
['a', 'b', 'cross']
subpackages = []
for name in dir(numpy):
    obj = getattr(numpy, name)
    if hasattr(obj, '__file__') and \
       obj.__file__.endswith('__init__.py'):
        subpackages.append(obj.__name__)
>>> print(subpackages)
['numpy.compat', 'numpy.core', 'numpy.fft',
 'numpy.lib', 'numpy.linalg', 'numpy.ma',
 'numpy.matrixlib', 'numpy.polynomial',
 'numpy.random', 'numpy.testing']
New types New functions
class Node:
def __init__(self, item, parent=None):
self.item = item
self.children = []
if parent is not None:
parent.children.append(self)
from math import sqrt
def kurtosis(data):
N = len(data)
mean = sum(data)/N
std = sqrt(sum((x-mean)**2 for x in data)/N)
zi = ((x-mean)/std for x in data)
return sum(z**4 for z in zi)/N - 3
>>> g = Node("Root")
>>> type(g)
__main__.Node
>>> type(g).__mro__
(__main__.Node, object)
>>> type(Node).__mro__
(type, object)
>>> type(3)
int
>>> type(3).__mro__
(int, object)
>>> type(int).__mro__
(type, object)
>>> type(kurtosis)
function
>>> type(sqrt)
builtin_function_or_method
>>> type(sum)
builtin_function_or_method
>>> import numpy; type(numpy.add)
numpy.ufunc
New Types and Functions
Protocol Overloading
Number
Sequence/Mapping
Object
__str__, __new__, __doc__, __del__,
__init__, __repr__, __setattr__
__getattribute__, __delattr__,
__hash__, __reduce__, __class__,
__dir__, __format__, __reduce_ex__,
__call__, __enter__, __exit__,
__next__, __dict__, __slots__
__getitem__, __setitem__,
__delitem__, __contains__,
__iter__, __reversed__,
__len__
__abs__, __add__, __and__, __bool__,
__ceil__, __divmod__, __eq__, __float__,
__floor__, __floordiv__, __ge__, __gt__,
__index__, __int__, __invert__, __le__,
__lshift__, __lt__, __mod__, __mul__,
__ne__, __neg__, __or__, __pos__, __pow__,
__radd__, __rand__, __rdivmod__,
__rfloordiv__, __rlshift__, __rmod__,
__rmul__, __ror__, __round__, __rpow__,
__rrshift__, __rshift__, __rsub__,
__rtruediv__, __rxor__, __truediv__,
__trunc__, __xor__
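These protocols are what let user-defined types plug into Python's syntax. A minimal sketch (a hypothetical Vec class, not from the slides) showing a few of the dunder methods above in action:

```python
class Vec:
    """A tiny 2-D vector illustrating protocol overloading."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):            # number protocol: v + w
        return Vec(self.x + other.x, self.y + other.y)
    def __mul__(self, scalar):           # v * 2
        return Vec(self.x * scalar, self.y * scalar)
    def __rmul__(self, scalar):          # 2 * v (reflected operand)
        return self * scalar
    def __getitem__(self, i):            # sequence protocol: v[0]
        return (self.x, self.y)[i]
    def __len__(self):                   # len(v)
        return 2
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __repr__(self):                  # object protocol: repr(v)
        return f"Vec({self.x}, {self.y})"

v = Vec(1, 2) + Vec(3, 4)    # calls __add__ -> Vec(4, 6)
w = 2 * v                    # int gives up, Python calls __rmul__
```

This is exactly the mechanism NumPy uses: ndarray defines the number and sequence protocols, so `a + b` and `a[1:]` dispatch to compiled code.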
Interoperability
C/C++ — Cython, Numba, CFFI, ctypes, boost.python, pybind11
Fortran — f2py
Java — Py4J, JPype, javabridge
C#/.NET — Python for .NET (pythonnet)
An Opinionated List (there are others)
Rust — PyO3, Rust-CPython
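As a small taste of the C/C++ row, here is a hedged sketch using the stdlib ctypes module to call a C function directly. The library-lookup fallback is an assumption about the platform, not something from the slides:

```python
import ctypes
import ctypes.util

# Locate the C math library; find_library("m") works on Linux/macOS.
# Where it fails, CDLL(None) exposes symbols already linked into the
# interpreter process (an assumption that holds on most Unixes).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

libm.sqrt.argtypes = [ctypes.c_double]   # declare the C signature
libm.sqrt.restype = ctypes.c_double      # so ctypes converts correctly

root = libm.sqrt(2.0)
```

Without the argtypes/restype declarations, ctypes would default to int conversions and silently return garbage, which is the kind of care the C-level slides warn about.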
Extending Python in the Past
First problem: Efficient Data Input
The first step is to get the data right
“It’s Always About the Data”
http://www.python.org/doc/essays/refcnt/
Reference Counting Essay
May 1998
Guido van Rossum
TableIO
April 1998
Michael A. Miller
NumPyIO
June 1998
A walk through bitarray
Ilan Schnell
(who built all the first versions of Anaconda)
bitarray: efficient arrays of booleans
https://github.com/ilanschnell/bitarray
Note: docstrings removed!
>>> import bitarray
>>> bitarray.bitarray.__mro__
(bitarray.bitarray, bitarray._bitarray, object)
>>> type(bitarray.bits2bytes)
builtin_function_or_method
>>> bitarray._sysinfo()
(8, 8, 8, 8, 9223372036854775807)
>>> 2**63 - 1
9223372036854775807
Callouts from the C source screenshots (code not reproduced):
• Function table for the module (Python 2 version also shown)
• Adds the new _bitarray “built-in” type
• METH_KEYWORDS
• Python functions in C: a single-argument function because of METH_O
• The _bitarray type definition
• Must call PyArg_ParseTuple to unpack arguments; | METH_KEYWORDS to accept a third, dictionary argument to the function
• Expands to PyVarObject ob_base;
Powerful but requires care!
• Reference counting (you have to do this manually)
• Error handling (can be tedious)
• Initialization (can bite you badly if you aren’t careful)
• Other run-times (PyPy, RustPython) can’t easily use your tool.
• You have access to all the machinery Python itself uses to create all of its own builtins.
• You are literally extending Python with new builtin types and functions.
• Incredible speed: as fast as the machine can work.
Extending Python Today
What should you do today?
• Just write your code in Python and use existing extensions.
• If more speed is needed (my opinionated modern view):
  • Use Numba
  • Use Cython
  • Use mypy (and eventually mypyc)
• Or, if few existing extensions are being used:
  • Run with PyPy
  • Use Rust and PyO3
• An open-source, function-at-a-time compiler library for Python
• Compiler toolbox for different targets and execution models:
• single-threaded CPU, multi-threaded CPU, GPU
• regular functions, “universal functions” (array functions), etc
• Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure
Python)
• Combine ease of writing Python with speeds approaching FORTRAN
• Empowers scientists who make tools for themselves and other scientists
Numba: A JIT Compiler for Python
7 things about Numba you may not know
1
2
3
4
5
6
7
Numba is 100% Open Source
Numba + Jupyter = Rapid
CUDA Prototyping
Numba can compile for the
CPU and the GPU at the same time
Numba makes array processing
easy with @(gu)vectorize
Numba comes with a
CUDA Simulator
You can send Numba
functions over the network
Numba has typed Lists and
Dictionaries (soon)
Numba (compile Python to CPUs and GPUs)
conda install numba
(Architecture diagram: Python source → Numba parsing frontend → intermediate representation (IR) → LLVM code-generation backend → machine code for x86, ARM, or PTX)
How does Numba work?
(Compilation pipeline: Python function (bytecode) → bytecode analysis → Numba IR; the argument types drive type inference → rewrite IR → lowering → LLVM IR → LLVM/NVVM JIT → machine code → execute! Compiled specializations are cached.)
@jit
def do_math(a, b):
…
>>> do_math(x, y)
Supported Platforms and Hardware
OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA & HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4–3.7; NumPy 1.10 and later
Basic Example
Callouts from the code screenshot (code not reproduced):
• Numba decorator (nopython=True not required)
• Array allocation
• Looping over ndarray x as an iterator
• Using NumPy math functions
• Returning a slice of the array
2.7x speedup!
Numba Features
• Detects CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations for the same function
• Call out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• Compiler is extensible with new data types and functions
Parallel Computing
• Three main technologies for parallelism: SIMD, multi-threading, and distributed computing
SIMD: Single Instruction Multiple Data
• Numba's CPU detection will enable LLVM to autovectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• Will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
Manual Multithreading: Release the GIL
(Chart: speedup ratio vs. number of threads (1, 2, 4), reaching roughly 3.5x)
Option to release the GIL
Using Python
concurrent.futures
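A pure-Python sketch of the concurrent.futures pattern the slide refers to. The chunked_sum helper is hypothetical; note that a real multi-core speedup requires kernels that release the GIL, e.g. functions compiled with Numba's @jit(nogil=True):

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, n_threads=4):
    # Split the work into chunks and run one chunk per thread.
    # With a GIL-releasing compiled kernel in place of sum(),
    # the chunks execute truly in parallel on separate cores.
    step = (len(data) + n_threads - 1) // n_threads
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum, chunks))

total = chunked_sum(list(range(1_000_000)))
```

With pure-Python sum() the GIL serializes the threads; swapping in a nogil-compiled kernel is what produces the speedup curve on the slide.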
Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented
computing.
◦ A function with scalar inputs is broadcast across the elements of
the input arrays:
• np.add([1,2,3], 3) == [4, 5, 6]
• np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
◦ Parallelism is present, by construction. Numba will generate
loops and can automatically multi-thread if requested.
◦ Before Numba, creating fast ufuncs required writing C. No
longer!
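The two np.add calls above can be run directly; broadcasting also generalizes across dimensions (the 2-D example is an illustrative addition, not from the slide):

```python
import numpy as np

a = np.add([1, 2, 3], 3)               # scalar broadcast across the array
b = np.add([1, 2, 3], [10, 20, 30])    # elementwise over equal shapes

# Broadcasting extends across dimensions too: a (3, 1) column
# combined with a length-3 row produces a full (3, 3) grid.
col = np.array([[0], [10], [20]])      # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)
grid = col + row                       # shape (3, 3)
```

Numba's @vectorize builds new ufuncs with these same broadcasting semantics from a plain scalar Python function.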
Universal Functions (Ufuncs)
Different decorator!
1.8x speedup!
Multi-threaded Ufuncs
Specify type signature
Select parallel target
Automatically uses all CPU cores!
ParallelAccelerator
• ParallelAccelerator is a special compiler pass contributed by Intel Labs
  • Todd A. Anderson, Ehsan Totoni, Paul Liu
  • Based on a similar contribution to Julia
• Automatically generates multithreaded code in a Numba-compiled function:
  • Array expressions and reductions
  • Random functions
  • Dot products
  • Explicit loops indicated with a prange() call
ParallelAccelerator: Example #1
(Chart: runtime in ms on a 1,000,000×10 input, Core i7 quad-core CPU; Numba is 1.8x and Numba+PA 3.6x faster than NumPy)
ParallelAccelerator: prange()
(Chart: runtime in ms on a 1,000,000×10 input, Core i7 quad-core CPU; Numba is 4.3x and Numba+PA 50x faster than NumPy)
Cython
Cython is Python with C data types
https://www.linkedin.com/in/kwmsmith/
https://github.com/kwmsmith
https://github.com/kwmsmith/scipy-2017-cython-tutorial
Basic use
Create a text file with a .pyx extension along with a setup.py
setup.py
helloworld.pyx
Hint: can use %%cython magic in notebooks
After %load_ext Cython
Borrowed from Cython documentation
cython.readthedocs.io
primes.pyx
Type definitions
Convert to Python list
Some of the C++ stdlib is available
Auto-conversion on return
Creates an extension type (like _bitarray)
Write fast functions that work on anything supporting the PEP 3118 buffer protocol.
entropy.pyx
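The entropy.pyx source is not reproduced here, but the buffer-protocol idea can be sketched in pure Python with memoryview, which accepts any PEP 3118 exporter (bytes, bytearray, array.array, NumPy arrays, mmap). The shannon_entropy function below is a hypothetical illustration, not the slide's Cython code:

```python
from math import log2

def shannon_entropy(buf):
    """Shannon entropy in bits/byte of any buffer-protocol object."""
    mv = memoryview(buf).cast("B")   # view the raw bytes uniformly
    n = len(mv)
    counts = [0] * 256
    for byte in mv:                  # iterate over unsigned bytes
        counts[byte] += 1
    return -sum(c / n * log2(c / n) for c in counts if c)

e_uniform = shannon_entropy(bytes(range(256)))   # every byte once
e_constant = shannon_entropy(b"\x00" * 1024)     # a single symbol
```

A Cython version would type the memoryview (e.g. `unsigned char[:]`) to make the counting loop run at C speed while keeping exactly this duck-typed interface.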
mypyc
mypyc is a compiler that compiles mypy-annotated, statically typed
Python modules into CPython C extensions.
https://github.com/python/mypy/tree/master/mypyc
• Most type annotations are enforced at runtime (raising TypeError on mismatch)
• Classes are compiled into extension classes without __dict__ (much, but not quite, like if they used __slots__)
• Monkey patching doesn't work
• Instance attributes won't fall back to class attributes if undefined
• Metaclasses not supported
• Also there are still a bunch of bad bugs and unsupported features :)
Still Experimental!
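A minimal sketch of the kind of module mypyc targets: ordinary, statically typed Python that runs unchanged under CPython, which mypyc can then compile to a C extension. The running_mean function is hypothetical, not from the mypyc docs:

```python
from typing import List

def running_mean(values: List[float], window: int) -> List[float]:
    # Plain annotated Python: mypy checks the types statically,
    # and mypyc uses the same annotations to emit typed C code.
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out

means = running_mean([1.0, 2.0, 3.0, 4.0], 2)
```

Because annotations become runtime-enforced types after compilation, passing e.g. a list of strings here would raise TypeError in the compiled module rather than failing somewhere downstream.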
From http://mypy-lang.org/examples.html
Extending Python in the
Future
PyPy
IronPython
Jython
MicroPython
RustPython
Batavia
Android?
iOS?
Most of these are CPython only!
These extensions are an anchor to
Python runtime progress!
CPython C-API
What do we need?
• A way to extend Python that targets multiple run-times by default (at least PyPy, CPython, RustPython), with the ability to add new run-times
• Use a subset of typed Python to do it, i.e. a domain-specific extension language in Python itself
• Need NumPy, Pandas, SciPy, Scikit-Learn, and more to use this approach (this will take time)
Early Hope
https://github.com/pyhandle/hpy
A Bold Proposal
• Create a Cython-like tool that uses mypy typing
• Borrow heavily from Cython ideas but start a new
project that could be pulled into Python itself.
• At the same time work from below to continue the
clean-up of CPython C-API that has already started.
Need a ~$5 million commitment for a 3-year project to start this
• Core team of 5+ devs with 1 lead
• 1/2 time project manager and PSF representative
• 3+ community liaisons and developer evangelists
• Start with $500k Phase 0 to prove the idea
• Get total funding from at least 20 companies:
$25k initial buy-in, at least $250k
commitment over 3 years to start the effort.
• Allow up to $100k initial and a $1 million commitment.
• Paying participants get project-management
attention and early easy-to-use runtimes and
binary extensions delivered with ability to set
priorities (plus marketing and the knowledge
they are leading Python forward).
How?
LABS
Cooperative
Community
Work Order
• We have the people in our network of
collaborators.
• We have a sales and marketing team
that will pitch this.
• We are just rolling out the proposal.
Interested? travis@quansight.com
We can do this!
A new platform to help open-source projects and
developers thrive professionally and financially.
Sign up to:
• build your open-source portfolio
• show which projects you use
• thank contributors for projects you love
• (soon) get connected to initiatives like the one to make Python universally extensible.
https://openteams.com
LABS
Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance
• Maintenance and support with PyData core team
• Improve connection of NumPy to ML Frameworks
• GPU Support for NumPy Ecosystem
• Improve foundations of Array computing
• JupyterLab
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
uarray — unified array interface and symbolic NumPy
xnd — re-factored NumPy (low-level cross-language
libraries for N-D (tensor) computing)
Partnered with NumFOCUS
and Ursa Labs (supporting
Apache Arrow)

PyCon Estonia 2019
