© 2017 Continuum Analytics / © 2018 Quansight - Confidential & Proprietary
Extending Python: Past, Present, and Future
Quansight
travis@quansight.com
@quansightai
@teoliphant
PyCon EE
September 2019
Python and in particular PyData keeps Growing
Google Search Trends (1998–2018): Python now most popular
A brief history of [my] time [with Python]
(timeline of milestones, 1991–2016)
Where I started
Started as my graduate student
“procrastination project” (as Multipack)
in 1998 and became SciPy in 2001 with
the help of colleagues.
108 releases, 766 contributors
Used by: 128,495
Pearu Peterson of Estonia was critical to both SciPy and NumPy.
Where it led for me
Gave up my chance at a tenured academic
position in 2005-2006 to bring together the
diverging array community in Python and unify
Numeric and Numarray.
159 releases, 827 contributors
Used by: 254,856
What amplified data science
Created by Wes McKinney at AQR, which agreed to release the data-frame library he started there (while dozens of other data-frames at hedge funds and investment banks were never open-sourced).
106 releases, 1601 contributors
Used by: 139,133
Why Python for ML?
Created by David Cournapeau as a Google Summer of Code project, then quickly extended by hundreds of researchers around the world. Supported by INRIA.
100 releases, 1433 contributors
Used by: 70,287
First DL Framework in Python
Built at Université de Montréal by Frédéric
Bastien and his students. Many contributors.
Forms foundation for PyMC3 and other libraries.
33 releases, 332 contributors
Used by: 6,194
Python’s Scientific Ecosystem (Bokeh and many other packages)
Adapted from Jake Vanderplas, PyCon 2017 Keynote
Keys to Python Success
Modular Extensibility
New Types and Functions
Protocol Overloading (i.e. “dunder” methods)
Interoperability
Modular Extensibility
Modules Packages
>>> import numpy
>>> numpy.__file__
{path-prefix}numpy/__init__.py
>>> numpy.__path__
{path-prefix}numpy
>>> numpy.linalg.__file__
{path-prefix}numpy/linalg/__init__.py
>>> import math
>>> math.__file__
{path}math{platform}.so
>>> import os
>>> os.__file__
{path}os.py
.pyd or .so
# my_module.py
a = 3
b = 4
def cross(x, y):
    return a*x + b*y
>>> import my_module
>>> my_module.__file__
{path}my_module.py
>>> ks = my_module.__dict__.keys()
>>> [y for y in ks
if not y.startswith('__')]
['a', 'b', 'cross']
subpackages = []
for name in dir(numpy):
    obj = getattr(numpy, name)
    if hasattr(obj, '__file__') and \
       obj.__file__.endswith('__init__.py'):
        subpackages.append(obj.__name__)
>>> print(subpackages)
['numpy.compat', 'numpy.core', 'numpy.fft',
 'numpy.lib', 'numpy.linalg', 'numpy.ma',
 'numpy.matrixlib', 'numpy.polynomial',
 'numpy.random', 'numpy.testing']
New types New functions
class Node:
def __init__(self, item, parent=None):
self.item = item
self.children = []
if parent is not None:
parent.children.append(self)
from math import sqrt
def kurtosis(data):
N = len(data)
mean = sum(data)/N
std = sqrt(sum((x-mean)**2 for x in data)/N)
zi = ((x-mean)/std for x in data)
return sum(z**4 for z in zi)/N - 3
>>> g = Node("Root")
>>> type(g)
__main__.Node
>>> type(g).__mro__
(__main__.Node, object)
>>> type(Node).__mro__
(type, object)
>>> type(3)
int
>>> type(3).__mro__
(int, object)
>>> type(int).__mro__
(type, object)
>>> type(kurtosis)
function
>>> type(sqrt)
builtin_function_or_method
>>> type(sum)
builtin_function_or_method
>>> import numpy; type(numpy.add)
numpy.ufunc
New Types and Functions
Protocol Overloading
Number
Sequence/Mapping
Object
__str__, __new__, __doc__, __del__,
__init__, __repr__, __setattr__
__getattribute__, __delattr__,
__hash__, __reduce__, __class__,
__dir__, __format__, __reduce_ex__,
__call__, __enter__, __exit__,
__next__, __dict__, __slots__
__getitem__, __setitem__,
__delitem__, __contains__,
__iter__, __reversed__,
__len__
__abs__, __add__, __and__, __bool__,
__ceil__, __divmod__, __eq__, __float__,
__floor__, __floordiv__, __ge__, __gt__,
__index__, __int__, __invert__, __le__,
__lshift__, __lt__, __mod__, __mul__,
__ne__, __neg__, __or__, __pos__, __pow__,
__radd__, __rand__, __rdivmod__,
__rfloordiv__, __rlshift__, __rmod__,
__rmul__, __ror__, __round__, __rpow__,
__rrshift__, __rshift__, __rsub__,
__rtruediv__, __rxor__, __truediv__,
__trunc__, __xor__
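These protocols are what let user-defined types plug into Python's syntax. A minimal sketch (a hypothetical Vec class, not from the slides) showing a few of the dunder methods above in action:

```python
class Vec:
    """A tiny 2-D vector illustrating protocol overloading."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):            # number protocol: v + w
        return Vec(self.x + other.x, self.y + other.y)
    def __mul__(self, scalar):           # v * 2
        return Vec(self.x * scalar, self.y * scalar)
    def __rmul__(self, scalar):          # 2 * v (reflected operand)
        return self * scalar
    def __getitem__(self, i):            # sequence protocol: v[0]
        return (self.x, self.y)[i]
    def __len__(self):                   # len(v)
        return 2
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __repr__(self):                  # object protocol: repr(v)
        return f"Vec({self.x}, {self.y})"

v = Vec(1, 2) + Vec(3, 4)    # calls __add__ -> Vec(4, 6)
w = 2 * v                    # int gives up, Python calls __rmul__
```

This is exactly the mechanism NumPy uses: ndarray defines the number and sequence protocols, so `a + b` and `a[1:]` dispatch to compiled code.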
Interoperability
C/C++ — Cython, Numba, CFFI, ctypes, boost.python, pybind11
Fortran — f2py
Java — Py4J, JPype, javabridge
C#/.NET — Python for .NET (pythonnet)
An Opinionated List (there are others)
Rust — PyO3, Rust-CPython
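As a small taste of the C/C++ row, here is a hedged sketch using the stdlib ctypes module to call a C function directly. The library-lookup fallback is an assumption about the platform, not something from the slides:

```python
import ctypes
import ctypes.util

# Locate the C math library; find_library("m") works on Linux/macOS.
# Where it fails, CDLL(None) exposes symbols already linked into the
# interpreter process (an assumption that holds on most Unixes).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

libm.sqrt.argtypes = [ctypes.c_double]   # declare the C signature
libm.sqrt.restype = ctypes.c_double      # so ctypes converts correctly

root = libm.sqrt(2.0)
```

Without the argtypes/restype declarations, ctypes would default to int conversions and silently return garbage, which is the kind of care the C-level slides warn about.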
Extending Python in the Past
First problem: Efficient Data Input
The first step is to get the data right
“It’s Always About the Data”
http://www.python.org/doc/essays/refcnt/
Reference Counting Essay
May 1998
Guido van Rossum
TableIO
April 1998
Michael A. Miller
NumPyIO
June 1998
A walk through bitarray
Ilan Schnell
(who built all the first versions of Anaconda)
bitarray: efficient arrays of booleans
https://github.com/ilanschnell/bitarray
Note: docstrings removed!
>>> import bitarray
>>> bitarray.bitarray.__mro__
(bitarray.bitarray, bitarray._bitarray, object)
>>> type(bitarray.bits2bytes)
builtin_function_or_method
>>> bitarray._sysinfo()
(8, 8, 8, 8, 9223372036854775807)
>>> 2**63 - 1
9223372036854775807
Callouts from the C source screenshots (code not reproduced):
• Function table for the module (Python 2 version also shown)
• Adds the new _bitarray “built-in” type
• METH_KEYWORDS
• Python functions in C: a single-argument function because of METH_O
• The _bitarray type definition
• Must call PyArg_ParseTuple to unpack arguments; | METH_KEYWORDS to accept a third, dictionary argument to the function
• Expands to PyVarObject ob_base;
Powerful but requires care!
• Reference counting (you have to do this manually)
• Error handling (can be tedious)
• Initialization (can bite you badly if you aren’t careful)
• Other run-times (PyPy, RustPython) can’t easily use your tool.
• You have access to all the machinery Python itself uses to create all of its own builtins.
• You are literally extending Python with new builtin types and functions.
• Incredible speed: as fast as the machine can work.
Extending Python Today
What should you do today?
• Just write your code in Python and use existing extensions.
• If more speed is needed (my opinionated modern view):
  • Use Numba
  • Use Cython
  • Use mypy (and eventually mypyc)
• Or, if few existing extensions are being used:
  • Run with PyPy
  • Use Rust and PyO3
• An open-source, function-at-a-time compiler library for Python
• Compiler toolbox for different targets and execution models:
• single-threaded CPU, multi-threaded CPU, GPU
• regular functions, “universal functions” (array functions), etc
• Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure
Python)
• Combine ease of writing Python with speeds approaching FORTRAN
• Empowers scientists who make tools for themselves and other scientists
Numba: A JIT Compiler for Python
7 things about Numba you may not know
1
2
3
4
5
6
7
Numba is 100% Open Source
Numba + Jupyter = Rapid
CUDA Prototyping
Numba can compile for the
CPU and the GPU at the same time
Numba makes array processing
easy with @(gu)vectorize
Numba comes with a
CUDA Simulator
You can send Numba
functions over the network
Numba has typed Lists and
Dictionaries (soon)
Numba (compile Python to CPUs and GPUs)
conda install numba
(Architecture diagram: Python source → Numba parsing frontend → intermediate representation (IR) → LLVM code-generation backend → machine code for x86, ARM, or PTX)
How does Numba work?
(Compilation pipeline: Python function (bytecode) → bytecode analysis → Numba IR; the argument types drive type inference → rewrite IR → lowering → LLVM IR → LLVM/NVVM JIT → machine code → execute! Compiled specializations are cached.)
@jit
def do_math(a, b):
…
>>> do_math(x, y)
Supported Platforms and Hardware
OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA & HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4–3.7; NumPy 1.10 and later
Basic Example
Callouts from the code screenshot (code not reproduced):
• Numba decorator (nopython=True not required)
• Array allocation
• Looping over ndarray x as an iterator
• Using NumPy math functions
• Returning a slice of the array
2.7x speedup!
Numba Features
• Detects CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations for the same function
• Call out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• Compiler is extensible with new data types and functions
Parallel Computing
• Three main technologies for parallelism: SIMD, multi-threading, and distributed computing
SIMD: Single Instruction Multiple Data
• Numba's CPU detection will enable LLVM to autovectorize for the appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• Will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
Manual Multithreading: Release the GIL
(Chart: speedup ratio vs. number of threads (1, 2, 4), reaching roughly 3.5x)
Option to release the GIL
Using Python
concurrent.futures
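A pure-Python sketch of the concurrent.futures pattern the slide refers to. The chunked_sum helper is hypothetical; note that a real multi-core speedup requires kernels that release the GIL, e.g. functions compiled with Numba's @jit(nogil=True):

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, n_threads=4):
    # Split the work into chunks and run one chunk per thread.
    # With a GIL-releasing compiled kernel in place of sum(),
    # the chunks execute truly in parallel on separate cores.
    step = (len(data) + n_threads - 1) // n_threads
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum, chunks))

total = chunked_sum(list(range(1_000_000)))
```

With pure-Python sum() the GIL serializes the threads; swapping in a nogil-compiled kernel is what produces the speedup curve on the slide.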
Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented
computing.
◦ A function with scalar inputs is broadcast across the elements of
the input arrays:
• np.add([1,2,3], 3) == [4, 5, 6]
• np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
◦ Parallelism is present, by construction. Numba will generate
loops and can automatically multi-thread if requested.
◦ Before Numba, creating fast ufuncs required writing C. No
longer!
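The two np.add calls above can be run directly; broadcasting also generalizes across dimensions (the 2-D example is an illustrative addition, not from the slide):

```python
import numpy as np

a = np.add([1, 2, 3], 3)               # scalar broadcast across the array
b = np.add([1, 2, 3], [10, 20, 30])    # elementwise over equal shapes

# Broadcasting extends across dimensions too: a (3, 1) column
# combined with a length-3 row produces a full (3, 3) grid.
col = np.array([[0], [10], [20]])      # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)
grid = col + row                       # shape (3, 3)
```

Numba's @vectorize builds new ufuncs with these same broadcasting semantics from a plain scalar Python function.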
Universal Functions (Ufuncs)
Different decorator!
1.8x speedup!
Multi-threaded Ufuncs
Specify type signature
Select parallel target
Automatically uses all CPU cores!
ParallelAccelerator
• ParallelAccelerator is a special compiler pass contributed by Intel Labs
  • Todd A. Anderson, Ehsan Totoni, Paul Liu
  • Based on a similar contribution to Julia
• Automatically generates multithreaded code in a Numba-compiled function:
  • Array expressions and reductions
  • Random functions
  • Dot products
  • Explicit loops indicated with a prange() call
ParallelAccelerator: Example #1
(Chart: runtime in ms on a 1,000,000×10 input, Core i7 quad-core CPU; Numba is 1.8x and Numba+PA 3.6x faster than NumPy)
ParallelAccelerator: prange()
(Chart: runtime in ms on a 1,000,000×10 input, Core i7 quad-core CPU; Numba is 4.3x and Numba+PA 50x faster than NumPy)
Cython
Cython is Python with C data types
https://www.linkedin.com/in/kwmsmith/
https://github.com/kwmsmith
https://github.com/kwmsmith/scipy-2017-cython-tutorial
Basic use
Create a text file with a .pyx extension along with a setup.py
setup.py
helloworld.pyx
Hint: can use %%cython magic in notebooks
After %load_ext Cython
Borrowed from Cython documentation
cython.readthedocs.io
primes.pyx
Type definitions
Convert to Python list
Some of the C++ stdlib is available
Auto-conversion on return
Creates an extension type (like _bitarray)
Write fast functions that work on anything supporting the PEP 3118 buffer protocol.
entropy.pyx
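The entropy.pyx source is not reproduced here, but the buffer-protocol idea can be sketched in pure Python with memoryview, which accepts any PEP 3118 exporter (bytes, bytearray, array.array, NumPy arrays, mmap). The shannon_entropy function below is a hypothetical illustration, not the slide's Cython code:

```python
from math import log2

def shannon_entropy(buf):
    """Shannon entropy in bits/byte of any buffer-protocol object."""
    mv = memoryview(buf).cast("B")   # view the raw bytes uniformly
    n = len(mv)
    counts = [0] * 256
    for byte in mv:                  # iterate over unsigned bytes
        counts[byte] += 1
    return -sum(c / n * log2(c / n) for c in counts if c)

e_uniform = shannon_entropy(bytes(range(256)))   # every byte once
e_constant = shannon_entropy(b"\x00" * 1024)     # a single symbol
```

A Cython version would type the memoryview (e.g. `unsigned char[:]`) to make the counting loop run at C speed while keeping exactly this duck-typed interface.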
mypyc
mypyc is a compiler that compiles mypy-annotated, statically typed
Python modules into CPython C extensions.
https://github.com/python/mypy/tree/master/mypyc
• Most type annotations are enforced at runtime (raising TypeError on mismatch)
• Classes are compiled into extension classes without __dict__ (much, but not quite, like if they used __slots__)
• Monkey patching doesn't work
• Instance attributes won't fall back to class attributes if undefined
• Metaclasses not supported
• Also there are still a bunch of bad bugs and unsupported features :)
Still Experimental!
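A minimal sketch of the kind of module mypyc targets: ordinary, statically typed Python that runs unchanged under CPython, which mypyc can then compile to a C extension. The running_mean function is hypothetical, not from the mypyc docs:

```python
from typing import List

def running_mean(values: List[float], window: int) -> List[float]:
    # Plain annotated Python: mypy checks the types statically,
    # and mypyc uses the same annotations to emit typed C code.
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out

means = running_mean([1.0, 2.0, 3.0, 4.0], 2)
```

Because annotations become runtime-enforced types after compilation, passing e.g. a list of strings here would raise TypeError in the compiled module rather than failing somewhere downstream.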
From http://mypy-lang.org/examples.html
Extending Python in the
Future
PyPy
IronPython
Jython
MicroPython
RustPython
Batavia
Android?
iOS?
Most of these are CPython only!
These extensions are an anchor to
Python runtime progress!
CPython C-API
What do we need?
• A way to extend Python that targets multiple run-times by default (at least PyPy, CPython, RustPython), with the ability to add new run-times
• Use a subset of typed Python to do it, i.e. a domain-specific extension language in Python itself
• Need NumPy, Pandas, SciPy, Scikit-Learn, and more to use this approach (this will take time)
Early Hope
https://github.com/pyhandle/hpy
A Bold Proposal
• Create a Cython-like tool that uses mypy typing
• Borrow heavily from Cython ideas but start a new
project that could be pulled into Python itself.
• At the same time work from below to continue the
clean-up of CPython C-API that has already started.
Need a ~$5 million commitment for a 3-year project to start this
• Core team of 5+ devs with 1 lead
• 1/2 time project manager and PSF representative
• 3+ community liaisons and developer evangelists
• Start with $500k Phase 0 to prove the idea
• Get total funding from at least 20 companies:
$25k initial buy-in, at least $250k
commitment over 3 years to start the effort.
• Allow up to $100k initial and a $1 million commitment.
• Paying participants get project-management
attention and early easy-to-use runtimes and
binary extensions delivered with ability to set
priorities (plus marketing and the knowledge
they are leading Python forward).
How?
LABS
Cooperative
Community
Work Order
• We have the people in our network of
collaborators.
• We have a sales and marketing team
that will pitch this.
• We are just rolling out the proposal.
Interested? travis@quansight.com
We can do this!
A new platform to help open-source projects and
developers thrive professionally and financially.
Sign up to:
• build your open-source portfolio
• show which projects you use
• thank contributors for projects you love
• (soon) get connected to initiatives like the one to make Python universally extensible.
https://openteams.com
LABS
Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance
• Maintenance and support with PyData core team
• Improve connection of NumPy to ML Frameworks
• GPU Support for NumPy Ecosystem
• Improve foundations of Array computing
• JupyterLab
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
uarray — unified array interface and symbolic NumPy
xnd — re-factored NumPy (low-level cross-language
libraries for N-D (tensor) computing)
Partnered with NumFOCUS
and Ursa Labs (supporting
Apache Arrow)

PyCon Estonia 2019
