SlideShare a Scribd company logo
1 of 22
Download to read offline
The evolution of array
computing in Python
Ralf Gommers

PyData Amsterdam 2019
whoami
Maintainer of NumPy & SciPy (2010 -- )
NumFOCUS board member (2012 -- 2018)
Director of Quansight Labs (2019 -- )
You can find me at:
https://github.com/rgommers
rgommers@quansight.com
2
Public benefit division of Quansight,
providing a home for a “PyData Core
Team” and growing the community
A very brief history of array computing in Python
3
Numeric
Numarray
1995
2003
2006
2008
2012
2015 -- today
Sparse
… ?
A motivating example
4
Another motivating example
5
Do try this at home
6
$ conda create -n pydata-ams
$ conda activate pydata-ams
$ conda install numpy dask jupyterlab
$ # If you have a NVIDIA GPU. Needs CUDA installed.
$ # CuPy 6.0.0 will be out soon, then conda-installable.
$ pip install --pre cupy-cuda100 # or cupy-90 for CUDA 9
$ python -c "import numpy; print(numpy.__version__)"
1.16.3
$ python -c "import cupy; print(cupy.__version__)"
6.0.0rc1
$ python -c "import dask; print(dask.__version__)"
1.2.0
$ export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1
The NumPy array protocols - goals
7
Separate NumPy API from NumPy “execution engine”
Allow other libraries (Dask, CuPy, PyTorch, …) to reuse
the NumPy API
Bigger picture: avoid or reduce ecosystem
fragmentation (we don’t want to see a reimplementation of SciPy for
PyTorch, SciPy for Tensorflow, etc.)
The NumPy array protocols - goals
Current state of N-dimensional arrays in Python
8
The NumPy array protocols - goals
What we’re aiming for
9
The NumPy array protocols - concept
10
Function body

(the “implementation”, defines
semantics)
Function signature (the API)
Example for one function - this can be a ufunc or a regular function.
The NumPy array protocols - concept
11
Function body
Function signature (the API)
In short: use the NumPy API, bring your own implementation
Function body
Function signature
Function signature (the API)
if input arg has __array_function__:
execute other_function
Function body
Function signature
==
Using array protocols in your own code
Suitable for code that uses NumPy functions and ndarray
methods.
Try CuPy if you need more performance on large arrays,
and Dask if you want a distributed array.
Let’s play with this in a notebook!
12
Limits to these array protocols
13
Only functions can be overridden. And not even all functions -- only the
ones with an array_like parameter. Important exceptions:
np.array, np.asarray, np.linspace, np.concatenate
NumPy’s roadmap - what’s next?
Interoperability
Roll out __array_function__,
handle subclasses better,
new protocols?
Extensibility
Easier custom dtypes
Performance
Ufunc optimizations,

more SIMD instructions,
...
np.random rewrite
Is about to be merged
Indexing
NEP 21 (oindex/vindex), for
more intuitive behavior.
Type annotations
PEP 484 / mypy compatible
annotations, see numpy-
stubs repo.
14
The array computing landscape
15
maturity
n-D arrays (a.k.a. tensors)
dataframes
CPU
GPU
NumPy
XND
CuPy
PyTorch
xtensor
CPU
GPU
Pandas
MXNet
DyND
cuDF
Apache Arrow
Tensorflow
Uarray
TensorLy
xframe
Xarray
XND
Recreates the foundations of NumPy as a number of
smaller libraries. Plus:
Variable length strings
Ragged arrays
Categorical type
Missing data support
Easy custom dtypes
Automatic multi-threading
JIT compilation (via Numba)
16
XND
17
Variable length strings:
Ragged arrays:
xtensor
A C++ library for n-D arrays. Plus:
Lazy evaluation
Performance - very fast
Can operate on NumPy arrays
Python, Julia and R bindings
JIT compilation (via Pythran)
Built on top: rray (NumPy-like arrays for R)
18
xtensor
19
Uarray
A more general solution than the NumPy array protocols
for building APIs with multiple backends.
Override any object: functions, classes, ufuncs, dtypes,
context managers, and more
Uses multiple dispatch rather than protocols.
20
Uarray
21
Thanks!
Any questions?
22

More Related Content

What's hot

Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016MLconf
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationTravis Oliphant
 
SciPy 2019: How to Accelerate an Existing Codebase with Numba
SciPy 2019: How to Accelerate an Existing Codebase with NumbaSciPy 2019: How to Accelerate an Existing Codebase with Numba
SciPy 2019: How to Accelerate an Existing Codebase with Numbastan_seibert
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Enthought, Inc.
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013show you
 
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16MLconf
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData StackPeadar Coyle
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...MLconf
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...Edureka!
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Peadar Coyle
 
Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learningNaveen Davis
 

What's hot (20)

Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Numba
NumbaNumba
Numba
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
Python libraries
Python librariesPython libraries
Python libraries
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
 
SciPy 2019: How to Accelerate an Existing Codebase with Numba
SciPy 2019: How to Accelerate an Existing Codebase with NumbaSciPy 2019: How to Accelerate an Existing Codebase with Numba
SciPy 2019: How to Accelerate an Existing Codebase with Numba
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013
 
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
Jake Mannix, Lead Data Engineer, Lucidworks at MLconf SEA - 5/20/16
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData Stack
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.
 
Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learning
 

Similar to The evolution of array computing in Python

Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSPeterAndreasEntschev
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related conceptsRalf Gommers
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsStijn Decubber
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataTravis Oliphant
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillVince Gonzalez
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jAdam Gibson
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonVitthal Gogate
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezMapR Technologies
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonChristian Perone
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeRizwan Habib
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
علم البيانات - Data Sience
علم البيانات - Data Sience علم البيانات - Data Sience
علم البيانات - Data Sience App Ttrainers .com
 

Similar to The evolution of array computing in Python (20)

Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and Drill
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4j
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College London
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco Vasquez
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
علم البيانات - Data Sience
علم البيانات - Data Sience علم البيانات - Data Sience
علم البيانات - Data Sience
 

More from Ralf Gommers

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfRalf Gommers
 
Parallelism in a NumPy-based program
Parallelism in a NumPy-based programParallelism in a NumPy-based program
Parallelism in a NumPy-based programRalf Gommers
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with PythonRalf Gommers
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with PythranRalf Gommers
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeRalf Gommers
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeRalf Gommers
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionRalf Gommers
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeRalf Gommers
 

More from Ralf Gommers (8)

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
 
Parallelism in a NumPy-based program
Parallelism in a NumPy-based programParallelism in a NumPy-based program
Parallelism in a NumPy-based program
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with Python
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with Pythran
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond code
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decade
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_session
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and Code
 

Recently uploaded

What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
logical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxlogical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxRemote DBA Services
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
oracle 23c new features for developer and dba
oracle 23c new features for developer and dbaoracle 23c new features for developer and dba
oracle 23c new features for developer and dbaRemote DBA Services
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 

Recently uploaded (20)

What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
logical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxlogical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptx
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
oracle 23c new features for developer and dba
oracle 23c new features for developer and dbaoracle 23c new features for developer and dba
oracle 23c new features for developer and dba
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 

The evolution of array computing in Python

  • 1. The evolution of array computing in Python Ralf Gommers
 PyData Amsterdam 2019
  • 2. whoami Maintainer of NumPy & SciPy (2010 -- ) NumFOCUS board member (2012 -- 2018) Director of Quansight Labs (2019 -- ) You can find me at: https://github.com/rgommers rgommers@quansight.com 2 Public benefit division of Quansight, providing a home for a “PyData Core Team” and growing the community
  • 3. A very brief history of array computing in Python 3 Numeric Numarray 1995 2003 2006 2008 2012 2015 -- today Sparse … ?
  • 6. Do try this at home 6 $ conda create -n pydata-ams $ conda activate pydata-ams $ conda install numpy dask jupyterlab $ # If you have a NVIDIA GPU. Needs CUDA installed. $ # CuPy 6.0.0 will be out soon, then conda-installable. $ pip install --pre cupy-cuda100 # or cupy-90 for CUDA 9 $ python -c "import numpy; print(numpy.__version__)" 1.16.3 $ python -c "import cupy; print(cupy.__version__)" 6.0.0rc1 $ python -c "import dask; print(dask.__version__)" 1.2.0 $ export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1
  • 7. The NumPy array protocols - goals 7 Separate NumPy API from NumPy “execution engine” Allow other libraries (Dask, CuPy, PyTorch, …) to reuse the NumPy API Bigger picture: avoid or reduce ecosystem fragmentation (we don’t want to see a reimplementation of SciPy for PyTorch, SciPy for Tensorflow, etc.)
  • 8. The NumPy array protocols - goals Current state of N-dimensional arrays in Python 8
  • 9. The NumPy array protocols - goals What we’re aiming for 9
  • 10. The NumPy array protocols - concept 10 Function body
 (the “implementation”, defines semantics) Function signature (the API) Example for one function - this can be a ufunc or a regular function.
  • 11. The NumPy array protocols - concept 11 Function body Function signature (the API) In short: use the NumPy API, bring your own implementation Function body Function signature Function signature (the API) if input arg has __array_function__: execute other_function Function body Function signature ==
  • 12. Using array protocols in your own code Suitable for code that uses NumPy functions and ndarray methods. Try CuPy if you need more performance on large arrays, and Dask if you want a distributed array. Let’s play with this in a notebook! 12
  • 13. Limits to these array protocols 13 Only functions can be overridden. And not even all functions -- only the ones with an array_like parameter. Important exceptions: np.array, np.asarray, np.linspace, np.concatenate
  • 14. NumPy’s roadmap - what’s next? Interoperability Roll out __array_function__, handle subclasses better, new protocols? Extensibility Easier custom dtypes Performance Ufunc optimizations,
 more SIMD instructions, ... np.random rewrite Is about to be merged Indexing NEP 21 (oindex/vindex), for more intuitive behavior. Type annotations PEP 484 / mypy compatible annotations, see numpy- stubs repo. 14
  • 15. The array computing landscape 15 maturity n-D arrays (a.k.a. tensors) dataframes CPU GPU NumPy XND CuPy PyTorch xtensor CPU GPU Pandas MXNet DyND cuDF Apache Arrow Tensorflow Uarray TensorLy xframe Xarray
  • 16. XND Recreates the foundations of NumPy as a number of smaller libraries. Plus: Variable length strings Ragged arrays Categorical type Missing data support Easy custom dtypes Automatic multi-threading JIT compilation (via Numba) 16
  • 18. xtensor A C++ library for n-D arrays. Plus: Lazy evaluation Performance - very fast Can operate on NumPy arrays Python, Julia and R bindings JIT compilation (via Pythran) Built on top: rray (NumPy-like arrays for R) 18
  • 20. Uarray A more general solution than the NumPy array protocols for building APIs with multiple backends. Override any object: functions, classes, ufuncs, dtypes, context managers, and more Uses multiple dispatch rather than protocols. 20