SlideShare a Scribd company logo
1 of 31
Download to read offline
The road ahead for scientific
computing with Python
Ralf Gommers
7 Dec 2021
The open source PyData ecosystem
The state of PyData today
#1 language for scientific computing, data science, ML & AI
Estimated user base: 25-40 million users
Incredibly broad ecosystem & high-profile scientific successes
Array-based computing in Python
Today’s Python data ecosystem
Many other array libraries - fragmentation!
Why?
Technical: performance, GPU & distributed computing, and autograd.
Social/commercial: large tech companies want control, and to move fast.
Key technical challenges for the
PyData ecosystem
My top 5 technical challenges
1. Fragmentation in array/tensor libraries & support for heterogeneous
computing
2. (lack of) parallel execution capabilities in the NumPy-based PyData stack
3. Packaging: PyPI (the Python Package Index) imposes serious constraints
4. Performance for algorithms that can’t be expressed in vectorized form.
5. Technical debt: old Fortran 77 & C code hard to maintain; long double
support probably needs removing; distutils is end-of-life
(5) technical debt
SciPy: old Fortran 77 & C code
hard to maintain
long double: basically obsolete (MSVC
on Windows, macOS on arm64 and
Linux/aarch64 all alias it to double)
and takes a ton of time to maintain.
Needs to be replaced by proper
quad-precision float128.
distutils will be removed in Python
3.12 (see PEP 632), packages with a lot
of compiled code should migrate to
Meson or CMake.
These are just 3 out of many examples
(4) performance of numerical code
Vectorization Use compiled code
Python compilers Python interpreters
Pythran
CPython
Plus Cinder, Pyston, and more -- very experimental,
and limited gains for numerical code
(3) packaging constraints
● PyPI does not offer fundamentals for scientific computing like
BLAS/LAPACK or OpenMP. It was not designed for non-Python
dependencies.
● PyPI + pip != a package manager
As a result, PyPI’s author-led model cannot ensure that a set of packages
was built with a consistent set of rules (e.g., a single compiler toolchain)
● PyPI serves 3 distinct purposes:
○ Flow of source code from authors to redistributors (OK)
○ End-user binary installs via wheels (OK-ish)
○ End-user installs from source (very problematic)
(2) lack of parallelism in PyData stack
1. NumPy: single-threaded, except for calls to BLAS/LAPACK
2. SciPy:
a. single-threaded by default, except for calls to BLAS/LAPACK
b. `workers=` API to let user enable multiple threads
3. Scikit-learn:
a. Most functionality single-threaded, with `n_jobs=` API to let user
enable multiple threads
b. Starts to use OpenMP more, for automatic parallelization
c. Complex control (see threadpoolctl package) of NumPy/SciPy’s
BLAS and LAPACK libraries to prevent oversubscription in the
presence of multiprocessing on top of multi-threading.
With 32/64-core CPUs becoming more common, this isn’t a tenable situation.
A common threading layer is needed.
(1) fragmentation & heterogeneous
computing
We need common APIs to address CPUs, GPUs, TPUs, FPGAs & other emerging
hardware. Separate libraries for each type of hardware is not composable.
To address this, we created a
standardization effort - see
https://data-apis.org
Consortium for Python Data API Standards
Goals for and scope of the array API
Syntax and semantics of functions
and objects in the API
Casting rules, broadcasting, indexing,
Python operator support
Data interchange & device support
Execution semantics (e.g. task
scheduling, parallelism, lazy eval)
Non-standard dtypes, masked arrays,
I/O, subclassing array object, C API
Error handling & behaviour for invalid
inputs to functions and methods
Goal 1: enable writing code & packages that support multiple array libraries
Goal 2: make it easy for end users to switch between array libraries
In Scope Out of Scope
Use case: the einops package
● A popular package for array manipulation
● Supports 8 popular array/tensor libraries.
● Almost 50% of the code can be removed through array API standardization!
Array API - participation & adoption
In numpy.array_api namespace
API adoption done
or close to done
Design participation,
adoption in progress or being discussed
In cupy.array_api namespace
In torch (main) namespace
Distributed & GPU arrays with SciPy,
scikit-learn and scikit-image
Key social challenges for the
PyData ecosystem
My top 3 social challenges
1. Sustainability of key projects
2. Big tech has discovered PyData
3. Academia still needs to find its role
Sustainability
For NumPy, SciPy, scikit-learn: each ~10-20 active maintainers, O(10-20 million) users
Funding is still very hard to obtain. Number of funded devs:
● NumPy: 1 full-time, 3 part-time
● SciPy: 2 part-time
● Scikit-learn: 2 full-time, 3 part-time
● Most other projects: volunteer-only
Burnout of maintainers is a real risk/problem.
Funding is mostly coming from independent funders: Sloan Foundation, Moore
Foundation, Chan-Zuckerberg Institute.
Diversity & inclusivity remains challenging: >90% of maintainers are white men
Big tech has discovered PyData
Big tech focuses on a narrower set of capabilities for deep learning than is needed for
scientific computing.
Ex.: complex dtype support only arrived in 2021)
Corporate-backed vs. community-driven culture:
● Tension in funded vs. volunteer efforts
○ “Move fast and break things” vs.
○ “Backwards compatibility for 10+ year old scientific models”
● Bandwidth problem:
TensorFlow/PyTorch have ~200-300 full-time engineers, RAPIDS >50.
Role of academia
Academia relies heavily on scientific open source, however:
● Open source contributions are valued significantly less than papers today.
Even if often much more impactful (ex: 13% of all papers on ArXiV used Matplotlib)
● Institutional funders are behind the times. The investment in, e.g., exoscale
facilities is >100x (or >1000x ?) that in open source.
● Publishers & reviewers need to require source code for papers (slowly happening).
Recommendations:
1. Work on fixing career paths and funding
2. Focus on
a. missing science-specific key pieces (e.g., sparse linear algebra)
b. production-quality code
c. skills building for grad students
A look at NumPy’s technical and
social roadmap
Where is NumPy going - community
NumPy is a community-driven project.
Where is NumPy going - technical
Interoperability
Array API standard support
Extensibility
Easier custom dtypes
Performance
SIMD acceleration on:
x86, arm64, PPC, …?
C++ (?)
Just dipping our toes in the
water here - so far it was just
Python and C
Platform support
PPC, AIX, s390x,
cross-compiling to embedded
ARM systems, ...
Type annotations
Main namespace annotations
just completed
Where is NumPy going - community
NumPy is a community-driven project. Most people are volunteers.
Goals for the coming 1-2 years include:
● Grow more (autonomous) teams: web, docs, triage, f2py, SIMD, API, …
● Further increase the diversity of the team & inclusivity of the project
● Better communication channels: move to Discourse, with separate user
forum
● Increasing active alignment across PyData projects (from processes like
when to drop Python versions, to hairy technical topics like parallelism)
Getting involved
How can you get involved?
Yes, you can! People are friendly, and you have
talent and knowledge that has value!
Domain knowledge (linear algebra, special
functions, statistics, etc.) is just as important as
coding skills.
How can you get involved?
How do you pick a project to start contributing too?
● First, pick something that interests you!
● Look at the activity on GitHub - do pull requests get reviewed
and merged in a reasonable amount of time? Is the feedback
constructive and given in a friendly manner?
● Small vs. large projects
Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers
Thank you!

More Related Content

What's hot

EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingJonathan Dursi
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...MLconf
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016MLconf
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learningNaveen Davis
 
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016MLconf
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
 
Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*Intel® Software
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...Edureka!
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Databricks
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...MLconf
 
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKBig Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKMatt Stubbs
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15MLconf
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Fwdays
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextDataWorks Summit
 

What's hot (20)

EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learning
 
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*Get Your Hands Dirty with Intel® Distribution for Python*
Get Your Hands Dirty with Intel® Distribution for Python*
 
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
PyTorch vs TensorFlow: The Force Is Strong With Which One? | Which One You Sh...
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
 
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKBig Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured Text
 

Similar to The road ahead for scientific computing with Python

Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpcDr Reeja S R
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Codemotion
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinalProf. Wim Van Criekinge
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015Wes McKinney
 
Open-Source Frameworks for Deep Learning: an Overview
Open-Source Frameworks for Deep Learning: an OverviewOpen-Source Frameworks for Deep Learning: an Overview
Open-Source Frameworks for Deep Learning: an OverviewVincenzo Lomonaco
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Dan Cundiff
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stackFliptop
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsGanesan Narayanasamy
 
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...apidays
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analyticsSouth West Data Meetup
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 

Similar to The road ahead for scientific computing with Python (20)

Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Open-Source Frameworks for Deep Learning: an Overview
Open-Source Frameworks for Deep Learning: an OverviewOpen-Source Frameworks for Deep Learning: an Overview
Open-Source Frameworks for Deep Learning: an Overview
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stack
 
Unit 2 Java
Unit 2 JavaUnit 2 Java
Unit 2 Java
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systems
 
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...
apidays LIVE Paris - Bring the API culture to DevOps teams by Christophe Bour...
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analytics
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 

More from Ralf Gommers

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfRalf Gommers
 
Parallelism in a NumPy-based program
Parallelism in a NumPy-based programParallelism in a NumPy-based program
Parallelism in a NumPy-based programRalf Gommers
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefitsRalf Gommers
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with PythranRalf Gommers
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeRalf Gommers
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019Ralf Gommers
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeRalf Gommers
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related conceptsRalf Gommers
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionRalf Gommers
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeRalf Gommers
 

More from Ralf Gommers (11)

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
 
Parallelism in a NumPy-based program
Parallelism in a NumPy-based programParallelism in a NumPy-based program
Parallelism in a NumPy-based program
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefits
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with Pythran
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond code
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decade
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_session
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and Code
 

Recently uploaded

COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...naitiksharma1124
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Andrea Goulet
 
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckMarc Lester
 
^Clinic ^%[+27788225528*Abortion Pills For Sale In harare
^Clinic ^%[+27788225528*Abortion Pills For Sale In harare^Clinic ^%[+27788225528*Abortion Pills For Sale In harare
^Clinic ^%[+27788225528*Abortion Pills For Sale In hararekasambamuno
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanWorkshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanNeo4j
 
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbankkasambamuno
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfWSO2
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Maxim Salnikov
 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMarkus Moeller
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024SimonedeGijt
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Lisi Hocke
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024Shane Coughlan
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksJinanKordab
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAShane Coughlan
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfMehmet Akar
 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfSrushith Repakula
 
Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14VMware Tanzu
 

Recently uploaded (20)

COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined Deck
 
^Clinic ^%[+27788225528*Abortion Pills For Sale In harare
^Clinic ^%[+27788225528*Abortion Pills For Sale In harare^Clinic ^%[+27788225528*Abortion Pills For Sale In harare
^Clinic ^%[+27788225528*Abortion Pills For Sale In harare
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanWorkshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
 
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank
^Clinic ^%[+27788225528*Abortion Pills For Sale In witbank
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?
 
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdf
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with Links
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdf
 
Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14Spring into AI presented by Dan Vega 5/14
Spring into AI presented by Dan Vega 5/14
 

The road ahead for scientific computing with Python

  • 1. The road ahead for scientific computing with Python Ralf Gommers 7 Dec 2021
  • 2. The open source PyData ecosystem
  • 3.
  • 4. The state of PyData today #1 language for scientific computing, data science, ML & AI Estimated user base: 25-40 million users Incredibly broad ecosystem & high-profile scientific successes
  • 6. Today’s Python data ecosystem Many other array libraries - fragmentation! Why? Technical: performance, GPU & distributed computing, and autograd. Social/commercial: large tech companies want control, and to move fast.
  • 7. Key technical challenges for the PyData ecosystem
  • 8. My top 5 technical challenges 1. Fragmentation in array/tensor libraries & support for heterogeneous computing 2. (lack of) parallel execution capabilities in the NumPy-based PyData stack 3. Packaging: PyPI (the Python Package Index) imposes serious constraints 4. Performance for algorithms that can’t be expressed in vectorized form. 5. Technical debt: old Fortran 77 & C code hard to maintain; long double support probably needs removing; distutils is end-of-life
  • 9. (5) technical debt SciPy: old Fortran 77 & C code hard to maintain long double: basically obsolete (MSVC on Windows, macOS on arm64 and Linux/aarch64 all alias it to double) and takes a ton of time to maintain. Needs to be replaced by proper quad-precision float128. distutils will be removed in Python 3.12 (see PEP 632), packages with a lot of compiled code should migrate to Meson or CMake. These are just 3 out of many examples
  • 10. (4) performance of numerical code Vectorization Use compiled code Python compilers Python interpreters Pythran CPython Plus Cinder, Pyston, and more -- very experimental, and limited gains for numerical code
  • 11. (3) packaging constraints ● PyPI does not offer fundamentals for scientific computing like BLAS/LAPACK or OpenMP. It was not designed for non-Python dependencies. ● PyPI + pip != a package manager As a result, PyPI’s author-led model cannot ensure that a set of packages was built with a consistent set of rules (e.g., a single compiler toolchain) ● PyPI serves 3 distinct purposes: ○ Flow of source code from authors to redistributors (OK) ○ End-user binary installs via wheels (OK-ish) ○ End-user installs from source (very problematic)
  • 12. (2) lack of parallelism in PyData stack 1. NumPy: single-threaded, except for calls to BLAS/LAPACK 2. SciPy: a. single-threaded by default, except for calls to BLAS/LAPACK b. `workers=` API to let user enable multiple threads 3. Scikit-learn: a. Most functionality single-threaded, with `n_jobs=` API to let user enable multiple threads b. Starts to use OpenMP more, for automatic parallelization c. Complex control (see threadpoolctl package) of NumPy/SciPy’s BLAS and LAPACK libraries to prevent oversubscription in the presence of multiprocessing on top of multi-threading. With 32/64-core CPUs becoming more common, this isn’t a tenable situation. A common threading layer is needed.
  • 13. (1) fragmentation & heterogeneous computing We need common APIs to address CPUs, GPUs, TPUs, FPGAs & other emerging hardware. Separate libraries for each type of hardware is not composable. To address this, we created a standardization effort - see https://data-apis.org
  • 14. Consortium for Python Data API Standards
  • 15. Goals for and scope of the array API Syntax and semantics of functions and objects in the API Casting rules, broadcasting, indexing, Python operator support Data interchange & device support Execution semantics (e.g. task scheduling, parallelism, lazy eval) Non-standard dtypes, masked arrays, I/O, subclassing array object, C API Error handling & behaviour for invalid inputs to functions and methods Goal 1: enable writing code & packages that support multiple array libraries Goal 2: make it easy for end users to switch between array libraries In Scope Out of Scope
  • 16. Use case: the einops package ● A popular package for array manipulation ● Supports 8 popular array/tensor libraries. ● Almost 50% of the code can be removed through array API standardization!
  • 17. Array API - participation & adoption In numpy.array_api namespace API adoption done or close to done Design participation, adoption in progress or being discussed In cupy.array_api namespace In torch (main) namespace
  • 18. Distributed & GPU arrays with SciPy, scikit-learn and scikit-image
  • 19. Key social challenges for the PyData ecosystem
  • 20. My top 3 social challenges 1. Sustainability of key projects 2. Big tech has discovered PyData 3. Academia still needs to find its role
  • 21. Sustainability For NumPy, SciPy, scikit-learn: each ~10-20 active maintainers, O(10-20 million) users Funding is still very hard to obtain. Number of funded devs: ● NumPy: 1 full-time, 3 part-time ● SciPy: 2 part-time ● Scikit-learn: 2 full-time, 3 part-time ● Most other projects: volunteer-only Burnout of maintainers is a real risk/problem. Funding is mostly coming from independent funders: Sloan Foundation, Moore Foundation, Chan-Zuckerberg Institute. Diversity & inclusivity remains challenging: >90% of maintainers are white men
  • 22. Big tech has discovered PyData Big tech focuses on a narrower set of capabilities for deep learning than is needed for scientific computing. Ex.: complex dtype support only arrived in 2021) Corporate-backed vs. community-driven culture: ● Tension in funded vs. volunteer efforts ○ “Move fast and break things” vs. ○ “Backwards compatibility for 10+ year old scientific models” ● Bandwidth problem: TensorFlow/PyTorch have ~200-300 full-time engineers, RAPIDS >50.
  • 23. Role of academia Academia relies heavily on scientific open source, however: ● Open source contributions are valued significantly less than papers today. Even if often much more impactful (ex: 13% of all papers on ArXiV used Matplotlib) ● Institutional funders are behind the times. The investment in, e.g., exoscale facilities is >100x (or >1000x ?) that in open source. ● Publishers & reviewers need to require source code for papers (slowly happening). Recommendations: 1. Work on fixing career paths and funding 2. Focus on a. missing science-specific key pieces (e.g., sparse linear algebra) b. production-quality code c. skills building for grad students
  • 24. A look at NumPy’s technical and social roadmap
  • 25. Where is NumPy going - community NumPy is a community-driven project.
  • 26. Where is NumPy going - technical Interoperability Array API standard support Extensibility Easier custom dtypes Performance SIMD acceleration on: x86, arm64, PPC, …? C++ (?) Just dipping our toes in the water here - so far it was just Python and C Platform support PPC, AIX, s390x, cross-compiling to embedded ARM systems, ... Type annotations Main namespace annotations just completed
  • 27. Where is NumPy going - community NumPy is a community-driven project. Most people are volunteers. Goals for the coming 1-2 years include: ● Grow more (autonomous) teams: web, docs, triage, f2py, SIMD, API, … ● Further increase the diversity of the team & inclusivity of the project ● Better communication channels: move to Discourse, with separate user forum ● Increasing active alignment across PyData projects (from processes like when to drop Python versions, to hairy technical topics like parallelism)
  • 29. How can you get involved? Yes, you can! People are friendly, and you have talent and knowledge that has value! Domain knowledge (linear algebra, special functions, statistics, etc.) is just as important as coding skills.
  • 30. How can you get involved? How do you pick a project to start contributing too? ● First, pick something that interests you! ● Look at the activity on GitHub - do pull requests get reviewed and merged in a reasonable amount of time? Is the feedback constructive and given in a friendly manner? ● Small vs. large projects
  • 31. Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers Thank you!