Big ideas shaping scientific Python
The quest for performance and usability
Ralf Gommers
PyData Paris
30 September 2025
NumPy 2.0
Two decades in the making - why finally in 2024,
and what did it bring us?
Effect of changing the ABI - the more benign version:

$ uv venv venv-pandas --python=3.12
$ source venv-pandas/bin/activate
$ uv pip install ipython pandas==2.1.1
$ ipython
...
In [1]: import pandas as pd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
File interval.pyx:1, in init pandas._libs.interval()
ValueError: numpy.dtype size changed, may indicate binary incompatibility.
Expected 96 from C header, got 88 from PyObject

Effect of changing the ABI - the less benign version:

$ python run_analysis.py
...
Program received signal SIGSEGV, Segmentation fault.

or

Windows fatal exception: code 0xc06c005d
Main thread:
Current thread 0x00003934 (most recent call first):
  File "C:\Users\ralf\AppData\Local\Temp\ipykernel_114611893578611.py", line 1, in <module>
Restarting kernel...
Chosen ambition level: high
(Sep 26, 2025)
What did we get out of this effort?
Primarily:
● Usability: cleaner Python and C APIs, and array API standard support
● Better maintainability of NumPy internals
● Motivation & accomplishment of individual contributors
Also:
● Some new features: better FFT support, custom dtypes
● Some performance improvements (those just landed at the right time,
would have happened anyway)
Who helped make it possible?
The individuals really wanting to make this happen and driving the change
A large group of people throughout the community helping with direct
contributions to NumPy, and with adapting their own packages
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
Array types support
Using PyTorch, CuPy, JAX and beyond
- also on GPUs -
in scientific Python libraries
The idea (2020): an “array API standard” for portable code in array-consuming libs
def softmax(x):
    # grab standard namespace from the passed-in array
    xp = array_namespace(x)
    x_exp = xp.exp(x)
    partition = xp.sum(x_exp, axis=1, keepdims=True)
    return x_exp / partition
Wide participation (2020-2021) to ensure the design fits all array libraries (state as of October 2021):
Early demo: LIGO data analysis
Image credit: https://labs.quansight.org/blog/array-libraries-interoperability, Anirudh Dagar
The LIGO collaboration released notebooks with their gravitational wave detection data
analysis; by using array API support, the analysis was shown to also work with PyTorch.
The key problem
First stable release
Performance: scikit-learn LDA (PyTorch)
Image credit: https://labs.quansight.org/blog/array-api-support-scikit-learn, Thomas Fan
Performance: scipy.cluster (JAX)
Performance: Rotation transforms
Image credit: https://github.com/scipy/scipy/pull/23249, Martin Schuck
Pure Python rewrite of scipy.spatial.transform.Rotation can be orders of magnitude
faster with JAX & PyTorch than the original Cython code with NumPy.
Q: Sounds great, can I use this today?
A: Yes! Check docs for coverage, and use:
export SCIPY_ARRAY_API=1 (SciPy)
set_config(array_api_dispatch=True) (Scikit-learn)
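Putting both switches above into one script might look like this (a sketch: `SCIPY_ARRAY_API` must be set before SciPy is first imported, and scikit-learn's flag can also be applied per-block via `sklearn.config_context`):

```python
import os

# SciPy: opt in to array API standard support (read at import time)
os.environ["SCIPY_ARRAY_API"] = "1"

import sklearn

# scikit-learn: enable array API dispatch globally
sklearn.set_config(array_api_dispatch=True)

# From here on, supported SciPy functions and scikit-learn estimators
# accept PyTorch/CuPy/JAX arrays and keep computation on the input's
# device, rather than converting everything to NumPy.
```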
What are we getting out of this effort?
● Performance! 🚀
● Composability / functionality.
E.g., functionality that was NumPy-only at first is now becoming available for other array
libraries. And it is now possible to write new pure Python libraries that are generic
across array libraries.
Who is helping make it possible?
The individuals really wanting to make this happen and driving the change
A large group of people throughout the community helping with
contributions to and testing of array & array-consuming libraries.
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
Free-threading
Python-level threading to unlock
major performance gains
Image credit: Unraveling Community Support For Free-Threaded Python (PyCon 2025), Lysandros Nikolaou & Nathan Goldbaum
Threading in Python
Default (with GIL) Python
Pure Python code: only a single thread
executes at any time
Native code in extension modules: parallel
execution only when releasing the GIL
Free-threaded Python
All pure Python and native code can
execute in parallel
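The difference can be sketched with CPU-bound pure-Python work in a thread pool: on a with-GIL build the four threads take turns, while on a free-threaded build (3.13t/3.14t) they run on separate cores:

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(n):
    # Deliberately CPU-bound pure-Python work: no C extension releases
    # the GIL here, so a with-GIL build cannot run these in parallel.
    count = 0
    for k in range(2, n):
        if all(k % d for d in range(2, int(k**0.5) + 1)):
            count += 1
    return count

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(count_primes, [10_000] * 4))

print(results)  # [1229, 1229, 1229, 1229]
```

On a free-threaded build the wall time approaches that of a single call; on a default build it is roughly four times longer.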
Image credit: https://py-free-threading.github.io/examples/mandelbrot-threads/, Nathan Goldbaum
Parallel scaling of calculation time for a Mandelbrot set, (800, 800) ndarray
Performance: Mandelbrot set
Image credit: https://github.com/numpy/numpy/pull/27896, Nathan Goldbaum
Performance of parallel operations on small arrays/lists
Performance: parallel ops on arrays
Image credit: https://labs.quansight.org/blog/scaling-asyncio-on-free-threaded-python, Kumar Aditya
Raw TCP performance on a Windows machine with 6 physical CPU cores
Performance: asyncio
Image credit: https://labs.quansight.org/blog/scaling-asyncio-on-free-threaded-python, Kumar Aditya
Scraping Hacker News stories with Beautiful Soup, aiohttp, asyncio.
Multi-threaded measurements are done with 8 threads on a 12-core machine.
Performance: web scraping
Image credit: https://developer.nvidia.com/blog/improved-data-loading-with-threads, Michał Szołucha & Rostan Tabet
nvImageCodec performance scales only with threads. When using processes, it
needs N copies of the CUDA context; switching between them has high overhead.
Performance: image decoding on GPU
Image credit: https://trent.me/articles/pytorch-and-python-free-threading, Trent Nelson
Performance: PyTorch GPT2 inference
Inference (text generation) on a single GPU, accessed from multiple threads. Performance is
measured in HTTP server inference requests per second. CPU: 20 physical cores. GPU: Tesla V100, 32 GB.
The GPT2 model used has 124M parameters.
Timeline
Timeline - ecosystem support
Image credit: https://hugovk.github.io/free-threaded-wheels, Hugo van Kemenade
Snapshots from 15 May 2025 and 26 Sep 2025: ecosystem support is growing quickly
Q: Sounds great, can I use this today?
A: It’s still early days. However, Python 3.14 comes out
in exactly one week and is much more stable than
3.13, and library support is growing fast -
give it a try soon!
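To check what you are running on (a sketch; `sys._is_gil_enabled` exists on CPython 3.13+, and note the GIL can re-enable itself at runtime if an incompatible extension module is imported):

```python
import sys
import sysconfig

# Was this interpreter built with free-threading support (a "t" build)?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded_build)

# Is the GIL actually disabled right now? (CPython 3.13+ only)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently disabled:", not sys._is_gil_enabled())
```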
What are we getting out of this effort?
● Parallel performance for a range of use cases. 🚀
● Some types of applications become much simpler to implement.
E.g., GUI applications, which today spend a lot of effort working around the GIL to
keep the UI responsive.
Who is helping make it possible?
The individuals really wanting to make this happen and driving the change:
Sam Gross in particular
A large group of people throughout the community helping with direct
contributions to CPython, and with adapting their own packages
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
$ uv init --name demo-pydata-paris --python 3.14t
$ uv add numpy scipy jax matplotlib pooch joblib array-api-compat array-api-extra
$ uv run jupyter lab
Demo - putting it all together
Code: https://github.com/rgommers/demo-pydata-paris
Big ideas:
they take individual champions, broad
community buy-in, and significant funding -
and are worth pursuing
