Big ideas shaping scientific Python
The quest for performance and usability
Ralf Gommers
PyData Paris
30 September 2025
NumPy 2.0
Two decades in the making - why finally in 2024,
and what did it bring us?
Effect of changing the ABI - the more benign version:

$ uv venv venv-pandas --python=3.12
$ source venv-pandas/bin/activate
$ uv pip install ipython pandas==2.1.1
$ ipython
...
In [1]: import pandas as pd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
File interval.pyx:1, in init pandas._libs.interval()
ValueError: numpy.dtype size changed, may indicate binary incompatibility.
Expected 96 from C header, got 88 from PyObject

Effect of changing the ABI - the less benign version:

$ python run_analysis.py
...
Program received signal SIGSEGV, Segmentation fault.

or

Windows fatal exception: code 0xc06c005d
Main thread:
Current thread 0x00003934 (most recent call first):
  File "C:\Users\ralf\AppData\Local\Temp\ipykernel_114611893578611.py", line 1, in <module>
Restarting kernel...
Chosen ambition level: high
(Sep 26, 2025)
What did we get out of this effort?
Primarily:
● Usability: cleaner Python and C APIs, and array API standard support
● Better maintainability of NumPy internals
● Motivation & accomplishment of individual contributors
Also:
● Some new features: better FFT support, custom dtypes
● Some performance improvements (those just landed at the right time,
would have happened anyway)
Who helped make it possible?
The individuals really wanting to make this happen and driving the change
A large group of people throughout the community helping with direct
contributions to NumPy, and with adapting their own packages
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
Array types support
Using PyTorch, CuPy, JAX and beyond
- also on GPUs -
in scientific Python libraries
The idea (2020): an “array API standard” for portable code in array-consuming libs
def softmax(x):
    # grab standard namespace from the passed-in array
    xp = array_namespace(x)
    x_exp = xp.exp(x)
    partition = xp.sum(x_exp, axis=1, keepdims=True)
    return x_exp / partition
Wide participation (2020-2021) to ensure the design fits all array libraries (state as of October 2021):
Early demo: LIGO data analysis
Image credit: https://labs.quansight.org/blog/array-libraries-interoperability, Anirudh Dagar
The LIGO collaboration released notebooks with their gravitational wave detection data
analysis; by using array API support, the analysis was shown to also work with PyTorch.
The key problem
First stable release
Performance: scikit-learn LDA (PyTorch)
Image credit: https://labs.quansight.org/blog/array-api-support-scikit-learn, Thomas Fan
Performance: scipy.cluster (JAX)
Performance: Rotation transforms
Image credit: https://github.com/scipy/scipy/pull/23249, Martin Schuck
Pure Python rewrite of scipy.spatial.transform.Rotation can be orders of magnitude
faster with JAX & PyTorch than the original Cython code with NumPy.
Q: Sounds great, can I use this today?
A: Yes! Check docs for coverage, and use:
export SCIPY_ARRAY_API=1 (SciPy)
set_config(array_api_dispatch=True) (Scikit-learn)
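Putting both switches above into one script might look like this (a sketch: `SCIPY_ARRAY_API` must be set before SciPy is first imported, and scikit-learn's flag can also be applied per-block via `sklearn.config_context`):

```python
import os

# SciPy: opt in to array API standard support (read at import time)
os.environ["SCIPY_ARRAY_API"] = "1"

import sklearn

# scikit-learn: enable array API dispatch globally
sklearn.set_config(array_api_dispatch=True)

# From here on, supported SciPy functions and scikit-learn estimators
# accept PyTorch/CuPy/JAX arrays and keep computation on the input's
# device, rather than converting everything to NumPy.
```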
What are we getting out of this effort?
● Performance! 🚀
● Composability / functionality.
E.g., functionality that was NumPy-only at first is now becoming available for other array
libraries. And it is now possible to write new pure Python libraries that are generic
across array libraries.
Who is helping make it possible?
The individuals really wanting to make this happen and driving the change
A large group of people throughout the community helping with
contributions to and testing of array & array-consuming libraries.
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
Free-threading
Python-level threading to unlock
major performance gains
Image credit: Unraveling Community Support For Free-Threaded Python (PyCon 2025), Lysandros Nikolaou & Nathan Goldbaum
Threading in Python
Default (with GIL) Python
Pure Python code: only a single thread
executes at any time
Native code in extension modules: parallel
execution only when releasing the GIL
Free-threaded Python
All pure Python and native code can
execute in parallel
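The difference can be sketched with CPU-bound pure-Python work in a thread pool: on a with-GIL build the four threads take turns, while on a free-threaded build (3.13t/3.14t) they run on separate cores:

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(n):
    # Deliberately CPU-bound pure-Python work: no C extension releases
    # the GIL here, so a with-GIL build cannot run these in parallel.
    count = 0
    for k in range(2, n):
        if all(k % d for d in range(2, int(k**0.5) + 1)):
            count += 1
    return count

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(count_primes, [10_000] * 4))

print(results)  # [1229, 1229, 1229, 1229]
```

On a free-threaded build the wall time approaches that of a single call; on a default build it is roughly four times longer.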
Image credit: https://py-free-threading.github.io/examples/mandelbrot-threads/, Nathan Goldbaum
Parallel scaling of calculation time for a Mandelbrot set, (800, 800) ndarray
Performance: Mandelbrot set
Image credit: https://github.com/numpy/numpy/pull/27896, Nathan Goldbaum
Performance of parallel operations on small arrays/lists
Performance: parallel ops on arrays
Image credit: https://labs.quansight.org/blog/scaling-asyncio-on-free-threaded-python, Kumar Aditya
Raw TCP performance on a Windows machine with 6 physical CPU cores
Performance: asyncio
Image credit: https://labs.quansight.org/blog/scaling-asyncio-on-free-threaded-python, Kumar Aditya
Scraping Hacker News stories with Beautiful Soup, aiohttp, asyncio.
Multi-threaded measurements are done with 8 threads on a 12-core machine.
Performance: web scraping
Image credit: https://developer.nvidia.com/blog/improved-data-loading-with-threads, Michał Szołucha & Rostan Tabet
nvImageCodec performance scales only with threads. When using processes, it
needs N copies of the CUDA context; switching between them has high overhead.
Performance: image decoding on GPU
Image credit: https://trent.me/articles/pytorch-and-python-free-threading, Trent Nelson
Performance: PyTorch GPT2 inference
Inference (text generation) on a single GPU, accessed from multiple threads. Performance is
measured in HTTP server inference requests per second. CPU: 20 physical cores. GPU: Tesla V100, 32 GB.
The GPT2 model used has 124M parameters.
Timeline
Timeline - ecosystem support
Image credit: https://hugovk.github.io/free-threaded-wheels, Hugo van Kemenade
Snapshots from 15 May 2025 and 26 Sep 2025: ecosystem support is growing quickly
Q: Sounds great, can I use this today?
A: It’s still early days. However, Python 3.14 comes out
in exactly one week and is much more stable than
3.13, and library support is growing fast -
give it a try soon!
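To check what you are running on (a sketch; `sys._is_gil_enabled` exists on CPython 3.13+, and note the GIL can re-enable itself at runtime if an incompatible extension module is imported):

```python
import sys
import sysconfig

# Was this interpreter built with free-threading support (a "t" build)?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded_build)

# Is the GIL actually disabled right now? (CPython 3.13+ only)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently disabled:", not sys._is_gil_enabled())
```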
What are we getting out of this effort?
● Parallel performance for a range of use cases. 🚀
● Some types of applications become much simpler to implement.
E.g., GUI applications, which today spend a lot of effort working around the GIL to
keep the UI responsive.
Who is helping make it possible?
The individuals really wanting to make this happen and driving the change:
Sam Gross in particular
A large group of people throughout the community helping with direct
contributions to CPython, and with adapting their own packages
The companies and funders who sponsored a significant fraction of the
work by some of the folks doing the heaviest lifting:
$ uv init --name demo-pydata-paris --python 3.14t
$ uv add numpy scipy jax matplotlib pooch joblib array-api-compat array-api-extra
$ uv run jupyter lab
Demo - putting it all together
Code: https://github.com/rgommers/demo-pydata-paris
Big ideas:
they take individual champions, broad
community buy-in, and significant funding -
and are worth pursuing
