SlideShare a Scribd company logo
1 of 42
Download to read offline
Awkward Packaging:


building Scikit-HEP
Jim Pivarski, Eduardo Rodrigues, Henry Schreiner
July 14, 2022
SciPy '22
2
History of Python in HEP
(Brief)
Scikit-HEP
New languages: C++ & Python
3
Python 1.0
1994 1998 2002 2006 2010 2014 2018 2022
FORTRAN
ROOT


(C++)
PyROOT


(Official bindings)
ROOTPy (third party
pythonizations)
New PyROOT
Experiments start adopting


Python for configuration
(ROOT is both a toolkit and a file format)
ROOT: C++ toolkit (& interpreter)
4
Physicists only need to learn a single language


Python bindings introduced later as PyROOT
import ROOT


import numpy as np


file = ROOT.TFile("tree.root", "recreate")


tree = ROOT.TTree("name", "title")


px = np.zeros(1, dtype=float)


phi = np.zeros(1, dtype=float)


tree.Branch("px", px, "normal/D")


tree.Branch("phi", phi, "uniform/D")


for i in range(10000):


px[0] = ROOT.gRandom.Gaus(20,2)


phi[0] = ROOT.gRandom.Uniform(2*3.1416)


tree.Fill()


file.Write()


file.Close()
Creating memory for pointer
Assigning pointers to fill from
Write on file (tree is contained in file)
This is classic PyROOT!


More Pythonic methods exist now
Based on https://wiki.physik.uzh.ch/cms/root:pyroot_ttree
Python: a configuration language
5
C++


Components
Python


configuration
Compiled
Runtime
Experiments started driving their C++ applications with Python
Sneaking into analysis
6
More students were entering with Python knowledge


The Python data-science stack was growing
root-numpy & root-pandas


bridged the gap between ROOT & NumPy/Pandas
By 2015, most analysis work could be done in the Python data science stack!


A few domain specific things were “missing”
Data IO Fitting Jagged Data Vectors Particle info
Python analysis circa 2015
7
ROOT


data
PyROOT
C++ / ROOT
Python ecosystem
Python analysis circa 2015
8
ROOT


data
PyROOT


to HDF5
C++ / ROOT
Python DS
stack
Python ecosystem
Other
needs
Python analysis with Scikit-HEP
9
Python DS
stack
uproot
Other
needs
Python ecosystem
Beginnings of a Scikit
10
A new package
11
Eduardo Rodrigues created the Scikit-HEP org with the scikit-hep package in 2016
vectors units statistics
Toolkit
Initially all-in-one toolkit approach with topical modules
But this would change…
vectors units statistics
Toolset
vs.
Joining forces
12
He also created an organization and invited some other popular packages
iminuit
pyjet
numpythia
root-numpy
root-pandas
ROOTPy was winding down


We got several of the related packages
Several simulation bindings
And the standalone MINUIT binding


The most popular package to join
First major new package
13
import uproot


import numpy as np


rng = np.random.default_rng(12345)


px = rng.normal(20, 2, 10_000)


phi = rng.uniform(0, 2*np.pi, 10_000)


with uproot.create("tree.root") as f:


f["tree_name"] = {"px": px, "phi": phi}
Just* IO
Pure Python (pip install uproot)
Answer to a key need!
*: Missing functionality in the Python ecosystem became new packages (Awkward, Vector)
ROOT file reader (and then writer)
First new general package
14
V0: pure Python


V1+: Compiled
Regular array
Jagged structured data format


NumPy like manipulation


Assignable “behaviors”


Custom Numba support
Originally HEP focused,


now being developed for 6+ fields
First new general package
14
V0: pure Python


V1+: Compiled
Jagged array
Jagged structured data format


NumPy like manipulation


Assignable “behaviors”


Custom Numba support
Originally HEP focused,


now being developed for 6+ fields
Uptake (one expirement - CMS)
15
Use
of Scientific
Python
in
HEP
ROOT (C++ and PyROOT)
fi
Scienti c
Python
PyROOT
Use of Scikit-HEP packages
(as a baseline for scale)
CMSSW config
(Python but not data analysis)
Scikit-HEP
Building Scikit-HEP
16
Case study: Histograms
17
Development of reliable bindings
Development of reliable binary building tools
Development of a Protocol based ecosystem
Work developed here adapted to other packages
Bindings tools
18
Chose pybind11 over Cython / SWIG
Simpler build (no preprocessing)


Simpler distribution - no NumPy build dependence


Powerful generation capabilities via template meta programming
Integrated improvements upstream
Awkward 1.0 and iMinuit 2 rewrite followed
Building tools
19
Building redistributable wheels is not straight forward
Linux macOS Windows
Manylinux docker image Official download Anything
Python source
Post-process step Auditwheel Delocate Develwheel
Architectures 64/32/ARM/PPC/… Intel/AS/Universal 32/64/ARM
+ Testing, PyPy, Musllinux, …
First attempt: azure-wheel-helpers
20
Early attempt at shared infrastructure
Built using git subtree (before Azure templates & remote config support)
Adopted by:


Boost-histogram


awkward-array


pyjet


numpythia


iminuit
Covered by blog-post series


https://iscinumpy.dev/categories/azure-devops/
Great learning experience - but some problems:


Tied to one CI system


Mostly YAML files - poor testability / Code QA


Small number of users
Using cibuildwheel
21
🎡cibuildwheel
We merged our improvements to cibuildwheel & joined that project!


All projects moved (and now pyhepmc added too)
Great positives:


Not tied to a CI system (easy move to GHA)


Python package - great testability / code QA


Large number of users
Contributions from Scikit-HEP:


Static configuration in TOML


Static overrides


Direct SDist support


build (the tool) support


Automatic Python version limit detection


Better globbing support


Full static typing & more Code QA
From azure-wheel-helpers:


Better Windows support


VSC versioning support


Better PEP 518 support
And we helped get


cibuildwheel into the PyPA!
Protocol ecosystem
22
Boost::Histogram


C++ Boost library
boost-histogram


Python wrapper
Hist


Fully featured
histograms
mplhep


Matplotlib HEP
Histoprint


Terminal plotting
Uproot


ROOT file IO
How should these be related?
Protocol ecosystem
23
Boost::Histogram


C++ Boost library
boost-histogram


Python wrapper
Hist


Fully featured
histograms
mplhep


Matplotlib plots
Histoprint


Terminal plotting
Uproot


ROOT file IO
Static, not runtime dependency!
PlottableHistogram
UHI


Static Protocol
Defines expected behaviors for producers
Consumers expect these behaviors
Example of a Protocol
24
from typing import Protocol


class Duck(Protocol):


def quack() -> str: ...
Definition
class MyDuck:


def quack() -> str:


return "quack"


if typing.TYPE_CHECKING:


_: Duck = typing.cast(MyDuck, None)
Producer
def need_a_duck(duck: Duck) -> None:


print(duck.quack())
Consumer
No runtime dependence, unlike ABC!


All static type annotations


Validation by checker
Success of histograms
25
YODA
YODA
histograms
in Coffea
B
o
o
s
t
:
:
H
i
s
t
o
g
r
a
m
,
h
i
s
t
,
m
p
l
h
e
p
c
o
m
b
o
ROOT
mainstream Python adoption
in HEP: when many histogram
libraries lived and died
histograms
in rootpy
histogram part of ROOT
(395 C++ files)
All together: uproot-browser
26
Textual + Rich UI
uproot file
Hist computation
plotext rendering
Awkward data
Broader Ecosystem
27
numpythia
hepunits
histoprint
uhi
pyhepmc
pylhe
nndrone
 
Scikit-HEP


today
28
Core
Physics
Specifics
Domain
Affiliated
package
Many contributions upstream
29
We can all benefit!
Helping with maintaining several projects


Affiliated status helped
Upstreaming features and fixes
Scikit-HEP Developer Pages
30
Scikit-HEP developer pages
31
Universally useful advice for maintaining packages!
Scikit-HEP was growing quickly


How to keep quality high across all packages?


scikit-hep.org/developer!
Utilities


Cookie-cutter


Repo-review
Topics


Packaging (classic & simple)


Style guide (pre-commit)


Development setup


pytest


Static typing (MyPy)


GitHub Actions (3 pages)


Nox
Maintained by nox


& GitHub Actions
Guided our other packages, like Awkward
Style
33
Pre-commit hooks


Black


Check-Manifest (setuptools only)


Type checking


PyCln


Flake8


YesQA


isort


PyUpgrade


Setup.cfg format (setuptools only)


Spelling


PyGrep hooks


Prettier


Clang-format (C++ only)


Shellcheck (shell scripts only)


PyLint (noisy)
Modern (simple) packaging
34
pyproject.toml
[build-system]


requires = ["hatchling"]


build-backend = "hatchling.build"


[project]


name = "package"


version = "0.1.2"
No setup.py, setup.cfg, or MANIFEST.in required!
Cookiecutter
35
pipx run cookiecutter gh:scikit-hep/cookie
11 backends to pick from


Generation tested by nox


In sync with the developer pages
Setuptools


Setuptools PEP 621


Flit Hatch PDM


Poetry
Scikit-build


Setuptools C++


Maturin (Rust)
+
more!
Runs in WebAssembly locally!


(via Pyodide)
The Future
37
Scikit-build
38
CMake bridge for Python packaging
Year 1 plans:


Introduce scikit-build-core: modern PEP 517 backend


Support PEP 621 configuration


Support use as plugin (possibly via extesionlib)


Tighter CMake integration (config from PyPI packages)


Distutils-free code ready for Python 3.12
Year 2 plans:


Convert selected projects to Scikit-build
Year 3 plans:


Website, tutorials, outreach
https://iscinumpy.dev/post/scikit-build-proposal/
Scikit-build
38
CMake bridge for Python packaging
Year 1 plans:


Introduce scikit-build-core: modern PEP 517 backend


Support PEP 621 configuration


Support use as plugin (possibly via extesionlib)


Tighter CMake integration (config from PyPI packages)


Distutils-free code ready for Python 3.12
Year 2 plans:


Convert selected projects to Scikit-build
Year 3 plans:


Website, tutorials, outreach
Yesterday, NSF 2209877 was awarded!
https://iscinumpy.dev/post/scikit-build-proposal/
Web Assembly
39
Already have boost-histogram (pybind11 project) added to pyodide!
iminuit (better CMake support) next, then work on Awkward Array!
numpy.org
Questions?
40
• History of Python in HEP


• ROOT, PyROOT, Analysis


• Beginnings of a Scikit


• New package, joining forces


• Uproot, Awkward Array


• Building Scikit-HEP


• Histograms case study


• All together: uproot-browser


• Broader ecosystem


• Scikit-HEP Developer Pages


• The Future


• Scikit-build


• Web Assembly
https://scikit-hep.org
https://iscinumpy.dev

More Related Content

Similar to SciPy 2022 Scikit-HEP

First python project
First python projectFirst python project
First python projectNeetu Jain
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...Henry Schreiner
 
Behold the Power of Python
Behold the Power of PythonBehold the Power of Python
Behold the Power of PythonSarah Dutkiewicz
 
Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?takluyver
 
PyQt Application Development On Maemo
PyQt Application Development On MaemoPyQt Application Development On Maemo
PyQt Application Development On Maemoachipa
 
Python 101 for the .NET Developer
Python 101 for the .NET DeveloperPython 101 for the .NET Developer
Python 101 for the .NET DeveloperSarah Dutkiewicz
 
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat Pôle Systematic Paris-Region
 
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017Codemotion
 
Python Dependency Management - PyconDE 2018
Python Dependency Management - PyconDE 2018Python Dependency Management - PyconDE 2018
Python Dependency Management - PyconDE 2018Patrick Muehlbauer
 
Rust + python: lessons learnt from building a toy filesystem
Rust + python: lessons learnt from building a toy filesystemRust + python: lessons learnt from building a toy filesystem
Rust + python: lessons learnt from building a toy filesystemChengHui Weng
 
Intoroduction of py7zr
Intoroduction of py7zrIntoroduction of py7zr
Intoroduction of py7zrHiroshi Miura
 
Writing a Python C extension
Writing a Python C extensionWriting a Python C extension
Writing a Python C extensionSqreen
 
Intro to Pinax: Kickstarting Your Django Apps
Intro to Pinax: Kickstarting Your Django AppsIntro to Pinax: Kickstarting Your Django Apps
Intro to Pinax: Kickstarting Your Django AppsRoger Barnes
 
PyCourse - Self driving python course
PyCourse - Self driving python coursePyCourse - Self driving python course
PyCourse - Self driving python courseEran Shlomo
 
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingCHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingHenry Schreiner
 
Using SWIG to Control, Prototype, and Debug C Programs with Python
Using SWIG to Control, Prototype, and Debug C Programs with PythonUsing SWIG to Control, Prototype, and Debug C Programs with Python
Using SWIG to Control, Prototype, and Debug C Programs with PythonDavid Beazley (Dabeaz LLC)
 
PyCon Taiwan 2013 Tutorial
PyCon Taiwan 2013 TutorialPyCon Taiwan 2013 Tutorial
PyCon Taiwan 2013 TutorialJustin Lin
 

Similar to SciPy 2022 Scikit-HEP (20)

First python project
First python projectFirst python project
First python project
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
 
Behold the Power of Python
Behold the Power of PythonBehold the Power of Python
Behold the Power of Python
 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
 
Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
PyQt Application Development On Maemo
PyQt Application Development On MaemoPyQt Application Development On Maemo
PyQt Application Development On Maemo
 
Python 101 for the .NET Developer
Python 101 for the .NET DeveloperPython 101 for the .NET Developer
Python 101 for the .NET Developer
 
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
 
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017
Christian Strappazzon - Presentazione Python Milano - Codemotion Milano 2017
 
Python Dependency Management - PyconDE 2018
Python Dependency Management - PyconDE 2018Python Dependency Management - PyconDE 2018
Python Dependency Management - PyconDE 2018
 
Rust + python: lessons learnt from building a toy filesystem
Rust + python: lessons learnt from building a toy filesystemRust + python: lessons learnt from building a toy filesystem
Rust + python: lessons learnt from building a toy filesystem
 
Intoroduction of py7zr
Intoroduction of py7zrIntoroduction of py7zr
Intoroduction of py7zr
 
Writing a Python C extension
Writing a Python C extensionWriting a Python C extension
Writing a Python C extension
 
Intro to Pinax: Kickstarting Your Django Apps
Intro to Pinax: Kickstarting Your Django AppsIntro to Pinax: Kickstarting Your Django Apps
Intro to Pinax: Kickstarting Your Django Apps
 
PyCourse - Self driving python course
PyCourse - Self driving python coursePyCourse - Self driving python course
PyCourse - Self driving python course
 
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingCHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
 
Using SWIG to Control, Prototype, and Debug C Programs with Python
Using SWIG to Control, Prototype, and Debug C Programs with PythonUsing SWIG to Control, Prototype, and Debug C Programs with Python
Using SWIG to Control, Prototype, and Debug C Programs with Python
 
PyCon Taiwan 2013 Tutorial
PyCon Taiwan 2013 TutorialPyCon Taiwan 2013 Tutorial
PyCon Taiwan 2013 Tutorial
 
Python on pi
Python on piPython on pi
Python on pi
 

More from Henry Schreiner

Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meetingHenry Schreiner
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you neededHenry Schreiner
 
boost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingboost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingHenry Schreiner
 
RDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasRDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasHenry Schreiner
 
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHenry Schreiner
 
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHenry Schreiner
 
ACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and histHenry Schreiner
 
IRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistIRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistHenry Schreiner
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decaysHenry Schreiner
 
IRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapIRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapHenry Schreiner
 
PyHEP 2019: Python Histogramming Packages
PyHEP 2019: Python Histogramming PackagesPyHEP 2019: Python Histogramming Packages
PyHEP 2019: Python Histogramming PackagesHenry Schreiner
 
2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexing2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexingHenry Schreiner
 
CHEP 2019: Recent developments in histogram libraries
CHEP 2019: Recent developments in histogram librariesCHEP 2019: Recent developments in histogram libraries
CHEP 2019: Recent developments in histogram librariesHenry Schreiner
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitHenry Schreiner
 
LHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNsLHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNsHenry Schreiner
 

More from Henry Schreiner (20)

Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meeting
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you needed
 
boost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingboost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meeting
 
CMake best practices
CMake best practicesCMake best practices
CMake best practices
 
RDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasRDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and Pandas
 
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
 
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
 
ACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexing
 
2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing
 
2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist
 
IRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistIRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and Hist
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays
 
IRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapIRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram Roadmap
 
PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8
 
PyHEP 2019: Python Histogramming Packages
PyHEP 2019: Python Histogramming PackagesPyHEP 2019: Python Histogramming Packages
PyHEP 2019: Python Histogramming Packages
 
2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexing2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexing
 
CHEP 2019: Recent developments in histogram libraries
CHEP 2019: Recent developments in histogram librariesCHEP 2019: Recent developments in histogram libraries
CHEP 2019: Recent developments in histogram libraries
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFit
 
LHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNsLHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNs
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

SciPy 2022 Scikit-HEP

  • 1. Awkward Packaging: 
 building Scikit-HEP Jim Pivarski, Eduardo Rodrigues, Henry Schreiner July 14, 2022 SciPy '22
  • 2. 2 History of Python in HEP (Brief)
  • 3. Scikit-HEP New languages: C++ & Python 3 Python 1.0 1994 1998 2002 2006 2010 2014 2018 2022 FORTRAN ROOT (C++) PyROOT (Official bindings) ROOTPy (third party pythonizations) New PyROOT Experiments start adopting Python for configuration (ROOT is both a toolkit and a file format)
  • 4. ROOT: C++ toolkit (& interpreter) 4 Physicists only need to learn a single language Python bindings introduced later as PyROOT import ROOT import numpy as np file = ROOT.TFile("tree.root", "recreate") tree = ROOT.TTree("name", "title") px = np.zeros(1, dtype=float) phi = np.zeros(1, dtype=float) tree.Branch("px", px, "normal/D") tree.Branch("phi", phi, "uniform/D") for i in range(10000): px[0] = ROOT.gRandom.Gaus(20,2) phi[0] = ROOT.gRandom.Uniform(2*3.1416) tree.Fill() file.Write() file.Close() Creating memory for pointer Assigning pointers to fill from Write on file (tree is contained in file) This is classic PyROOT! More Pythonic methods exist now Based on https://wiki.physik.uzh.ch/cms/root:pyroot_ttree
  • 5. Python: a configuration language 5 C++ Components Python configuration Compiled Runtime Experiments started driving their C++ applications with Python
  • 6. Sneaking into analysis 6 More students were entering with Python knowledge The Python data-science stack was growing root-numpy & root-pandas bridged the gap between ROOT & NumPy/Pandas By 2015, most analysis work could be done in the Python data science stack! A few domain specific things were “missing” Data IO Fitting Jagged Data Vectors Particle info
  • 7. Python analysis circa 2015 7 ROOT data PyROOT C++ / ROOT Python ecosystem
  • 8. Python analysis circa 2015 8 ROOT data PyROOT to HDF5 C++ / ROOT Python DS stack Python ecosystem Other needs
  • 9. Python analysis with Scikit-HEP 9 Python DS stack uproot Other needs Python ecosystem
  • 10. Beginnings of a Scikit 10
  • 11. A new package 11 Eduardo Rodrigues created the Scikit-HEP org with the scikit-hep package in 2016 vectors units statistics Toolkit Initially all-in-one toolkit approach with topical modules But this would change… vectors units statistics Toolset vs.
  • 12. Joining forces 12 He also created an organization and invited some other popular packages iminuit pyjet numpythia root-numpy root-pandas ROOTPy was winding down We got several of the related packages Several simulation bindings And the standalone MINUIT binding The most popular package to join
  • 13. First major new package 13 import uproot import numpy as np rng = np.random.default_rng(12345) px = rng.normal(20, 2, 10_000) phi = rng.uniform(0, 2*np.pi, 10_000) with uproot.create("tree.root") as f: f["tree_name"] = {"px": px, "phi": phi} Just* IO Pure Python (pip install uproot) Answer to a key need! *: Missing functionality in the Python ecosystem became new packages (Awkward, Vector) ROOT file reader (and then writer)
  • 14. First new general package 14 V0: pure Python V1+: Compiled Regular array Jagged structured data format NumPy like manipulation Assignable “behaviors” Custom Numba support Originally HEP focused, now being developed for 6+ fields
  • 15. First new general package 14 V0: pure Python V1+: Compiled Jagged array Jagged structured data format NumPy like manipulation Assignable “behaviors” Custom Numba support Originally HEP focused, now being developed for 6+ fields
  • 16. Uptake (one expirement - CMS) 15 Use of Scientific Python in HEP ROOT (C++ and PyROOT) fi Scienti c Python PyROOT Use of Scikit-HEP packages (as a baseline for scale) CMSSW config (Python but not data analysis) Scikit-HEP
  • 18. Case study: Histograms 17 Development of reliable bindings Development of reliable binary building tools Development of a Protocol based ecosystem Work developed here adapted to other packages
  • 19. Bindings tools 18 Chose pybind11 over Cython / SWIG Simpler build (no preprocessing) Simpler distribution - no NumPy build dependence Powerful generation capabilities via template meta programming Integrated improvements upstream Awkward 1.0 and iMinuit 2 rewrite followed
  • 20. Building tools 19 Building redistributable wheels is not straight forward Linux macOS Windows Manylinux docker image Official download Anything Python source Post-process step Auditwheel Delocate Develwheel Architectures 64/32/ARM/PPC/… Intel/AS/Universal 32/64/ARM + Testing, PyPy, Musllinux, …
  • 21. First attempt: azure-wheel-helpers 20 Early attempt at shared infrastructure Built using git subtree (before Azure templates & remote config support) Adopted by: Boost-histogram awkward-array pyjet numpythia iminuit Covered by blog-post series https://iscinumpy.dev/categories/azure-devops/ Great learning experience - but some problems: Tied to one CI system Mostly YAML files - poor testability / Code QA Small number of users
  • 22. Using cibuildwheel 21 🎡cibuildwheel We merged our improvements to cibuildwheel & joined that project! All projects moved (and now pyhepmc added too) Great positives: Not tied to a CI system (easy move to GHA) Python package - great testability / code QA Large number of users Contributions from Scikit-HEP: Static configuration in TOML Static overrides Direct SDist support build (the tool) support Automatic Python version limit detection Better globbing support Full static typing & more Code QA From azure-wheel-helpers: Better Windows support VSC versioning support Better PEP 518 support And we helped get cibuildwheel into the PyPA!
  • 23. Protocol ecosystem 22 Boost::Histogram C++ Boost library boost-histogram Python wrapper Hist Fully featured histograms mplhep Matplotlib HEP Histoprint Terminal plotting Uproot ROOT file IO How should these be related?
  • 24. Protocol ecosystem 23 Boost::Histogram C++ Boost library boost-histogram Python wrapper Hist Fully featured histograms mplhep Matplotlib plots Histoprint Terminal plotting Uproot ROOT file IO Static, not runtime dependency! PlottableHistogram UHI Static Protocol Defines expected behaviors for producers Consumers expect these behaviors
  • 25. Example of a Protocol 24 from typing import Protocol class Duck(Protocol): def quack() -> str: ... Definition class MyDuck: def quack() -> str: return "quack" if typing.TYPE_CHECKING: _: Duck = typing.cast(MyDuck, None) Producer def need_a_duck(duck: Duck) -> None: print(duck.quack()) Consumer No runtime dependence, unlike ABC! All static type annotations Validation by checker
  • 26. Success of histograms 25 YODA YODA histograms in Coffea B o o s t : : H i s t o g r a m , h i s t , m p l h e p c o m b o ROOT mainstream Python adoption in HEP: when many histogram libraries lived and died histograms in rootpy histogram part of ROOT (395 C++ files)
  • 27. All together: uproot-browser 26 Textual + Rich UI uproot file Hist computation plotext rendering Awkward data
  • 30. Many contributions upstream 29 We can all benefit! Helping with maintaining several projects Affiliated status helped Upstreaming features and fixes
  • 32. Scikit-HEP developer pages 31 Universally useful advice for maintaining packages! Scikit-HEP was growing quickly How to keep quality high across all packages? scikit-hep.org/developer! Utilities Cookie-cutter Repo-review Topics Packaging (classic & simple) Style guide (pre-commit) Development setup pytest Static typing (MyPy) GitHub Actions (3 pages) Nox Maintained by nox & GitHub Actions Guided our other packages, like Awkward
  • 33.
  • 34. Style 33 Pre-commit hooks Black Check-Manifest (setuptools only) Type checking PyCln Flake8 YesQA isort PyUpgrade Setup.cfg format (setuptools only) Spelling PyGrep hooks Prettier Clang-format (C++ only) Shellcheck (shell scripts only) PyLint (noisy)
  • 35. Modern (simple) packaging 34 pyproject.toml [build-system] requires = ["hatchling"] build-backend = "hatchling.build" [project] name = "package" version = "0.1.2" No setup.py, setup.cfg, or MANIFEST.in required!
  • 36. Cookiecutter 35 pipx run cookiecutter gh:scikit-hep/cookie 11 backends to pick from Generation tested by nox In sync with the developer pages Setuptools Setuptools PEP 621 Flit Hatch PDM Poetry Scikit-build Setuptools C++ Maturin (Rust) + more!
  • 37. Runs in WebAssembly locally! 
 (via Pyodide)
  • 39. Scikit-build 38 CMake bridge for Python packaging Year 1 plans: Introduce scikit-build-core: modern PEP 517 backend Support PEP 621 configuration Support use as plugin (possibly via extesionlib) Tighter CMake integration (config from PyPI packages) Distutils-free code ready for Python 3.12 Year 2 plans: Convert selected projects to Scikit-build Year 3 plans: Website, tutorials, outreach https://iscinumpy.dev/post/scikit-build-proposal/
  • 40. Scikit-build 38 CMake bridge for Python packaging Year 1 plans: Introduce scikit-build-core: modern PEP 517 backend Support PEP 621 configuration Support use as plugin (possibly via extesionlib) Tighter CMake integration (config from PyPI packages) Distutils-free code ready for Python 3.12 Year 2 plans: Convert selected projects to Scikit-build Year 3 plans: Website, tutorials, outreach Yesterday, NSF 2209877 was awarded! https://iscinumpy.dev/post/scikit-build-proposal/
  • 41. Web Assembly 39 Already have boost-histogram (pybind11 project) added to pyodide! iminuit (better CMake support) next, then work on Awkward Array! numpy.org
  • 42. Questions? 40 • History of Python in HEP • ROOT, PyROOT, Analysis • Beginnings of a Scikit • New package, joining forces • Uproot, Awkward Array • Building Scikit-HEP • Histograms case study • All together: uproot-browser • Broader ecosystem • Scikit-HEP Developer Pages • The Future • Scikit-build • Web Assembly https://scikit-hep.org https://iscinumpy.dev