Histogramming
Henry Schreiner
October 17, 2019
Overview
• Part 1: Overview of histograms
▶ Components of a Histogram
▶ Histograms in Python
▶ Boost.Histogram in C++14
▶ Introducing: boost-histogram for Python
▶ Outlook, with hist and aghast
• Part 2: Hands-on with boost-histogram
1/26Henry Schreiner Histogramming October 17, 2019
What is a histogram?
• A histogram is a set of accumulators over data in ranges
▶ Usually continuous in Physics, could also be categories
▶ Accumulators often are a sum of values - can contain other components
• Input values are digitized by axes (AKA binnings)
▶ Categories
▶ Real values
▶ Variable sized bins (usually give edges)
▶ Regular binning (#bins, start, stop)
▶ May have special features (overflow, circular, etc.)
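The digitizing step above can be sketched in a few lines of plain Python. This is only an illustration of a regular axis with underflow and overflow bins; the function name `fill_regular` and the storage layout are assumptions made for this sketch, not part of any library discussed here.

```python
def fill_regular(values, nbins, start, stop):
    """Count values into `nbins` equal-width bins on [start, stop).

    Storage layout (an assumption of this sketch):
    [underflow, bin 0, ..., bin nbins-1, overflow]
    """
    counts = [0] * (nbins + 2)
    width = (stop - start) / nbins
    for x in values:
        if x < start:
            counts[0] += 1            # below the axis: underflow
        elif x >= stop:
            counts[-1] += 1           # at or above the upper edge: overflow
        else:
            # Digitize: arithmetic only, clamped against rounding at the top edge
            i = min(int((x - start) / width), nbins - 1)
            counts[i + 1] += 1
    return counts


counts = fill_regular([-0.5, 0.05, 0.15, 0.17, 0.95, 1.0, 2.0],
                      nbins=10, start=0.0, stop=1.0)
print(counts)
```

The accumulator here is a plain integer count per bin; a weighted or mean accumulator would replace the `+= 1` step.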
Histogram components
A histogram is a collection of 1+ axes and an accumulator.
Performance
• Variable axis: a list of edges is the most general, but requires a sorted search.
• Regular axis: regular spacing, so the bin index can be computed directly.
[Figure: a regular axis and a variable axis, each with optional underflow and overflow bins, feeding an accumulator]
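The performance difference between the two axis types comes down to how a value is turned into a bin index. A minimal sketch, assuming half-open bins and using hypothetical helper names (`variable_index`, `regular_index`): the variable axis needs a binary search over its sorted edges, while the regular axis needs only arithmetic.

```python
from bisect import bisect_right


def variable_index(x, edges):
    """Variable axis: binary search over sorted edges, O(log n).

    Returns -1 for underflow and len(edges) - 1 for overflow.
    """
    return bisect_right(edges, x) - 1


def regular_index(x, nbins, start, stop):
    """Regular axis: one subtraction, one division, one multiplication, O(1).

    Returns -1 for underflow and nbins for overflow.
    """
    if x < start:
        return -1
    if x >= stop:
        return nbins
    return min(int((x - start) / (stop - start) * nbins), nbins - 1)


edges = [1, 2, 3, 4, 5, 6]          # 5 bins, same binning as regular(5, 1, 6)
samples = [0.5, 1.0, 2.5, 3.99, 5.5, 6.0, 7.0]
for x in samples:
    print(x, variable_index(x, edges), regular_index(x, 5, 1.0, 6.0))
```

Both functions give the same index for these samples; only the cost of getting there differs.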
Histograms in (classic) PyROOT
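• 1D Regular
h = ROOT.TH1D("", "", 10, 0, 1)
h.FillN(len(arr), arr, np.ones_like(arr))  # count, values, weights
• 1D Variable
edges = np.array([1, 2, 3, 4, 5, 6], dtype="d")
h = ROOT.TH1D("", "", len(edges) - 1, edges)  # number of bins, then edges
h.FillN(len(arr), arr, np.ones_like(arr))
• 2D Regular
h = ROOT.TH2D("", "", 10, 0, 1, 20, 0, 2)
h.FillN(len(x), x, y, np.ones_like(x))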
Histogram in Numpy
• 1D Regular
bins, edges = np.histogram(arr, bins=10, range=(0,1))
• 1D Variable
bins, edges = np.histogram(arr, bins=(1,2,3,4,5,6))
• 2D regular
b, e1, e2 = np.histogram2d(x, y, bins=(10,20), range=((0,1),(0,2)))
Numpy Pros and Cons
Pros
• Comes with Numpy
• Good for interactive operations (auto
binning)
• Reasonably fast
• Density option, weight support too
Cons
• Manipulation of plain arrays
• One time fill
• 2D+ not optimized for regular binning
• 1D, 2D, and ND syntax variations
• MPL had to mimic: plt.hist
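The "one time fill" limitation can be worked around by hand when the edges are fixed up front: histogram each chunk separately and add the counts. A minimal sketch, where the chunked input and the choice of edges are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
# Pretend the data arrives in three separate chunks
chunks = [rng.normal(size=1000) for _ in range(3)]

# Fix the edges once so every chunk is binned identically
edges = np.linspace(-3, 3, 21)
total = np.zeros(len(edges) - 1, dtype=np.int64)

for chunk in chunks:
    counts, _ = np.histogram(chunk, bins=edges)
    total += counts  # plain array addition stands in for an iterative fill

# Same result as histogramming everything in one call
single_pass, _ = np.histogram(np.concatenate(chunks), bins=edges)
print(np.array_equal(total, single_pass))
```

This only works because the edges are shared; with automatic binning each chunk would pick its own edges and the counts could not be added.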
PyROOT Pros and Cons
Pros
• Full histogram object
• Iterative fill option
• Weights option
• Can track sum of weights too
Cons
• ROOT requirement (Conda-forge helps)
• Can be slow in Python (and C++)
• Poor interactive exploration
• Odd syntax, odd memory model
• Max 3D
Histogram Libraries
• Narrow focus: speed,
plotting, or language
• Many are abandoned
• Often issues with design,
backends, distribution
• No/little interaction
HistBook
Histogrammar
pygram11
rootplotlib
PyROOT
YODA
physt
fast-histogram
qhist
Vaex
hdrhistogram
multihist
matplotlib-hep
pyhistogram
histogram
SimpleHist
paida
theodoregoetz
numpy
Physt
• Histograms as objects
• Pure Python - Dropped Python 2 this
year :)
• Very slow fills (slower than numpy)
hist = histogram(heights)
hist.plot(show_values=True)
• Powerful plotting
• Easy conversion to Pandas and many
more (ROOT through uproot)
• Special histograms, like polar histograms
Figure 1: Physt example default plot
Fast-Histogram
• Exactly like numpy, but faster
▶ C kernel
▶ Takes advantage of regular binning
▶ Can be 20-25x faster for 2D histograms
▶ Missing some features / combinations
Figure 2: Fast Histogram 2d comparison with
Numpy
HistBook (archived)
The first Scikit-HEP library for histograms
• Designed for shared axis histogram collections
• Plotting with Vega-Lite
Now deprecated and in archive mode; functionality may return in Hist (see next slides).
>>> array = np.random.normal(0, 1, 1000000)
>>> histogram = Hist(bin("data", 10, -5, 5))
>>> histogram.fill(data=array)
>>> histogram.step("data").to(canvas)
Scikit-HEP Histogramming plan
• boost-histogram: Fast filling and manipulation (core library)
• hist: Simple analysis frontend
• aghast: Conversions between histogram libraries
• UHI: Unified Histogram Indexing: A way for histograms to be indexed cross-library
(boost-histogram and hist to begin with)
Core histogramming libraries: boost-histogram, ROOT
Universal adaptor: Aghast
Front ends (plotting, etc.): hist, mpl-hep, physt, others
Boost.Histogram C++14
• Multidimensional templated header-only histogram library: /boostorg/histogram
• Designed by Hans Dembinski, inspired by ROOT and GSL
Histogram
• Axes
• Storage
Axes types
• Regular, Circular
• Variable
• Integer
• Category
Storage
• Static
• Dynamic
Accumulator
• int, double, unlimited, ...
[Figure: a regular axis and a regular axis with log transform, each with optional underflow and overflow bins]
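A regular axis with a log transform can be pictured as a regular axis applied to log(x) instead of x. The sketch below is a plain-Python illustration of that idea under this assumption; the helper names `log_regular_index` and `log_regular_edges` are hypothetical and are not Boost.Histogram API.

```python
import math


def log_regular_index(x, nbins, start, stop):
    """Bin index on an axis whose bins are equally spaced in log(x).

    Returns -1 for underflow (including non-positive x) and nbins for overflow.
    """
    if x <= 0 or x < start:
        return -1
    if x >= stop:
        return nbins
    lo, hi = math.log(start), math.log(stop)
    # Same arithmetic as a regular axis, but in the transformed coordinate
    return min(int((math.log(x) - lo) / (hi - lo) * nbins), nbins - 1)


def log_regular_edges(nbins, start, stop):
    """Bin edges mapped back to data coordinates."""
    lo, hi = math.log(start), math.log(stop)
    return [math.exp(lo + (hi - lo) * i / nbins) for i in range(nbins + 1)]


# Three bins per decade boundary: roughly [1, 10), [10, 100), [100, 1000)
print(log_regular_edges(3, 1.0, 1000.0))
print([log_regular_index(x, 3, 1.0, 1000.0) for x in (0.5, 5, 50, 500, 1000)])
```

The lookup stays O(1): the transform adds one logarithm per value but no search.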
Boost.Histogram example
#include <boost/histogram.hpp>
#include <boost/histogram/ostream.hpp>
#include <iostream>
#include <random>

int main() {
    namespace bh = boost::histogram;
    // 1D histogram with 20 regular bins on [-3, 3)
    auto hist = bh::make_histogram(bh::axis::regular<>{20, -3, 3});
    std::default_random_engine eng;
    std::normal_distribution<double> dist(0, 1);
    // Fill with 10,000 normally distributed samples
    for (int n = 0; n < 10'000; ++n)
        hist(dist(eng));
    std::cout << hist << std::endl;
    return 0;
}
Boost.Histogram example (output)
histogram(regular(20, -3, 3, options=underflow | overflow))
+----------------------------------------------------------+
[-inf, -3) 9 | |
[ -3, -2.7) 19 |= |
[-2.7, -2.4) 36 |== |
[-2.4, -2.1) 110 |===== |
[-2.1, -1.8) 191 |========= |
[-1.8, -1.5) 275 |============= |
[-1.5, -1.2) 518 |========================= |
[-1.2, -0.9) 644 |=============================== |
[-0.9, -0.6) 914 |============================================ |
[-0.6, -0.3) 1107 |===================================================== |
[-0.3, 0) 1183 |========================================================= |
[ 0, 0.3) 1185 |========================================================= |
[ 0.3, 0.6) 1120 |====================================================== |
[ 0.6, 0.9) 874 |========================================== |
[ 0.9, 1.2) 663 |================================ |
[ 1.2, 1.5) 491 |======================== |
[ 1.5, 1.8) 322 |=============== |
[ 1.8, 2.1) 172 |======== |
[ 2.1, 2.4) 79 |==== |
[ 2.4, 2.7) 38 |== |
[ 2.7, 3) 28 |= |
[ 3, inf) 22 |= |
+----------------------------------------------------------+
boost-histogram: Python bindings
Design
• A histogram should be an object
• Manipulation and plotting should be easy
Performance
• Fast filling
• Compiled composable manipulations
Flexibility
• Axes options: sparse, growing, labels
• Storage: integers, weights, errors…
Distribution
• Easy to use anywhere, pip or conda
• Should have wheels, be easy to build, etc.
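The "weights, errors" storage option can be illustrated with a small accumulator that tracks both the sum of weights and the sum of squared weights, so an uncertainty can be derived later. This is a sketch of the general idea only; the class name `WeightedSum` and its fields are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class WeightedSum:
    """Per-bin accumulator: sum of weights and sum of squared weights."""

    value: float = 0.0      # sum of weights
    variance: float = 0.0   # sum of squared weights

    def fill(self, weight=1.0):
        self.value += weight
        self.variance += weight * weight

    def __add__(self, other):
        # Both components are plain sums, so two accumulators merge by addition
        return WeightedSum(self.value + other.value,
                           self.variance + other.variance)


acc = WeightedSum()
for w in [1.0, 2.0, 0.5]:
    acc.fill(w)

error = math.sqrt(acc.variance)  # usual uncertainty estimate for a weighted count
print(acc, error)
```

Because both components are sums, histograms built from such accumulators can still be added bin by bin.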
Intro to the Python bindings
• Boost.Histogram developed with Python in mind
• Original bindings based on Boost.Python
▶ Hard to build and distribute
▶ Somewhat limited
• New bindings: /scikit-hep/boost-histogram
▶ 0-dependency build (C++14 only)
▶ State-of-the-art PyBind11
Design • Flexibility • Speed • Distribution
Design
• 500+ unit tests run on Azure on Linux, macOS, and Windows
Resembles the original Boost.Histogram where possible, with changes where needed for Python
performance and idioms.
C++14
#include <boost/histogram.hpp>
namespace bh = boost::histogram;
auto hist = bh::make_histogram(
    bh::axis::regular<>{2, 0, 1, "x"},
    bh::axis::regular<>{4, 0, 1, "y"});
hist(.2, .3); // Fill will also be
hist(.4, .5); // available in 1.7.2
hist(.3, .2);
Python
import boost.histogram as bh
hist = bh.histogram(
bh.axis.regular(2, 0, 1, metadata="x"),
bh.axis.regular(4, 0, 1, metadata="y"))
hist.fill(
[.2, .4, .3],
[.3, .5, .2])
Design: Manipulations
Combine two histograms
hist1 + hist2
Scale a histogram
hist * 2.0
Sum a histogram's contents
hist.sum()
Access an axis
ax = hist.axis(0)
ax.edges # The edges array
ax.centers # Centers of bins
ax.widths # Width of each bin
Fill 2D histogram with values or arrays
hist.fill(x, y)
Convert contents to Numpy array
hist.view()
Convert to Numpy style histogram tuple
hist.to_numpy()
Pickle supported (multiprocessing)
pickle.dumps(hist, -1)
Copy/deepcopy supported
hist2 = copy.deepcopy(hist)
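To make the "histogram as an object" manipulations concrete, here is a toy pure-Python class that supports the same kind of operations (addition, scaling, sum, deepcopy). `ToyHist` is a hypothetical stand-in written for this sketch; it is not how boost-histogram is implemented and it ignores flow bins.

```python
import copy


class ToyHist:
    """Hypothetical 1D histogram: fixed edges plus one count per bin."""

    def __init__(self, edges, counts=None):
        self.edges = list(edges)
        nbins = len(self.edges) - 1
        self.counts = list(counts) if counts is not None else [0.0] * nbins

    def __add__(self, other):
        # Combining only makes sense when both histograms share the same axis
        if self.edges != other.edges:
            raise ValueError("histograms must share the same axis")
        return ToyHist(self.edges,
                       [a + b for a, b in zip(self.counts, other.counts)])

    def __mul__(self, factor):
        return ToyHist(self.edges, [c * factor for c in self.counts])

    def sum(self):
        return sum(self.counts)


h1 = ToyHist([0, 1, 2, 3], [1, 2, 3])
h2 = ToyHist([0, 1, 2, 3], [10, 20, 30])

combined = h1 + h2          # like hist1 + hist2
scaled = h1 * 2.0           # like hist * 2.0
clone = copy.deepcopy(h1)   # independent copy of edges and counts
print(combined.counts, scaled.counts, combined.sum())
```

Keeping the axis and the contents together in one object is what lets `+` check that the binnings match, something a bare array of counts cannot do.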
Unified Histogram Indexing (UHI)
The language here (bh.loc, etc.) is defined so that any library can provide it, hence
“Unified”.
Access
v = h[b] # Returns bin contents, indexed by bin number
v = h[bh.loc(b)] # Returns the bin containing the value
v = h[bh.underflow] # Underflow and overflow can be accessed with special tags
Setting
h[b] = v
h[bh.loc(b)] = v
h[bh.underflow] = v
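One way to picture how such tags could work is a class whose `__getitem__` and `__setitem__` translate each kind of key into a storage position. The sketch below is a toy under that assumption; `loc`, `underflow`, `overflow`, and `IndexedHist` are hypothetical names defined here, not the boost-histogram implementation.

```python
from bisect import bisect_right


class loc:
    """Hypothetical tag meaning 'the bin that contains this data value'."""

    def __init__(self, value):
        self.value = value


underflow = object()  # hypothetical sentinel tags for the flow bins
overflow = object()


class IndexedHist:
    def __init__(self, edges):
        self.edges = list(edges)
        # Storage layout: [underflow, bin 0, ..., bin n-1, overflow]
        self.storage = [0] * (len(self.edges) + 1)

    def _position(self, key):
        if key is underflow:
            return 0
        if key is overflow:
            return len(self.storage) - 1
        if isinstance(key, loc):
            # Data coordinates: search the edges; flow bins fall out naturally
            return bisect_right(self.edges, key.value)
        nbins = len(self.edges) - 1
        if not 0 <= key < nbins:
            raise IndexError("bin number out of range")
        return key + 1  # plain integers are bin numbers

    def __getitem__(self, key):
        return self.storage[self._position(key)]

    def __setitem__(self, key, value):
        self.storage[self._position(key)] = value


h = IndexedHist([0, 1, 2, 3])
h[1] = 5            # by bin number
h[loc(2.5)] = 7     # by data value: lands in bin 2, which spans [2, 3)
h[underflow] = 1    # by special tag
print(h[loc(1.5)], h[2], h[underflow], h[overflow])
```

Because the tags are ordinary objects, any library could define its own equivalents and accept them in `__getitem__`.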
Unified Histogram Indexing (UHI) (2)
h == h[:] # Slice over everything
h2 = h[a:b] # Slice of histogram (includes flow bins)
h2 = h[:b] # Leaving out endpoints is okay
h2 = h[bh.loc(v):] # Slices can be in data coordinates, too
h2 = h[::bh.project] # Sum an axis (name may change)
h2 = h[::bh.rebin(2)] # Modification operations (rebin)
h2 = h[a:b:bh.rebin(2)] # Modifications can combine with slices
h2 = h[a:b, ...] # Ellipsis works just like in Numpy
• Docs are here
• Description may move to a new repository
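The rebin and projection actions can be sketched on bare lists of counts. This is a simplified illustration: the helper names `rebin` and `project` are hypothetical, and flow bins are ignored here, whereas the slicing described above keeps them.

```python
def rebin(counts, factor):
    """Merge every `factor` adjacent bins by summing their contents."""
    if len(counts) % factor:
        raise ValueError("number of bins must be divisible by the rebin factor")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def project(counts_2d, axis):
    """Sum a 2D table of counts (a list of rows) over one axis."""
    if axis == 0:
        return [sum(col) for col in zip(*counts_2d)]  # sum over rows
    return [sum(row) for row in counts_2d]            # sum over columns


counts = [1, 2, 3, 4, 5, 6]
rebinned = rebin(counts, 2)        # in the spirit of h[::bh.rebin(2)]
sliced = rebin(counts[2:6], 2)     # in the spirit of h[a:b:bh.rebin(2)], a=2, b=6
summed = project([[1, 2], [3, 4]], axis=0)
print(rebinned, sliced, summed)
```

Putting the action in the step position of a slice reuses syntax Python already has, so no new method names are needed for each combination.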
Performance
• Factor of 2 faster than 1D regular binning in Numpy 1.17
▶ Currently no specialization, just a 1D regular fill
▶ Could be optimized further
• Factor of 6-10 faster than 2D regular binning in Numpy
Distribution
• We must provide excellent distribution.
▶ If anyone writes pip install boost-histogram and it fails, we have failed.
• Docker ManyLinux1 GCC 9.2: /scikit-hep/manylinuxgcc
• Used in /scikit-hep/iMinuit, see /scikit-hep/azure-wheel-helpers
Wheels
• manylinux1 32 and 64 bit, Py 2.7 &
3.5–3.7
• manylinux2010 64 bit, Py 2.7 & 3.5–3.8
• macOS 10.9+ 64 bit, Py 2.7 & 3.6–3.8
• Windows 32 and 64 bit, Py 2.7 & 3.6–3.7
Source
• SDist
• Build directly from GitHub
Conda
• conda-forge package planned
python -m pip install boost-histogram
# OR git+https://github.com/scikit-hep/boost-histogram.git@develop
Hist
hist is the ‘wrapper’ piece that does plotting and interacts with the rest of the ecosystem.
Plans
• Easy plotting adaptors (mpl-hep)
• Serialization formats via Aghast (ROOT, HDF5)
• Auto-multithreading
• Statistical functions (Like TEfficiency)
• Multihistograms (HistBook)
• Interaction with fitters (ZFit, GooFit, etc)
• Bayesian Blocks algorithm from Scikit-HEP
• Command line histograms for stream of numbers
Call for contributions
• What do you need?
• What do you want?
• What would you like?
Join in the development! This
should combine the best features
of other packages.
Aghast
Aghast is a histogramming library that does not fill histograms
and does not plot them.
• A memory format for histograms, like Apache Arrow
• Converts to and from other libraries
• Uses flatbuffers to hold histograms
• Indexing ideas inspired the UHI
Binnings
IntegerBinning • RegularBinning • HexagonalBinning • EdgesBinning • IrregularBinning •
CategoryBinning • SparseRegularBinning • FractionBinning • PredicateBinning •
VariationBinning
End of part 1
Now, we will go hands on with the first beta of boost-histogram!
Support
• Supported by IRIS-HEP, NSF OAC-1836650

More Related Content

What's hot

Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingHenry Schreiner
 
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingCHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingHenry Schreiner
 
ROOT 2018: iminuit and MINUIT2 Standalone
ROOT 2018: iminuit and MINUIT2 StandaloneROOT 2018: iminuit and MINUIT2 Standalone
ROOT 2018: iminuit and MINUIT2 StandaloneHenry Schreiner
 
RDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasRDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasHenry Schreiner
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decaysHenry Schreiner
 
PEARC17: Modernizing GooFit: A Case Study
PEARC17: Modernizing GooFit: A Case StudyPEARC17: Modernizing GooFit: A Case Study
PEARC17: Modernizing GooFit: A Case StudyHenry Schreiner
 
Massively Parallel Processing with Procedural Python (PyData London 2014)
Massively Parallel Processing with Procedural Python (PyData London 2014)Massively Parallel Processing with Procedural Python (PyData London 2014)
Massively Parallel Processing with Procedural Python (PyData London 2014)Ian Huston
 
Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Jianfeng Zhang
 
Mixing C++ & Python II: Pybind11
Mixing C++ & Python II: Pybind11Mixing C++ & Python II: Pybind11
Mixing C++ & Python II: Pybind11corehard_by
 
Pypy is-it-ready-for-production-the-sequel
Pypy is-it-ready-for-production-the-sequelPypy is-it-ready-for-production-the-sequel
Pypy is-it-ready-for-production-the-sequelMark Rees
 
GitRecruit final 1
GitRecruit final 1GitRecruit final 1
GitRecruit final 1Yinghan Fu
 
High scalable applications with Python
High scalable applications with PythonHigh scalable applications with Python
High scalable applications with PythonGiuseppe Broccolo
 
私は如何にして心配するのを止めてPyTorchを愛するようになったか
私は如何にして心配するのを止めてPyTorchを愛するようになったか私は如何にして心配するのを止めてPyTorchを愛するようになったか
私は如何にして心配するのを止めてPyTorchを愛するようになったかYuta Kashino
 
確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdwardYuta Kashino
 

What's hot (20)

Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meeting
 
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fittingCHEP 2018: A Python upgrade to the GooFit package for parallel fitting
CHEP 2018: A Python upgrade to the GooFit package for parallel fitting
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
ROOT 2018: iminuit and MINUIT2 Standalone
ROOT 2018: iminuit and MINUIT2 StandaloneROOT 2018: iminuit and MINUIT2 Standalone
ROOT 2018: iminuit and MINUIT2 Standalone
 
RDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and PandasRDM 2020: Python, Numpy, and Pandas
RDM 2020: Python, Numpy, and Pandas
 
2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays2019 IRIS-HEP AS workshop: Particles and decays
2019 IRIS-HEP AS workshop: Particles and decays
 
PEARC17: Modernizing GooFit: A Case Study
PEARC17: Modernizing GooFit: A Case StudyPEARC17: Modernizing GooFit: A Case Study
PEARC17: Modernizing GooFit: A Case Study
 
Massively Parallel Processing with Procedural Python (PyData London 2014)
Massively Parallel Processing with Procedural Python (PyData London 2014)Massively Parallel Processing with Procedural Python (PyData London 2014)
Massively Parallel Processing with Procedural Python (PyData London 2014)
 
Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud
 
Mixing C++ & Python II: Pybind11
Mixing C++ & Python II: Pybind11Mixing C++ & Python II: Pybind11
Mixing C++ & Python II: Pybind11
 
Substituting HDF5 tools with Python/H5py scripts
Substituting HDF5 tools with Python/H5py scriptsSubstituting HDF5 tools with Python/H5py scripts
Substituting HDF5 tools with Python/H5py scripts
 
Pypy is-it-ready-for-production-the-sequel
Pypy is-it-ready-for-production-the-sequelPypy is-it-ready-for-production-the-sequel
Pypy is-it-ready-for-production-the-sequel
 
Using HDF5 and Python: The H5py module
Using HDF5 and Python: The H5py moduleUsing HDF5 and Python: The H5py module
Using HDF5 and Python: The H5py module
 
HDF5 Tools
HDF5 ToolsHDF5 Tools
HDF5 Tools
 
GitRecruit final 1
GitRecruit final 1GitRecruit final 1
GitRecruit final 1
 
Adding CF Attributes to an HDF5 File
Adding CF Attributes to an HDF5 FileAdding CF Attributes to an HDF5 File
Adding CF Attributes to an HDF5 File
 
High scalable applications with Python
High scalable applications with PythonHigh scalable applications with Python
High scalable applications with Python
 
私は如何にして心配するのを止めてPyTorchを愛するようになったか
私は如何にして心配するのを止めてPyTorchを愛するようになったか私は如何にして心配するのを止めてPyTorchを愛するようになったか
私は如何にして心配するのを止めてPyTorchを愛するようになったか
 
Pydata2017 11-29
Pydata2017 11-29Pydata2017 11-29
Pydata2017 11-29
 
確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward
 

Similar to PyHEP 2019: Python Histogramming Packages

[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...NAVER D2
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?bzamecnik
 
C++ tutorial boost – 2013
C++ tutorial   boost – 2013C++ tutorial   boost – 2013
C++ tutorial boost – 2013Ratsietsi Mokete
 
Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Andrew Clegg
 
Beyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeBeyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeYuto Hayamizu
 
Web Data Engineering - A Technical Perspective on Web Archives
Web Data Engineering - A Technical Perspective on Web ArchivesWeb Data Engineering - A Technical Perspective on Web Archives
Web Data Engineering - A Technical Perspective on Web ArchivesHelge Holzmann
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Andrii Gakhov
 
Fletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAFletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAGanesan Narayanasamy
 
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...Piotr Przymus
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from PythonYung-Yu Chen
 
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)Ontico
 
Whippet: A new production embeddable garbage collector for Guile
Whippet: A new production embeddable garbage collector for GuileWhippet: A new production embeddable garbage collector for Guile
Whippet: A new production embeddable garbage collector for GuileIgalia
 
Graph operations in Git version control system
Graph operations in Git version control systemGraph operations in Git version control system
Graph operations in Git version control systemJakub Narębski
 
Python in geospatial analysis
Python in geospatial analysisPython in geospatial analysis
Python in geospatial analysisSakthivel R
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Christian Peel
 

Similar to PyHEP 2019: Python Histogramming Packages (20)

Pronto raster v3
Pronto raster v3Pronto raster v3
Pronto raster v3
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
doug
dougdoug
doug
 
HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?
 
C++ tutorial boost – 2013
C++ tutorial   boost – 2013C++ tutorial   boost – 2013
C++ tutorial boost – 2013
 
Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)
 
Beyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeBeyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To Code
 
Web Data Engineering - A Technical Perspective on Web Archives
Web Data Engineering - A Technical Perspective on Web ArchivesWeb Data Engineering - A Technical Perspective on Web Archives
Web Data Engineering - A Technical Perspective on Web Archives
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
Fletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAFletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGA
 
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
 
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)
Postgres в основе вашего дата-центра, Bruce Momjian (EnterpriseDB)
 
Whippet: A new production embeddable garbage collector for Guile
Whippet: A new production embeddable garbage collector for GuileWhippet: A new production embeddable garbage collector for Guile
Whippet: A new production embeddable garbage collector for Guile
 
Graph operations in Git version control system
Graph operations in Git version control systemGraph operations in Git version control system
Graph operations in Git version control system
 
Python in geospatial analysis
Python in geospatial analysisPython in geospatial analysis
Python in geospatial analysis
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
 

More from Henry Schreiner

Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Henry Schreiner
 
Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meetingHenry Schreiner
 
Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Henry Schreiner
 
Princeton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingPrinceton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingHenry Schreiner
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you neededHenry Schreiner
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...Henry Schreiner
 
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingPyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingHenry Schreiner
 
PyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsPyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsHenry Schreiner
 
boost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingboost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingHenry Schreiner
 
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHenry Schreiner
 
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHenry Schreiner
 
ACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexing2019 IML workshop: A hybrid deep learning approach to vertexing
2019 IML workshop: A hybrid deep learning approach to vertexingHenry Schreiner
 
LHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNsLHCb Computing Workshop 2018: PV finding with CNNs
LHCb Computing Workshop 2018: PV finding with CNNsHenry Schreiner
 

More from Henry Schreiner (18)

Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024
 
Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meeting
 
Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023
 
Princeton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingPrinceton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance Tooling
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you needed
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
 
SciPy 2022 Scikit-HEP



PyHEP 2019: Python Histogramming Packages

  • 2. Overview
    • Part 1: Overview of histograms
      ▶ Components of a Histogram
      ▶ Histograms in Python
      ▶ Boost.Histogram in C++14
      ▶ Introducing: boost-histogram for Python
      ▶ Outlook, with hist and aghast
    • Part 2: Hands-on with boost-histogram
  • 3. What is a histogram?
    • A histogram is a set of accumulators over data in ranges
      ▶ Usually continuous in physics; could also be categories
      ▶ Accumulators are often a sum of values, but can contain other components
    • Input values are digitized by axes (AKA binnings)
      ▶ Categories
      ▶ Real values
      ▶ Variable-sized bins (usually given as edges)
      ▶ Regular binning (#bins, start, stop)
      ▶ May have special features (overflow, circular, etc.)
  • 4. Histogram components
    A histogram is a collection of 1+ axes and an accumulator.
    Performance
    • Variable axis: a list of edges is the most general form, but requires a sorted search.
    • Regular axis: regular spacing, so the bin index can be computed directly from the value.
    [Figure: a regular axis and a variable axis, each with optional underflow and overflow bins; the axes feed an accumulator]
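The performance difference between the two axis types can be sketched in plain NumPy. This is only an illustration of the idea, not code from any of the libraries discussed; the helper names `regular_index` and `variable_index` are hypothetical, and the convention that -1 means underflow and `nbins` means overflow is an assumption made for this sketch.

```python
import numpy as np


def regular_index(x, nbins, start, stop):
    """Bin index on a regular axis: pure arithmetic, no search needed.

    Returns -1 for underflow and nbins for overflow (sketch convention).
    """
    x = np.asarray(x, dtype=float)
    idx = np.floor((x - start) / (stop - start) * nbins).astype(int)
    return np.clip(idx, -1, nbins)


def variable_index(x, edges):
    """Bin index on a variable axis: binary search in the sorted edges."""
    edges = np.asarray(edges, dtype=float)
    # side="right" makes each bin half-open: [edges[i], edges[i + 1])
    return np.searchsorted(edges, x, side="right") - 1


values = [-0.5, 0.05, 0.55, 0.999, 1.5]
print(regular_index(values, 10, 0.0, 1.0))            # 10 bins on [0, 1)
print(variable_index(values, [0.0, 0.1, 0.5, 1.0]))   # 3 bins of unequal width
```

The regular axis costs a fixed amount of arithmetic per value, while the variable axis needs a search whose cost grows with the logarithm of the number of edges.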
  • 5. Histograms in (classic) PyROOT
    (arr, x, and y are float64 NumPy arrays)
    • 1D Regular
      h = ROOT.TH1D("", "", 10, 0, 1)
      h.FillN(len(arr), arr, np.ones_like(arr))
    • 1D Variable
      h = ROOT.TH1D("", "", 5, np.array([1, 2, 3, 4, 5, 6], dtype="d"))
      h.FillN(len(arr), arr, np.ones_like(arr))
    • 2D Regular
      h = ROOT.TH2D("", "", 10, 0, 1, 20, 0, 2)
      h.FillN(len(x), x, y, np.ones_like(x))
  • 6. Histogram in Numpy
    • 1D Regular
      bins, edges = np.histogram(arr, bins=10, range=(0,1))
    • 1D Variable
      bins, edges = np.histogram(arr, bins=(1,2,3,4,5,6))
    • 2D Regular
      b, e1, e2 = np.histogram2d(x, y, bins=(10,20), range=((0,1),(0,2)))
  • 7. Numpy Pros and Cons
    Pros
    • Comes with Numpy
    • Good for interactive operations (auto binning)
    • Reasonably fast
    • Density option, weight support too
    Cons
    • Manipulation of plain arrays
    • One-time fill
    • 2D+ not optimized for regular binning
    • 1D, 2D, and ND syntax variations
    • MPL had to mimic: plt.hist
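The "one-time fill" limitation can be worked around by hand, which also shows why a histogram object with an iterative fill is more convenient. The sketch below assumes the data arrives in chunks (the `chunks` list is a stand-in generated for illustration) and accumulates the counts manually, using only `np.histogram` with explicit edges.

```python
import numpy as np

edges = np.linspace(0.0, 1.0, 11)                 # 10 regular bins on [0, 1]
counts = np.zeros(len(edges) - 1, dtype=np.int64)

rng = np.random.default_rng(42)
chunks = [rng.random(1000) for _ in range(5)]     # stand-in for data arriving in pieces

for chunk in chunks:
    # Passing the same explicit edges keeps every partial histogram compatible.
    partial, _ = np.histogram(chunk, bins=edges)
    counts += partial

# The accumulated counts match a single fill over all of the data.
all_at_once, _ = np.histogram(np.concatenate(chunks), bins=edges)
print(np.array_equal(counts, all_at_once))
```

The caller has to carry the edges and the counts around as separate plain arrays, which is the "manipulation of plain arrays" drawback listed above.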
  • 8. PyROOT Pros and Cons
    Pros
    • Full histogram object
    • Iterative fill option
    • Weights option
    • Can track sum of weights too
    Cons
    • ROOT requirement (Conda-forge helps)
    • Can be slow in Python (and C++)
    • Poor interactive exploration
    • Odd syntax, odd memory model
    • Max 3D
  • 9. Histogram Libraries
    • Narrow focus: speed, plotting, or language
    • Many are abandoned
    • Often issues with design, backends, distribution
    • No/little interaction
    Libraries shown: HistBook, Histogrammar, pygram11, rootplotlib, PyROOT, YODA, physt, fast-histogram, qhist, Vaex, hdrhistogram, multihist, matplotlib-hep, pyhistogram, histogram, SimpleHist, paida, theodoregoetz, numpy
  • 10. Physt
    • Histograms as objects
    • Pure Python - Dropped Python 2 this year :)
    • Very slow fills (slower than numpy)
    • Powerful plotting
    • Easy conversion to Pandas and many more (ROOT through uproot)
    • Special histograms, like polar histograms
      hist = histogram(heights)
      hist.plot(show_values=True)
    Figure 1: Physt example default plot
  • 11. Fast-Histogram
    • Exactly like numpy, but faster
      ▶ C kernel
      ▶ Takes advantage of regular binning
      ▶ Can be 20-25x faster for 2D histograms
      ▶ Missing some features / combinations
    Figure 2: Fast Histogram 2D comparison with Numpy
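What "taking advantage of regular binning" means can be sketched in NumPy: with equal-width bins, each value maps to its bin by arithmetic, so no edge array has to be searched. This is not fast-histogram's C kernel; `regular_hist2d` is a hypothetical helper written for this illustration, and it uses half-open ranges, so a value exactly on the upper edge is dropped (unlike `np.histogram2d`, which includes it).

```python
import numpy as np


def regular_hist2d(x, y, bins, ranges):
    """Count (x, y) pairs on a regular grid using index arithmetic only."""
    (nx, ny), ((x0, x1), (y0, y1)) = bins, ranges
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only values inside the half-open ranges [x0, x1) and [y0, y1).
    keep = (x >= x0) & (x < x1) & (y >= y0) & (y < y1)
    ix = ((x[keep] - x0) / (x1 - x0) * nx).astype(np.intp)
    iy = ((y[keep] - y0) / (y1 - y0) * ny).astype(np.intp)
    # Guard against floating-point rounding pushing an index up to nx or ny.
    ix = np.minimum(ix, nx - 1)
    iy = np.minimum(iy, ny - 1)
    # Flatten the 2D bin index and count occurrences in one pass.
    flat = np.bincount(ix * ny + iy, minlength=nx * ny)
    return flat.reshape(nx, ny)


rng = np.random.default_rng(0)
x = rng.random(10_000)          # values in [0, 1)
y = 2.0 * rng.random(10_000)    # values in [0, 2)

h = regular_hist2d(x, y, bins=(10, 20), ranges=((0, 1), (0, 2)))
ref, _, _ = np.histogram2d(x, y, bins=(10, 20), range=((0, 1), (0, 2)))
print(np.array_equal(h, ref))
```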
  • 12. HistBook (archived)
    The first Scikit-HEP library for histograms
    • Designed for shared axis histogram collections
    • Plotting with Vega-Lite
    Now deprecated and in archive mode; functionality may return in Hist (see next slides).
      >>> array = np.random.normal(0, 1, 1000000)
      >>> histogram = Hist(bin("data", 10, -5, 5))
      >>> histogram.fill(data=array)
      >>> histogram.step("data").to(canvas)
  • 13. Scikit-HEP Histogramming plan
    • boost-histogram: Fast filling and manipulation (core library)
    • hist: Simple analysis frontend
    • aghast: Conversions between histogram libraries
    • UHI: Unified Histogram Indexing: A way for histograms to be indexed cross-library (boost-histogram and hist to begin with)
    [Figure: three layers]
    • Core histogramming libraries: boost-histogram, ROOT
    • Universal adaptor: Aghast
    • Front ends (plotting, etc.): hist, mpl-hep, physt, others
  • 14. Boost.Histogram C++14
    • Multidimensional templated header-only histogram library: /boostorg/histogram
    • Designed by Hans Dembinski, inspired by ROOT and GSL
    Histogram
    • Axes
    • Storage
    Axes types
    • Regular, Circular
    • Variable
    • Integer
    • Category
    Storage (Static, Dynamic)
    [Figure: a regular axis and a regular axis with log transform, each with optional underflow and overflow bins; accumulator types: int, double, unlimited, ...]
  • 15. Boost.Histogram example
      #include <boost/histogram.hpp>
      #include <boost/histogram/ostream.hpp>
      #include <iostream>
      #include <random>

      int main() {
          namespace bh = boost::histogram;

          // One regular axis: 20 bins from -3 to 3
          auto hist = bh::make_histogram(bh::axis::regular<>{20, -3, 3});

          std::default_random_engine eng;
          std::normal_distribution<double> dist(0, 1);

          // Fill with 10,000 normally distributed values
          for(int n = 0; n < 10'000; ++n)
              hist(dist(eng));

          std::cout << hist << std::endl;
          return 0;
      }
  • 16. Boost.Histogram example (output)
      histogram(regular(20, -3, 3, options=underflow | overflow))
                        +----------------------------------------------------------+
      [-inf,   -3)    9 | |
      [  -3, -2.7)   19 |= |
      [-2.7, -2.4)   36 |== |
      [-2.4, -2.1)  110 |===== |
      [-2.1, -1.8)  191 |========= |
      [-1.8, -1.5)  275 |============= |
      [-1.5, -1.2)  518 |========================= |
      [-1.2, -0.9)  644 |=============================== |
      [-0.9, -0.6)  914 |============================================ |
      [-0.6, -0.3) 1107 |===================================================== |
      [-0.3,    0) 1183 |========================================================= |
      [   0,  0.3) 1185 |========================================================= |
      [ 0.3,  0.6) 1120 |====================================================== |
      [ 0.6,  0.9)  874 |========================================== |
      [ 0.9,  1.2)  663 |================================ |
      [ 1.2,  1.5)  491 |======================== |
      [ 1.5,  1.8)  322 |=============== |
      [ 1.8,  2.1)  172 |======== |
      [ 2.1,  2.4)   79 |==== |
      [ 2.4,  2.7)   38 |== |
      [ 2.7,    3)   28 |= |
      [   3,  inf)   22 |= |
                        +----------------------------------------------------------+
  • 17. boost-histogram: Python bindings
    Design
    • A histogram should be an object
    • Manipulation and plotting should be easy
    Performance
    • Fast filling
    • Compiled composable manipulations
    Flexibility
    • Axes options: sparse, growing, labels
    • Storage: integers, weights, errors…
    Distribution
    • Easy to use anywhere, pip or conda
    • Should have wheels, be easy to build, etc.
  • 18. Intro to the Python bindings
    • Boost.Histogram developed with Python in mind
    • Original bindings based on Boost::Python
      ▶ Hard to build and distribute
      ▶ Somewhat limited
    • New bindings: /scikit-hep/boost-histogram
      ▶ 0-dependency build (C++14 only)
      ▶ State-of-the-art PyBind11
    Goals: Design, Flexibility, Speed, Distribution
  • 19. Design
    • 500+ unit tests run on Azure on Linux, macOS, and Windows
    Resembles the original Boost.Histogram where possible, with changes where needed for Python performance and idioms.
    C++14
      #include <boost/histogram.hpp>
      namespace bh = boost::histogram;
      auto hist = bh::make_histogram(
          bh::axis::regular<>{2, 0, 1, "x"},
          bh::axis::regular<>{4, 0, 1, "y"});
      hist(.2, .3); // Fill will also be
      hist(.4, .5); // available in 1.7.2
      hist(.3, .2);
    Python
      import boost.histogram as bh
      hist = bh.histogram(
          bh.axis.regular(2, 0, 1, metadata="x"),
          bh.axis.regular(4, 0, 1, metadata="y"))
      hist.fill(
          [.2, .4, .3],
          [.3, .5, .2])
  • 20. Design: Manipulations
    • Combine two histograms
      hist1 + hist2
    • Scale a histogram
      hist * 2.0
    • Sum a histogram's contents
      hist.sum()
    • Access an axis
      ax = hist.axis(0)
      ax.edges    # The edges array
      ax.centers  # Centers of bins
      ax.widths   # Width of each bin
    • Fill 2D histogram with values or arrays
      hist.fill(x, y)
    • Convert contents to Numpy array
      hist.view()
    • Convert to Numpy style histogram tuple
      hist.to_numpy()
    • Pickle supported (multiprocessing)
      pickle.dumps(hist, -1)
    • Copy/deepcopy supported
      hist2 = copy.deepcopy(hist)
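For comparison, the same quantities can be derived by hand from a Numpy-style `(counts, edges)` tuple. The sketch below uses only NumPy and is not the boost-histogram API; it illustrates what the axis properties and arithmetic above stand for, assuming both histograms share identical edges.

```python
import numpy as np

counts, edges = np.histogram([0.1, 0.2, 0.2, 0.7, 0.9], bins=4, range=(0.0, 1.0))

centers = (edges[:-1] + edges[1:]) / 2   # midpoint of each bin
widths = np.diff(edges)                  # width of each bin

combined = counts + counts               # adding two histograms with the same axes
scaled = counts * 2.0                    # scaling the contents
total = counts.sum()                     # sum of the contents

print(counts, centers, widths, total)
```

With plain arrays, nothing stops two histograms with different edges from being added together; a histogram object can check that the axes match.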
  • 21. Unified Histogram Indexing (UHI)
    The language here (bh.loc, etc.) is defined in such a way that any library can provide it: “Unified”.
    Access
      v = h[b]             # Returns bin contents, indexed by bin number
      v = h[bh.loc(b)]     # Returns the bin containing the value
      v = h[bh.underflow]  # Underflow and overflow can be accessed with special tags
    Setting
      h[b] = v
      h[bh.loc(b)] = v
      h[bh.underflow] = v
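How a library could support tag-based indexing can be sketched with a toy class. Everything here is hypothetical and written for illustration: `loc` and `TinyHist` are not the boost-histogram implementation, and the toy has no flow bins, so values outside the axis raise an error.

```python
import numpy as np


class loc:
    """Hypothetical tag meaning 'the bin that contains this data value'."""

    def __init__(self, value):
        self.value = value


class TinyHist:
    """Toy 1D histogram without flow bins, only to show tag-based indexing."""

    def __init__(self, edges):
        self.edges = np.asarray(edges, dtype=float)
        self.counts = np.zeros(len(self.edges) - 1)

    def _index(self, key):
        if isinstance(key, loc):
            # Translate a data coordinate into a bin number.
            i = int(np.searchsorted(self.edges, key.value, side="right")) - 1
            if not 0 <= i < len(self.counts):
                raise IndexError(f"{key.value} is outside the axis")
            return i
        return key  # plain integers are already bin numbers

    def __getitem__(self, key):
        return self.counts[self._index(key)]

    def __setitem__(self, key, value):
        self.counts[self._index(key)] = value


h = TinyHist([0.0, 1.0, 2.0, 3.0])
h[loc(1.5)] = 4.0   # set by data coordinate: 1.5 falls in bin 1
h[0] = 1.0          # set by bin number
print(h[1], h[loc(0.2)])
```

Because the tag is just an object the histogram interprets, any library can define its own `loc` with the same meaning, which is the point of a unified indexing scheme.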
  • 22. Unified Histogram Indexing (UHI) (2)
      h == h[:]                # Slice over everything
      h2 = h[a:b]              # Slice of histogram (includes flow bins)
      h2 = h[:b]               # Leaving out endpoints is okay
      h2 = h[bh.loc(v):]       # Slices can be in data coordinates, too
      h2 = h[::bh.project]     # Sum an axis (name may change)
      h2 = h[::bh.rebin(2)]    # Modification operations (rebin)
      h2 = h[a:b:bh.rebin(2)]  # Modifications can combine with slices
      h2 = h[a:b, ...]         # Ellipsis works just like normal numpy
    • Docs are here
    • Description may move to a new repository
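The operations behind these slices (project, rebin, slicing in data coordinates) can be illustrated on a bare array of bin contents. This NumPy sketch only shows what each operation does to the contents; it is not how boost-histogram implements them, and it ignores flow bins for simplicity.

```python
import numpy as np

counts = np.arange(24).reshape(4, 6)   # toy contents: 4 x-bins by 6 y-bins
x_edges = np.linspace(0.0, 4.0, 5)     # x-axis edges: 0, 1, 2, 3, 4

# Project: summing over the y axis leaves a 1D histogram in x.
proj_x = counts.sum(axis=1)

# Rebin by 2 along y: merge each pair of neighbouring y-bins.
rebinned = counts.reshape(4, 3, 2).sum(axis=2)

# Slice in data coordinates: convert the values to bin numbers first.
lo = np.searchsorted(x_edges, 1.0, side="right") - 1
hi = np.searchsorted(x_edges, 3.0, side="right") - 1
sliced = counts[lo:hi]                 # x-bins covering [1, 3)

print(proj_x, rebinned.shape, sliced.shape)
```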
  • 23. Performance
    • Factor of 2 faster than 1D regular binning in Numpy 1.17
      ▶ Currently no specialization, just a 1D regular fill
      ▶ Could be optimized further
    • Factor of 6-10 faster than 2D regular binning in Numpy
  • 24. Distribution
    • We must provide excellent distribution.
      ▶ If anyone writes pip install boost-histogram and it fails, we have failed.
    • Docker ManyLinux1 GCC 9.2: /scikit-hep/manylinuxgcc
    • Used in /scikit-hep/iMinuit, see /scikit-hep/azure-wheel-helpers
    Wheels
    • manylinux1 32 and 64 bit, Py 2.7 & 3.5–3.7
    • manylinux2010 64 bit, Py 2.7 & 3.5–3.8
    • macOS 10.9+ 64 bit, Py 2.7 & 3.6–3.8
    • Windows 32 and 64 bit, Py 2.7 & 3.6–3.7
    Source
    • SDist
    • Build directly from GitHub
    Conda
    • conda-forge package planned
      python -m pip install boost-histogram
      # OR git+https://github.com/scikit-hep/boost-histogram.git@develop
  • 25. Hist
    hist is the ‘wrapper’ piece that does plotting and interacts with the rest of the ecosystem.
    Plans
    • Easy plotting adaptors (mpl-hep)
    • Serialization formats via Aghast (ROOT, HDF5)
    • Auto-multithreading
    • Statistical functions (like TEfficiency)
    • Multihistograms (HistBook)
    • Interaction with fitters (ZFit, GooFit, etc.)
    • Bayesian Blocks algorithm from Scikit-HEP
    • Command line histograms for stream of numbers
    Call for contributions
    • What do you need?
    • What do you want?
    • What would you like?
    Join in the development! This should combine the best features of other packages.
  • 26. Aghast
    Aghast is a histogramming library that does not fill histograms and does not plot them.
    • A memory format for histograms, like Apache Arrow
    • Converts to and from other libraries
    • Uses flatbuffers to hold histograms
    • Indexing ideas inspired the UHI
    Binnings
    • IntegerBinning
    • RegularBinning
    • HexagonalBinning
    • EdgesBinning
    • IrregularBinning
    • CategoryBinning
    • SparseRegularBinning
    • FractionBinning
    • PredicateBinning
    • VariationBinning
  • 27. End of part 1
    Now, we will go hands-on with the first beta of boost-histogram!
    Support
    • Supported by IRIS-HEP, NSF OAC-1836650