2. Black boxes are bad. As physicists, our goal is to
understand everything about the experiment as much as
possible. That includes (especially!) the analysis.
3. Goals for the next slides
• Key words - some you’ve heard before, some
perhaps not
• Explanation
• Practical examples
WARNING: may be
a bit pedantic
4. What I’m not going to talk
about
• The very basics… Gaussian, Poisson, Binomial are
all probably concepts you understand at least at
some level.
• Any sort of derivations
• A deep discussion of probability (e.g. Bayesian vs.
Frequentist)
5. Bread and butter
• Minimization:
• Maximum Likelihood
• χ2 - fits
• (Markov Chain Monte Carlo) - fitting models with
Bayesian statistics and MC methods
6. Maximum Likelihood
• ‘Maximize your likelihood function’
• (Actually, you typically minimize the -log L)
• You may choose a binning, but you don’t have to (so-called ‘unbinned’ ML fits). This is
important when e.g. a binning choice may bias your fits.
• The probability functions depend upon the underlying statistics; for counting data they are
normally Poisson functions. Note: you can of course ‘multiply’ the likelihood function with
external information… this is how you build in external constraints.
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
N: number of points (bins)
f: probability for seeing x_n given θ
x_n: nth data point
θ: set of parameters
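As a minimal sketch of the unbinned -log L above, assuming a Gaussian f with parameters θ = (mean, sigma): the toy data set (mean 50, sigma 10, 5000 events), the seed and the function names are assumptions made for illustration, not part of the original slides. For a Gaussian the minimum is known in closed form, so no minimizer is needed here.

```python
import math
import random

def gauss_pdf(x, mean, sigma):
    """f(theta, x): Gaussian probability density for one data point."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def nll(data, mean, sigma):
    """-log L = -sum_n log f(theta, x_n); no binning is involved."""
    return -sum(math.log(gauss_pdf(x, mean, sigma)) for x in data)

# Toy data (assumed values, chosen only for illustration)
random.seed(1)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]

# For a Gaussian, the parameters minimizing -log L are the sample mean
# and the (1/N) sample standard deviation.
mean_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / len(data))
nll_min = nll(data, mean_hat, sigma_hat)

print("mean = %.3f, sigma = %.3f, -log L at minimum = %.1f" % (mean_hat, sigma_hat, nll_min))
```

Moving either parameter away from these values increases the -log L, which is all a minimizer exploits.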
7. χ2 fits
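$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
Assume f is Gaussian and take the -log L:
$$L = \prod_{n=1}^{N} C\, e^{-[x_n - y(\theta, x_n)]^2 / 2\sigma_n^2}$$
$$-\log L = \sum_{n=1}^{N} \frac{(x_n - y(\theta, x_n))^2}{2\sigma_n^2} + \text{const}$$
This is χ2 modulo a factor 2 (the constant comes from C and does not depend on θ).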
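The relation above can be checked numerically. This is a small sketch with made-up numbers (the measurements, errors and the two model predictions are all hypothetical): for Gaussian f, the difference between -log L and χ2/2 is the same for any model prediction, so both functions have their minimum at the same θ.

```python
import math

def gauss_nll(values, predictions, sigmas):
    """-log L for independent Gaussian measurements, constant included."""
    return sum(
        0.5 * ((v - p) / s) ** 2 + math.log(s * math.sqrt(2.0 * math.pi))
        for v, p, s in zip(values, predictions, sigmas)
    )

def chi2(values, predictions, sigmas):
    """chi^2 = sum of squared, error-normalized residuals."""
    return sum(((v - p) / s) ** 2 for v, p, s in zip(values, predictions, sigmas))

# Hypothetical measurements with errors, and two different model predictions
values = [1.1, 1.9, 3.2, 3.8]
sigmas = [0.1, 0.2, 0.2, 0.3]
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.2, 2.1, 2.9, 3.5]

offset_a = gauss_nll(values, model_a, sigmas) - 0.5 * chi2(values, model_a, sigmas)
offset_b = gauss_nll(values, model_b, sigmas) - 0.5 * chi2(values, model_b, sigmas)
print(offset_a, offset_b)  # identical up to rounding: the constant term
```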
8. χ2 fits
$$\chi^2 = \sum_{n=1}^{N} \frac{(x_n - y(\theta, x_n))^2}{\sigma_n^2}$$
• This is a χ2 function. It assumes that your underlying statistics are
Gaussian. If this is not true (e.g. your statistics are low) your
results may not be correct.
• You must choose a binning, which can open you up to binning biases
• This has some neat features, including a built-in “Goodness-of-fit”
(more on this later)
9. Ok, so how do I use this?
• Typically, you don’t have to build your own χ2 or -log
L function; e.g. ROOT or RooFit will do this for you.
But sometimes you have to build your own.
• You minimize this function using a general-purpose
minimizer. ROOT uses MIGRAD, which is
part of the MINUIT2 suite. It is generally very robust.
• (The following uses the output from ROOT, but
other minimizers should generally give you similar
output.)
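For the case where you do have to build your own function, here is a rough sketch of the workflow: write the -log L, then hand it to a minimizer. A simple golden-section search stands in for MIGRAD here; it is not what ROOT does internally. The Poisson counts and all function names are hypothetical.

```python
import math

def poisson_nll(mu, counts):
    """-log L for Poisson counts with common mean mu (constant log(n!) dropped)."""
    return sum(mu - n * math.log(mu) for n in counts)

def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Minimize a one-dimensional unimodal function on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)

counts = [3, 7, 4, 6, 5, 8, 2, 5]  # hypothetical observed counts
mu_hat = golden_section_minimize(lambda mu: poisson_nll(mu, counts), 0.1, 20.0)
print("fitted mu = %.4f, sample mean = %.4f" % (mu_hat, sum(counts) / len(counts)))
```

For this simple model the result can be cross-checked by hand, since the ML estimate of a Poisson mean is the sample mean.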
10. Examples: ML
Note: the fit was unbinned, but obviously the plotting needs a choice of bins.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-18938.8 FROM HESSE STATUS=OK 16 CALLS 65 TOTAL
EDM=5.46776e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
 EXT PARAMETER                              INTERNAL      INTERNAL
 NO.  NAME    VALUE         ERROR         STEP SIZE     VALUE
  1   mean    4.99743e+01   1.42546e-01   2.70887e-03   4.99743e+01
  2   num     5.00000e+03   7.07084e+01   5.37579e-05   6.02076e-07
  3   sigma   1.00794e+01   1.00805e-01   3.83045e-04   1.00794e+01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
  2.032e-02   5.071e-07   2.719e-06
  5.071e-07   5.000e+03  -1.795e-06
  2.719e-06  -1.795e-06   1.016e-02
PARAMETER CORRELATION COEFFICIENTS
 NO.  GLOBAL    1       2       3
  1   0.00019   1.000   0.000   0.000
  2   0.00000   0.000   1.000  -0.000
  3   0.00019   0.000  -0.000   1.000
11. Examples: χ2
Fit has 100 bins
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=67.5572 FROM MIGRAD STATUS=CONVERGED 43 CALLS 44 TOTAL
EDM=5.17307e-07 STRATEGY= 1 ERROR MATRIX ACCURATE
 EXT PARAMETER                              STEP          FIRST
 NO.  NAME    VALUE         ERROR         SIZE          DERIVATIVE
  1   mean    5.00336e+01   1.43872e-01   5.80851e-04   4.58652e-03
  2   num     5.03375e+03   7.09462e+01   5.73799e-05  -5.16551e-02
  3   sigma   1.02075e+01   9.98379e-02   4.02277e-04  -2.48082e-03
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=1
  2.070e-02   3.374e-04  -2.595e-05
  3.374e-04   5.034e+03   1.873e-04
 -2.595e-05   1.873e-04   9.968e-03
PARAMETER CORRELATION COEFFICIENTS
 NO.  GLOBAL    1       2       3
  1   0.00181   1.000   0.000  -0.002
  2   0.00004   0.000   1.000   0.000
  3   0.00181  -0.002   0.000   1.000
12. Some comments
• The value of the minimized -log L is arbitrary (only
differences in -log L carry meaning). In
contrast, the minimized χ2 gives you information
(goodness-of-fit, more later)
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=67.5572 FROM MIGRAD STATUS=CONVERGED 43 CALLS 44 TOTAL
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-18938.8 FROM HESSE STATUS=OK 16 CALLS 65 TOTAL
13. Some comments
• The correlation matrix (which is just a normalized
covariance matrix) gives you information about the
interaction of your parameters. Values close to ±1
(e.g. a magnitude of 0.95 and above) can be a cause for concern!
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00019 1.000 0.000 0.000
2 0.00000 0.000 1.000 -0.000
3 0.00019 0.000 -0.000 1.000
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00181 1.000 0.000 -0.002
2 0.00004 0.000 1.000 0.000
3 0.00181 -0.002 0.000 1.000
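To make "normalized covariance matrix" concrete, this short sketch recomputes the parameter errors and the correlation coefficients from the EXTERNAL ERROR MATRIX of the χ2 example. The matrix entries are copied from that output; the variable names are assumptions.

```python
import math

names = ["mean", "num", "sigma"]

# Covariance (error) matrix from the chi2 example output
cov = [
    [2.070e-02, 3.374e-04, -2.595e-05],
    [3.374e-04, 5.034e+03, 1.873e-04],
    [-2.595e-05, 1.873e-04, 9.968e-03],
]

# Parameter errors are the square roots of the diagonal elements
errors = [math.sqrt(cov[i][i]) for i in range(3)]

# Correlation coefficient: rho_ij = V_ij / (sigma_i * sigma_j)
corr = [[cov[i][j] / (errors[i] * errors[j]) for j in range(3)] for i in range(3)]

for name, err, row in zip(names, errors, corr):
    print("%-6s error = %.5e  corr = %s" % (name, err, ["%.3f" % c for c in row]))
```

The errors agree with the ERROR column of the fit output to the precision of the printed matrix, and the mean-sigma coefficient rounds to the -0.002 shown in the correlation table.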
14. Some comments
• What about the errors on the parameters and all
that information?
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 mean 4.99743e+01 1.42546e-01 2.70887e-03 4.99743e+01
2 num 5.00000e+03 7.07084e+01 5.37579e-05 6.02076e-07
3 sigma 1.00794e+01 1.00805e-01 3.83045e-04 1.00794e+01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 mean 5.00336e+01 1.43872e-01 5.80851e-04 4.58652e-03
2 num 5.03375e+03 7.09462e+01 5.73799e-05 -5.16551e-02
3 sigma 1.02075e+01 9.98379e-02 4.02277e-04 -2.48082e-03
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=1
15. Errors on parameters
• The errors are defined by those parameter values which increase the χ2 (-log
L) function from its minimum by 1 (0.5). (This was the
“ERR DEF” seen on the previous slides.)
• If your measurement error bars are wrong, the errors
on your parameters will be wrong, too!
• Most of this is derived from the curvature of the -log L, or
χ2, function: the second derivative matrix is calculated
and inverted at the minimum… (this always assumes the
shape at the minimum is parabolic). These errors are
not always appropriate, or accurate.
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
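A small sketch of this error definition, assuming a Gaussian with known width so that the -log L is exactly parabolic in the mean. The toy data, the seed and the helper name `upper_crossing` are hypothetical. The error found by scanning for the 0.5 crossing is compared with the curvature-based value σ/√N.

```python
import math
import random

random.seed(2)
sigma = 10.0  # assumed known measurement width
data = [random.gauss(50.0, sigma) for _ in range(400)]

def nll(mean):
    """-log L as a function of the mean (constants dropped)."""
    return sum(0.5 * ((x - mean) / sigma) ** 2 for x in data)

mean_hat = sum(data) / len(data)
nll_min = nll(mean_hat)

def upper_crossing(err_def, step=5.0, tol=1e-9):
    """Bisect for the point above the minimum where -log L has risen by err_def.

    Assumes the crossing lies within `step` of the minimum.
    """
    lo, hi = mean_hat, mean_hat + step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nll(mid) - nll_min < err_def:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - mean_hat

err_scan = upper_crossing(0.5)                # ERR DEF = 0.5 for -log L
err_curv = sigma / math.sqrt(len(data))       # from the second derivative N / sigma^2
print("scan error = %.5f, curvature error = %.5f" % (err_scan, err_curv))
```

The two agree here only because the function is exactly parabolic; for a non-parabolic -log L the scan and the curvature estimate can differ.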
16. Covariance Matrix (or Error Matrix)
$$V_{xy} = E[(x - \mu_x)(y - \mu_y)]$$
(This is shown later as Σ.)
Note: this is typically estimated during the
minimization process and is related to the curvature
of the -log L/χ2 function with respect to the
parameters
17. Covariance Matrix (or Error Matrix)
It is the inverse of the second derivative matrix:
$$(V^{-1})_{xy} = -\left.\frac{\partial^2 \log L}{\partial x\,\partial y}\right|_{x=\hat{x},\,y=\hat{y}}$$
evaluated at the minimum. (To convince yourself of this,
consider that L should have the form of a multivariate
Gaussian, i.e. -log L is parabolic, at/near the minimum.)
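The inversion can be tried out numerically. This is a sketch under stated assumptions: a Gaussian model with parameters (mean, sigma), toy data, and a finite-difference second derivative standing in for what HESSE computes (the helper names are hypothetical). For this model the expected result is V = diag(σ²/N, σ²/2N).

```python
import math
import random

random.seed(3)
data = [random.gauss(50.0, 10.0) for _ in range(2000)]
n = len(data)

def nll(mean, sigma):
    """-log L for a Gaussian model (constant terms dropped)."""
    return sum(0.5 * ((x - mean) / sigma) ** 2 for x in data) + n * math.log(sigma)

mean_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / n)
best = [mean_hat, sigma_hat]

def second_derivative(func, point, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 func / dp_i dp_j at `point`."""
    def at(si, sj):
        p = list(point)
        p[i] += si * h
        p[j] += sj * h
        return func(*p)
    return (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4.0 * h * h)

# Second derivative matrix of -log L at the minimum
hess = [[second_derivative(nll, best, i, j) for j in range(2)] for i in range(2)]

# Invert the 2x2 matrix to obtain the covariance matrix
det = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
cov = [[hess[1][1] / det, -hess[0][1] / det],
       [-hess[1][0] / det, hess[0][0] / det]]

print("error on mean  = %.4f" % math.sqrt(cov[0][0]))
print("error on sigma = %.4f" % math.sqrt(cov[1][1]))
```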
18. Profiling
• Procedure: fix the parameter(s) of interest, and minimize the -log L (or χ2)
with respect to all other parameters. Step the parameter(s) of interest to the
next value and repeat.
• This finds the minimum contour for the parameter(s) of
interest.
• See also: likelihood ratio, profile likelihood scan, chi-square profile.
(On the plot: this is a -log L function, so the 1σ error is where the curve
rises 0.5 above its minimum.)
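The procedure can be sketched as follows, assuming a Gaussian model where the mean is the parameter of interest and sigma is profiled out. For this model the inner minimization over sigma has a closed form, which is used here in place of a numerical minimizer. The toy data, the scan range and all names are hypothetical.

```python
import math
import random

random.seed(4)
data = [random.gauss(50.0, 10.0) for _ in range(1000)]
n = len(data)

def nll(mean, sigma):
    """-log L for a Gaussian model (constant terms dropped)."""
    return sum(0.5 * ((x - mean) / sigma) ** 2 for x in data) + n * math.log(sigma)

def profile_nll(mean):
    """Fix the mean, then minimize -log L over sigma.

    For a Gaussian the best sigma at fixed mean is known analytically;
    in general this step is a full minimization over the other parameters.
    """
    sigma_best = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return nll(mean, sigma_best)

mean_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / n)
nll_min = profile_nll(mean_hat)

# Scan the parameter of interest and record the rise above the minimum
step = 0.05
scan = [(mean_hat + k * step, profile_nll(mean_hat + k * step) - nll_min)
        for k in range(-20, 21)]

# The 1 sigma interval is where the profile rises by less than 0.5
inside = [m for m, delta in scan if delta <= 0.5]
print("1 sigma interval: [%.2f, %.2f]" % (min(inside), max(inside)))
```

The interval is only as precise as the scan step; a finer scan or an interpolation of the crossing point would sharpen it.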
19. A 2-D example of a profile scan
[Figure: 2D profile likelihood surface. Axes: “Rn in air gap” and “U in HFE”, each from 500 to 5000; color scale: ProfileNLL, from 0 to 50.]
FIG. 29. 2D profile likelihood surface as a function of different contribution of U-like components
Shows correlation of two parameters, as well as the
combined error.
20. How do I tell if the fit is ok?
• Goodness of fit!
• χ2, this is built in. (Note, you should also quote the χ2
and the number of degrees of freedom, never only
the reduced chi-sq). Also, as a shortcut, usually we
think χ2/NDF ~ 1 is “good”, but remember that larger
deviations from 1 are expected for small NDF, and only
small deviations are tolerated for large NDFs.
• You should use a chi-squared distribution lookup to get
the appropriate probability for a given χ2, NDF value pair.
e.g. ROOT: TMath::Prob(chi2, ndf)
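As an illustration of such a lookup without ROOT, here is a pure-Python sketch of the upper-tail χ2 probability, based on the series expansion of the incomplete gamma function. The function name `chi2_prob` is hypothetical; it is a stand-in for, not a copy of, TMath::Prob. It is applied to the χ2 example (χ2 = 67.5572), assuming NDF = 100 bins - 3 parameters = 97, i.e. that all bins entered the fit.

```python
import math

def chi2_prob(chi2, ndf):
    """Probability of observing a chi2 at least this large for ndf degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = ndf / 2 and x = chi2 / 2; adequate for moderate chi2.
    """
    a = 0.5 * ndf
    x = 0.5 * chi2
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    for k in range(1, 100000):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    p_lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - p_lower

p_value = chi2_prob(67.5572, 97)  # NDF = 97 is an assumption, see above
print("chi2/NDF = %.3f, probability = %.4f" % (67.5572 / 97, p_value))
```

A probability very close to 1 is also worth a second look: it can indicate overestimated measurement errors.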
21. How do I tell if the fit is ok?
Chi-Sq PDF with different NDFs:
(The minimum of a χ2 function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
22. How do I tell if the fit is ok?
Chi-Sq CDF with different NDFs:
(The minimum of a χ2 function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
23. How do I tell if the fit is ok?
• For ML fits, you can also calculate a chi-square
(but you must first choose a binning).
• Other “goodness of fit” tests, or rather “tests to see
if the data are derived from your fit model”:
Kolmogorov-Smirnov test.
• https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
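A minimal sketch of the Kolmogorov-Smirnov statistic for unbinned data, assuming a Gaussian model with fixed parameters. The toy data and the two test models are hypothetical. The statistic D is the largest distance between the empirical CDF of the data and the model CDF; converting D into a probability is not shown here.

```python
import math
import random

def gauss_cdf(x, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of `data` and the model `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

random.seed(5)
data = [random.gauss(50.0, 10.0) for _ in range(1000)]

d_good = ks_statistic(data, lambda x: gauss_cdf(x, 50.0, 10.0))  # true model
d_bad = ks_statistic(data, lambda x: gauss_cdf(x, 55.0, 10.0))   # shifted model
print("D (true model) = %.3f, D (shifted model) = %.3f" % (d_good, d_bad))
```

Note that the standard KS probabilities assume the model was not fitted to the same data being tested.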
24. Adding external information
How do I incorporate other measurements, systematic errors?
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
• You’ll hear talk of adding “penalty functions” or “weighting functions” to your
-log L, or χ2, function
• Really, all that’s being done is a multiplication of the likelihood with an
additional probability.
• The exact form used depends on your problem; a Gaussian is a typical
choice
25. Adding external information
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
Multivariate Gaussian. Here, Σ is a positive definite
(symmetric) covariance matrix
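For reference, a sketch of what the constrained likelihood looks like when the additional probability is the standard k-dimensional Gaussian. The symbols are assumptions made here: x denotes the vector of constrained parameters (a subset of θ), μ their expected values and k their number; only Σ is named on the slide.

```latex
L_{\text{constrained}}
  = \left[\prod_{n=1}^{N} f(\theta, x_n)\right]
    \times \frac{1}{\sqrt{(2\pi)^{k}\,|\Sigma|}}
    \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right)
```

Taking -log of the Gaussian factor gives, up to a constant, the quadratic form ½ (x-μ)ᵀ Σ⁻¹ (x-μ) that is added to the -log L.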
27. Adding external information
This means your -log L function gets an additional
term added (ρ is the correlation between parameters):
$$\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y}\right]$$
(for χ2, this term multiplied by 2 is added)
28. Adding external information
For just a single parameter (or a set of uncorrelated
parameters), one just adds
$$\frac{1}{2}\,\frac{(x-\mu_x)^2}{\sigma_x^2}$$
x: parameter of interest
μ_x: expected (measured?) value
σ_x: (measured) error on this value
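A minimal sketch of adding this single-parameter term to a -log L, assuming Gaussian data with known width and a hypothetical external measurement (mu0 = 52.0 ± 0.5); all names and numbers are made up for illustration. In this all-Gaussian case the constrained minimum is simply the inverse-variance weighted average of the two pieces of information, which gives a cross-check.

```python
import math
import random

random.seed(6)
sigma = 10.0  # assumed known width of the data
data = [random.gauss(50.0, sigma) for _ in range(100)]

mu0, sigma0 = 52.0, 0.5  # hypothetical external measurement and its error

def nll_data(mean):
    """-log L from the data alone (constants dropped)."""
    return sum(0.5 * ((x - mean) / sigma) ** 2 for x in data)

def penalty(mean):
    """Gaussian constraint term: (1/2) (x - mu_x)^2 / sigma_x^2."""
    return 0.5 * ((mean - mu0) / sigma0) ** 2

def nll_total(mean):
    """Multiplying likelihoods means adding the -log L terms."""
    return nll_data(mean) + penalty(mean)

# Cross-check: inverse-variance weighted average of data and constraint
mean_data = sum(data) / len(data)
w_data = len(data) / sigma ** 2
w_ext = 1.0 / sigma0 ** 2
mean_comb = (w_data * mean_data + w_ext * mu0) / (w_data + w_ext)
err_comb = math.sqrt(1.0 / (w_data + w_ext))

print("data only: %.3f, constrained: %.3f +- %.3f" % (mean_data, mean_comb, err_comb))
```

The constrained estimate is pulled toward the external value, and its error is smaller than either input error alone.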
29. That’s it, for now…
There’s a lot more to learn. If your program/fit gives you output
you don’t understand, try to understand it!
30. Some statistics books, papers, etc.
G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998
R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in
the Physical Sciences, Wiley, 1989
Ilya Narsky and Frank C. Porter, Statistical Analysis Techniques in
Particle Physics, Wiley, 2014.
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986
F. James, Statistical and Computational Methods in Experimental
Physics, 2nd ed., World Scientific, 2006
S. Brandt, Statistical and Computational Methods in Data
Analysis, Springer, New York, 1998 (with program library on CD)
J. Beringer et al. (Particle Data Group), Review of Particle Physics,
Phys. Rev. D86, 010001 (2012) ; see also pdg.lbl.gov sections on
probability, statistics, Monte Carlo
From: http://www.pp.rhul.ac.uk/~cowan/stat/stat_1.pdf
Have a look at this course series!
A python script with some of the ROOT/RooFit info:
https://gist.github.com/mgmarino/9c030c67072e4295d6ec