PDF uncertainties at the LHC made easy: a compression algorithm for the combination of PDF sets

Juan Rojo
STFC Rutherford Fellow
Rudolf Peierls Centre for Theoretical Physics
University of Oxford

In collaboration with S. Carrazza, J. I. Latorre and G. Watt

PDF4LHC Meeting
CERN, 21/01/2015

2
Motivation
To provide a practical implementation of the PDF4LHC recommendation that is easy for the experiments to use and computationally less intensive than the original prescription.
Based on the Monte Carlo statistical combination of different PDF sets, followed by a compression algorithm that yields a reduced number of replicas.
Similar in spirit to the Meta-PDF approach (Gao and Nadolsky 14), but with important conceptual and practical differences.
Having a single combined PDF set (even with a large number of eigenvectors/sets) would already simplify life for many people, since widely used tools like MadGraph5_aMC@NLO, POWHEG or FEWZ provide the PDF uncertainties without any additional cost.
But this is not true for all theory tools used at the LHC, so there is still a strong motivation to be able to use a combined PDF set with a small number of eigenvectors/replicas.
All results here are obtained for fixed alphas(MZ)=0.118; adding the combined PDF+alphas uncertainty in quadrature (updated PDF4LHC recommendation) is trivial in our approach.
In addition, the compression algorithm can also be used on native MC sets, like NNPDF, starting from a 1000-replica sample and reducing it to a smaller sample while reproducing all statistical estimators.

3
Basic strategy (I)
Select the PDF sets that enter the combination. Results here are based on NNPDF3.0, CT10 and MMHT14, but any other choice is possible.
Transform the Hessian PDF sets into their Monte Carlo representation (Watt and Thorne 12).
Now combine the same number of replicas from each of the three sets (assuming equal weight in the combination). First proposed by Forte 12.
The resulting Monte Carlo ensemble has a robust statistical interpretation, and in many cases leads to similar results, with somewhat smaller uncertainties, compared to the original PDF4LHC envelope (Forte and Watt 13).
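
The two steps above (Hessian to Monte Carlo, then equal-weight pooling) can be sketched with toy numbers. This is a minimal illustration, not the authors' code: it assumes the symmetric Hessian-to-MC formula, in which each replica is the central value plus Gaussian random numbers times half the difference of the plus and minus eigenvector members. The function names, array shapes and toy inputs are all hypothetical.

```python
import numpy as np

def hessian_to_mc(central, plus, minus, n_rep, rng):
    """Generate MC replicas from a Hessian set (symmetric formula, assumed here).

    central: (n_points,) central PDF values
    plus, minus: (n_eig, n_points) values along the + and - eigenvector directions
    """
    n_eig = plus.shape[0]
    r = rng.standard_normal((n_rep, n_eig))   # one Gaussian number per replica and eigenvector
    half_diff = 0.5 * (plus - minus)          # (n_eig, n_points)
    return central + r @ half_diff            # (n_rep, n_points)

def combine(sets):
    """Pool the same number of replicas from each set (equal weights)."""
    n = min(s.shape[0] for s in sets)
    return np.concatenate([s[:n] for s in sets], axis=0)

rng = np.random.default_rng(0)
n_points, n_eig = 5, 4
central = np.linspace(1.0, 2.0, n_points)
shifts = 0.02 * rng.standard_normal((n_eig, n_points))   # toy eigenvector displacements
hessian_replicas = hessian_to_mc(central, central + shifts, central - shifts, 100, rng)
native_replicas = central + 0.03 * rng.standard_normal((100, n_points))  # toy native MC set
combined = combine([hessian_replicas, native_replicas])
```

In a real application the arrays would hold PDF values read from the LHAPDF grids on a grid of flavours and x points; here they are random stand-ins.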

4
The combined PDF set
Since there is reasonable agreement between CT10, MMHT14 and NNPDF3.0, the resulting combined distribution is in general Gaussian, but there are also important cases where the non-Gaussianity of the combined PDFs is substantial.
[Figure, left panel: x * PDF versus x at Q = 1.4142 GeV for NNPDF30_nnlo_as_0118, MMHT2014nnlo68cl_rand1002, CT10nnlo_rand1004 and MCcompPDFnnlo. Right panel: histogram (probability per bin) of the gluon PDF at x=0.1, Q=100 GeV for NNPDF3.0, CT10, MMHT14 and MCcompPDFs; statistics box: Entries 100, Mean 0.8887, RMS 0.0151.]
It is possible to obtain smoother distributions by increasing the number of replicas for each set, but this does not seem to be required by phenomenology.
For typical applications, using Nrep=100 for each of the three PDF sets is enough.
Note that, in general, the combination of Gaussian distributions is not itself Gaussian.
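
The last point can be checked numerically with a toy example. The sketch below pools three Gaussian samples with different (hypothetical) means and equal widths, and compares the skewness and excess kurtosis of one input with those of the pooled sample. The means, widths and sample size are illustrative assumptions, not values from the PDF sets.

```python
import numpy as np

def skewness(x):
    """Third standardised moment of a sample."""
    d = x - x.mean()
    return (d**3).mean() / x.std()**3

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (zero for a Gaussian)."""
    d = x - x.mean()
    return (d**4).mean() / x.std()**4 - 3.0

rng = np.random.default_rng(1)
n_rep = 100_000                       # large toy sample so the estimators are stable
a = rng.normal(0.88, 0.010, n_rep)    # three toy Gaussian "sets" with hypothetical means
b = rng.normal(0.90, 0.010, n_rep)
c = rng.normal(0.95, 0.010, n_rep)
mixture = np.concatenate([a, b, c])   # equal-weight combination

print("single set:", skewness(a), excess_kurtosis(a))
print("combination:", skewness(mixture), excess_kurtosis(mixture))
```

Each input is compatible with zero skewness, while the pooled sample is clearly skewed because one set sits further from the other two.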

5
Combined MC set vs PDF4LHC envelope
As already noted in the Forte-Watt study, the MC combination leads to somewhat smaller uncertainties than the PDF4LHC envelope (the same holds here and for the Meta-PDFs).
This can be understood because now each PDF set receives the same weight, while the PDF4LHC envelope effectively gives more weight to the outliers.
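
A toy comparison makes the weighting argument concrete. The sketch assumes a simplified envelope (the span of the one-sigma bands of all sets, with the midpoint as central value) and compares it with the mean and standard deviation of the pooled replicas. The cross-section values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# toy cross-sections (pb) for three hypothetical sets: two agree, one is an outlier
sets = [rng.normal(mu, 0.5, 100) for mu in (42.0, 42.2, 43.5)]

# simplified envelope: span of (mean - sigma, mean + sigma) over all sets
lo = min(s.mean() - s.std() for s in sets)
hi = max(s.mean() + s.std() for s in sets)
env_central, env_unc = 0.5 * (hi + lo), 0.5 * (hi - lo)

# MC combination: pool replicas with equal weight
pooled = np.concatenate(sets)
mc_central, mc_unc = pooled.mean(), pooled.std()

print(f"envelope:       {env_central:.2f} +- {env_unc:.2f}")
print(f"MC combination: {mc_central:.2f} +- {mc_unc:.2f}")
```

The envelope is driven entirely by the extreme sets, whereas in the pooled sample the outlier contributes only one third of the replicas.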
[Figure: four panels comparing NNPDF3.0, MMHT14, CT10, CMCPDF and the PDF4LHC envelope at LHC 13 TeV with alphas=0.118. Panels: ggH cross-section (pb) with ggHiggs at NNLO; ttbar cross-section (pb) with top++ at NNLO; W+ cross-section (nb) with VRAP at NNLO; Z0 cross-section (nb) with VRAP at NNLO.]

6
Compression as a mathematical problem
The goal now is to compress the combined set of Nrep=300 replicas to a smaller subset, in such a way that this subset reproduces the statistical properties of the original distribution.
Mathematically, this is a well-defined problem: compression means finding the subset that minimises the distance between two probability distributions.
Many equally good minimisations are possible, so the choice of minimisation algorithm is not crucial (similar to the travelling salesman problem).
Mathematically well-posed problem, with a number of robust solutions:
1. Kolmogorov distance
2. Kullback-Leibler entropy
3. …
The optimal choice is determined by the requirements of the problem at hand, in this case LHC phenomenology.
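
The two distances named above can be estimated from samples as follows. This is a sketch at a single (flavour, x, Q) point with toy Gaussian data. The histogram binning used for the Kullback-Leibler estimate, and the small floor that avoids log(0), are choices made here for illustration and are not taken from the slides.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """Maximum gap between the empirical CDFs of two 1D samples."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return float(np.abs(cdf_a - cdf_b).max())

def kl_divergence(a, b, bins=20):
    """Histogram estimate of KL(a || b) on a common binning (bin choice is an assumption)."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p = np.histogram(a, bins=edges)[0] + 1e-9   # small floor avoids log(0)
    q = np.histogram(b, bins=edges)[0] + 1e-9
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(3)
original = rng.normal(0.0, 1.0, 300)   # toy stand-in for 300 replicas at one point
subset = original[:40]                 # an arbitrary 40-replica subset
shifted = rng.normal(3.0, 1.0, 40)     # a deliberately poor "compression"

print(kolmogorov_distance(subset, original), kolmogorov_distance(shifted, original))
print(kl_divergence(subset, original), kl_divergence(shifted, original))
```

Both measures are zero when the two samples coincide and grow as the candidate subset drifts away from the original distribution.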

7
Basic strategy (II)
Now we have a single combined set, but the number of MC replicas is still too large.
Compress the original probability distribution to one with a smaller number of replicas, in such a way that all the relevant estimators (mean, variances, correlations, etc.) for the PDFs are reproduced.
The compression is applied at Q = 2 GeV, though the results are robust with respect to other choices.
There are various options for how the error function to be minimised can be defined, e.g. to reproduce central values add a term [Equation: central-value term of the error function].
Same for variances, correlations and higher moments.
The algorithm also minimises the Kolmogorov distance between the original and compressed distributions.
In the end, the optimal choice is decided by the resulting phenomenology.
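
The selection step can be sketched as a subset search. The slides do not specify the minimiser, so the example swaps in a plain random-swap search that changes one selected replica at a time and keeps only improvements. The figure of merit (squared relative mismatch of means and standard deviations) is likewise a simplified assumption, and all names and toy data are hypothetical.

```python
import numpy as np

def error_function(subset, full):
    """Toy figure of merit: squared relative mismatch of means and standard deviations."""
    cv = ((subset.mean(axis=0) - full.mean(axis=0)) / full.mean(axis=0)) ** 2
    sd = ((subset.std(axis=0) - full.std(axis=0)) / full.std(axis=0)) ** 2
    return float(cv.sum() + sd.sum())

def compress(full, n_keep, n_iter, rng):
    """Random-swap search: replace one selected replica at a time, keep only improvements."""
    n_rep = full.shape[0]
    chosen = rng.choice(n_rep, size=n_keep, replace=False)
    initial = best = error_function(full[chosen], full)
    for _ in range(n_iter):
        trial = chosen.copy()
        candidates = np.setdiff1d(np.arange(n_rep), chosen)   # replicas not yet selected
        trial[rng.integers(n_keep)] = rng.choice(candidates)
        err = error_function(full[trial], full)
        if err < best:
            chosen, best = trial, err
    return chosen, initial, best

rng = np.random.default_rng(4)
full = 1.0 + 0.1 * rng.standard_normal((300, 10))   # toy prior: 300 replicas on 10 points
chosen, initial_err, final_err = compress(full, n_keep=40, n_iter=2000, rng=rng)
print(initial_err, final_err)
```

Because only improving swaps are accepted, the final error can never exceed that of the random starting subset. Further terms (correlations, higher moments, Kolmogorov distance) would be added to `error_function` in the same way.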

8
Results of the compression
To gauge the improvements due to compression, compare the various contributions to the error function in the best compression and in random selections with the same number of replicas.
Substantial improvements compared to random compressions, typically by one order of magnitude or more.
The compression is also able to successfully reproduce higher moments like skewness or kurtosis.
Similar improvements for the correlations and the Kolmogorov distances.
Horizontal dashed line: lower limit of the 68% CL range for random compressions with Nrep=100.
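
The random-compression baseline can be estimated by drawing many random subsets and taking percentiles of the resulting error. The sketch below does this for a central-value term only, on toy data. The 16th and 84th percentiles are used as the 68% CL range, which is an assumption about how the band in the plot is defined.

```python
import numpy as np

def cv_error(subset, full):
    """Squared relative mismatch of the means, summed over points."""
    return float((((subset.mean(axis=0) - full.mean(axis=0)) / full.mean(axis=0)) ** 2).sum())

rng = np.random.default_rng(5)
full = 1.0 + 0.1 * rng.standard_normal((300, 10))   # toy prior: 300 replicas on 10 points
n_keep, n_trials = 40, 1000
errors = np.array([
    cv_error(full[rng.choice(300, size=n_keep, replace=False)], full)
    for _ in range(n_trials)
])
lower, median, upper = np.percentile(errors, [16, 50, 84])   # central 68% range
print(lower, median, upper)
```

A compressed set is then judged by how far its error lies below `lower`.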

9
Results of the compression
For example, for Nrep=40 replicas the compressed and the original PDFs are virtually identical.
[Figure: four panels showing the gluon and up-quark PDFs as ratios to the prior versus x at Q = 100 GeV, comparing the prior (300 MC replicas) with the compressed set (legend: 40 MC replicas).]
As expected, for a very small number of replicas (10 in this case) agreement is much worse

10
The compressed PDF set
Since there is reasonable agreement between CT10, MMHT14 and NNPDF3.0, the resulting combined distribution looks typically Gaussian.
On average, the same number of replicas from each of the three sets is selected in the compressed set, a further demonstration that the algorithm is unbiased.
[Figure: CMC-PDF NLO, 25-replica distribution. Entries (0 or 1) versus replica index (0 to 300), showing which replicas of the combined set are selected: 9 from NNPDF3.0, 8 from CT10 and 8 from MMHT14.]

11
Gaussian vs non-Gaussian
Even if the original PDF sets in the combination are approximately Gaussian, their combination will in general be non-Gaussian, and linear propagation might not be adequate.
Working in the Gaussian approximation might not be reliable: e.g. skewness is not reproduced in the compression (even though central values and variances are) unless we explicitly include it in the minimised figure of merit.
[Figure: two panels, with skewness included in and excluded from the figure of merit.]

12
Compressing native MC sets
The compression algorithm can of course also be used on native MC sets, like NNPDF. We have shown that, starting from NNPDF3.0 with Nrep=1000 replicas, we can compress down to 40-50 replicas while maintaining all relevant statistical properties.
Central values and variances are well reproduced, and so, non-trivially, are higher moments and correlations.
Sets with Nrep=1000 replicas are still useful for other applications, like Bayesian reweighting.
All plots done with the APFEL Web plotter.

13
Phenomenology
The ultimate validation is of course to check that the compressed set reproduces the original PDF combination for a wide variety of LHC observables.
We have tested a very large number of processes, both at the inclusive and differential level, and always found that Nrep=20-30 replicas are enough for phenomenology.
NNLO cross-sections:
gg->H with ggHiggs
tt with top++
W, Z with Vrap

14
Phenomenology
The ultimate validation is of course to check that the compressed set reproduces the original combination for a wide variety of observables.
NLO inclusive cross sections with MCFM:
H VBF
WW
WH

15
Phenomenology
The ultimate validation is of course to check that the compressed set reproduces the original combination for a wide variety of observables.
Compression also works for fully differential distributions.
Tested on a large number of processes: jets, Drell-Yan, WW, W+charm, Z+jets, …
Calculations use fast NLO interfaces:
1. aMCfast/applgrid for MadGraph5_aMC@NLO
2. applgrid for MCFM/NLOjet++
Very flexible to redo validation for any other compressed set.

16
Correlations
The compression algorithm also manages to reproduce the correlations between physical observables, even for a small number of replicas.
[Figure: correlation coefficients between the ttbar cross-section and the ggH, tt, W+, W- and Z cross-sections for Nrep = 40, comparing the reference and compressed sets.]
Direct consequence of the fact that correlations between PDFs are reproduced
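
For a Monte Carlo set, the correlation between two observables follows from evaluating both on every replica. The sketch below uses toy cross-sections that share a common fluctuation; the numbers and variable names are invented for illustration.

```python
import numpy as np

def correlation(a, b):
    """Correlation coefficient between two observables evaluated replica by replica."""
    return float(((a * b).mean() - a.mean() * b.mean()) / (a.std() * b.std()))

rng = np.random.default_rng(6)
n_rep = 300
common = rng.standard_normal(n_rep)   # shared toy PDF fluctuation
sigma_tt = 800.0 + 20.0 * common + 5.0 * rng.standard_normal(n_rep)   # toy ttbar (pb)
sigma_ggh = 42.0 + 1.0 * common + 0.5 * rng.standard_normal(n_rep)    # toy ggH (pb)
rho = correlation(sigma_tt, sigma_ggh)
print(rho)
```

Applying the same function to the compressed replicas and to the full set gives the two series compared in the correlation plots.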

17
Correlations
Not an accident: selecting replicas at random fails to reproduce the correlations accurately enough.
[Figure: correlation coefficients for Nrep = 40, comparing the reference, the compressed set and random selections (68% CL).]

18
Comparison with Meta-PDFs
To compare with the available Meta-PDFs in LHAPDF6, we have produced compressed sets based on MSTW08, CT10 and NNPDF2.3.
Reasonable agreement is found for central values and variances, except perhaps at small and large x.
The comparison needs to be redone when the two approaches use NNPDF3.0, MMHT14 and CT14.

19
Summary and discussion
Our investigations suggest that the Compressed MC PDFs (CMC-PDFs) provide an efficient and easy-to-use implementation of the PDF4LHC combination.
A single set with only 25 replicas seems to be enough to reproduce central values and variances of all the LHC processes, inclusive and differential, that we have explored. Correlations, both between PDFs and between cross-sections, are also well reproduced.
Our plan is to publicly release the combination and compression codes, so that users can create their own compressed sets in LHAPDF6 format, with the names of the sets to be used as the only input.
Some of the advantages of our approach with respect to the Meta-PDFs are:
1. Improvement in CPU time (the baseline Meta-PDF has 100 eigenvector members).
2. Streamlined construction of the combined sets, with no need for refitting (and thus for testing the accuracy of the refit, which strongly depends on the PDF sets used), nor to redo the PDF evolution and reconstruct PDF interpolation grids: for CMC-PDFs only the LHAPDF6 files are needed as input.
3. No need for Gaussian/linear approximations: the compression algorithm reproduces the full probability distribution of the combined set (which in general is non-Gaussian).
4. No strong need for process-dependent combined PDF sets (like a Higgs-specific Meta-PDF).
However, we believe that the two approaches can nicely complement each other: the agreement between the compressed PDFs and the Meta-PDFs, for the same input sets, provides a non-trivial validation of the combination. It is conceivable that both could be used in LHC analyses.
To make further progress, we need that:
1. PDF4LHC decides which PDF sets should be used in the combination, and how.
2. This input is used to construct Meta-PDFs and CMC-PDFs.
3. We agree on benchmark tests for LHC processes and test the performance of both methods.
