Internship Report

Multivariate Analysis of the Vector Boson Fusion Higgs Boson
Brendan Marsh University of Missouri August 8, 2016
Ph.D. Student Supervisor: Antonio De Maria
Supervisor: Prof. Dr. Arnulf Quadt
Abstract
A multivariate analysis is presented for the study of the vector boson
fusion (VBF) Higgs boson decaying to a pair of tau leptons. While the VBF
production mechanism of the Higgs is roughly an order of magnitude lower
in cross section than the dominant gluon-gluon fusion mechanism, it is
shown that VBF produces a distinctive signature that is well suited for
detection by multivariate analyses. A number of discriminant variables are
explored in addition to a direct comparison of different machine learning
toolkits. Ultimately, a statistical significance of 7.9 is achieved for detection
of the VBF Higgs boson in this truth level study.

Contents
1. Motivation and Background
1.1 The Higgs Boson
1.2 Vector Boson Fusion
1.3 Fully Hadronic Decay Mode
1.4 Background Processes
2. Multivariate Analysis
2.1 Monte Carlo Samples
2.2 Preselection Cuts
2.3 Cut Based Analysis
2.4 Decision Trees
2.5 Adaptive Boosting
2.6 Discriminant Variables
2.6.1 Collinear Approximation
2.6.2 Tau Centrality Product
2.6.3 𝜂 Variables
2.6.4 Tau-Jet Angular Correlations
2.6.5 Fox-Wolfram Moments
2.6.6 MVA Variables
2.7 TMVA Multivariate Analysis
2.8 Scikit Learn Multivariate Analysis
3. Conclusions
3.1 Outlook for VBF Higgs Analysis
3.2 Suggestions for Future Studies
3.3 Thanks!
References

1. Motivation and Background
1.1 The Higgs Boson
Within the context of the Standard Model (SM),
the Higgs mechanism is necessary for the mass
generation of the W and Z gauge bosons. By
invoking a break in electroweak symmetry, the
Higgs mechanism implies the existence of a spin
zero, neutral particle; we know this particle as the
Higgs boson.
For many years, the Higgs remained elusive in
particle detectors. It was not until July 4, 2012 that
CERN announced that both the CMS and ATLAS
experiments at the Large Hadron Collider (LHC) met
the 5𝜎 discovery benchmark for a new boson with a
mass of roughly 125 GeV that was consistent with
a Higgs boson. It seems the Higgs has finally been
found!
Many studies of the Higgs boson are ongoing as Run II of the LHC is currently approaching an
online integrated luminosity of 20 inverse femtobarns. As our studies of the Higgs progress, the vector
boson fusion production mechanism becomes increasingly important as a detection pathway, in CP
violation studies [1], and in other areas.
1.2 Vector Boson Fusion
A Standard Model Higgs boson may be produced via one of four main production mechanisms at the
LHC. In the vector boson fusion (VBF) mechanism, two incoming quarks each radiate a W or Z
(vector) boson. This pair of vector bosons then fuses to produce a low mass
Higgs boson.
Figure 1 The elementary particles of the Standard Model, labelled with their mass, charge, and spin.
Figure 2 Left: Feynman diagrams of the four Higgs production mechanisms at the LHC, with vector boson
fusion highlighted in red. Right: Corresponding cross sections for the Higgs production mechanisms.
One can see from the cross sections that the gluon-gluon fusion mechanism is roughly an order of
magnitude greater than the VBF mechanism for a Higgs of mass 125 GeV [2]. However,
the addition of the two quarks into the final state, visible as highly energetic jets, produces a
distinctive signature that is lacking in gluon-gluon fusion. In terms of measurable quantities,
VBF events may be recognized by the following characteristics:
• Highly 𝜂 separated jets
• Jets in opposite hemispheres
• High invariant mass of jets
• No central jets above a certain $p_T$
1.3 Fully Hadronic Decay Mode
The 125 GeV Higgs boson most often decays into a $b\bar{b}$ pair; however, this decay mode is not easily
recovered in a sea of $t\bar{t}$ background [3]. The Higgs may additionally decay into a $\tau^+\tau^-$ pair; this is the
decay mode studied in this analysis. Specifically, I investigate the “fully hadronic” decay mode in which
both tau leptons subsequently decay into a tau neutrino and a number of pions, which accounts for
roughly 41% of ditau decays [2]. A Feynman diagram of the signal process is given below.
Figure 3 The Feynman diagram of the signal process of this study: Higgs boson production via vector boson
fusion with a subsequent decay into tau leptons, each of which decays into a tau neutrino and pions.
1.4 Background Processes
A bit like searching for a needle in a haystack, the VBF Higgs process is a rare event that is drowned
out by background processes with similar event characteristics and much higher cross sections. To
detect a small signal in a sea of background, one’s goal is to remove as much of the background as
possible while retaining as many signal events as possible. Thus, it is as important to
understand the background processes competing with the signal process as it is to
understand the signal process itself. The main background processes relevant to this study are the Z→ 𝜏𝜏
and $t\bar{t}$ processes.
Z→ 𝜏𝜏 + 𝑗𝑒𝑡𝑠
According to the Particle Data Group [2], the Z boson decays into a pair of tau leptons with a
branching ratio of roughly 3.4%. As Z bosons are produced in abundance at the LHC, this channel
introduces a large background with the same final state, a pair of tau leptons. Fortunately, there do
exist features of VBF that we expect to differ in the case of Z→ 𝜏𝜏. Foremost, the invariant mass of the
reconstructed taus should reflect the mass of the particle from which they came, although mass
reconstruction can be difficult (section 2.6.1). For VBF taus we expect to see the mass of the Higgs,
roughly 125 GeV, while for the Z→ 𝜏𝜏 channel we expect a peak around 91 GeV. Additionally, the
distinctive jet topology of VBF is not expected in the Z→ 𝜏𝜏 channel.
$t\bar{t}$
Top quarks almost always decay into a W boson and a b quark, and the W boson may then decay into a
tau lepton and a neutrino. Thus, given two top quarks it is possible to have two taus in the final state, and the
$t\bar{t}$ process, also produced in abundance at the LHC, poses another background. However, there
exist a number of features of the $t\bar{t}$ background that make it quite easy to eliminate. Very often in the
final state of the $t\bar{t}$ background there exist jets originating from b quarks, while this is rare for VBF final
states. Fortunately, there exist “b-tagging” algorithms capable of labelling jets in the detector that most
likely arise from b quarks. Thus, we may cut out events with b jets, leaving Z→ 𝜏𝜏 as irreducible
background. Additionally, we do not expect to find any correlations between the tau decay products
and the missing transverse energy, unlike VBF in which they are heavily correlated.
2. Multivariate Analysis
The basic goal of any multivariate analysis (MVA) is to separate signal events from background
events with as high an efficiency as possible, given some input variables for each event. Most
MVAs take a number of input variables and return a single measure of “signal-likeness”, which must
exceed a certain threshold for the event to be considered a signal event.
Before diving into the multivariate techniques used for this analysis, the training samples used to
develop and test the analysis will be described, along with the traditional cut based analysis for VBF
and reasons why it can be improved using a multivariate analysis.
2.1 Monte Carlo Samples
Monte Carlo simulations provide a powerful tool for studying stochastic processes. Here, Powheg
and Pythia 8 Monte Carlo generators were used to simulate truth level events for both VBF and the
relevant background processes at a centre of mass energy of $\sqrt{s} = 13$ TeV. Using these simulated
events, one may train a multivariate analysis method to be applied to real data. The Monte Carlo
samples used for this study are given below.
It is important to note that this was a truth level study only; no reconstruction or trigger level effects
have been incorporated. These effects are non-negligible and should be incorporated in future studies.
2.2 Preselection Cuts
A number of cuts may be applied to the events before any classifier is used. Some of these cuts
correspond to limitations of the ATLAS detector (corresponding to events that would not be well
reconstructed in practice) while others are made specifically to remove background events. The
preselection cuts used for this analysis are given below. If any event does not fulfill the criteria, it is
discarded from the analysis.
• $p_T^{\tau_{vis}} > 20$ GeV: The transverse momentum of both tau leptons must be at least 20 GeV
to be detected and reconstructed by tau reconstruction algorithms.
• $|\eta_\tau| < 2.5$: The absolute value of 𝜂, the pseudorapidity, of each tau lepton must be
less than 2.5 for good reconstruction in the tracker.
• MET > 20 GeV: The missing transverse energy should be greater than 20 GeV, as we
expect missing energy from neutrinos in the final state.
• $p_T^{lead}, p_T^{sublead} > 20$ GeV: The transverse momentum of the leading and subleading jet should be
greater than 20 GeV to be detected.
• No b-tagged jets: B-tagging algorithms can identify jets originating from b quarks, thus
b-tagged jets can be cut to eliminate $t\bar{t}$ background. In truth level studies,
one uses the PDG (Particle Data Group) ID to identify and cut b-jets.
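The preselection above amounts to a handful of per-event boolean tests. The following is a minimal sketch of how it could be applied, assuming a hypothetical event layout (a dictionary with the keys `tau_pt`, `tau_eta`, `met`, `jet_pt`, and `n_bjets`); the names are illustrative only and are not taken from the analysis code.

```python
def passes_preselection(event):
    """Return True if an event passes the preselection cuts listed above.

    `event` is a hypothetical dict layout chosen for this sketch:
      tau_pt, tau_eta : one entry per visible tau (GeV, unitless)
      met             : missing transverse energy (GeV)
      jet_pt          : jet transverse momenta, sorted descending (GeV)
      n_bjets         : number of jets identified as b-jets
    """
    # Two taus and at least two jets are needed to evaluate the cuts.
    if len(event["tau_pt"]) != 2 or len(event["jet_pt"]) < 2:
        return False
    taus_ok = (all(pt > 20.0 for pt in event["tau_pt"])
               and all(abs(eta) < 2.5 for eta in event["tau_eta"]))
    met_ok = event["met"] > 20.0
    jets_ok = event["jet_pt"][0] > 20.0 and event["jet_pt"][1] > 20.0
    no_bjets = event["n_bjets"] == 0
    return taus_ok and met_ok and jets_ok and no_bjets


def preselection_efficiency(events):
    """Fraction of events that survive the preselection."""
    if not events:
        return 0.0
    return sum(passes_preselection(e) for e in events) / len(events)
```

The efficiency returned by `preselection_efficiency` is the per-sample fraction that can be multiplied by the expected event count to estimate the number of events after preselection.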
2.3 Cut Based Analysis
The most basic form of classifier, and the one that is often used due to its simplicity and physical
motivation, is a simple cut based analysis. This entails requiring a candidate event to pass a series of
univariate “cuts” which are motivated by knowledge of the signal process. The traditional cuts used to
identify VBF events over background events are given below [4].
• $p_T^{lead} > 40$ GeV: Since VBF produces highly energetic quark jets in the final state, we expect
to see a leading jet with high transverse momentum.
• $p_T^{sublead} > 30$ GeV: There are two quark jets in the final state, thus the subleading jet
should also have high transverse momentum.
• $|\eta_{lead} - \eta_{sublead}| > 3$: The jets of VBF have characteristically high separation in
pseudorapidity.
• $\eta_{lead} \cdot \eta_{sublead} < 0$: The VBF topology exhibits jets in opposite hemispheres of the detector.
• $m_{j_{lead}+j_{sublead}} > 300$ GeV: The highly energetic jets show a high invariant mass.
• Jets-Taus Centrality: The tau leptons should be detected in the central part of the detector in
comparison to the jets. Explicitly, the pseudorapidity of the taus should
lie within the range spanned by the jets.
The cut based analysis has its advantages; it is very simple to implement, requires no “training” like
the multivariate methods, and the rationale for each of the cuts is grounded in physics. However, while
it excels in its understandability, it often lacks the classification power required to recover rare
processes like the VBF Higgs.
The weakness of the cut based analysis lies in the assumption that each variable can be cut upon
independently of the others when, in fact, the best cut to make on one variable may depend on another,
or even many others. That is, correlations cannot be accounted for. This issue is addressed by
multivariate classification methods like decision trees.
2.4 Decision Trees
Decision trees, like cut based analyses, split events into groups by setting a threshold on some
variable. However, while the cut based analysis only makes a single round of cuts, decision trees
continue to further subdivide groups, separating signal from background more and more at each step
by making the most efficient cut possible. Additionally, the most efficient cuts are calculated
algorithmically from a set of data used to “train” the decision tree.

Figure 4 A simple decision tree. Here orange represents VBF events while blue represents background events.
At each stage, groups become more purely signal or background by splitting on some variable.
The metric that is normally minimized for each split is the Gini impurity of the current group of
events. It is defined as the probability of incorrectly labelling a random event in the group when labels are
assigned according to the known distribution of signal and background within the group. For a binary
classification problem, the Gini impurity for a group of events is given by the following formula, where
$n_{sig}$ and $n_{bg}$ are the fractions of signal and background events in the group:
$$I_G = n_{sig}\,(1 - n_{sig}) + n_{bg}\,(1 - n_{bg})$$
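As an illustration of the formula above, here is a minimal sketch of the Gini impurity and of a single-variable split search that minimizes the size-weighted impurity of the two resulting groups. The function names and the convention that a label of 1 means signal and 0 means background are assumptions made for this example; real toolkits use more refined search strategies.

```python
def gini_impurity(n_signal, n_background):
    """Gini impurity of a group, computed from its signal and background counts."""
    total = n_signal + n_background
    if total == 0:
        return 0.0
    f_sig = n_signal / total
    f_bg = n_background / total
    return f_sig * (1.0 - f_sig) + f_bg * (1.0 - f_bg)


def best_split(values, labels):
    """Scan thresholds on one variable (labels: 1 = signal, 0 = background).

    Returns (threshold, impurity), where impurity is the size-weighted
    Gini impurity of the groups value < threshold and value >= threshold.
    """
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v < threshold]
        right = [l for v, l in zip(values, labels) if v >= threshold]
        if not left or not right:
            continue  # a split must produce two non-empty groups
        impurity = sum(
            len(group) / len(labels)
            * gini_impurity(sum(group), len(group) - sum(group))
            for group in (left, right)
        )
        if impurity < best[1]:
            best = (threshold, impurity)
    return best
```

A perfectly mixed group has an impurity of 0.5, and a pure group has an impurity of zero, which is why minimizing this quantity drives the groups toward being purely signal or purely background.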
Unlike a cut based analysis, which can only form rectangular signal regions in the variable phase
space, decision trees can be grown to approximate arbitrarily complex decision functions. However,
decision trees, too, are not without their flaws. The intuition of a cut based analysis is lost since the
splits are generated algorithmically. Additionally, it is very easy to grow a tree so deep that it
begins to recognize individual points in the training data, becoming artificially complex.
This phenomenon is well known in the field of machine learning, where it is commonly called
“overtraining”. To address this issue, a technique known as boosting is used in place of older
“pruning” methods, which grow full decision trees and then backtrack to discard unimportant splits.
2.5 Adaptive Boosting
Adaptive boosting, or AdaBoost, is a general method that can be applied to a number of
classifiers, such as decision trees, to improve reliability, performance, and resistance to
overtraining. In the context of adaptive boosting of decision trees, the single decision tree is replaced
by a “forest” consisting of hundreds of decision trees which are restricted to only a few levels, such
as the one above. As a whole, this forest of decision trees is called a boosted decision tree (BDT),
and the output of the BDT is a weighted sum of the outputs of each individual tree.
Each individual decision tree is called a “weak learner” in the sense that it is only one of many
classifiers in the forest. Here is where the adaptive boosting comes in; each weak learner is trained
iteratively to improve upon the previous one. The first weak learner is trained as a normal decision
tree from the training data. However, the results of the first weak learner are then used to weight the
importance of the training data for the next weak learner; points that were classified correctly receive
small weights while incorrectly classified points receive large weights. In this way, the next weak
learner is trained focusing on points that have not been classified well by the previous weak learner.
This process continues such that each weak learner focuses on correcting mistakes of the last,
improving at each step. The process is visualised below.

Figure 5 Training of an AdaBoost classifier. The first classifier trains on unweighted data, then
reweights the data for the next and so on to produce the final classifier.
Figure 6 A view of the transverse plane depicting the collinear
approximation. The tau neutrinos go collinearly with the tau leptons
such that their sum matches the missing transverse energy.
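The reweighting loop described in this section can be sketched in a few lines. The example below uses the textbook discrete AdaBoost update with one-variable threshold cuts ("stumps") as weak learners and labels of +1 for signal and -1 for background. It is an illustrative assumption, not the exact boosting prescription used by TMVA or Scikit Learn, whose weak learners are small trees over several variables.

```python
import math


def stump_predict(x, threshold, sign):
    """Weak learner: a single threshold cut returning +1 or -1."""
    return sign if x >= threshold else -sign


def train_stump(xs, ys, weights):
    """Find the threshold cut with the smallest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, n_rounds):
    """Train a forest of stumps; returns a list of (alpha, threshold, sign)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Misclassified points gain weight, correctly classified points lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble


def bdt_score(x, ensemble):
    """Classifier output: the weighted sum of the weak learner outputs."""
    return sum(alpha * stump_predict(x, threshold, sign)
               for alpha, threshold, sign in ensemble)
```

On a toy sample such as `xs = [1, 2, 3, 4, 5, 6]` with `ys = [1, 1, -1, -1, 1, 1]`, no single cut separates the classes, but the sign of `bdt_score` after three boosting rounds reproduces every label.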
2.6 Discriminant Variables
When training a BDT, a balance should be found between the number of variable inputs to the
BDT and the performance of the BDT. Additionally, while BDTs are known to handle correlated
variables quite well, it is superfluous to include two strongly correlated variables, only one of which
adds discriminatory power to the classification.
Much of my work this summer was spent investigating variables, both common and newly
devised, to search for new discriminating variables for use in a multivariate analysis. The most
important in the analysis was the ditau mass, calculated via the collinear approximation.
2.6.1 Collinear Approximation
In the case of VBF, the mass of the ditau should correspond to the mass of the Higgs; for Z→ 𝜏𝜏, to
the mass of the Z boson; and for $t\bar{t}$ we expect no clear peak. Thus, there are good physical motivations
for the use of the ditau mass in our MVA. However, in order to fully reconstruct the ditau one needs
the missing neutrinos. The collinear approximation accounts for the missing neutrinos by making the
following assumptions.
1. The tau neutrinos are perfectly collinear with their associated tau lepton.
2. The missing transverse energy is entirely due to the tau neutrinos.
Under these approximations, the magnitude
of the neutrino momenta becomes completely
determined by the missing transverse energy.
One is then left with a simple matter of
constructing the neutrinos collinearly with the
taus such that the sum of the neutrinos is
precisely the missing transverse energy.
The collinear approximation is not always
applicable; when the tau leptons are emitted
back to back in the 𝜙 plane, the missing transverse energy
cannot be uniquely divided between the two neutrinos.
This leads to a simple constraint between taus:
cos ∆𝜙 > −0.99
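The two assumptions above reduce the reconstruction to a pair of linear equations for the neutrino transverse momenta. The following is a minimal sketch, assuming each visible tau is given as a hypothetical tuple (pt, eta, phi, mass) and the missing transverse energy as its x and y components; these conventions are chosen for the example and are not taken from the analysis code.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from transverse momentum, eta, phi, and mass."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz)


def invariant_mass(vectors):
    """Invariant mass of the sum of a list of four-vectors."""
    e, px, py, pz = (sum(c) for c in zip(*vectors))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def collinear_ditau_mass(tau1, tau2, met_x, met_y):
    """Ditau mass in the collinear approximation, or None if not applicable.

    tau1, tau2 : (pt, eta, phi, mass) of each visible tau.
    Solves  nu1*cos(phi1) + nu2*cos(phi2) = met_x
            nu1*sin(phi1) + nu2*sin(phi2) = met_y
    for the neutrino transverse momenta nu1 and nu2.
    """
    pt1, eta1, phi1, m1 = tau1
    pt2, eta2, phi2, m2 = tau2
    det = math.sin(phi2 - phi1)
    # Back-to-back (or exactly parallel) taus make the system degenerate.
    if math.cos(phi1 - phi2) <= -0.99 or abs(det) < 1e-9:
        return None
    nu_pt1 = (met_x * math.sin(phi2) - met_y * math.cos(phi2)) / det
    nu_pt2 = (met_y * math.cos(phi1) - met_x * math.sin(phi1)) / det
    if nu_pt1 < 0.0 or nu_pt2 < 0.0:
        return None  # unphysical solution: a neutrino pointing against its tau
    return invariant_mass([
        four_vector(pt1, eta1, phi1, m1),
        four_vector(pt2, eta2, phi2, m2),
        four_vector(nu_pt1, eta1, phi1),  # massless, collinear with tau 1
        four_vector(nu_pt2, eta2, phi2),  # massless, collinear with tau 2
    ])
```

Rejecting solutions with a negative neutrino momentum is a design choice of this sketch; such solutions arise when the missing transverse energy does not lie between the two taus in the transverse plane.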

Historically, the collinear approximation has relied upon using the charged decay products of the
tau leptons, be it either 1-prong or 3-prong decays. However, the decay products may also include a
neutral pion. Recently, tau substructure algorithms have become available that allow for reconstruction
of the entire visible (charged + neutral) tau [5]. One of my first studies was on the marked improvement
in the collinear approximation as a result of using the entire visible tau.
Figure 7 The collinear approximation using the charged tau leptons (left) and the full visible tau leptons (right).
The blue histograms represent VBF and red represents combined backgrounds scaled appropriately. All
distributions normalized to unity, and units are in GeV.
As you can see, there is a remarkable improvement using tau substructure techniques to
reconstruct the visible tau. In future studies, I suggest applying smearing of the transverse momentum
or otherwise modelling imprecision in the detector to see if the collinear approximation remains as
robust as it is in this truth study. Needless to say, this variable made it to the final MVA.
2.6.2 Tau Centrality Product
In the context of VBF topology, centrality has been used as a flag indicating whether or not a tau
lepton is centrally located in the detector with respect to the jets. Explicitly, a tau lepton is central if
its pseudorapidity lies in the range spanned by the leading and subleading jet. To generalize this
binary variable to a continuous variable, which is more powerful in multivariate analyses, the
following definition has been suggested [6].
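$$C_\tau := \exp\left[-\left(\frac{\eta_\tau - \eta_{avg}}{\Delta\eta}\right)^2\right]$$
where
$$\eta_{avg} := \frac{\eta_{lead} + \eta_{sublead}}{2}, \qquad \Delta\eta := \eta_{lead} - \eta_{sublead}$$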
A perfectly central tau lepton (with exactly the average 𝜂 of the jets) will have a centrality of one,
while a tau lepton far from the average 𝜂 of the jets will have centrality close to zero. Note that if the
jets are not well separated in 𝜂, the centrality also approaches zero.
The authors of this continuous centrality variable used the centrality of the two taus as independent
variables. However, I found the two variables to have an 88% positive correlation for VBF. By taking
the product of the two tau centralities, a single uncorrelated variable is achieved with greater
separation power than either of the individual centralities.
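$$C_{prod} := C_{\tau_1} \cdot C_{\tau_2} = \exp\left[-\left(\frac{\eta_{\tau_1} - \eta_{avg}}{\Delta\eta}\right)^2 - \left(\frac{\eta_{\tau_2} - \eta_{avg}}{\Delta\eta}\right)^2\right]$$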
[Figure 7 panels: Collinear Approximation Ditau Mass (Charged); Collinear Approximation Ditau Mass (Visible). Vertical axis: Events.]

Figure 8 The centrality of the individual tau leptons (left and centre) vs. the product of tau centrality (right).
Given the redundancy of the correlated variables and increased separation power of the product
variable, it was the centrality product variable that made it to the final multivariate analysis.
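The continuous centrality and its product are simple to evaluate from the pseudorapidities of the taus and the two leading jets. A minimal sketch follows; the function names are illustrative, and returning zero for jets with identical 𝜂 is an assumption made here to match the limiting behaviour noted in the text.

```python
import math


def tau_centrality(eta_tau, eta_lead, eta_sublead):
    """Continuous centrality of one tau with respect to the two leading jets."""
    eta_avg = 0.5 * (eta_lead + eta_sublead)
    delta_eta = eta_lead - eta_sublead
    if delta_eta == 0.0:
        return 0.0  # limit of vanishing jet separation (assumed convention)
    return math.exp(-((eta_tau - eta_avg) / delta_eta) ** 2)


def centrality_product(eta_tau1, eta_tau2, eta_lead, eta_sublead):
    """Product of the two tau centralities."""
    return (tau_centrality(eta_tau1, eta_lead, eta_sublead)
            * tau_centrality(eta_tau2, eta_lead, eta_sublead))
```

For jets at 𝜂 = ±2, a tau at 𝜂 = 0 has a centrality of one, while a tau sitting on top of a jet has a centrality of exp(−1/4).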
2.6.3 𝜼 Variables
Variables explicitly related to the pseudorapidity of the leading and subleading jets are common in
analyses of the VBF Higgs, including the cut based analysis already presented. On the surface, these
variables seem well suited to multivariate analysis as well given their separation power. However, I
found that these traditional VBF variables are highly correlated with the invariant mass of the jets.
Figure 9 ∆𝜂 (centre) and $\eta_{lead} \cdot \eta_{sublead}$ (right) of the leading and subleading jets, along with their correlations to
the invariant mass of the jets (left).
Given the strong correlations within this group of variables, I was not surprised to find that
eliminating ∆𝜂 and $\eta_{lead} \cdot \eta_{sublead}$ from the MVA led to no decrease in performance of the BDT. The
invariant mass of the jets displayed the greatest separation power (see figure 11); thus, despite their
prevalence in traditional VBF studies, I have chosen to exclude ∆𝜂 and $\eta_{lead} \cdot \eta_{sublead}$ from the final
analysis.
2.6.4 Tau-Jet Angular Correlations
The Higgs boson is a spin 0 particle; Z bosons are spin 1 particles. My Ph.D. supervisor and I were
interested in whether or not this difference in spin quantum number manifests itself in angular
correlations between the tau leptons themselves or between tau leptons and the leading and
subleading jet. A number of variables were investigated, boosted into different reference frames,
probing any angular correlations.
[Figure 8 panels: Tau 0 Centrality, Tau 1 Centrality, Tau Centrality Product. Figure 9 panels: Jets dEta, Jets Eta Product. Angular variable panels: Jets Plane / Taus Plane Angle, Jets Plane Eta. Vertical axis: Events.]
Selected Angular Variables
• Taus ∆𝑅: The ∆𝑅 separation of the two tau leptons.
• Taus 𝜙 Centrality: The same as the continuous tau centrality variable, but in 𝜙 instead of 𝜂.
• Jets-Taus Plane Angle: Total angle between the two planes formed by the tau leptons and the jets.
• Jets Plane 𝜂: 𝜂 of the normal vector to the plane formed by the two jets.
The angular relationships amongst the tau leptons and jets, beyond the expected VBF jet topology,
seem to be subtle, if they exist at all. While the ∆𝑅 of the taus above shows modest separation, inclusion
in the MVA yielded no improvement, and unfortunately the angle between the tau plane and jet plane
seems indistinguishable between VBF and background. Boosting to various center of mass reference
frames generally had little effect on separation power.
2.6.5 Fox-Wolfram Moments
The Fox-Wolfram moments are a set of event descriptors that are currently under investigation as
more advanced replacements for traditional cuts [7]. The moments arise from
superpositions of spherical harmonics, defined as follows.
$$H_l^x := \sum_{i,j} W_{i,j}^x \, P_l(\cos\Omega_{i,j})$$
Above, the sum goes over any number of objects in the event (such as the leading and subleading
jet for the VBF topology), $\Omega_{i,j}$ corresponds to the total angle between the i’th and j’th objects, and $P_l$
are the Legendre polynomials. The weight term $W_{i,j}^x$ may take many forms, such as the unit
weighting and the transverse momentum weighting considered below.
A preliminary study of the Fox-Wolfram moments in the analysis of VBF has shown that the
moments display considerable separation power; however, including them in the multivariate analysis
did not improve the classification efficiency. Included below are plots of two sets of Fox-Wolfram
moments. On the left, only the leading and subleading jets were considered, and the best weight was
found to be the unit weight. On the right, both tau leptons are also included as objects in the moment
calculations, for which the transverse momentum weighting scheme was found to be best.
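For readers who wish to experiment with these moments, the sketch below evaluates a weighted Fox-Wolfram moment for a list of three-momenta. The explicit weight forms are assumptions of this example and are not stated in this report: the "unit" weight is taken as 1/N² for N objects, and the "pt" weight as the product of the two transverse momenta divided by the squared sum of all transverse momenta.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) via the standard three-term recurrence."""
    if order == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, order):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def fox_wolfram_moment(momenta, order, weighting="unit"):
    """Weighted Fox-Wolfram moment for a list of (px, py, pz) momenta.

    Weight forms are assumptions of this sketch:
      "unit": 1 / N^2
      "pt"  : pt_i * pt_j / (sum of all pt)^2
    """
    n = len(momenta)
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in momenta]
    pts = [math.hypot(px, py) for px, py, _ in momenta]
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(momenta[i], momenta[j]))
            # Clamp to [-1, 1] to protect against rounding errors.
            cos_omega = max(-1.0, min(1.0, dot / (norms[i] * norms[j])))
            if weighting == "unit":
                weight = 1.0 / n ** 2
            elif weighting == "pt":
                weight = pts[i] * pts[j] / sum(pts) ** 2
            else:
                raise ValueError("unknown weighting: " + weighting)
            total += weight * legendre(order, cos_omega)
    return total
```

With either weighting the zeroth moment equals one, and for two back-to-back objects of equal weight the odd moments vanish while the even moments equal one, which gives a quick sanity check of an implementation.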
[Angular variable panels: Tau 1 Phi Centrality, Taus dR. Vertical axis: Events.]
[Correlation matrices for signal and for background: linear correlation coefficients in % among ditauMass, mjj, sumPT, PTsum, and tausCentrality.]
Figure 10 The first four Fox-Wolfram moments considering only jets, with a unit weighting (left). The first four
Fox-Wolfram moments considering jets and tau leptons, with transverse momentum weight (right).
While only the first four moments are displayed here for brevity, the odd and even moments
are highly correlated though distinct. Unfortunately, my time ran short before I could fully investigate
the Fox-Wolfram moments as potentially useful discriminating variables in the multivariate
analysis. For future studies, I would suggest exploring the “modified” Fox-Wolfram moments,
which are invariant under Lorentz boosts, as well as any correlations that may exist between the
moments and the MVA variables already in use.
2.6.6 MVA Variables
The final list of variables for use in the multivariate analysis was pruned down starting with roughly
ten variables that showed the strongest separation power. After identifying correlations and removing
variables that led to no improvement in classification efficiency, the following variables remain in the
final analysis.
• $m_{\tau_1+\tau_2}$: The invariant mass of the ditau, reconstructed via the collinear approximation using
the full visible tau leptons.
• $m_{j_{lead}+j_{sublead}}$: The invariant mass of the leading and subleading jets.
• $C_{\tau_1} \cdot C_{\tau_2}$: The product of the centrality of the two tau leptons.
• $p_T^{lead} + p_T^{sublead}$: The scalar sum of the transverse momenta of the leading and subleading jets.
• $p_T^{lead+sublead}$: The transverse momentum of the vector sum of the leading and subleading jets.
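The three jet-based variables in this list follow directly from the kinematics of the two leading jets. A minimal sketch is given below, assuming massless jets described by hypothetical (pt, eta, phi) tuples; the massless approximation is a simplification made for this example.

```python
import math


def dijet_variables(jet1, jet2):
    """Return (m_jj, scalar pT sum, pT of the vector sum) for two massless jets.

    Each jet is a (pt, eta, phi) tuple; momenta are in GeV.
    """
    (pt1, eta1, phi1), (pt2, eta2, phi2) = jet1, jet2
    # Invariant mass of two massless objects.
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    m_jj = math.sqrt(max(m2, 0.0))
    scalar_sum = pt1 + pt2
    vector_sum = math.hypot(pt1 * math.cos(phi1) + pt2 * math.cos(phi2),
                            pt1 * math.sin(phi1) + pt2 * math.sin(phi2))
    return m_jj, scalar_sum, vector_sum
```

Two 50 GeV jets at 𝜂 = ±2 that are back to back in 𝜙 give a large invariant mass and a scalar sum of 100 GeV, but a vanishing vector sum, which shows how the last two variables carry different information.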

Figure 11 Discriminatory variables for the multivariate analysis. The blue histograms represent VBF and red
represents combined backgrounds scaled appropriately. All distributions are normalized to unity; masses and
momenta are in units of GeV.
2.7 TMVA Multivariate Analysis
This multivariate analysis was performed at a centre-of-mass energy of $\sqrt{s} = 13$ TeV and at an
integrated luminosity of 20 inverse femtobarns, corresponding roughly to current Run II conditions at
the LHC. The ROOT analysis framework (or my preference, the Python adaptation PyROOT) provides
a toolkit for multivariate analysis known as TMVA [8]. This toolkit was used to train a boosted
decision tree using the discriminant variables presented in section 2.6.6. I was interested in comparing
the performance of TMVA with the well-known Python machine learning library Scikit Learn. To this
end, a boosted decision tree was optimized in TMVA and compared with an identically parameterized
boosted decision tree trained in Scikit Learn.
Optimization of the BDT parameters in TMVA was
performed by scanning over one parameter at a time,
such as the number of trees or the tree depth. A full multivariate
sweep over parameter settings and variables was simply
too computationally expensive and outside the scope of this
project. Should one like to take this analysis to the next
step, I would recommend performing such a multivariate
sweep over BDT parameter settings. The final
configuration of the BDT parameters that were found to be
important is given to the left.

When training and testing any multivariate method, one must be careful to weight the training data
correctly; while we have a similar amount of training data for both the VBF and background processes,
in reality the number of background events is much larger than the number of signal events. Thus a
weight needs to be applied to events from each process to correct for their relative abundance.
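$$W = \frac{N_{exp.}}{\mathcal{L}_{MC}\,\sigma} = \frac{\mathcal{L}_{exp.}\,\sigma}{\mathcal{L}_{MC}\,\sigma}$$
where $\mathcal{L}_{MC}\,\sigma$ is provided by the Monte Carlo sample.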
Cross sections were determined for each Monte Carlo sample from the TWiki cross section
summaries of the MC15 samples for Run II analyses. Given these cross sections and an integrated
luminosity of 20 inverse femtobarns, the expected number of events may be calculated. Additionally,
the percentage of events that pass the preselection criteria presented earlier may be calculated per
sample, and then applied to determine the expected number of events after preselection.
Process      Cross Section [pb]    Events at ℒ = 20 fb^-1    Events (Preselected)
VBF          9.993941 × 10^-2      1,999                     398
Z→ 𝜏𝜏        1.950632 × 10^3       39,012,642                1,148,098
$t\bar{t}$   4.515915 × 10^2       9,031,830                 26,394
As was expected from eliminating b-tagged jets, the $t\bar{t}$ background is more than decimated, leaving
Z→ 𝜏𝜏 as the main background. Roughly speaking, the signal to combined background ratio is a
staggering 1:3000!
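The event counts in the table follow from multiplying each cross section by the integrated luminosity. The sketch below reproduces that arithmetic with the numbers quoted above; the per-event weight helper takes a number of generated Monte Carlo events, which is not given in this report and is therefore left as a hypothetical input.

```python
# Cross sections in pb and preselected event counts, as quoted in the table.
SAMPLES = {
    "VBF": {"cross_section_pb": 9.993941e-2, "preselected": 398},
    "Z->tautau": {"cross_section_pb": 1.950632e3, "preselected": 1148098},
    "ttbar": {"cross_section_pb": 4.515915e2, "preselected": 26394},
}

LUMINOSITY_PB = 20.0 * 1000.0  # 20 inverse femtobarns in inverse picobarns


def expected_events(cross_section_pb, luminosity_pb=LUMINOSITY_PB):
    """Expected number of events: cross section times integrated luminosity."""
    return cross_section_pb * luminosity_pb


def event_weight(n_expected, n_generated):
    """Per-event weight scaling a Monte Carlo sample to its expected yield.

    n_generated is the number of simulated events (hypothetical input here).
    """
    return n_expected / n_generated


def background_to_signal_ratio(samples=SAMPLES):
    """Combined preselected background divided by preselected signal."""
    background = sum(s["preselected"] for name, s in samples.items()
                     if name != "VBF")
    return background / samples["VBF"]["preselected"]
```

Evaluating `background_to_signal_ratio()` with the preselected counts gives a value close to 3000, consistent with the ratio quoted above.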
The metric for defining the optimal cut value of the classifier is the statistical significance, defined
below, where “s” is the number of signal events and “b” the number of background events. For a
Poisson random variable, the standard deviation is the square root of the total number of
events, $\sqrt{s + b}$. The statistical significance then measures the number of signal events in units
of one standard deviation.
$$\text{Statistical Significance} := \frac{s}{\sqrt{s + b}} \approx \frac{s}{\sqrt{b}} \quad \text{for } b \gg s$$
Thus, this definition of the statistical significance can either be interpreted as the number of signal
events relative to one standard deviation or, if b is much larger than s, as is usual, the number of signal
events over the background fluctuation level.
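Finding the optimal cut amounts to scanning candidate cut values on the classifier output and keeping the one that maximizes this significance. The following is a minimal sketch, assuming per-event classifier scores, labels (1 for signal, 0 for background), and event weights are available as plain lists; it is not the scan performed internally by TMVA.

```python
import math


def significance(s, b):
    """Statistical significance s / sqrt(s + b)."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def best_cut(scores, labels, weights):
    """Scan cut values; an event passes if its score is >= the cut.

    labels : 1 for signal, 0 for background
    weights: per-event weights scaling the samples to their expected yields
    Returns (cut value, significance) for the best cut found.
    """
    best = (None, 0.0)
    for cut in sorted(set(scores)):
        s = sum(w for x, l, w in zip(scores, labels, weights)
                if x >= cut and l == 1)
        b = sum(w for x, l, w in zip(scores, labels, weights)
                if x >= cut and l == 0)
        z = significance(s, b)
        if z > best[1]:
            best = (cut, z)
    return best
```

As a point of comparison, evaluating `significance(398, 1174098)` with no classifier cut at all gives a value well below one, which illustrates how much of the final significance comes from the classifier.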
The TMVA output classifier along with the optimal cut value after training a boosted decision tree
using the parameters given above is shown below.
[Figure panels: TMVA overtraining check for classifier BDT, showing the BDT response for signal and background in the training and test samples (Kolmogorov-Smirnov test: signal (background) probability = 0.008 (0.016)); cut efficiencies and optimal cut value, showing signal efficiency, background efficiency, signal purity, signal efficiency*purity, and the significance S/√(S+B) against the cut value applied on the BDT output. For 398 signal and 1174098 background events the maximum S/√(S+B) is 7.9024 when cutting at 0.1453.]

The final statistical significance of the classifier reaches 7.9, although the significance curve becomes
noisy, most likely because of statistical fluctuations among such heavily weighted background events. By any
interpretation, the statistical significance is roughly 6 at minimum. The full interpretation
of the outcome will be discussed in the conclusion.
2.8 Scikit Learn Multivariate Analysis
Scikit Learn (SKL) is a free, general machine learning library for Python [9]. Given its popularity,
I was interested to see how SKL compares to TMVA in terms of final classifier
efficiency, ease of use, and configurability.
SKL supports all of the machine learning methods implemented by TMVA and many more, and in
the case of boosted decision trees supports many of the same configuration options. However,
unlike TMVA, SKL does not directly provide the user with plots (classifier output distributions,
optimum cuts, correlation matrices) via a nice GUI. Code had to be written to randomize the training and
test samples, to view the output classifier distribution, to calculate the maximum statistical
significance, and to perform other tasks.
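The kind of helper code involved might look like the following sketch. It is not the code used in this study: the toy Gaussian data, the event weights, the AdaBoost boosting choice and the hyperparameters (`max_depth=3`, `n_estimators=50`) are placeholders for illustration, and the function name `train_and_scan` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_and_scan(X, y, w, seed=0):
    """Train a BDT on half of the events, then scan the test-half score
    for the cut that maximizes s / sqrt(s + b)."""
    # Randomize the events and split them 50/50 into training and test samples.
    X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
        X, y, w, test_size=0.5, random_state=seed, stratify=y)

    # Placeholder hyperparameters, not the settings used in the report.
    bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=50, random_state=seed)
    bdt.fit(X_tr, y_tr)  # weights are used only for the significance below

    scores = bdt.decision_function(X_te)  # continuous BDT response

    best_cut, best_z = scores.min(), 0.0
    for cut in np.linspace(scores.min(), scores.max(), 100):
        keep = scores >= cut
        s = w_te[keep & (y_te == 1)].sum()
        b = w_te[keep & (y_te == 0)].sum()
        if s + b > 0 and s / np.sqrt(s + b) > best_z:
            best_cut, best_z = cut, s / np.sqrt(s + b)
    return bdt, scores, y_te, best_cut, best_z

# Toy data standing in for the five discriminant variables.
rng = np.random.default_rng(1)
n = 1000
X = np.vstack([rng.normal(1.0, 1.0, (n, 5)), rng.normal(-1.0, 1.0, (n, 5))])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
w = np.where(y == 1, 1.0, 30.0)  # background events carry larger weights
bdt, scores, y_test, cut, z = train_and_scan(X, y, w)
```

The returned `scores` and `y_test` arrays are what one would histogram to view the output classifier distribution for signal and background.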
For a direct comparison of TMVA and SKL, a boosted decision tree was trained in SKL with
the same parameters as were used for TMVA. The resulting output classifier is given below.
[Figure: SKL BDT output classifier distributions for signal and background. Max. Statistical Significance: 3.5]
SKL performed worse in several respects. As the shapes of the output classifier distributions show, signal and background overlap much more even when the BDT is trained identically to TMVA. As a result, the maximum statistical significance (the green line, not to scale) is only roughly half of that achieved by TMVA. Additionally, SKL took almost five times longer to train the BDT.
3. Conclusions
3.1 Outlook for VBF Higgs Analysis
Overall, the development of a multivariate analysis for the detection of a VBF Higgs boson
decaying to a pair of tau leptons with subsequent hadronic decays was quite successful. A theoretical
basis was developed to understand the signal process and main backgrounds at play. With only a few
basic preselection cuts, the vast majority of 𝑡𝑡̄ background was eliminated, leaving the Z→ 𝜏𝜏 process
as the main background. From knowledge of the underlying physics, a number of candidate
discriminant variables were explored for use in the multivariate analysis. Deserving of special attention
is the reconstructed ditau mass using the collinear approximation, which has shown very promising
improvements in mass resolution with the introduction of the tau substructure reconstruction algorithm.
Some of the variables typically associated with vector boson fusion, such as the distinctively large
separation in pseudorapidity of the leading and subleading jet, were found to be highly correlated and
did not make it into the final analysis. Both TMVA and Scikit Learn were used to train boosted decision
trees; TMVA provided faster results with better classification power, and a convenient interface for
producing plots. The final statistical significance of the VBF signal reached 7.9.
Many aspects of the study, including the final statistical significance, must be kept in context. First
and foremost, the entire study was performed purely at truth level: no trigger-level effects
were accounted for, no detector effects beyond simple preselection cuts on pseudorapidity ranges
were accounted for, and no reconstruction-level effects were considered. These effects may be
important and should be taken into account in further analyses. Additionally, every algorithm, in particular
the b-tagging, tau ID and tau substructure algorithms, has an associated efficiency. At truth level, these
efficiencies are not modelled, and they will further decrease performance at reconstruction level.
Nevertheless, I hope that this multivariate analysis serves as a useful proof of concept for a full scale
multivariate analysis in which all of the above issues are addressed. Finally, I hope this study has
provided insight into the nature of the vector boson fusion production pathway of the Higgs and into
associated variables that may be used in the analysis.
3.2 Suggestions for Future Studies
The collinear approximation performed surprisingly, perhaps suspiciously, well once the entire
visible tau was used as opposed to the charged tau products. It is possible that the collinear
approximation is in fact valid much of the time; however, I strongly suspect that
it will not work as well on reconstructed data. One way this could be studied within a truth-level study is
by “smearing” the transverse momentum of all objects in the event (adding zero-mean Gaussian noise)
to simulate reconstruction inaccuracy and observing how well the collinear approximation holds
up. Additionally, one could explicitly test how collinear the neutrinos are with their respective tau leptons
by studying the ∆𝑅 between the neutrino and the tau at truth level.
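Both suggestions can be sketched briefly. The snippet below is a minimal illustration, assuming a relative Gaussian resolution on the transverse momentum (the width `rel_sigma` is a free parameter chosen by the user, not a value from this study) and the usual definition ∆𝑅 = √(∆𝜂² + ∆𝜙²); the function names are hypothetical.

```python
import numpy as np

def smear_pt(pt, rel_sigma, rng):
    """Add zero-mean Gaussian noise to pT, with width rel_sigma * pT.

    A relative resolution is an assumption of this sketch; an absolute
    width would work the same way.
    """
    noise = rng.normal(0.0, rel_sigma, size=np.shape(pt))
    return np.clip(pt * (1.0 + noise), 0.0, None)  # pT cannot be negative

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + np.pi) % (2.0 * np.pi) - np.pi
    return np.hypot(eta1 - eta2, dphi)

# Example: smear three truth-level pT values (in GeV) by 10%.
rng = np.random.default_rng(42)
pt_truth = np.array([45.0, 60.0, 120.0])
pt_smeared = smear_pt(pt_truth, 0.10, rng)
```

Recomputing the collinear mass from the smeared inputs for several values of `rel_sigma`, and histogramming `delta_r` between each truth neutrino and its parent tau, would show how quickly the approximation degrades.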
While there were over 750,000 Z→ 𝜏𝜏 events and over 6,000,000 𝑡𝑡̄ events in the Monte Carlo
samples, only about 40,000 background events in total survived the preselection cuts, and only half of those
events were used to train the boosted decision tree while the other half was used for testing. In
comparison, over 300,000 VBF events made it past preselection to the multivariate analysis stage.
Although the initial number of events is very large for the background processes, I could have
used far more background events while training the BDT. For further Monte Carlo studies, I would suggest increasing the
statistics, at least for the Z→ 𝜏𝜏 background, to a couple of million events or more to ensure that enough
events make it past preselection to the BDT training.
The Fox-Wolfram moments have shown promising separation power, and may be very powerful
given a correct tuning to the VBF topology. In this study, moments calculated using just the leading
and subleading jet were experimented with, in addition to a few studies using both the jets and the two
tau leptons. Further analyses may explore different combinations of objects to use in the moments,
perhaps even a third jet or no jets at all, in addition to finding the optimal weighting term to use.
Additionally, there exist modified Fox-Wolfram moments that are invariant under Lorentz boosts, which
may provide clearer results. In any case, it will need to be demonstrated that the Fox-Wolfram moments
provide new information about the event that is not contained in the five variables presented for the
analysis in this study if they are to be useful in a multivariate analysis.
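For experimenting with different object combinations, a compact sketch of the moments may be useful. It assumes the commonly used form H_ℓ = Σ_{i,j} W_ij P_ℓ(cos Ω_ij) with the transverse-momentum weight W_ij = p_T,i p_T,j / (Σ p_T)², as discussed in [7]; the weighting actually used in this study may differ, and the function name `fox_wolfram` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def fox_wolfram(pt, eta, phi, order):
    """Fox-Wolfram moment of a given order for a set of objects.

    Assumes the transverse-momentum weight W_ij = pT_i * pT_j / (sum pT)^2.
    """
    pt, eta, phi = (np.asarray(a, dtype=float) for a in (pt, eta, phi))
    theta = 2.0 * np.arctan(np.exp(-eta))  # polar angle from pseudorapidity

    # Cosine of the opening angle between every pair of objects (i, j).
    cos_omega = (np.cos(theta)[:, None] * np.cos(theta)[None, :]
                 + np.sin(theta)[:, None] * np.sin(theta)[None, :]
                 * np.cos(phi[:, None] - phi[None, :]))
    cos_omega = np.clip(cos_omega, -1.0, 1.0)  # guard against rounding

    weights = np.outer(pt, pt) / pt.sum() ** 2

    # Coefficient vector that selects the Legendre polynomial P_order.
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0
    return float(np.sum(weights * legendre.legval(cos_omega, coeffs)))
```

Any list of objects can be passed in, for example the two leading jets only, or the jets together with the two tau leptons, which makes it easy to compare the combinations mentioned above.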
3.3 Thanks!
I can’t express my gratitude enough for the opportunity to study here in Göttingen for the
summer; it has been an eye-opening and truly enjoyable experience to live abroad and get a taste of
particle physics. To everyone within the institute, thank you for your kindness and help over the
summer; you’re all brilliant physicists and even better people. Finally, I have to thank my Ph.D.
student supervisor Antonio De Maria for organizing a great project for me to work on, for his help
whenever it was needed, and his fantastic taste in music.

References
[1] “Test of CP Invariance in vector-boson fusion production of the Higgs boson using the Optimal
Observable method in the ditau decay channel with the ATLAS detector”.
arXiv:1602.04516v1
[2] K.A. Olive et al. (Particle Data Group), Chin. Phys. C, 38, 090001 (2014).
[3] “Search for the 𝑏𝑏̄ decay of the Standard Model Higgs boson in associated (W/Z)H
production with the ATLAS detector”. arXiv:1409.6212v2
[4] “Prospects for the Search for a Standard Model Higgs Boson in ATLAS using Vector Boson
Fusion”. arXiv:hep-ph/0402254v1
[5] “Reconstruction of hadronic decay products of tau leptons with the ATLAS experiment”.
arXiv:1512.05955
[6] “Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector”.
arXiv:1501.04943
[7] “Fox-Wolfram Moments in Higgs Physics”. arXiv:1212.4436
[8] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, and H. Voss, TMVA -
Toolkit for Multivariate Data Analysis, PoS ACAT 040 (2007), arXiv:physics/0703039
[9] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
