ALMA MATER STUDIORUM
UNIVERSITÀ DI BOLOGNA
MASTER'S THESIS
in
Computational Statistics
SCUOLA DI ECONOMIA, MANAGEMENT E STATISTICA
Corso di Laurea Magistrale in Scienze Statistiche - curriculum Statistico-Informatico
Particle identification in ALICE:
a mixture model approach
CANDIDATE: SUPERVISORS:
LUCA CLISSA PROF.SSA ANGELA MONTANARI
0000733102 DOTT. FRANCESCO NOFERINI
October 2016 graduation session
Academic Year 2015/2016
β€œPhysical research has clearly and definitely shown that the
common element underpinning the consistency observable in the
overwhelming majority of natural processes, whose regularity
and invariability have led to the establishment of the postulate
of universal causality, is chance.”
(Erwin Schrödinger)
If I spoke the tongues of men and of angels,
but had not charity,
I would be a resounding gong or a clashing cymbal.
If I had the gift of prophecy
and knew all mysteries and all knowledge,
and had all faith so as to move mountains,
but had not charity,
I would be nothing.
If I gave away all my possessions to feed the poor,
and gave my body to be burned,
but had not charity,
it would profit me nothing.
Charity is patient,
charity is kind;
charity does not envy, does not boast,
is not puffed up, is not disrespectful,
does not seek its own interest, is not quick to anger,
keeps no account of wrongs received,
but rejoices in the truth;
it bears all things, believes all things,
hopes all things, endures all things.
Charity will never fail.
Prophecies will pass away;
the gift of tongues will cease, knowledge will vanish;
for we know imperfectly,
and imperfectly we prophesy;
but when perfection comes, what is imperfect will pass away.
When I was a child, I spoke like a child,
I thought like a child, I reasoned like a child.
Since I became a man,
I have put away childish things.
Now we see as in a mirror, dimly;
but then we shall see face to face.
Now I know in part, but then I shall know fully,
even as I am fully known.
Now these three remain: faith, hope and charity;
but the greatest of these is charity.
St. Paul, First Letter to the Corinthians 13:1
O Lord,
make me an instrument of Your Peace;
where there is hatred, let me bring love.
Where there is offence, let me bring Pardon.
Where there is discord, let me bring union.
Where there is doubt, let me bring Faith.
Where there is error, let me bring Truth.
Where there is despair, let me bring hope.
Where there is sadness, let me bring Joy.
Where there is darkness, let me bring Light.
O Master,
grant that I may not so much seek to be consoled, as to console;
to be understood, as to understand;
to be loved, as to love.
For:
it is in giving that we receive,
it is in pardoning that we are pardoned,
and it is in dying that we rise to eternal Life.
Saint Francis of Assisi
Abstract
In the framework of the ALICE experiment, the task of particle identification (PID) is currently carried out mainly through frequentist analysis, although a Bayesian approach was adopted in the earliest stages of the experiment in order to combine the information from the different detectors. Thus, at present, the extraction of the raw particle yields is achieved by means of either a fit on the inclusive distributions, i.e. considering all the data points simultaneously, or a track-by-track analysis that consists in imposing selection cuts so as to classify each signal into one of the particle species of interest.

The aim of this thesis is, based on simulated data, i) to develop and further test a recently published proposal1 for an iterative PID algorithm that carries the advantages of the statistical approach while extending its usage to a track-by-track analysis, and ii) to explore the implementation of a Bayesian method built on the former technique. The rationale behind the iterative procedure is to apply the Bayesian definition of probability to compute, at each step, some weights, which are then used to fill a histogram representing the distribution of an observable of interest for each particle species. Hence, no actual classification is performed, and the bin relative to each single track is incremented by an amount defined by the corresponding weight's value for every species. The reconstruction of each single track enables the application of the algorithm to a wider set of analyses (e.g. the study of resonances by means of their decays). On the other hand, the method inherits the benefits of the traditional statistical approach, thus making it possible to optimise the trade-off between efficiency and contamination, which is very troublesome when the track selection is based on cuts.
1 (ALICE Collaboration, 2016)
Acknowledgements
I would like to thank all the people who have helped me in the achievement of this degree, from the start of my journey at the University of Bologna to the final writing of my dissertation. All of them, each with a different contribution, have played a part in the successful completion of this academic title.

In particular, I would like to address a really big thank you to my supervisor, Professor Angela Montanari, for creating a dense network of collaborators in order to accommodate my inclinations regarding the choice of the topic, for her professional help, for the precious corrections and, most of all, for her human support. A huge thank you is also owed to my external supervisor, Doctor Francesco Noferini, who guided me throughout the development of this work with his inspiring insights, his useful advice and his patient explanations.

Another huge thanksgiving has to be addressed to Andrea, Daniele, Elisa, Federico, the two Francescas and Laura for sharing their days with me during my period in Geneva. Among CERN's staff, a special mention goes to Manuel, who made me really feel at home, acting with the kindness of an elder brother and the wisdom of a master. All of them have somehow participated in the realization of this work and, with their strong interaction, have contributed to giving this thesis a different flavour. Nevertheless, the responsibility for any mistakes present here is solely mine.

Furthermore, I would like to express my gratitude to all the people who have sustained me in these years, both spiritually and materially. A very big thank you goes to all my family for their support and understanding and for enabling me, each one according to his or her capability, to pursue this goal. A particular mention goes to Rocco for his linguistic suggestions.

Finally, a big thank you also goes to all my friends, those I had before starting university and the amazing people I have met during it. I have learnt a lot from their determination and their dedication to duty.
In conclusion, I have a special, truly marvellous thank you which this
margin is too narrow to contain.
Table of Contents
Abstract
Acknowledgements
Table of Contents
1. Introduction
1.1 Context overview
1.2 Aim and purposes
2. Problem modelling
2.1 Simulated data
2.1.1 Real spectra
2.1.2 Flat simulation
2.2 The model
3. A frequentist approach
3.1 Estimation method
3.2 The analysis
3.2.1 Real spectra
3.2.2 Flat-ratio simulation
3.2.3 Different set of priors
3.3 Analysis of the resonances
3.3.1 The meson φ
3.3.2 Results
3.4 Limitations and outlook
4. A Bayesian approach
4.1 Estimation method
4.1.1 Approach I: fixed parameters
4.1.2 Approach II: nuisance parameters
4.2 Results
4.2.1 Convergence evaluation and burn-in period
4.2.2 Posterior distributions
4.2.3 Reconstruction of the spectra
4.3 Limitations and outlook
5. Discussion
Appendix
5.1 Real spectra
5.2 Flat-ratio simulation
Bibliography
1. Introduction
Statistics is a powerful tool employed as a support in many different areas. A peculiar and very challenging field of application is experimental physics, in which the rise of ever new problems offers the possibility, on the one hand, to test how current statistical techniques behave in situations where extreme precision is needed and, on the other, to stimulate research beyond the limits of present methodologies and towards the development of different, innovative solutions.
1.1 Context overview
Experimental physics arose as an independent branch of physics in the early 17th century with the intent of observing physical processes and of conducting experiments. These observations were then used to formulate hypotheses about the laws that rule natural phenomena and to validate those theories, as suggested by the so-called scientific method due to Galileo Galilei. Following his example, many other eminent scientific personalities, such as Newton, Kepler and Pascal, contributed with their own works to the establishment of this philosophy, which is still at the basis of modern science. The building blocks of the prosperity of this approach are to be found in the numerous and great results it brought about in many areas, from classical and statistical mechanics to thermodynamics and electromagnetism.

Within this paradigm, and thanks to the work of scientists like Ernest Rutherford and Niels Bohr (together with many others), the new discipline of nuclear (and later on sub-nuclear) physics was born at the beginning of the 1900s. The success of the studies in this field and the constant new discoveries, some anticipated, others totally unexpected and somewhat bewildering, opened the doors to the exploration of a new and very broad branch of physics that has almost monopolised the field of experimental physics, steering research towards particle physics from then onwards. After more than a century, the status of the studies in this area is still far from an exhaustive knowledge of all the elementary particles that form matter and of the fundamental principles that govern their interactions; therefore many experiments are still ongoing, both exploiting natural phenomena, like cosmic rays, and using the most innovative technologies in order to build laboratories in which it is possible to reproduce the physical conditions of interest.
The CERN (European Council for Nuclear Research) laboratories are part of this second strand and make use of the world's biggest accelerator complex (Figure 1) in order to study high-energy collisions of charged particles. The idea of the creation of a European laboratory came up after the Second World War, with the purpose of uniting European scientists and sharing the increasing costs of a world-class research programme. Thanks to the support of renowned scientists, like the Italian Amaldi and the French De Broglie, the proposal was put forward and, on 17 May 1954, the first shovel of earth was dug on the Meyrin site, in the western suburbs of Geneva. Three years later, in 1957, CERN's first accelerator, the Synchrocyclotron, was finished and started its operation. Nowadays, CERN has become the biggest particle physics laboratory in the world and, since 2008, it includes, among its various facilities, the longest particle accelerator ever built, the Large Hadron Collider (LHC), which consists of a 26.7-kilometre tunnel hosting two rings of superconducting magnets, with several accelerating structures aimed at increasing the energy of the particles along the way (Figure 2). Inside the accelerator, two high-energy particle beams, made of either protons, p, or lead ions, Pb, travelling in opposite directions, are accelerated, through the strong magnetic field generated by the superconducting magnets, up to close to the speed of light. Once they have acquired the desired energy, the beams are steered towards the interaction points, where they are squeezed by means of a very strong magnetic field in order to increase the chances for them to collide with one another. Along the LHC path there are four major interaction points, in which the four biggest LHC experiments are hosted (ATLAS, CMS, LHCb and ALICE). As far as this thesis is concerned, only the latter experiment will be dealt with in more detail.

Figure 1: CERN's accelerator complex
Figure 2: LHC accelerator

ALICE, A Large Ion Collider Experiment (Figure 3), is a 26 m long, 16 m high, and 16 m wide heavy-ion detector in the Large Hadron Collider ring. It sits in a vast cavern 56 m below the ground and has been designed specifically to study heavy-ion interactions. ALICE assembles several devices which measure different properties of the particles produced. The ALICE detector system (Figure 4) is composed of a central part (the central barrel) and a muon spectrometer that covers the region far from the point in which the initial interaction takes place. The main central barrel detectors are, from small to large radii, the Inner Tracking System (ITS), the Time Projection Chamber (TPC), the Transition Radiation Detector (TRD) and the Time Of Flight system (TOF).

Figure 3: ALICE experiment. On the left, the hall in which the complex of detectors is set is shown, while on the right there is the corresponding building located on the surface.
Figure 4: ALICE detector system

This study is dedicated to the physics of a specific state of matter called quark-gluon plasma, through the observation of high-energy heavy-ion collisions, with a view to understanding how matter as we see it today was formed. The common matter that we encounter most frequently in today's universe is made up of atoms, which in turn are composed of protons, electrons and neutrons. In the mid-1950s, research on the form factor of the proton conducted by Robert Hofstadter at Stanford showed that protons themselves are not elementary particles (the same was observed shortly after for neutrons as well). On the contrary, further studies have proven that they are in fact a bound state of other particles called quarks2, held together by the mediators of the strong nuclear interaction, named gluons. Hence, we now deem that quarks, along with leptons, constitute the fundamental building blocks that make up ordinary matter. In nature, the bonds between quarks of the same composite particle are permanent, so as to confine them inside its structure, thus making it impossible to observe them separately. Nevertheless, current theories suggest that it has not always been like that. Indeed, quarks and gluons are in principle free to move as long as they are close to each other (a property known as asymptotic freedom). However, as soon as one of them tries to move away, the laws of the strong nuclear interaction prevent it from escaping (a condition referred to as confinement). Therefore, they are forced to cluster together in either triplets of quarks or antiquarks with integer electric charge (baryons), or quark-antiquark pairs (mesons).3 However, it is believed that the extreme conditions of energy density and temperature in the early universe may have caused protons and neutrons to "melt", freeing the quarks from their bonds with the gluons and forming the quark-gluon plasma. Therefore, the collisions inside the accelerator are tuned in such a way as to generate temperatures more than 10⁵ times hotter than the core of the Sun, thus allowing the recreation of a situation similar to that about 1 μs after the Big Bang. Under these extreme conditions the quark-gluon plasma is created, making possible the study of strongly interacting particles at extremely high energy densities and of the way ordinary matter was created from this "quark-gluon primordial soup" as the system expands and cools down. In this regard, some evidence of the existence of a deconfined phase of matter has been found at the Super Proton Synchrotron (SPS) at CERN and at the Relativistic Heavy Ion Collider (RHIC) at the Brookhaven National Laboratory.

2 The name quark was introduced by Gell-Mann and was inspired by a line in James Joyce's Finnegans Wake.
3 The rules these interactions must obey, and in turn the allowed combinations, are described in the quark model developed independently by the physicists Murray Gell-Mann and George Zweig in 1964.
In order to get valid and reliable information from these collisions, it is fundamental to reconstruct the types of particles produced (the particle species) and the multiplicity of each particle type, starting from the signals measured by the various devices. This task is usually referred to as particle identification (PID) and is conducted differently according to the specific physical phenomena one may be more interested in and to which detectors are included in the analysis. In ALICE the central barrel detectors provide complementary PID information and the capability to separate particle species in different momentum4 intervals. At intermediate momenta (pT ≲ 3-4 GeV/c), a track-by-track separation of pions, kaons and protons is made possible by combining the PID signals from different detectors. At higher momenta, a statistical reconstruction based on the relativistic rise of the TPC signal can be performed for PID. Given the wide range of momenta covered, ALICE has the strongest particle identification capabilities of any of the LHC experiments.

4 Generally speaking, the momentum is the relativistic four-vector given by the energy of the system divided by the speed of light (temporal coordinate) and the 3 components of the classical momentum (spatial coordinates). However, here only the transverse momentum, pT, is meant, i.e. the projection of the momentum onto the plane perpendicular to the beam's direction.

Figure 5: ALICE control room. Here detector experts work around the clock to guarantee the correct functioning of the detectors and to ensure that the data taken are valid.

For the analysis presented here, the detectors taken into account are:
i) the Time Projection Chamber (TPC): the main tracking device of ALICE, made of 159 read-out pad rows surrounded by a sensitive volume of gas so as to allow, by means of a combination of electric and magnetic fields, a three-dimensional reconstruction of the particle trajectory through the measurement of the specific energy loss (dE/dx). The distribution of the signals in this detector is Gaussian, with a resolution of the order of 5-8% of the measured dE/dx value and an expected value described by the Bethe-Bloch formula;
ii) the Time Of Flight system (TOF): based on Multigap Resistive Plate Chamber technology, it makes it possible to identify the mass of each particle produced by measuring its arrival time and comparing it to the expected arrival time under each species hypothesis. The final distribution of the signals of this detector is similar to a normal distribution with a heavy right tail and has an intrinsic resolution of about 80 ps5.

5 Picosecond: 1 ps = 10⁻¹² s.
1.2 Aim and purposes
The aim of this thesis is to generalise the traditional statistical approach for particle identification available within the ALICE experiment in order to allow a track-by-track particle identification. In particular, this work focuses on the development of the method proposed in (ALICE Collaboration, 2016), with regard to the assessment of its performance in the presence of systematic errors, the proposal of general recommendations for a stopping criterion and, finally, the extension to the analysis of resonances. Furthermore, a Bayesian approach to the solution of this problem is also explored.
In the next chapter, the general problem of particle identification is detailed more specifically with respect to the interest of the analysis, and its statistical modelling through a mixture model is described. In Chapter 3 a frequentist approach for the estimation of the mixture model parameters is presented. Particular attention is given to the advantages of the proposed technique with respect to the traditional inclusive methods commonly adopted for this task. The results obtained on the simulated data are also discussed, along with some limitations. Chapter 4, instead, presents a Bayesian framework for the problem, which makes use of Gibbs sampling to obtain parameter estimates. The possible benefits of this approach are examined and the results of its application to the data are then presented. Finally, the fifth and last chapter is devoted to the discussion of the two approaches and their respective pros and cons.
2. Problem modelling
Every particle physics experiment is designed to either observe or reproduce some physical phenomena of interest, in order to gather information about them from the study of their products. For instance, laboratory experiments make use of accelerators to increase the speed of particle beams, which are then made to collide. The results of these collisions are then analysed with the aim of comparing theoretical predictions with experimental data, so as to validate and possibly expand our current knowledge of those phenomena. It is therefore clear that the ability to recognise the products of the interactions is crucial in order to get useful information from these collisions, so a good particle identification strategy is needed.

As far as ALICE is concerned, beams of either protons or lead ions are accelerated throughout the LHC complex. Once they have reached almost the speed of light, it is no longer possible to appreciably increase their velocity; hence the effect of the acceleration is rather to enhance their energy. In this way the beams are given the potential to originate production reactions of other particles. The more energy they have, the more the yield of their interactions will grow, with the possibility of including ever heavier products as the mass of the initial particles gets bigger. The beam's composition and energy are thus decided according to the purpose of the research. When the desired energy level is acquired, the two beams are forced to collide in the hall hosting the experimental equipment. After the central crash, thousands of particles are produced (such as hadrons, electrons, muons and photons) and emitted in all directions. When one of them hits one of the various experimental devices that surround the interaction point, it produces a signal that is registered and transmitted to data processing experts in order to be analysed. However, within the whole set of signals, only some of the detected tracks are generated directly from the principal interaction; these are referred to as primary particles. Indeed, it is possible that some of them, the mothers, decay into daughter particles shortly after they are produced, thus thwarting their direct observation. Among the hadrons, a special case of this phenomenon is that of the resonances, particles with very short lifetimes (of the order of 10⁻²⁰-10⁻²³ seconds) which decay through the strong or the electroweak force almost immediately. Nevertheless, from the kinematic properties of the daughters it is possible to deduce the original mass of the mother.
Given this framework, particle identification provides information about the mass and flavour composition of the particle production, i.e. the distribution of the species multiplicities over the different kinematic areas. Hence, for a PID strategy to be effective, it has to satisfy essentially two requirements: firstly, to reproduce correctly the spectrum of the signals for each particle species over the kinematic range considered and, secondly, to reconstruct the distribution of the yields of unobservable particles starting from the former results.
2.1 Simulated data
In this thesis, a particle identification strategy is tested by means of a simulation study. In particular, two different simulations are run. The first mimics the measured spectra reconstructed from previous official analyses of the ALICE Collaboration, while the other reproduces constant π/K and K/p ratios over the range of transverse momentum, pT.
2.1.1 Real spectra

The real spectra simulation is intended to be a fair representation of the physical processes under study. Particle abundances and transverse momentum distributions are thus simulated according to the real data observed in ALICE for lead-ion (Pb-Pb) collisions at 2.76 ATeV with centrality 10-20%. The toy Monte Carlo is designed to reproduce 100000 collisions, each generating a large number of particles of different types, namely pions (π), kaons (K), protons (p), the K0* and its antiparticle, the phi (φ), the deltas (Δ⁺⁺ and Δ⁻⁻) and, finally, the Λc and its antiparticle. In this context, it is assumed that only pions, kaons and protons may reach the detector, while for all the other species the decay channels into π, K, p are simulated as well.
The physical quantities reproduced for each particle generated in the simulation are:
i) the momentum, p, which is strictly related to the mass and the energy of the particle. In particular, all three spatial momentum components are simulated;
ii) the signal, S, that the track would release in an imaginary detector reproducing the species separation capability of TPC and TOF together (expressed in nσ separation from the kaon signal);
iii) the projection of the scattering angle onto the plane perpendicular to the beam's direction, φ;
iv) the pseudo-rapidity, η.
Among these, the most relevant to the analyses presented in this thesis are the transverse momentum, pT (i.e. the projection of p onto the plane perpendicular to the beam's direction), and the signal. In particular, the latter is simulated from a Gaussian distribution whose expected value is species-dependent and whose variance is common to all particle types. More specifically, the kaon signal is taken as reference (i.e. its mean value is set to 0 irrespective of the pT value), while for pions and protons the signal is generated in terms of nσ separation from kaons. This is done differentially according to two functions that describe the π/K and K/p separation over pT for a hypothetical detector combining both TOF and TPC discrimination capabilities (the algebraic expressions can be found in Figure 6). Precisely, protons are simulated from a normal distribution whose expected value is found by evaluating the K/p separation function at the corresponding track's pT value. The same happens for pions, except that the mean of the Gaussian distribution is obtained by evaluating the π/K separation and then changing the sign. The distribution over pT of the simulated data is shown in Figure 7 for all three species.
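To make the generation mechanism concrete, the following minimal sketch (in C++, the language of the analysis macros) shows how such a per-track signal could be produced. The separation functions are hypothetical stand-ins for the curves of Figure 6, whose actual algebraic expressions are not reproduced here, and all names are illustrative rather than taken from the simulation code.

#include <random>

// Hypothetical stand-ins for the separation curves of Figure 6 (in sigma
// units); the true expressions are those reported in the figure.
double sepPiK(double pt) { return 8.0 / pt; }   // assumed pi/K separation
double sepKp(double pt)  { return 12.0 / pt; }  // assumed K/p separation

enum Species { kPion, kKaon, kProton };

// Signal in nsigma units from the kaon hypothesis: kaons centred at 0,
// pions below (sign-flipped pi/K separation), protons above (K/p
// separation), all with a common unit resolution (homoscedastic response).
double generateSignal(Species spec, double pt, std::mt19937& rng) {
    double mean = 0.0;                        // kaon reference
    if (spec == kPion)   mean = -sepPiK(pt);
    if (spec == kProton) mean =  sepKp(pt);
    std::normal_distribution<double> gaus(mean, 1.0);
    return gaus(rng);
}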
2.1.2 Flat simulation

Even though the former Monte Carlo provides the best setting in which to test the proposed method, inasmuch as it reproduces realistic data, a possible drawback is that the estimates are not homogeneous, since each pT bin relies on a possibly different number of observations. Therefore, the estimates and their uncertainties are expected to be worse in the bins for which less information is available, i.e. at very low and very high pT. In order to control how much this issue might affect the results in terms of percentage error and speed of convergence, a further simulation was run fixing the π/K and K/p ratios to the values they assume in the real spectra at a transverse momentum of 5 GeV/c. In this way the estimates are still based on possibly different amounts of observations; however, the total number of events is set to 100000 so as to ensure that in the last bin the number of protons, which are the least numerous at high pT, is sufficiently large to guarantee sound estimates (greater than 40000 entries).

Figure 6: Combined TPC&TOF separation
Figure 7: Signal distribution over transverse momentum. On the negative semi-axis the signal of pions ranges from very low (low pT) to almost zero (high momenta). The kaon signal is instead centred around 0 irrespective of pT. Finally, in the right tail lies the proton signal, which moves away as pT decreases.

A comparison between the species distributions according to the constant-ratio simulated data (referred to as flat in the following, dashed lines) and the true spectra (solid lines) is shown in Figure 8.
Figure 8: Flat-ratio VS realistic simulation comparison

2.2 The model

Let $S_i$ be the random variable representing the signal released in a generic detector by the i-th track, $i = 1, \dots, n$, and let $\mathbf{S} = \{S_i\}_{i=1,\dots,n}$ indicate the corresponding random vector. Let $\{s_i\}_{i=1,\dots,n}$ denote a realisation of this random vector, i.e. the sample of all the signals detected experimentally, irrespective of the type of particle that produced them. When a track $s_i$ is observed, no information on the true particle identity is available; hence, it may be regarded as sampled from a population made up of different sub-populations with possibly different relative proportions (weights). Moreover, the distribution of the variable may change from component to component. Thus, depending on the signal, the track may belong to each of the sub-populations with possibly different probabilities. In light of these considerations, it is possible to express the probability of observing a specific realisation of the random vector $\mathbf{S}$ as a mixture model:

$$P(\mathbf{s} \mid \Omega, \theta) = \prod_{i=1}^{n} \left\{ \sum_{spec} \omega_{spec} \cdot f_{spec}^{det}(s_i, \theta) \right\} \qquad (1)6$$

where $\theta$ is the vector of parameters, possibly unknown, that determine the functional form of the detector response distribution for each particle species, and $\Omega$ is the vector of weights attached to each population (such that $\sum_{spec} \omega_{spec} = 1$). For more details on mixture models see (McLachlan, 2004).

6 N.B. equation (1) assumes independence among tracks, although this may be unrealistic inasmuch as particles produced together are certainly somehow correlated; hence a proper modelling of this dependence could be introduced.

As far as this thesis is concerned, the analysis considers only three particle species and assumes the detector responses to be homoscedastic perfect Gaussian distributions with different expected values. Hence, the information on the single tracks can be expressed in terms of the individual likelihoods:
$$\mathcal{L}(\Psi \mid s_i) = \omega_\pi \cdot \phi(s_i; \mu_\pi, \sigma^2) + \omega_K \cdot \phi(s_i; \mu_K, \sigma^2) + \omega_p \cdot \phi(s_i; \mu_p, \sigma^2) \qquad (2)$$

for $i = 1, \dots, n$, where $\Psi = \{\Omega, \theta\}$ and $\theta = \{\mu_\pi, \mu_K, \mu_p, \sigma^2\}$.

Given this formalisation of the problem, the aim of the study is to maximise equation (2) with respect to the vector of parameters $\Psi = \{\omega_\pi, \omega_K, \omega_p, \mu_\pi, \mu_K, \mu_p, \sigma\}$, under the restriction $\sum_{spec} \omega_{spec} = 1$. However, here only the weights of each population are relevant to the analysis, so the other parameters are considered known and fixed a priori (alternatively, one may decide to treat them as nuisance parameters).
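As an illustration, the individual likelihood of equation (2) can be evaluated with a few lines of C++. This is a minimal sketch under the stated Gaussian assumptions, with illustrative names; it is not the thesis's actual analysis code.

#include <cmath>

// Normal density with mean mu and standard deviation sigma.
double gaussPdf(double x, double mu, double sigma) {
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * M_PI));
}

// Individual likelihood of equation (2) for a single signal s:
// w = (w_pi, w_K, w_p) with sum(w) = 1, mu = (mu_pi, mu_K, mu_p).
double mixtureLikelihood(double s, const double w[3],
                         const double mu[3], double sigma) {
    return w[0] * gaussPdf(s, mu[0], sigma)
         + w[1] * gaussPdf(s, mu[1], sigma)
         + w[2] * gaussPdf(s, mu[2], sigma);
}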
In the following, inference about the parameters of this model is conducted both in a frequentist and in a Bayesian context. Notice that both the estimation procedure and its assessment have to be repeated inside each pT bin, since the relative proportions of the three species change across transverse momentum.
3. A frequentist approach
Currently, within the ALICE experiment, the task of particle identification is conducted either with an inclusive statistical approach or through track-by-track methods based on the definition of selection cuts. The former extracts the average number of particles of each type by considering simultaneously all the produced signals, and has the advantage of naturally removing the contamination among tracks, due to the overlap of the signal distributions of different particle species, without losing efficiency. However, in this fitting process the information on the single tracks is not reconstructed, thus preventing the application of this method to many analyses, e.g. the study of resonances by means of their decays. In such cases, a track-by-track method is required. Techniques of the latter type, instead, make it possible to assign an identity to every signal detected, as long as it satisfies some conditions, i.e. by imposing selection cuts on the raw signal or on a transformation of it, referred to as the discriminating variable. For example, let ξ = f(S, R) denote the discriminating variable the PID strategy is built on, where S is the raw signal as expressed by the detector (e.g. a time for the TOF or a specific energy loss for the TPC) and R is the expected response. For a detector with Gaussian response, the most used discriminating variable is the nσ variable, defined as the deviation of the measured signal from that expected for species Hspec, in terms of the detector resolution:
$$n\sigma_{det}^{spec} = \frac{S_{det} - \mathbb{E}[S]_{det}^{spec}}{\sigma_{det}^{spec}} \qquad (3)$$

where det denotes the detector and $\sigma_{det}^{spec}$ is the resolution of the detector relative to species spec. Thus, the nσ PID approach corresponds to a true/false decision on whether a particle belongs to a given species: a certain identity is assigned to a track if this value lies within a certain range around the expectation (typically 2σ or 3σ). In this way, a track may be compatible with more than one identity, depending on the detector's discrimination power. Moreover, this strategy may lead to the definition of selection cuts which are either too loose, resulting in low signal purity, or too stringent, implying a loss of efficiency. In the latter case, in fact, some tracks may not pass any of the defined thresholds and hence are not assigned to any species.
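For illustration, the nσ variable of equation (3) and the associated true/false selection can be sketched as follows; names are illustrative and not those of the ALICE software.

#include <cmath>

// n-sigma discriminating variable of equation (3).
double nSigma(double signal, double expected, double resolution) {
    return (signal - expected) / resolution;
}

// Cut-based selection: accept the species hypothesis if the measured signal
// lies within nSigmaMax resolutions of the expectation (typically 2 or 3).
bool passesCut(double signal, double expected, double resolution,
               double nSigmaMax = 3.0) {
    return std::fabs(nSigma(signal, expected, resolution)) < nSigmaMax;
}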
3.1 Estimation method
As mentioned above, this dissertation deals with a PID strategy intended to extend the advantages of the inclusive statistical approach to a track-by-track method, thus generalising its application to a wider set of analyses. This is done by moving from a "cuts approach" to a "weights approach". The rationale behind this technique is to combine the information coming from the experimental data, i.e. the particle signal, with the prior belief that the track belongs to each of the particle species under study, resulting in an update of the prior belief itself through the logic of Bayes' theorem:

$$P(H_{spec} \mid S) = \frac{P(S \mid H_{spec}) \cdot P(H_{spec})}{\sum_{spec} P(S \mid H_{spec}) \cdot P(H_{spec})} \qquad (4)$$
where $P(H_{spec})$ is the prior probability of species spec, $P(S \mid H_{spec})$ is the likelihood of that particular signal and $P(H_{spec} \mid S)$ is the posterior probability that the track belongs to species spec given the observed signal S. The posterior probability is then used as a weight to fill the histograms of the spectra of all particle species in the pT bin corresponding to that track. This process is then iterated, using the posteriors as new priors for the next step, until a convergence criterion is met.
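A minimal sketch of one pass of this procedure, for a single pT bin and the three-species Gaussian model of Section 2.2, might look as follows (gaussPdf is the helper defined in the sketch of Section 2.2; names are illustrative).

#include <vector>

double gaussPdf(double x, double mu, double sigma);  // as in the Section 2.2 sketch

// One iteration of the "weights approach" in a single pT bin: the posterior
// of equation (4) is computed for every track, used as the histogram-filling
// weight, and its sum over tracks becomes the (improper) prior of the next step.
void iterationStep(const std::vector<double>& signals,  // signals in this pT bin
                   double prior[3],                     // current priors, updated in place
                   const double mu[3], double sigma,    // detector response model
                   double yield[3]) {                   // reconstructed multiplicities
    yield[0] = yield[1] = yield[2] = 0.0;
    for (double s : signals) {
        double post[3], norm = 0.0;
        for (int k = 0; k < 3; ++k) {
            post[k] = prior[k] * gaussPdf(s, mu[k], sigma);  // numerator of (4)
            norm += post[k];
        }
        for (int k = 0; k < 3; ++k) {
            post[k] /= norm;      // posterior probability of species k
            yield[k] += post[k];  // i.e. fill the spectrum with weight post[k]
        }
    }
    for (int k = 0; k < 3; ++k) prior[k] = yield[k];  // posteriors become new priors
}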
It is possible to show that the Bayesian probability used as a weight mimics perfect PID at the statistical level, i.e. the true multiplicity of each given species is reconstructed without the need for efficiency/contamination corrections, provided that the number of observations is sufficiently large. In order to demonstrate this, consider the following expression:

$$\sum_{i=1}^{n} f(S_i) = \sum_{spec} N_{spec} \int_{R_S} f(S_i) \cdot P(S_i \mid H_{spec}) \, dS \qquad (5)$$
Equation (5) is true in general, since it simply translates the sum of some function of the signal, $S_i$, over the whole sample of tracks into the weighted sum of the expected signals for each particle flavour, with weights equal to the species' true multiplicities. Then, if the detector response is known conditional on each particle type under study, it is also possible to use this formula to compute the left-hand side based on the right-hand side. In particular, choosing $f(S_i) = P(H_{spec_i} \mid S_i)$ and replacing equation (4) in the right-hand side of equation (5), one obtains:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \sum_{spec} N_{spec} \int_{R_S} \frac{P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i})}{\sum_{spec_i} P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i})} \cdot P(S_i \mid H_{spec}) \, dS \qquad (6)$$

where $H_{spec_i}$ indicates the species hypothesis for the i-th track.
Thus, it is possible to reconstruct the total sum of the posterior probabilities of each track belonging to the flavour $spec_i$, conditional on the signal $S_i$, starting from some priors $P(H_{spec_i})$ and the species-specific detector response $P(S_i \mid H_{spec})$. Then, moving the summation over the species into the integral and rearranging the right-hand side of equation (6), one obtains:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} P(H_{spec_i}) \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i}) \, dS \qquad (7)$$
At this stage, it is convenient to define $C_{spec}$ as $P(H_{spec})$ times the multiplicity of flavour spec, i.e. in terms of improper priors reflecting the absolute frequencies of particles of type spec expected a priori. In this new framework, the set of possible choices for the non-normalised priors can be summarised into two broad options. The first case is the one in which some very accurate information about the species multiplicities is available a priori and, therefore, the choice is such that $C_{spec} = N_{spec}$. In this circumstance, reading the latter formula the other way around, it is easy to show that the sum of the conditional posterior probabilities over the tracks reconstructs exactly each species' multiplicity. In fact, equation (7) becomes:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} C_{spec_i} \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot C_{spec_i} \, dS \;\xrightarrow{\,C_{spec} = N_{spec}\,}\; N_{spec} \int_{R_S} P(S_i \mid H_{spec_i}) \, dS = N_{spec} \qquad (8)$$
Alternatively, when no prior information is available, the right-hand side of equation (7) can be rewritten as:

$$\int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} C_{spec_i} \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot C_{spec_i} \, dS = \int_{R_S} \frac{N_{spec_i} C_{spec_i} P(S_i \mid H_{spec_i}) + C_{spec_i} \sum_{spec \neq spec_i} N_{spec} P(S_i \mid H_{spec})}{C_{spec_i} P(S_i \mid H_{spec_i}) + \sum_{spec \neq spec_i} C_{spec} P(S_i \mid H_{spec})} \cdot P(S_i \mid H_{spec_i}) \, dS$$

Then, factoring out $N_{spec_i} C_{spec_i} P(S_i \mid H_{spec_i})$ in the numerator and $C_{spec_i} P(S_i \mid H_{spec_i})$ in the denominator, the following result is achieved:
$$\frac{N_{spec_i} C_{spec_i} P(S_i \mid H_{spec_i}) \left(1 + \sum_{spec \neq spec_i} \frac{N_{spec} P(S_i \mid H_{spec})}{N_{spec_i} P(S_i \mid H_{spec_i})}\right)}{C_{spec_i} P(S_i \mid H_{spec_i}) \left(1 + \sum_{spec \neq spec_i} \frac{C_{spec} P(S_i \mid H_{spec})}{C_{spec_i} P(S_i \mid H_{spec_i})}\right)} = \frac{N_{spec_i} \left(1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}\right)}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})} + \sum_{spec \neq spec_i} \left(\frac{C_{spec}}{C_{spec_i}} - \frac{N_{spec}}{N_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}$$
where the last step is obtained by adding and subtracting the quantity $\sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \cdot \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}$ in the denominator. Subsequently, substituting the latter expression into equation (7), taking $N_{spec_i}$ out of the integral and dividing both numerator and denominator by $\left(1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}\right)$, the following formula is achieved:
$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = N_{spec_i} \int_{R_S} \frac{1}{1 + \dfrac{\sum_{spec \neq spec_i} \left(\frac{C_{spec}}{C_{spec_i}} - \frac{N_{spec}}{N_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}} \; P(S_i \mid H_{spec_i}) \, dS$$
Finally, operating a first-order Taylor series expansion of the fraction7 inside the integral, the following approximation is obtained:

$$N_{spec_i} \int_{R_S} \frac{1}{1 + \dfrac{\sum_{spec \neq spec_i} \left(\frac{C_{spec}}{C_{spec_i}} - \frac{N_{spec}}{N_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}} \, P(S_i \mid H_{spec_i}) \, dS \approx N_{spec_i} \int_{R_S} \left(1 + \frac{\sum_{spec \neq spec_i} \left(\frac{N_{spec}}{N_{spec_i}} - \frac{C_{spec}}{C_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}\right) P(S_i \mid H_{spec_i}) \, dS$$
7 Regarded as a function of the improper priors in a neighbourhood of the true multiplicities.
Again, since the term $\sum_{spec \neq spec_i} \left(\frac{N_{spec}}{N_{spec_i}} - \frac{C_{spec}}{C_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})} \to 0$ as $C_{spec} \to N_{spec}$, equation (7) can be rewritten as:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) \;\xrightarrow{\,C_{spec} \to N_{spec}\,}\; N_{spec_i} \int_{R_S} (1 + 0) \, P(S_i \mid H_{spec_i}) \, dS = N_{spec_i} \int_{R_S} P(S_i \mid H_{spec_i}) \, dS = N_{spec_i} \qquad (9)$$
Therefore, in both cases the method is able to reconstruct the true multiplicities, exactly or approximately, as long as the choice of the priors is made correctly. More specifically, in the second option the condition $C_{spec} \to N_{spec}$ is achieved by means of the iterative procedure described in this chapter. The method is thus guaranteed to converge to the distribution of interest, making a perfect particle identification possible. Furthermore, it has the advantage that, by adopting the Bayesian definition of probability, its usage can be naturally generalised to combine the information from different detectors, so as to achieve an effective exploitation of the full PID capabilities of the experiment.
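In code, the convergence $C_{spec} \to N_{spec}$ can be sketched as a simple driver around the iteration step shown in the previous sketch; the stopping rule below (relative change of the yields) is one possible choice, offered for illustration only, and stopping criteria are discussed further in Section 3.2.

#include <algorithm>
#include <cmath>
#include <vector>

void iterationStep(const std::vector<double>& signals, double prior[3],
                   const double mu[3], double sigma, double yield[3]);

// Iterate from flat priors until the largest relative change of the
// reconstructed yields falls below a tolerance (tolerance and iteration cap
// are illustrative; in practice few iterations suffice, see Section 3.2).
void runPID(const std::vector<double>& signals,
            const double mu[3], double sigma, double yield[3]) {
    double prior[3] = {1.0, 1.0, 1.0};  // flat starting priors
    for (int step = 0; step < 50; ++step) {
        const double old0 = prior[0], old1 = prior[1], old2 = prior[2];
        iterationStep(signals, prior, mu, sigma, yield);
        const double rel = std::max({std::fabs(yield[0] - old0) / std::max(old0, 1.0),
                                     std::fabs(yield[1] - old1) / std::max(old1, 1.0),
                                     std::fabs(yield[2] - old2) / std::max(old2, 1.0)});
        if (rel < 1e-4) break;  // assumed convergence criterion
    }
}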
3.2 The analysis
The assessment of the performance of the PID method presented above is dealt with in the following, based on the simulated data. The comparison of the estimated multiplicities with the true ones across pT is carried out in terms of the percentage error of the reconstructed spectra with respect to the true ones:

$$err_{\%} = \frac{reconstructed - true}{true} \qquad (10)$$
The task of reconstruction is conducted through the macro analyze.C (see Appendix for more details), starting from a flat prior equal to one in each pT bin for all 3 species8.

8 In this way the counts are normalised to the multiplicity of pions in each bin.

Its code includes two parameters that make it possible to explore different scenarios, thus also giving the opportunity to estimate the systematic errors. An interesting aspect to investigate, indeed, is how systematic variations in the detector response may affect the performance of the estimation procedure. In this respect, thirteen different scenarios are explored by introducing a bias in the signal reproduced by the detector, namely a shift of the expected signal with respect to the simulated data and/or an error on its width (a minimal sketch of this biasing follows the list below). More specifically:
i) a shift of 10% and 20% of the true σ value is introduced uniformly for all species, both in the positive and negative direction;
ii) an error of 10% and 20% of the true σ value is introduced in the width, uniformly for all species, both in the positive and negative direction;
iii) the combined effect of varying both parameters is also explored.
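A minimal sketch of how such a bias can be injected into the response model used by the reconstruction follows; the parameter names are illustrative and stand in for the two parameters of analyze.C mentioned above.

double gaussPdf(double x, double mu, double sigma);  // as in the Section 2.2 sketch

// Detector response evaluated with a systematically biased model: the mean
// is shifted by shiftFrac*sigma_true and the width scaled by widthFactor.
double biasedResponsePdf(double s, double mu, double sigma,
                         double shiftFrac,     // e.g. +0.1 or -0.1
                         double widthFactor) { // e.g. 0.9 or 1.1
    return gaussPdf(s, mu + shiftFrac * sigma, widthFactor * sigma);
}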
For the sake of conciseness, only an upward and a downward alteration of each parameter is considered and discussed here, together with the corresponding four scenarios deriving from the combination of those variations. All the other results are just briefly mentioned (for further details, see the Appendix).
All the results presented in this and the following chapters are obtained
using the software ROOT v. 6.04/02.
3.2.1 Real spectra
3.2.1.1 Perfect response scenario
The first scenario investigated is the one in which the detector response perfectly replicates the simulated signal. This setting represents the ideal framework in which to assess the validity of the method and its speed of convergence, since any error can be attributed solely to the estimation technique itself. The results are shown in Figure 9 below.

The reconstruction is almost perfect up to 4 GeV/c for all the species; then it starts to show a slight upward trend for kaons, compensated by a negative error on protons. The pions, instead, fluctuate fairly randomly around 0. The error for kaons amounts to nearly +10% at 7 GeV/c and a bit less than +20% at the very boundary of the momentum range. Regarding protons, the situation is the opposite, with a downward trend in the percentage error causing an underestimation of around 10-15% for pT > 6 GeV/c. The vertical bars represent the uncertainty on the reconstructed multiplicities. Their values are computed as the difference between two successive estimates multiplied by a gain factor, defined as the proportion of that species over the total number of tracks observed with that transverse momentum. They range from virtually zero in the first half of the pT range to nearly 20-30% of the estimated value in the last bins.
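A sketch of this uncertainty computation (names are illustrative):

#include <cmath>

// Uncertainty on a reconstructed multiplicity: difference between two
// successive estimates, scaled by the "gain factor", i.e. the species'
// share of the tracks observed in that pT bin.
double yieldUncertainty(double currentEstimate, double previousEstimate,
                        double totalTracksInBin) {
    const double gain = currentEstimate / totalTracksInBin;  // species share
    return std::fabs(currentEstimate - previousEstimate) * gain;
}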
The previous plot shows an increase of the percentage error for increasing values of the particles' transverse momenta. To explain why this behaviour is observed, it is useful to look at the error as a function of the separation among the species. This relationship is illustrated in Figure 10 below for steps 4, 5 and 6. From the graphics it becomes clear that, as expected, the percentage error increases as the signals of the different species get closer. In turn, since the separation becomes smaller as the transverse momentum grows, this means that for high values of pT the error is expected to be bigger. In particular, the reconstruction is almost perfect for pions as long as their signal is more than 2.5σ away from that of kaons. In any case their error lies between +10% and -10%, varying fairly randomly around 0. As for protons, they behave quite well down to a 2σ separation and then start to have some problems, with an error of around -10% below 1.5σ. Finally, kaons are affected by the behaviour of the other species: there is an excess in their multiplicity balancing the deficit of protons.

In light of the considerations discussed above, kaons face the most problematic situation, expectedly so, since their signal distribution lies in the middle and is hence affected by the contamination from both the other species. Nevertheless, it is important to underline that the percentage error may be misleading when comparing different species.
Figure 9: Percentage error in the "perfect response" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
In fact, a mistake of one particle in the total counts of tracks belonging to each species may matter very differently for the various types of particles. In this respect, kaons and protons are certainly penalised inasmuch as they are fewer than pions over a large part of the momentum domain. Figure 11 illustrates a direct comparison of true and reconstructed spectra after 6 steps. All in all, the degree of agreement is good, though the absolute error is quite high due to the large number of particles involved. However, the discrepancy between true and estimated multiplicities is not yet attributable solely to the PID strategy. In fact, in Section 3.2.2 the analysis on the constant-ratio simulation will show that the disagreement is rather due to an insufficiently large number of observations in the last bins.

Figure 10: Percentage error over nσ separation. The top row presents a comparison between protons and pions at equal separation from the kaon signal, while the middle and bottom rows show a comparison between π and K and between p and K, respectively.

Figure 11: True VS estimated comparison. On the left-hand side, the comparison of the reconstructed spectra (dashed lines) and the true ones (solid lines) in log scale. On the right, a zoom on the range 5 < pT < 10 is shown. Pions are represented in red, kaons in blue and protons in green.
In terms of convergence, the algorithm performs well already in the first steps. After 4 iterations the shape of the trend is already appreciable, and in the next two steps it is merely reduced in scale. Successive iterations show no visible improvement in terms of error, although a reduction in the estimates' uncertainties is achieved. In conclusion, 6 steps seem a fairly reasonable choice to balance the trade-off between accuracy and computing time9.

9 N.B. all the discussion in the following therefore refers to the results after 6 steps.

3.2.1.2 Shifted detector response (0.1 σtrue)

The second scenario taken into account features a systematic shift of the detector response by 0.1σtrue in both directions, applied irrespective of the track's identity. The results for the upward shift are presented in Figure 12.

Figure 12: Percentage error in the "upward shift" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.

In this case the reconstruction is more troublesome even for low values of transverse momentum. As a matter of fact, after 2 GeV/c the count of pions
starts to be slightly overestimated at the expense of kaons, with an error of nearly 4-5% for both species. However, for pT > 3.5 GeV/c the error relative to kaons begins an upward trend compensated, on the other hand, by a deficit of protons. The error is confined within 10% for all species up to nearly 6 GeV/c and then saturates around +15-20% for kaons at high pT and, respectively, +5% and -15-20% for pions and protons in the same range.

The situation is almost specular for π and p in the case of the downward shift (Figure 13). This time protons and kaons are slightly overestimated in the region corresponding to pT greater than 2 GeV/c, with the pions compensating with a more accentuated downward trend. The size of the errors for the three species is nearly the same, but in the opposite direction for protons and pions. Kaons are instead fairly unchanged.
Figure 13: Percentage error in the "downward shift" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.

3.2.1.3 Altered detector response width (±0.1 σtrue)

In the next scenarios, instead, only the width of the signal reproduced by the detector is altered. Firstly, a narrower signal width is considered; the results are presented in Figure 14 below. The plot shows that kaons are overestimated from 3 to roughly 7.5 GeV/c, with an error ranging gradually from 5 to 15%. This behaviour is compensated by both π and p, although the effect hardly stands out in the illustration, since their multiplicity is far greater than the kaons' in that region. Beyond that kinematic area, the counts of K alternate between excesses and deficits, which are more evidently balanced by protons, whose numerosity becomes comparable to the kaons' yield at high pT. Regarding the error, it is confined below +10% for kaons and around 0 for the other species up to 6 GeV/c, while it starts to increase up to +20% and -15% for K and p respectively.
Figure 14: Percentage error in the "narrower width" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.

On the other hand, the conditions are inverted for kaons and protons in the case of a detector response with a larger width than the simulated one (Figure 15 below). This time the PID is almost perfect up to 6 GeV/c, showing errors only in the order of 2-3% for K and p. On the contrary, beyond that region the reconstruction worsens, with errors of greater magnitude, in the order of 15-20% for kaons and 10-15% for protons. Pions remain fairly random around zero though.
Figure 15: Percentage error in the "larger width" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.

Further analyses have been run introducing more extreme systematics in the detector response. Specifically, the shift has been increased to 0.2σtrue in both the positive and negative direction, and the response width has been set to 0.8σtrue and 1.2σtrue. The results were found to be consistent with the aforementioned scenarios, with the trend in the percentage error maintaining the same shape while increasing in magnitude quite linearly. For further details, see the Appendix.

3.2.1.4 Combined effect

Finally, scenarios in which both detector response parameters are changed are taken into account. More specifically, the aforementioned cases have been combined, giving rise to four further scenarios, which are presented in the following.
25
Figure 16 illustrates a comparison of the four cases, separately for each
species. As per convention, pions are pictured in red, kaons in blue and protons
in green, with shades of the respective colours representing different scenarios,
ranging from a decrease of both width and expected response by 0.1Οƒtrue
(lighter colour tones) to an increase of both detector parameters by the same
amount (darker colour tones). Again, iteration 6 is taken as reference. As one
may notice, the percentage error curves pair up according to the value of the
shift parameter. This behaviour is particularly clear for pions and protons, for
which the trends of scenarios with different Οƒ are not very different when
controlling for the value of the shift. As far as kaons are concerned, the
combined effect of altering both width and expected value is more appreciable.
Nevertheless, generally speaking, the two systematic variations appear to act
independently, without showing a combined effect on the error: in fact, the
order of magnitude of the mistaken counts is pretty much the same as in the
aforementioned cases.
A further insight in the comparison of these scenarios is given by
Figure 17 below. This time the four cases are dealt with separately,
superimposing the percentage error curves of the 3 species. The plot can be
read both by row and by column. The former option allows one to compare the
effect of the signal width for fixed values of the shift, while the latter enables
the evaluation of an alteration of the expected signal when controlling for the
value of Οƒ.
Figure 16: Combined cases comparison. The plots show the distribution of the percentage error over pT for pions (on the left), kaons (in the middle) and protons (on the right) at iteration 6. Shades of different colours are used to illustrate the different scenarios.
The first characteristic that catches the eye is that kaons seem to have a
fairly constant behaviour irrespective of the reproduced response.
Nevertheless, looking more carefully, it is possible to notice two effects the shift
has on kaons. Firstly, it acts as an additive constant, dragging the curve up
when the variation is positive and down when it is negative; this impact is not
evident in the graphic though. Secondly, it mirrors the percentage error curve
about the horizontal line through 0 in the range 2 < pT < 4 GeV/c. This latter
tendency is more easily understood by considering pions and protons as well.
In fact, for both species the shift acts by mirroring the percentage error curves
about the horizontal line through 0: Ο€ have a downward trend when a negative
shift is added and an upward trend when it is positive, while the opposite
behaviour is observed for the protons. The reason why this happens is that
decreasing, on average, the signal of all the species by 0.1Οƒtrue moves all the
measured responses towards the pion signal, thus generating a deficit in their
counts. The same happens in the opposite direction for p. On the contrary,
kaons are not much affected by this variation, since their signal's distribution
lies in between the other two. However, as already said, an impact on their error
distribution is appreciable at low pT values, where the effect of the shift is
dominant with respect to that of the contamination due to the other species,
inasmuch as the separation is large. This generates an overestimation due to a
deficit of pions when the shift is negative, and an underestimation in favour of
pions when it is positive. However, as soon as the separation gets smaller, the
contamination effect becomes dominant, thus causing the characteristic
upward trend of this species. As for the impact of the width of the measured
signal, it can be noticed that it does not substantially alter the shape of the
trend induced by the shift in the expected signal, but simply amplifies its
magnitude.
Figure 17: Combined cases comparison. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) at iteration 6. The 4 different scenarios are illustrated separately: negative shift is pictured in the first row, positive in the second; narrower response is presented in the first column, larger in the second.
In conclusion, it is possible to say that the two factors seem to act
independently of each other. Moreover, the shift seems to have a larger effect
than the width on the percentage error of the reconstruction, somewhat
unexpectedly. This may be due to the fact that, although a larger uncertainty on
the signal distribution certainly contributes to making the reconstruction more
troublesome, its effect is dominant neither at low transverse momentum,
where it is balanced by a good separation, nor at high pT values, where the
contamination due to the other species would have risen either way, since the
signals' distributions get closer.
3.2.2 Flat-ratio simulation
As already mentioned, the method is guaranteed to converge for infinite
statistics, i.e. when the number of observations in each pT bin tends to infinity.
However, this is not the case for the realistic simulation, since the counts at the
right boundary of the momentum range drop drastically for all the species (see
Figure 18).
Figure 18: True counts comparison for pT > 5 GeV/c.
A natural way of overcoming this would be to simply enlarge the
number of events, so as to keep the shape of the spectra realistic, as well as the
species' relative proportions, while increasing the number of particles in each
bin by a huge factor. Nevertheless, this would dramatically increase the
computing time needed in the reconstruction phase, therefore this solution is
not considered here. Alternatively, the issue is addressed by running a different
simulation in which the ratio between the species is flat over the range of
transverse momentum. In particular, it has been fixed to the value observed in
real data at pT = 5 GeV/c. As shown previously in Figure 8 above, this choice
guarantees an appropriate number of observations in each bin.
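As a sketch, the species of each generated track can then be drawn with pT-independent probabilities, as below; the numerical ratios are purely illustrative placeholders, not the values actually measured at pT = 5 GeV/c.

```cpp
#include <random>

// Draw the species of one generated track with pT-independent probabilities
// (illustrative numbers only; the analysis fixes them to the ratios observed
// in real data at pT = 5 GeV/c).
int drawSpecies(std::mt19937 &rng) {
    static std::discrete_distribution<int> species({0.7, 0.2, 0.1}); // pi, K, p
    return species(rng); // 0 = pion, 1 = kaon, 2 = proton
}
```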
The whole set of scenarios was investigated also in this framework. The
results were found to be consistent with those achieved for the real spectra;
hence, for brevity, the following will deal only with the case of perfect response,
while all the other results are illustrated in the Appendix.
3.2.2.1 Perfect response scenario
As already mentioned, this scenario represents the best setting in which
to assess the performance of the estimating technique, so it is adopted as an
example to show the behaviour of the method when sufficient observations are
available in all kinematic areas.
Figure 19 compares the trend over iterations of the percentage error for
the three species at steps 4, 6, 8, 10, 12 and 14. This time the improvement in
the reconstruction is appreciable also after iteration 6. Performing a step-by-
step comparison, it is possible to notice that, at equal iterations, the error is
lower for the flat-ratio simulated data.
Figure 19: Percentage error in the β€œperfect response” scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green). N.B. the Y-axis scale is reduced from (βˆ’0.3,0.3) to (βˆ’0.05,0.05) in the bottom-line plots in order to illustrate more clearly the changes in the curves.
At step 6, in fact, the mistaken counts do not exceed the +10%
threshold at high momenta as far as kaons are concerned, while an almost
perfect reconstruction is observed for the other species. Furthermore,
increasing the number of iterations, the PID becomes even better, especially in
the kinematic areas in which the separation of the species gets lower. This may
be interpreted by saying that the first 6 iterations are sufficient to reconstruct
practically perfectly the spectra in the first half of the momentum range, while
for a satisfactory reconstruction at high pT values 10 steps are needed,
achieving an error on kaons below +2%, and 12 steps for an error around 1%.
Overall, the method performs well and was able to obtain a practically
perfect particle identification even at high momenta. The number of iterations
needed to achieve these results is between 8 and 12, depending on the degree of
precision requested. Similar conclusions can be drawn for the other scenarios,
in which the variation of the detector parameters does not affect the speed of
convergence of the algorithm, although it introduces some systematic errors.
3.2.3 Different set of priors
Finally, the last point to investigate is the influence that the
starting priors have on the results of the iterative procedure and on its
convergence rate. In order to address this issue, another analysis has been run.
In the previous results, the initial guess on the priors was set equal for all the
species, normalising each yield to the pion multiplicity in that bin. This means
that both kaons and protons start from an overestimation of their true values
and, hence, a convergence from above is observed. Alternatively, the following
will deal with results obtained starting from an underestimation of kaons,
whose prior is set to 20% of the pion multiplicity. Thus, an instance of how the
technique behaves when convergence is achieved from below is illustrated.
Again, kaons have been taken as example because, lying in the middle, they
suffer more from contamination by both the other species.
In Figure 20 the progress of the estimating procedure is shown for both
sets of priors, with the percentage error curves for the three species
superimposed. The data used for the comparison are taken from the real
spectra simulation. Particle types are indicated with the usual colour
convention. The solid lines with full markers represent the approximation from
above, while the dashed lines with open markers reproduce the new set of
priors. The first thing that leaps out is the evident difference in the mistaken
counts in the first steps for kaons and protons. Although the new set of priors
starts from an underestimation of the K multiplicities, after a few steps their
distribution already shows the typical shape of the kaon trend, with an even
more accentuated overestimation. Proceeding with the iterations, the gap
between the curves vanishes, the two eventually converging to the same error
distribution. On the contrary, pions do not seem to suffer from the changes in
the priors.
Therefore, the choice of a different set of priors does not affect the
results, although it influences the time needed to obtain comparable results,
lowering the convergence rate of the algorithm. Based on the previous plot, it is
possible to say that 12 iterations are enough to achieve an error not greater
than +20% for kaons and βˆ’20% for protons, with an uncertainty (measured as
the absolute difference between the curves of each species) in the order of
approximately 5%.
In conclusion, the method has shown excellent performance and was
able to obtain a practically perfect particle identification even at high momenta,
provided that enough observations were available in each bin. In order to do
so, the number of iterations needed is between 10 and 12, depending on the
degree of precision one is willing to achieve. Furthermore, the analysis of the
results of this technique can be adopted in order to give an estimate of the
systematic effects deriving from an imperfect knowledge of the detectors'
response.
Figure 20: Priors convergence comparison. The plots show the comparison of the error distribution for the two sets of priors at steps 5, 8, 11 and 14. Solid lines picture the approximation from above, while dashed ones the convergence from below. The usual convention is used regarding colours. N.B. the Y-axis scale is (βˆ’1,1) for the top left plot and (βˆ’0.5,0.5) for the others.
3.3 Analysis of the resonances
Once the PID strategy has been validated for the reconstruction of the
spectra of observed particles, the next step is to move to the analysis of
resonances, i.e. of those particles not directly observed because they decay
shortly after their production, before reaching the detecting instruments.
In order to illustrate the results of this new strand of the analysis, the
particle Ο• is taken as reference. Reconstructing the possible decay channels
from the combinatorial background of identified observable particles is in
principle very hard and, hence, a lot of preliminary work has to be done to deal
with this issue. However, the Ο• resonance presents some simplifications that
facilitate this task, making it a perfect case for illustrative purposes.
3.3.1 The meson Ο•
The meson Ο• is a particle with no electric charge whose prevalent decay
channel is Ο• β†’ K+ + Kβˆ’, i.e. two kaons with opposite charge10. It is formed of a
strange quark, s, and a strange antiquark, sΜ…, and, as such, it also constitutes its
own antiparticle. The invariant mass11 of this meson has been measured in
various experiments, resulting in the value of 1019.445 Β± 0.020 MeV/cΒ². The
expected lifetime of such a particle is estimated to be, instead,
(1.55 Β± 0.01) Γ— 10^βˆ’22 seconds.
The reasons why this resonance is a suitable case for a benchmark
analysis are to be found in the reconstruction of its decay channel. First of all,
in this framework, the only channel considered for the Ο• produces just two
particles. This implies that the combinatorial background is reduced to the
lowest possible level, since for three-body or higher decays the background
combinations are much more numerous. However, this advantage would
vanish if the particles produced were difficult to detect (e.g. neutral or weakly
interacting particles). On the contrary, the two daughters of the meson Ο• are
both electrically charged and so quite easy to detect. In addition, they are
exactly the same particle apart from the sign; thus, in
particular, they also have the same mass. This, in turn, implies that, at the
moment the decay takes place, the two daughters each take approximately half
of the mother's momentum. This means that, since the identification depends
on the pT of the particle, the PID performance on the kaons is constrained once
the Ο•'s momentum is fixed.
10 Other decay channels are also possible; however, they are not taken into account here.
11 The invariant mass, M, of a particle is defined as M^2 c^4 = E^2 βˆ’ p^2 c^2, where E is the particle energy, the vector p is its momentum and c is the speed of light. M is one of the relativistic invariant quantities, thus its value is the same in any reference frame.
3.3.2 Results
In this case the PID strategy has to deal with the priors as a function of
the invariant mass instead of pT, so the weights are applied when filling the
histograms over this new variable. The result of this procedure in the perfect
response scenario is illustrated in Figure 21 below.
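Concretely, the weighted fill can be sketched as follows: for every pair of opposite-charge tracks, the invariant mass is computed under the kaon mass hypothesis and the pair is filled with the product of the two single-track kaon posterior probabilities. The Track structure and the posteriorK helper are illustrative assumptions, not the actual analysis code.

```cpp
#include <cmath>

struct Track { double px, py, pz;
               double p2() const { return px*px + py*py + pz*pz; } };

const double mK = 0.493677; // charged kaon mass, GeV/c^2

// Invariant mass of a track pair under the kaon mass hypothesis:
// M^2 = (E1 + E2)^2 - |p1 + p2|^2.
double invariantMass(const Track &t1, const Track &t2) {
    double e1 = std::sqrt(mK*mK + t1.p2());
    double e2 = std::sqrt(mK*mK + t2.p2());
    double px = t1.px + t2.px, py = t1.py + t2.py, pz = t1.pz + t2.pz;
    return std::sqrt((e1 + e2)*(e1 + e2) - (px*px + py*py + pz*pz));
}

// inside the loop over opposite-charge pairs (posteriorK returns the
// single-track kaon posterior probability):
//   double w = posteriorK(t1) * posteriorK(t2);
//   hInvMass->Fill(invariantMass(t1, t2), w); // weighted TH1::Fill
```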
The picture shows the reconstructed invariant mass of the mother
particle on the x-axis and the respective counts of combinations of candidate
daughter particles on the y-axis. In the top panel the true invariant mass
distribution and the estimated one are superimposed over the whole range of
transverse momentum, while in the bottom panel the percentage error of the
reconstruction is plotted. Starting from the output of the iterative procedure, a
fit on the estimated posterior invariant mass distribution is made in order to
obtain the total yield of Ο• mesons.
Figure 21: Reconstructed invariant mass plot. The figure presents in the top panel the invariant mass distribution of the resonance Ο• reconstructed from the channel K+ + Kβˆ’; the fit is superimposed in blue, while the background is represented by the red line. The results are shown in the statistics box. In the bottom panel the usual percentage error of the estimated yield with respect to the true one is illustrated.
In particular, the background is modelled
through a linear function, while a relativistic Breit-Wigner is fitted on the peak.
The latter is a continuous probability distribution commonly used in particle
physics to model the signal in an invariant mass plot. Its functional form was
first derived by Lorentz, in optics, during the classical study of atoms regarded
as damped harmonic oscillators; however, it was introduced in high-energy
physics by Gregory Breit and Eugene Wigner, after whom it is named. Its
probability density function is defined as follows:
π΅π‘Š( 𝑀; 𝑀0, 𝛀) =
( 𝛀
2⁄ )
2
( 𝑀 βˆ’ 𝑀0)2 + ( 𝛀
2⁄ )
2 (𝟏𝟏)
where the two parameters M0 and Ξ“ are the invariant mass of the Ο• meson and
the width of its peak12. However, being a density function, it integrates to 1 and
is therefore not suitable to be used directly for modelling a peak of observed
absolute frequencies. Thus, a further parameter is added in the fit, acting as a
normalisation constant. Hence, globally there are 2 parameters relative to the
background, 2 describing the signal and a normalisation constant, for a total of
5 unknown quantities to estimate. The result of the fit is superimposed onto the
peak of the signal in the top panel and the observed χ² value is reported in the
statistics box along with its degrees of freedom.
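For illustration, a fit of this kind can be set up in ROOT along the following lines, with pol1(0) as the linear background, parameter [2] as the normalisation constant and [3], [4] as the peak position M0 and width Ξ“ of equation (11). This is a minimal sketch with assumed names and starting values, not the macro actually used.

```cpp
// Linear background + normalised Breit-Wigner shape of equation (11):
// [0],[1] background, [2] normalisation, [3] = M0, [4] = Gamma.
TF1 *fitFunc = new TF1("fitFunc",
    "pol1(0) + [2]*(([4]/2)*([4]/2))/((x-[3])*(x-[3]) + ([4]/2)*([4]/2))",
    1.00, 1.04);
fitFunc->SetParameters(0., 0., 1000., 1.019445, 0.004); // rough initial guesses
hInvMass->Fit(fitFunc, "R");           // "R": fit only within the given range
double chi2 = fitFunc->GetChisquare(); // observed test statistic
int    ndf  = fitFunc->GetNDF();       // degrees of freedom
```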
The fit was performed in the invariant mass interval ranging from 1.00 to
1.04 GeV/cΒ², thus considering 58 bins. The test was non-significant, showing an
observed value of the test statistic equal to 47.16 against the 95% threshold of a
χ² with 53 degrees of freedom, 70.99. Hence, the expression combining the
signal and the background successfully modelled the peak in the invariant mass
distribution and can therefore be used to estimate the invariant mass and the
mean lifetime of the Ο• meson. The estimates of the parameters of interest
resulting from the fit are presented in Table 1 below.
        Mass             Width
Value   1.01998 GeV/cΒ²   0.042603 GeV/cΒ²
Table 1: Estimates of the fitting procedure.
The estimated mass was found to be 1.01998 GeV/cΒ², while for the width a
value of 0.042603 GeV/cΒ² was achieved. The results were therefore consistent
with the current estimates; thus, in conclusion, the PID strategy works well also
in the reconstruction of the Ο• decays. However, to properly assess the
performance of the method when applied to resonances, further tests are
needed.
12 The parameter Ξ“ is linked to the mean lifetime, Ο„, by the relationship Ξ“ Β· Ο„ = ℏ, where ℏ = h/2Ο€ and h = 4.13567 Γ— 10^βˆ’15 eVΒ·s is the Planck constant.
After the estimates of the parameters of the signal and background
distributions have been found, the total yield over the whole momentum
domain has been obtained by integrating the difference of the two functions in
the fit range, giving a total of 739948 particles, i.e. roughly 67% of the true
number of particles. This result may seem far from a good representation;
however, one has to consider that the reconstructed invariant mass plot
reproduced 1107671 Ο• mesons, i.e. 99.99% of them, so the problem lies in the
fit rather than in the reconstruction. In fact, the linear functional form for the
background may be too simplistic, and a better combination can be explored.
Anyhow, this issue is not a major concern, since once the efficiency of the
reconstruction is known, the result can simply be corrected for it.
3.4 Limitations and outlook
The presented results were quite satisfactory both in terms of errors
and computing time (around 15 minutes per step for real spectra and around
40 minutes for flat-ratio simulation). Furthermore, the study of the different
scenarios also gave the opportunity to estimate the systematics introduced by
an imperfect knowledge of the experimental apparatus. However, the
procedure showed some limits as well.
First of all, the measurement of the uncertainty on the estimates is
troublesome. For this reason, the given values of the error bars are to be taken
as rough indications rather than as proper estimates. Moreover, a careful
reader may argue that no formal goodness-of-fit tests are performed and that
an effort has to be made in order to address this issue. In fact, two alternatives
have been explored to deal with the lack of a formal assessment of the
conformity between true and reconstructed spectra. More specifically, two tests
were conducted for the null hypothesis H0: Ftrue(si) = Fest.(si), βˆ€ si ∈ β„›,
against the alternative H1: Ftrue(si) β‰  Fest.(si) for some si, namely the
conformity χ² and the one-sample Kolmogorov–Smirnov tests. However, due to
the intrinsic nature of frequentist statistics, both test statistics were affected by
the very large number of observations in the sample (the tracks are more than
65 million in the realistic simulation), so that their results would misleadingly
reject the null hypothesis, even though the agreement between the two curves
appears evident in the plots.
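For reference, both checks can be run directly on the histograms with ROOT's built-in tests, as in the sketch below (hTrue and hEst are assumed to share the same binning; the names are illustrative). With more than 65 million tracks, both p-values collapse towards zero even for curves that overlap visually, which is precisely the issue described above.

```cpp
// Chi-square conformity test between two unweighted histograms ("UU")
// and Kolmogorov test on the same pair; both return a p-value.
double pChi2 = hTrue->Chi2Test(hEst, "UU");
double pKS   = hTrue->KolmogorovTest(hEst);
```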
Secondly, the convergence of the algorithm is here assessed
graphically, based on the outputs. Therefore, the comments on the speed of
convergence are to be intended only as qualitative indications rather than
general recommendations. Indeed, the implementation of a stopping rule
is required in order to encourage the application of the method to real data
analysis. A possible answer to this need may be found in the comparison of
different initial guesses on the species' prior distributions over momentum, as
shown in section 3.2.3.
Thirdly, starting from the outputs of the previous analysis, the
reconstruction of the invariant mass of the Ο• resonance has been dealt with,
returning consistent estimates for the invariant mass and mean lifetime
parameters and a yield of around 740-thousand particles produced in the whole
momentum range.
Finally, further development of this estimating technique is still ongoing
in order to allow it to handle more problematic resonances as well. In this
respect, the optimal final target would be the reconstruction of the Ξ›c signal in
Pb-Pb collisions. In fact, although current theories anticipate the production of
such particles in this kind of collisions, at the moment no PID strategy has been
able to reconstruct their signal. Therefore, a successful performance in this
case would lend considerable credit to the method, perhaps establishing it as a
benchmark in the sector.
4. A Bayesian approach
An alternative way to tackle the problem of particle identification
presented in this dissertation is through a Bayesian approach. In fact, although
the methodology demonstrated in chapter 3 makes use of Bayesian
probabilities in the update of the current estimates, it is still embedded in the
frequentist context, since the target of the estimation procedure is the
likelihood. In order to move to a Bayesian inference on the mixture model
presented in chapter 2, it is necessary to assume some prior distributions on
the unknown parameters. However, differently from what has been done so
far, this has to be done over the three species in each bin rather than for each
particle type over the range of momentum.
4.1 Estimation method
Mixture models have reached ever increasing prominence in modern
statistics, finding many applications related to important statistical topics like
data clustering, outlier detection, treatment of unobserved heterogeneity,
regression and non-linear time series analysis. Since the moment they were
first introduced13, finite mixture models have been intensively and extensively
studied in all their aspects, creating a vast literature on the subject.
Nonetheless, a Bayesian approach was barely developed until the early
nineties, mainly due to the prohibitive computing times required for reasonable
sample sizes, even though closed-form solutions are available in this
framework. However, in the past twenty years, research in this branch has also
produced very interesting results, demonstrating how to estimate these models
in a Bayesian setup using Monte Carlo simulation techniques based on Markov
chains. Examples of key papers on Bayesian analysis of mixture models are, in
temporal order, (Lavine, 1992), (Diebolt, 1994), (Bensmail, 1997) and
(Richardson, 1997).
In the frequentist context, the most widely applied maximum likelihood
method for inference about mixture models is by far the EM algorithm. It
constitutes a very powerful tool since it allows one to deal with groups of
different size, shape and orientation. Nonetheless, it comes with some
limitations, such as the risk of getting stuck in a local maximum of the
likelihood and the impossibility of directly obtaining estimates of the
parameters' uncertainty. The advantage of tackling the problem in a Bayesian
fashion is that an MCMC procedure is guaranteed to eventually converge to a
unique limiting distribution, which is also the one of interest, even if it may
take some time.
13 Karl Pearson was the first to show, in 1894, how to estimate the five parameters of a mixture of two normal distributions using the method of moments.
Furthermore, a natural and intuitive
definition of the uncertainty on the estimates arises from this approach, in the
form of the whole posterior distribution. Finally, Bayesian inference also
provides the posterior probabilities for a single observation to belong to each
and every population of the mixture model, thus making it suitable for the case
under study in this thesis. As for the estimating technique in a Bayesian
framework, an example of how to make inference on the parameters of such
models through a Gibbs Sampling algorithm using conjugate priors can be
found in (FranzΓ©n, 2006) and (Cornebise, 2005).
The analysis presented in the following will deal with the estimation of
the posterior distributions of the parameters of interest starting from the
model described by equation (2) in section 2.2. General recommendations
present in the literature about the choice of the priors and the initialisation of
the hyperparameters are followed. Again, the primary objective of the
estimating procedure is the vector of mixing proportions Ξ© = {ωπ, Ο‰K, Ο‰p},
while the vector ΞΈ = {ΞΌΟ€, ΞΌK, ΞΌp, Οƒ} is considered not of interest. Therefore, the
inference will focus on the vector Ξ©, and the remaining parameters can either
be treated as fixed or as random quantities. Here the vector ΞΈ is taken to be
fixed a priori and known from previous experiments, as it actually is. However,
considering ΞΌ and Οƒ as nuisance parameters would give the opportunity to test
the current knowledge about the detector response by comparing the new
estimates with the established values; hence, a section dedicated to the choice
of the priors in this latter setup is also presented in this chapter.
4.1.1 Approach I: fixed parameters
As already said, in the present study the vector ΞΈ is taken to be fixed a
priori, since the mean values of the signals are given by the evaluation of a
proper separation function at the pT bin the track belongs to, while the variance
is assumed to be common to all the populations and known from previous
estimates. This choice implies that the only unknown quantities for which the
specification of a prior distribution is needed are the mixing proportions, Ξ©, of
the mixture. Examples in the literature have shown that the Dirichlet
distribution constitutes a favourable choice, being the conjugate prior for
multinomial data, i.e. for categorical data in which each observation can
present just one of the categories. For this reason, it has been adopted as the
prior for the population weights. Notice that, since the relative proportions of
the particle species may vary over the range of transverse momentum, a
different model has to be considered in each pT bin, and thus also a different
prior has to be specified. In the following, the formalisation is presented with
respect to a single bin:
\Omega \sim \mathrm{Dir}(a_\pi, a_K, a_p) = \frac{1}{B(\boldsymbol{a})} \prod_{spec}^{\pi,K,p} \omega_{spec}^{\,a_{spec}-1}    (12)

where B(a) is the multivariate Beta function, i.e. B(\boldsymbol{a}) = \frac{\prod_{spec}^{\pi,K,p} \Gamma(a_{spec})}{\Gamma\left(\sum_{spec}^{\pi,K,p} a_{spec}\right)}.
The Dirichlet distribution, named after Peter Gustav Lejeune Dirichlet,
is a multivariate generalization of the Beta distribution and is very often used in
Bayesian statistics precisely for being a conjugate prior. It is a continuous
multivariate probability distribution parameterised by a vector Ξ± of positive
reals. The relative sizes of the hyperparameters Ξ±j describe the mean of the
prior distribution, while their sum is a measure of the strength of the prior
belief. In other words, this distribution is mathematically equivalent to a
likelihood resulting from a sample of \sum_{j=1}^{J} (\alpha_j - 1) individuals,
with Ξ±j βˆ’ 1 observations belonging to class j. Thus, this choice allows one to
benefit from the advantages of using conjugate priors and to easily update the
prior hyperparameters in order to obtain the posterior distribution. To do so,
the first step is to express the likelihood in terms of the missing information
about the group membership, here indicated by the vector V:
f(\boldsymbol{V} \mid \boldsymbol{\Omega}) \propto \prod_{spec}^{\pi,K,p} \omega_{spec}^{\,\sum_{i=1}^{n} I(v_i = spec)}    (13)

where I(\cdot) is the indicator function, defined as I(v_i = spec) = 1 if v_i = spec and 0 otherwise.
Then, to derive the posterior distribution of the mixing proportions
conditional on the unknown vector V containing the individual classifications,
it is sufficient to simply multiply together the prior and the likelihood, thus
resulting in the updated Dirichlet posterior distribution:

f(\boldsymbol{\Omega} \mid \boldsymbol{V}) \propto f(\boldsymbol{\Omega}) \cdot f(\boldsymbol{V} \mid \boldsymbol{\Omega}) \propto \omega_\pi^{\,a_\pi-1} \omega_K^{\,a_K-1} \omega_p^{\,a_p-1} \cdot \omega_\pi^{\,n_\pi} \omega_K^{\,n_K} \omega_p^{\,n_p} \propto \omega_\pi^{\,a_\pi+n_\pi-1} \omega_K^{\,a_K+n_K-1} \omega_p^{\,a_p+n_p-1}

\boldsymbol{\Omega} \mid \boldsymbol{V} \sim \mathrm{Dir}(a^*_\pi, a^*_K, a^*_p)    (14)

where a^*_\pi = a_\pi + n_\pi, a^*_K = a_K + n_K and a^*_p = a_p + n_p.
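In practice, the conjugate update in (14) and the sampling of a new triplet of weights reduce to a few lines: a Dirichlet draw can be obtained by normalising independent Gamma variables. The sketch below makes this standard construction explicit; it is an illustration under the stated model, not the bayesian_analyze.C macro.

```cpp
#include <array>
#include <random>

// Sample (omega_pi, omega_K, omega_p) from Dir(a + n), exploiting the fact
// that independent Gamma(a_s + n_s, 1) draws, normalised to unit sum,
// are Dirichlet distributed.
std::array<double, 3> sampleWeights(const std::array<double, 3> &a,
                                    const std::array<int, 3> &n,
                                    std::mt19937 &rng) {
    std::array<double, 3> w;
    double sum = 0.;
    for (int s = 0; s < 3; ++s) {
        std::gamma_distribution<double> g(a[s] + n[s], 1.0); // a* = a + n
        w[s] = g(rng);
        sum += w[s];
    }
    for (int s = 0; s < 3; ++s) w[s] /= sum;
    return w;
}
```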
Finally, the posterior probability for an observation si to belong to a
generic population spec is calculated according to Bayes' theorem, conditionally
on the whole set of observations s and the values of ΞΌ and Οƒ:

p_{i,spec} \mid \boldsymbol{\Omega}, \boldsymbol{\mu}, \sigma = \frac{\omega_{spec} \cdot f^{det.}_{spec}(s_i, \boldsymbol{\theta})}{\sum_{spec}^{\pi,K,p} \omega_{spec} \cdot f^{det.}_{spec}(s_i, \boldsymbol{\theta})}, \qquad i = 1, \ldots, n    (15)
Once the prior has been set and an updating rule has been found, the
estimation process may begin. Markov Chain Monte Carlo simulation is a
powerful tool to address this issue. The general scheme is to reconstruct a
Monte Carlo sample from the conditional posterior distribution of the
parameters given the data, say p(Ξ·|y), by simulating from a Markov chain
defined such that its limiting, stationary distribution is the target posterior.
Plenty of examples of such techniques are available in the literature; see for
instance (Hastings, 1970), (Geman, 1984), (Gelfand, 1990) and (Gilks, 1999).
Among them, the Gibbs Sampler is one of the most frequently used MCMC
algorithms, particularly effective when the full conditional distributions14
of the parameters of interest are known and relatively easy to sample from.
More specifically, each single step of this algorithm consists in sampling one
parameter at a time from its conditional distribution given the current
estimates of all the others. This procedure is repeated for every unknown
quantity, each time updating the current values of the parameters with those
already sampled at the current step; for parameters not yet sampled, the values
from the previous step are used. Finally, the process is iterated either for a
predetermined number of steps or until a convergence criterion is met. A
thorough explanation of the Gibbs Sampler is given, for example, in
(Casella, 1992). A special case of this method is the Data Augmentation
algorithm, which arises when missing data are present. For more details, see
(Tanner, 1987).
As with every mixture model, the one described by equation (2) in
section 2.2 can be expressed in terms of incomplete data since, when a given
signal is observed, no information on the true identity of the particle is
available. For this reason, a Gibbs Sampler in the form of Data Augmentation
has been chosen in order to obtain the estimates of the parameters of interest,
along with the classifications. The difference between the two lies in the fact
that in the former the generation of random variables is totally circular: in the
Gibbs Sampler each parameter is sampled from its conditional distribution
given the classification vector and the whole set of the other parameters. On
the contrary, in the Data Augmentation the simulation considers the
classifications and only the values of the parameters already updated in the
current step. The advantage of using the latter method is that, leaving less
space for randomness, it has better performance in terms of convergence and
speed. Hence, the general Data Augmentation scheme has been adapted to the
problem under study, resulting in three simple steps. The algorithm applied in
the analysis consisted in i) sampling triplets of mixing proportions from a
Dirichlet distribution, whose hyperparameters are the current estimates of Ξ±;
ii) using their values to compute the updated estimates of the single-track
posterior probabilities; and, once this has been done, iii) updating the
hyperparameters Ξ± as well and iterating the whole process. The Data
Augmentation algorithm is guaranteed to converge, even if it may take some
time. A formal proof is illustrated in (Diebolt, 1994) in the context of
one-dimensional normal mixture models using a duality principle (thus, in a
framework similar to the setup of this analysis).
14 A full conditional distribution of an unknown quantity is the probability density (or mass) function of that parameter conditional on the values of all the other parameters.
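Putting the pieces together, one iteration of the Data Augmentation scheme for a single pT bin can be sketched as below, reusing the sampleWeights and trackPosteriors helpers sketched earlier. This is an illustrative reading of steps i)-iii), not the actual macro.

```cpp
#include <array>
#include <random>
#include <vector>

// One Data Augmentation step in a single pT bin:
// i)   draw the mixing proportions from Dir(aCurrent);
// ii)  recompute the single-track posteriors with the new weights;
// iii) resample the classifications and refresh the hyperparameters.
void daStep(const std::vector<double> &signals,
            const std::array<double, 3> &aPrior, // fixed prior hyperparameters
            std::array<double, 3> &aCurrent,     // current Dirichlet parameters
            const std::array<double, 3> &mu, double sigma, std::mt19937 &rng) {
    std::array<int, 3> zero{0, 0, 0};
    std::array<double, 3> omega = sampleWeights(aCurrent, zero, rng); // i)
    std::array<int, 3> n{0, 0, 0};
    for (double s_i : signals) {
        std::array<double, 3> p = trackPosteriors(s_i, omega, mu, sigma); // ii)
        std::discrete_distribution<int> cls({p[0], p[1], p[2]});
        ++n[cls(rng)]; // iii) sample the classification v_i
    }
    for (int s = 0; s < 3; ++s) aCurrent[s] = aPrior[s] + n[s]; // a* = a + n
}
```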
However, in order to be coherent with the method presented in chapter
3, the sampled values and the posterior distribution of the mixing proportions
have been used solely to assess the convergence of the algorithm, while the
reconstruction of the histograms of interest has been conducted by filling the
individual posterior probabilities into the spectra of the three species
considered. As such, no actual classification has been performed in the
estimation of the transverse momentum spectra, although a classification
vector was computed at each step in order to keep the algorithm going.
As far as the choice of the prior hyperparameters is concerned, they
have been set equal for the three species over the range of momentum, thus
resulting in a flat, uninformative prior distribution. In particular, the Ξ±spec
values have been chosen equal to 2 in order to guarantee a wide 95%
confidence interval for the mixing proportions, of around (0.076, 0.66)15. The
purpose of this choice is to allow as much flexibility as possible, so as to enable
a correct modelling of the population weights, which may be quite different
from bin to bin.
Finally, it is important to mention that, not having a pre-built routine
available in ROOT, the method was implemented from scratch and tested only
for the version of the software adopted (for more details check the macro
bayesian_analyze.C in the Appendix).
15 The confidence interval is computed exploiting the fact that the marginal distribution of a single Dirichlet component, Ο‰j, is a Beta(Ξ±j, (βˆ‘_{j=1}^{J} Ξ±j) βˆ’ Ξ±j).
4.1.2 Approach II: nuisance parameters
Alternatively, one may be interested in treating the vector ΞΈ as random,
even though not of interest, and in setting some priors also on the parameters
ΞΌ and Οƒ (here the response width is allowed to vary between species). The
statistical literature provides several examples of this, suggesting the following
choices:
πœŽπ‘ π‘π‘’π‘
2
~𝒒 (
𝑓𝑠𝑝𝑒𝑐
2⁄ ,
𝑠𝑠𝑝𝑒𝑐
2⁄ ) (πŸπŸ”)
πœ‡ 𝑠𝑝𝑒𝑐|𝜎2
~𝒩(π‘š 𝑠𝑝𝑒𝑐 ,
1
𝜎2 βˆ™ 𝜏 𝑠𝑝𝑒𝑐
) (πŸπŸ•)
Updating rules for the hyperparameters’ posterior distributions of the
nuisance parameters are also available, resulting in:
πœŽπ‘ π‘π‘’π‘
2 | 𝒔, 𝑽~𝒒 (
𝑓𝑠𝑝𝑒𝑐
βˆ—
2
⁄ ,
𝑠𝑠𝑝𝑒𝑐
βˆ—
2
⁄ ) (πŸπŸ–)
πœ‡ 𝑠𝑝𝑒𝑐|𝜎, 𝒔, 𝑽~𝒩(π‘š 𝑠𝑝𝑒𝑐
βˆ—
,
1
𝜎2 βˆ™ πœβˆ—
𝑠𝑝𝑒𝑐
) (πŸπŸ—)
where:

f^*_{spec} = f_{spec} + n_{spec},  s^*_{spec} = s_{spec} + \sum_{i:\,v_i = spec} (s_i - \mu_{spec})^2,

m^*_{spec} = \frac{\tau_{spec}\, m_{spec} + n_{spec}\, \bar{s}_{spec}}{\tau_{spec} + n_{spec}},  \tau^*_{spec} = \tau_{spec} + n_{spec}  and  \bar{s}_{spec} = \frac{1}{n_{spec}} \sum_{i:\,v_i = spec} s_i.
The estimation then proceeds using a Data Augmentation algorithm,
whose generic iteration k is illustrated below:

i) sample a triplet of \sigma^{2(k)}_{spec} according to the Gamma distribution in (18), conditional on the data s and the current classification vector V^{(k)};
ii) sample a triplet of \mu^{(k)}_{spec} according to the Gaussian distribution in (19), conditional on the simulated values \sigma^{2(k)}_{spec}, the data s and the current classification vector V^{(k)};
iii) sample a triplet of \omega^{(k)}_{spec} according to the Dirichlet distribution in (14), conditional on the current classification vector V^{(k)};
iv) compute the individual posterior probabilities p^{(k)}_{i,spec} according to equation (15) and use those values to update the classification vector V^{(k)} into V^{(k+1)}.
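For one species, steps i) and ii) amount to a Gamma and a Gaussian draw, as sketched below following equations (18) and (19) as stated; fStar, sStar, mStar and tauStar denote the updated hyperparameters and are illustrative names. Note that std::gamma_distribution takes a scale parameter, so a rate of s*/2 becomes a scale of 2/s*.

```cpp
#include <cmath>
#include <random>
#include <utility>

// Steps i)-ii) for one species, following (18)-(19) as written in the text.
std::pair<double, double> sampleNuisance(double fStar, double sStar,
                                         double mStar, double tauStar,
                                         std::mt19937 &rng) {
    // i) sigma^2 ~ Gamma(f*/2, rate = s*/2)  ->  scale = 2/s*
    std::gamma_distribution<double> gSigma2(fStar / 2.0, 2.0 / sStar);
    double sigma2 = gSigma2(rng);
    // ii) mu | sigma^2 ~ N(m*, 1/(sigma^2 * tau*))
    std::normal_distribution<double> gMu(mStar,
                                         std::sqrt(1.0 / (sigma2 * tauStar)));
    double mu = gMu(rng);
    return {mu, sigma2};
}
```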
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi
Tesi

More Related Content

Similar to Tesi

PhD Thesis_Chad A. Keyser
PhD Thesis_Chad A. KeyserPhD Thesis_Chad A. Keyser
PhD Thesis_Chad A. KeyserChad Keyser
Β 
The study of supershrinks (Chow, 2014)
The study of supershrinks (Chow, 2014)The study of supershrinks (Chow, 2014)
The study of supershrinks (Chow, 2014)Scott Miller
Β 
Katlego_Pule_674426_Research_report_final_submission
Katlego_Pule_674426_Research_report_final_submissionKatlego_Pule_674426_Research_report_final_submission
Katlego_Pule_674426_Research_report_final_submissionKatlego Pule
Β 
barış_geçer_tez
barış_geçer_tezbarış_geçer_tez
barış_geçer_tezBaris Geçer
Β 
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRTIshmamAhmed16
Β 
Riddell_Darcy_Dissertation_2015
Riddell_Darcy_Dissertation_2015Riddell_Darcy_Dissertation_2015
Riddell_Darcy_Dissertation_2015Darcy Riddell
Β 
Emergence and exploitation of collective intelligence of groups - Ilario De V...
Emergence and exploitation of collective intelligence of groups - Ilario De V...Emergence and exploitation of collective intelligence of groups - Ilario De V...
Emergence and exploitation of collective intelligence of groups - Ilario De V...Ilario De Vincenzo
Β 
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONS
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONSSPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONS
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONSZhuo Wang
Β 
coping assessment for bereavement.pdf
coping assessment for bereavement.pdfcoping assessment for bereavement.pdf
coping assessment for bereavement.pdfDanielJohnArboleda3
Β 
Dissertation%20FINAL%20SGS-3
Dissertation%20FINAL%20SGS-3Dissertation%20FINAL%20SGS-3
Dissertation%20FINAL%20SGS-3Meera Paleja, PhD
Β 
Multimodal Information Presentation for High-Load Human Computer Interaction
Multimodal Information Presentation for High-Load Human Computer Interaction Multimodal Information Presentation for High-Load Human Computer Interaction
Multimodal Information Presentation for High-Load Human Computer Interaction Smarcos Eu
Β 
sutherland_pierre_201612_phd
sutherland_pierre_201612_phdsutherland_pierre_201612_phd
sutherland_pierre_201612_phdPierre Sutherland
Β 
bf32-Final_ (2)
bf32-Final_ (2)bf32-Final_ (2)
bf32-Final_ (2)Blair Fyffe
Β 
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...Michael Zizzi, PhD
Β 
wery_DissertationMirrorMargins
wery_DissertationMirrorMarginswery_DissertationMirrorMargins
wery_DissertationMirrorMarginsrondalcw
Β 
Greg Brown PhD Thesis 15 Feb 15
Greg Brown PhD Thesis  15 Feb 15Greg Brown PhD Thesis  15 Feb 15
Greg Brown PhD Thesis 15 Feb 15Greg Brown
Β 

Similar to Tesi (20)

PhD Thesis_Chad A. Keyser
PhD Thesis_Chad A. KeyserPhD Thesis_Chad A. Keyser
PhD Thesis_Chad A. Keyser
Β 
The study of supershrinks (Chow, 2014)
The study of supershrinks (Chow, 2014)The study of supershrinks (Chow, 2014)
The study of supershrinks (Chow, 2014)
Β 
PhD report
PhD reportPhD report
PhD report
Β 
PhD report
PhD reportPhD report
PhD report
Β 
Katlego_Pule_674426_Research_report_final_submission
Katlego_Pule_674426_Research_report_final_submissionKatlego_Pule_674426_Research_report_final_submission
Katlego_Pule_674426_Research_report_final_submission
Β 
PhD Thesis K.M.Koczula
PhD Thesis K.M.KoczulaPhD Thesis K.M.Koczula
PhD Thesis K.M.Koczula
Β 
barış_geçer_tez
barış_geçer_tezbarış_geçer_tez
barış_geçer_tez
Β 
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT
[Capstone] Mind the Gap : Accommodating Neurodiversity in Singapore's MRT
Β 
Riddell_Darcy_Dissertation_2015
Riddell_Darcy_Dissertation_2015Riddell_Darcy_Dissertation_2015
Riddell_Darcy_Dissertation_2015
Β 
output
outputoutput
output
Β 
Emergence and exploitation of collective intelligence of groups - Ilario De V...
Emergence and exploitation of collective intelligence of groups - Ilario De V...Emergence and exploitation of collective intelligence of groups - Ilario De V...
Emergence and exploitation of collective intelligence of groups - Ilario De V...
Β 
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONS
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONSSPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONS
SPATIAL LIGHT INTERFERENCE MICROSCOPY AND APPLICATIONS
Β 
coping assessment for bereavement.pdf
coping assessment for bereavement.pdfcoping assessment for bereavement.pdf
coping assessment for bereavement.pdf
Β 
Dissertation%20FINAL%20SGS-3
Dissertation%20FINAL%20SGS-3Dissertation%20FINAL%20SGS-3
Dissertation%20FINAL%20SGS-3
Β 
Multimodal Information Presentation for High-Load Human Computer Interaction
Multimodal Information Presentation for High-Load Human Computer Interaction Multimodal Information Presentation for High-Load Human Computer Interaction
Multimodal Information Presentation for High-Load Human Computer Interaction
Β 
sutherland_pierre_201612_phd
sutherland_pierre_201612_phdsutherland_pierre_201612_phd
sutherland_pierre_201612_phd
Β 
bf32-Final_ (2)
bf32-Final_ (2)bf32-Final_ (2)
bf32-Final_ (2)
Β 
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...
Dissertation, Michael P. Zizzi, F-'11, UC-B -- An Anatomy of Dialogue in Teac...
Β 
wery_DissertationMirrorMargins
wery_DissertationMirrorMarginswery_DissertationMirrorMargins
wery_DissertationMirrorMargins
Β 
Greg Brown PhD Thesis 15 Feb 15
Greg Brown PhD Thesis  15 Feb 15Greg Brown PhD Thesis  15 Feb 15
Greg Brown PhD Thesis 15 Feb 15
Β 

Tesi

  • 1. ALMA MATER STUDIORUM UNIVERSITΓ€ DI BOLOGNA TESI DI LAUREA in Computational Statistics SCUOLA DI ECONOMIA, MANAGEMENT E STATISTICA Corso di Laurea Magistrale in Scienze Statistiche - curriculum Statistico-Informatico Particle identification in ALICE: a mixture model approach CANDIDATO: RELATORI: LUCA CLISSA PROF.SSA ANGELA MONTANARI 0000733102 DOTT. FRANCESCO NOFERINI Appello ottobre 2016 Anno Accademico2015/2016
  • 2. i
  • 3. ii
  • 4. iii β€œPhysical research has clearly and definitely shown that the common element underpinning the consistency observable in the overwhelming majority of natural processes, whose regularity and invariability have led to the establishment of the postulate of universal causality, is chance.” (Erwin SchrΓΆdinger)
  • 5. iv Se anche parlassi le lingue degli uomini e degli angeli, ma non avessi la caritΓ , sarei un bronzo risonante o un cembalo squillante. Se avessi il dono della profezia e conoscessi tutti i misteri e tutta la scienza e avessi tutta la fede in modo da spostare le montagne, ma non avessi la caritΓ , non sarei nulla. Se distribuissi tutti i miei beni per nutrire i poveri, se dessi il mio corpo per essere arso, e non avessi la caritΓ , non mi gioverebbe a nulla. La caritΓ  Γ¨ paziente, Γ¨ benigna la caritΓ ; la caritΓ  non invidia, non si vanta, non si gonfia, non manca di rispetto, non cerca il proprio interesse, non si adira, non tiene conto del male ricevuto, ma si compiace della veritΓ ; tutto tollera, tutto crede, tutto spera, tutto sopporta. La caritΓ  non verrΓ  mai meno. Le profezie scompariranno; il dono delle lingue cesserΓ , la scienza svanirΓ ; conosciamo infatti imperfettamente, e imperfettamente profetizziamo; ma quando verrΓ  la perfezione, sparirΓ  ciΓ² che Γ¨ imperfetto. Quando ero bambino, parlavo da bambino, pensavo da bambino, ragionavo da bambino. Da quando sono diventato uomo, ho smesso le cose da bambino. Adesso vediamo come in uno specchio, in modo oscuro; ma allora vedremo faccia a faccia. Ora conosco in parte, ma allora conoscerΓ² perfettamente, come perfettamente sono conosciuto. Ora esistono queste tre cose: la fede, la speranza e la caritΓ ; ma la piΓΉ grande di esse Γ¨ la caritΓ . S. Paolo – Prima lettera ai Corinzi 13,1
  • 6. v Oh, Signore, fa’ di me lo strumento della Tua Pace; LΓ , dove Γ¨ l’odio che io porti l’amore. LΓ , dove Γ¨ l’offesa che io porti il Perdono. LΓ , dove Γ¨ la discordia che io porti l’unione. LΓ , dove Γ¨ il dubbio che io porti la Fede. LΓ , dove Γ¨ l’errore che io porti la VeritΓ . LΓ , dove Γ¨ la disperazione che io porti la speranza. LΓ , dove Γ¨ la tristezza, che io porti la Gioia. LΓ , dove sono le tenebre che io porti la Luce. Oh Maestro, fa’ ch’io non cerchi tanto d’essere consolato, ma di consolare. Di essere compreso, ma di comprendere. Di essere amato, ma di amare. PoichΓ©: Γ¨ donando che si riceve, Γ¨ perdonando che si ottiene il Perdono, ed Γ¨ morendo, che si risuscita alla Vita eterna. San Francesco d’Assisi
  • 7. vi
  • 8. vii Abstract In the framework of the ALICE experiment the task of particle identification (PID) is carried out majorly through frequentist analysis, although a Bayesian approach was adopted in the earliest stages of the experiment in order to combine the information from the different detectors. Thus, currently, the extraction of the raw yields of particles is achieved by means of either a fit on the inclusive distributions, i.e. considering all the data points simultaneously, or a track-by-track analysis that consists in imposing some selection cuts inasmuch to classify each signal into one of the particle species of interest. The aim of this thesis is, based on simulated data, i) to develop and test further a recently published proposal1 for an iterative algorithm for PID that carries the advantages of the statistical approach, though extending its usage to a track-by-track analysis and ii) to explore the implementation of a Bayesian method built on the former technique. The rationale beneath the iterative procedure is to apply the Bayesian definition of probability to compute at each step some weights, which are then used to fill a histogram representing the distribution of an observable of interest for each particle species. Hence, no actual classification is performed and the bin relative to each single track is incremented by an amount defined by the corresponding weight’s value for every species. The reconstruction of each single track enables the application of the algorithm to a wider set of analyses (e.g. the study of resonances by means of their decays). On the other hand, the method inherits the benefits of the traditional statistical approach, thus allowing to optimize the trade-off between efficiency and contamination which is very troublesome when the track selection is based on cuts. 1 (ALICE Collaboration,2016)
  • 10. ix Acknowledgements I would like to thank all the people that have helped me in the achievement of this degree, from the start of my journey in the University of Bologna to the final writing of my dissertation. All them, with different contribution, have participated to the successful completion of this academic title. In particular, I would like to address a really big thank you to my supervisor, Professor Angela Montanari, for creating a dense network of collaborators in order to accommodate my inclinations regarding the choice of the topic, for her professional help, for the precious corrections and, most of all, for her human support. A huge thank you is also owed to my external supervisor, Doctor Francesco Noferini, who guided me throughout the development of this work with his inspiring insights, his useful advices and his patient explanations. Another huge thanksgiving has to be addressed to Andrea, Daniele, Elisa, Federico, the two Francesca and Laura for sharing their days with me during my period in Geneva. Among CERN’s staff then, a special mention goes to Manuel who has made me really feeling home, acting with the kindness of an elder brother and the wisdom of a master. All them have somehow participated to the realization of this work and with their strong interaction have contributed to give this thesis a different flavour. Despite this, the responsibility of any eventual mistake present here is solely mine. Furthermore, I would like to express my gratitude to all the people that have sustained me in these years, both spiritually and materially. A very big thank you goes to all my family for their support and comprehension and for enabling me, each one according to his or her capability, to pursue this target. A particular mention goes to Rocco for his linguistic suggestions. Finally, a big thank you also goes to all my friends, those who I had before starting the university and those amazing people I have met during the University. I have learnt a lot from their determination and their dedication to duty.
  • 11. x In conclusion, I have a special, truly marvellous thank you which this margin is too narrow to contain.
  • 12. xi
  • 13. xii
  • 14. xiii Table of Contents Abstract......................................................................................................................................................................vii Acknowledgements................................................................................................................................................. ix Table of Contents ...................................................................................................................................................xiii 1. Introduction ....................................................................................................................................................... 1 1.1 Context overview..................................................................................................................................... 1 1.2 Aim and purposes.................................................................................................................................... 5 2. Problem modelling........................................................................................................................................... 7 2.1 Simulated data.......................................................................................................................................... 8 2.1.1 Real spectra ......................................................................................................................................... 8 2.1.2 Flat simulation.................................................................................................................................... 9 2.2 The model ................................................................................................................................................10 3. A frequentist approach .................................................................................................................................13 3.1 Estimation method................................................................................................................................13 3.2 The analysis.............................................................................................................................................17 3.2.1 Real spectra .......................................................................................................................................18 3.2.2 Flat-ratio simulation.......................................................................................................................27 3.2.3 Different set of priors.....................................................................................................................29 3.3 Analysis of the resonances .................................................................................................................31 3.3.1 The meson Ο• .....................................................................................................................................31 3.3.2 Results.................................................................................................................................................32 3.4 Limitations and outlook ......................................................................................................................34 4. 
A Bayesian approach .....................................................................................................................................37 4.1 Estimation method................................................................................................................................37 4.1.1 Approach I: fix parameters...........................................................................................................38 4.1.2 Approach II: nuisance parameters .............................................................................................42 4.2 Results.......................................................................................................................................................43 4.2.1 Convergence evaluation and burn-in period ..........................................................................43
  • 15. xiv 4.2.2 Posterior distributions...................................................................................................................45 4.2.3 Reconstruction of the spectra......................................................................................................48 4.3 Limitations and outlook ......................................................................................................................50 5. Discussion.........................................................................................................................................................53 Appendix....................................................................................................................................................................55 5.1 Real spectra .............................................................................................................................................55 5.2 Flat-ratio simulation.............................................................................................................................57 Bibliography .............................................................................................................................................................63
  • 16. xv
  • 17. xvi
  • 18. 1 1. Introduction Statistics is a powerful tool employed as a support in many different areas. A peculiar, very challenging field of application is experimental physics, in which the rise of ever new problems offers the possibility, on one side, to test how current statistical techniques behave in situations where extreme precision is needed, on the other to stimulate the research beyond the limits of present methodologies and towards the development of different, innovative solutions. 1.1 Context overview Experimental physics arose as an independent branch of physics in the early 17th century with the intent of observing physical processes and of conducting experiments. These observations were then adopted in order to formulate hypotheses about the laws that rule natural phenomena and to validate those theories, as suggested by the so-called scientific method due to Galileo Galilei. Following his example, many other eminent scientific personalities, such as Newton, Kepler and Pascal, have contributed with their own works to the establishment of this philosophy which is still at the basis of modern sciences. The building blocks of the prosperity of this approach are to be found in the numerous and great results it brought to in many areas, from classical and statistical mechanics to thermodynamics and electromagnetism. Within this paradigm, and thanks to the work of scientists like Ernest Rutherford and Niels Bohr (together with many others), the new discipline of nuclear (and later on sub-nuclear) physics born at the beginning of the 1900. The success of the studies in this field and the constant new discoveries, some anticipated, others totally unexpected and somewhat bewildering, opened the doors to the exploration of a new and very broad branch of physics that has almost monopolised the field of experimental physics, addressing the research towards particle physics from there onwards. After more than a century, the status of the studies in this area is still far from an exhaustive knowledge of all the elementary particles that form the matter and the fundamental principles that govern their interaction, therefore many experiments are still ongoing, both exploiting natural phenomena, like cosmic rays, and using the most innovative technologies in order to build laboratories in which it is possible to reproduce the physical conditions of interest. The CERN’s (European Council for Nuclear Research) laboratories are part of this second strand and make use of the world’s biggest accelerator
complex (Figure 1) in order to study high-energy collisions of charged particles. The idea of the creation of a European laboratory came up after the Second World War with the purpose of uniting European scientists and sharing the increasing costs of a world-class research programme. Thanks to the support of renowned scientists, like the Italian Amaldi and the French De Broglie, the proposal was put forward and on 17th May 1954 the first shovel of earth was dug on the Meyrin site, in the western suburbs of Geneva. Three years later, in 1957, CERN's first accelerator, the Synchrocyclotron, was completed and started its operation. Nowadays, CERN has become the biggest particle physics laboratory in the world and since 2008 it includes, among its various facilities, the longest particle accelerator ever built, the Large Hadron Collider (LHC), which consists of a 26.7-kilometre tunnel housing two rings of superconducting magnets, with several accelerating structures aimed at increasing the energy of the particles along the way (Figure 2). Inside the accelerator, two high-energy particle beams, made of either protons, p, or lead ions, Pb, travelling in opposite directions, are accelerated, through the strong magnetic field generated by superconducting magnets, up to close to the speed of light. Once they have acquired the desired energy, the beams are steered towards the interaction points, where they are squeezed by means of a very strong magnetic field in order to increase the chances for them to collide with one another. Along the LHC path there are four major interaction points, which host the four biggest LHC experiments (ATLAS, CMS, LHCb and ALICE). As far as this thesis is concerned, only the last of these will be dealt with in more detail. ALICE, A Large Ion Collider Experiment (Figure 3), is a 26 m long, 16 m high, and 16 m wide heavy-ion detector in the Large Hadron Collider ring. It sits in a vast cavern 56 m below the ground and has been designed specifically to study heavy-ion interactions. ALICE assembles together several devices which measure different properties of the particles produced. The ALICE detector system (Figure 4) is composed of a central part (central barrel) and a muon spectrometer that covers the region far from the point in which the initial

Figure 1: CERN's accelerator complex
Figure 2: LHC accelerator
interaction takes place. The main central barrel detectors are, from small to large radii, the Inner Tracking System (ITS), the Time Projection Chamber (TPC), the Transition Radiation Detector (TRD) and the Time Of Flight system (TOF). This study is dedicated to the physics of a specific state of matter called quark-gluon plasma, investigated through the observation of high-energy heavy-ion collisions, with a view to understanding how matter as we see it nowadays has formed. The common matter that we encounter most frequently in today's universe is made up of atoms, which in turn are composed of protons, electrons and neutrons. In the 1950s, the research on the form factor of protons conducted by Robert Hofstadter at Stanford showed that protons themselves are not elementary particles (the same was observed shortly after also for neutrons). On the contrary, further studies have since proven that they are in fact a bound state of other particles called quarks², which are bound together by the mediators of the strong nuclear interaction, named gluons, to form the nucleons. Hence, we now deem that quarks, along with leptons, constitute the fundamental building blocks that make up ordinary matter. In nature, the bonds between quarks of the same composite particle are permanent, so as to confine them inside its structure, thus making it impossible to see them separately. Nevertheless, current theories suggest that it has not always been like that. Indeed, quarks and gluons are in principle free to move as long as they are close to each other (a property known as asymptotic freedom). However, as soon as one of them tries to move away, the laws of the strong nuclear interaction prevent it from escaping (this condition is referred to as confinement). Therefore, they are forced

² The name quark was introduced by Gell-Mann and was inspired by a passage in James Joyce's novel Finnegans Wake.

Figure 3: ALICE experiment. On the left, the hall in which the complex of detectors is set is shown, while on the right there is the corresponding building located on the surface.
Figure 4: ALICE detector system
to cluster together in either triplets of quarks (or of antiquarks) with integer electric charge (baryons), or quark–antiquark pairs (mesons).³ However, it is believed that the extreme conditions of energy density and temperature in the early universe may have caused protons and neutrons to "melt", freeing the quarks from their bonds with the gluons and forming the quark-gluon plasma. Therefore, the collisions inside the accelerator are tuned in such a way as to generate temperatures more than 10⁵ times hotter than the core of the Sun, recreating a situation similar to that about 1 μs after the Big Bang. Under these extreme conditions the quark-gluon plasma is created, making possible the study of strongly interacting particles at extremely high energy densities and of the way ordinary matter was created from this "quark-gluon primordial soup" as the system expands and cools down. In this regard, some evidence of the existence of a deconfined phase of matter has been found at the Super Proton Synchrotron (SPS) at CERN and at the Relativistic Heavy Ion Collider (RHIC) at the Brookhaven National Laboratory. In order to get valid and reliable information from these collisions, it is fundamental to reconstruct the types of particle produced (or particle species) and the multiplicity of each particle type starting from the signals measured by the various devices. This task is usually referred to as particle identification (PID) and is conducted differently according to the specific physical phenomena one may be more interested in and to the detectors one includes in the analysis. In ALICE the central barrel detectors provide complementary PID information and the capability to separate particle species in different momentum⁴ intervals. At intermediate momenta (pT ≲ 3–4 GeV/c), a track-by-track separation of pions, kaons and protons is made possible by combining the PID signals from different

³ The rules these interactions have to obey, and in turn the allowed combinations, are described in the quark model theory developed independently by the physicists Murray Gell-Mann and George Zweig in 1964.
⁴ Generally speaking, the momentum is the relativistic four-vector given by the energy of the system divided by the speed of light (temporal coordinate) and the 3 components of the classical momentum (spatial coordinates). However, here only the transverse momentum, pT, is meant, i.e. the projection of the momentum onto the plane perpendicular to the beam's direction.

Figure 5: ALICE control room. Here detector experts work around the clock in order to guarantee the correct functioning of the detectors and to ensure that the data taken are valid.
detectors. At higher momenta, statistical reconstruction based on the relativistic rise of the TPC signal can be performed for PID. Given the wide range of momenta covered, ALICE has the strongest particle identification capabilities of any of the LHC experiments. For the analysis presented here, the detectors taken into account are:
i) the Time Projection Chamber (TPC): it is the main tracking device of ALICE and is made of 159 read-out pad rows surrounded by a sensitive volume of gas, so as to allow, by means of a combination of electric and magnetic fields, a three-dimensional reconstruction of the particle trajectory thanks to the measurement of the specific energy loss (dE/dx). The distribution of the signals in this detector is Gaussian, with a resolution in the order of 5–8% of the measured dE/dx value and an expected value described by the Bethe-Bloch formula;
ii) the Time Of Flight system (TOF): it is based on Multigap Resistive Plate Chamber technology, which allows the mass of each particle produced to be identified by measuring its arrival time and comparing it to the expected arrival time under each species hypothesis. The final distribution of the signals of this detector is similar to a normal distribution with a heavy right tail and with an intrinsic resolution of about 80 ps⁵.

1.2 Aim and purposes

The aim of this thesis is to generalise the traditional statistical approach for particle identification available within the ALICE experiment in order to allow a track-by-track particle identification. In particular, this work focuses on the development of the method proposed in (ALICE Collaboration, 2016), on the assessment of its performance in the presence of systematic errors, on the proposal of general recommendations for a stopping criterion and, finally, on the extension to the analysis of resonances. Furthermore, a Bayesian approach to the solution of this problem is also explored. In the next chapter, the general problem of particle identification is detailed more specifically with respect to the interest of the analysis, and its statistical modelling through a mixture model is described. In Chapter 3 a frequentist approach for the estimation of the mixture model parameters is presented. Particular attention is given to the advantages of the proposed technique with respect to the traditional inclusive methods commonly adopted

⁵ Picosecond: 1 ps = 10⁻¹² s.
for this task. The results obtained on the simulated data are also discussed, along with some limitations. Chapter 4, instead, presents a Bayesian framework for the problem which makes use of Gibbs sampling to obtain parameter estimates. The possible benefits of this approach are examined and then the results of its application to the data are presented. Finally, the fifth and last chapter is devoted to the discussion of the two approaches and their respective pros and cons.
2. Problem modelling

Every particle physics experiment is designed to either observe or reproduce some physical phenomena of interest in order to gather information about them from the study of their products. For instance, laboratory experiments make use of accelerators to increase the speed of particle beams, which are then made to collide. The results of these collisions are then analysed with the aim of comparing theoretical predictions with experimental data, so as to validate and possibly expand our current knowledge of those phenomena. Therefore, it is clear that the ability to recognise the products of the interactions is crucial in order to get useful information from these collisions, so a good particle identification strategy is needed. As far as ALICE is concerned, beams of either protons or lead ions are accelerated throughout the LHC complex. Once they have reached almost the speed of light, it is no longer possible to increase their velocity appreciably, hence the effect of the acceleration is rather to enhance their energy. In this way the beams are given the potential to originate production reactions of other particles. The more energy they have, the more the yield of their interaction will grow, with the possibility of including ever heavier products as the mass of the initial particles gets bigger. The beam composition and energy are thus decided according to the purpose of the research. When the desired energy level is reached, the two beams are made to collide in the hall hosting the experimental equipment. After the central crash, thousands of particles are produced (such as hadrons, electrons, muons and photons) and emitted in all directions. When one of them hits one of the various experimental devices that surround the interaction point, it produces a signal that is registered and transmitted for analysis. However, within the whole set of signals, only some of the tracks detected are generated directly from the principal interaction and are, therefore, referred to as primary particles. Indeed, it is possible that some of them, the mothers, decay into daughter particles shortly after they are produced, thus preventing their direct observation. Among the hadrons, a special case of this phenomenon is that of the resonances, particles with very short lifetimes (of the order of 10⁻²⁰–10⁻²³ seconds) which decay through the strong or the electroweak force almost immediately. Nevertheless, from the kinematic properties of the daughters it is possible to deduce the original mass of the mother. Given this framework, particle identification provides information about the mass and flavour composition of particle production, i.e. the distribution of species multiplicities over the different kinematic areas. Hence,
for a PID strategy to be effective, it has to satisfy essentially two requirements: firstly, to reproduce correctly the spectrum of the signals for each particle species over the kinematic range considered and, secondly, to reconstruct the distribution of the yield of unobservable particles starting from the former results.

2.1 Simulated data

In this thesis, a particle identification strategy is tested by means of a simulation study. In particular, two different simulations are run. The first one mimics the measured spectra reconstructed from previous official analyses of the ALICE Collaboration, while the other reproduces constant ratios π/K and K/p over the range of transverse momentum, pT.

2.1.1 Real spectra

The real spectra simulation is intended to be a fair representation of the physical processes under study. Particle abundances and transverse momentum distributions are thus simulated according to the real data observed in ALICE for lead-ion (Pb-Pb) collisions at 2.76 ATeV with centrality 10-20%. The toy Monte Carlo is designed to reproduce 100000 collisions, each generating a large number of particles of different types, namely pions (π), kaons (K), protons (p), K zero star and its antiparticle (K⁰* and K̄⁰*), phi (ϕ), deltas (Δ⁺⁺ and its antiparticle Δ̄⁻⁻) and, finally, lambda c and its antiparticle (Λc and Λ̄c). In this context, it is assumed that only pions, kaons and protons may reach the detector, while for all the other species the decay channels into π, K, p are simulated as well. The physical quantities reproduced for each particle generated in the simulation are:
i) the momentum, p, which is strictly related to the mass and the energy of the particle. In particular, all three spatial momentum components are simulated;
ii) the signal, S, that the track would release in an imaginary detector reproducing the species separation capability of TPC and TOF together (expressed in nσ separation from the kaon signal);
iii) the projection of the scattering angle onto the plane perpendicular to the beam's direction, φ;
iv) the pseudo-rapidity, η.
Among these, the most relevant to the analyses presented in this thesis are the transverse momentum, pT (i.e. the projection of p onto the plane perpendicular to the beam's direction), and the signal. In particular, the latter is
simulated from a Gaussian distribution whose expected value is species-dependent and whose variance is common to all particle types. More specifically, the kaon signal is taken as reference (i.e. its mean value is set to 0 irrespective of the pT value), while for pions and protons the signal is generated in terms of nσ separation from kaons. This is done differentially according to two functions that describe the separation π/K and K/p over pT for a hypothetical detector combining both TOF and TPC discrimination capabilities (the algebraic expressions can be found in Figure 6). Precisely, protons are simulated from a Normal distribution whose expected value is found by evaluating the K/p separation function at the corresponding track's pT value. The same happens for pions, with the exception that the mean of the Gaussian distribution is obtained by evaluating the π/K separation and then changing the sign. The distribution over pT of the simulated data is shown in Figure 7 for all three species.

2.1.2 Flat simulation

Even though the former Monte Carlo provides the best setting in which to test the proposed method, inasmuch as it reproduces realistic data, a possible drawback is that the estimates are not homogeneous, since each pT bin relies on a possibly different number of observations. Therefore, the estimates and their uncertainties are expected to be worse in the bins for which less information is available, i.e. at very low and very high pT. In order to control how much this issue might affect the results in terms of percentage error and speed of convergence, a further simulation was run fixing the ratios π/K and K/p to the values they assume in the real spectra for transverse momentum equal to 5 GeV/c. In this way the estimates are still based on possibly different amounts of observations; however, the total number of events is set to 100000 so as to ensure that in the last bin the number of protons, which are the least

Figure 6: Combined TPC&TOF separation
Figure 7: Signal distribution over transverse momentum. On the negative semi-axis the signal of pions ranges from very low (low pT) to almost zero (high momenta). The kaon signal is instead centred around 0 irrespective of pT. Finally, in the right tail the proton signal moves away from zero as pT decreases.
numerous at high pT, is sufficiently large to guarantee sound estimates (greater than 40000 entries). A comparison between the species distribution according to the constant-ratio simulated data (referred to as flat in the following, dashed lines) and the true spectra (solid lines) is shown in Figure 8.

2.2 The model

Let Si be the random variable representing the signal released in a generic detector by the i-th track, i = 1, …, n, and let S = {Si}i=1,…,n indicate the corresponding random vector. Let {si}i=1,…,n denote a realisation of this random vector, i.e. {si}i=1,…,n constitutes the sample of all the signals detected experimentally, irrespective of the type of particle that produced them. When a track si is observed, no information on the true particle identity is available; hence, it may be regarded as sampled from a population which is made of different sub-populations with possibly different relative proportions (weights). Moreover, the distribution of the variable may change from component to component. Thus, depending on the signal, the track may belong to each of the sub-populations with possibly different probabilities. In light of these considerations, it is possible to express the probability of observing a specific realisation of the random vector S as a mixture model:

$$P(\mathbf{s} \mid \Omega, \boldsymbol{\theta}) = \prod_{i=1}^{n} \left\{ \sum_{spec} \omega_{spec} \cdot f^{det.}_{spec}(s_i, \boldsymbol{\theta}) \right\} \qquad (1)^6$$

where θ is the vector of parameters, possibly unknown, that determine the functional form of the detector response distribution for each particle species and Ω is the vector of weights attached to each population (such that Σspec ωspec = 1). For more details on mixture models see (McLachlan, 2004). As far as this thesis is concerned, the analysis considers only three particle species and assumes the detector responses to be homoscedastic perfect

⁶ N.B. equation (1) assumes independence among tracks, although this may be unrealistic inasmuch as particles produced together are certainly somehow correlated; hence a proper modelling of this dependence can be introduced.

Figure 8: Flat-ratio VS realistic simulation comparison
Gaussian distributions with different expected values. Hence, the information on the single tracks can be expressed in terms of the individual likelihoods:

$$\mathcal{L}(\boldsymbol{\Psi} \mid s_i) = \omega_{\pi}\,\phi(s_i; \mu_{\pi}, \sigma^2) + \omega_{K}\,\phi(s_i; \mu_{K}, \sigma^2) + \omega_{p}\,\phi(s_i; \mu_{p}, \sigma^2) \qquad (2)$$

for i = 1, …, n, where Ψ = {Ω, θ} and θ = {μπ, μK, μp, σ²}. Given this formalisation of the problem, the interest of the study is to maximise equation (2) with respect to the vector of parameters Ψ made of θ and the mixing proportions ωπ, ωK, ωp, i.e. Ψ = {ωπ, ωK, ωp, μπ, μK, μp, σ}, under the restriction Σspec ωspec = 1. However, here only the weights of each population are relevant to the analysis, so the other parameters are considered as known and fixed a priori (alternatively, one may decide to treat them as nuisance parameters). In the following, the inference about the parameters of this model is conducted both in a frequentist and in a Bayesian context. Notice that both the estimating procedure and the assessment have to be repeated inside each pT bin, since the relative proportions of the three species change across transverse momentum.
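To make the model concrete, the following C++ sketch evaluates the individual likelihood of equation (2) for a single track; the numerical values of the weights, means and common width are placeholders, not those used in the simulation.

```cpp
#include <cmath>
#include <cstdio>

// Gaussian density phi(s; mu, sigma^2).
double phi(double s, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double z = (s - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// Individual likelihood of equation (2): a three-component Gaussian mixture
// with weights (w_pi, w_K, w_p) summing to one and a common variance.
double trackLikelihood(double s, const double w[3], const double mu[3], double sigma) {
    return w[0] * phi(s, mu[0], sigma)   // pions
         + w[1] * phi(s, mu[1], sigma)   // kaons (reference, mu = 0)
         + w[2] * phi(s, mu[2], sigma);  // protons
}

int main() {
    const double w[3]  = {0.7, 0.2, 0.1};   // hypothetical mixing proportions
    const double mu[3] = {-3.0, 0.0, 2.0};  // hypothetical n-sigma separations
    std::printf("L = %g\n", trackLikelihood(0.5, w, mu, 1.0));
    return 0;
}
```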
3. A frequentist approach

Currently, within the ALICE experiment, the task of particle identification is conducted either with an inclusive statistical approach or through track-by-track methods based on the definition of selection cuts. The former extracts the average number of particles of each type considering all the produced signals simultaneously and has the advantage of naturally removing the contamination among tracks, due to the overlap of the signal distributions of different particle species, without losing efficiency. However, in this fitting process the information on the single tracks is not reconstructed, thus preventing the application of this method for the purposes of many analyses, e.g. the study of resonances by means of their decays. In such cases, a track-by-track method is required. The techniques of the latter type, instead, make it possible to assign an identity to every signal detected, as long as it satisfies some conditions, i.e. by imposing selection cuts on the raw signal or on a transformation of it, referred to as the discriminating variable. For example, let ξ = f(S, R) indicate the discriminating variable the PID strategy is built on, where S is the raw signal as expressed by the detector (e.g. a time for the TOF or a specific energy loss for the TPC) and R is the expected response. For a detector with Gaussian response, the most used discriminating variable is the nσ variable, defined as the deviation of the measured signal from that expected for species Hspec, in terms of the detector resolution:

$$n\sigma^{spec}_{det} = \frac{S_{det} - \mathbb{E}[S]^{spec}_{det}}{\sigma^{spec}_{det}} \qquad (3)$$

where det denotes the detector and σ^spec_det is the resolution of the detector relative to species spec. Thus, the nσ PID approach corresponds to a true/false decision on whether a particle belongs to a given species. A certain identity is therefore assigned to a track if this value lies within a certain range around the expectation (typically 2σ or 3σ). In this way, a track may be compatible with more than one identity, depending on the detector discrimination power. Moreover, this strategy may lead to the definition of selection cuts which are either too loose, resulting in low signal purity, or too stringent, implying a loss of efficiency. In the latter case, in fact, some tracks may not pass any of the determined thresholds and hence are not assigned to any species.

3.1 Estimation method

As mentioned above, this dissertation deals with a PID strategy which is intended to extend the advantages of the inclusive statistical approach to a
track-by-track method, thus generalising its application to a wider set of analyses. This is done by moving from a "cuts approach" to a "weights approach". The rationale behind this technique is to combine the information coming from experimental data, i.e. the particle signal, with the prior belief that the track belongs to any of the particle species under study, resulting in an update of the prior belief itself through the logic of Bayes' theorem:

$$P(H_{spec} \mid S) = \frac{P(S \mid H_{spec}) \cdot P(H_{spec})}{\sum_{spec} P(S \mid H_{spec}) \cdot P(H_{spec})} \qquad (4)$$

where P(Hspec) is the prior probability of species spec, P(S|Hspec) is the likelihood of that particular signal and P(Hspec|S) is the posterior probability that the track belongs to species spec given the observed signal S. The posterior probability is then used as a weight to fill the histograms of the spectra of all particle species in the pT bin corresponding to that track. This process is then iterated using the posteriors as new priors for the next step until a convergence criterion is met. It is possible to show that the Bayesian probability used as a weight allows perfect PID to be mimicked at the statistical level, i.e. the true multiplicity of each given species is reconstructed without the need for efficiency/contamination corrections, provided that the number of observations is sufficiently large. In order to demonstrate this, let us consider the following expression:

$$\sum_{i=1}^{n} f(S_i) = \sum_{spec} N_{spec} \int_{R_S} f(S_i) \cdot P(S_i \mid H_{spec})\, dS \qquad (5)$$

Equation (5) is true in general, since it simply translates the sum of some function of the signal, Si, over the whole sample of tracks into the weighted sum of the expected signals for each particle flavour, with weights equal to the species' true multiplicities. Then, if the detector response is known conditional on each particle type under study, it is also possible to use the former formula to compute the left-hand side based on the right-hand side. In particular, choosing f(Si) = P(Hspec_i | Si) and replacing equation (4) in the right-hand side of equation (5), one obtains:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \sum_{spec} N_{spec} \int_{R_S} \frac{P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i})}{\sum_{spec_i} P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i})} \cdot P(S_i \mid H_{spec})\, dS \qquad (6)$$

where Hspec_i indicates the species hypothesis for the i-th track.
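Before proceeding with the proof, the iterative procedure just described can be summarised in code. The sketch below treats a single pT bin with three species, Gaussian responses of known means and common width; all numerical values are placeholders. The stopping criterion shown (largest relative change of the estimated fractions between successive steps below a tolerance) is only one possible choice, anticipating the discussion of convergence later in this chapter.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

// Gaussian detector response P(S | H_spec) with known mean and common width.
double response(double s, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double z = (s - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

int main() {
    // Hypothetical inputs for one pT bin: measured signals, species means, width.
    const std::vector<double> signals = {-2.9, -3.1, 0.2, -0.1, 2.1, -2.8};
    const std::array<double, 3> mu = {-3.0, 0.0, 2.0};  // pi, K, p (placeholders)
    const double sigma = 1.0;

    std::array<double, 3> prior = {1.0 / 3, 1.0 / 3, 1.0 / 3};  // flat starting priors
    for (int step = 0; step < 20; ++step) {
        std::array<double, 3> yield = {0.0, 0.0, 0.0};
        for (double s : signals) {
            std::array<double, 3> post;
            double norm = 0.0;
            for (int k = 0; k < 3; ++k) {
                post[k] = prior[k] * response(s, mu[k], sigma);  // numerator of eq. (4)
                norm += post[k];
            }
            for (int k = 0; k < 3; ++k) yield[k] += post[k] / norm;  // weighted fill
        }
        // Normalised posterior sums become the priors of the next iteration.
        std::array<double, 3> next;
        double maxRel = 0.0;
        for (int k = 0; k < 3; ++k) {
            next[k] = yield[k] / signals.size();
            maxRel = std::max(maxRel, std::fabs(next[k] - prior[k]) / prior[k]);
        }
        prior = next;
        if (maxRel < 1e-4) break;  // one possible stopping criterion (a suggestion)
    }
    std::printf("estimated fractions: pi=%.3f K=%.3f p=%.3f\n",
                prior[0], prior[1], prior[2]);
    return 0;
}
```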
Thus, it is possible to reconstruct the total sum of the posterior probabilities of each track belonging to the flavour spec_i conditional on the signal Si, starting from some priors P(Hspec_i) and the species-specific detector response P(Si|Hspec). Then, moving the summation over the species into the integral and rearranging the right-hand side of equation (6), one obtains:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} P(H_{spec_i}) \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot P(H_{spec_i})\, dS \qquad (7)$$

At this stage, it is convenient to define Cspec as P(Hspec) times the multiplicity of flavour spec, i.e. in terms of improper priors reflecting the absolute frequencies of particles of type spec expected a priori. In this new framework, it is possible to group the set of possible choices for the non-normalised priors into two broad options. The first case is the one in which some very accurate information about the species multiplicities is available a priori and, therefore, the choice is such that Cspec = Nspec. In this circumstance, reading the latter formula the other way around, it is easy to show that the sum of the conditional posterior probabilities over the tracks reconstructs exactly each species multiplicity. In fact, equation (7) becomes:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = \int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} C_{spec_i} \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot C_{spec_i}\, dS \;\xrightarrow{C_{spec} = N_{spec}}\; N_{spec} \int_{R_S} P(S_i \mid H_{spec_i})\, dS = N_{spec} \qquad (8)$$

Alternatively, when no prior information is available, the right-hand side of equation (7) can be rewritten as:

$$\int_{R_S} \frac{\sum_{spec} N_{spec} \cdot P(S_i \mid H_{spec})}{\sum_{spec_i} C_{spec_i} \cdot P(S_i \mid H_{spec_i})} \cdot P(S_i \mid H_{spec_i}) \cdot C_{spec_i}\, dS = \int_{R_S} \frac{N_{spec_i} C_{spec_i} P(S_i \mid H_{spec_i}) + C_{spec_i} \sum_{spec \neq spec_i} N_{spec} P(S_i \mid H_{spec})}{C_{spec_i} P(S_i \mid H_{spec_i}) + \sum_{spec \neq spec_i} C_{spec} P(S_i \mid H_{spec})} \, P(S_i \mid H_{spec_i})\, dS$$

Then, collecting the factor Nspec_i · Cspec_i · P(Si|Hspec_i) in the numerator and the factor Cspec_i · P(Si|Hspec_i) in the denominator, the following result is achieved:
$$\frac{N_{spec_i} C_{spec_i} P(S_i \mid H_{spec_i}) \left(1 + \sum_{spec \neq spec_i} \frac{N_{spec}\, P(S_i \mid H_{spec})}{N_{spec_i}\, P(S_i \mid H_{spec_i})}\right)}{C_{spec_i} P(S_i \mid H_{spec_i}) \left(1 + \sum_{spec \neq spec_i} \frac{C_{spec}\, P(S_i \mid H_{spec})}{C_{spec_i}\, P(S_i \mid H_{spec_i})}\right)} = \frac{N_{spec_i} \left(1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}\right)}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})} + \sum_{spec \neq spec_i} \left(\frac{C_{spec}}{C_{spec_i}} - \frac{N_{spec}}{N_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}$$

where the last step is obtained by adding and subtracting the quantity $\sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}$ in the denominator. Subsequently, substituting the latter expression into equation (7), taking Nspec_i out of the integral and dividing both numerator and denominator by $\left(1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}\right)$, the following formula is achieved:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) = N_{spec_i} \int_{R_S} \frac{1}{1 + \dfrac{\sum_{spec \neq spec_i} \left(\frac{C_{spec}}{C_{spec_i}} - \frac{N_{spec}}{N_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}} \, P(S_i \mid H_{spec_i})\, dS$$

Finally, operating a first-order Taylor series expansion of the fraction⁷ inside the integral, the following approximation is obtained:

$$N_{spec_i} \int_{R_S} \frac{1}{1 + \cdots}\, P(S_i \mid H_{spec_i})\, dS \approx N_{spec_i} \int_{R_S} \left(1 + \frac{\sum_{spec \neq spec_i} \left(\frac{N_{spec}}{N_{spec_i}} - \frac{C_{spec}}{C_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}{1 + \sum_{spec \neq spec_i} \frac{N_{spec}}{N_{spec_i}} \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})}}\right) P(S_i \mid H_{spec_i})\, dS$$

⁷ Regarded as a function of the improper priors in a neighbourhood of the true multiplicities.
Again, since the term $\sum_{spec \neq spec_i} \left(\frac{N_{spec}}{N_{spec_i}} - \frac{C_{spec}}{C_{spec_i}}\right) \frac{P(S_i \mid H_{spec})}{P(S_i \mid H_{spec_i})} \xrightarrow{C_{spec} \to N_{spec}} 0$, equation (7) can be rewritten as:

$$\sum_{i=1}^{n} P(H_{spec_i} \mid S_i) \;\xrightarrow{C_{spec} \to N_{spec}}\; N_{spec_i} \int_{R_S} (1 + 0)\, P(S_i \mid H_{spec_i})\, dS = N_{spec_i} \int_{R_S} P(S_i \mid H_{spec_i})\, dS = N_{spec_i} \qquad (9)$$

Therefore, in both cases the method is able to reconstruct the true multiplicities, exactly or approximately, as long as the choice of the priors is made correctly. More specifically, in the second option the condition Cspec → Nspec is achieved by means of the iterative procedure described in this chapter. Therefore, the method is guaranteed to converge to the distribution of interest, thus making a perfect particle identification possible. Furthermore, it has the advantage that, adopting the Bayesian definition of probability, its usage can be naturally generalised to combine the information of different detectors, so as to achieve an effective exploitation of the full experiment's PID capabilities.

3.2 The analysis

The assessment of the performance of the PID method presented above is dealt with in the following, based on the output of the simulated data. The comparison of the estimated multiplicities with the true ones across pT is carried out in terms of the percentage error of the reconstructed spectra with respect to the true ones:

$$err\% = \frac{reconstructed - true}{true} \qquad (10)$$

The task of reconstruction is conducted through the macro analyze.C (see Appendix for more details), starting from a flat prior equal to one in each pT bin for all 3 species⁸. Its code includes two parameters that allow different scenarios to be explored, thus also giving the opportunity to estimate the systematic errors. An interesting aspect to investigate, indeed, is how systematic variations in the detector response may affect the performance of the estimation procedure. In this respect, thirteen different scenarios are explored by introducing a bias in the signal reproduced by the detector, namely a

⁸ In this way the counts are normalised to the multiplicity of pions in each bin.
shift of the expected signal with respect to the simulated data and/or an error on its width. More specifically:
i) a shift of 10% and 20% of the true σ value is introduced uniformly for all species, both in the positive and in the negative direction;
ii) an error of 10% and 20% of the true σ value is introduced uniformly on the width for all species, both in the positive and in the negative direction;
iii) the combined effect of the variation of both parameters is also explored (a minimal sketch of this parameterisation is given at the end of section 3.2.1.3).
For the sake of conciseness, only an upward and a downward alteration of each parameter is considered and discussed here, together with the corresponding four scenarios deriving from the combination of those variations. All the other results are just briefly mentioned (for further details, see the Appendix). All the results presented in this and the following chapters are obtained using the software ROOT v. 6.04/02.

3.2.1 Real spectra

3.2.1.1 Perfect response scenario

The first scenario investigated is the one in which the detector response replicates perfectly the simulated signal. This setting represents the ideal framework in which to assess the validity of the method and its speed of convergence, since any error can be attributed solely to the estimation technique itself. The results are shown in Figure 9 below. The reconstruction is almost perfect up to 4 GeV/c for all the species; then it starts to show a bit of an upward trend for kaons, compensated by a negative error on protons. The pions, instead, fluctuate fairly randomly around 0. The errors for kaons amount to nearly +10% at 7 GeV/c and a bit less than +20% at the very boundary of the momentum range. Regarding protons, the situation is the opposite, with a downward trend in the percentage error causing an underestimation of around 10-15% for pT > 6 GeV/c. The vertical bars represent the uncertainty on the reconstructed multiplicities. Their values are computed as the difference between two successive estimates multiplied by a gain factor, defined as the proportion of that species over the total number of tracks observed with that transverse momentum. They range from virtually zero in the first half of the pT range to nearly 20-30% of the estimated value in the last bins.
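For reference, the percentage error of equation (10) and the uncertainty just described can be written compactly as follows; the function names are illustrative, and the gain-factor definition simply mirrors the description above.

```cpp
#include <cmath>

// Percentage error of equation (10) for one species in one pT bin.
double percentError(double reconstructed, double truth) {
    return (reconstructed - truth) / truth;
}

// Uncertainty on a reconstructed multiplicity: difference between two
// successive estimates, scaled by the gain factor, i.e. the fraction of
// tracks of that species among all tracks in the bin.
double uncertainty(double currEstimate, double prevEstimate, double totalTracks) {
    const double gain = currEstimate / totalTracks;
    return std::fabs(currEstimate - prevEstimate) * gain;
}
```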
The previous plot shows an increase of the percentage error for increasing values of the particles' transverse momenta. To explain why this behaviour is observed, it is useful to look at the error as a function of the separation among the species. This relationship is illustrated in Figure 10 below for steps 4, 5 and 6. From the graphics it becomes clear that, predictably, the percentage error increases as the signals of different species get closer. In turn, since the separation becomes smaller as the transverse momentum grows, this means that for high values of pT the error is expected to be bigger. In particular, the reconstruction is almost perfect for pions as long as their signal is more than 2.5σ away from that of kaons. In any case their error lies between +10 and -10%, varying fairly randomly around 0. As for protons, they behave quite well down to a 2σ separation and then start to have some problems, with an error of around -10% below 1.5σ. Finally, kaons are affected by the behaviour of the other species; thus there is an excess in their multiplicity balancing the deficit of protons. In light of the considerations discussed above, kaons are in the most problematic situation, as expected, since their signal distribution lies in the middle and, hence, is affected by the contamination due to both the other species. Nevertheless, it is important to underline that the percentage error may be misleading when comparing different species.

Figure 9: Percentage error in the "perfect response" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
In fact, miscounting one particle in the total counts of tracks belonging to each species may matter very differently for the various types of particles. In this respect, kaons and protons are certainly penalised inasmuch as they are fewer than pions over a large part of the momentum domain. Figure 11 illustrates a direct comparison of the true and reconstructed spectra after 6 steps. All in all, the degree of agreement is good, though the absolute error is quite high due to the large number of particles involved. However, the discrepancy between the true and estimated multiplicities is not yet attributable

Figure 10: Percentage error over nσ separation. The top line presents a comparison between protons and pions at equal separation from the kaon signal, while the middle and bottom lines show a comparison between, respectively, π and K and p and K.

Figure 11: True VS estimated comparison. On the left-hand side, the comparison of the reconstructed spectra (dashed lines) and the true ones (solid lines) in log scale. On the right, instead, a zoom on the range 5 < pT < 10 is shown. Pions are represented in red, kaons in blue and, finally, protons in green.
solely to the PID strategy. In fact, in section 3.2.2 the analysis of the constant-ratios simulation will show that the disagreement is rather due to an insufficient number of observations in the last bins. In terms of convergence, the algorithm performs well already in the first steps. After 4 iterations the shape of the trend is already appreciable, and in the next two steps it is merely reduced in scale. Successive iterations show no visible improvement in terms of error, although a reduction in the estimates' uncertainties is achieved. In conclusion, 6 steps seem a fairly reasonable choice to balance the trade-off between accuracy and computing time⁹.

3.2.1.2 Shifted detector response (0.1 σtrue)

The second scenario taken into account presents a systematic shift of the detector response by 0.1σtrue in both directions, applied irrespective of the track's identity. The results for the case of an upward shift are presented in Figure 12. In this case the reconstruction is more troublesome even for low values of transverse momentum. As a matter of fact, after 2 GeV/c the count of pions

⁹ N.B. all the discussion in the following therefore refers to the results after 6 steps.

Figure 12: Percentage error in the "upward shift" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
starts to be slightly overestimated at the expense of kaons, with an error of nearly 4-5% for both species. However, for pT > 3.5 GeV/c the error relative to kaons begins an upward trend compensated, on the other hand, by a deficit of protons. The error is confined within 10% for all species up to nearly 6 GeV/c and then saturates around +15-20% for kaons at high pT and, respectively, +5% and -15-20% for pions and protons in the same range. The situation is almost specular for π and p in the case of a downward shift (Figure 13). This time protons and kaons are slightly overestimated in the region corresponding to pT greater than 2 GeV/c, with the pions compensating with a more accentuated downward trend. The size of the errors for the three species is nearly the same, but in the opposite direction for protons and pions. Kaons are instead fairly unchanged.

3.2.1.3 Altered detector response width (± 0.1 σtrue)

In the next scenarios, instead, only the width of the signal reproduced by the detector is altered. Firstly, a narrower signal width is considered; the results are presented in Figure 14 below. The plot shows that kaons are overestimated from 3 to roughly 7.5 GeV/c, with an error ranging gradually

Figure 13: Percentage error in the "downward shift" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
from 5 to 15%. This behaviour is compensated by both π and p, although no great effect stands out from the illustration, since their multiplicity is far greater than that of kaons in that region. After that kinematic area, the counts of K alternate between excesses and deficits, which are more evidently balanced by protons, whose numerosity becomes comparable to the kaons' yield at high pT. Regarding the error, it is confined below +10% for kaons and around 0 for the other species up to 6 GeV/c, while it then increases up to +20% and -15% respectively for K and p. On the other hand, the conditions are inverted for kaons and protons in the case of a detector response having a larger width with respect to the simulated one (Figure 15 below). This time the PID is almost perfect up to 6 GeV/c, showing errors only in the order of 2-3% for K and p. On the contrary, after that region the reconstruction is worse, with errors of greater magnitude, in the order of 15-20% for kaons and 10-15% for protons. Pions are fairly random around zero though.

Figure 14: Percentage error in the "narrower width" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
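The single-parameter scenarios considered so far (and the combined ones of the next section) amount to evaluating the assumed response with a shifted mean and/or a scaled width. A minimal sketch of this parameterisation follows; the names are hypothetical and this is not the actual code of analyze.C.

```cpp
#include <cmath>

// Assumed (possibly biased) detector response used in the reconstruction:
// the mean is displaced by shiftFrac * sigmaTrue and the width is scaled by
// widthScale. (shiftFrac, widthScale) = (0, 1) gives the perfect response,
// (+0.1, 1) the upward shift, (0, 0.9) the narrower width, and so on.
double assumedResponse(double s, double muTrue, double sigmaTrue,
                       double shiftFrac, double widthScale) {
    const double kPi = 3.14159265358979323846;
    const double mu    = muTrue + shiftFrac * sigmaTrue;
    const double sigma = widthScale * sigmaTrue;
    const double z = (s - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}
```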
Further analyses have been run introducing more extreme systematics in the detector response. Specifically, the shift has been increased to 0.2σtrue, both in the positive and in the negative direction, and the response width has been set to 0.8σtrue and 1.2σtrue. The results were found to be consistent with the aforementioned scenarios, with the trend in the percentage error maintaining the same shape and increasing in magnitude quite linearly. For further details, see the Appendix.

3.2.1.4 Combined effect

Finally, scenarios in which both detector response parameters are changed are taken into account. More specifically, the cases mentioned above have been combined together, giving rise to four further scenarios which are presented in the following.

Figure 15: Percentage error in the "larger width" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) in the first 6 iterations.
Figure 16 illustrates a comparison of the four cases separately for each species. As a convention, pions are pictured in red, kaons in blue and protons in green, with shades of the respective colours representing the different scenarios. They range from a decrease of both width and expected response by 0.1σtrue (lighter colour tones) to an increase of both detector parameters by the same amount (darker colour tones). Again, iteration 6 is taken as reference. As one may notice, the curves of the percentage error pair up according to the value of the shift parameter. This behaviour is crystal clear for pions and protons, for which the trends of scenarios with different σ are not so different when blocking for the value of the shift. As far as kaons are concerned, the combined effect of an alteration of both width and expected value is more appreciable. Nevertheless, generally speaking, the two systematic variations appear to act independently, without showing a combined effect on the error. In fact, the order of magnitude of the mistaken counts is pretty much the same as in the aforementioned cases. A further insight into the comparison among these scenarios is given by Figure 17 below. This time the four cases are dealt with separately, overlaying the curves of the percentage error of the 3 species. The plot can be read both by row and by column. The former option allows one to compare the effect of the signal width for fixed values of the shift, while the latter enables the evaluation of the result of an alteration of the expected signal blocking for the value of σ.

Figure 16: Combined cases comparison. The plots show the distribution of the percentage error over pT for pions (on the left), kaons (in the middle) and protons (on the right) at iteration 6. Shades of different colours are used to illustrate the different scenarios.
The first characteristic that catches the eye is that kaons seem to have a fairly constant behaviour irrespective of the reproduced response. Nevertheless, looking more carefully, it is possible to notice two effects the shift has on kaons. Firstly, it acts as an additive constant, dragging the curve up when the variation is positive and down when it is negative. This impact is not evident in the graphic, though. Secondly, it reflects the curve of the percentage error with respect to the horizontal line passing through 0 in the range 2 < pT < 4 GeV/c. This latter tendency is more easily understood if pions and protons are also considered. In fact, for either species the shift acts by reflecting the curves of the percentage error with respect to the horizontal line passing through 0. Thus, π have a downward trend when a negative shift is added, and an upward trend when it is positive. The opposite behaviour is observed for the protons. The reason why this happens is that decreasing, on average, the signal of all the species by 0.1σtrue means moving all the measured responses towards the pion signal, thus generating a deficit in their counts. The same happens in the opposite direction for p. On the contrary, kaons are not much affected by this variation, due to the fact that their signal distribution lies in between the other two. However, as already said, an impact on their error

Figure 17: Combined cases comparison. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green) at iteration 6. The 4 different scenarios are illustrated separately: the negative shift is pictured in the first row, the positive one in the second. On the other hand, the narrower response is presented in the first column, the larger one in the second.
distribution is appreciable for low pT values, where the effect of the shift is dominant with respect to that of the contamination due to other species, inasmuch as the separation is high. This generates an overestimation due to a deficit of pions, when the shift is negative, and an underestimation in favour of pions, when it is positive. However, as soon as the separation gets smaller the contamination effect becomes dominant, thus causing the characteristic upward trend of this species. As far as the impact of the width of the measured signal is concerned, it can be noticed that it does not much alter the shape of the trend given by the shift in the expected signal, but simply amplifies its magnitude. In conclusion, it is possible to say that the two factors seem to act independently of each other. Moreover, the shift seems to have a larger effect than the width on the percentage error of the reconstruction, however unexpectedly. This may be due to the fact that, although a larger uncertainty on the signal distribution surely contributes to making the reconstruction more troublesome, its effect is not dominant either for low transverse momentum, where it is balanced by a good separation, or for high pT values, where the contamination due to other species would have risen anyway, since the signal distributions get closer.

3.2.2 Flat-ratio simulation

As already mentioned, the method is guaranteed to converge for infinite statistics, i.e. when the number of observations in each pT bin tends to infinity. However, this is not the case for the realistic simulation, since the counts at the right boundary of the momentum range drop drastically for all the species (see Figure 18). A natural way of overcoming this would be to simply enlarge the number of events, so as to keep the shape of the spectra realistic, as well as the species' relative proportions, while increasing the number of particles in each bin by a huge factor. Nevertheless, this would enormously increase the computing time needed in the reconstruction phase, therefore this solution is not considered here. Alternatively, the issue is addressed by running a different simulation in which the ratio between the species is flat over the range of transverse momentum. In particular, it has been fixed to the value observed in real data at pT = 5 GeV/c. As shown

Figure 18: True counts comparison for pT > 5 GeV/c.
previously in Figure 8 above, this choice guarantees an appropriate number of observations in each bin. The whole set of scenarios was investigated also in this framework. The results were found to be consistent with those achieved for the real spectra; hence, in order not to burden the reader, the following will deal only with the case of perfect response. All the other results are illustrated in the Appendix.

3.2.2.1 Perfect response scenario

As already mentioned, this scenario represents the best setting in which to assess the performance of the estimating technique, so it is adopted as an example to show the behaviour of the method when sufficient observations are available in all kinematic areas. Figure 19 compares the trend over iterations of the percentage error for the three species at steps 4, 6, 8, 10, 12 and 14. This time the improvement in the reconstruction is appreciable also after iteration 6. Performing a step-to-step comparison, it is possible to notice that at equal iterations the error is lower for the flat-ratio simulated data.

Figure 19: Percentage error in the "perfect response" scenario. The plots show the distribution of the percentage error over pT for pions (red), kaons (blue) and protons (green). N.B. the Y-axis scale is reduced from (-0.3,0.3) to (-0.05,0.05) in the bottom-line plots in order to illustrate more clearly the changes in the curves.
At step 6, in fact, the mistaken counts do not exceed the +10% threshold at high momenta as far as kaons are concerned, while an almost perfect reconstruction is observed for the other species. Furthermore, increasing the number of iterations, the PID becomes even better, especially in the kinematic areas in which the separation of the species gets lower. This may be interpreted by saying that the first 6 iterations are sufficient to reconstruct practically perfectly the spectra in the first half of the momentum range, while for a satisfactory reconstruction at high pT values 10 steps are needed, achieving an error on kaons below +2%, and 12 for an error around 1%. Definitely, the method performs well and was able to obtain a perfect particle identification even at high momenta. The number of iterations needed to achieve these results is between 8 and 12, depending on the degree of precision requested. Similar conclusions can be drawn for the other scenarios, in which the variation of the detector parameters does not affect the speed of convergence of the algorithm. However, it introduces some systematic errors.

3.2.3 Different set of priors

Finally, the last point to investigate further is the influence that the starting priors have on the results of the iterative procedure and on its convergence rate. In order to address this issue, another analysis has been run. In the previous results, the initial guess on the priors was set equal for all the species, so as to normalise each yield to the pion multiplicity in that bin. This means that both kaons and protons begin from an overestimation of their true values and, hence, a convergence from above is observed. Alternatively, the following deals with results obtained starting from an underestimation of kaons, which are set to 20% of the pions' multiplicity. Thus, an instance of how the technique behaves when the convergence is achieved from below is illustrated. Again, kaons have been taken as an example because, being in the middle, they suffer more from contamination due to both the other species. In Figure 20 the progress of the estimating procedure is shown for both sets of priors, with the percentage errors for the three species overlaid. The data on which the comparison is made are taken from the real spectra simulation. Particle types are indicated with the usual colour convention. The solid lines with full markers represent the approximation from above, while the dashed lines with open markers reproduce the new set of priors. The first thing that leaps out is the evident difference in the mistaken counts in the first steps for kaons and protons. Although the new set of priors begins from an underestimation of the K multiplicities, after a few steps their distribution already shows the typical shape of the kaon trend, with an even more accentuated overestimation. Proceeding with the iterations, the gap between the curves vanishes until they eventually
converge to the same error distribution. On the contrary, pions seem not to suffer from the effect of the changes in the priors. Therefore, the choice of a different set of priors does not affect the results, although it influences the time needed to obtain comparable results, causing a lower rate of convergence of the algorithm. Based on the previous plot, it is possible to say that 12 iterations are enough to achieve an error not greater than +20% for kaons and -20% for protons, with an uncertainty (measured as the absolute difference between the curves of each species) in the order of approximately 5%. In conclusion, the method has shown great performance and was able to obtain a perfect particle identification even at high momenta, provided that enough observations were available in each bin. In order to do so, the number of iterations needed is between 10 and 12, depending on the degree of

Figure 20: Priors convergence comparison. The plots show the comparison of the error distribution for the two sets of priors at steps 5, 8, 11 and 14. Solid lines picture the approximation from above, while dashed ones the convergence from below. The usual convention is used regarding colours. N.B. the Y-axis scale is (-1,1) for the top left plot and (-0.5,0.5) for the others.
precision one is willing to achieve. Furthermore, the analysis of the results of this technique can be used to give an estimate of the systematic effects deriving from an imperfect knowledge of the detectors' response.

3.3 Analysis of the resonances

Once the PID strategy has been validated for the reconstruction of the spectra of observed particles, the next step is to move to the analysis of the resonances, i.e. of those particles not directly observed because they have decayed shortly after the moment of their production, before arriving at the detecting instruments. In order to illustrate the results of this new strand of the analysis, the case of the particle ϕ is taken as reference. In fact, reconstructing the possible decay channels from the combinatorial background of identified observable particles is in principle very hard and, hence, a lot of preliminary work has to be done to deal with this issue. However, the resonance ϕ presents some simplifications that facilitate this task, thus making it the perfect case for illustrative purposes.

3.3.1 The meson ϕ

The meson ϕ is a particle with no electric charge whose prevalent decay channel is ϕ → K± + K∓, i.e. two kaons with opposite charge¹⁰. It is formed of a strange quark, s, and a strange antiquark, s̄, and, as such, it also constitutes its own antiparticle. The invariant mass¹¹ of this meson has been measured in various experiments, resulting in the value of 1019.445 ± 0.020 MeV/c². The expected lifetime of such a particle is estimated to be 1.55 ± 0.01 · 10⁻²² seconds. The reasons why this resonance appears to be a suitable case for a benchmark analysis are to be found in the reconstruction of its decay channel. First of all, in this framework, the only channel the ϕ is subject to allows the production of just two particles. This implies that the combinatorial background is reduced to the lowest possible, since for three-body or higher decays the background combinations are much more numerous. However, this advantage would vanish if the particles produced were difficult to detect (e.g. neutral or barely interacting particles). On the contrary, the two daughters of the meson ϕ are both electrically charged and so quite easy to detect. In addition, they are exactly the same particle apart from the sign and thus, in

¹⁰ Other channels are also possible; however, they are not taken into account here.
¹¹ The invariant mass, M, of a particle is defined as M²c⁴ = E² − p²c², where E is the particle energy, the vector p is its momentum and c is the speed of light. M is one of the relativistic invariant quantities, thus its value is the same in any reference frame.
particular, they also have the same mass. This, in turn, implies that, at the moment in which the decay takes place, the two daughters each take approximately half of the mother's momentum. This means that, since the identification depends on the pT of the particle, the PID performance on the kaons is constrained once the ϕ's momentum is fixed.

3.3.2 Results

In this case the PID strategy has to deal with the priors as a function of the invariant mass instead of pT, thus applying weights during the filling of the histograms over this new variable. The result of this procedure in the perfect response scenario is illustrated in Figure 21 below. The picture shows the reconstructed invariant mass of the mother particle on the x-axis and the respective counts of combinations of candidate daughter particles on the y-axis. In the top panel the true invariant mass distribution and the estimated one are overlaid over the whole range of transverse momentum, while in the bottom graphic the percentage error of the reconstruction is plotted. Starting from the output of the iterative procedure, a fit of the estimated posterior invariant mass distribution is made in order to obtain the total yield of ϕ mesons.

Figure 21: Reconstructed invariant mass plot. The figure presents in the top panel the invariant mass distribution of the resonance ϕ reconstructed from the channel K⁺ + K⁻; the fit is overlaid in blue, while the background is represented by the red line. The results are shown in the statistics box. In the bottom panel the usual percentage error of the estimated yield with respect to the true one is illustrated.
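For reference, each entry of the plot is the invariant mass of a candidate K⁺K⁻ pair, which can be computed from the measured momenta as in the minimal sketch below (natural units, c = 1; the kaon mass is the PDG value; the function name and signature are illustrative).

```cpp
#include <cmath>

// Invariant mass of a two-body candidate, in natural units (c = 1):
// M^2 = (E1 + E2)^2 - |p1 + p2|^2, with E_i = sqrt(m_K^2 + |p_i|^2).
double pairInvariantMass(const double p1[3], const double p2[3]) {
    const double mK = 0.493677;  // charged-kaon mass in GeV/c^2 (PDG)
    double p1sq = 0.0, p2sq = 0.0, psumsq = 0.0;
    for (int i = 0; i < 3; ++i) {
        p1sq   += p1[i] * p1[i];
        p2sq   += p2[i] * p2[i];
        psumsq += (p1[i] + p2[i]) * (p1[i] + p2[i]);
    }
    const double E = std::sqrt(mK * mK + p1sq) + std::sqrt(mK * mK + p2sq);
    return std::sqrt(E * E - psumsq);
}
```

In the weighted fill, each combination would plausibly enter the histogram with the product of the two tracks' kaon posterior probabilities, although the exact weighting scheme used in the analysis is not reproduced here.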
Starting from the output of the iterative procedure, a fit on the estimated posterior invariant mass distribution is performed in order to obtain the total yield of Ο• mesons. In particular, the background is modelled through a linear function, while a relativistic Breit-Wigner is fitted to the peak. The latter is a continuous probability distribution commonly used in particle physics to model the signal in an invariant mass plot. Its functional form was first derived by Lorentz to solve a problem in optics, in the classical study of atoms regarded as damped harmonic oscillators; it was introduced in high-energy physics by Gregory Breit and Eugene Wigner, after whom it is named. Its probability density function is defined as follows:

$$BW(M; M_0, \Gamma) = \frac{(\Gamma/2)^2}{(M - M_0)^2 + (\Gamma/2)^2} \qquad (11)$$

where the two parameters M₀ and Γ are the invariant mass of the Ο• meson and the width of its peak [12]. However, being a density function, it integrates to 1 and is therefore not suitable for directly modelling a peak of observed absolute frequencies. Thus, a further parameter acting as a normalisation constant is added to the fit. Hence, globally there are 2 parameters relative to the background, 2 describing the signal and a normalisation constant, for a total of 5 unknown quantities to estimate.

The result of the fit is superimposed onto the peak of the signal in the top panel, and the observed Ο‡Β² value is reported in the statistics box along with its degrees of freedom. The fit was performed in the interval of invariant mass ranging from 1.00 to 1.04 GeV/cΒ², thus considering 58 bins. The test was non-significant, showing an observed value of the test statistic equal to 47.16 against the 95% threshold of a χ²₅₃ distribution, 70.99. Hence, the expression combining the signal and the background successfully modelled the peak in the invariant mass distribution and can therefore be used to estimate the invariant mass and the mean lifetime of the Ο• meson. The estimates of the parameters of interest resulting from the fit are presented in Table 1 below.

           Mass                Width
    Value  1.01998 GeV/cΒ²      0.042603 GeV/cΒ²

Table 1: Estimates from the fitting procedure.

The estimated mass was found to be 1.01998 GeV/cΒ², while for the width a value of 0.042603 GeV/cΒ² was obtained. The results are therefore consistent with the current estimates; thus, in conclusion, the PID strategy works well also in the reconstruction of the Ο• decays.

[12] The parameter Γ is linked to the mean lifetime, Ο„, by the relationship $\Gamma \cdot \tau = \hbar$, where $\hbar = h/2\pi$ and $h = 4.13567 \times 10^{-15}$ eVΒ·s is the Planck constant.
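A minimal ROOT sketch of the fit just described is shown below, where hInvMass is a hypothetical name for the reconstructed spectrum and the starting values are rough guesses: parameters [0] to [2] are the signal normalisation, M₀ and Γ of equation (11), while [3] and [4] describe the linear background, for the five unknowns in total.

    // Breit-Wigner peak of equation (11), times a normalisation [0],
    // plus a linear background [3] + [4]*x, in the range 1.00-1.04 GeV/c^2
    TF1 *fitFunc = new TF1("fitFunc",
        "[0]*([2]*[2]/4.)/((x-[1])*(x-[1]) + [2]*[2]/4.) + [3] + [4]*x",
        1.00, 1.04);
    fitFunc->SetParameters(1e4, 1.019, 0.004, 0., 0.);  // rough initial guesses
    hInvMass->Fit(fitFunc, "R");  // "R": restrict the fit to the function range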
However, to properly assess the performance of the method when applied to resonances, further tests are needed. After the estimates of the parameters of the signal and background distributions have been obtained, the total yield over the whole momentum domain is computed by integrating the difference of the two functions in the fit range, giving a total of 739948 particles, i.e. roughly 67% of the true number of particles. This result may seem far from a good representation; however, one has to consider that the reconstructed invariant mass plot reproduced 1107671 Ο• mesons, i.e. 99.99% of them, so the problem lies in the fit rather than in the reconstruction. In fact, the linear functional form for the background may be too simplistic, and a better combination could be explored. Anyhow, this issue is not a serious one: once the efficiency of the reconstruction is known, the result can simply be corrected for that value.

3.4 Limitations and outlook

The presented results were quite satisfactory both in terms of errors and of computing time (around 15 minutes per step for the realistic spectra and around 40 minutes for the flat-ratio simulation). Furthermore, the study of the different scenarios also gave the opportunity to estimate the systematics introduced by an imperfect knowledge of the experimental apparatus. However, the procedure showed some limits as well.

First of all, measuring the uncertainty on the estimates is troublesome. For this reason, the values given for the error bars are to be taken as rough indications rather than proper estimates. A careful reader may also argue that no formal goodness-of-fit tests are performed, and an effort has to be made to address this issue. In fact, two alternatives were explored to deal with the lack of a formal assessment of the conformity between true and reconstructed spectra. More specifically, two tests were conducted for the null hypothesis $H_0: F_{true}(s_i) = F_{est.}(s_i)\ \forall s_i \in \mathbb{R}$ against the alternative $H_1: F_{true}(s_i) \neq F_{est.}(s_i)$ for some $s_i$, namely the conformity Ο‡Β² and the one-sample Kolmogorov-Smirnov tests (a minimal sketch of the former statistic is given below). However, due to the intrinsic nature of frequentist statistics, both test statistics were affected by the very large number of observations in the sample (the tracks are more than 65 million in the realistic simulation), so that their results would misleadingly reject the null hypothesis even though the agreement between the two curves is evident in the plots.

Secondly, the convergence of the algorithm is here assessed graphically, based on the outputs. Therefore, the comments on the speed of convergence are to be intended only as qualitative indications rather than general recommendations.
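As an illustration of the first of these checks, the conformity χ² statistic between the reconstructed (observed) and true (expected) spectra can be computed over the bin contents as in the sketch below (names illustrative). With more than 65 million tracks, even per-bin relative deviations of a fraction of a percent inflate the statistic far beyond any reasonable threshold, which is precisely the behaviour observed here.

    #include <vector>

    // Conformity chi-square between reconstructed and true spectra,
    // summed over the bins with a non-empty expectation
    double conformityChi2(const std::vector<double>& observed,
                          const std::vector<double>& expected) {
        double chi2 = 0.0;
        for (std::size_t i = 0; i < observed.size(); ++i)
            if (expected[i] > 0.0) {
                const double d = observed[i] - expected[i];
                chi2 += d * d / expected[i];
            }
        return chi2;
    }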
Indeed, the implementation of a stopping rule is required in order to encourage the application of the method to real data analysis. A possible answer to this need may be found in the comparison of different initial guesses on the species' prior distributions over momentum, as shown in section 3.2.3.

Thirdly, starting from the outputs of the previous analysis, the reconstruction of the invariant mass of the Ο• resonance has been dealt with, returning consistent estimates for the invariant mass and mean lifetime parameters and a yield of around 740 thousand particles produced in the whole momentum range.

Finally, further development of this estimation technique is still ongoing, in order to handle more problematic resonances as well. In this respect, the optimal final target would be the reconstruction of the Ξ›c signal in Pb-Pb collisions. In fact, although current theories anticipate the production of such particles in this kind of collisions, at the moment no PID strategy has been able to reconstruct their signal. Therefore, a successful performance on this case would lend considerable credibility to the method, perhaps establishing it as a benchmark in the sector.
4. A Bayesian approach

An alternative way to tackle the particle identification problem presented in this dissertation is through a Bayesian approach. In fact, although the methodology demonstrated in chapter 3 makes use of Bayesian probabilities in the update of the current estimates, it is still embedded in the frequentist context, since the target of the estimation procedure is the likelihood. In order to move to Bayesian inference on the mixture model presented in chapter 2, it is necessary to assume some prior distributions on the unknown parameters. Differently from what has been done so far, however, this has to be done for the three species within each bin, rather than for each particle type over the whole range of momentum.

4.1 Estimation method

Mixture models have reached ever increasing prominence in modern statistics, finding many applications in important statistical topics such as clustering, outlier detection, the treatment of unobserved heterogeneity, regression and non-linear time series analysis. Since they were first introduced [13], finite mixture models have been dealt with intensively and extensively in all their aspects, creating a vast literature on the subject. Nonetheless, a Bayesian approach had barely been developed until the early nineties, mainly owing to the prohibitive computing times required for reasonable sample sizes, even though closed-form solutions are available in this framework. In the past twenty years, however, research on this topic has produced very interesting results, demonstrating how to estimate these models in a Bayesian setup using Monte Carlo simulation techniques based on Markov chains. Examples of key papers on the Bayesian analysis of mixture models are, in temporal order, (Lavine, 1992), (Diebolt, 1994), (Bensmail, 1997) and (Richardson, 1997).

In the frequentist context, the most widely applied maximum likelihood method for inference on mixture models is by far the EM algorithm. It constitutes a very powerful tool, since it allows one to deal with groups of different size, shape and orientation. Nonetheless, it comes with some limitations, such as the risk of remaining stuck in a local maximum of the likelihood and the impossibility of directly obtaining estimates of the uncertainty on the parameters. The advantage of tackling the problem in a Bayesian fashion is that an MCMC procedure is guaranteed to converge eventually to a unique limiting distribution, which is also the one of

[13] Karl Pearson was the first to show, in 1894, how to estimate the five parameters of a mixture of two normal distributions using the method of moments.
interest, even if it may take some time. Furthermore, a natural and intuitive definition of the uncertainty on the estimates arises from this approach, in the form of the whole posterior distribution. Finally, Bayesian inference also provides the posterior probabilities for a single observation to belong to each and every population of the mixture model, thus making it suitable for the case under study in this thesis. As far as the estimation technique in a Bayesian framework is concerned, examples of how to make inference on the parameters of such models through a Gibbs Sampling algorithm using conjugate priors can be found in (FranzΓ©n, 2006) and (Cornebise, 2005).

The analysis presented in the following deals with the estimation of the posterior distributions of the parameters of interest, starting from the model described by equation (2) in section 2.2. General recommendations from the literature about the choice of the priors and the initialisation of the hyperparameters are followed. Again, the primary objective of the estimation procedure is the vector of mixing proportions Ξ© = {Ο‰Ο€, Ο‰K, Ο‰p}, while the vector ΞΈ = {ΞΌΟ€, ΞΌK, ΞΌp, Οƒ} is considered not of interest. Therefore, the inference focuses on the vector Ξ©, and the remaining parameters can either be treated as fixed or as random quantities. Here the vector ΞΈ is taken to be fixed a priori and known from previous experiments, as is actually the case. However, treating ΞΌ and Οƒ as nuisance parameters would give the opportunity to test the current knowledge of the detector response by comparing the new estimates with the established values; hence a section dedicated to the choice of the priors in this latter setup is also presented in this chapter.

4.1.1 Approach I: fixed parameters

As already said, in the present study the vector ΞΈ is taken to be fixed a priori, since the mean values of the signals are given by evaluating a proper separation function at the pT bin the track belongs to, while the variance is assumed to be common to all the populations and known from previous estimates. This choice implies that the only unknown quantities for which a prior distribution needs to be specified are the mixing proportions, Ξ©, of the mixture. Examples in the literature have shown that the Dirichlet distribution constitutes a favourable choice, being the conjugate prior for multinomial data, i.e. for categorical data in which each observation can present just one of the categories. For this reason, it has been adopted as the prior for the population weights. Notice that, since the relative proportions of the particle species may vary over the range of transverse momentum, a different model has to be considered in each pT bin, and thus also a different prior has to be specified. In the following, the formalisation is presented with respect to a single bin:
$$\Omega \sim \mathrm{Dir}(a_\pi, a_K, a_p) = \frac{1}{B(\mathbf{a})} \prod_{spec}^{\pi, K, p} \omega_{spec}^{\,a_{spec}-1} \qquad (12)$$

where $B(\mathbf{a})$ is the multivariate Beta function, i.e. $B(\mathbf{a}) = \frac{\prod_{spec}^{\pi,K,p} \Gamma(a_{spec})}{\Gamma\left(\sum_{spec}^{\pi,K,p} a_{spec}\right)}$.

The Dirichlet distribution, named after Peter Gustav Lejeune Dirichlet, is a multivariate generalisation of the Beta distribution and is very often used in Bayesian statistics precisely because it is a conjugate prior. It is a continuous multivariate probability distribution parameterised by a vector Ξ± of positive reals. The relative sizes of the hyperparameters Ξ±_j describe the mean of the prior distribution, and their sum is a measure of the strength of the prior belief. In other words, this distribution is mathematically equivalent to a likelihood resulting from a sample of $\sum_{j=1}^{J}(\alpha_j - 1)$ individuals, with $\alpha_j - 1$ observations belonging to class j. This choice thus allows one to benefit from the advantages of using conjugate priors and to easily update the prior hyperparameters in order to obtain the posterior distribution. To do so, the first step is to express the likelihood in terms of the missing information about group membership, here indicated by the vector V:

$$f(\mathbf{V}\,|\,\boldsymbol{\Omega}) \propto \prod_{spec}^{\pi,K,p} \omega_{spec}^{\,\sum_{i=1}^{n} I(v_i = spec)} \qquad (13)$$

where $I(\cdot)$ is the indicator function, defined as $I(v_i = spec) = \begin{cases} 1, & v_i = spec \\ 0, & \text{otherwise.} \end{cases}$

Then, to derive the posterior distribution of the mixing proportions conditional on the unknown vector V containing the individual classifications, it suffices to multiply together the prior and the likelihood, resulting in the updated Dirichlet posterior distribution:

$$f(\boldsymbol{\Omega}\,|\,\mathbf{V}) \propto f(\boldsymbol{\Omega}) \cdot f(\mathbf{V}\,|\,\boldsymbol{\Omega}) \propto \omega_\pi^{a_\pi - 1}\,\omega_K^{a_K - 1}\,\omega_p^{a_p - 1} \cdot \omega_\pi^{n_\pi}\,\omega_K^{n_K}\,\omega_p^{n_p} \propto \omega_\pi^{a_\pi + n_\pi - 1}\,\omega_K^{a_K + n_K - 1}\,\omega_p^{a_p + n_p - 1}$$

$$\boldsymbol{\Omega}\,|\,\mathbf{V} \sim \mathrm{Dir}(a^*_\pi, a^*_K, a^*_p) \qquad (14)$$

where $a^*_\pi = a_\pi + n_\pi$, $a^*_K = a_K + n_K$ and $a^*_p = a_p + n_p$.
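In practice, draws from a Dirichlet distribution can be obtained by normalising independent Gamma variates, which is all that is needed to exploit the conjugate update (14). The following is a minimal sketch using the C++ standard library; the helper name is illustrative.

    #include <array>
    #include <random>

    // Draw (omega_pi, omega_K, omega_p) ~ Dir(a_pi, a_K, a_p) by
    // normalising independent Gamma(a_spec, 1) variates
    std::array<double, 3> sampleDirichlet(const std::array<double, 3>& a,
                                          std::mt19937& gen) {
        std::array<double, 3> omega{};
        double sum = 0.0;
        for (std::size_t j = 0; j < 3; ++j) {
            std::gamma_distribution<double> gamma(a[j], 1.0);
            omega[j] = gamma(gen);
            sum += omega[j];
        }
        for (double& w : omega) w /= sum;  // proportions now sum to 1
        return omega;
    }

Sampling from the posterior (14) then amounts to calling sampleDirichlet with the updated hyperparameters, e.g. {a_Ο€ + n_Ο€, a_K + n_K, a_p + n_p}.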
Finally, the posterior probability for an observation $s_i$ to belong to a generic population spec is calculated according to Bayes' theorem, conditionally on the whole set of observations s and the values of ΞΌ and Οƒ:

$$p_{i,spec}\,|\,\boldsymbol{\Omega}, \boldsymbol{\mu}, \sigma = \frac{\omega_{spec} \cdot f^{det.}_{spec}(s_i, \boldsymbol{\theta})}{\sum_{spec}^{\pi,K,p} \omega_{spec} \cdot f^{det.}_{spec}(s_i, \boldsymbol{\theta})}\,, \qquad i = 1, \dots, n \qquad (15)$$

Once the prior has been set and an updating rule has been found, the estimation process may begin. Markov Chain Monte Carlo simulation is a powerful tool to address this issue. The general scheme is to reconstruct a Monte Carlo sample from the conditional posterior distribution of the parameters given the data, say p(Ξ·|y), by simulating from a Markov chain defined in such a way that its limiting, stationary distribution is the target posterior. Plenty of examples of such techniques are available in the literature; see for instance (Hastings, 1970), (Geman, 1984), (Gelfand, 1990) and (Gilks, 1999). Among them, the Gibbs Sampler is one of the most frequently used MCMC algorithms, particularly effective when the full conditional distributions [14] of the parameters of interest are known and relatively easy to sample from. More specifically, each step of this algorithm consists in sampling one parameter at a time from its conditional distribution given the current estimates of all the others. This procedure is repeated for every unknown quantity, each time updating the current values of the parameters with those already sampled at the current step; for parameters not yet sampled, the values from the previous step are used. Finally, the process is iterated either for a predetermined number of steps or until a convergence criterion is met. A thorough explanation of the Gibbs Sampler is given, for example, in (Casella, 1992). A special case of this method is the Data Augmentation algorithm, which arises when missing data are present. For more details, see (Tanner, 1987).

Like every mixture model, the one described by equation (2) in section 2.2 can be expressed in terms of incomplete data since, when a given signal is observed, no information on the true identity of the particle is available. For this reason, a Gibbs Sampler in the form of Data Augmentation has been chosen to obtain the estimates of the parameters of interest, along with the classifications. The difference between the two is that in the former the generation of the random variables is totally circular: in the Gibbs Sampler each parameter is sampled from its conditional distribution given the classification vector and the whole set of the other parameters. On the contrary, in the Data Augmentation the simulation proceeds by

[14] A full conditional distribution of an unknown quantity is the probability density (or mass) function of that parameter conditional on the values of all the other parameters.
considering the classifications and only the values of the parameters already updated within the current step. The advantage of the latter method is that, leaving less space for randomness, it performs better in terms of convergence and speed. Hence, the general Data Augmentation scheme has been adapted to the problem under study, resulting in three simple steps. The algorithm applied in the analysis consisted in i) sampling triplets of mixing proportions from a Dirichlet distribution whose hyperparameters are the current estimates of Ξ±, and ii) using these values to compute the updated estimates of the single-track posterior probabilities. Once this has been done, iii) the hyperparameters Ξ± are updated as well and the whole process is iterated (a minimal sketch of one such iteration is given at the end of this subsection). The Data Augmentation algorithm is guaranteed to converge, even if it may take some time: a formal proof is given in (Diebolt, 1994) in the context of one-dimensional normal mixture models, using a duality principle (thus, in a framework similar to the setup of this analysis).

However, in order to be coherent with the method presented in chapter 3, the sampled values and the posterior distribution of the mixing proportions have been used solely to assess the convergence of the algorithm, while the reconstruction of the histograms of interest has been conducted by filling the individual posterior probabilities into the spectra of the three species considered. As such, no actual classification has been performed in the estimation of the transverse momentum spectra, although a classification vector was computed at each step in order to keep the algorithm going.

As far as the choice of the prior hyperparameters is concerned, they have been set equal for the three species over the range of momentum, thus resulting in a flat, uninformative prior distribution. In particular, the Ξ±_spec values have been chosen equal to 2 in order to guarantee a wide 95% confidence interval for the mixing proportions, of around (0.076, 0.66) [15]. The purpose of this choice is to allow as much flexibility as possible, so as to enable a correct modelling of the population weights, which may be quite different from bin to bin. Finally, it is important to mention that, a pre-built routine not being available in ROOT, the method was implemented from scratch and tested only for the version of the software adopted (for more details check the macro bayesian_analyze.C in the Appendix).

[15] The confidence interval is computed exploiting the fact that the marginal distribution of a single Dirichlet component, $\omega_j$, is a $\mathrm{Beta}\big(\alpha_j,\ \big(\sum_{j=1}^{J} \alpha_j\big) - \alpha_j\big)$.
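A minimal sketch of one iteration of the scheme just described is given below for a single pT bin, assuming fixed means and a common width as in Approach I; it reuses the sampleDirichlet() helper sketched after equation (14), and all names are illustrative rather than taken from the bayesian_analyze.C macro.

    #include <array>
    #include <cmath>
    #include <random>
    #include <vector>

    // One Data Augmentation step: i) draw omega from the current Dirichlet,
    // ii) compute the track posteriors (15) and sample classifications,
    // iii) refresh the hyperparameters via the conjugate update a* = a + n
    void dataAugmentationStep(const std::vector<double>& s,
                              const std::array<double, 3>& mu, double sigma,
                              const std::array<double, 3>& priorAlpha,
                              std::array<double, 3>& alpha,  // current a
                              std::mt19937& gen) {
        // i) sample a triplet of mixing proportions
        const std::array<double, 3> omega = sampleDirichlet(alpha, gen);
        // ii) single-track posteriors and sampled classifications
        std::array<double, 3> n{0.0, 0.0, 0.0};
        for (double si : s) {
            std::array<double, 3> p{};
            for (int j = 0; j < 3; ++j) {
                const double z = (si - mu[j]) / sigma;
                p[j] = omega[j] * std::exp(-0.5 * z * z);  // Gaussian response
            }
            // the normalised p[j] are what gets filled into the spectra;
            // a label is sampled only to keep the chain going
            std::discrete_distribution<int> label(p.begin(), p.end());
            n[label(gen)] += 1.0;
        }
        // iii) update the Dirichlet hyperparameters, equation (14)
        for (int j = 0; j < 3; ++j) alpha[j] = priorAlpha[j] + n[j];
    }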
4.1.2 Approach II: nuisance parameters

Alternatively, one may be interested in treating the vector ΞΈ as random, even though not of interest, and setting some priors also on the parameters ΞΌ and Οƒ (here the response width is allowed to vary across species). The statistical literature provides several examples of this, suggesting the following choices:

$$\sigma^2_{spec} \sim \mathcal{G}\!\left(f_{spec}/2,\; s_{spec}/2\right) \qquad (16)$$

$$\mu_{spec}\,|\,\sigma^2 \sim \mathcal{N}\!\left(m_{spec},\; \frac{1}{\sigma^2 \cdot \tau_{spec}}\right) \qquad (17)$$

Updating rules for the posterior distributions of the nuisance parameters are also available, resulting in:

$$\sigma^2_{spec}\,|\,\mathbf{s}, \mathbf{V} \sim \mathcal{G}\!\left(f^*_{spec}/2,\; s^*_{spec}/2\right) \qquad (18)$$

$$\mu_{spec}\,|\,\sigma, \mathbf{s}, \mathbf{V} \sim \mathcal{N}\!\left(m^*_{spec},\; \frac{1}{\sigma^2 \cdot \tau^*_{spec}}\right) \qquad (19)$$

where $f^*_{spec} = f_{spec} + n_{spec}$ and $s^*_{spec} = s_{spec} + \sum_{i=1}^{n} (s_i - \mu_{spec})^2$, while $m^*_{spec} = \frac{\tau_{spec}\, m_{spec} + n_{spec}\, \bar{s}_{spec}}{\tau_{spec} + n_{spec}}$ and $\tau^*_{spec} = \tau_{spec} + n_{spec}$, with $\bar{s}_{spec} = \frac{1}{n_{spec}} \sum_{i: v_i = spec} s_i$.

The estimation then proceeds using a Data Augmentation algorithm, whose generic iteration k is illustrated below:

i) sample a triplet of $\sigma^{2\,(k)}_{spec}$ according to the Gamma distribution in (18), conditional on the data s and the current classification vector $V^{(k)}$;

ii) sample a triplet of $\mu^{(k)}_{spec}$ according to the Gaussian distribution in (19), conditional on the simulated values $\sigma^{2\,(k)}_{spec}$, on the data s and the current classification vector $V^{(k)}$;

iii) sample a triplet $\omega^{(k)}_{spec}$ according to the Dirichlet distribution in (14), conditional on the current classification vector $V^{(k)}$;

iv) compute the individual posterior probabilities $p^{(k)}_{i,spec}$ according to equation (15) and use those values in order to update the classification vector $V^{(k)}$ into $V^{(k+1)}$;