Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra

Applied Physics Thesis
Delft University of Technology
Breath analysis
by quantum cascade laser
spectroscopy
Author:
Olav Grouwstra
Direct supervisor:
A. Reyes Reyes
Supervisor:
N. Bhattacharya
Professor:
H.P. Urbach
March 5, 2015

Contents
1 Introduction 3
1.1 Gas analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Quantum cascade laser spectroscopy . . . . . . . . . . . . . . . . . . . . 3
1.3 Previous work on the project . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Goals and thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Quantum Cascade Laser Spectroscopy 5
2.1 Absorbance spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 The quantum cascade laser . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Multipass cavity . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Distance meter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Data Calibration 13
3.1 Wavenumber calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Four measurement signals . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Data acquisition and pre-processing . . . . . . . . . . . . . . . . . 15
3.1.3 Need for additional calibration . . . . . . . . . . . . . . . . . . . . 16
3.1.4 Wavenumber calibration . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 CO2 and H2O calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Method for calibration to CO2 and H2O . . . . . . . . . . . . . . 26
3.2.2 CO2 and H2O calibration results . . . . . . . . . . . . . . . . . . 27
4 Molecules and Concentrations 30
4.1 List of molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Standard molecules . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 Molecules selected to distinguish between data sets . . . . . . . . 31
4.1.3 Selection based on measurable concentration . . . . . . . . . . . . 34
4.2 Concentration determination . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.1 method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.2 Input and results . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Ethanol estimation by CO2 removal . . . . . . . . . . . . . . . . . . . . . 36
4.3.1 Motivation for CO2 removal . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Implementation of CO2 removal for ethanol estimation . . . . . . 37
1

5 Disease prediction by machine learning 40
5.1 Machine learning for classification . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Classification by trial-and-error . . . . . . . . . . . . . . . . . . . . . . . 40
5.2.1 Basic explanation of some algorithms used . . . . . . . . . . . . . 41
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6 Conclusions and outlook 43
6.1 QCL setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.2 Data calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.3 Determination of molecules . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.4 Determination of concentrations . . . . . . . . . . . . . . . . . . . . . . . 44
6.5 Categorizing health status by classification . . . . . . . . . . . . . . . . . 45
Appendices 48
A Results for a sample of a healthy patient 49
A.1 Standard molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
A.2 Molecules found by p-region analysis . . . . . . . . . . . . . . . . . . . . 49
A.3 Concentrations found for a single healthy patient . . . . . . . . . . . . . 53
B List of molecules 58
B.1 Breath molecules with absorbance in the 1020 to 1100 cm−1
region. . . . 58
2

1. Introduction
1.1 Gas analysis
Gas analysis is done to find the different molecules present in a gas. Identifying the
composition of a substance is vital for various practical applications such as medical
diagnostics and environmental monitoring[1][2]. In this work the focus lies on gas analysis
applied to medical diagnosis.
The diagnosis of diseases is helped by many different techniques such as blood tests or
radiological studies like magnetic resonance imaging (MRI), as they give an idea of what is
inside the human body. Blood tests can provide this knowledge since changes in metabolic
processes significantly influence the compounds in human blood. For example, diabetes
increases the amount of acetone in the bloodstream. Hence knowing the concentration
of acetone could help diagnose diabetes. The compounds in blood diffuse through the
lungs into the breath we exhale, and diagnosis can therefore also be done by analysis of
human breath.
Breath analysis as opposed to blood tests has the benefit of being non intrusive,
thereby being much more easily implemented by medical professionals. Non intrusive
medical instruments also undergo less strict control by regulatory parties like the Food
and Drug Administration (FDA) and the European Medicines Agency (EMA). While
diagnosis through imaging techniques might give a better idea about the origin of a
disease, it does not give information on the influence of a disease on the metabolic
processes.
1.2 Quantum cascade laser spectroscopy
Quantum cascade laser (QCL) spectroscopy is only one among a variety of spectroscopy
methods, each of which has its pros and cons.
Currently one of the most common techniques used for gas analysis is gas chromatog-
raphy – mass spectrometry (GC-MS). Although GC-MS achieves high accuracy, it has
certain limitations. GC-MS is a specific test, meaning it requires a priori knowledge of
what to test for[3]. This means it is limited in use to only cases where the symptoms are
already present, and thus requires pre-diagnosis. GC-MS also requires sample prepara-
tion, limiting its use to trained professionals. A gas analysis method using a quantum
cascade laser for spectroscopy is treated in this work. This form of analysis does not need
sample preparation and is therefore more easily accessible. A priori knowledge of what
3

molecules to test for is also not necessary for the measurement itself, which makes it a
promising tool for laying statistical relations between concentrations of molecules and
the health status of individuals.
Photoacoustic spectroscopy (PAS) is another useful technique. It works, like QCL
spectroscopy, by shining light on a sample. However in PAS it is the absorbed light that
creates the signal by thermally expanding the sample, thereby creating a sound wave
which can be measured. As in the case of QCL spectroscopy, PAS requires no sample
preparation. But PAS requires the signal to go through more energy conversions, thereby
making it susceptible to vibrational noise[4].
Electronic noses are also used for the detection of gasses. These are however made
specifically to detect specific molecules, as opposed to broadband QCL spectroscopy
which can detect any molecule with an absorbance spectrum overlapping with the em-
mission spectrum of the QCL.
1.3 Previous work on the project
A spectroscopy setup using a quantum cascade laser (QCL) is described and the subse-
quent data processing is extensively treated. The setup is made and extensively treated
by A. Reyes Reyes as his PhD project funded by Fundamenteel Onderzoek Materie
(FOM) foundation[5]. Z. Hou developed the data acquisition under supervision of A.
Reyes Reyes[3]. Measurements processed in this research are from breath samples of
35 healthy volunteers, and 35 asthmatic volunteers. Two different breath samples are
used from each volunteer. These samples are from children from the Sophia Children’s
hospital in Rotterdam, all with their parents’ informed consent. From the absorbance
spectra obtained from these samples a uncertainty in the wavenumbers is found. In this
work calibration is done through post-processing in order to reduce this uncertainty.
1.4 Goals and thesis structure
The goal of this work is twofold. Firstly to decrease the uncertainties with which the
compounds and its concentrations are determined. The second goal is to find a reli-
able way to distinguish and identify breath of healthy and asthmatic children from the
compounds present and their concentrations.
This thesis works towards these goals by first explaining the general working principle
of the setup and theory behind QCL spectroscopy, where the parts having the most influ-
ence on the noise and resolution of the signal are highlighted. The absorbance measured
spectrum and its calibration are then treated. This is followed by an explanation of how
the present species and their concentrations of various molecules are found. Finally a
way of distinguishing between samples of healthy and asthmatic patients is treated, and
the overal results are discussed.
4

2. Quantum Cascade Laser
Spectroscopy
In this chapter the relation between the intensity of the laser light, the absorbance of
the gas, and the concentrations of the molecules in the gas is explained. The necessary
parameters to measure and the equipment that is used to measure these parameters are
explained as well. Measurements on the gas are performed with a spectroscopy setup
using a quantum cascade laser (QCL). The various components and conditions of the
measurement setup that inﬂuence the determination of the gas compounds and con-
centrations are explored. In order to understand how these components inﬂuence the
eventual concentration, the path of light through the components is described starting
from the quantum cascade laser and ending at the mercury cadmium telluride (HgCdTe)
detectors. Getting the absorbance spectrum from a breath sample takes around 60 min-
utes, 5 minutes of which is comprised by the actual scanning of the gas by the setup.
Transferring and loading the large datasets between computers and into processing soft-
ware takes up the most time. The setup as it is used for sample measurements is build
by A. Reyes Reyes[5].
2.1 Absorbance spectrum
In order to obtain information on the gas through spectroscopy, the interaction between
light and matter needs to be established. The relation of the concentrations of the
compounds in the gas to the light passing through the gas is given by the Beer-Lambert
law[6, 7]:
Io(ν, C) = Ii(ν)10−A(ν,C)
(2.1)
where Io is the beam intensity after interacting with the gas, Ii the beam intensity before
interacting with the gas, and ν and C denote the dependence on the wavenumber and
the concentrations of the compounds respectively. A is the absorbance, as given by
A(ν, C) =
n
i=1
i(ν)cilpath (2.2)
with
C = (c1, c2, ..., cn) (2.3)
where n is the amount of molecular species, and ci and i(ν) denote the concentra-
tion and molar absorptivity of each molecular species in the gas. The vector C is used
5

simply as an abbreviated way to indicate the dependence of the absorbance A on the
concentrations of all molecular species. The absorbance is also dependent on the inter-
action length lpath of the light with the gas. In order to determine the concentrations
ci of the molecules, all other parameters in Equation 2.1 and Equation 2.2 should be
known. The light is provided by the QCL, and the intensities Io and Ii are obtained by
processing the signals measured by the detectors in the setup. The molar absorptivity
i(ν) is a measure for how strongly a certain molecule species absorbs light at a given
wavelength. It is an intrinsic property of a molecule and can be considered unique to
each molecule. Hence the molar absorptivity can be looked up in the Northwest In-
frared database (NWIR)[8] and the high resolution transmission molecular absorption
database (HITRAN)[9], which are put together by the Pacific Northwest National Labo-
ratory (PNNL) and the Harvard-Smithsonian Center for Astrophysics respectively. The
interaction length lpath is a property of the setup, and more specifically of the cavity, and
can be measured from the cavity as described in subsection 2.2.2.
2.2 Setup
To find the concentration of compounds within a gas the spectroscopy setup depicted in
Figure 2.1 is used. The laser beam is produced at the QCLs. From the QCL the beam
falls onto a beam splitter, causing one of the beams to go straight to the monitor detector,
which is used to measure the intensity spectrum as emitted by the QCL. The other beam
from the beam splitter travels to the multipass cavity. The beam gets partially absorbed
by the gas present in the cavity, and then travels onwards to the main detector. The
distance meter is placed in such way that its beam coincides with that of the QCLs in
order to measure the optical path length in the cavity. The two detectors relay their
signals to the NI-PCI 5922 acquisition card, which in turn sends the signals to the
computer.
6

Figure 2.1: Diagram of the quantum cascade laser spectroscopy setup as used for mea-
suring. Courtesy of Z.Hou[3]
2.2.1 The quantum cascade laser
The laser system used is Block Engineering’s LaserScope which consists of two QCLs that
are arranged such that together they can scan over the wavenumber range of 832 cm−1
to 1263 cm−1
[10]. As the LaserScope is by itself designed to be a spectroscopic system,
it has an internal detector and a set of two QCLs which are portrayed in Figure 2.1 as
the monitor detector and the QCLs. The main specifications on the LaserScope can be
found in Table 2.1, where it should be noted that the spectral resolution stated is the
output when the signal is processed using Block Engineering’s own software.
Table 2.1: Specifications of the quantum cascade laser as given by Block Engineering
[10].
Parameter Specification
Spectral region 1267 to 833 cm−1
(8 to 12 µm)
Tuning Continuous or single wavelength
Emission type Pulsed
Pulse repetition frequency 200 kHz
Minimum average power 0.5 mW
Maximum average power 12 mW
Beam diameter 2.5 mm x 5 mm ellipse
Spectral resolution Down to 1 cm−1
By way of its design the QCLs emit different wavenumbers of light simultaneously[11].
The HgCdTe detectors detect the intensity of the light without differentiating between
the various wavenumbers. Since intensity per wavenumber is a necessity, a Littrow con-
figuration with a diffraction grating mounted on a piezo is used, as seen in Figure 2.2[3].
7

Figure 2.2: Diagram of a Littrow configuration external cavity diode laser with back
extraction. Courtesy of Z.Hou[3]
This Littrow configuration uses a diffraction grating to steer the first order diffraction
of the light with the desired wavenumber back through the laser diode, and into the setup.
The wavenumber selected using the grating depends on the spacing of the grating and
the incident angle of the light on the grating. The incident angle is controlled by applying
a voltage over the piezo.
The intensity of the light emitted by the QCL is furthermore influenced by mode
hopping, which can be caused by temperature fluctuations in the gain medium of the
laser or fluctuations in the injection current. In chapter 3 a closer look is taken at the
signal from the QCL and its noise. The intensity spectrum of the QCL system is given
in Figure 2.3. The general shape of this signal portrays the intensity spectrum the laser
outputs. The intensity drop around 1010 cm−1
indicates the switch between the two
QCLs that together scan over the entire wavenumber range.
8

850 900 950 1,000 1,050 1,100 1,150 1,200 1,250
0.2
0.4
0.6
0.8
1
·108
Wavenumber (cm-1
)
Intensity(counts)
Spectrum emitted by the QCL
Figure 2.3: Intensity spectrum of the laser system used: Block Engineering’s LaserScope.
2.2.2 Multipass cavity
A cavity is used to achieve a large interaction length lpath between the gas and the
beam. The cavity is depicted in Figure 2.4. The beam enters the cavity through a ZnSe
window. It then reflects from the two mirrors at either end of the cavity, until the beam
falls on the window again and exits the cavity. The mirrors (Aerodyne Research, Inc.)
are astigmatic, meaning each mirror has two different radii of curvature which allows
for more reflections to take place in order to have a larger interaction distance, lpath[12].
A large interaction distance is desirable since it increases the absorbance. A multipass
cavity is used in favor of a resonance cavity since the high reflection mirrors as used in a
resonance cavity are not available for the wide wavelength range used in this experiment.
9

Figure 2.4: Diagram of the multi-pass cavity. The ZnSe window is located on the left in
yellow, and the two mirrors in blue on either end of the cavity. Courtesy of Z.Hou[3]
The length of interaction of the light with the gas lpath is measured using a laser
distance meter to be 54.36 meter[3]. A multipass cavity with such a long path length is
used in order to have a large absorbance. The mirrors in the cavity and the ZnSe window
leading in to the cavity also absorb 94.85% of the light[3, p. 25]. Part of the light is ab-
sorbed in the cavity as according to the absorbance proﬁles of the molecules constituting
the gas. The remaining light exits the cavity and is measured at the main detector. In
order to clean the cavity between measurements a series of mass ﬂow controllers and a
diaphragm pump are used. This way contamination in the cavity is prevented[5].
2.2.3 Distance meter
The distance meter (Leica DISTO D2) emits light pulses along the path as depicted by
the red dashed line in Figure 2.1. The path the pulse travels outside the cavity is simply
measured with a ruler. By looking at Figure 2.1 it can be seen that by subtracting
the length of the path the pulse travels outside of the cavity from the total length as
calculated by the distance meter, the optical path length inside the cavity is obtained[5].
This optical path length inside the cavity is also checked by measuring the time a pulse
from the QCLs takes straight to a detector compared to the time a pulse from the QCLs
takes when it passes through the cavity and then onto a detector. The delay between
the pulses measured is given in Figure 2.5, from which the same result of 54.36 cm is
obtained.
10

Figure 2.5: Delay time between a pulse going straight to the a detector and a pulse
passing through the cavity for three different wavenumbers. (a) 950 cm−1
, (b) 1000
cm−1
, (c) 1150 cm−1
. Courtesy of A. Reyes Reyes[5]
2.2.4 Detection
A HgCdTe detector measures the light before it enters the gas, and another HgCdTe
measures the light when it exits the gas. These are called the monitor and main detector
respectively, and the signals obtained from these detectors will henceforth be referred to
as the monitor signal and the main signal. The specifications of the monitor detector are
given in Table 2.2.
Table 2.2: Specifications of the monitor detector as given by Block Engineering[13].
Detector element TE cooled HgCdTe
Active area 1x1 mm2
Window BaF2
Wavelength range 6–12 µm
Total gain (typical) 800 V/W
Cut-off frequency range 10 Hz – 20 MHz
Operating temperature (typical) 223 K
Saturation threshold power (at 8 µm, 20 MHz speed) 5 mW (peak)
The main detector is from VIGO System S.A. with part number PVI-4TE-10.6-
0.3x0.3-TO8-BaF2, the specifications of which are given in Table 2.3.
11

Table 2.3: Speciﬁcations of the main detector as given by VIGO System S.A.[14].
Detector element TE cooled HgCdTe
Active area 0.3 × 0.3 mm2
Window BaF2
Wavelength range 2–12 µm
Detectivity at λpeak ≥ 4.0 × 109 cm·
√
Hz
W
Detectivity at λopt ≥ 2.0 × 109 cm·
√
Hz
W
Optimal wavelength 10.6 µm
Operating temperature (typical) 195 K
The intensity of the light emitted by the QCL IQCL as seen in Figure 2.6 varies per
measurement due to temperature ﬂuctuations and noise on the voltage applied over the
piezo. Therefore the signals of multiple measurements need to be normalized in order to
be compared to each other, for which the two detectors are used.
1,173.35 1,173.4 1,173.45 1,173.5 1,173.55 1,173.6 1,173.65 1,173.7 1,173.75 1,173.8
7
7.5
8
8.5
9
·108
Wavenumber (cm-1
)
Intensity(counts)
Measurements of emitted intensity of the QCLs
Figure 2.6: Intensity spectrum of various measurements with the monitor detector.
12

3. Data Calibration
3.1 Wavenumber calibration
As mentioned in chapter 2 the intensity of the light emitted by the QCLs is subject to
noise. This noise is corrected for through normalisation by using the monitor and main
detector. In this chapter an attempt is made to reduce the standard deviation in the
monitor and the main signals as caused by an uncertainty of what wavenumber is emitted
from the QCLs. The wavenumber corresponding to the intensity measured at the detec-
tors is determined by the angle of the grating mounted on the piezo, as mentioned in
subsection 2.2.1. The wavenumber attributed to the measured beam intensities at a point
in time is determined by the voltage on the piezo, which should predictably determine
the angle of the grating. Since there is noise on the voltage over the piezo element the
behaviour of the piezo is not completely predictable, and thus the angle of the grating
can not be told precisely by the voltage at a point in time on the piezo. This gives an
uncertainty to the wavenumbers attributed to the measured beam intensities, as can be
seen in Figure 3.1. The wavenumber data in Figure 3.1 is obtained by recording files
made by Block Engineering’s firmware which measures the voltage over the piezo and
contains knowledge of what voltage corresponds to what wavenumber. Unfortunately
Block Engineering’s firmware outputs its spectrum at lower wavenumber resolution than
the acquisition described in subsection 3.1.2 using the NI-PCI 5922 acquisition card.
This wavenumber uncertainty in turn increases the standard deviation of the ab-
sorbance when multiple measurements are compared. QCL systems are a relatively new
development and the QCLs in the Laserscope should be among the best available cur-
rently, hence there is no certainty that significant improvements can be made with the
knowledge on QCL design and piezo elements currently available. With this, and the
complicated nature of the electronics in the LaserScope QCL system, it could take a
long time to replace the piezo, for which there is no time in the scope of the project.
Instead wavenumber calibration is done in order to reduce this uncertainty in absorbance
by calibration of multiple measurements.
3.1.1 Four measurement signals
As established in section 2.2 a main signal and a monitor signal are obtained during
each measurement. Multiple measurements are performed per sample. In order to get
information on the gas, measurements are made using the main detector when a gas is
13

Measuring time - wavenumber relation
200 400 600 800 1,000
850
900
950
1,000
Time stamp (a.u.)
Wavenumber(cm-1
)
(a)
468 468.5 469 469.5 470
920.8
921
Time stamp (a.u.)
Wavenumber(cm-1
)
(b)
Figure 3.1: Effect of the voltage noise on the wavenumber emitted of several measure-
ments over time. (a) Several scans superposed. (b) A zoom into a section of the figure
shown in (a).
inside the cavity. These measurements will be called gas measurements from now on.
In order to single out any effects due to the different paths the beams take to the main
and monitor detector, measurements are also done without gas in the cavity. These
measurements without gas will henceforth be referred to as reference measurements.
These reference measurements are also used to establish the beam intensity without
interaction with the gas, which is Ii in Equation 2.1. The effects of the components of
the setup such as the cavity, the lenses, and the mirrors on the beam to the main detector
are all packed together in a factor fpath1(ν). This factor is defined such that multiplying
the intensity of the beam as it exits the QCLs by this factor gives the intensity of the
beam as it arrives at the main detector, and the factor therefore takes values anywhere
between 0 and 1 for the wavenumber range. fpath2(ν) and fgas(ν, C) are similarly defined
for the effect of the elements between the QCLs and the monitor detector and for the
effect when a gas sample is present.
Due to effects in the QCLs, the signals vary in intensity between different measure-
ments. Therefore during both gas and reference measurements monitor signals are also
recorded. The monitor signals are necessary to be able to reduce the effects of the QCL,
as the QCL affects the signal of the monitor and the main detector similarly and can
therefore be identified. For a clear overview the various effects on the signal in the two
types of measurement of both detectors are laid out in Table 3.1. Wavenumber depen-
dence in Table 3.1 is indicated by ν and ν‘, to indicate that different measurements can
not be compared wavenumber-wise. This is a result from the uncertainty caused by ef-
14

fects in the QCL.
Table 3.1: The contributions to the intensity as measured by the two detectors.
Signal type Gas measurement Reference measurement
Main IQCL(ν‘)fpath1(ν‘)fgas(ν‘, C) IQCL(ν)fpath1(ν)
Monitor IQCL(ν‘)fpath2(ν‘) IQCL(ν)fpath2(ν)
3.1.2 Data acquisition and pre-processing
The signal from the detectors is processed before it is calibrated, as by the method
developed by Z. Hou[3]. The light emitted by the QCL is a series of pulses lasting 333
ns and spaced approximately 5600 ns apart. These light pulses go through the setup and
end up at the detectors, where the subsequent signals are then sampled at a rate of 10
MHz, or 1 sample per 100 ns, by the NI-PCI 5922 acquisition card. Since the saving of
the data from the memory of the acquisition card to the computer is one of the most
time consuming part of the data acquisition, the data is ﬁltered using LabVIEW such
that only the peaks and valleys of the pulse signal are saved to the computer. For this,
LabVIEW’s peak detection is used at a width of 3 points in order to get an amplitude
error for the peaks of 0.39%. As the valleys are essentially parts where the laser is emitting
no light but there is still a signal, the signal strength of the valleys surrounding a peak
are subtracted from the peak to normalize out any bias due to electronics of the detector.
A ﬁnal resolution of 0.05 cm−1
is achieved. For each sample ten gas measurements and
ten reference measurements are done and processed this way. The spectrum of one of
these processed monitor signals looks as shown in Figure 3.2. The entire spectrum spans
from 832 cm−1
to 1263 cm−1
.
15

800 850 900 950 1,000 1,050 1,100 1,150 1,200 1,250 1,300
0
0.2
0.4
0.6
0.8
1
1.2
1.4
·109
Wavenumber (cm-1
)
Intensity(counts)
Monitor signals of gas and reference measurement
Figure 3.2: A monitor signal after pre-processing as described in subsection 3.1.2.
3.1.3 Need for additional calibration
The absorbance is obtained by inverting Equation 2.1 such as:
A(ν, C) = − log10
Io(ν, C)
Ii(ν)
. (3.1)
Using Table 3.1 the measured signals are substituted for the beam intensity with gas
in the cavity Io and the beam intensity gas in the cavity Ii:
Io(ν‘, C) =
Imain,gas
Imon,gas
=
IQCL(ν‘)fpath1(ν‘)fgas(ν‘, C)
IQCL(ν‘)fpath2(ν‘)
=
fpath1(ν‘)fgas(ν‘, C)
fpath2(ν‘)
(3.2a)
Ii(ν) =
Imain,ref
Imon,ref
=
IQCL(ν)fpath1(ν)
IQCL(ν)fpath2(ν)
=
fpath1(ν)
fpath2(ν)
, (3.2b)
where the wavenumber dependence as indicated by ν and ν‘ shows that diﬀerent mea-
surements can not be compared wavenumber-wise as shown at the start of section 3.1.
16

The division by the monitor signals is done to remove the influence of the QCLs’ inten-
sity instability in IQCL(ν) and IQCL(ν‘). By substituting Equation 3.2 into Equation 3.1
Equation 3.3 is obtained:
A(ν, ν‘, C) = − log10


fpath1(ν‘)fgas(ν‘,C)
fpath2(ν‘)
fpath1(ν)
fpath2(ν)

 = − log10
fpath1(ν‘)fpath2(ν)fgas(ν‘, C)
fpath1(ν)fpath2(ν‘)
.
(3.3)
Note that the absorbance A(ν, ν‘, C) is still dependent on the influences of the op-
tical path because the two separate wavenumber dependencies cannot be resolved. The
mean absorbance from ten measurements of a single sample with the standard deviation
calculated per wavenumber are shown in Figure 3.3.
Mean absorbance and standard deviation of ten measurements
800 1,000 1,200
−0.2
0
0.2
0.4
Wavenumber (cm-1
)
Absorbance
(a)
1,172 1,173 1,174 1,175 1,176
0
0.1
Wavenumber (cm-1
)
Absorbance
(b)
Figure 3.3: Mean absorbance (red line) and standard deviation (dashed blue lines) of ten
measurements. (a) Entire scan range. (b) A zoom into a section of the scan shown in
(a).
The large standard deviation in Figure 3.3 is partly because the multiple measure-
ments over which the mean is taken cannot be compared properly, since the relation
between the wavenumbers of different measurements is not known. When looking closely
at Figure 3.4a and Figure 3.4b, this wavenumber discrepancy is seen as the measurements
do not overlap.
17

1,172 1,174 1,176
0.6
0.8
1
·109
Wavenumber (cm-1
)
Intensity(counts)
Uncalibrated monitor measurements
(a)
1,172 1,174 1,176
1
1.5
2
·108
Wavenumber (cm-1
)Intensity(counts)
Uncalibrated main measurements
(b)
Figure 3.4: Ten uncalibrated intensity measurements of a single sample. Five of these are
from measurements with a gas in the cavity, and five are from a reference measurement.
(a) Measured signals at the monitor detector. (b) Measured signals at the main detector.
This local horizontal shift from one measurement to another results in an increase in
standard deviation of the eventual absorbance and concentration. The absorbance of a
single sample is calculated from multiple measurements with gas inside the cavity and
the reference measurements without gas in the cavity. All these measurements need to
be aligned to the same wavenumber values.
3.1.4 Wavenumber calibration
In order to calibrate the various monitor and main signals, a basis to which they can
all be calibrated is chosen. The mean of all the main signals is not a useful basis to
calibrate towards since the main signals of the reference measurements differ from the
main signals of the gas measurements because in the latter a gas is present and absorbs
part of the light. The mean of all the monitor signals is used as basis for calibration
since the monitor signals are nearly identical for all measurements because they do not
enter the cavity and therefore are not affected by the gas in the cavity. Since the monitor
and main signal of a single measurement suffer from the same wavenumber discrepancies
from the QCL, the transformations that calibrate the monitor signals also calibrate their
respective main signals.
Calibration starts with the monitor signals, and is done towards the mean per wavenum-
ber of all the monitor signals of a sample, as given in Figure 3.5 as a dashed red line.
The positions of the peaks and valleys of this mean are then determined, as denoted by
black boxes in Figure 3.5. Each monitor signal is calibrated separately to the mean of
18

the monitor signals. Calibration of a monitor signal as denoted by the blue line in Fig-
ure 3.5 towards a single valley of the mean is shown in Figure 3.5b and Figure 3.5c. The
position of the valley of the mean used to calibrate is marked by a vertical dashed grey
line, and the two surrounding peaks of the mean are marked by vertical dashed black
lines in Figure 3.5b. The minimum value of the monitor signal between the two positions
of peaks of the mean is taken, as marked by the star in Figure 3.5c. This minimum value
is then taken as the signal’s valley corresponding to the valley of the mean as marked by
the grey line. This valley of the monitor signal is then shifted towards the corresponding
valley of the mean of all the monitors. This process is repeated for all valleys of the mean.
The peaks of the monitor signal are calibrated in a similar fashion. Every signal is now
locally shifted so its peaks and valleys match that of the mean signal. All data between
peaks and valleys is stretched and compressed by means of resampling to ﬁt the adjusted
peaks and valleys. Since the data between a peak and a valley is mostly straight, linear
interpolation is suﬃcient for resampling.
19

Calibration of a monitor signal
1,172 1,172.5 1,173 1,173.5 1,174 1,174.5 1,175 1,175.5 1,176
6
7
8
9
·108
Wavenumber (cm-1
)
Intensity(counts)
Mean of all monitors
A monitor signal
Peaks and valleys of mean
(a)
1,174 1,174.5 1,175
7
8
9
·108
Wavenumber (cm-1
)
Intensity(counts)
(b)
1,174 1,174.5 1,175
7
8
9
·108
Wavenumber (cm-1
)
Intensity(counts)
(c)
Figure 3.5: Calibration of a monitor signal to a valley of the mean.
The complete calibration of a monitor signal following the procedure shown in Fig-
ure 3.5 can be seen in Figure 3.6.
20

1,172 1,172.5 1,173 1,173.5 1,174 1,174.5 1,175 1,175.5 1,176
6
7
8
9
·108
Wavenumber (cm-1
)
Intensity(counts)
Calibrated monitor signal
Mean of all monitors
A monitor signal
Peaks and valleys of mean
Figure 3.6: A single monitor signal calibrated towards the mean of all monitor signals of
a sample.
The miss-calibrated blue peak in Figure 3.6 around 1175.2 cm−1
is a consequence of
the minimum of the monitor signal between the surrounding peaks of the mean being
located at a higher wavenumber rather than a lower wavenumber of this monitor peak.
Since the indices of the peaks and valleys of the monitor signal found are later sorted with
the assumption of peaks and valleys alternating, the small blue peak gets miss-assigned
as a valley and is consequently put at the place of the valley of the mean at 1175.2 cm−1
instead of peak of the mean at 1175.4 cm−1
. This can be solved by comparing the indices
of the monitor peaks with those of the monitor valleys, and replacing the index of the
monitor valleys of which the index is larger than that of the subsequent peak by the
index of the minimum between the monitor peak and the monitor peak before that.
Figure 3.7 demonstrates how the wavenumber calibration procedure calibrates ten
measurements of a sample towards each other. Five of the signals are from gas measure-
ments, and ﬁve are from reference measurements. Figure 3.7c and Figure 3.7d are the
wavenumber calibrated versions of Figure 3.7a and Figure 3.7b respectively.
21

1,172 1,174 1,176
0.6
0.8
1
·109
Wavenumber (cm-1
)
Intensity(counts)
Uncalibrated monitor measurements
(a)
1,172 1,174 1,176
1
1.5
2
·108
Wavenumber (cm-1
)Intensity(counts)
Uncalibrated main measurements
(b)
1,172 1,174 1,176
0.6
0.8
1
·109
Wavenumber (cm-1
)
Intensity(counts)
Calibrated monitor measurements
(c)
1,172 1,174 1,176
1
1.5
2
·108
Wavenumber (cm-1
)
Intensity(counts)
Calibrated main measurements
(d)
Figure 3.7: Ten monitor signals and their respective main signals shown uncalibrated
and calibrated. (a) Uncalibrated monitor signals. (b) Uncalibrated main signals. (c)
Calibrated monitor signals. (d) Calibrated main signals.
The calibration done this way does not align the signals perfectly, as can be seen
in Figure 3.7. The majority of the monitor peaks and valleys as seen in Figure 3.7c
22

are aligned perfectly, and slightly unaligned peaks can be attributed to the calibration
method only looking for a valley or peak of the signal between two peaks or valleys of
the mean. When the valley or peak of the signal is located exactly on a peak or valley of
the mean, the closest point to the peak is most likely the highest and therefore will be
aligned as a peak, giving a slight miss-alignment.
For each sample ten gas measurements and ten reference measurements are done,
giving a total of twenty monitor signals and twenty main signals. The drop in standard
deviation of these signals as a result of the calibration can be seen in Table 3.2. The
drop in standard deviation is calculated for signals from reference and gas measurements
separately, and for the collection of signals of both reference and gas measurements. The
numbers indicate the percentage drop from the uncalibrated signals to the calibrated
signals. From Table 3.2 it seems that the calibration works better for monitor signals
than it does for main signals. This can be attributed partly to the fact that the monitor
signals served as the basis of the calibration, and when looking at Figure 3.7 it does seem
that the calibration works better for the monitor signals. However, since each monitor and
its respective main signal experience the same wavenumber distortions from the QCLs,
and the exact transform used on each monitor signal is also used on its respective main
signal, the large difference in calibration result can not entirely be attributed to the basis
of calibration. Rather the standard deviation in the main signals is already relatively
higher than in the monitor signals, and the wavenumber uncertainty is a smaller part
in the total standard deviation of the main signals than in the monitor signals. Thus
calibrating for the wavenumber uncertainty decreases less the total standard absorbance
of the main signals.
Table 3.2: Decrease in standard deviation for the monitor and main signals as result of the
wavenumber calibration. This decrease in standard deviation is calculated separately for
just gas measurements, just reference measurements, and gas and reference measurements
together.
Signal type Gas measurements
(%)
Reference
measurements (%)
Gas & reference
measurements (%)
Main 7.04 6.50 6.46
Monitor 16.6 17.0 18.4
It is also interesting to note that the standard deviation of the main signals improves
less when both reference and gas measurements are taken into account as opposed to only
one type of measurement. This can be explained because the gas and reference measure-
ments of the main signal should be different due to the absorption of the main signal
by the gas in the gas measurement. Because the gas and reference measurements are so
different, the total standard deviation when taking both gas and reference measurements
into account is greater than that of the separate measurements, and hence the standard
deviation caused by the wavenumber uncertainty is a smaller part.
After aligning all these signals as explained in subsection 3.1.4, ten absorbances—each
from a combination of the four signals from Table 3.1—are calculated as according to
Equation 3.1, Equation 3.2, and Equation 3.3. However since the wavenumbers of the
23

various measurements are now calibrated,
ν = ν‘, (3.4)
and thus the absorbance as in Equation 3.3 becomes
A(ν, C) = − log10
fpath1(ν)fpath2(ν)fgas(ν, C)
fpath1(ν)fpath2(ν)
= − log10 [fgas(ν, C)] . (3.5)
The standard deviation of these ten absorbances decreases from 0.0313 before calibra-
tion to 0.0291 after calibration, a decrease of 7.06%. For a large part of the absorbance
spectrum this is still a relatively large standard deviation as seen in Figure 3.8, where
the mean absorbance found this way for one sample is given.
850 900 950 1,000 1,050 1,100 1,150 1,200 1,250
0
0.5
1
1.5
2
2.5
·10−1
Wavenumber (cm-1
)
Absorbance
Absorbance calibrated by wavenumber
Figure 3.8: The wavenumber calibrated absorbance as calculated from the measurements
of a single sample.
3.2 CO2 and H2O calibration
When looking at the absorbance spectrum of a breath sample as in Figure 3.8 there are
a few regions that stand out by their relatively high absorbance. From Equation 3.1 it
24

can be seen that high molar absorptivities (ν) in a region and high concentrations of
molecules with such molar absorptivities contribute to a high local absorbance. When
checking for molecules known to have relatively high concentrations in breath such as
CO2 with about 4% per volume, H2O with about 5% per volume, and N2 with about
78% per volume, a cause of the high absorbance regions is easily identiﬁed, as seen in
Figure 3.9. The molar absorptivity spectrum overlays of CO2 and H2O in Figure 3.9 are
scaled with factors 60 and 5 · 104
respectively to clearly show the matching regions. CO2
and H2O molar absorptivities are taken from HITRAN database[9].
850 900 950 1,000 1,050 1,100 1,150 1,200 1,250
−0.5
0
0.5
1
1.5
2
2.5
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Absorbance with overlaid CO2 and H2O molar absorptivity
Experimental data
CO2
H2O
Figure 3.9: Absorbance from wavenumber calibrated data overlaid with spectra of CO2
and H2O from HITRAN database[9].
When looking more closely at these matching regions between the absorbance spec-
trum and CO2 and H2O, it is seen in Figure 3.10 that even after the wavenumber calibra-
tion the absorbance spectrum does not completely match CO2 and H2O from literature.
25

1,076 1,077 1,078 1,079 1,080 1,081 1,082 1,083
0
0.5
1
1.5
2
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Closer look at absorbance with CO2 and H2O overlaid
Experimental data
CO2
H2O
Figure 3.10: Absorbance from wavenumber calibrated data overlaid with spectra of CO2
and H2O from HITRAN database[9].
A method similar to the wavenumber calibration is used in order to calibrate the
absorbance spectrum to the database spectra of CO2 and H2O. The diﬀerence is that
now only the peaks are matched to CO2 and H2O, and all data points in between,
including valleys, are linearly interpolated between the matched peaks.
3.2.1 Method for calibration to CO2 and H2O
Input and preprocessing
This method uses the absorbance spectrum as obtained from the wavenumber calibra-
tion, the accompanying wavenumber range, and molar absorptivity spectra from CO2
and H2O. Since the method used for the calibration to CO2 and to H2O are almost iden-
tical, only the CO2 case is described here. The CO2 spectrum is taken from HITRAN
database[9], and in the range of 832 cm−1
to 1263 cm−1
with a resolution of 0.05 cm−1
.
The CO2 spectrum is scaled by a factor 60 so the height of its peaks approximately
matches that of the experimental data.
26

CO2 peak and absorbance peak identification
To facilitate the calibration of the experimental data to the CO2 spectrum from the
HITRAN database, the peaks and valleys of the CO2 database spectrum serve as refer-
ence points. The peaks and valleys of the CO2 database spectrum are identified using
a peakfinder[15] algorithm. The peakfinder algorithm finds peaks above a certain ad-
justable threshold, based on its value compared to the surrounding data points. The
peakfinder algorithm then returns the indices and intensities of all data points that it
identifies. In case the first and last peak are before and respectively after the first and
last identified valley, the first and last peak in the spectrum are removed. This way the
range to adjust always starts and ends with a valley of the CO2 database spectrum.
Now that all peaks and valleys of the CO2 database spectrum are found, they are
used in order to find the absorbance peaks and valleys of the experimental data. The first
absorbance peak is identified by taking the maximum absorbance between the first two
valleys of the CO2 database spectrum. The neighbouring absorbance valley is identified
by taking the minimum between the first absorbance peak and the next peak of the
CO2 database spectrum. The next valleys/peaks of the experimental data are found by
looking between the last peak/valley found from the experimental data and the next
peak/valley from CO2 of the database, alternating between finding a peak and a valley.
This process is repeated to the last identified valley of the database CO2 spectrum. This
way all absorbance valleys are found between two peaks, and all absorbance peaks are
found between two valleys. Thus the same amount of absorbance peaks and valleys from
the experimental data are identified as peaks and valleys of the CO2 database spectrum,
making it easy to match them1
.
Peak calibration
All the peaks and valleys of the CO2 database spectrum are now put in a single list, and
ordered by the index. The same is done for the peaks and valleys of the experimental
data. The peaks of the experimental data list are now matched to coincide to the peaks
of CO2, and all the measurement data in between two peaks is interpolated to fit between
the surrounding CO2 peaks. This same procedure is applied to H2O. The result of these
calibrations is seen in Figure 3.11.
3.2.2 CO2 and H2O calibration results
Except for the sides of the CO2 and H2O spectra where the peak heights are around the
noise level, nearly all CO2 and H2O peaks seem to be correctly calibrated. This is also
indicated by Figure 3.11, though some peculiarities are worth mentioning.
Around 1081 cm−1
the measured absorbance has no peak, yet according to the CO2
spectrum it should have. One reason for the absence of an absorbance peak could be
that the calibration algorithm miss-calibrated, and the next absorbance peak at 1082.3
1
Since there is one more CO2 valley than CO2 peak, the amount of found absorbance peaks matches
CO2 peaks. However, two less absorbance valleys are found than there are CO2 valleys. Hence the first
and last CO2 valley are removed.
27

1,076 1,077 1,078 1,079 1,080 1,081 1,082 1,083
0
0.5
1
1.5
2
2.5
3
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Closer look at absorbance calibrated to CO2 and H2O
Experimental data adjusted for CO2 and H2O
CO2
H2O
Figure 3.11: Absorbance calibrated to CO2 and H2O overlaid with spectra of CO2 and
H2O from HITRAN database[9].
cm−1
should actually be at 1081 cm−1
. This can be rejected on the basis that the spacing
of the other absorbance peaks seems to match well with the spacing of the CO2 peaks,
even before calibration as seen in Figure 3.10. The question remains—taking that no
miss-calibration took place around 1081 cm−1
—why there is no peak when a peak would
be expected from looking at the CO2 database spectrum. This might be explained by
noise, however when looking at the measured absorbance’s constituent monitor and main
signals the noise is not any higher than at other places.
A related peculiarity, namely the height of the measured absorbance peaks not match-
ing with the heights of the CO2 peaks in general, is also at first ascribed to noise. However,
when comparing absorbances of multiple different samples as in Figure 3.12, it can be
seen that the height of each peak is about equal between samples. This indicates that the
difference in height of the various measured absorbance peaks to the CO2 peaks is not a
matter of noise, but rather a consistent wavenumber dependent influence from a part of
the setup. A source for these mismatching absorbances has not yet been pinpointed.
One event of miss-calibration can still be argued to have taken place. In Figure 3.10
the closest thing to an absorbance peak near the CO2 database peak at 1081.1 cm−1
is
the peak located at 1081.8 cm−1
, which is not calibrated to the CO2 database peak as
28

seen in Figure 3.11. From Figure 3.10 it can be seen that this small peak was not detected
and calibrated since it is located outside the region where the search was performed, from
the valley in the absorbance spectrum at 1081.0 cm−1
to the valley in the CO2 spectrum
at 1081.6 cm−1
. This might be solved by increasing the search region, though this could
cause problems for other peaks. It can also be argued that since the small peak at 1081.8
cm−1
is actually located closer to the CO2 peak at 1082.3 cm−1
than to the peak at
1081.1 cm−1
, it is not necessarily better to match it to the peak at 1081.1 cm−1
. It can
also not be said whether the absorbance peak that would be expected near the small
peak would be located at the same position, and thus no conclusion can be made if this
peak would also not have been calibrated to its corresponding CO2 peak at 1081.1 cm−1
.
1,076 1,077 1,078 1,079 1,080 1,081 1,082 1,083
0
0.5
1
1.5
2
2.5
3
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Comparison of height of CO2 peaks to measured samples
Experimental data
Experimental data
Experimental data
CO2 spectrum
Figure 3.12: Measured absorbances of three diﬀerent samples calibrated to CO2 and
H2O. The molar absorptivity overlay of CO2 is scaled by 60. CO2 molar absorptivity is
taken from the HITRAN database[9].
29

4. Molecules and Concentrations
In this chapter a list of molecules is assembled. This list contains the molecules that are
likely present in breath. The list is assembled in such a way to find molecules with con-
centration differences between the healthy volunteers and the asthmatic volunteers that
contributed their breath for this research. Seventy breath samples of healthy volunteers
and seventy breath samples of asthmatic volunteers are used. The absorbance spectra
of these samples were calibrated according to chapter 3. In this chapter the calibrated
spectra are checked for the concentrations of the molecules in the list. Lastly, a specific
method for acquiring a more accurate concentration for ethanol is described.
4.1 List of molecules
To check the absorbance spectra for concentrations, a list of molecules with known molar
absorptivities is assembled. The initial list consists of all molecules with known molar
absorptivities from the PNNL[8] database and the HITRAN[9] database. Currently these
databases provide information on a total of 456 different molecules.
From the 456 molecules available in the PNNL and HITRAN databases it is necessary
to select those likely present in breath. Not all of the molecules are expected to be present
in breath, as a lot of them are neither organic nor known to be in the atmosphere in an
amount detectable by the current setup. Seeing that the absorbance still has uncertainty
after the calibration, the concentrations determined will have a related uncertainty. As
a result the concentrations of non-present molecules will not necessarily be set to zero,
and will therefore influence the concentrations determined for molecules that are present
in the gas sample. Thus before determining the concentrations it is vital to get a list of
molecules which excludes these excessive molecules. In order to distinguish between the
healthy and asthmatic groups, the list should also contain molecules likely to have large
differences in concentration between the groups.
The list is assembled in two steps. Firstly by determining the molecules that have
been previously reported to be present in breath, the standard molecules. The second
step is adding those molecules likely to show different concentrations between the samples
of the healthy and the asthmatic volunteers. In the case when one is not interested in
finding differences between data sets the list of standard molecules should be extended
in another way. This can be done by selecting molecules based on their absorptivity and
a set minimal absorbance.
30

4.1.1 Standard molecules
This first group of molecules is a combination of atmospheric molecules[16] and molecules
known to be in breath[17] with high absorbance in the relevant wavenumber region of 832
cm−1
to 1263 cm−1
. Molecules such as CO2, H2O, ethanol, and acetone are the major
contributors to the absorbance spectrum of breath in the relevant wavenumber region.
These molecules must therefore be included in the list regardless of whether they are
likely to have differences in concentration between the healthy and the asthmatic group.
The list of molecules chosen this way can be found in Table 4.1.
Table 4.1: List of standard molecules used.
Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water
4.1.2 Molecules selected to distinguish between data sets
To determine whether a gas was exhaled by a healthy person or by someone with asthma,
a recognizable difference in measurement results is desirable. P-values are used to quan-
tify this difference between the measurements from healthy people and asthmatic people.
Since p-values are often misunderstood, an explanation of how p-values are calculated
and the significance of p-values to this research is given.
P-values
The p-value is defined as the probability of obtaining a certain result or more under the
assumption of a certain hypothesis. In the context of these breath measurements the
null hypothesis is that being asthmatic has no effect on the concentrations of molecules,
and will give identical absorption profiles as being healthy. Thereby for each wavenum-
ber the combined absorbances of multiple samples of healthy and asthmatic people are
expected to be two groups of points spread around the same absorbance value. A normal
distribution is assigned to both the asthmatic and the healthy samples, and the mean
31

and standard deviations of both is calculated. Now, assuming the null hypothesis is true,
the mean and standard deviation of the asthmatic group should be the same as that
of the healthy group. The p-value per wavenumber is found by calculating the chance
that the absorbances of the asthmatic samples get the values they are, assuming the null
hypothesis which claims that the asthmatic samples follow the distribution of the healthy
samples.
Assume the null hypothesis is not necessarily true, but rather that the alternative—
having asthma does give a different absorbance spectrum—could also be true. The
preferred hypothesis is the one for which the measured absorbance is most likely to
occur. Although no p-values are calculated for the alternative hypothesis, a small p-
value for the null hypothesis is still used as an indicator that samples of asthmatic and
healthy patients are inherently different. The p-values between the 70 absorbances of
healthy patients on the one hand and the 70 absorbances from asthmatic patients on the
other hand are obtained this way for every wavenumber. This is done using an algorithm
from K. Brand[18] that uses the topTable function from the LIMMA package in R.
P-region analysis
In p-region analysis the compounds most likely to have different concentrations between
the two data sets are sought. This is done by checking the p-values corresponding to the
two data sets and finding regions of consecutive low p-values. The reasoning behind this
is as follows: organic molecules such as the ones expected in human breath are larger
and more complicated than for example H2O and CO2, and thereby have a spectrum
consisting of broad smooth profiles. If the concentration of such a molecule differs greatly
from one data set to the other, this would be clear by low p-values in a broad band. Thus
to find these molecules, the p-values corresponding to two data sets need to be below a
certain threshold, for a certain region of consecutive wavenumber values. For these so-
called p-regions, a list of compounds is checked for their absorptivity. Compounds with
an average absorptivity above a certain threshold within any of these regions are likely
candidates for the difference in the measured intensity between two sample groups, and
are listed to be checked for their concentration in the data set. This list includes not only
the molecules, but also in which wavenumber regions they are present. Two p-regions
are found using a minimal of 10 consecutive data points with p-values of 0.05 or lower,
one located between 1181.80 cm−1
and 1182.55 cm−1
and another from 1261.40 cm−1
to
1262.05 cm−1
. The complete set of compounds from the HITRAN and PNNL databases
is checked for average molar absorptivity above 10−4
ppm−1
m−1
in these regions. A table
with the regions and a list of molecules found this way can be seen in Appendix A.2.
The p-values between the absorbance spectra of the healthy and the asthmatic pa-
tients vary over narrow wavenumber regions, as can be seen in Figure 4.1. The red line
in Figure 4.1 indicates the upper limit of 0.05 for which p-values are selected to indicate
a significant difference in absorbance between healthy and asthmatic. The p-region from
1181.80 cm−1
to 1182.55 cm−1
can be identified using the red line. The p-values vary
over wavenumber regions that are narrow compared to the width of absorbance regions
as expected from molecules. This is explained by the mean-to-standard deviation ratio
32

of 0.77071
over the wavenumber range of ten absorbance spectra, as calculated from a
single gas sample of a healthy person. Even though the p-values are calculated between
two groups of 70 samples each, the large variations in p-values as observed in Figure 4.1
seem to indicate that either the samples within each group do not have much in common
with each other, or that the noise on the signal is too large to make out significant differ-
ences between the two groups. Hence it seems that the p-values might not be accurate
indicators of subtle differences in concentrations of molecules between the healthy and
asthmatic groups when using measurements with a large uncertainty. Regardless of this,
relevant candidate molecules have been found using this method. An example is Freon-
114 which is identified as being present in both p-regions and is used as a propellant in
metered-dose inhalers, which are the most commonly used delivery systems for treating
asthma[19].
1,172 1,174 1,176 1,178 1,180 1,182
0
0.2
0.4
0.6
0.8
1
Wavenumber (cm-1
)
P-value
P-values of absorbance spectra of 70 healthy vs. 70 asthmatic patients
P-values of healthy vs. asthmatic
P-region selection threshold
Figure 4.1: P-values as calculated between 70 absorbance spectra of healthy patients and
70 absorbance spectra of asthmatic patients.
1
The mean-to-standard deviation ratio over the entire spectrum of one sample is 0.7707. It is worth
mentioning that the mean-to-standard deviation ratio in the regions with high absorbance of CO2
and H2O is lower. The mean-to-standard deviation ratio can also vary considerably over the different
measurements.
33

4.1.3 Selection based on measurable concentration
The list of molecules can be shortened by checking their absorbances. Since molecules
with a very low absorbance in the wavenumber region of interest will not be accurately
estimated at the current noise level, these molecules can be eliminated. It is known from
Z. Hou[3, p. 71] that the sensitivity of the system is in the order of 1 ppmv, and from A.
Reyes Reyes[5] that the sensitivity can even reach in the order of hundreds of ppbv. This
sensitivity is then used to establish a lower limit on the concentration for all molecules.
In this case the chosen limit is even lower at 0.001 ppmv to make sure no molecules are
eliminated unnecessarily. Absorbances are calculated for all molecules from this limit in
concentration and their molar absorptivities.
From an initial concentration estimation using only the standard molecules, a lower
limit for the absorbance is set to 50 · 10−7
. For all molecules the absorbance calcu-
lated using the lower limit of concentration is then checked against this lower limit of
absorbance. Molecules with absorbance that does not reach above the lower limit of
absorbance at any point are dismissed, and the other molecules can be put on the list
and further checked for their concentrations. A list of 199 molecules that are reported to
be present in breath is checked this way[17]. When the entire spectrum is checked this
way, some 130 molecules pass. Adapting this method for the wavenumber range of 1020
to 1100 cm−1
, which is the major absorptivity region for ethanol, some 69 molecules pass
as found in Appendix B.1. This list is used for determining the concentration of ethanol,
as further explained in section 4.3.
Since the absorptivity of a molecule that is present might be lower than that of an
absent molecule, there is no right choice for the absorbance and concentration limit,
and the selection based on these limits ends up being a trade-off. The limits can be
chosen stricter with the risk of excluding molecules that are present in the gas and
thereby decreasing the accuracy of the determined concentration, but with the benefit of
excluding molecules that are not actually present in the gas, and thereby increasing the
accuracy of the concentration determined.
Since the method for finding differences in concentration between the healthy and
asthmatic group filters on minimum absorbance and more specific wavenumber ranges,
it is stricter and therefore preferred when possible.
4.2 Concentration determination
4.2.1 method
Henceforth only the list of molecules found in section 4.1 are used to determine the
concentrations. At this stage the strictness of the list can be adjusted by only allowing
molecules present in a certain amount or more of the previously determined regions of low
p-values to be processed further. In order to process the molecules further and compare
them to the measured data, the information of the molecules from the databases is first
transcribed to match the format of the measured data, i.e. in wavenumber range and
resolution. This comes down to removing all the information outside of the relevant
wavelength region, and matching the wavenumber spacing as used in the measured data
34

through interpolation. Furthermore the measured data and molecules need to be in
the same physical quantity, for which in this case absorbance is the most useful. The
aggregated absorbance of the gas is determined by the terms of Equation 2.2 and for a
single molecule by Equation 4.1
A(ν, cmol) = mol(ν)cmollpath, (4.1)
where the absorbance A(ν, cmol) is determined by the concentration cmol, the optical
path length of the laser in the gas lpath, and the molar absorptivity mol. Since the
path length is measured from the setup and the molar absorptivity of the molecules is
obtained from the databases, the absorbance is left as a function of the concentration.
All calculations are done over the set wavelength region of 832 cm−1
to 1263 cm−1
as
determined by the range of QCL. The concentrations are determined through what is
essentially a curve fitting problem as the absorbances of the molecules can be laid over
the measured absorbance and then best fitted to match by tweaking the concentrations of
the molecules. This curve-fitting problem can be expressed as a non-linear least squares
problem as in Equation 4.2:
r(ν, C) = Aspectrum(ν) −
n
i=1
i(ν)cilpath (4.2)
where the terms on the right hand side from left to right are the measured spectrum
Aspectrum(ν) and the sum is the spectrum to be fitted to it. Now take
S(C) =
1263cm−1
ν=832cm−1
r(ν, C)2
, (4.3)
where r is a residue quantity, and S is the quantity to be minimized. The spacing for the
sum is 0.05 cm−1
. The square of the residue quantity r is taken so that larger residues per
wavenumber influence S more heavily, and are therefore avoided by the minimization al-
gorithm. This non-linear least squares problem is solved using the Levenberg–Marquardt
algorithm[20] which tweaks the concentrations from a given set of initial concentrations.
4.2.2 Input and results
The method described in the previous subsection is applied on the molecules found with
the p-region analysis, where all molecules with presence in one or more regions are checked
for their concentration with the measured absorbance. The initial concentration of Ta-
ble 4.2 are used.
Table 4.2: Table of initial concentrations used to calculate the concentrations as given in
Appendix A.3.
Molecule Concentration (ppmv)
Carbon dioxide 1000
Water 1000
Standard molecules 1
Other molecules 0.01
35

The Levenberg-Marquardt algorithm is implemented through the lsqnonlin function in
MATLAB, with its options argument as: options = optimset(’Display’,’oﬀ’,’TolFun’,1e-
15).
The absorbances of six molecules with some of the highest concentrations are plotted
over the measured absorbance in Figure 4.2. These are the molecules benzene, carbon
dioxide, ethane, ethyl alcohol, ethylene, and water, as also reported to be present in
breath by B. de Lacy Costello[17]. Concentrations found for all molecules for this sample
are given at Appendix A.3.
800 850 900 950 1,000 1,050 1,100 1,150 1,200 1,250 1,300
−1
−0.5
0
0.5
1
1.5
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Measured absorbance spectrum overlaid with molecule absorbances
Absorbance
Benzene
Carbon dioxide
Ethane
Ethyl alcohol
Ethylene
Water
Figure 4.2: Measured absorbance of breath of a healthy patient. The measured ab-
sorbance is overlaid with absorbances of molecules with some of the highest concentra-
tions.
4.3 Ethanol estimation by CO2 removal
4.3.1 Motivation for CO2 removal
As discussed in subsection 3.2.2, the peaks of the measured absorbance and the peaks
of the absorbance of CO2 from the calculated concentrations do not match in height as
36

seen in Figure 4.3.
1,076 1,077 1,078 1,079 1,080 1,081 1,082 1,083 1,084
0
0.5
1
1.5
2
2.5
·10−1
Wavenumber (cm-1
)
Absorbance(base-10)
Closer look at the measured absorbance and calculated CO2 and ethanol
Experimental data
Carbon dioxide
Ethyl alcohol
Water
Figure 4.3: Measured absorbance of the breath of a healthy patient. The absorbance is
overlaid with the spectra of water, carbon dioxide, and ethyl alcohol. This figure shows
how the measured carbon dioxide peaks are far from the calculated ones.
This difference in height of the CO2 peaks and the measured peaks also affects the
estimation of concentration for the other major molecule present in this region, namely
ethanol. Hence it could prove useful to remove CO2 and its uncertainties in order to
get a better estimate of the ethanol concentration, much in the same way as A. Reyes
Reyes[21] did for acetone.
4.3.2 Implementation of CO2 removal for ethanol estimation
The method described in section 4.2 is used to determine the concentrations of the
molecules found using subsection 4.1.1 and subsection 4.1.3. The calculated absorbance
contributions of all molecules except for that of CO2 are subtracted from the measured
absorbance, as seen in Figure 4.4b. This is done to single out the absorbance contribution
of CO2 as best as possible, so it can more easily be removed. All that is left is noise around
zero absorbance and the absorbance contribution of CO2. The peaks from CO2 are now
37

identified using a peak detection algorithm, and then fitted by normal distributions. This
way the actual measured CO2 peaks are fitted including any deviations they have from
the CO2 spectrum according to the database. The fits are then subtracted from the
signal, and all that is left is the difference between the measured absorbance and the
absorbance contributions of all molecules as calculated using section 4.2, which is mainly
noise around zero absorbance as seen in Figure 4.4c. The contributions of all molecules
other than CO2 are now summed back on this in order to get the full measurement back
minus any absorbance by CO2, as shown in Figure 4.4d. From this absorbance spectrum
the concentrations are once again determined using the method described in section 4.2.
The newly calculated ethanol concentration is 2% lower than that calculated before
the use of this method. Since this method of ethanol estimation has not been tested on
any samples for which the ethanol concentration is known before hand, not much can be
said on the accuracy of this method. The same method is however applied successfully
by removing water peaks to determine the acetone concentration in the region 1150 cm−1
to 1250 cm−1
[21, 22].
38

Step by step removal of CO2 peaks
1,020 1,030 1,040 1,050 1,060 1,070 1,080 1,090 1,100
0
1
2
·10−1
Absorbance(base-10)
Experimental data
(a)
1,020 1,030 1,040 1,050 1,060 1,070 1,080 1,090 1,100
0
1
2
·10−1
Absorbance(base-10)
Experimental data minus database molecules, with CO2
(b)
1,020 1,030 1,040 1,050 1,060 1,070 1,080 1,090 1,100
−0.4
−0.2
0
0.2
0.4
0.6
·10−1
Absorbance(base-10)
Experimental data minus database molecules, removed CO2
(c)
1,020 1,030 1,040 1,050 1,060 1,070 1,080 1,090 1,100
−0.4
−0.2
0
0.2
0.4
0.6
·10−1
Wavenumber
Absorbance(base-10)
Experimental data with database molecules, removed CO2
(d)
Figure 4.4: Step by step removal of the CO2 peaks.
39

5. Disease prediction by machine
learning
For the breath samples measured, an analytical study based on p-values was performed
to predict and to categorize the asthmatic and healthy samples[18]. This study did
not get any definitive results for prediction. Multiple articles in literature propose that
methods more complicated than just using the p-values could give a better results[23][24].
Finding the molecules which differentiate between a healthy and non-healthy subject,
and predicting diseases of a patient are both mentioned as possibilities. Since neither of
these sources applies machine learning to actually predict disease, and the application of
machine learning performed on breath data is not widespread[17], an attempt to do so
is made. The analysis is performed on 70 absorbance spectra from healthy patients, and
70 absorbance spectra from asthmatic patients.
5.1 Machine learning for classification
Algorithms performing classification by machine learning look for patterns in the data of
the different samples. Because for the current set of samples it is known whether they
originate from healthy or asthmatic patients, supervised learning methods are applied.
This means the machine learning algorithm is trained to recognize patterns of healthy
and asthmatic samples by feeding it a part of all samples. In this case 98 of 140 samples
are used to train the algorithms and the other 42 to test the accuracy of the algorithm.
5.2 Classification by trial-and-error
Since the scope of this project does not allow for a study of the details of machine learn-
ing and the construction of an algorithm specifically made for these datasets, existing
algorithms are used. Based on an extensive study of what kind of algorithms are most
useful in the field of breath analysis, the focus initially lies on support vector machines
(SVMs)[23]. Use is made of the website mlcomp.org, which was made specifically for up-
loading datasets, uploading machine learning algorithms, and running these algorithms
on the datasets[25]. This website expresses the rate of success in a single number, the
error, which is the amount of falsely classified samples divided by the total amount of
samples. Since most algorithms run in less than 5 minutes and they can run simul-
taneously, many different algorithms can be tested in a short amount of time. Aside
40

from the complete set of absorbances of 140 samples over the wavelength range of 832
to 1263 cm−1
, various subsets of this are tested as well. Among these are the set of 36
wavenumbers for which the p-values are below 0.005, the set of concentrations of 167
compounds as derived from the 140 absorbances, and the set of 30 compounds with the
highest concentrations as derived from the 140 absorbances.
5.2.1 Basic explanation of some algorithms used
The basic concept of some of the better working algorithms as seen in Table 5.1 is
explained here. Commonly, these algorithms reduce each spectrum to a set of features
which are then used to represent the various spectra as points in a feature space. The
entire space is then divided into volumes with borders based on the geometrical distances
between the training spectra: these are the classifiers. The test spectra are then portrayed
as points in this space in the same way as the training spectra and classified according
to which volume they inhabit. The way the spectra are portrayed as points in a space,
and the way the classifiers are built up from these points is different for each algorithm.
Method 1 in Table 5.1, the Normalized Gaussian radial basis function network, uses
at its core a method of k-means clustering. k-means clustering aims to partition the
absorbance spectra into k clusters in which each spectrum belongs to the cluster with
the nearest mean.
Possibly the simplest classification, the nearest neighbour algorithms of methods 2
and 3 in Table 5.1 classify the test spectra based on the closest training spectra in the
feature space.
Method 4, 5, and 6 in Table 5.1 are support vector machines. They represent the
spectra as points in space, mapped so that the spectra of the separate categories are
divided by a clear gap that is as wide as possible. New spectra are then mapped into
that same space and predicted to belong to a category based on which side of the gap
they fall on.
Table 5.1: Best error rates for eight algorithms out of 41 run on the entire dataset of
absorbances of 70 healthy and 70 asthmatic patients. Errors have possible values between
0 and 1. Names of algorithms are as stated on the website[25].
Method Description of algorithm Name of algorithm Error
1 Normalized Gaussian radial basis
function network
RBFNetwork weka nominal 0.167
2 K-nearest neighbours classifier IBk weka nominal 0.190
3 Nearest neighbours classifier IB1 weka nominal 0.190
4 Linear support vector machine for
multiclass classification
svmlight multiclass-linear 0.190
5 Support vector machine for multi-
class classification
svmlight multiclass 0.190
6 Support vector machine for binary
classification
svmlight-linear 0.214
41

5.3 Results
Results over the various datasets seem to generally be best for algorithms such as nearest
neighbor and support vector machines. A trend is also seen in that the larger data sets
get less miss-classifications. The smallest errors are found for the unmodified data set of
all absorbances over the entire wavelength range, which are given in Table 5.1.
The similarity of the errors from Table 5.1 can be explained by the small test set
of 42 samples. The various nearest neighbour algorithms function quite similarly, and
the same can be said for the support vector machines. As the results from Table 5.1
are the best from a total of 41 different algorithms, they corroborate the use of SVMs
as adviced by the study on machine learning algorithms for breath analysis[23]. The
best algorithm is the Normalized Gaussian radial basis function network with an error of
0.167, or 7 miss-classified samples out of the 42 test samples. The exact reason why this
algorithm performs better than the others has not been determined. The errors stated
are the combined errors of healthy samples falsely classified as asthmatic and the other
way around. By running the algorithms outside of mlcomp.org the separate errors of
both types of miss-classification can likely be obtained, which increases the knowledge of
the accuracy of a certain type of classification.
Without in-depth knowledge of the algorithms and further testing, there is no guar-
antee that they will also work on further datasets. The importance of these preliminary
results lies in the clear quantitative results expressed as the error. Since with little knowl-
edge a success rate of 83.3% is obtained, further research in machine learning applied for
breath classification could yield more successful algorithms that can be applied with the
current QCL spectroscopy setup in medical diagnosis.
42

6. Conclusions and outlook
Here the main problems from the previous chapters and their solutions are discussed.
Additionally, improvements for further research are made.
6.1 QCL setup
The intensity fluctuations in the light emitted by the QCL caused by mode hopping are
resolved by normalisation using the monitor and main detector as described in chapter 2.
One way to improve the system and forego the need of all the calibrations described in
this thesis is to use the monitor signal to measure the actual wavenumber. A Michelson-
Morley interferometer was considered to be put in when the setup was still being built.
It was omitted since there were plans to move the setup and a Michelson-Morley inter-
ferometer would be too sensitive for vibrations. Both a diffraction spectrometer and a
cell reference can be used as well, the latter being favorable as it is the cheaper option.
The sensitivity can also be improved by increasing the delay the multi-pass cavity
applies to the beam such that both the beams going to the monitor and the main detector
can instead be measured using one detector. This improves the signal since measuring
on two separate detectors introduces two independent sources of noise, whereas a single
detector does not[26].
6.2 Data calibration
The wavenumber uncertainty caused by noise in the voltage over the piezo is resolved by
calibration done in post-processing as described in chapter 3. Despite this calibration,
the standard deviation over ten absorbances spectra from a single sample is still relatively
large.
Even after the wavenumber calibration the resulting absorbance spectrum does not
match with CO2 and H2O peaks from the databases. This is caused by the high variation
of the twenty monitor measurements of which the mean is the basis of the wavenumber
calibration. Regardless, wavenumber calibration is still useful as an intermediate calibra-
tion step, as it calibrates more details than the calibration to CO2 and H2O performed
afterwards. One cause for this variation between measurements is expected to be the
noise on the voltage applied over the piezo. Other contributions are temperature fluc-
tuations that influence the laser performance, and the diffraction grating that reflects a
small range of wavelength rather than a single wavelength back into the laser diode.
43

The height of the measured absorbance peaks does not generally match with the
heights of the CO2 peaks from the database, as seen in Figure 3.12. It can be seen that
the height of each peak is about equal between samples. This indicates that the difference
in height of the various measured absorbance peaks to the CO2 peaks is not a matter of
noise, but rather a consistent wavenumber dependent influence from a part of the setup,
or possibly a molecule in breath. A source for these mismatching absorbances has not
yet been pinpointed.
6.3 Determination of molecules
A general disadvantage of choosing molecules based on their molar absorptivity is that
some molecules might have a large impact on the eventual absorbance due to a high con-
centration, despite having a low absorptivity. They would therefore be falsely eliminated
from the list of molecules that is checked.
There is also the drawback of excluding molecules which might be an important part of
the absorbance, but are not selected since they do not have adequate molar absorptivity in
the p-regions. This makes the subsequent determination of concentrations less accurate.
It is possible to allow for more p-regions, either by decreasing the minimum amount of
consecutive points needed or heightening the maximum p-values of which these regions
consist. This way more molecules will have some absorptivity in one of the regions and
therefore will not be scrapped.
Both problems have to do with selecting molecules based on intensity, and they can
be resolved in advance by putting molecules that play an important role in the eventual
absorbance in the standard list of molecules. This requires a priori knowledge of the
important molecules which is obtained from literature[17] and simply by performing
the least squares method to get the concentration on a large list of molecules. The
concentrations will not be as accurate for a large list, but all molecules that contribute
considerably to the spectrum can be picked out for later runs with strict molecule lists.
6.4 Determination of concentrations
From the shapes of CO2 and H2O in Figure 4.2 it can be seen that the concentrations
calculated using the lsqnonlin function are in the right order of magnitude. Since the
standard deviation over several measured absorbances of a single sample is still relatively
high even after wavenumber calibration, the calculated concentrations for the molecules
with lower concentrations are not very accurate. The average concentrations found for
several molecules over all samples are given in Figure 6.1. The difference in amount of
H2O present can be explained by a filter in the setup which catches the moisture in
breath before it enters the cavity. The large amount of ethanol measured for the samples
can be explained by the cleaning using ethanol of the mouthpieces through which the
breath samples are collected. The standard deviations portrayed are combinations of the
uncertainties from the measurements and the variation between breath of the patients.
44

CO2 H2O ACETONE NH3 CH4 ETHANOL
0.001
0.01
0.1
1
10
100
1,000
10,000
100,000
Concentration(ppmv) Average molecule concentrations over all samples
Healthy samples Asthmatic samples VSL healthy
Figure 6.1: Average concentrations of molecules for healthy and asthmatic children com-
pared to VSL data of healthy adults.
6.5 Categorizing health status by classification
An algorithm with a classification success rate of 83.3% is obtained. Further research in
machine learning applied for breath classification could yield more successful algorithms
that can be applied with the current QCL spectroscopy setup in medical diagnosis.
The accuracy of the classification can be improved by looking more closely into the
used algorithms, and improving them using a-priori knowledge of the samples such as
regions of relatively high signal-to-noise ratio. Wavenumber regions can be given weights
based on the signal-to-noise ratios so the algorithm depends more heavily on these accu-
rate regions.
The error rate could also tell a lot more if the amount of false positives and false
negatives is given. This would give separate error rates based on whether the algorithm
judges the sample as healthy or asthmatic. It is possible that one of these error rates is
much lower than the other, thereby increasing the certainty of the result above the total
error rate.
45

Bibliography
[1] I. Horvath and J.C. de Jongste. Exhaled Biomarkers. European Respiratory Society,
2010.
[2] Erik Swietlicki, Sanjiv Puri, Hans-Christen Hansson, and Hans Edner. Urban air
pollution source apportionment using a combination of aerosol and gas monitoring
techniques. Atmospheric Environment, 30(15):2795–2809, 1996.
[3] Zhe Hou. Breath analysis using infrared spectroscopy. August 2013.
[4] Ellen Holthoﬀ, John Bender, Paul Pellegrino, and Almon Fisher. Quantum cascade
laser-based photoacoustic spectroscopy for trace vapor detection and molecular dis-
crimination. Sensors, 10(3):1986–2002, 2010.
[5] A Reyes-Reyes, Z Hou, E Van Mastrigt, R C Horsten, J C De Jongste, M W Pi-
jnenburg, H P Urbach, and N Bhattacharya. Multicomponent gas analysis using
broadband quantum cascade laser spectroscopy. Optics express, 2014.
[6] DF Swinehart. The beer-lambert law. Journal of chemical education, 39(7):333,
1962.
[7] Silvia E Braslavsky. Glossary of terms used in photochemistry, (iupac recommenda-
tions 2006). Pure and Applied Chemistry, 79(3):293–465, 2007.
[8] Northwest infrared https://secure2.pnl.gov/nsd/nsd.nsf/welcome.
[9] Hitran database http://www.cfa.harvard.edu/hitran/.
[10] Block engineering laserscope for gas analysis.
[11] Jerome Faist, Federico Capasso, Deborah L Sivco, Carlo Sirtori, Albert L Hutchin-
son, and Alfred Y Cho. Quantum cascade laser. Science, 264(5158):553–556, 1994.
[12] J Barry McManus, Paul L Kebabian, and MS Zahniser. Astigmatic mirror multipass
absorption cells for long-path-length spectroscopy. Applied Optics, 34(18):3336–
3348, 1995.
[13] Mct ir detector module.
[14] Pvi-4te detector speciﬁcations.
46

[15] Matlab peakfinder code http://www.mathworks.com/matlabcentral/fileexchange/25500-
peakfinder.
[16] Atmosphere of earth - http://en.wikipedia.org/wiki/atmosphere of earth.
[17] B de Lacy Costello, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Os-
borne, and N M Ratcliffe. A review of the volatiles from the healthy human body.
Journal of Breath Research, 8(1):014001, 2014.
[18] Esther Van Mastrigt, Adonis Reyes-Reyes, Karl Brand, Nandini Bhattacharya, H.P.
Urbach, Andrew Stubbs, Johan C. De Jongste, and Marielle W. Pijnenburg. Ex-
haled Breath Profiling In Children With Asthma And Cystic Fibrosis: A Laser
Spectroscopy-Based Method Focused On Hydrocarbons, pages A5639–A5639–. Amer-
ican Thoracic Society, May 2014.
[19] Chet L. Leach. Approaches and challenges to use freon propellant replacements.
Aerosol Science and Technology, 22(4):328–334, 1995.
[20] JorgeJ. Moré. The levenberg-marquardt algorithm: Implementation and theory. In
G.A. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathe-
matics, pages 105–116. Springer Berlin Heidelberg, 1978.
[21] Adonis Reyes-Reyes, Roland C. Horsten, H. Paul Urbach, and Nandini Bhat-
tacharya. Study of the exhaled acetone in type 1 diabetes using quantum cascade
laser spectroscopy. Analytical Chemistry, 87(1):507–512, 2015. PMID: 25506743.
[22] Florian Adler, Piotr Maslowski, Aleksandra Foltynowicz, Kevin C. Cossel, Travis C.
Briles, Ingmar Hartl, and Jun Ye. Mid-infrared fourier transform spectroscopy with
a broadband frequency comb. Opt. Express, 18(21):21861–21872, Oct 2010.
[23] A Smolinska, A-Ch Hauschild, R R R Fijten, J W Dallinga, J Baumbach, and F J
van Schooten. Current breathomics—a review on data pre-processing techniques
and machine learning in metabolomics breath analysis. Journal of Breath Research,
April 2014.
[24] J. D. Malley, A. Dasgupta, and J. H. Moore. The limits of p-values for biological
data mining. BioData Min, 6(1):10, 2013.
[25] Mlcomp - http://mlcomp.org/.
[26] D.D. Nelson, J.H. Shorter, J.B. McManus, and M.S. Zahniser. Sub-part-per-billion
detection of nitric oxide in air using a thermoelectrically cooled mid-infrared quan-
tum cascade laser spectrometer. Applied Physics B, 75(2-3):343–350, 2002.
47

A. Results for a sample of a
healthy patient
A.1 Standard molecules
Table A.1: List of standard molecules used.
Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water
A.2 Molecules found by p-region analysis
Table A.2: Table of molecules that are present in either or both of the p-regions. The
p-regions were found using a maximum p-value of 05 for at least 10 consecutive data
points or 0.5 cm−1
. These molecules have a mean absorptivity in this region of at least
10−4
.
1181.8-1182.55 1261.4-1262.05
1-Chloro-1 1 2 2-tetraﬂuoroethane (R124A) 1 1
1-Hexanoic acid 1 0
1 1 1-Chlorodiﬂuoroethane (Freon-142B) 1 0
49

10−4
.
1181.8-1182.55 1261.4-1262.05
1 1 1-Trifluoroacetone 1 0
1 1 1-Trifluoroethane (R143A) 1 1
1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 1 1
1 2-Dibomotetrafluoroethane (Freon-114B2) 1 0
1 2-Dibromoethane (EDB) 1 0
1 2-Dichloro-1-fluoroethane (R141) 1 0
1 2-Dichloro-1 1 2-trifluoroethane (F132A) 1 0
1 2-Dimethoxyethane 1 0
1 4-Dioxane 0 1
2-Chloro-1 1 1-trifluoroethane (R133A) 1 1
2-Chloropropane 0 1
2-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 1 0
2-Ethoxyethyl acetate 1 1
2-Methoxyethanol 1 0
2H 3H-Perfluoropentane 1 1
2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 1 1
2 2-Difluorotetrachloroethane (F112A) 1 0
2 2 2-Trifluoroethanol 1 1
2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 1 1
3-Methyl-2-pentanone 1 0
Acetic acid 1 1
Acetic anhydride 1 0
Acetid acid dimer 0 1
Acetone cyanohydrin 1 0
Aniline 0 1
Benzyl chloride 0 1
Bromochlorodifluoromethane (Freon-12B1) 1 0
Butyl acetate 0 1
Butyric acid 1 0
Carbonyl fluoride 0 1
Chloromethyl ethyl ether 1 1
Chloromethyl methyl ether 1 0
Chloropentafluoroethane (R115) 1 1
Chlorosulfonyl isocyanate (CSI) 1 0
Diborane 1 0
Dichloromethane 0 1
Diethyl ether 1 0
50

10−4
.
1181.8-1182.55 1261.4-1262.05
Diethyl sulfide 0 1
Diisopropyl ether 1 0
Diisopropylamine 1 0
Diketene 0 1
Dimethoxymethane 1 0
Dimethyl carbonate 0 1
Dimethyl ether 1 0
Dimethylcarbamoyl chloride 0 1
Dipropylene glycol methyl ether 1 1
Ethyl acetate 0 1
Ethyl acrylate 1 1
Ethyl bromide (Halon-2001) 0 1
Ethyl butyrate 1 1
Ethyl chloroformate 1 0
Ethyl formate 1 0
Ethyl methyl ether 1 0
Ethyl tert-butyl ether 0 1
Ethyl trifluoroacetate 1 1
Ethylvinyl ether (EVE) 1 0
Formic acid dimer 1 0
Freon-113 (1 1 2-Trichlorortrifluoroethane) 1 0
Freon-114 (1 2-dichlorortetrafluoroethane) 1 1
Freon-134a (1 1 1 2-tetrafluoroethane) 1 0
Freon-218 (octafluoropropane) 0 1
Freon-C318 (octafluorocyclobutane) 0 1
Furan 1 0
Glycoaldehyde 0 1
Guaiacol 1 1
Hexachloro-1 3-butadiene 1 0
Hexafluoroacetone 1 1
Hexafluoroethane (Freon-116) 0 1
Hexafluoroisobutylene 1 1
Hexafluoropropene 1 0
Hydrogen peroxide 0 1
Isobutyl acetate 0 1
Isopentyl acetate 0 1
Isopropyl acetate 0 1
51

10−4
.
1181.8-1182.55 1261.4-1262.05
Methacryloyl chloride 0 1
Methanesulfonyl chloride 1 0
Methyl acetate 0 1
Methyl acrylate 1 1
Methyl benzoate 1 1
Methyl formate 1 0
Methyl iodide 0 1
Methyl isoamyl ketone 1 0
Methyl isobutyl ketone (MIBK) 1 0
Methyl isobutyrate 1 1
Methyl methacrylate 1 0
Methyl pivalate 1 0
Methyl propionate 1 1
Methyl propyl ketone 1 0
Methyl salicylate 1 1
Methylchloroformate (MCF) 1 0
Methyldichlorodisilanes (mixed isomers) 0 1
Methylethyl ketone 1 0
Methyltrichlorosilane 0 1
Methylvinyl ketone 1 1
N N’-Dimethyl formamide (DMF) 0 1
N N-Diethylaniline 0 1
N N-diethyl formamide 0 1
Nitrogen dioxide and dinitrogen tetroxide 0 1
Octanoic acid 1 0
Paraldehyde 1 0
Pentafluoroethane (f125) 1 0
Perfluorobutane 1 1
Perfluoroisobutylene (PFIB) 1 0
Phenol 1 1
Propargyl chloride 0 1
Propionic acid (and some dimer) 1 0
Propyl acetate 0 1
Propylene glycol 0 1
Sulfuryl chloride 1 0
Sulfuryl fluoride 0 1
Tetrafluoromethane 0 1
52

10−4
.
1181.8-1182.55 1261.4-1262.05
Trichlorofluroethylene 1 0
Trifluoroacetic acid 1 1
Trifluoroacetic anhydride 1 1
Trifluoroacetyl chloride 1 1
Trifluoromethane (Freon-23) 1 0
Trifluoromethylsulfur pentafluoride 0 1
Trifluoronitrosomethane 1 1
Trimethylamine 1 0
Vinayl acetate 0 1
bis(2-Chloroethyl) ether 0 1
cis-1 3-Dichloropropene 0 1
m-Cresol 1 0
n-Amyl acetate 0 1
o-Toluidine 0 1
tert-Amyl methyl ether 1 0
A.3 Concentrations found for a single healthy pa-
tient
In order to find the concentrations, the following options argument is used for the lsqnon-
lin function in MATLAB:
options = optimset(’Display’,’off’,’TolFun’,1e-15)
Table A.3: Table of initial concentrations used to calculate the concentrations as given
in Appendix A.3.
Molecule Concentration (ppmv)
Carbon dioxide 1000
Water 1000
Standard molecules 1
Other molecules 0.01
53

Table A.4: Concentrations found using the method described in section 4.2. These are
of a single sample from a healthy patient. The standard list of molecules as given in
section A.1 and the list of molecules as given by section A.2 are used.
Molecule concentration (ppmv)
Acetic acid 2.22045e-14
Acetone 2.22045e-14
Ammonia anhydrous 2.22045e-14
Benzene 1.79671
Carbon dioxide 0.912382
Carbon monoxide 1538.28
Ethane 2.66622
Ethyl alcohol 0.946815
Ethylene 0.109119
Formaldehyde monomer 2.22045e-14
Methane 2.22045e-14
Pentane 2.22045e-14
Propane 2.22045e-14
Water 0.0988024
1-Chloro-1 1 2 2-tetrafluoroethane (R124A) 2.22045e-14
1-Hexanoic acid 2.22045e-14
1 1 1-Chlorodifluoroethane (Freon-142B) 0.00478773
1 1 1-Trifluoroacetone 2.22045e-14
1 1 1-Trifluoroethane (R143A) 2.22045e-14
1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 2.22045e-14
1 2-Dibomotetrafluoroethane (Freon-114B2) 2.22045e-14
1 2-Dibromoethane (EDB) 2.22045e-14
1 2-Dichloro-1-fluoroethane (R141) 2.22045e-14
1 2-Dichloro-1 1 2-trifluoroethane (F132A) 2.22045e-14
1 2-Dimethoxyethane 2.22045e-14
1 4-Dioxane 2.22045e-14
2-Chloro-1 1 1-trifluoroethane (R133A) 2.22045e-14
2-Chloropropane 2.22045e-14
2-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 2.22045e-14
2-Ethoxyethyl acetate 2.22045e-14
2-Methoxyethanol 2.30625e-14
2H 3H-Perfluoropentane 2.22045e-14
2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 2.22045e-14
2 2-Difluorotetrachloroethane (F112A) 0.0104979
2 2 2-Trifluoroethanol 2.22045e-14
2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 2.22045e-14
3-Methyl-2-pentanone 2.22045e-14
Acetic acid 2.22045e-14
54

Acetic anhydride 2.22045e-14
Acetid acid dimer 2.22045e-14
Acetone cyanohydrin 2.22045e-14
Aniline 0.692194
Benzyl chloride 2.22045e-14
Bromochlorodifluoromethane (Freon-12B1) 2.22045e-14
Butyl acetate 2.22045e-14
Butyric acid 2.22045e-14
Carbonyl fluoride 2.22045e-14
Chloromethyl ethyl ether 2.22045e-14
Chloromethyl methyl ether 2.22045e-14
Chloropentafluoroethane (R115) 0.0115797
Chlorosulfonyl isocyanate (CSI) 2.22045e-14
Diborane 0.114448
Dichloromethane 2.22045e-14
Diethyl ether 0.0152475
Diethyl sulfide 2.22045e-14
Diisopropyl ether 2.22045e-14
Diisopropylamine 2.22045e-14
Diketene 2.22045e-14
Dimethoxymethane 2.22045e-14
Dimethyl carbonate 2.22045e-14
Dimethyl ether 2.22045e-14
Dimethylcarbamoyl chloride 2.22045e-14
Dipropylene glycol methyl ether 2.22045e-14
Ethyl acetate 2.22045e-14
Ethyl acrylate 0.039846
Ethyl bromide (Halon-2001) 2.22046e-14
Ethyl butyrate 2.22045e-14
Ethyl chloroformate 2.22045e-14
Ethyl formate 2.22045e-14
Ethyl methyl ether 2.22045e-14
Ethyl tert-butyl ether 0.195303
Ethyl trifluoroacetate 2.22045e-14
Ethylvinyl ether (EVE) 2.22045e-14
Formic acid dimer 2.22045e-14
Freon-113 (1 1 2-Trichlorortrifluoroethane) 2.22045e-14
Freon-114 (1 2-dichlorortetrafluoroethane) 2.22045e-14
55

Freon-134a (1 1 1 2-tetrafluoroethane) 2.22045e-14
Freon-218 (octafluoropropane) 2.22045e-14
Freon-C318 (octafluorocyclobutane) 2.22045e-14
Furan 2.22045e-14
Glycoaldehyde 2.22045e-14
Guaiacol 2.22045e-14
Hexachloro-1 3-butadiene 2.22045e-14
Hexafluoroacetone 2.22045e-14
Hexafluoroethane (Freon-116) 2.22045e-14
Hexafluoroisobutylene 2.22045e-14
Hexafluoropropene 2.22045e-14
Hydrogen peroxide 2.22045e-14
Isobutyl acetate 2.22045e-14
Isopentyl acetate 2.22045e-14
Isopropyl acetate 2.22045e-14
Methacryloyl chloride 0.101246
Methanesulfonyl chloride 0.0190093
Methyl acetate 2.22045e-14
Methyl acrylate 2.22045e-14
Methyl benzoate 2.22045e-14
Methyl formate 2.22045e-14
Methyl iodide 2.22045e-14
Methyl isoamyl ketone 2.22045e-14
Methyl isobutyl ketone (MIBK) 2.22045e-14
Methyl isobutyrate 2.22045e-14
Methyl methacrylate 2.22045e-14
Methyl pivalate 0.0277697
Methyl propionate 2.22045e-14
Methyl propyl ketone 2.22045e-14
Methyl salicylate 2.22045e-14
Methylchloroformate (MCF) 2.22046e-14
Methyldichlorodisilanes (mixed isomers) 2.22045e-14
Methylethyl ketone 2.22045e-14
Methyltrichlorosilane 2.22046e-14
Methylvinyl ketone 2.22045e-14
N N’-Dimethyl formamide (DMF) 2.22045e-14
N N-Diethylaniline 2.22045e-14
N N-diethyl formamide 2.22045e-14
56

Nitrogen dioxide and dinitrogen tetroxide 2.22045e-14
Octanoic acid 2.22045e-14
Paraldehyde 2.22045e-14
Pentafluoroethane (f125) 2.22045e-14
Perfluorobutane 2.22045e-14
Perfluoroisobutylene (PFIB) 0.000842043
Phenol 2.22045e-14
Propargyl chloride 0.250046
Propionic acid (and some dimer) 2.22045e-14
Propyl acetate 2.22045e-14
Propylene glycol 2.22045e-14
Sulfuryl chloride 2.22045e-14
Sulfuryl fluoride 2.22046e-14
Tetrafluoromethane 2.22045e-14
Trichlorofluroethylene 2.22045e-14
Trifluoroacetic acid 2.22045e-14
Trifluoroacetic anhydride 0.00520583
Trifluoroacetyl chloride 2.22045e-14
Trifluoromethane (Freon-23) 2.22045e-14
Trifluoromethylsulfur pentafluoride 2.22045e-14
Trifluoronitrosomethane 2.22045e-14
Trimethylamine 2.22045e-14
Vinayl acetate 2.22045e-14
bis(2-Chloroethyl) ether 2.22045e-14
cis-1 3-Dichloropropene 2.22045e-14
m-Cresol 2.22045e-14
n-Amyl acetate 2.22045e-14
o-Toluidine 2.22045e-14
tert-Amyl methyl ether 2.22045e-14
57

B. List of molecules
B.1 Breath molecules with absorbance in the 1020
to 1100 cm−1
region.
Table B.1: Molecules as ﬁltered using subsection 4.1.3 in the 1020 to 1100 cm−1
region
on 199 molecules that should be present in breath[17] .
Molecules
1-Hexanoic acid
1-Pentanol (n-amyl alcohol)
1-Propanol
1 2-Diclorobenzene
1 3-Dichlorobenzene
1 4-Dichlorobenzene
1 4-Dioxane
2-Butoxyethanol
2-Ethoxyethyl acetate
2-Ethyl-1-hexanol
2-Methoxyethanol
2-methyl-1-propanol (IBA isobutanol)
2-Methyl-2-propanal (Isobutenal)
2-Methylfuran
2 3-Butanedione
3-methylfuran
Acetic acid
Acetic anhydride
Acetol
Allene (1 2-propadiene)
Allyl alcohol
Ammonia anhydrous
Benzene
Benzyl alcohol
Butyl acetate
Butyric acid
58

region
Molecules
Cadaverine
Chlorobenzene
Cyclopentene
Cyclopropane
Diethyl ether
Dimethoxymethane
Dimethyl ether
Dimethyl sulfoxide
Dipropylene glycol methyl ether
Ethyl acetate
Ethyl acrylate
Ethyl alcohol
Ethyl butyrate
Ethyl tert-butyl ether
Ethylene
Ethylvinyl ether (EVE)
Furan
Furfural
Furfuryl alcohol
Glycoaldehyde
Guaiacol
Isobutyl acetate
Isopentyl acetate
Isopropyl acetate
Isopropyl alcohol
Methyl acetate
Methyl alcohol
Methyl methacrylate
Methyl propionate
Methylamine
n-Butyl alcohol
N N’-Dimethyl formamide (DMF)
N N-diethyl formamide
Octanoic acid
Propionic acid (and some dimer)
Propylene glycol
Propylene sulﬁde
Pyridine
sec-Butyl alcohol
59

region
Molecules
tert-Butyl methyl ether
Thiophene
trans-Crotonaldehyde
Trimethylamine
60

Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra

Similar to Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra (20)

Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra