Applied Physics Thesis
Delft University of Technology
Breath analysis by quantum cascade laser spectroscopy
Author: Olav Grouwstra
Direct supervisor: A. Reyes Reyes
Supervisor: N. Bhattacharya
Professor: H.P. Urbach
March 5, 2015
Contents
1 Introduction
1.1 Gas analysis
1.2 Quantum cascade laser spectroscopy
1.3 Previous work on the project
1.4 Goals and thesis structure
2 Quantum Cascade Laser Spectroscopy
2.1 Absorbance spectrum
2.2 Setup
2.2.1 The quantum cascade laser
2.2.2 Multipass cavity
2.2.3 Distance meter
2.2.4 Detection
3 Data Calibration
3.1 Wavenumber calibration
3.1.1 Four measurement signals
3.1.2 Data acquisition and pre-processing
3.1.3 Need for additional calibration
3.1.4 Wavenumber calibration
3.2 CO2 and H2O calibration
3.2.1 Method for calibration to CO2 and H2O
3.2.2 CO2 and H2O calibration results
4 Molecules and Concentrations
4.1 List of molecules
4.1.1 Standard molecules
4.1.2 Molecules selected to distinguish between data sets
4.1.3 Selection based on measurable concentration
4.2 Concentration determination
4.2.1 Method
4.2.2 Input and results
4.3 Ethanol estimation by CO2 removal
4.3.1 Motivation for CO2 removal
4.3.2 Implementation of CO2 removal for ethanol estimation
5 Disease prediction by machine learning
5.1 Machine learning for classification
5.2 Classification by trial-and-error
5.2.1 Basic explanation of some algorithms used
5.3 Results
6 Conclusions and outlook
6.1 QCL setup
6.2 Data calibration
6.3 Determination of molecules
6.4 Determination of concentrations
6.5 Categorizing health status by classification
Appendices
A Results for a sample of a healthy patient
A.1 Standard molecules
A.2 Molecules found by p-region analysis
A.3 Concentrations found for a single healthy patient
B List of molecules
B.1 Breath molecules with absorbance in the 1020 to 1100 cm−1 region
1. Introduction
1.1 Gas analysis
Gas analysis is done to find the different molecules present in a gas. Identifying the
composition of a substance is vital for various practical applications such as medical
diagnostics and environmental monitoring[1][2]. In this work the focus lies on gas analysis
applied to medical diagnosis.
The diagnosis of diseases is helped by many different techniques such as blood tests or
radiological studies like magnetic resonance imaging (MRI), as they give an idea of what is
inside the human body. Blood tests can provide this knowledge since changes in metabolic
processes significantly influence the compounds in human blood. For example, diabetes
increases the amount of acetone in the bloodstream. Hence knowing the concentration
of acetone could help diagnose diabetes. The compounds in blood diffuse through the
lungs into the breath we exhale, and diagnosis can therefore also be done by analysis of
human breath.
Breath analysis, as opposed to blood tests, has the benefit of being non-intrusive,
which makes it much easier for medical professionals to implement. Non-intrusive
medical instruments also undergo less strict control by regulatory bodies like the Food
and Drug Administration (FDA) and the European Medicines Agency (EMA). While
diagnosis through imaging techniques might give a better idea about the origin of a
disease, it does not give information on the influence of a disease on the metabolic
processes.
1.2 Quantum cascade laser spectroscopy
Quantum cascade laser (QCL) spectroscopy is only one among a variety of spectroscopy
methods, each of which has its pros and cons.
Currently one of the most common techniques used for gas analysis is gas
chromatography-mass spectrometry (GC-MS). Although GC-MS achieves high accuracy, it
has certain limitations. GC-MS is a specific test, meaning it requires a priori knowledge
of what to test for[3]. Its use is therefore limited to cases where the symptoms are
already present, and it thus requires pre-diagnosis. GC-MS also requires sample
preparation, limiting its use to trained professionals. A gas analysis method using a
quantum cascade laser for spectroscopy is treated in this work. This form of analysis does
not need sample preparation and is therefore more easily accessible. A priori knowledge of
what molecules to test for is also not necessary for the measurement itself, which makes it
a promising tool for establishing statistical relations between concentrations of molecules
and the health status of individuals.
Photoacoustic spectroscopy (PAS) is another useful technique. It works, like QCL
spectroscopy, by shining light on a sample. However, in PAS it is the absorbed light that
creates the signal by thermally expanding the sample, thereby creating a sound wave
which can be measured. As in the case of QCL spectroscopy, PAS requires no sample
preparation. But PAS requires the signal to go through more energy conversions, thereby
making it susceptible to vibrational noise[4].
Electronic noses are also used for the detection of gases. These are, however, made
to detect specific molecules, as opposed to broadband QCL spectroscopy,
which can detect any molecule with an absorbance spectrum overlapping with the
emission spectrum of the QCL.
1.3 Previous work on the project
A spectroscopy setup using a quantum cascade laser (QCL) is described and the
subsequent data processing is extensively treated. The setup was built and described in
detail by A. Reyes Reyes as his PhD project, funded by the Fundamenteel Onderzoek Materie
(FOM) foundation[5]. Z. Hou developed the data acquisition under supervision of A.
Reyes Reyes[3]. Measurements processed in this research are from breath samples of
35 healthy volunteers and 35 asthmatic volunteers. Two different breath samples are
used from each volunteer. These samples are from children from the Sophia Children’s
hospital in Rotterdam, all with their parents’ informed consent. The absorbance
spectra obtained from these samples show an uncertainty in the wavenumbers. In this
work calibration is done through post-processing in order to reduce this uncertainty.
1.4 Goals and thesis structure
The goal of this work is twofold. The first goal is to decrease the uncertainties with
which the compounds and their concentrations are determined. The second goal is to find a
reliable way to distinguish between the breath of healthy and asthmatic children based on
the compounds present and their concentrations.
This thesis works towards these goals by first explaining the general working principle
of the setup and the theory behind QCL spectroscopy, highlighting the parts that have the
most influence on the noise and resolution of the signal. The measured absorbance
spectrum and its calibration are then treated. This is followed by an explanation of how
the molecular species present and their concentrations are found. Finally a
way of distinguishing between samples of healthy and asthmatic patients is treated, and
the overall results are discussed.
2. Quantum Cascade Laser Spectroscopy
In this chapter the relation between the intensity of the laser light, the absorbance of
the gas, and the concentrations of the molecules in the gas is explained. The parameters
that need to be measured and the equipment used to measure them are
explained as well. Measurements on the gas are performed with a spectroscopy setup
using a quantum cascade laser (QCL). The various components and conditions of the
measurement setup that influence the determination of the gas compounds and
concentrations are explored. In order to understand how these components influence the
eventual concentration, the path of light through the components is described starting
from the quantum cascade laser and ending at the mercury cadmium telluride (HgCdTe)
detectors. Getting the absorbance spectrum from a breath sample takes around 60
minutes, of which 5 minutes are the actual scanning of the gas by the setup.
Transferring the large datasets between computers and loading them into processing
software takes up the most time. The setup as it is used for sample measurements was built
by A. Reyes Reyes[5].
2.1 Absorbance spectrum
In order to obtain information on the gas through spectroscopy, the interaction between
light and matter needs to be established. The relation of the concentrations of the
compounds in the gas to the light passing through the gas is given by the Beer-Lambert
law[6, 7]:
$$I_o(\nu, C) = I_i(\nu)\,10^{-A(\nu, C)} \tag{2.1}$$
where $I_o$ is the beam intensity after interacting with the gas, $I_i$ the beam intensity
before interacting with the gas, and $\nu$ and $C$ denote the dependence on the wavenumber
and the concentrations of the compounds respectively. $A$ is the absorbance, as given by
$$A(\nu, C) = \sum_{i=1}^{n} \varepsilon_i(\nu)\, c_i\, l_{\text{path}} \tag{2.2}$$
with
$$C = (c_1, c_2, \ldots, c_n) \tag{2.3}$$
where $n$ is the number of molecular species, and $c_i$ and $\varepsilon_i(\nu)$ denote the
concentration and molar absorptivity of each molecular species in the gas. The vector $C$ is
used simply as an abbreviated way to indicate the dependence of the absorbance $A$ on the
concentrations of all molecular species. The absorbance is also dependent on the
interaction length $l_{\text{path}}$ of the light with the gas. In order to determine the
concentrations $c_i$ of the molecules, all other parameters in Equation 2.1 and Equation 2.2
should be known. The light is provided by the QCL, and the intensities $I_o$ and $I_i$ are
obtained by processing the signals measured by the detectors in the setup. The molar
absorptivity $\varepsilon_i(\nu)$ is a measure for how strongly a certain molecular species
absorbs light at a given wavelength. It is an intrinsic property of a molecule and can be
considered unique to each molecule. Hence the molar absorptivity can be looked up in the
Northwest Infrared database (NWIR)[8] and the high resolution transmission molecular
absorption database (HITRAN)[9], which are put together by the Pacific Northwest National
Laboratory (PNNL) and the Harvard-Smithsonian Center for Astrophysics respectively. The
interaction length $l_{\text{path}}$ is a property of the setup, and more specifically of the
cavity, and can be measured from the cavity as described in subsection 2.2.2.
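As an illustration of how Equation 2.1 and Equation 2.2 fit together, the following is a minimal sketch that evaluates the absorbance at a single wavenumber and converts between absorbance and intensity. The function names and the numerical values of the molar absorptivities and concentrations are hypothetical and only chosen to make the arithmetic easy to follow; only the path length of 54.36 m is taken from the text.

```python
import math


def absorbance(molar_absorptivities, concentrations, path_length):
    """Equation 2.2 at one wavenumber: A = sum_i eps_i * c_i * l_path."""
    total = sum(eps * c for eps, c in zip(molar_absorptivities, concentrations))
    return total * path_length


def transmitted_intensity(incident_intensity, a):
    """Equation 2.1: I_o = I_i * 10**(-A)."""
    return incident_intensity * 10.0 ** (-a)


def absorbance_from_intensities(incident_intensity, transmitted):
    """Equation 2.1 inverted: A = -log10(I_o / I_i)."""
    return -math.log10(transmitted / incident_intensity)


# Hypothetical two-species gas; the values are illustrative, in consistent units.
eps = [2.0e-4, 5.0e-5]   # assumed molar absorptivities at one wavenumber
conc = [10.0, 40.0]      # assumed concentrations
l_path = 54.36           # interaction length in metres, as reported in the text

a = absorbance(eps, conc, l_path)
i_o = transmitted_intensity(1.0, a)
a_recovered = absorbance_from_intensities(1.0, i_o)
```

Recovering the individual concentrations from a measured spectrum requires evaluating this relation at many wavenumbers, which is why the molar absorptivities of all candidate species must be known.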
2.2 Setup
To find the concentration of compounds within a gas the spectroscopy setup depicted in
Figure 2.1 is used. The laser beam is produced at the QCLs. From the QCLs the beam
falls onto a beam splitter, which sends one beam straight to the monitor detector,
which is used to measure the intensity spectrum as emitted by the QCLs. The other beam
from the beam splitter travels to the multipass cavity. The beam gets partially absorbed
by the gas present in the cavity, and then travels onwards to the main detector. The
distance meter is placed in such a way that its beam coincides with that of the QCLs in
order to measure the optical path length in the cavity. The two detectors relay their
signals to the NI-PCI 5922 acquisition card, which in turn sends the signals to the
computer.
Figure 2.1: Diagram of the quantum cascade laser spectroscopy setup as used for
measuring. Courtesy of Z. Hou[3]
2.2.1 The quantum cascade laser
The laser system used is Block Engineering’s LaserScope, which consists of two QCLs that
are arranged such that together they can scan over the wavenumber range of 832 cm−1
to 1263 cm−1[10]. As the LaserScope is by itself designed to be a spectroscopic system,
it has an internal detector and a set of two QCLs which are portrayed in Figure 2.1 as
the monitor detector and the QCLs. The main specifications of the LaserScope can be
found in Table 2.1, where it should be noted that the spectral resolution stated is the
output when the signal is processed using Block Engineering’s own software.
Table 2.1: Specifications of the quantum cascade laser as given by Block Engineering[10].
Parameter | Specification
Spectral region | 1267 to 833 cm−1 (8 to 12 µm)
Tuning | Continuous or single wavelength
Emission type | Pulsed
Pulse repetition frequency | 200 kHz
Minimum average power | 0.5 mW
Maximum average power | 12 mW
Beam diameter | 2.5 mm x 5 mm ellipse
Spectral resolution | Down to 1 cm−1
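The spectral region in Table 2.1 is given both in wavenumbers and in wavelengths. The conversion between the two can be sketched as below; the function names are chosen here for illustration. The relation itself is the standard one: a wavelength in µm equals 10 000 divided by the wavenumber in cm−1. Note that 1267 cm−1 corresponds to roughly 7.9 µm, so the "8 to 12 µm" in the table is a rounded figure.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1.0e4 / wavenumber_cm1


def wavelength_um_to_wavenumber(wavelength_um):
    """Convert a wavelength in micrometres to a wavenumber in cm^-1."""
    return 1.0e4 / wavelength_um


# Edges of the spectral region listed in Table 2.1.
short_edge_um = wavenumber_to_wavelength_um(1267.0)  # roughly 7.9 um
long_edge_um = wavenumber_to_wavelength_um(833.0)    # roughly 12.0 um
```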
By design, the QCLs emit light at different wavenumbers simultaneously[11].
The HgCdTe detectors detect the intensity of the light without differentiating between
the various wavenumbers. Since the intensity per wavenumber is needed, a Littrow
configuration with a diffraction grating mounted on a piezo is used, as seen in Figure 2.2[3].
Figure 2.2: Diagram of a Littrow configuration external cavity diode laser with back
extraction. Courtesy of Z.Hou[3]
This Littrow configuration uses a diffraction grating to steer the first order diffraction
of the light with the desired wavenumber back through the laser diode, and into the setup.
The wavenumber selected using the grating depends on the spacing of the grating and
the incident angle of the light on the grating. The incident angle is controlled by applying
a voltage over the piezo.
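The dependence of the selected wavenumber on the grating spacing and the incident angle can be made concrete with the standard Littrow condition, m·λ = 2·d·sin(θ), where d is the groove spacing, θ the incident angle and m the diffraction order. The sketch below is a hedged illustration only: the groove density of 150 grooves per mm is an assumed value, not a specification of the LaserScope, and the function name is hypothetical.

```python
import math


def littrow_angle_deg(wavenumber_cm1, grooves_per_mm, order=1):
    """Incident angle (degrees) that sends the given order back along the beam.

    Uses the Littrow condition order * wavelength = 2 * spacing * sin(angle).
    """
    wavelength_um = 1.0e4 / wavenumber_cm1
    spacing_um = 1.0e3 / grooves_per_mm
    sine = order * wavelength_um / (2.0 * spacing_um)
    if not 0.0 < sine <= 1.0:
        raise ValueError("no Littrow angle exists for this wavelength and grating")
    return math.degrees(math.asin(sine))


# Assumed grating with 150 grooves per mm (illustrative value only).
angle_low = littrow_angle_deg(832.0, 150.0)    # low-wavenumber end of the scan
angle_high = littrow_angle_deg(1263.0, 150.0)  # high-wavenumber end of the scan
```

Under these assumptions a small error in the angle translates directly into an error in the selected wavenumber, which is the mechanism behind the wavenumber uncertainty treated in chapter 3.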
The intensity of the light emitted by the QCL is furthermore influenced by mode
hopping, which can be caused by temperature fluctuations in the gain medium of the
laser or fluctuations in the injection current. In chapter 3 a closer look is taken at the
signal from the QCL and its noise. The intensity spectrum of the QCL system is given
in Figure 2.3. The general shape of this signal portrays the intensity spectrum the laser
outputs. The intensity drop around 1010 cm−1 indicates the switch between the two
QCLs that together scan over the entire wavenumber range.
Figure 2.3: Intensity spectrum of the laser system used: Block Engineering’s LaserScope.
[Plot "Spectrum emitted by the QCL": intensity (counts) versus wavenumber (cm−1).]
2.2.2 Multipass cavity
A cavity is used to achieve a large interaction length $l_{\text{path}}$ between the gas and
the beam. The cavity is depicted in Figure 2.4. The beam enters the cavity through a ZnSe
window. It then reflects between the two mirrors at either end of the cavity, until the beam
falls on the window again and exits the cavity. The mirrors (Aerodyne Research, Inc.)
are astigmatic, meaning each mirror has two different radii of curvature, which allows
for more reflections to take place and thus a larger interaction distance $l_{\text{path}}$[12].
A large interaction distance is desirable since it increases the absorbance. A multipass
cavity is used instead of a resonant cavity since the high-reflection mirrors used in a
resonant cavity are not available for the wide wavelength range used in this experiment.
Figure 2.4: Diagram of the multi-pass cavity. The ZnSe window is located on the left in
yellow, and the two mirrors in blue on either end of the cavity. Courtesy of Z.Hou[3]
The length of interaction of the light with the gas, $l_{\text{path}}$, is measured using a
laser distance meter to be 54.36 m[3]. A multipass cavity with such a long path length is
used in order to have a large absorbance. The mirrors in the cavity and the ZnSe window
leading into the cavity also absorb 94.85% of the light[3, p. 25]. Part of the light is
absorbed in the cavity according to the absorbance profiles of the molecules constituting
the gas. The remaining light exits the cavity and is measured at the main detector. In
order to clean the cavity between measurements a series of mass flow controllers and a
diaphragm pump are used. This way contamination in the cavity is prevented[5].
2.2.3 Distance meter
The distance meter (Leica DISTO D2) emits light pulses along the path depicted by
the red dashed line in Figure 2.1. The path the pulse travels outside the cavity is simply
measured with a ruler. From Figure 2.1 it can be seen that subtracting
the length of the path the pulse travels outside of the cavity from the total length as
calculated by the distance meter gives the optical path length inside the cavity[5].
This optical path length inside the cavity is also checked by comparing the time a pulse
from the QCLs takes to go straight to a detector with the time a pulse from the QCLs
takes when it passes through the cavity and then onto a detector. The delay between
the pulses measured is given in Figure 2.5, from which the same result of 54.36 m is
obtained.
Figure 2.5: Delay time between a pulse going straight to a detector and a pulse
passing through the cavity for three different wavenumbers. (a) 950 cm−1, (b) 1000 cm−1,
(c) 1150 cm−1. Courtesy of A. Reyes Reyes[5]
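The time-of-flight check described above amounts to multiplying the measured delay by the speed of light. The following sketch shows the arithmetic under the assumption that the refractive index of the gas in the cavity is 1 and that the delay reflects only the extra path inside the cavity; the function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def path_length_from_delay(delay_s, refractive_index=1.0):
    """Extra optical path (metres) travelled by the delayed pulse."""
    return SPEED_OF_LIGHT_M_PER_S * delay_s / refractive_index


def delay_from_path_length(length_m, refractive_index=1.0):
    """Delay (seconds) a pulse accumulates over the given path length."""
    return length_m * refractive_index / SPEED_OF_LIGHT_M_PER_S


# Delay expected for the reported path length of 54.36 m, in nanoseconds.
expected_delay_ns = delay_from_path_length(54.36) * 1.0e9
```

For the reported 54.36 m this gives a delay of a little over 181 ns, which is short compared with the spacing between laser pulses, so consecutive pulses are not confused with one another.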
2.2.4 Detection
A HgCdTe detector measures the light before it enters the gas, and another HgCdTe
detector measures the light when it exits the gas. These are called the monitor and main
detector respectively, and the signals obtained from these detectors will henceforth be
referred to as the monitor signal and the main signal. The specifications of the monitor
detector are given in Table 2.2.
Table 2.2: Specifications of the monitor detector as given by Block Engineering[13].
Parameter | Specification
Detector element | TE cooled HgCdTe
Active area | 1 × 1 mm²
Window | BaF2
Wavelength range | 6–12 µm
Total gain (typical) | 800 V/W
Cut-off frequency range | 10 Hz – 20 MHz
Operating temperature (typical) | 223 K
Saturation threshold power (at 8 µm, 20 MHz speed) | 5 mW (peak)
The main detector is from VIGO System S.A. with part number
PVI-4TE-10.6-0.3x0.3-TO8-BaF2, the specifications of which are given in Table 2.3.
Table 2.3: Specifications of the main detector as given by VIGO System S.A.[14].
Parameter | Specification
Detector element | TE cooled HgCdTe
Active area | 0.3 × 0.3 mm²
Window | BaF2
Wavelength range | 2–12 µm
Detectivity at λpeak | ≥ 4.0 × 10⁹ cm·√Hz/W
Detectivity at λopt | ≥ 2.0 × 10⁹ cm·√Hz/W
Optimal wavelength | 10.6 µm
Operating temperature (typical) | 195 K
The intensity of the light emitted by the QCL, $I_{\text{QCL}}$, as seen in Figure 2.6 varies
per measurement due to temperature fluctuations and noise on the voltage applied over the
piezo. Therefore the signals of multiple measurements need to be normalized in order to
be compared to each other, for which the two detectors are used.
Figure 2.6: Intensity spectrum of various measurements with the monitor detector.
[Plot "Measurements of emitted intensity of the QCLs": intensity (counts) versus
wavenumber (cm−1), zoomed in on 1173.35 to 1173.8 cm−1.]
3. Data Calibration
3.1 Wavenumber calibration
As mentioned in chapter 2 the intensity of the light emitted by the QCLs is subject to
noise. This noise is corrected for through normalisation using the monitor and main
detector. In this chapter an attempt is made to reduce the standard deviation in the
monitor and the main signals caused by the uncertainty in which wavenumber is emitted
from the QCLs. The wavenumber corresponding to the intensity measured at the
detectors is determined by the angle of the grating mounted on the piezo, as mentioned in
subsection 2.2.1. The wavenumber attributed to the measured beam intensities at a point
in time is determined by the voltage on the piezo, which should predictably determine
the angle of the grating. Since there is noise on the voltage over the piezo element the
behaviour of the piezo is not completely predictable, and thus the angle of the grating
cannot be determined precisely from the voltage on the piezo at a point in time. This gives
an uncertainty in the wavenumbers attributed to the measured beam intensities, as can be
seen in Figure 3.1. The wavenumber data in Figure 3.1 is obtained by recording files
made by Block Engineering’s firmware, which measures the voltage over the piezo and
contains knowledge of what voltage corresponds to what wavenumber. Unfortunately
Block Engineering’s firmware outputs its spectrum at a lower wavenumber resolution than
the acquisition described in subsection 3.1.2 using the NI-PCI 5922 acquisition card.
This wavenumber uncertainty in turn increases the standard deviation of the
absorbance when multiple measurements are compared. QCL systems are a relatively new
development and the QCLs in the LaserScope should be among the best currently
available, hence there is no certainty that significant improvements can be made with the
knowledge on QCL design and piezo elements currently available. Given this, and the
complicated nature of the electronics in the LaserScope QCL system, it could take a
long time to replace the piezo, for which there is no time in the scope of the project.
Instead, wavenumber calibration of multiple measurements is done in order to reduce this
uncertainty in absorbance.
Figure 3.1: Effect of the voltage noise on the wavenumber emitted of several
measurements over time. (a) Several scans superposed. (b) A zoom into a section of the
figure shown in (a). [Plot "Measuring time - wavenumber relation": wavenumber (cm−1)
versus time stamp (a.u.).]
3.1.1 Four measurement signals
As established in section 2.2 a main signal and a monitor signal are obtained during
each measurement. Multiple measurements are performed per sample. In order to get
information on the gas, measurements are made using the main detector when a gas is
inside the cavity. These measurements will be called gas measurements from now on.
In order to single out any effects due to the different paths the beams take to the main
and monitor detectors, measurements are also done without gas in the cavity. These
measurements without gas will henceforth be referred to as reference measurements.
These reference measurements are also used to establish the beam intensity without
interaction with the gas, which is $I_i$ in Equation 2.1. The effects of the components of
the setup such as the cavity, the lenses, and the mirrors on the beam to the main detector
are all combined in a factor $f_{\text{path1}}(\nu)$. This factor is defined such that
multiplying the intensity of the beam as it exits the QCLs by this factor gives the
intensity of the beam as it arrives at the main detector, and the factor therefore takes
values between 0 and 1 over the wavenumber range. $f_{\text{path2}}(\nu)$ and
$f_{\text{gas}}(\nu, C)$ are similarly defined for the effect of the elements between the
QCLs and the monitor detector and for the effect of a gas sample being present,
respectively.
Due to effects in the QCLs, the signals vary in intensity between different
measurements. Therefore during both gas and reference measurements monitor signals are
also recorded. The monitor signals are necessary to be able to reduce the effects of the
QCL, as the QCL affects the signals of the monitor and the main detector similarly, so
that its effect can be identified. For a clear overview the various effects on the signal
in the two types of measurement of both detectors are laid out in Table 3.1. Wavenumber
dependence in Table 3.1 is indicated by $\nu$ and $\nu'$, to indicate that different
measurements cannot be compared wavenumber-wise. This is a result of the uncertainty
caused by effects in the QCL.
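Table 3.1: The contributions to the intensity as measured by the two detectors.
Signal type | Gas measurement | Reference measurement
Main | $I_{\text{QCL}}(\nu')\, f_{\text{path1}}(\nu')\, f_{\text{gas}}(\nu', C)$ | $I_{\text{QCL}}(\nu)\, f_{\text{path1}}(\nu)$
Monitor | $I_{\text{QCL}}(\nu')\, f_{\text{path2}}(\nu')$ | $I_{\text{QCL}}(\nu)\, f_{\text{path2}}(\nu)$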
3.1.2 Data acquisition and pre-processing
The signal from the detectors is processed before it is calibrated, following the method
developed by Z. Hou[3]. The light emitted by the QCL is a series of pulses lasting 333
ns and spaced approximately 5600 ns apart. These light pulses go through the setup and
end up at the detectors, where the resulting signals are sampled at a rate of 10
MHz, or 1 sample per 100 ns, by the NI-PCI 5922 acquisition card. Since saving
the data from the memory of the acquisition card to the computer is one of the most
time-consuming parts of the data acquisition, the data is filtered using LabVIEW such
that only the peaks and valleys of the pulse signal are saved to the computer. For this,
LabVIEW’s peak detection is used at a width of 3 points in order to get an amplitude
error for the peaks of 0.39%. As the valleys are essentially parts where the laser is emitting
no light but there is still a signal, the signal strength of the valleys surrounding a peak
is subtracted from the peak to remove any bias due to the electronics of the detector.
A final resolution of 0.05 cm−1 is achieved. For each sample ten gas measurements and
ten reference measurements are done and processed this way. The spectrum of one of
these processed monitor signals is shown in Figure 3.2. The entire spectrum spans
from 832 cm−1 to 1263 cm−1.
Figure 3.2: A monitor signal after pre-processing as described in subsection 3.1.2.
[Plot "Monitor signals of gas and reference measurement": intensity (counts) versus
wavenumber (cm−1).]
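The valley subtraction described in subsection 3.1.2 can be sketched as follows. This is a simplified stand-in and not the LabVIEW implementation: it assumes the peak and valley amplitudes have already been extracted and alternate as valley, peak, valley, and it assumes that the bias for each peak is the average of its two neighbouring valleys (the text only states that the surrounding valleys are subtracted). The timing constants are taken from the text; the function name is hypothetical.

```python
SAMPLE_INTERVAL_NS = 100.0   # 10 MHz sampling, from the text
PULSE_LENGTH_NS = 333.0      # pulse duration, from the text
PULSE_SPACING_NS = 5600.0    # approximate pulse spacing, from the text

# Number of samples that fall within one pulse and within one pulse period.
samples_per_pulse = PULSE_LENGTH_NS / SAMPLE_INTERVAL_NS
samples_per_period = PULSE_SPACING_NS / SAMPLE_INTERVAL_NS


def baseline_corrected_peaks(peaks, valleys):
    """Subtract the mean of the two surrounding valleys from every peak.

    Expects one more valley than peaks: valleys[k] precedes peaks[k]
    and valleys[k + 1] follows it.
    """
    if len(valleys) != len(peaks) + 1:
        raise ValueError("expected exactly one more valley than peaks")
    return [
        peak - 0.5 * (valleys[k] + valleys[k + 1])
        for k, peak in enumerate(peaks)
    ]


# Two pulses with a detector offset that drifts between 1 and 3 counts.
corrected = baseline_corrected_peaks([10.0, 12.0], [1.0, 3.0, 1.0])
```

With a pulse length of 333 ns and one sample per 100 ns, only about three samples fall within each pulse, which gives some context for the 3-point peak detection width mentioned in the text.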
3.1.3 Need for additional calibration
The absorbance is obtained by inverting Equation 2.1:
$$A(\nu, C) = -\log_{10}\!\left(\frac{I_o(\nu, C)}{I_i(\nu)}\right). \tag{3.1}$$
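Using Table 3.1 the measured signals are substituted for the beam intensity with gas
in the cavity $I_o$ and the beam intensity without gas in the cavity $I_i$:
$$I_o(\nu', C) = \frac{I_{\text{main,gas}}}{I_{\text{mon,gas}}} = \frac{I_{\text{QCL}}(\nu')\, f_{\text{path1}}(\nu')\, f_{\text{gas}}(\nu', C)}{I_{\text{QCL}}(\nu')\, f_{\text{path2}}(\nu')} = \frac{f_{\text{path1}}(\nu')\, f_{\text{gas}}(\nu', C)}{f_{\text{path2}}(\nu')} \tag{3.2a}$$
$$I_i(\nu) = \frac{I_{\text{main,ref}}}{I_{\text{mon,ref}}} = \frac{I_{\text{QCL}}(\nu)\, f_{\text{path1}}(\nu)}{I_{\text{QCL}}(\nu)\, f_{\text{path2}}(\nu)} = \frac{f_{\text{path1}}(\nu)}{f_{\text{path2}}(\nu)}, \tag{3.2b}$$
where the wavenumber dependence as indicated by $\nu$ and $\nu'$ shows that different
measurements cannot be compared wavenumber-wise, as shown at the start of section 3.1.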
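The division by the monitor signals is done to remove the influence of the QCLs’
intensity instability in $I_{\text{QCL}}(\nu)$ and $I_{\text{QCL}}(\nu')$. By substituting
Equation 3.2 into Equation 3.1, Equation 3.3 is obtained:
$$A(\nu, \nu', C) = -\log_{10}\!\left(\frac{\dfrac{f_{\text{path1}}(\nu')\, f_{\text{gas}}(\nu', C)}{f_{\text{path2}}(\nu')}}{\dfrac{f_{\text{path1}}(\nu)}{f_{\text{path2}}(\nu)}}\right) = -\log_{10}\!\left(\frac{f_{\text{path1}}(\nu')\, f_{\text{path2}}(\nu)\, f_{\text{gas}}(\nu', C)}{f_{\text{path1}}(\nu)\, f_{\text{path2}}(\nu')}\right). \tag{3.3}$$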
Note that the absorbance $A(\nu, \nu', C)$ is still dependent on the influences of the
optical path because the two separate wavenumber dependencies cannot be resolved. The
mean absorbance from ten measurements of a single sample, with the standard deviation
calculated per wavenumber, is shown in Figure 3.3.
Figure 3.3: Mean absorbance (red line) and standard deviation (dashed blue lines) of ten
measurements. (a) Entire scan range. (b) A zoom into a section of the scan shown in
(a). [Plot "Mean absorbance and standard deviation of ten measurements": absorbance
versus wavenumber (cm−1).]
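The computation behind Equations 3.1 and 3.2 and the statistics plotted in Figure 3.3 can be sketched as follows. The sketch assumes that all four signals are already sampled on one common wavenumber grid, which is exactly the condition that is not yet met before calibration. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
import math
import statistics


def absorbance_spectrum(main_gas, mon_gas, main_ref, mon_ref):
    """Absorbance per wavenumber from the four measured signals.

    Each main signal is divided by its own monitor signal (Equation 3.2),
    after which Equation 3.1 is applied point by point.
    """
    spectrum = []
    for m_gas, o_gas, m_ref, o_ref in zip(main_gas, mon_gas, main_ref, mon_ref):
        i_o = m_gas / o_gas   # normalized intensity with gas in the cavity
        i_i = m_ref / o_ref   # normalized intensity without gas in the cavity
        spectrum.append(-math.log10(i_o / i_i))
    return spectrum


def mean_and_std(spectra):
    """Mean and sample standard deviation per wavenumber over several spectra."""
    columns = list(zip(*spectra))
    means = [statistics.mean(column) for column in columns]
    stds = [statistics.stdev(column) for column in columns]
    return means, stds


# Toy signals on a two-point grid (illustrative values only).
main_ref, mon_ref = [50.0, 50.0], [100.0, 100.0]
a_first = absorbance_spectrum([5.0, 50.0], [100.0, 100.0], main_ref, mon_ref)
# Same gas, but the QCL emitted twice the intensity during the gas measurement.
a_second = absorbance_spectrum([10.0, 100.0], [200.0, 200.0], main_ref, mon_ref)
```

The two toy measurements give the same absorbance even though the laser intensity differs by a factor of two, which illustrates why dividing by the monitor signal removes the intensity instability but not the wavenumber uncertainty.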
The large standard deviation in Figure 3.3 is partly because the multiple measure-
ments over which the mean is taken cannot be compared properly, since the relation
between the wavenumbers of different measurements is not known. When looking closely
at Figure 3.4a and Figure 3.4b, this wavenumber discrepancy is seen as the measurements
do not overlap.
Figure 3.4: Ten uncalibrated intensity measurements of a single sample. Five of these are
from measurements with a gas in the cavity, and five are from a reference measurement.
(a) Measured signals at the monitor detector. (b) Measured signals at the main detector.
[Plots "Uncalibrated monitor measurements" and "Uncalibrated main measurements":
intensity (counts) versus wavenumber (cm−1), 1172 to 1176 cm−1.]
This local horizontal shift from one measurement to another results in an increase in
standard deviation of the eventual absorbance and concentration. The absorbance of a
single sample is calculated from multiple measurements with gas inside the cavity and
the reference measurements without gas in the cavity. All these measurements need to
be aligned to the same wavenumber values.
3.1.4 Wavenumber calibration
In order to calibrate the various monitor and main signals, a basis to which they can
all be calibrated is chosen. The mean of all the main signals is not a useful basis to
calibrate towards since the main signals of the reference measurements differ from the
main signals of the gas measurements because in the latter a gas is present and absorbs
part of the light. The mean of all the monitor signals is used as basis for calibration
since the monitor signals are nearly identical for all measurements because they do not
enter the cavity and therefore are not affected by the gas in the cavity. Since the monitor
and main signal of a single measurement suffer from the same wavenumber discrepancies
from the QCL, the transformations that calibrate the monitor signals also calibrate their
respective main signals.
Calibration starts with the monitor signals, and is done towards the mean per wavenum-
ber of all the monitor signals of a sample, as given in Figure 3.5 as a dashed red line.
The positions of the peaks and valleys of this mean are then determined, as denoted by
black boxes in Figure 3.5. Each monitor signal is calibrated separately to the mean of
18
the monitor signals. Calibration of a monitor signal as denoted by the blue line in Fig-
ure 3.5 towards a single valley of the mean is shown in Figure 3.5b and Figure 3.5c. The
position of the valley of the mean used to calibrate is marked by a vertical dashed grey
line, and the two surrounding peaks of the mean are marked by vertical dashed black
lines in Figure 3.5b. The minimum value of the monitor signal between the two positions
of peaks of the mean is taken, as marked by the star in Figure 3.5c. This minimum value
is then taken as the signal’s valley corresponding to the valley of the mean as marked by
the grey line. This valley of the monitor signal is then shifted towards the corresponding
valley of the mean of all the monitors. This process is repeated for all valleys of the mean.
The peaks of the monitor signal are calibrated in a similar fashion. Every signal is now
locally shifted so that its peaks and valleys match those of the mean signal. All data
between peaks and valleys are stretched or compressed by resampling to fit the adjusted
peaks and valleys. Since the data between a peak and a valley are mostly straight, linear
interpolation is sufficient for resampling.
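The local shift and resampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this thesis: the names valley_between and align_to_anchors are hypothetical, signals are plain lists on a common index grid, and the anchor indices are assumed to be strictly increasing and to exclude the first and last sample.

```python
from bisect import bisect_right


def valley_between(y, left, right):
    """Index of the minimum of y between two peak positions of the mean."""
    return min(range(left, right + 1), key=lambda i: y[i])


def interp_at(y, s):
    """Linearly interpolate the sampled signal y at fractional index s."""
    i = min(int(s), len(y) - 2)
    frac = s - i
    return y[i] * (1.0 - frac) + y[i + 1] * frac


def align_to_anchors(y, src, dst):
    """Warp y so that the samples at indices src land on indices dst.

    src: peak/valley indices found in this signal (assumed strictly increasing).
    dst: matching peak/valley indices of the mean (assumed strictly increasing).
    Data between anchors is stretched or compressed by linear interpolation.
    """
    n = len(y)
    # Pin both ends of the scan so every output index falls inside a segment.
    src = [0] + list(src) + [n - 1]
    dst = [0] + list(dst) + [n - 1]
    out = []
    for j in range(n):
        k = min(bisect_right(dst, j) - 1, len(dst) - 2)
        t = (j - dst[k]) / (dst[k + 1] - dst[k])
        s = src[k] + t * (src[k + 1] - src[k])
        out.append(interp_at(y, s))
    return out
```

The same transform, built once from a monitor signal, could then be applied unchanged to its respective main signal by calling align_to_anchors with the same src and dst lists.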
[Plots, panels (a)-(c), titled "Calibration of a monitor signal": intensity (counts) versus
wavenumber (cm−1). Panel (a) spans 1172 to 1176 cm−1; panels (b) and (c) zoom in on 1174 to
1175 cm−1. Legend: mean of all monitors, a monitor signal, peaks and valleys of mean.]
Figure 3.5: Calibration of a monitor signal to a valley of the mean.
The complete calibration of a monitor signal following the procedure shown in Fig-
ure 3.5 can be seen in Figure 3.6.
[Plot: intensity (counts) versus wavenumber (cm−1) from 1172 to 1176 cm−1. Legend: calibrated
monitor signal, mean of all monitors, a monitor signal, peaks and valleys of mean.]
Figure 3.6: A single monitor signal calibrated towards the mean of all monitor signals of
a sample.
The miscalibrated blue peak in Figure 3.6 around 1175.2 cm−1 is a consequence of the
minimum of the monitor signal between the surrounding peaks of the mean being located at
a higher wavenumber than this monitor peak rather than at a lower one. Since the indices
of the peaks and valleys found for the monitor signal are later sorted under the assumption
that peaks and valleys alternate, the small blue peak is misassigned as a valley and is
consequently placed at the valley of the mean at 1175.2 cm−1 instead of at the peak of the
mean at 1175.4 cm−1. This can be solved by comparing the indices of the monitor peaks with
those of the monitor valleys, and replacing the index of any monitor valley that is larger
than the index of the subsequent peak by the index of the minimum between that monitor
peak and the monitor peak before it.
Figure 3.7 demonstrates how the wavenumber calibration procedure calibrates ten
measurements of a sample towards each other. Five of the signals are from gas measure-
ments, and five are from reference measurements. Figure 3.7c and Figure 3.7d are the
wavenumber calibrated versions of Figure 3.7a and Figure 3.7b respectively.
[Plots, panels (a)-(d): intensity (counts) versus wavenumber (cm−1) from 1172 to 1176 cm−1.
Panel titles: (a) Uncalibrated monitor measurements, (b) Uncalibrated main measurements,
(c) Calibrated monitor measurements, (d) Calibrated main measurements.]
Figure 3.7: Ten monitor signals and their respective main signals shown uncalibrated
and calibrated. (a) Uncalibrated monitor signals. (b) Uncalibrated main signals. (c)
Calibrated monitor signals. (d) Calibrated main signals.
The calibration done this way does not align the signals perfectly, as can be seen
in Figure 3.7. The majority of the monitor peaks and valleys in Figure 3.7c are aligned
perfectly, and the slightly unaligned peaks can be attributed to the calibration method
only looking for a valley or peak of the signal between two peaks or valleys of the mean.
When the valley or peak of the signal is located exactly on a peak or valley of the mean,
the closest point to the peak is most likely the highest and will therefore be aligned as
a peak, giving a slight misalignment.
For each sample ten gas measurements and ten reference measurements are done,
giving a total of twenty monitor signals and twenty main signals. The drop in standard
deviation of these signals as a result of the calibration can be seen in Table 3.2. The
drop in standard deviation is calculated for signals from reference and gas measurements
separately, and for the collection of signals of both reference and gas measurements. The
numbers indicate the percentage drop from the uncalibrated signals to the calibrated
signals. From Table 3.2 it seems that the calibration works better for monitor signals
than it does for main signals. This can be attributed partly to the fact that the monitor
signals served as the basis of the calibration, and when looking at Figure 3.7 it does seem
that the calibration works better for the monitor signals. However, since each monitor and
its respective main signal experience the same wavenumber distortions from the QCLs,
and the exact transform used on each monitor signal is also used on its respective main
signal, the large difference in calibration result cannot entirely be attributed to the
basis of calibration. Rather, the standard deviation in the main signals is already
relatively higher than in the monitor signals, and the wavenumber uncertainty makes up a
smaller part of the total standard deviation of the main signals than of the monitor
signals. Thus calibrating for the wavenumber uncertainty decreases the total standard
deviation of the main signals by a smaller fraction.
Table 3.2: Decrease in standard deviation for the monitor and main signals as result of the
wavenumber calibration. This decrease in standard deviation is calculated separately for
just gas measurements, just reference measurements, and gas and reference measurements
together.
Signal type   Gas measurements (%)   Reference measurements (%)   Gas & reference measurements (%)
Main          7.04                   6.50                         6.46
Monitor       16.6                   17.0                         18.4
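A percentage drop of the kind listed in Table 3.2 could be computed along the following lines. The text does not state how the standard deviation of a set of signals is reduced to a single number, so this sketch assumes, purely for illustration, that the standard deviation across signals is taken per wavenumber and then averaged over all wavenumbers. The function names are hypothetical.

```python
from statistics import mean, pstdev


def mean_spread(signals):
    """Average over wavenumber of the standard deviation across signals.

    signals: list of equally long lists, one per measurement.
    Assumption: this averaging is an illustrative choice, not the thesis's
    documented definition.
    """
    return mean(pstdev(column) for column in zip(*signals))


def percent_drop(before, after):
    """Percentage decrease in spread from uncalibrated to calibrated signals."""
    s_before = mean_spread(before)
    return 100.0 * (s_before - mean_spread(after)) / s_before
```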
It is also interesting to note that the standard deviation of the main signals improves
less when both reference and gas measurements are taken into account as opposed to only
one type of measurement. This can be explained because the gas and reference measure-
ments of the main signal should be different due to the absorption of the main signal
by the gas in the gas measurement. Because the gas and reference measurements are so
different, the total standard deviation when taking both gas and reference measurements
into account is greater than that of the separate measurements, and hence the standard
deviation caused by the wavenumber uncertainty is a smaller part.
After aligning all these signals as explained in subsection 3.1.4, ten absorbances, each
from a combination of the four signals from Table 3.1, are calculated according to
Equation 3.1, Equation 3.2, and Equation 3.3. However, since the wavenumbers of the
various measurements are now calibrated,
\[ \nu = \nu', \tag{3.4} \]
and thus the absorbance as in Equation 3.3 becomes
\[ A(\nu, C) = -\log_{10}\left[\frac{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)\, f_{\mathrm{gas}}(\nu, C)}{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)}\right] = -\log_{10}\left[f_{\mathrm{gas}}(\nu, C)\right]. \tag{3.5} \]
The standard deviation of these ten absorbances decreases from 0.0313 before calibra-
tion to 0.0291 after calibration, a decrease of 7.06%. For a large part of the absorbance
spectrum this is still a relatively large standard deviation as seen in Figure 3.8, where
the mean absorbance found this way for one sample is given.
[Plot, titled "Absorbance calibrated by wavenumber": absorbance versus wavenumber (cm−1)
from 850 to 1250 cm−1.]
Figure 3.8: The wavenumber calibrated absorbance as calculated from the measurements
of a single sample.
3.2 CO2 and H2O calibration
When looking at the absorbance spectrum of a breath sample as in Figure 3.8, there are
a few regions that stand out by their relatively high absorbance. From Equation 3.1 it
can be seen that high molar absorptivities ε(ν) in a region and high concentrations of
molecules with such molar absorptivities contribute to a high local absorbance. When
checking for molecules known to have relatively high concentrations in breath, such as
CO2 with about 4% per volume, H2O with about 5% per volume, and N2 with about
78% per volume, a cause of the high absorbance regions is easily identified, as seen in
Figure 3.9. The molar absorptivity spectrum overlays of CO2 and H2O in Figure 3.9 are
scaled with factors 60 and 5 · 10^4 respectively to clearly show the matching regions. CO2
and H2O molar absorptivities are taken from the HITRAN database[9].
[Plot, titled "Absorbance with overlaid CO2 and H2O molar absorptivity": absorbance (base-10)
versus wavenumber (cm−1) from 850 to 1250 cm−1. Legend: experimental data, CO2, H2O.]
Figure 3.9: Absorbance from wavenumber calibrated data overlaid with spectra of CO2
and H2O from the HITRAN database[9].
When looking more closely at these matching regions between the absorbance spec-
trum and CO2 and H2O, it is seen in Figure 3.10 that even after the wavenumber calibra-
tion the absorbance spectrum does not completely match CO2 and H2O from literature.
[Plot, titled "Closer look at absorbance with CO2 and H2O overlaid": absorbance (base-10)
versus wavenumber (cm−1) from 1076 to 1083 cm−1. Legend: experimental data, CO2, H2O.]
Figure 3.10: Absorbance from wavenumber calibrated data overlaid with spectra of CO2
and H2O from the HITRAN database[9].
A method similar to the wavenumber calibration is used in order to calibrate the
absorbance spectrum to the database spectra of CO2 and H2O. The difference is that
now only the peaks are matched to CO2 and H2O, and all data points in between,
including valleys, are linearly interpolated between the matched peaks.
3.2.1 Method for calibration to CO2 and H2O
Input and preprocessing
This method uses the absorbance spectrum as obtained from the wavenumber calibration,
the accompanying wavenumber range, and the molar absorptivity spectra of CO2 and H2O.
Since the methods used for the calibration to CO2 and to H2O are almost identical, only
the CO2 case is described here. The CO2 spectrum is taken from the HITRAN database[9]
in the range of 832 cm−1 to 1263 cm−1 with a resolution of 0.05 cm−1. The CO2 spectrum is
scaled by a factor 60 so that the height of its peaks approximately matches that of the
experimental data.
CO2 peak and absorbance peak identification
To facilitate the calibration of the experimental data to the CO2 spectrum from the
HITRAN database, the peaks and valleys of the CO2 database spectrum serve as refer-
ence points. The peaks and valleys of the CO2 database spectrum are identified using
a peakfinder[15] algorithm. The peakfinder algorithm finds peaks above an adjustable
threshold, based on the value of each data point compared to its surrounding data points,
and returns the indices and intensities of all data points that it identifies. If the
first peak lies before the first identified valley, or the last peak lies after the last
identified valley, that peak is removed. This way the range to adjust always starts and
ends with a valley of the CO2 database spectrum.
Now that all peaks and valleys of the CO2 database spectrum are found, they are
used in order to find the absorbance peaks and valleys of the experimental data. The first
absorbance peak is identified by taking the maximum absorbance between the first two
valleys of the CO2 database spectrum. The neighbouring absorbance valley is identified
by taking the minimum between the first absorbance peak and the next peak of the
CO2 database spectrum. The next valleys/peaks of the experimental data are found by
looking between the last peak/valley found from the experimental data and the next
peak/valley from CO2 of the database, alternating between finding a peak and a valley.
This process is repeated to the last identified valley of the database CO2 spectrum. This
way all absorbance valleys are found between two peaks, and all absorbance peaks are
found between two valleys. Thus the same number of absorbance peaks and valleys is
identified from the experimental data as there are peaks and valleys in the CO2 database
spectrum, making it easy to match them (see footnote 1).
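The alternating search described above can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function name match_extrema is hypothetical, the database peak and valley indices are assumed to be given already (for instance by a peak finder), and they are assumed to alternate strictly, starting and ending with a valley.

```python
def argmax_range(y, lo, hi):
    """Index of the maximum of y on the closed index range [lo, hi]."""
    return max(range(lo, hi + 1), key=lambda i: y[i])


def argmin_range(y, lo, hi):
    """Index of the minimum of y on the closed index range [lo, hi]."""
    return min(range(lo, hi + 1), key=lambda i: y[i])


def match_extrema(absorbance, db_peaks, db_valleys):
    """Find absorbance peaks and valleys guided by database extrema.

    Assumes db_valleys[0] < db_peaks[0] < db_valleys[1] < ... < db_valleys[-1],
    so that len(db_valleys) == len(db_peaks) + 1.
    """
    # First peak: maximum between the first two database valleys.
    peaks = [argmax_range(absorbance, db_valleys[0], db_valleys[1])]
    valleys = []
    for k in range(1, len(db_peaks)):
        # Valley: minimum between the last found peak and the next database peak.
        valleys.append(argmin_range(absorbance, peaks[-1], db_peaks[k]))
        # Peak: maximum between the last found valley and the next database valley.
        peaks.append(argmax_range(absorbance, valleys[-1], db_valleys[k + 1]))
    return peaks, valleys
```

Under these assumptions the sketch returns as many absorbance peaks as there are database peaks, and two fewer absorbance valleys than database valleys.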
Peak calibration
All the peaks and valleys of the CO2 database spectrum are now put in a single list and
ordered by index. The same is done for the peaks and valleys of the experimental data.
The peaks in the experimental data list are then shifted to coincide with the peaks of
CO2, and all the measurement data between two peaks are interpolated to fit between the
surrounding CO2 peaks. The same procedure is applied to H2O. The result of these
calibrations is seen in Figure 3.11.
3.2.2 CO2 and H2O calibration results
Except for the sides of the CO2 and H2O spectra where the peak heights are around the
noise level, nearly all CO2 and H2O peaks seem to be correctly calibrated. This is also
indicated by Figure 3.11, though some peculiarities are worth mentioning.
Around 1081 cm−1 the measured absorbance has no peak, yet according to the CO2
spectrum it should have one. One reason for the absence of an absorbance peak could be
that the calibration algorithm miscalibrated, and the next absorbance peak at 1082.3 cm−1
should actually be at 1081 cm−1. This can be rejected on the basis that the spacing
of the other absorbance peaks seems to match well with the spacing of the CO2 peaks,
even before calibration, as seen in Figure 3.10. The question remains, assuming that no
miscalibration took place around 1081 cm−1, why there is no peak when a peak would
be expected from looking at the CO2 database spectrum. This might be explained by
noise; however, when looking at the monitor and main signals that make up the measured
absorbance, the noise is not any higher than at other places.
[Plot, titled "Closer look at absorbance calibrated to CO2 and H2O": absorbance (base-10)
versus wavenumber (cm−1) from 1076 to 1083 cm−1. Legend: experimental data adjusted for
CO2 and H2O, CO2, H2O.]
Figure 3.11: Absorbance calibrated to CO2 and H2O overlaid with spectra of CO2 and
H2O from the HITRAN database[9].
Footnote 1: Since there is one more CO2 valley than there are CO2 peaks, the number of
absorbance peaks found matches the number of CO2 peaks. However, two fewer absorbance
valleys are found than there are CO2 valleys. Hence the first and last CO2 valleys are removed.
A related peculiarity, namely the height of the measured absorbance peaks not match-
ing with the heights of the CO2 peaks in general, is also at first ascribed to noise. However,
when comparing absorbances of multiple different samples as in Figure 3.12, it can be
seen that the height of each peak is about equal between samples. This indicates that the
difference in height of the various measured absorbance peaks to the CO2 peaks is not a
matter of noise, but rather a consistent wavenumber dependent influence from a part of
the setup. A source for these mismatching absorbances has not yet been pinpointed.
One event of miscalibration can still be argued to have taken place. In Figure 3.10
the closest thing to an absorbance peak near the CO2 database peak at 1081.1 cm−1 is
the peak located at 1081.8 cm−1, which is not calibrated to the CO2 database peak, as
seen in Figure 3.11. From Figure 3.10 it can be seen that this small peak was not detected
and calibrated because it is located outside the region where the search was performed,
which runs from the valley in the absorbance spectrum at 1081.0 cm−1 to the valley in the
CO2 spectrum at 1081.6 cm−1. This might be solved by enlarging the search region, though
this could cause problems for other peaks. It can also be argued that, since the small
peak at 1081.8 cm−1 is actually located closer to the CO2 peak at 1082.3 cm−1 than to the
peak at 1081.1 cm−1, it is not necessarily better to match it to the peak at 1081.1 cm−1.
Nor can it be said whether the absorbance peak that would be expected near the small
peak would lie at the same position, so no conclusion can be drawn on whether that
expected peak would likewise have missed calibration to its corresponding CO2 peak at
1081.1 cm−1.
[Plot, titled "Comparison of height of CO2 peaks to measured samples": absorbance (base-10)
versus wavenumber (cm−1) from 1076 to 1083 cm−1. Legend: three experimental data traces,
CO2 spectrum.]
Figure 3.12: Measured absorbances of three different samples calibrated to CO2 and
H2O. The molar absorptivity overlay of CO2 is scaled by 60. CO2 molar absorptivity is
taken from the HITRAN database[9].
4. Molecules and Concentrations
In this chapter a list of molecules is assembled. This list contains the molecules that are
likely present in breath. The list is assembled in such a way to find molecules with con-
centration differences between the healthy volunteers and the asthmatic volunteers that
contributed their breath for this research. Seventy breath samples of healthy volunteers
and seventy breath samples of asthmatic volunteers are used. The absorbance spectra
of these samples were calibrated according to chapter 3. In this chapter the calibrated
spectra are checked for the concentrations of the molecules in the list. Lastly, a specific
method for acquiring a more accurate concentration for ethanol is described.
4.1 List of molecules
To check the absorbance spectra for concentrations, a list of molecules with known molar
absorptivities is assembled. The initial list consists of all molecules with known molar
absorptivities from the PNNL[8] database and the HITRAN[9] database. Currently these
databases provide information on a total of 456 different molecules.
From the 456 molecules available in the PNNL and HITRAN databases it is necessary
to select those likely present in breath. Not all of the molecules are expected to be present
in breath, as a lot of them are neither organic nor known to be in the atmosphere in an
amount detectable by the current setup. Seeing that the absorbance still has uncertainty
after the calibration, the concentrations determined will have a related uncertainty. As
a result the concentrations of non-present molecules will not necessarily be set to zero,
and will therefore influence the concentrations determined for molecules that are present
in the gas sample. Thus before determining the concentrations it is vital to get a list of
molecules which excludes these excessive molecules. In order to distinguish between the
healthy and asthmatic groups, the list should also contain molecules likely to have large
differences in concentration between the groups.
The list is assembled in two steps. The first step is determining the molecules that have
been previously reported to be present in breath, the standard molecules. The second
step is adding those molecules likely to show different concentrations between the samples
of the healthy and the asthmatic volunteers. When one is not interested in finding
differences between data sets, the list of standard molecules should be extended
in another way. This can be done by selecting molecules based on their absorptivity and
a set minimal absorbance.
4.1.1 Standard molecules
This first group of molecules is a combination of atmospheric molecules[16] and molecules
known to be in breath[17] with high absorbance in the relevant wavenumber region of
832 cm−1 to 1263 cm−1. Molecules such as CO2, H2O, ethanol, and acetone are the major
contributors to the absorbance spectrum of breath in the relevant wavenumber region.
These molecules must therefore be included in the list regardless of whether they are
likely to have differences in concentration between the healthy and the asthmatic group.
The list of molecules chosen this way can be found in Table 4.1.
Table 4.1: List of standard molecules used.
Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water
4.1.2 Molecules selected to distinguish between data sets
To determine whether a gas was exhaled by a healthy person or by someone with asthma,
a recognizable difference in measurement results is desirable. P-values are used to quan-
tify this difference between the measurements from healthy people and asthmatic people.
Since p-values are often misunderstood, an explanation of how p-values are calculated
and the significance of p-values to this research is given.
P-values
The p-value is defined as the probability of obtaining a result at least as extreme as
the one observed, under the assumption that a certain hypothesis, the null hypothesis, is
true. In the context of these breath measurements the null hypothesis is that being
asthmatic has no effect on the concentrations of molecules, and gives absorption profiles
identical to those of being healthy. For each wavenumber the combined absorbances of
multiple samples of healthy and asthmatic people are thereby expected to be two groups of
points spread around the same absorbance value. A normal distribution is assigned to both
the asthmatic and the healthy samples, and the mean and standard deviation of both are
calculated. Now, assuming the null hypothesis is true, the mean and standard deviation of
the asthmatic group should be the same as those of the healthy group. The p-value per
wavenumber is found by calculating the chance that the absorbances of the asthmatic
samples take the values they do, assuming the null hypothesis, which claims that the
asthmatic samples follow the distribution of the healthy samples.
Assume the null hypothesis is not necessarily true, but rather that the alternative—
having asthma does give a different absorbance spectrum—could also be true. The
preferred hypothesis is the one for which the measured absorbance is most likely to
occur. Although no p-values are calculated for the alternative hypothesis, a small p-
value for the null hypothesis is still used as an indicator that samples of asthmatic and
healthy patients are inherently different. The p-values between the 70 absorbances of
healthy patients on the one hand and the 70 absorbances from asthmatic patients on the
other hand are obtained this way for every wavenumber. This is done using an algorithm
from K. Brand[18] that uses the topTable function from the LIMMA package in R.
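To make the idea of a p-value per wavenumber concrete, the sketch below computes a simple two-sided p-value from the means and variances of the two groups. It is explicitly not the method used here: the thesis relies on an algorithm built on the topTable function of the LIMMA package in R, whereas this example swaps in a plain two-sample z-test with a normal approximation, which is only reasonable for fairly large groups. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sided_p(healthy, asthmatic):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumption: a z-test stand-in for illustration, not the moderated
    statistics computed by LIMMA. Both groups need a non-zero variance.
    """
    se = sqrt(variance(healthy) / len(healthy)
              + variance(asthmatic) / len(asthmatic))
    z = (mean(asthmatic) - mean(healthy)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def p_values_per_wavenumber(healthy_spectra, asthmatic_spectra):
    """Apply two_sided_p at every wavenumber index of two sets of spectra."""
    n_points = len(healthy_spectra[0])
    return [
        two_sided_p([s[k] for s in healthy_spectra],
                    [s[k] for s in asthmatic_spectra])
        for k in range(n_points)
    ]
```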
P-region analysis
In p-region analysis the compounds most likely to have different concentrations between
the two data sets are sought. This is done by checking the p-values corresponding to the
two data sets and finding regions of consecutive low p-values. The reasoning behind this
is as follows: organic molecules such as the ones expected in human breath are larger
and more complicated than for example H2O and CO2, and thereby have a spectrum
consisting of broad smooth profiles. If the concentration of such a molecule differs greatly
from one data set to the other, this would be clear by low p-values in a broad band. Thus
to find these molecules, the p-values corresponding to two data sets need to be below a
certain threshold, for a certain region of consecutive wavenumber values. For these so-
called p-regions, a list of compounds is checked for their absorptivity. Compounds with
an average absorptivity above a certain threshold within any of these regions are likely
candidates for the difference in the measured intensity between two sample groups, and
are listed to be checked for their concentration in the data set. This list includes not only
the molecules, but also in which wavenumber regions they are present. Two p-regions
are found using a minimum of 10 consecutive data points with p-values of 0.05 or lower,
one located between 1181.80 cm−1 and 1182.55 cm−1 and another from 1261.40 cm−1 to
1262.05 cm−1. The complete set of compounds from the HITRAN and PNNL databases
is checked for an average molar absorptivity above 10^−4 ppm−1 m−1 in these regions. A
table with the regions and a list of molecules found this way can be seen in Appendix A.2.
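The search for p-regions and the subsequent absorptivity check can be sketched as below. The thresholds are the ones quoted in the text (p-values of 0.05 or lower, at least 10 consecutive points); the function names and the data layout are assumptions made for this illustration.

```python
def find_p_regions(wavenumbers, p_values, threshold=0.05, min_points=10):
    """Return (start, end) wavenumber pairs of runs with p <= threshold.

    Only runs of at least min_points consecutive data points are kept.
    """
    regions = []
    start = None
    for i, p in enumerate(p_values):
        if p <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_points:
                regions.append((wavenumbers[start], wavenumbers[i - 1]))
            start = None
    # A run may extend to the very end of the spectrum.
    if start is not None and len(p_values) - start >= min_points:
        regions.append((wavenumbers[start], wavenumbers[-1]))
    return regions


def mean_absorptivity(wavenumbers, absorptivity, region):
    """Average molar absorptivity of one compound inside a p-region."""
    lo, hi = region
    inside = [e for w, e in zip(wavenumbers, absorptivity) if lo <= w <= hi]
    return sum(inside) / len(inside)
```

A compound would then be listed as a candidate when mean_absorptivity exceeds the chosen limit in at least one of the regions returned by find_p_regions.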
The p-values between the absorbance spectra of the healthy and the asthmatic patients
vary over narrow wavenumber regions, as can be seen in Figure 4.1. The red line
in Figure 4.1 indicates the upper limit of 0.05 below which p-values are selected to
indicate a significant difference in absorbance between healthy and asthmatic. The
p-region from 1181.80 cm−1 to 1182.55 cm−1 can be identified using the red line. The
p-values vary over wavenumber regions that are narrow compared to the width of absorbance
regions as expected from molecules. This is explained by the mean-to-standard deviation
ratio of 0.7707 (see footnote 1) over the wavenumber range of ten absorbance spectra, as
calculated from a single gas sample of a healthy person. Even though the p-values are
calculated between two groups of 70 samples each, the large variations in p-values as
observed in Figure 4.1 seem to indicate that either the samples within each group do not
have much in common with each other, or that the noise on the signal is too large to make
out significant differences between the two groups. Hence it seems that the p-values might
not be accurate indicators of subtle differences in concentrations of molecules between
the healthy and asthmatic groups when using measurements with a large uncertainty.
Regardless of this, relevant candidate molecules have been found using this method. An
example is Freon-114, which is identified as being present in both p-regions and is used
as a propellant in metered-dose inhalers, which are the most commonly used delivery
systems for treating asthma[19].
[Plot, titled "P-values of absorbance spectra of 70 healthy vs. 70 asthmatic patients":
p-value (0 to 1) versus wavenumber (cm−1) from 1172 to 1182 cm−1. Legend: p-values of
healthy vs. asthmatic, p-region selection threshold.]
Figure 4.1: P-values as calculated between 70 absorbance spectra of healthy patients and
70 absorbance spectra of asthmatic patients.
Footnote 1: The mean-to-standard deviation ratio over the entire spectrum of one sample is
0.7707. It is worth mentioning that the mean-to-standard deviation ratio in the regions
with high absorbance of CO2 and H2O is lower. The mean-to-standard deviation ratio can
also vary considerably over the different measurements.
4.1.3 Selection based on measurable concentration
The list of molecules can be shortened by checking their absorbances. Since molecules
with a very low absorbance in the wavenumber region of interest will not be accurately
estimated at the current noise level, these molecules can be eliminated. It is known from
Z. Hou[3, p. 71] that the sensitivity of the system is on the order of 1 ppmv, and from A.
Reyes Reyes[5] that the sensitivity can even reach the order of hundreds of ppbv. This
sensitivity is then used to establish a lower limit on the concentration for all molecules.
In this case the chosen limit is even lower, at 0.001 ppmv, to make sure no molecules are
eliminated unnecessarily. Absorbances are calculated for all molecules from this limit in
concentration and their molar absorptivities.
From an initial concentration estimation using only the standard molecules, a lower
limit for the absorbance is set to 50 · 10^−7. For all molecules the absorbance calculated
using the lower limit of concentration is then checked against this lower limit of
absorbance. Molecules whose absorbance does not exceed the lower limit of absorbance at
any point are dismissed, and the other molecules are put on the list and further checked
for their concentrations. A list of 199 molecules that are reported to be present in
breath is checked this way[17]. When the entire spectrum is checked, some 130 molecules
pass. Adapting this method to the wavenumber range of 1020 cm−1 to 1100 cm−1, which is the
major absorptivity region for ethanol, some 69 molecules pass, as listed in Appendix B.1.
This list is used for determining the concentration of ethanol, as further explained in
section 4.3.
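The screening step above amounts to a single comparison per molecule. The sketch below assumes molar absorptivities in ppm−1 m−1 and a path length in metres, so that absorptivity times concentration times path length is a dimensionless absorbance. The function name and argument layout are hypothetical, and the path length has to be supplied by the caller since it is not restated in this section.

```python
def passes_screen(wavenumbers, absorptivity, path_length_m,
                  conc_ppmv=0.001, min_absorbance=50e-7, region=None):
    """Check whether a molecule could exceed the absorbance limit anywhere.

    wavenumbers, absorptivity: equally long lists for one molecule.
    region: optional (low, high) wavenumber pair restricting the check,
            for example (1020, 1100) for the ethanol region.
    """
    if region is None:
        selected = list(absorptivity)
    else:
        lo, hi = region
        selected = [e for w, e in zip(wavenumbers, absorptivity)
                    if lo <= w <= hi]
    if not selected:
        return False
    # Peak absorbance at the lower concentration limit: A = eps * c * l.
    peak_absorbance = max(selected) * conc_ppmv * path_length_m
    return peak_absorbance > min_absorbance
```

Restricting the check with region makes the screen stricter, since a molecule that only absorbs outside the region of interest no longer passes.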
Since the absorptivity of a molecule that is present might be lower than that of an
absent molecule, there is no right choice for the absorbance and concentration limit,
and the selection based on these limits ends up being a trade-off. The limits can be
chosen stricter with the risk of excluding molecules that are present in the gas and
thereby decreasing the accuracy of the determined concentration, but with the benefit of
excluding molecules that are not actually present in the gas, and thereby increasing the
accuracy of the concentration determined.
Since the method for finding differences in concentration between the healthy and
asthmatic group filters on minimum absorbance and more specific wavenumber ranges,
it is stricter and therefore preferred when possible.
4.2 Concentration determination
4.2.1 Method
Henceforth only the list of molecules found in section 4.1 is used to determine the
concentrations. At this stage the strictness of the list can be adjusted by only allowing
molecules present in a certain number or more of the previously determined regions of low
p-values to be processed further. In order to process the molecules further and compare
them to the measured data, the information on the molecules from the databases is first
transcribed to match the format of the measured data, i.e. in wavenumber range and
resolution. This comes down to removing all the information outside of the relevant
wavenumber region, and matching the wavenumber spacing used in the measured data
through interpolation. Furthermore, the measured data and molecules need to be in
the same physical quantity, for which in this case absorbance is the most useful. The
aggregated absorbance of the gas is determined by the terms of Equation 2.2 and for a
single molecule by Equation 4.1
\[ A(\nu, c_{\mathrm{mol}}) = \varepsilon_{\mathrm{mol}}(\nu)\, c_{\mathrm{mol}}\, l_{\mathrm{path}}, \tag{4.1} \]
where the absorbance A(ν, c_mol) is determined by the concentration c_mol, the optical
path length of the laser in the gas l_path, and the molar absorptivity ε_mol. Since the
path length is measured from the setup and the molar absorptivity of the molecules is
obtained from the databases, the absorbance is left as a function of the concentration.
All calculations are done over the set wavenumber region of 832 cm−1 to 1263 cm−1 as
determined by the range of the QCL. The concentrations are determined through what is
essentially a curve fitting problem, as the absorbances of the molecules can be laid over
the measured absorbance and then best fitted to match by tweaking the concentrations of
the molecules. This curve-fitting problem can be expressed as a non-linear least squares
problem as in Equation 4.2:
\[ r(\nu, C) = A_{\mathrm{spectrum}}(\nu) - \sum_{i=1}^{n} \varepsilon_i(\nu)\, c_i\, l_{\mathrm{path}} \tag{4.2} \]
where the first term on the right-hand side is the measured spectrum A_spectrum(ν) and
the sum is the spectrum to be fitted to it. Now take
\[ S(C) = \sum_{\nu = 832\,\mathrm{cm}^{-1}}^{1263\,\mathrm{cm}^{-1}} r(\nu, C)^2, \tag{4.3} \]
where r is a residue quantity, and S is the quantity to be minimized. The spacing for the
sum is 0.05 cm−1. The square of the residue quantity r is taken so that larger residues per
wavenumber influence S more heavily, and are therefore avoided by the minimization
algorithm. This non-linear least squares problem is solved using the Levenberg–Marquardt
algorithm[20] which tweaks the concentrations from a given set of initial concentrations.
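The minimization of S can be illustrated with a small self-contained sketch. It deliberately swaps in a different technique from the one used in this work: because the residue in Equation 4.2 depends linearly on the concentrations, the example solves the normal equations of the least-squares problem directly instead of iterating with the Levenberg–Marquardt algorithm from initial concentrations. The function names are hypothetical, no non-negativity constraint is applied, and the molecule spectra are assumed to be linearly independent.

```python
def sum_of_squares(measured, absorptivities, concentrations, path_length_m):
    """S(C): sum over wavenumber of the squared residue r(nu, C)."""
    total = 0.0
    for k, a in enumerate(measured):
        model = sum(eps[k] * c * path_length_m
                    for eps, c in zip(absorptivities, concentrations))
        total += (a - model) ** 2
    return total


def fit_concentrations(measured, absorptivities, path_length_m):
    """Concentrations minimizing S(C), found via the normal equations.

    measured: measured absorbance per wavenumber.
    absorptivities: one list per molecule, on the same wavenumber grid.
    """
    n = len(absorptivities)
    # Basis spectra: absorbance per unit concentration for each molecule.
    basis = [[e * path_length_m for e in eps] for eps in absorptivities]
    # Augmented matrix of the normal equations (B^T B) c = B^T A.
    m = [[sum(bi * bj for bi, bj in zip(basis[i], basis[j])) for j in range(n)]
         + [sum(bi * a for bi, a in zip(basis[i], measured))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]
```

For a noise-free synthetic spectrum built from known concentrations, fit_concentrations recovers those concentrations and sum_of_squares evaluates to approximately zero at the solution.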
4.2.2 Input and results
The method described in the previous subsection is applied to the molecules found with
the p-region analysis, where all molecules present in one or more regions are checked
for their concentration against the measured absorbance. The initial concentrations of
Table 4.2 are used.
Table 4.2: Table of initial concentrations used to calculate the concentrations as given in
Appendix A.3.
Molecule             Concentration (ppmv)
Carbon dioxide       1000
Water                1000
Standard molecules   1
Other molecules      0.01
The Levenberg-Marquardt algorithm is implemented through the lsqnonlin function in
MATLAB, with its options argument as: options = optimset('Display','off','TolFun',1e-15).
The absorbances of six molecules with some of the highest concentrations are plotted
over the measured absorbance in Figure 4.2. These are the molecules benzene, carbon
dioxide, ethane, ethyl alcohol, ethylene, and water, all of which are also reported to be
present in breath by B. de Lacy Costello[17]. Concentrations found for all molecules for
this sample are given in Appendix A.3.
[Plot, titled "Measured absorbance spectrum overlaid with molecule absorbances": absorbance
(base-10) versus wavenumber (cm−1) from 800 to 1300 cm−1. Legend: absorbance, benzene,
carbon dioxide, ethane, ethyl alcohol, ethylene, water.]
Figure 4.2: Measured absorbance of breath of a healthy patient. The measured absorbance
is overlaid with absorbances of molecules with some of the highest concentrations.
4.3 Ethanol estimation by CO2 removal
4.3.1 Motivation for CO2 removal
As discussed in subsection 3.2.2, the peaks of the measured absorbance and the peaks
of the absorbance of CO2 from the calculated concentrations do not match in height as
seen in Figure 4.3.
[Figure 4.3 plot: "Closer look at the measured absorbance and calculated CO2 and ethanol". Horizontal axis: wavenumber (cm−1), 1076 to 1084. Vertical axis: absorbance (base-10), 0 to 2.5 ×10−1. Legend: Experimental data, Carbon dioxide, Ethyl alcohol, Water.]
Figure 4.3: Measured absorbance of the breath of a healthy patient. The absorbance is
overlaid with the spectra of water, carbon dioxide, and ethyl alcohol. This figure shows
how the measured carbon dioxide peaks are far from the calculated ones.
This difference in height of the CO2 peaks and the measured peaks also affects the
estimation of concentration for the other major molecule present in this region, namely
ethanol. Hence it could prove useful to remove CO2 and its uncertainties in order to
get a better estimate of the ethanol concentration, much in the same way as A. Reyes
Reyes[21] did for acetone.
4.3.2 Implementation of CO2 removal for ethanol estimation
The method described in section 4.2 is used to determine the concentrations of the
molecules found using subsection 4.1.1 and subsection 4.1.3. The calculated absorbance
contributions of all molecules except for that of CO2 are subtracted from the measured
absorbance, as seen in Figure 4.4b. This is done to single out the absorbance contribution
of CO2 as best as possible, so it can more easily be removed. All that is left is noise around
zero absorbance and the absorbance contribution of CO2. The peaks from CO2 are now
identified using a peak detection algorithm, and then fitted by normal distributions. This
way the actual measured CO2 peaks are fitted including any deviations they have from
the CO2 spectrum according to the database. The fits are then subtracted from the
signal, and all that is left is the difference between the measured absorbance and the
absorbance contributions of all molecules as calculated using section 4.2, which is mainly
noise around zero absorbance as seen in Figure 4.4c. The contributions of all molecules
other than CO2 are now summed back on this in order to get the full measurement back
minus any absorbance by CO2, as shown in Figure 4.4d. From this absorbance spectrum
the concentrations are once again determined using the method described in section 4.2.
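The removal steps described above can be sketched as follows, on synthetic data. This is a Python illustration under stated assumptions, not the code used in this work: the peak detector is a plain local-maximum search, and the "fit" is a three-point estimate that uses the fact that the logarithm of a Gaussian is a parabola, standing in for a proper least-squares fit. All function names and the synthetic spectra are hypothetical.

```python
import math

def gaussian(x, amp, mu, sigma):
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def find_peaks(y, threshold):
    """Indices of local maxima above a threshold (deliberately simple)."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > threshold and y[i] >= y[i - 1] and y[i] > y[i + 1]]

def fit_gaussian_3pt(x, y, i):
    """Estimate (amp, mu, sigma) from the peak sample and its two neighbours.

    Assumes uniform spacing and strictly positive values at the three points.
    """
    h = x[i + 1] - x[i]
    la, lb, lc = math.log(y[i - 1]), math.log(y[i]), math.log(y[i + 1])
    curvature = la - 2.0 * lb + lc           # negative at a maximum
    offset = 0.5 * (la - lc) / curvature     # vertex position in units of h
    mu = x[i] + offset * h
    sigma = h / math.sqrt(-curvature)
    amp = math.exp(lb - 0.125 * (la - lc) ** 2 / curvature)
    return amp, mu, sigma

def remove_peaks(x, measured, other, threshold):
    """Return the measurement with the detected peaks removed, plus the fits."""
    # Isolate the CO2 contribution: measurement minus all other fitted molecules.
    co2_only = [m - o for m, o in zip(measured, other)]
    # Detect the peaks and describe each one by a Gaussian.
    fits = [fit_gaussian_3pt(x, co2_only, i)
            for i in find_peaks(co2_only, threshold)]
    # Subtract the fitted peaks; what remains is mostly noise around zero.
    leftover = [c - sum(gaussian(xi, *f) for f in fits)
                for xi, c in zip(x, co2_only)]
    # Add the other molecules back: the measurement without CO2.
    return [l + o for l, o in zip(leftover, other)], fits

# Synthetic example: two narrow CO2-like peaks on a broad ethanol-like band.
x = [1070.0 + 0.05 * k for k in range(281)]           # 1070 to 1084 cm^-1
co2 = [gaussian(xi, 0.20, 1076.0, 0.10) + gaussian(xi, 0.15, 1080.0, 0.10)
       for xi in x]
other = [gaussian(xi, 0.05, 1078.0, 6.0) for xi in x]
measured = [c + o for c, o in zip(co2, other)]
cleaned, fits = remove_peaks(x, measured, other, threshold=0.01)
```

Because the peaks are fitted to the data rather than taken from the database, any deviation of the measured peak heights from the database values is removed together with the peak, which is the point of the procedure. On real, noisy data the three-point estimate would be too fragile and a least-squares fit over each peak would be the safer choice.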
The newly calculated ethanol concentration is 2% lower than that calculated before
the use of this method. Since this method of ethanol estimation has not been tested on
any samples for which the ethanol concentration is known beforehand, little can be
said about its accuracy. The same method has, however, been applied successfully
by removing water peaks to determine the acetone concentration in the region 1150 cm−1
to 1250 cm−1[21, 22].
[Figure 4.4 plot: "Step by step removal of CO2 peaks", four panels over 1020 to 1100 cm−1 with vertical axis absorbance (base-10, ×10−1). (a) Experimental data. (b) Experimental data minus database molecules, with CO2. (c) Experimental data minus database molecules, removed CO2. (d) Experimental data with database molecules, removed CO2.]
Figure 4.4: Step by step removal of the CO2 peaks.
5. Disease prediction by machine learning
For the breath samples measured, an analytical study based on p-values was performed
to predict and to categorize the asthmatic and healthy samples[18]. This study did
not yield definitive results for prediction. Multiple articles in the literature propose that
methods more sophisticated than p-values alone could give better results[23][24].
Both finding the molecules that differentiate between healthy and non-healthy subjects
and predicting the diseases of a patient are mentioned as possibilities. Since neither of
these sources applies machine learning to actually predict disease, and the application of
machine learning to breath data is not widespread[17], an attempt to do so
is made here. The analysis is performed on 70 absorbance spectra from healthy patients and
70 absorbance spectra from asthmatic patients.
5.1 Machine learning for classification
Algorithms performing classification by machine learning look for patterns in the data of
the different samples. Because for the current set of samples it is known whether they
originate from healthy or asthmatic patients, supervised learning methods are applied.
This means the machine learning algorithm is trained to recognize patterns of healthy
and asthmatic samples by feeding it a part of all samples. In this case 98 of 140 samples
are used to train the algorithms and the other 42 to test the accuracy of the algorithm.
5.2 Classification by trial-and-error
Since the scope of this project does not allow for a study of the details of machine
learning and the construction of an algorithm specifically made for these datasets, existing
algorithms are used. Based on an extensive study of which kinds of algorithms are most
useful in the field of breath analysis[23], the focus initially lies on support vector machines
(SVMs). Use is made of the website mlcomp.org, which was made specifically for
uploading datasets, uploading machine learning algorithms, and running these algorithms
on the datasets[25]. This website expresses the rate of success in a single number, the
error, which is the number of falsely classified samples divided by the total number of
samples. Since most algorithms run in less than 5 minutes and they can run
simultaneously, many different algorithms can be tested in a short amount of time. Aside
from the complete set of absorbances of 140 samples over the wavenumber range of 832
to 1263 cm−1, various subsets of this are tested as well. Among these are the set of 36
wavenumbers for which the p-values are below 0.005, the set of concentrations of 167
compounds as derived from the 140 absorbances, and the set of 30 compounds with the
highest concentrations as derived from the 140 absorbances.
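The error defined above is a simple ratio. As a minimal Python sketch (the function name and the six toy labels are illustrative assumptions, not output of mlcomp.org):

```python
def error_rate(predicted, actual):
    """Number of falsely classified samples divided by the number of samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

# Toy example: six test samples, two of which are classified wrongly.
actual = ["healthy"] * 3 + ["asthmatic"] * 3
predicted = ["healthy", "healthy", "asthmatic",
             "asthmatic", "asthmatic", "healthy"]
err = error_rate(predicted, actual)
```

An error of 0 means every sample is classified correctly, an error of 1 means none is, and for two equally sized classes a random guess is expected to give about 0.5.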
5.2.1 Basic explanation of some algorithms used
The basic concept of some of the better working algorithms as seen in Table 5.1 is
explained here. Commonly, these algorithms reduce each spectrum to a set of features
which are then used to represent the various spectra as points in a feature space. The
entire space is then divided into volumes with borders based on the geometrical distances
between the training spectra: these are the classifiers. The test spectra are then portrayed
as points in this space in the same way as the training spectra and classified according
to which volume they inhabit. The way the spectra are portrayed as points in a space,
and the way the classifiers are built up from these points is different for each algorithm.
Method 1 in Table 5.1, the Normalized Gaussian radial basis function network, uses
at its core a method of k-means clustering. k-means clustering aims to partition the
absorbance spectra into k clusters in which each spectrum belongs to the cluster with
the nearest mean.
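The clustering step can be illustrated with a minimal Python sketch of Lloyd's algorithm, the standard iteration for k-means. It is an assumption that the implementation on the website works this way in detail; the two-dimensional toy "spectra", the choice of initial means, and all names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def assign(points, means):
    """Index of the nearest mean for every point."""
    return [min(range(len(means)), key=lambda j: squared_distance(p, means[j]))
            for p in points]

def k_means(points, initial_means, max_iter=100):
    """Alternate assignment and mean update until the means stop changing."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iter):
        labels = assign(points, means)
        new_means = []
        for j, old in enumerate(means):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous mean.
            new_means.append(centroid(members) if members else old)
        if new_means == means:
            break
        means = new_means
    return means, assign(points, means)

# Toy data: each "spectrum" is reduced to two features; two obvious groups.
spectra = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
means, labels = k_means(spectra, initial_means=[spectra[0], spectra[3]])
```

The result depends on the initial means, which is why practical implementations usually restart from several initializations.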
Possibly the simplest classification, the nearest neighbour algorithms of methods 2
and 3 in Table 5.1 classify the test spectra based on the closest training spectra in the
feature space.
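A nearest neighbour classifier is short enough to sketch in full. The Python example below uses Euclidean distance and a majority vote among the k closest training points; the distance measure, the two-dimensional toy features, and the names are assumptions made for illustration and do not reproduce the Weka implementations listed in Table 5.1.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(sample, training, k=1):
    """Label of `sample` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: each spectrum reduced to two hypothetical features.
training = [
    ((0.0, 0.0), "healthy"), ((0.2, 0.1), "healthy"), ((0.1, 0.3), "healthy"),
    ((1.0, 1.0), "asthmatic"), ((1.2, 0.9), "asthmatic"), ((0.9, 1.1), "asthmatic"),
]
label_a = knn_classify((0.15, 0.10), training, k=3)
label_b = knn_classify((1.05, 1.00), training, k=1)
```

With k = 1 the method reduces to copying the label of the single closest training spectrum; larger k makes the vote less sensitive to individual noisy spectra.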
Methods 4, 5, and 6 in Table 5.1 are support vector machines. They represent the
spectra as points in space, mapped so that the spectra of the separate categories are
divided by a clear gap that is as wide as possible. New spectra are then mapped into
that same space and predicted to belong to a category based on which side of the gap
they fall on.
Table 5.1: Best error rates for eight algorithms out of 41 run on the entire dataset of
absorbances of 70 healthy and 70 asthmatic patients. Errors have possible values between
0 and 1. Names of algorithms are as stated on the website[25].
Method  Description of algorithm                                     Name of algorithm           Error
1       Normalized Gaussian radial basis function network            RBFNetwork weka nominal     0.167
2       K-nearest neighbours classifier                              IBk weka nominal            0.190
3       Nearest neighbours classifier                                IB1 weka nominal            0.190
4       Linear support vector machine for multiclass classification  svmlight multiclass-linear  0.190
5       Support vector machine for multiclass classification         svmlight multiclass         0.190
6       Support vector machine for binary classification             svmlight-linear             0.214
5.3 Results
Results over the various datasets generally seem to be best for algorithms such as nearest
neighbour and support vector machines. A trend is also seen in that the larger datasets
give fewer misclassifications. The smallest errors are found for the unmodified dataset of
all absorbances over the entire wavenumber range, which are given in Table 5.1.
The similarity of the errors in Table 5.1 can be explained by the small test set
of 42 samples: each misclassified sample changes the error by 1/42 ≈ 0.024, so only a
few distinct error values are possible. The various nearest neighbour algorithms function
quite similarly, and the same can be said for the support vector machines. As the results
from Table 5.1 are the best from a total of 41 different algorithms, they corroborate the
use of SVMs as advised by the study on machine learning algorithms for breath analysis[23].
The best algorithm is the Normalized Gaussian radial basis function network with an error of
0.167, or 7 misclassified samples out of the 42 test samples. The exact reason why this
algorithm performs better than the others has not been determined. The errors stated
are the combined errors of healthy samples falsely classified as asthmatic and the other
way around. By running the algorithms outside of mlcomp.org the separate errors of
both types of misclassification can likely be obtained, which would give more insight into
the accuracy of each type of classification.
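How the combined error splits into the two types of misclassification can be sketched in Python as follows. Only the total of 7 errors out of 42 test samples comes from Table 5.1; the equal split of the test set into 21 healthy and 21 asthmatic samples and the division of the 7 errors into 5 and 2 are purely hypothetical, as are the function names.

```python
def confusion_counts(predicted, actual, positive="asthmatic"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts

def error_rates(counts):
    """Combined error and the error rate within each true class."""
    total = sum(counts.values())
    return {
        "total": (counts["fp"] + counts["fn"]) / total,
        "healthy_as_asthmatic": counts["fp"] / (counts["fp"] + counts["tn"]),
        "asthmatic_as_healthy": counts["fn"] / (counts["fn"] + counts["tp"]),
    }

# Hypothetical test set of 21 healthy and 21 asthmatic samples with 7 errors,
# split here (arbitrarily) into 5 false positives and 2 false negatives.
actual = ["healthy"] * 21 + ["asthmatic"] * 21
predicted = (["asthmatic"] * 5 + ["healthy"] * 16      # healthy samples
             + ["healthy"] * 2 + ["asthmatic"] * 19)   # asthmatic samples
counts = confusion_counts(predicted, actual)
rates = error_rates(counts)
```

In this made-up split the combined error is still 7/42, but the two class-wise error rates differ by more than a factor of two, which is exactly the information the single number hides.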
Without in-depth knowledge of the algorithms and further testing, there is no
guarantee that they will also work on further datasets. The importance of these preliminary
results lies in the clear quantitative results expressed as the error. Since a success rate
of 83.3% is obtained with little prior knowledge, further research in machine learning
applied to breath classification could yield more successful algorithms that can be applied
with the current QCL spectroscopy setup in medical diagnosis.
6. Conclusions and outlook
Here the main problems from the previous chapters and their solutions are discussed.
Additionally, improvements for further research are suggested.
6.1 QCL setup
The intensity fluctuations in the light emitted by the QCL caused by mode hopping are
resolved by normalisation using the monitor and main detector as described in chapter 2.
One way to improve the system and forego the need for all the calibrations described in
this thesis is to use the monitor signal to measure the actual wavenumber. Adding a
Michelson interferometer was considered while the setup was still being built.
It was omitted since there were plans to move the setup and a Michelson
interferometer would be too sensitive to vibrations. Both a diffraction spectrometer and a
reference cell can be used as well, the latter being favorable as it is the cheaper option.
The sensitivity can also be improved by increasing the delay the multi-pass cavity
applies to the beam such that both the beams going to the monitor and the main detector
can instead be measured using one detector. This improves the signal since measuring
on two separate detectors introduces two independent sources of noise, whereas a single
detector does not[26].
6.2 Data calibration
The wavenumber uncertainty caused by noise in the voltage over the piezo is resolved by
calibration done in post-processing as described in chapter 3. Despite this calibration,
the standard deviation over ten absorbance spectra from a single sample is still relatively
large.
Even after the wavenumber calibration the resulting absorbance spectrum does not
match the CO2 and H2O peaks from the databases. This is caused by the high variation
among the twenty monitor measurements whose mean is the basis of the wavenumber
calibration. Regardless, wavenumber calibration is still useful as an intermediate
calibration step, as it calibrates more details than the calibration to CO2 and H2O performed
afterwards. One cause for this variation between measurements is expected to be the
noise on the voltage applied over the piezo. Other contributions are temperature
fluctuations that influence the laser performance, and the diffraction grating, which reflects
a small range of wavelengths rather than a single wavelength back into the laser diode.
The height of the measured absorbance peaks does not generally match with the
heights of the CO2 peaks from the database, as seen in Figure 3.12. It can be seen that
the height of each peak is about equal between samples. This indicates that the difference
in height of the various measured absorbance peaks to the CO2 peaks is not a matter of
noise, but rather a consistent wavenumber dependent influence from a part of the setup,
or possibly a molecule in breath. A source for these mismatching absorbances has not
yet been pinpointed.
6.3 Determination of molecules
A general disadvantage of choosing molecules based on their molar absorptivity is that
some molecules might have a large impact on the eventual absorbance due to a high
concentration, despite having a low absorptivity. They would therefore be falsely eliminated
from the list of molecules that is checked.
There is also the drawback of excluding molecules which might be an important part of
the absorbance, but are not selected since they do not have adequate molar absorptivity in
the p-regions. This makes the subsequent determination of concentrations less accurate.
It is possible to allow for more p-regions, either by decreasing the minimum number of
consecutive points needed or by raising the maximum p-value that defines these regions.
This way more molecules will have some absorptivity in one of the regions and
therefore will not be discarded.
Both problems have to do with selecting molecules based on intensity, and they can
be resolved in advance by putting molecules that play an important role in the eventual
absorbance in the standard list of molecules. This requires a priori knowledge of the
important molecules which is obtained from literature[17] and simply by performing
the least squares method to get the concentration on a large list of molecules. The
concentrations will not be as accurate for a large list, but all molecules that contribute
considerably to the spectrum can be picked out for later runs with strict molecule lists.
6.4 Determination of concentrations
From the shapes of CO2 and H2O in Figure 4.2 it can be seen that the concentrations
calculated using the lsqnonlin function are in the right order of magnitude. Since the
standard deviation over several measured absorbances of a single sample is still relatively
high even after wavenumber calibration, the calculated concentrations for the molecules
with lower concentrations are not very accurate. The average concentrations found for
several molecules over all samples are given in Figure 6.1. The difference in amount of
H2O present can be explained by a filter in the setup which catches the moisture in
breath before it enters the cavity. The large amount of ethanol measured for the samples
can be explained by the cleaning using ethanol of the mouthpieces through which the
breath samples are collected. The standard deviations portrayed are combinations of the
uncertainties from the measurements and the variation between breath of the patients.
[Figure 6.1 plot: "Average molecule concentrations over all samples". Horizontal axis: CO2, H2O, acetone, NH3, CH4, ethanol. Vertical axis: concentration (ppmv), logarithmic from 0.001 to 100,000. Legend: Healthy samples, Asthmatic samples, VSL healthy.]
Figure 6.1: Average concentrations of molecules for healthy and asthmatic children
compared to VSL data of healthy adults.
6.5 Categorizing health status by classification
An algorithm with a classification success rate of 83.3% is obtained. Further research in
machine learning applied for breath classification could yield more successful algorithms
that can be applied with the current QCL spectroscopy setup in medical diagnosis.
The accuracy of the classification can be improved by looking more closely into the
algorithms used, and improving them using a priori knowledge of the samples such as
regions of relatively high signal-to-noise ratio. Wavenumber regions can be given weights
based on their signal-to-noise ratios so that the algorithm depends more heavily on these
accurate regions.
The error rate would also be more informative if the numbers of false positives and false
negatives were given. This would give separate error rates based on whether the algorithm
judges the sample as healthy or asthmatic. It is possible that one of these error rates is
much lower than the other, in which case a classification of that type is more certain than
the total error rate suggests.
Bibliography
[1] I. Horvath and J.C. de Jongste. Exhaled Biomarkers. European Respiratory Society, 2010.
[2] Erik Swietlicki, Sanjiv Puri, Hans-Christen Hansson, and Hans Edner. Urban air pollution source apportionment using a combination of aerosol and gas monitoring techniques. Atmospheric Environment, 30(15):2795–2809, 1996.
[3] Zhe Hou. Breath analysis using infrared spectroscopy. August 2013.
[4] Ellen Holthoff, John Bender, Paul Pellegrino, and Almon Fisher. Quantum cascade laser-based photoacoustic spectroscopy for trace vapor detection and molecular discrimination. Sensors, 10(3):1986–2002, 2010.
[5] A Reyes-Reyes, Z Hou, E Van Mastrigt, R C Horsten, J C De Jongste, M W Pijnenburg, H P Urbach, and N Bhattacharya. Multicomponent gas analysis using broadband quantum cascade laser spectroscopy. Optics Express, 2014.
[6] DF Swinehart. The Beer-Lambert law. Journal of Chemical Education, 39(7):333, 1962.
[7] Silvia E Braslavsky. Glossary of terms used in photochemistry (IUPAC recommendations 2006). Pure and Applied Chemistry, 79(3):293–465, 2007.
[8] Northwest infrared. https://secure2.pnl.gov/nsd/nsd.nsf/welcome.
[9] HITRAN database. http://www.cfa.harvard.edu/hitran/.
[10] Block engineering laserscope for gas analysis.
[11] Jerome Faist, Federico Capasso, Deborah L Sivco, Carlo Sirtori, Albert L Hutchinson, and Alfred Y Cho. Quantum cascade laser. Science, 264(5158):553–556, 1994.
[12] J Barry McManus, Paul L Kebabian, and MS Zahniser. Astigmatic mirror multipass absorption cells for long-path-length spectroscopy. Applied Optics, 34(18):3336–3348, 1995.
[13] MCT IR detector module.
[14] PVI-4TE detector specifications.
[15] MATLAB peakfinder code. http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder.
[16] Atmosphere of Earth. http://en.wikipedia.org/wiki/atmosphere of earth.
[17] B de Lacy Costello, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Osborne, and N M Ratcliffe. A review of the volatiles from the healthy human body. Journal of Breath Research, 8(1):014001, 2014.
[18] Esther Van Mastrigt, Adonis Reyes-Reyes, Karl Brand, Nandini Bhattacharya, H.P. Urbach, Andrew Stubbs, Johan C. De Jongste, and Marielle W. Pijnenburg. Exhaled Breath Profiling In Children With Asthma And Cystic Fibrosis: A Laser Spectroscopy-Based Method Focused On Hydrocarbons, pages A5639–A5639. American Thoracic Society, May 2014.
[19] Chet L. Leach. Approaches and challenges to use freon propellant replacements. Aerosol Science and Technology, 22(4):328–334, 1995.
[20] Jorge J. Moré. The Levenberg-Marquardt algorithm: Implementation and theory. In G.A. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics, pages 105–116. Springer Berlin Heidelberg, 1978.
[21] Adonis Reyes-Reyes, Roland C. Horsten, H. Paul Urbach, and Nandini Bhattacharya. Study of the exhaled acetone in type 1 diabetes using quantum cascade laser spectroscopy. Analytical Chemistry, 87(1):507–512, 2015. PMID: 25506743.
[22] Florian Adler, Piotr Maslowski, Aleksandra Foltynowicz, Kevin C. Cossel, Travis C. Briles, Ingmar Hartl, and Jun Ye. Mid-infrared Fourier transform spectroscopy with a broadband frequency comb. Opt. Express, 18(21):21861–21872, Oct 2010.
[23] A Smolinska, A-Ch Hauschild, R R R Fijten, J W Dallinga, J Baumbach, and F J van Schooten. Current breathomics—a review on data pre-processing techniques and machine learning in metabolomics breath analysis. Journal of Breath Research, April 2014.
[24] J. D. Malley, A. Dasgupta, and J. H. Moore. The limits of p-values for biological data mining. BioData Min, 6(1):10, 2013.
[25] MLcomp. http://mlcomp.org/.
[26] D.D. Nelson, J.H. Shorter, J.B. McManus, and M.S. Zahniser. Sub-part-per-billion detection of nitric oxide in air using a thermoelectrically cooled mid-infrared quantum cascade laser spectrometer. Applied Physics B, 75(2-3):343–350, 2002.
Appendices
A. Results for a sample of a healthy patient
A.1 Standard molecules
Table A.1: List of standard molecules used.
Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water
A.2 Molecules found by p-region analysis
Table A.2: Table of molecules that are present in either or both of the p-regions. The
p-regions were found using a maximum p-value of 05 for at least 10 consecutive data
points or 0.5 cm−1. These molecules have a mean absorptivity in this region of at least
10−4. Columns give the two p-regions in cm−1 (1 = present, 0 = not present).
Molecule 1181.8-1182.55 1261.4-1262.05
1-Chloro-1 1 2 2-tetrafluoroethane (R124A) 1 1
1-Hexanoic acid 1 0
1 1 1-Chlorodifluoroethane (Freon-142B) 1 0
1 1 1-Trifluoroacetone 1 0
1 1 1-Trifluoroethane (R143A) 1 1
1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 1 1
1 2-Dibomotetrafluoroethane (Freon-114B2) 1 0
1 2-Dibromoethane (EDB) 1 0
1 2-Dichloro-1-fluoroethane (R141) 1 0
1 2-Dichloro-1 1 2-trifluoroethane (F132A) 1 0
1 2-Dimethoxyethane 1 0
1 4-Dioxane 0 1
2-Chloro-1 1 1-trifluoroethane (R133A) 1 1
2-Chloropropane 0 1
2-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 1 0
2-Ethoxyethyl acetate 1 1
2-Methoxyethanol 1 0
2H 3H-Perfluoropentane 1 1
2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 1 1
2 2-Difluorotetrachloroethane (F112A) 1 0
2 2 2-Trifluoroethanol 1 1
2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 1 1
3-Methyl-2-pentanone 1 0
Acetic acid 1 1
Acetic anhydride 1 0
Acetid acid dimer 0 1
Acetone cyanohydrin 1 0
Aniline 0 1
Benzyl chloride 0 1
Bromochlorodifluoromethane (Freon-12B1) 1 0
Butyl acetate 0 1
Butyric acid 1 0
Carbonyl fluoride 0 1
Chloromethyl ethyl ether 1 1
Chloromethyl methyl ether 1 0
Chloropentafluoroethane (R115) 1 1
Chlorosulfonyl isocyanate (CSI) 1 0
Diborane 1 0
Dichloromethane 0 1
Diethyl ether 1 0
Diethyl sulfide 0 1
Diisopropyl ether 1 0
Diisopropylamine 1 0
Diketene 0 1
Dimethoxymethane 1 0
Dimethyl carbonate 0 1
Dimethyl ether 1 0
Dimethylcarbamoyl chloride 0 1
Dipropylene glycol methyl ether 1 1
Ethyl acetate 0 1
Ethyl acrylate 1 1
Ethyl bromide (Halon-2001) 0 1
Ethyl butyrate 1 1
Ethyl chloroformate 1 0
Ethyl formate 1 0
Ethyl methyl ether 1 0
Ethyl tert-butyl ether 0 1
Ethyl trifluoroacetate 1 1
Ethylvinyl ether (EVE) 1 0
Formic acid dimer 1 0
Freon-113 (1 1 2-Trichlorortrifluoroethane) 1 0
Freon-114 (1 2-dichlorortetrafluoroethane) 1 1
Freon-134a (1 1 1 2-tetrafluoroethane) 1 0
Freon-218 (octafluoropropane) 0 1
Freon-C318 (octafluorocyclobutane) 0 1
Furan 1 0
Glycoaldehyde 0 1
Guaiacol 1 1
Hexachloro-1 3-butadiene 1 0
Hexafluoroacetone 1 1
Hexafluoroethane (Freon-116) 0 1
Hexafluoroisobutylene 1 1
Hexafluoropropene 1 0
Hydrogen peroxide 0 1
Isobutyl acetate 0 1
Isopentyl acetate 0 1
Isopropyl acetate 0 1
Methacryloyl chloride 0 1
Methanesulfonyl chloride 1 0
Methyl acetate 0 1
Methyl acrylate 1 1
Methyl benzoate 1 1
Methyl formate 1 0
Methyl iodide 0 1
Methyl isoamyl ketone 1 0
Methyl isobutyl ketone (MIBK) 1 0
Methyl isobutyrate 1 1
Methyl methacrylate 1 0
Methyl pivalate 1 0
Methyl propionate 1 1
Methyl propyl ketone 1 0
Methyl salicylate 1 1
Methylchloroformate (MCF) 1 0
Methyldichlorodisilanes (mixed isomers) 0 1
Methylethyl ketone 1 0
Methyltrichlorosilane 0 1
Methylvinyl ketone 1 1
N N’-Dimethyl formamide (DMF) 0 1
N N-Diethylaniline 0 1
N N-diethyl formamide 0 1
Nitrogen dioxide and dinitrogen tetroxide 0 1
Octanoic acid 1 0
Paraldehyde 1 0
Pentafluoroethane (f125) 1 0
Perfluorobutane 1 1
Perfluoroisobutylene (PFIB) 1 0
Phenol 1 1
Propargyl chloride 0 1
Propionic acid (and some dimer) 1 0
Propyl acetate 0 1
Propylene glycol 0 1
Sulfuryl chloride 1 0
Sulfuryl fluoride 0 1
Tetrafluoromethane 0 1
Trichlorofluroethylene 1 0
Trifluoroacetic acid 1 1
Trifluoroacetic anhydride 1 1
Trifluoroacetyl chloride 1 1
Trifluoromethane (Freon-23) 1 0
Trifluoromethylsulfur pentafluoride 0 1
Trifluoronitrosomethane 1 1
Trimethylamine 1 0
Vinayl acetate 0 1
bis(2-Chloroethyl) ether 0 1
cis-1 3-Dichloropropene 0 1
m-Cresol 1 0
n-Amyl acetate 0 1
o-Toluidine 0 1
tert-Amyl methyl ether 1 0
A.3 Concentrations found for a single healthy patient
In order to find the concentrations, the following options argument is used for the
lsqnonlin function in MATLAB:
options = optimset('Display','off','TolFun',1e-15)
Table A.3: Table of initial concentrations used to calculate the concentrations as given
in Appendix A.3.
Molecule Concentration (ppmv)
Carbon dioxide 1000
Water 1000
Standard molecules 1
Other molecules 0.01
Table A.4: Concentrations found using the method described in section 4.2. These are
of a single sample from a healthy patient. The standard list of molecules as given in
section A.1 and the list of molecules as given by section A.2 are used.
Molecule concentration (ppmv)
Acetic acid 2.22045e-14
Acetone 2.22045e-14
Ammonia anhydrous 2.22045e-14
Benzene 1.79671
Carbon dioxide 0.912382
Carbon monoxide 1538.28
Ethane 2.66622
Ethyl alcohol 0.946815
Ethylene 0.109119
Formaldehyde monomer 2.22045e-14
Methane 2.22045e-14
Pentane 2.22045e-14
Propane 2.22045e-14
Water 0.0988024
1-Chloro-1 1 2 2-tetrafluoroethane (R124A) 2.22045e-14
1-Hexanoic acid 2.22045e-14
1 1 1-Chlorodifluoroethane (Freon-142B) 0.00478773
1 1 1-Trifluoroacetone 2.22045e-14
1 1 1-Trifluoroethane (R143A) 2.22045e-14
1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 2.22045e-14
1 2-Dibomotetrafluoroethane (Freon-114B2) 2.22045e-14
1 2-Dibromoethane (EDB) 2.22045e-14
1 2-Dichloro-1-fluoroethane (R141) 2.22045e-14
1 2-Dichloro-1 1 2-trifluoroethane (F132A) 2.22045e-14
1 2-Dimethoxyethane 2.22045e-14
1 4-Dioxane 2.22045e-14
2-Chloro-1 1 1-trifluoroethane (R133A) 2.22045e-14
2-Chloropropane 2.22045e-14
2-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 2.22045e-14
2-Ethoxyethyl acetate 2.22045e-14
2-Methoxyethanol 2.30625e-14
2H 3H-Perfluoropentane 2.22045e-14
2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 2.22045e-14
2 2-Difluorotetrachloroethane (F112A) 0.0104979
2 2 2-Trifluoroethanol 2.22045e-14
2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 2.22045e-14
3-Methyl-2-pentanone 2.22045e-14
Acetic acid 2.22045e-14
Acetic anhydride 2.22045e-14
Acetid acid dimer 2.22045e-14
Acetone cyanohydrin 2.22045e-14
Aniline 0.692194
Benzyl chloride 2.22045e-14
Bromochlorodifluoromethane (Freon-12B1) 2.22045e-14
Butyl acetate 2.22045e-14
Butyric acid 2.22045e-14
Carbonyl fluoride 2.22045e-14
Chloromethyl ethyl ether 2.22045e-14
Chloromethyl methyl ether 2.22045e-14
Chloropentafluoroethane (R115) 0.0115797
Chlorosulfonyl isocyanate (CSI) 2.22045e-14
Diborane 0.114448
Dichloromethane 2.22045e-14
Diethyl ether 0.0152475
Diethyl sulfide 2.22045e-14
Diisopropyl ether 2.22045e-14
Diisopropylamine 2.22045e-14
Diketene 2.22045e-14
Dimethoxymethane 2.22045e-14
Dimethyl carbonate 2.22045e-14
Dimethyl ether 2.22045e-14
Dimethylcarbamoyl chloride 2.22045e-14
Dipropylene glycol methyl ether 2.22045e-14
Ethyl acetate 2.22045e-14
Ethyl acrylate 0.039846
Ethyl bromide (Halon-2001) 2.22046e-14
Ethyl butyrate 2.22045e-14
Ethyl chloroformate 2.22045e-14
Ethyl formate 2.22045e-14
Ethyl methyl ether 2.22045e-14
Ethyl tert-butyl ether 0.195303
Ethyl trifluoroacetate 2.22045e-14
Ethylvinyl ether (EVE) 2.22045e-14
Formic acid dimer 2.22045e-14
Freon-113 (1 1 2-Trichlorortrifluoroethane) 2.22045e-14
Freon-114 (1 2-dichlorortetrafluoroethane) 2.22045e-14
Freon-134a (1 1 1 2-tetrafluoroethane) 2.22045e-14
Freon-218 (octafluoropropane) 2.22045e-14
Freon-C318 (octafluorocyclobutane) 2.22045e-14
Furan 2.22045e-14
Glycoaldehyde 2.22045e-14
Guaiacol 2.22045e-14
Hexachloro-1 3-butadiene 2.22045e-14
Hexafluoroacetone 2.22045e-14
Hexafluoroethane (Freon-116) 2.22045e-14
Hexafluoroisobutylene 2.22045e-14
Hexafluoropropene 2.22045e-14
Hydrogen peroxide 2.22045e-14
Isobutyl acetate 2.22045e-14
Isopentyl acetate 2.22045e-14
Isopropyl acetate 2.22045e-14
Methacryloyl chloride 0.101246
Methanesulfonyl chloride 0.0190093
Methyl acetate 2.22045e-14
Methyl acrylate 2.22045e-14
Methyl benzoate 2.22045e-14
Methyl formate 2.22045e-14
Methyl iodide 2.22045e-14
Methyl isoamyl ketone 2.22045e-14
Methyl isobutyl ketone (MIBK) 2.22045e-14
Methyl isobutyrate 2.22045e-14
Methyl methacrylate 2.22045e-14
Methyl pivalate 0.0277697
Methyl propionate 2.22045e-14
Methyl propyl ketone 2.22045e-14
Methyl salicylate 2.22045e-14
Methylchloroformate (MCF) 2.22046e-14
Methyldichlorodisilanes (mixed isomers) 2.22045e-14
Methylethyl ketone 2.22045e-14
Methyltrichlorosilane 2.22046e-14
Methylvinyl ketone 2.22045e-14
N N’-Dimethyl formamide (DMF) 2.22045e-14
N N-Diethylaniline 2.22045e-14
N N-diethyl formamide 2.22045e-14
Nitrogen dioxide and dinitrogen tetroxide 2.22045e-14
Octanoic acid 2.22045e-14
Paraldehyde 2.22045e-14
Pentafluoroethane (f125) 2.22045e-14
Perfluorobutane 2.22045e-14
Perfluoroisobutylene (PFIB) 0.000842043
Phenol 2.22045e-14
Propargyl chloride 0.250046
Propionic acid (and some dimer) 2.22045e-14
Propyl acetate 2.22045e-14
Propylene glycol 2.22045e-14
Sulfuryl chloride 2.22045e-14
Sulfuryl fluoride 2.22046e-14
Tetrafluoromethane 2.22045e-14
Trichlorofluoroethylene 2.22045e-14
Trifluoroacetic acid 2.22045e-14
Trifluoroacetic anhydride 0.00520583
Trifluoroacetyl chloride 2.22045e-14
Trifluoromethane (Freon-23) 2.22045e-14
Trifluoromethylsulfur pentafluoride 2.22045e-14
Trifluoronitrosomethane 2.22045e-14
Trimethylamine 2.22045e-14
Vinyl acetate 2.22045e-14
bis(2-Chloroethyl) ether 2.22045e-14
cis-1 3-Dichloropropene 2.22045e-14
m-Cresol 2.22045e-14
n-Amyl acetate 2.22045e-14
o-Toluidine 2.22045e-14
tert-Amyl methyl ether 2.22045e-14
B. List of molecules
B.1 Breath molecules with absorbance in the 1020 to 1100 cm−1 region.
Table B.1: Molecules that remain after applying the selection of subsection 4.1.3, in the 1020 to 1100 cm−1 region, to 199 molecules that should be present in breath[17].
Molecules
1-Hexanoic acid
1-Pentanol (n-amyl alcohol)
1-Propanol
1 2-Dichlorobenzene
1 3-Dichlorobenzene
1 4-Dichlorobenzene
1 4-Dioxane
2-Butoxyethanol
2-Ethoxyethyl acetate
2-Ethyl-1-hexanol
2-Methoxyethanol
2-methyl-1-propanol (IBA isobutanol)
2-Methyl-2-propenal (Isobutenal)
2-Methylfuran
2 3-Butanedione
3-methylfuran
Acetic acid
Acetic anhydride
Acetol
Allene (1 2-propadiene)
Allyl alcohol
Ammonia anhydrous
Benzene
Benzyl alcohol
Butyl acetate
Butyric acid
Cadaverine
Chlorobenzene
Cyclopentene
Cyclopropane
Diethyl ether
Dimethoxymethane
Dimethyl ether
Dimethyl sulfoxide
Dipropylene glycol methyl ether
Ethyl acetate
Ethyl acrylate
Ethyl alcohol
Ethyl butyrate
Ethyl tert-butyl ether
Ethylene
Ethylvinyl ether (EVE)
Furan
Furfural
Furfuryl alcohol
Glycolaldehyde
Guaiacol
Isobutyl acetate
Isopentyl acetate
Isopropyl acetate
Isopropyl alcohol
Methyl acetate
Methyl alcohol
Methyl methacrylate
Methyl propionate
Methylamine
n-Butyl alcohol
N N’-Dimethyl formamide (DMF)
N N-diethyl formamide
Octanoic acid
Propionic acid (and some dimer)
Propylene glycol
Propylene sulfide
Pyridine
sec-Butyl alcohol
tert-Butyl methyl ether
Thiophene
trans-Crotonaldehyde
Trimethylamine

Breath analysis by quantum cascade spectroscopy - Master thesis by Olav Grouwstra

  • 1. Applied Physics Thesis Delft University of Technology Breath analysis by quantum cascade laser spectroscopy Author: Olav Grouwstra Direct supervisor: A. Reyes Reyes Supervisor: N. Bhattacharya Professor: H.P. Urbach March 5, 2015
  • 4. 1. Introduction
1.1 Gas analysis
Gas analysis is done to find the different molecules present in a gas. Identifying the composition of a substance is vital for various practical applications such as medical diagnostics and environmental monitoring[1][2]. In this work the focus lies on gas analysis applied to medical diagnosis.
The diagnosis of diseases is helped by many different techniques, such as blood tests or radiological studies like magnetic resonance imaging (MRI), as they give an idea of what is inside the human body. Blood tests can provide this knowledge since changes in metabolic processes significantly influence the compounds in human blood. For example, diabetes increases the amount of acetone in the bloodstream. Hence knowing the concentration of acetone could help diagnose diabetes. The compounds in blood diffuse through the lungs into the breath we exhale, and diagnosis can therefore also be done by analysis of human breath. Breath analysis, as opposed to blood tests, has the benefit of being non-intrusive, which makes it much easier for medical professionals to implement. Non-intrusive medical instruments also undergo less strict control by regulatory parties like the Food and Drug Administration (FDA) and the European Medicines Agency (EMA). While diagnosis through imaging techniques might give a better idea about the origin of a disease, it does not give information on the influence of a disease on the metabolic processes.
1.2 Quantum cascade laser spectroscopy
Quantum cascade laser (QCL) spectroscopy is only one among a variety of spectroscopy methods, each of which has its pros and cons.
Currently one of the most common techniques used for gas analysis is gas chromatography-mass spectrometry (GC-MS). Although GC-MS achieves high accuracy, it has certain limitations. GC-MS is a specific test, meaning it requires a priori knowledge of what to test for[3]. Its use is therefore limited to cases where the symptoms are already present, and it thus requires pre-diagnosis. GC-MS also requires sample preparation, limiting its use to trained professionals. A gas analysis method using a quantum cascade laser for spectroscopy is treated in this work. This form of analysis does not need sample preparation and is therefore more easily accessible. A priori knowledge of what
  • 5. molecules to test for is also not necessary for the measurement itself, which makes it a promising tool for establishing statistical relations between concentrations of molecules and the health status of individuals.
Photoacoustic spectroscopy (PAS) is another useful technique. Like QCL spectroscopy, it works by shining light on a sample. In PAS, however, it is the absorbed light that creates the signal: it thermally expands the sample, thereby creating a sound wave which can be measured. As in the case of QCL spectroscopy, PAS requires no sample preparation. But PAS requires the signal to go through more energy conversions, which makes it susceptible to vibrational noise[4].
Electronic noses are also used for the detection of gases. These are, however, made to detect specific molecules, as opposed to broadband QCL spectroscopy, which can detect any molecule with an absorbance spectrum overlapping the emission spectrum of the QCL.
1.3 Previous work on the project
A spectroscopy setup using a quantum cascade laser (QCL) is described and the subsequent data processing is extensively treated. The setup was built, and is extensively treated, by A. Reyes Reyes as his PhD project funded by the Fundamenteel Onderzoek Materie (FOM) foundation[5]. Z. Hou developed the data acquisition under supervision of A. Reyes Reyes[3]. The measurements processed in this research are from breath samples of 35 healthy volunteers and 35 asthmatic volunteers. Two different breath samples are used from each volunteer. These samples are from children from the Sophia Children’s hospital in Rotterdam, all with their parents’ informed consent. From the absorbance spectra obtained from these samples an uncertainty in the wavenumbers is found. In this work calibration is done through post-processing in order to reduce this uncertainty.
1.4 Goals and thesis structure
The goal of this work is twofold. The first goal is to decrease the uncertainties with which the compounds and their concentrations are determined. The second goal is to find a reliable way to distinguish and identify breath of healthy and asthmatic children from the compounds present and their concentrations.
This thesis works towards these goals by first explaining the general working principle of the setup and the theory behind QCL spectroscopy, highlighting the parts that have the most influence on the noise and resolution of the signal. The measured absorbance spectrum and its calibration are then treated. This is followed by an explanation of how the molecular species present and their concentrations are found. Finally a way of distinguishing between samples of healthy and asthmatic patients is treated, and the overall results are discussed.
  • 6. 2. Quantum Cascade Laser Spectroscopy
In this chapter the relation between the intensity of the laser light, the absorbance of the gas, and the concentrations of the molecules in the gas is explained. The parameters that need to be measured and the equipment used to measure them are explained as well. Measurements on the gas are performed with a spectroscopy setup using a quantum cascade laser (QCL). The various components and conditions of the measurement setup that influence the determination of the gas compounds and concentrations are explored. In order to understand how these components influence the eventual concentration, the path of light through the components is described, starting from the quantum cascade laser and ending at the mercury cadmium telluride (HgCdTe) detectors. Getting the absorbance spectrum from a breath sample takes around 60 minutes, of which 5 minutes are taken up by the actual scanning of the gas by the setup. Transferring the large datasets between computers and loading them into processing software takes up the most time. The setup as used for sample measurements was built by A. Reyes Reyes[5].
2.1 Absorbance spectrum
In order to obtain information on the gas through spectroscopy, the interaction between light and matter needs to be established. The relation of the concentrations of the compounds in the gas to the light passing through the gas is given by the Beer-Lambert law[6, 7]:
$$I_o(\nu, C) = I_i(\nu)\,10^{-A(\nu, C)} \tag{2.1}$$
where $I_o$ is the beam intensity after interacting with the gas, $I_i$ the beam intensity before interacting with the gas, and $\nu$ and $C$ denote the dependence on the wavenumber and the concentrations of the compounds respectively. $A$ is the absorbance, given by
$$A(\nu, C) = \sum_{i=1}^{n} \epsilon_i(\nu)\, c_i\, l_{path} \tag{2.2}$$
with
$$C = (c_1, c_2, \ldots, c_n) \tag{2.3}$$
where $n$ is the number of molecular species, and $c_i$ and $\epsilon_i(\nu)$ denote the concentration and molar absorptivity of each molecular species in the gas. The vector $C$ is used
  • 7. simply as an abbreviated way to indicate the dependence of the absorbance $A$ on the concentrations of all molecular species. The absorbance also depends on the interaction length $l_{path}$ of the light with the gas. In order to determine the concentrations $c_i$ of the molecules, all other parameters in Equation 2.1 and Equation 2.2 should be known. The light is provided by the QCL, and the intensities $I_o$ and $I_i$ are obtained by processing the signals measured by the detectors in the setup. The molar absorptivity $\epsilon_i(\nu)$ is a measure of how strongly a certain molecular species absorbs light at a given wavelength. It is an intrinsic property of a molecule and can be considered unique to each molecule. Hence the molar absorptivity can be looked up in the Northwest Infrared database (NWIR)[8] and the high resolution transmission molecular absorption database (HITRAN)[9], which are put together by the Pacific Northwest National Laboratory (PNNL) and the Harvard-Smithsonian Center for Astrophysics respectively. The interaction length $l_{path}$ is a property of the setup, more specifically of the cavity, and can be measured as described in subsection 2.2.2.
2.2 Setup
To find the concentration of compounds within a gas, the spectroscopy setup depicted in Figure 2.1 is used. The laser beam is produced at the QCLs. From the QCLs the beam falls onto a beam splitter, which sends one beam straight to the monitor detector, used to measure the intensity spectrum as emitted by the QCLs. The other beam from the beam splitter travels to the multipass cavity. This beam is partially absorbed by the gas present in the cavity, and then travels onwards to the main detector. The distance meter is placed such that its beam coincides with that of the QCLs, in order to measure the optical path length in the cavity. The two detectors relay their signals to the NI-PCI 5922 acquisition card, which in turn sends the signals to the computer.
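As an illustration of how Equations 2.1 and 2.2 can be inverted once the intensities, the molar absorptivities and the path length are known, the following Python sketch computes an absorbance spectrum and fits the concentrations by ordinary least squares. The absorptivity matrix, the concentrations and the function names are made up for the example, and the units are left abstract; this is a sketch under those assumptions and not necessarily the fitting procedure used in section 4.2.

```python
import numpy as np


def absorbance(i_out, i_in):
    """Absorbance A = -log10(Io / Ii), i.e. Equation 2.1 solved for A."""
    return -np.log10(np.asarray(i_out, dtype=float) / np.asarray(i_in, dtype=float))


def fit_concentrations(absorb, epsilon, l_path):
    """Least-squares solution of A(nu) = sum_i eps_i(nu) * c_i * l_path (Equation 2.2).

    epsilon has shape (n_wavenumbers, n_species); the result has length n_species.
    """
    design = np.asarray(epsilon, dtype=float) * l_path
    conc, *_ = np.linalg.lstsq(design, np.asarray(absorb, dtype=float), rcond=None)
    return conc


# Hypothetical molar absorptivities of two species on a five-point wavenumber grid.
epsilon = np.array([[0.10, 0.00],
                    [0.30, 0.05],
                    [0.05, 0.40],
                    [0.00, 0.20],
                    [0.02, 0.01]])
true_conc = np.array([2.0e-3, 5.0e-4])  # hypothetical concentrations
l_path = 54.36                          # interaction length of the cavity

# Forward model: intensity after the gas, from Equations 2.1 and 2.2.
i_in = np.full(5, 1.0e8)
i_out = i_in * 10.0 ** (-(epsilon @ true_conc) * l_path)

# Inverse problem: recover the concentrations from the two intensities.
recovered = fit_concentrations(absorbance(i_out, i_in), epsilon, l_path)
```

With noiseless synthetic data the recovered vector matches the concentrations that were put in; with measured spectra the fit quality depends on how well the absorptivity spectra of the candidate molecules can be told apart.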
  • 8. Figure 2.1: Diagram of the quantum cascade laser spectroscopy setup as used for measuring. Courtesy of Z. Hou[3]
2.2.1 The quantum cascade laser
The laser system used is Block Engineering’s LaserScope, which consists of two QCLs that are arranged such that together they can scan over the wavenumber range of 832 cm−1 to 1263 cm−1[10]. As the LaserScope is by itself designed to be a spectroscopic system, it has an internal detector and a set of two QCLs, which are portrayed in Figure 2.1 as the monitor detector and the QCLs. The main specifications of the LaserScope can be found in Table 2.1, where it should be noted that the spectral resolution stated is the output when the signal is processed using Block Engineering’s own software.
Table 2.1: Specifications of the quantum cascade laser as given by Block Engineering[10].
Parameter                    Specification
Spectral region              1267 to 833 cm−1 (8 to 12 µm)
Tuning                       Continuous or single wavelength
Emission type                Pulsed
Pulse repetition frequency   200 kHz
Minimum average power        0.5 mW
Maximum average power        12 mW
Beam diameter                2.5 mm x 5 mm ellipse
Spectral resolution          Down to 1 cm−1
By design, the QCLs emit light at different wavenumbers simultaneously[11]. The HgCdTe detectors detect the intensity of the light without differentiating between the various wavenumbers. Since the intensity per wavenumber is needed, a Littrow configuration with a diffraction grating mounted on a piezo is used, as seen in Figure 2.2[3].
  • 9. Figure 2.2: Diagram of a Littrow configuration external cavity diode laser with back extraction. Courtesy of Z. Hou[3]
This Littrow configuration uses a diffraction grating to steer the first order diffraction of the light with the desired wavenumber back through the laser diode, and into the setup. The wavenumber selected using the grating depends on the spacing of the grating and the incident angle of the light on the grating. The incident angle is controlled by applying a voltage over the piezo. The intensity of the light emitted by the QCL is furthermore influenced by mode hopping, which can be caused by temperature fluctuations in the gain medium of the laser or fluctuations in the injection current. In chapter 3 a closer look is taken at the signal from the QCL and its noise.
The intensity spectrum of the QCL system is given in Figure 2.3. The general shape of this signal portrays the intensity spectrum the laser outputs. The intensity drop around 1010 cm−1 indicates the switch between the two QCLs that together scan over the entire wavenumber range.
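The wavenumber selection described above follows the Littrow condition $m\lambda = 2d\sin\theta$, with diffraction order $m$, groove spacing $d$ and incidence angle $\theta$. The minimal Python sketch below assumes a hypothetical grating of 150 grooves/mm (the actual grating parameters of the LaserScope are not given here) and shows how the angle maps to a wavenumber and how a small angle error shifts the selected wavenumber.

```python
import math


def littrow_wavenumber(theta_deg, groove_spacing_um, order=1):
    """Wavenumber in cm^-1 that is fed back at angle theta: m * lambda = 2 * d * sin(theta)."""
    wavelength_um = 2.0 * groove_spacing_um * math.sin(math.radians(theta_deg)) / order
    return 1.0e4 / wavelength_um  # 1 cm = 1e4 um


def littrow_angle(wavenumber_cm, groove_spacing_um, order=1):
    """Grating angle in degrees that selects the given wavenumber."""
    wavelength_um = 1.0e4 / wavenumber_cm
    return math.degrees(math.asin(order * wavelength_um / (2.0 * groove_spacing_um)))


# Assumption for illustration only: 150 grooves per mm.
GROOVE_SPACING_UM = 1000.0 / 150.0

# Angle needed to select 1000 cm^-1 (a wavelength of 10 um).
theta = littrow_angle(1000.0, GROOVE_SPACING_UM)

# Wavenumber shift caused by an angle error of one thousandth of a degree.
shift = littrow_wavenumber(theta + 0.001, GROOVE_SPACING_UM) - 1000.0
```

A larger angle selects a longer wavelength, so the shift is negative. This is the mechanism by which noise on the piezo voltage, treated in chapter 3, turns into an uncertainty in the wavenumber.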
  • 10. [Plot: Spectrum emitted by the QCL; x-axis Wavenumber (cm−1), y-axis Intensity (counts)]
Figure 2.3: Intensity spectrum of the laser system used: Block Engineering’s LaserScope.
2.2.2 Multipass cavity
A cavity is used to achieve a large interaction length $l_{path}$ between the gas and the beam. The cavity is depicted in Figure 2.4. The beam enters the cavity through a ZnSe window. It then reflects from the two mirrors at either end of the cavity, until the beam falls on the window again and exits the cavity. The mirrors (Aerodyne Research, Inc.) are astigmatic, meaning each mirror has two different radii of curvature, which allows more reflections to take place and hence a larger interaction distance $l_{path}$[12]. A large interaction distance is desirable since it increases the absorbance. A multipass cavity is used rather than a resonance cavity, since the high-reflection mirrors used in a resonance cavity are not available for the wide wavelength range used in this experiment.
  • 11. Figure 2.4: Diagram of the multipass cavity. The ZnSe window is located on the left in yellow, and the two mirrors in blue on either end of the cavity. Courtesy of Z. Hou[3]
The length of interaction of the light with the gas, $l_{path}$, is measured using a laser distance meter to be 54.36 m[3]. A multipass cavity with such a long path length is used in order to have a large absorbance. The mirrors in the cavity and the ZnSe window leading into the cavity also absorb 94.85% of the light[3, p. 25]. Part of the light is absorbed in the cavity according to the absorbance profiles of the molecules constituting the gas. The remaining light exits the cavity and is measured at the main detector. In order to clean the cavity between measurements, a series of mass flow controllers and a diaphragm pump are used. This way contamination in the cavity is prevented[5].
2.2.3 Distance meter
The distance meter (Leica DISTO D2) emits light pulses along the path depicted by the red dashed line in Figure 2.1. The path the pulse travels outside the cavity is simply measured with a ruler. From Figure 2.1 it can be seen that subtracting the length of the path the pulse travels outside the cavity from the total length as calculated by the distance meter gives the optical path length inside the cavity[5]. This optical path length inside the cavity is also checked by comparing the time a pulse from the QCLs takes when it goes straight to a detector with the time it takes when it passes through the cavity and then onto a detector. The delay between the measured pulses is given in Figure 2.5, from which the same result of 54.36 m is obtained.
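Both path-length checks described above reduce to simple arithmetic. The Python sketch below assumes a refractive index of 1 inside the cavity and equal external path lengths for the direct route and the cavity route. The distance-meter readings in the example are invented; only the 54.36 m result comes from the text.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def cavity_path_from_distance_meter(total_m, outside_m):
    """Optical path inside the cavity: total reading minus the ruler-measured outside path."""
    return total_m - outside_m


def cavity_path_from_delay(delay_s, refractive_index=1.0):
    """Optical path that corresponds to a measured pulse delay."""
    return C_LIGHT * delay_s / refractive_index


def expected_delay(path_m, refractive_index=1.0):
    """Pulse delay expected for a given path length."""
    return path_m * refractive_index / C_LIGHT


# Hypothetical distance-meter reading and ruler measurement, both in metres.
path_m = cavity_path_from_distance_meter(55.10, 0.74)

# Delay expected for the reported 54.36 m, in nanoseconds (roughly 181 ns).
delay_ns = expected_delay(54.36) * 1e9
```

A delay of roughly 181 ns is therefore what the pulse comparison in Figure 2.5 should show if the path length is 54.36 m, under the stated assumptions.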
  • 12. Figure 2.5: Delay time between a pulse going straight to a detector and a pulse passing through the cavity for three different wavenumbers. (a) 950 cm−1, (b) 1000 cm−1, (c) 1150 cm−1. Courtesy of A. Reyes Reyes[5]
2.2.4 Detection
A HgCdTe detector measures the light before it enters the gas, and another HgCdTe detector measures the light when it exits the gas. These are called the monitor and main detector respectively, and the signals obtained from these detectors will henceforth be referred to as the monitor signal and the main signal. The specifications of the monitor detector are given in Table 2.2.
Table 2.2: Specifications of the monitor detector as given by Block Engineering[13].
Parameter                                            Specification
Detector element                                     TE cooled HgCdTe
Active area                                          1 × 1 mm²
Window                                               BaF2
Wavelength range                                     6 to 12 µm
Total gain (typical)                                 800 V/W
Cut-off frequency range                              10 Hz to 20 MHz
Operating temperature (typical)                      223 K
Saturation threshold power (at 8 µm, 20 MHz speed)   5 mW (peak)
The main detector is from VIGO System S.A. with part number PVI-4TE-10.6-0.3x0.3-TO8-BaF2, the specifications of which are given in Table 2.3.
  • 13. Table 2.3: Specifications of the main detector as given by VIGO System S.A.[14].
Parameter                          Specification
Detector element                   TE cooled HgCdTe
Active area                        0.3 × 0.3 mm²
Window                             BaF2
Wavelength range                   2 to 12 µm
Detectivity at $\lambda_{peak}$    $\geq 4.0 \times 10^{9}\ \mathrm{cm}\,\sqrt{\mathrm{Hz}}/\mathrm{W}$
Detectivity at $\lambda_{opt}$     $\geq 2.0 \times 10^{9}\ \mathrm{cm}\,\sqrt{\mathrm{Hz}}/\mathrm{W}$
Optimal wavelength                 10.6 µm
Operating temperature (typical)    195 K
The intensity of the light emitted by the QCL, $I_{QCL}$, as seen in Figure 2.6, varies per measurement due to temperature fluctuations and noise on the voltage applied over the piezo. Therefore the signals of multiple measurements need to be normalized in order to be compared to each other, for which the two detectors are used.
[Plot: Measurements of emitted intensity of the QCLs; x-axis Wavenumber (cm−1), y-axis Intensity (counts)]
Figure 2.6: Intensity spectrum of various measurements with the monitor detector.
  • 14. 3. Data Calibration 3.1 Wavenumber calibration As mentioned in chapter 2 the intensity of the light emitted by the QCLs is subject to noise. This noise is corrected for through normalisation by using the monitor and main detector. In this chapter an attempt is made to reduce the standard deviation in the monitor and the main signals as caused by an uncertainty of what wavenumber is emitted from the QCLs. The wavenumber corresponding to the intensity measured at the detec- tors is determined by the angle of the grating mounted on the piezo, as mentioned in subsection 2.2.1. The wavenumber attributed to the measured beam intensities at a point in time is determined by the voltage on the piezo, which should predictably determine the angle of the grating. Since there is noise on the voltage over the piezo element the behaviour of the piezo is not completely predictable, and thus the angle of the grating can not be told precisely by the voltage at a point in time on the piezo. This gives an uncertainty to the wavenumbers attributed to the measured beam intensities, as can be seen in Figure 3.1. The wavenumber data in Figure 3.1 is obtained by recording files made by Block Engineering’s firmware which measures the voltage over the piezo and contains knowledge of what voltage corresponds to what wavenumber. Unfortunately Block Engineering’s firmware outputs its spectrum at lower wavenumber resolution than the acquisition described in subsection 3.1.2 using the NI-PCI 5922 acquisition card. This wavenumber uncertainty in turn increases the standard deviation of the ab- sorbance when multiple measurements are compared. QCL systems are a relatively new development and the QCLs in the Laserscope should be among the best available cur- rently, hence there is no certainty that significant improvements can be made with the knowledge on QCL design and piezo elements currently available. 
With this, and the complicated nature of the electronics in the LaserScope QCL system, replacing the piezo could take a long time, for which there is no room within the scope of the project. Instead, wavenumber calibration of multiple measurements is done in order to reduce this uncertainty in absorbance.

3.1.1 Four measurement signals

As established in section 2.2, a main signal and a monitor signal are obtained during each measurement. Multiple measurements are performed per sample. In order to get information on the gas, measurements are made using the main detector when a gas is
inside the cavity. These measurements will be called gas measurements from now on. In order to single out any effects due to the different paths the beams take to the main and monitor detector, measurements are also done without gas in the cavity. These measurements without gas will henceforth be referred to as reference measurements. These reference measurements are also used to establish the beam intensity without interaction with the gas, which is I_i in Equation 2.1.

Figure 3.1: Effect of the voltage noise on the wavenumber emitted of several measurements over time. (a) Several scans superposed. (b) A zoom into a section of the figure shown in (a). [Plot "Measuring time - wavenumber relation": wavenumber (cm⁻¹) against time stamp (a.u.).]

The effects of the components of the setup such as the cavity, the lenses, and the mirrors on the beam to the main detector are all packed together in a factor f_path1(ν). This factor is defined such that multiplying the intensity of the beam as it exits the QCLs by this factor gives the intensity of the beam as it arrives at the main detector, and the factor therefore takes values between 0 and 1 over the wavenumber range. f_path2(ν) and f_gas(ν, C) are similarly defined for the effect of the elements between the QCLs and the monitor detector and for the effect when a gas sample is present.

Due to effects in the QCLs, the signals vary in intensity between different measurements. Therefore during both gas and reference measurements monitor signals are also recorded. The monitor signals are necessary to be able to reduce the effects of the QCL, as the QCL affects the signal of the monitor and the main detector similarly and can therefore be identified. For a clear overview the various effects on the signal in the two types of measurement of both detectors are laid out in Table 3.1.
Wavenumber dependence in Table 3.1 is indicated by ν and ν′, to indicate that different measurements cannot be compared wavenumber-wise. This is a result of the uncertainty caused by effects in the QCL.

Table 3.1: The contributions to the intensity as measured by the two detectors.

Signal type | Gas measurement | Reference measurement
Main | I_QCL(ν′) f_path1(ν′) f_gas(ν′, C) | I_QCL(ν) f_path1(ν)
Monitor | I_QCL(ν′) f_path2(ν′) | I_QCL(ν) f_path2(ν)

3.1.2 Data acquisition and pre-processing

The signal from the detectors is processed before it is calibrated, following the method developed by Z. Hou[3]. The light emitted by the QCL is a series of pulses lasting 333 ns and spaced approximately 5600 ns apart. These light pulses go through the setup and end up at the detectors, where the resulting signals are sampled at a rate of 10 MHz, or 1 sample per 100 ns, by the NI-PCI 5922 acquisition card. Since saving the data from the memory of the acquisition card to the computer is one of the most time-consuming parts of the data acquisition, the data is filtered using LabVIEW such that only the peaks and valleys of the pulse signal are saved to the computer. For this, LabVIEW's peak detection is used at a width of 3 points in order to get an amplitude error for the peaks of 0.39%. As the valleys are essentially parts where the laser is emitting no light but there is still a signal, the signal strength of the valleys surrounding a peak is subtracted from the peak to remove any bias due to the electronics of the detector. A final resolution of 0.05 cm⁻¹ is achieved. For each sample ten gas measurements and ten reference measurements are done and processed this way. The spectrum of one of these processed monitor signals is shown in Figure 3.2. The entire spectrum spans from 832 cm⁻¹ to 1263 cm⁻¹.
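The valley subtraction described in subsection 3.1.2 can be sketched as follows. This is a minimal illustration and not the LabVIEW implementation used in the project: the function name `subtract_valley_baseline` is hypothetical, and using the mean of the two neighbouring valleys as the baseline is an assumption, since the text only states that the surrounding valleys are subtracted from the peak.

```python
def subtract_valley_baseline(peaks, valleys):
    """Remove the detector bias from each pulse peak.

    Assumption: peaks[i] lies between valleys[i] and valleys[i + 1], and
    the baseline is taken as the mean of those two valleys.
    """
    if len(valleys) != len(peaks) + 1:
        raise ValueError("expected exactly one more valley than peaks")
    return [
        peak - 0.5 * (valleys[i] + valleys[i + 1])
        for i, peak in enumerate(peaks)
    ]


# Synthetic detector counts, for illustration only.
corrected = subtract_valley_baseline([10.0, 12.0], [1.0, 3.0, 1.0])
print(corrected)
```

With the synthetic values above both peaks have a baseline of 2.0 counts, so only the part of the signal caused by the laser pulse remains.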
Figure 3.2: A monitor signal after pre-processing as described in subsection 3.1.2. [Plot "Monitor signals of gas and reference measurement": intensity (counts) against wavenumber (cm⁻¹).]

3.1.3 Need for additional calibration

The absorbance is obtained by inverting Equation 2.1:

$$A(\nu, C) = -\log_{10}\left[\frac{I_o(\nu, C)}{I_i(\nu)}\right]. \tag{3.1}$$

Using Table 3.1, the measured signals are substituted for the beam intensity with gas in the cavity I_o and the beam intensity without gas in the cavity I_i:

$$I_o(\nu', C) = \frac{I_{\mathrm{main,gas}}}{I_{\mathrm{mon,gas}}} = \frac{I_{\mathrm{QCL}}(\nu')\, f_{\mathrm{path1}}(\nu')\, f_{\mathrm{gas}}(\nu', C)}{I_{\mathrm{QCL}}(\nu')\, f_{\mathrm{path2}}(\nu')} = \frac{f_{\mathrm{path1}}(\nu')\, f_{\mathrm{gas}}(\nu', C)}{f_{\mathrm{path2}}(\nu')} \tag{3.2a}$$

$$I_i(\nu) = \frac{I_{\mathrm{main,ref}}}{I_{\mathrm{mon,ref}}} = \frac{I_{\mathrm{QCL}}(\nu)\, f_{\mathrm{path1}}(\nu)}{I_{\mathrm{QCL}}(\nu)\, f_{\mathrm{path2}}(\nu)} = \frac{f_{\mathrm{path1}}(\nu)}{f_{\mathrm{path2}}(\nu)}, \tag{3.2b}$$

where the wavenumber dependence as indicated by ν and ν′ shows that different measurements cannot be compared wavenumber-wise, as shown at the start of section 3.1.
The division by the monitor signals is done to remove the influence of the QCLs' intensity instability in I_QCL(ν) and I_QCL(ν′). By substituting Equation 3.2 into Equation 3.1, Equation 3.3 is obtained:

$$A(\nu, \nu', C) = -\log_{10}\left[\frac{f_{\mathrm{path1}}(\nu')\, f_{\mathrm{gas}}(\nu', C) \,/\, f_{\mathrm{path2}}(\nu')}{f_{\mathrm{path1}}(\nu) \,/\, f_{\mathrm{path2}}(\nu)}\right] = -\log_{10}\left[\frac{f_{\mathrm{path1}}(\nu')\, f_{\mathrm{path2}}(\nu)\, f_{\mathrm{gas}}(\nu', C)}{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu')}\right]. \tag{3.3}$$

Note that the absorbance A(ν, ν′, C) is still dependent on the influences of the optical path because the two separate wavenumber dependencies cannot be resolved. The mean absorbance from ten measurements of a single sample, with the standard deviation calculated per wavenumber, is shown in Figure 3.3.

Figure 3.3: Mean absorbance (red line) and standard deviation (dashed blue lines) of ten measurements. (a) Entire scan range. (b) A zoom into a section of the scan shown in (a). [Plot "Mean absorbance and standard deviation of ten measurements": absorbance against wavenumber (cm⁻¹).]

The large standard deviation in Figure 3.3 is partly because the multiple measurements over which the mean is taken cannot be compared properly, since the relation between the wavenumbers of different measurements is not known. When looking closely at Figure 3.4a and Figure 3.4b, this wavenumber discrepancy is seen as the measurements do not overlap.
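To make the role of the four signals concrete, the sketch below evaluates Equations 3.1 and 3.2 point by point. It is an illustration under stated assumptions: the function name `absorbance` and the numbers are hypothetical, and the four signals are assumed to be sampled on the same wavenumber grid, which is exactly what the uncalibrated data does not guarantee.

```python
import math


def absorbance(main_gas, mon_gas, main_ref, mon_ref):
    """Base-10 absorbance per wavenumber point (Equations 3.1 and 3.2).

    Assumption: all four lists share one wavenumber grid (nu = nu').
    """
    result = []
    for m_gas, n_gas, m_ref, n_ref in zip(main_gas, mon_gas, main_ref, mon_ref):
        i_out = m_gas / n_gas  # Equation 3.2a: I_QCL cancels in the ratio
        i_in = m_ref / n_ref   # Equation 3.2b
        result.append(-math.log10(i_out / i_in))
    return result


# Synthetic counts, for illustration only.
main_gas = [5.0, 20.0]
mon_gas = [100.0, 100.0]
main_ref = [100.0, 40.0]
mon_ref = [200.0, 100.0]
spectrum = absorbance(main_gas, mon_gas, main_ref, mon_ref)

# A laser that is 30 % brighter during the gas measurement scales the main
# and monitor signal alike, so the absorbance should not change.
brighter = absorbance(
    [1.3 * v for v in main_gas], [1.3 * v for v in mon_gas], main_ref, mon_ref
)
print(spectrum, brighter)
```

The second call shows why the monitor detector is needed: a common intensity factor from the QCL drops out of the ratio, whereas a wavenumber shift between measurements does not.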
Figure 3.4: Ten uncalibrated intensity measurements of a single sample. Five of these are from measurements with a gas in the cavity, and five are from a reference measurement. (a) Measured signals at the monitor detector. (b) Measured signals at the main detector. [Plots "Uncalibrated monitor measurements" and "Uncalibrated main measurements": intensity (counts) against wavenumber (cm⁻¹), 1172–1176 cm⁻¹.]

This local horizontal shift from one measurement to another increases the standard deviation of the eventual absorbance and concentration. The absorbance of a single sample is calculated from multiple measurements with gas inside the cavity and the reference measurements without gas in the cavity. All these measurements need to be aligned to the same wavenumber values.

3.1.4 Wavenumber calibration

In order to calibrate the various monitor and main signals, a basis to which they can all be calibrated is chosen. The mean of all the main signals is not a useful basis to calibrate towards, since the main signals of the reference measurements differ from the main signals of the gas measurements because in the latter a gas is present and absorbs part of the light. The mean of all the monitor signals is used as the basis for calibration, since the monitor signals are nearly identical for all measurements because they do not enter the cavity and therefore are not affected by the gas in the cavity. Since the monitor and main signal of a single measurement suffer from the same wavenumber discrepancies from the QCL, the transformations that calibrate the monitor signals also calibrate their respective main signals.

Calibration starts with the monitor signals, and is done towards the mean per wavenumber of all the monitor signals of a sample, shown in Figure 3.5 as a dashed red line.
The positions of the peaks and valleys of this mean are then determined, as denoted by black boxes in Figure 3.5. Each monitor signal is calibrated separately to the mean of the monitor signals. Calibration of a monitor signal, denoted by the blue line in Figure 3.5, towards a single valley of the mean is shown in Figure 3.5b and Figure 3.5c. The position of the valley of the mean used to calibrate is marked by a vertical dashed grey line, and the two surrounding peaks of the mean are marked by vertical dashed black lines in Figure 3.5b. The minimum value of the monitor signal between the positions of these two peaks of the mean is taken, as marked by the star in Figure 3.5c. This minimum is then taken as the signal's valley corresponding to the valley of the mean marked by the grey line. This valley of the monitor signal is then shifted towards the corresponding valley of the mean of all the monitors. This process is repeated for all valleys of the mean. The peaks of the monitor signal are calibrated in a similar fashion. Every signal is now locally shifted so that its peaks and valleys match those of the mean signal. All data between peaks and valleys is stretched and compressed by means of resampling to fit the adjusted peaks and valleys. Since the data between a peak and a valley is mostly straight, linear interpolation is sufficient for resampling.
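The local shift and resampling described above can be sketched as a piecewise-linear warp of the wavenumber axis. This is a minimal illustration, not the thesis code: the names `interp` and `align_to_reference` are hypothetical, the extrema of the signal and of the mean are assumed to be already matched one to one and strictly increasing, and the two ends of the grid are assumed to stay fixed.

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Linear interpolation of the points (xp, fp) at x.

    xp must be strictly increasing; values outside the range are clamped.
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    j = bisect_right(xp, x) - 1
    t = (x - xp[j]) / (xp[j + 1] - xp[j])
    return fp[j] + t * (fp[j + 1] - fp[j])


def align_to_reference(grid, signal, signal_extrema, reference_extrema):
    """Move each extremum of the signal onto the matching extremum of the
    reference and linearly resample everything in between."""
    # Anchor points of the warp; the grid ends are kept in place (assumption).
    source = [grid[0]] + list(signal_extrema) + [grid[-1]]
    target = [grid[0]] + list(reference_extrema) + [grid[-1]]
    warped_axis = [interp(x, source, target) for x in grid]
    # Resample the warped signal back onto the original grid.
    return [interp(x, warped_axis, signal) for x in grid]


# Synthetic example: the signal has its valley at 2.0, the mean at 3.0.
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
signal = [3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0]
aligned = align_to_reference(grid, signal, [2.0], [3.0])
print(aligned)
```

Because the monitor and main signal of one measurement share the same wavenumber distortion, the same `signal_extrema` and `reference_extrema` would be reused to warp the corresponding main signal.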
Figure 3.5: Calibration of a monitor signal to a valley of the mean. [Plot "Calibration of a monitor signal", panels (a) to (c): intensity (counts) against wavenumber (cm⁻¹); legend: mean of all monitors, a monitor signal, peaks and valleys of mean.]

The complete calibration of a monitor signal following the procedure shown in Figure 3.5 can be seen in Figure 3.6.
Figure 3.6: A single monitor signal calibrated towards the mean of all monitor signals of a sample. [Plot "Calibrated monitor signal": intensity (counts) against wavenumber (cm⁻¹), 1172–1176 cm⁻¹; legend: mean of all monitors, a monitor signal, peaks and valleys of mean.]

The miscalibrated blue peak in Figure 3.6 around 1175.2 cm⁻¹ is a consequence of the minimum of the monitor signal between the surrounding peaks of the mean being located at a higher rather than a lower wavenumber than this monitor peak. Since the indices of the peaks and valleys found in the monitor signal are later sorted under the assumption that peaks and valleys alternate, the small blue peak is misassigned as a valley and is consequently placed at the valley of the mean at 1175.2 cm⁻¹ instead of at the peak of the mean at 1175.4 cm⁻¹. This can be solved by comparing the indices of the monitor peaks with those of the monitor valleys, and replacing the index of each monitor valley whose index is larger than that of the subsequent peak by the index of the minimum between that monitor peak and the monitor peak before it.

Figure 3.7 demonstrates how the wavenumber calibration procedure calibrates ten measurements of a sample towards each other. Five of the signals are from gas measurements, and five are from reference measurements. Figure 3.7c and Figure 3.7d are the wavenumber calibrated versions of Figure 3.7a and Figure 3.7b respectively.
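The proposed index check could look like the sketch below. It is one possible reading of the description and has not been taken from the thesis code: the name `repair_valley_indices` is hypothetical, and it is assumed that the valley stored at position k is the one expected to precede the peak stored at position k.

```python
def repair_valley_indices(signal, peak_idx, valley_idx):
    """Replace valleys that ended up after their subsequent peak.

    Assumption: valley_idx[k] should precede peak_idx[k]. When it does not,
    the valley is re-searched as the minimum between the previous peak and
    peak_idx[k].
    """
    fixed = list(valley_idx)
    for k, (valley, peak) in enumerate(zip(valley_idx, peak_idx)):
        if valley > peak:
            start = peak_idx[k - 1] if k > 0 else 0
            segment = signal[start:peak + 1]
            fixed[k] = start + segment.index(min(segment))
    return fixed


# Synthetic example: the second valley was found at index 5, after its peak
# at index 4, although the true valley before that peak is at index 3.
signal = [5.0, 1.0, 6.0, 3.0, 4.0, 2.0, 7.0]
repaired = repair_valley_indices(signal, peak_idx=[2, 4], valley_idx=[1, 5])
print(repaired)
```

After the repair, peaks and valleys alternate again, so the later sorting step no longer swaps a small peak for a valley.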
Figure 3.7: Ten monitor signals and their respective main signals shown uncalibrated and calibrated. (a) Uncalibrated monitor signals. (b) Uncalibrated main signals. (c) Calibrated monitor signals. (d) Calibrated main signals. [Four plots: intensity (counts) against wavenumber (cm⁻¹), 1172–1176 cm⁻¹.]

The calibration done this way does not align the signals perfectly, as can be seen in Figure 3.7. The majority of the monitor peaks and valleys as seen in Figure 3.7c
are aligned perfectly, and slightly unaligned peaks can be attributed to the calibration method only looking for a valley or peak of the signal between two peaks or valleys of the mean. When the valley or peak of the signal is located exactly on a peak or valley of the mean, the closest point to the peak is most likely the highest and therefore will be aligned as a peak, giving a slight misalignment.

For each sample ten gas measurements and ten reference measurements are done, giving a total of twenty monitor signals and twenty main signals. The drop in standard deviation of these signals as a result of the calibration can be seen in Table 3.2. The drop in standard deviation is calculated for signals from reference and gas measurements separately, and for the collection of signals of both reference and gas measurements. The numbers indicate the percentage drop from the uncalibrated signals to the calibrated signals.

From Table 3.2 it seems that the calibration works better for monitor signals than it does for main signals. This can be attributed partly to the fact that the monitor signals served as the basis of the calibration, and when looking at Figure 3.7 it does seem that the calibration works better for the monitor signals. However, since each monitor and its respective main signal experience the same wavenumber distortions from the QCLs, and the exact transform used on each monitor signal is also used on its respective main signal, the large difference in calibration result cannot entirely be attributed to the basis of calibration. Rather, the standard deviation in the main signals is already relatively higher than in the monitor signals, and the wavenumber uncertainty is a smaller part of the total standard deviation of the main signals than of the monitor signals. Thus calibrating for the wavenumber uncertainty gives a smaller decrease in the total standard deviation of the main signals.
Table 3.2: Decrease in standard deviation for the monitor and main signals as a result of the wavenumber calibration. This decrease in standard deviation is calculated separately for just gas measurements, just reference measurements, and gas and reference measurements together.

Signal type | Gas measurements (%) | Reference measurements (%) | Gas & reference measurements (%)
Main | 7.04 | 6.50 | 6.46
Monitor | 16.6 | 17.0 | 18.4

It is also interesting to note that the standard deviation of the main signals improves less when both reference and gas measurements are taken into account as opposed to only one type of measurement. This can be explained by the fact that the gas and reference measurements of the main signal should be different, due to the absorption of the main signal by the gas in the gas measurement. Because the gas and reference measurements are so different, the total standard deviation when taking both gas and reference measurements into account is greater than that of the separate measurements, and hence the standard deviation caused by the wavenumber uncertainty is a smaller part of it.

After aligning all these signals as explained in subsection 3.1.4, ten absorbances, each from a combination of the four signals from Table 3.1, are calculated according to Equation 3.1, Equation 3.2, and Equation 3.3. However, since the wavenumbers of the
various measurements are now calibrated,

$$\nu = \nu', \tag{3.4}$$

and thus the absorbance as in Equation 3.3 becomes

$$A(\nu, C) = -\log_{10}\left[\frac{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)\, f_{\mathrm{gas}}(\nu, C)}{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)}\right] = -\log_{10}\left[f_{\mathrm{gas}}(\nu, C)\right]. \tag{3.5}$$

The standard deviation of these ten absorbances decreases from 0.0313 before calibration to 0.0291 after calibration, a decrease of 7.06%. For a large part of the absorbance spectrum this is still a relatively large standard deviation, as seen in Figure 3.8, where the mean absorbance found this way for one sample is given.

Figure 3.8: The wavenumber calibrated absorbance as calculated from the measurements of a single sample. [Plot "Absorbance calibrated by wavenumber": absorbance against wavenumber (cm⁻¹), 850–1250 cm⁻¹.]

3.2 CO2 and H2O calibration

When looking at the absorbance spectrum of a breath sample as in Figure 3.8, there are a few regions that stand out by their relatively high absorbance. From Equation 3.1 it
can be seen that high molar absorptivities ε(ν) in a region and high concentrations of molecules with such molar absorptivities contribute to a high local absorbance. When checking for molecules known to have relatively high concentrations in breath, such as CO2 with about 4% per volume, H2O with about 5% per volume, and N2 with about 78% per volume, a cause of the high absorbance regions is easily identified, as seen in Figure 3.9. The molar absorptivity spectrum overlays of CO2 and H2O in Figure 3.9 are scaled with factors 60 and 5 × 10⁴ respectively to clearly show the matching regions. CO2 and H2O molar absorptivities are taken from the HITRAN database[9].

Figure 3.9: Absorbance from wavenumber calibrated data overlaid with spectra of CO2 and H2O from the HITRAN database[9]. [Plot "Absorbance with overlaid CO2 and H2O molar absorptivity": absorbance (base-10) against wavenumber (cm⁻¹), 850–1250 cm⁻¹; legend: experimental data, CO2, H2O.]

When looking more closely at these matching regions between the absorbance spectrum and CO2 and H2O, it is seen in Figure 3.10 that even after the wavenumber calibration the absorbance spectrum does not completely match CO2 and H2O from literature.
Figure 3.10: Absorbance from wavenumber calibrated data overlaid with spectra of CO2 and H2O from the HITRAN database[9]. [Plot "Closer look at absorbance with CO2 and H2O overlaid": absorbance (base-10) against wavenumber (cm⁻¹), 1076–1083 cm⁻¹; legend: experimental data, CO2, H2O.]

A method similar to the wavenumber calibration is used in order to calibrate the absorbance spectrum to the database spectra of CO2 and H2O. The difference is that now only the peaks are matched to CO2 and H2O, and all data points in between, including valleys, are linearly interpolated between the matched peaks.

3.2.1 Method for calibration to CO2 and H2O

Input and preprocessing

This method uses the absorbance spectrum as obtained from the wavenumber calibration, the accompanying wavenumber range, and molar absorptivity spectra of CO2 and H2O. Since the methods used for the calibration to CO2 and to H2O are almost identical, only the CO2 case is described here. The CO2 spectrum is taken from the HITRAN database[9] in the range of 832 cm⁻¹ to 1263 cm⁻¹ with a resolution of 0.05 cm⁻¹. The CO2 spectrum is scaled by a factor 60 so the height of its peaks approximately matches that of the experimental data.
CO2 peak and absorbance peak identification

To facilitate the calibration of the experimental data to the CO2 spectrum from the HITRAN database, the peaks and valleys of the CO2 database spectrum serve as reference points. The peaks and valleys of the CO2 database spectrum are identified using a peakfinder[15] algorithm. The peakfinder algorithm finds peaks above a certain adjustable threshold, based on their value compared to the surrounding data points. The peakfinder algorithm then returns the indices and intensities of all data points that it identifies. If the first peak lies before the first identified valley, or the last peak lies after the last identified valley, that peak is removed. This way the range to adjust always starts and ends with a valley of the CO2 database spectrum.

Now that all peaks and valleys of the CO2 database spectrum are found, they are used in order to find the absorbance peaks and valleys of the experimental data. The first absorbance peak is identified by taking the maximum absorbance between the first two valleys of the CO2 database spectrum. The neighbouring absorbance valley is identified by taking the minimum between the first absorbance peak and the next peak of the CO2 database spectrum. The next valleys and peaks of the experimental data are found by looking between the last peak or valley found in the experimental data and the next peak or valley of the CO2 database spectrum, alternating between finding a peak and a valley. This process is repeated up to the last identified valley of the CO2 database spectrum.

This way all absorbance valleys are found between two peaks, and all absorbance peaks are found between two valleys. Thus the same number of absorbance peaks and valleys are identified in the experimental data as there are peaks and valleys in the CO2 database spectrum, making it easy to match them.¹
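The alternating search described above can be sketched as follows. This is an illustration under stated assumptions and not the thesis code: the name `find_experimental_extrema` is hypothetical, the database extrema are given as indices into the same grid as the absorbance, and the database spectrum is assumed to start and end with a valley so that there is exactly one more database valley than database peaks.

```python
def find_experimental_extrema(absorbance, db_peaks, db_valleys):
    """Find absorbance peaks and valleys guided by database extrema.

    Peaks are searched between the last found valley and the next database
    valley; valleys between the last found peak and the next database peak.
    """
    if len(db_valleys) != len(db_peaks) + 1:
        raise ValueError("expected one more database valley than peaks")
    exp_peaks, exp_valleys = [], []
    lower = db_valleys[0]
    for k in range(len(db_peaks)):
        window = absorbance[lower:db_valleys[k + 1] + 1]
        peak = lower + window.index(max(window))
        exp_peaks.append(peak)
        if k + 1 < len(db_peaks):
            window = absorbance[peak:db_peaks[k + 1] + 1]
            valley = peak + window.index(min(window))
            exp_valleys.append(valley)
            lower = valley
    return exp_peaks, exp_valleys


# Synthetic example: database peaks at indices 2 and 6, while the measured
# peaks sit one grid point higher, at indices 3 and 7.
measured = [0.0, 0.1, 0.5, 1.0, 0.4, 0.1, 0.6, 1.2, 0.3]
peaks, valleys = find_experimental_extrema(
    measured, db_peaks=[2, 6], db_valleys=[0, 4, 8]
)
print(peaks, valleys)
```

In this sketch the number of measured peaks equals the number of database peaks, while only the valleys between consecutive peaks are returned, which is consistent with the remark in the footnote that fewer absorbance valleys than CO2 valleys are found.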
Peak calibration

All the peaks and valleys of the CO2 database spectrum are now put in a single list, ordered by index. The same is done for the peaks and valleys of the experimental data. The peaks of the experimental data list are now matched to coincide with the peaks of CO2, and all the measurement data in between two peaks is interpolated to fit between the surrounding CO2 peaks. This same procedure is applied to H2O. The result of these calibrations is seen in Figure 3.11.

¹ Since there is one more CO2 valley than CO2 peak, the number of found absorbance peaks matches the number of CO2 peaks. However, two fewer absorbance valleys are found than there are CO2 valleys. Hence the first and last CO2 valley are removed.

3.2.2 CO2 and H2O calibration results

Except for the sides of the CO2 and H2O spectra where the peak heights are around the noise level, nearly all CO2 and H2O peaks seem to be correctly calibrated. This is also indicated by Figure 3.11, though some peculiarities are worth mentioning.

Figure 3.11: Absorbance calibrated to CO2 and H2O overlaid with spectra of CO2 and H2O from the HITRAN database[9]. [Plot "Closer look at absorbance calibrated to CO2 and H2O": absorbance (base-10) against wavenumber (cm⁻¹), 1076–1083 cm⁻¹; legend: experimental data adjusted for CO2 and H2O, CO2, H2O.]

Around 1081 cm⁻¹ the measured absorbance has no peak, yet according to the CO2 spectrum it should have. One reason for the absence of an absorbance peak could be that the calibration algorithm miscalibrated, and the next absorbance peak at 1082.3 cm⁻¹ should actually be at 1081 cm⁻¹. This can be rejected on the basis that the spacing of the other absorbance peaks seems to match well with the spacing of the CO2 peaks, even before calibration, as seen in Figure 3.10. The question remains, assuming that no miscalibration took place around 1081 cm⁻¹, why there is no peak where one would be expected from the CO2 database spectrum. This might be explained by noise; however, when looking at the measured absorbance's constituent monitor and main signals, the noise is not any higher than at other places.

A related peculiarity, namely the height of the measured absorbance peaks not matching the heights of the CO2 peaks in general, is also at first ascribed to noise. However, when comparing absorbances of multiple different samples as in Figure 3.12, it can be seen that the height of each peak is about equal between samples. This indicates that the difference in height between the various measured absorbance peaks and the CO2 peaks is not a matter of noise, but rather a consistent wavenumber dependent influence from a part of the setup. A source for these mismatching absorbances has not yet been pinpointed.

One event of miscalibration can still be argued to have taken place. In Figure 3.10 the closest thing to an absorbance peak near the CO2 database peak at 1081.1 cm⁻¹ is the peak located at 1081.8 cm⁻¹, which is not calibrated to the CO2 database peak as
seen in Figure 3.11. From Figure 3.10 it can be seen that this small peak was not detected and calibrated since it is located outside the region where the search was performed, from the valley in the absorbance spectrum at 1081.0 cm⁻¹ to the valley in the CO2 spectrum at 1081.6 cm⁻¹. This might be solved by increasing the search region, though this could cause problems for other peaks. It can also be argued that since the small peak at 1081.8 cm⁻¹ is actually located closer to the CO2 peak at 1082.3 cm⁻¹ than to the peak at 1081.1 cm⁻¹, it is not necessarily better to match it to the peak at 1081.1 cm⁻¹. Nor can it be said whether the absorbance peak that would be expected near the small peak would be located at the same position, and thus no conclusion can be drawn as to whether this peak would also have failed to be calibrated to its corresponding CO2 peak at 1081.1 cm⁻¹.

Figure 3.12: Measured absorbances of three different samples calibrated to CO2 and H2O. The molar absorptivity overlay of CO2 is scaled by 60. CO2 molar absorptivity is taken from the HITRAN database[9]. [Plot "Comparison of height of CO2 peaks to measured samples": absorbance (base-10) against wavenumber (cm⁻¹), 1076–1083 cm⁻¹; legend: three experimental data sets, CO2 spectrum.]
4. Molecules and Concentrations

In this chapter a list of molecules is assembled. This list contains the molecules that are likely present in breath. The list is assembled in such a way as to find molecules with concentration differences between the healthy volunteers and the asthmatic volunteers that contributed their breath for this research. Seventy breath samples of healthy volunteers and seventy breath samples of asthmatic volunteers are used. The absorbance spectra of these samples were calibrated according to chapter 3. In this chapter the calibrated spectra are checked for the concentrations of the molecules in the list. Lastly, a specific method for acquiring a more accurate concentration for ethanol is described.

4.1 List of molecules

To check the absorbance spectra for concentrations, a list of molecules with known molar absorptivities is assembled. The initial list consists of all molecules with known molar absorptivities from the PNNL[8] database and the HITRAN[9] database. Currently these databases provide information on a total of 456 different molecules.

From the 456 molecules available in the PNNL and HITRAN databases it is necessary to select those likely present in breath. Not all of the molecules are expected to be present in breath, as many of them are neither organic nor known to be in the atmosphere in an amount detectable by the current setup. Since the absorbance still has uncertainty after the calibration, the concentrations determined will have a related uncertainty. As a result the concentrations of molecules that are not present will not necessarily be set to zero, and will therefore influence the concentrations determined for molecules that are present in the gas sample. Thus before determining the concentrations it is vital to get a list of molecules which excludes these superfluous molecules.
In order to distinguish between the healthy and asthmatic groups, the list should also contain molecules likely to have large differences in concentration between the groups.

The list is assembled in two steps. The first step is determining the molecules that have previously been reported to be present in breath, the standard molecules. The second step is adding those molecules likely to show different concentrations between the samples of the healthy and the asthmatic volunteers. When one is not interested in finding differences between data sets, the list of standard molecules should be extended in another way. This can be done by selecting molecules based on their absorptivity and a set minimal absorbance.
4.1.1 Standard molecules

This first group of molecules is a combination of atmospheric molecules[16] and molecules known to be in breath[17] with high absorbance in the relevant wavenumber region of 832 cm⁻¹ to 1263 cm⁻¹. Molecules such as CO2, H2O, ethanol, and acetone are the major contributors to the absorbance spectrum of breath in the relevant wavenumber region. These molecules must therefore be included in the list regardless of whether they are likely to have differences in concentration between the healthy and the asthmatic group. The list of molecules chosen this way can be found in Table 4.1.

Table 4.1: List of standard molecules used.

Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water

4.1.2 Molecules selected to distinguish between data sets

To determine whether a gas was exhaled by a healthy person or by someone with asthma, a recognizable difference in measurement results is desirable. P-values are used to quantify this difference between the measurements from healthy people and asthmatic people. Since p-values are often misunderstood, an explanation of how p-values are calculated and of their significance to this research is given.

P-values

The p-value is defined as the probability of obtaining a result at least as extreme as the one observed, under the assumption that a certain hypothesis is true. In the context of these breath measurements the null hypothesis is that being asthmatic has no effect on the concentrations of molecules, and will give absorption profiles identical to those of being healthy. For each wavenumber, the combined absorbances of multiple samples of healthy and asthmatic people are then expected to be two groups of points spread around the same absorbance value. A normal distribution is assigned to both the asthmatic and the healthy samples, and the mean
and standard deviation of both are calculated. Now, assuming the null hypothesis is true, the mean and standard deviation of the asthmatic group should be the same as those of the healthy group. The p-value per wavenumber is found by calculating the chance that the absorbances of the asthmatic samples take the values they do, assuming the null hypothesis, which claims that the asthmatic samples follow the distribution of the healthy samples.

Assume the null hypothesis is not necessarily true, but rather that the alternative (having asthma does give a different absorbance spectrum) could also be true. The preferred hypothesis is the one for which the measured absorbance is most likely to occur. Although no p-values are calculated for the alternative hypothesis, a small p-value for the null hypothesis is still used as an indicator that samples of asthmatic and healthy patients are inherently different.

The p-values between the 70 absorbances of healthy patients on the one hand and the 70 absorbances of asthmatic patients on the other hand are obtained this way for every wavenumber. This is done using an algorithm from K. Brand[18] that uses the topTable function from the LIMMA package in R.

P-region analysis

In p-region analysis the compounds most likely to have different concentrations between the two data sets are sought. This is done by checking the p-values corresponding to the two data sets and finding regions of consecutive low p-values. The reasoning behind this is as follows: organic molecules such as the ones expected in human breath are larger and more complicated than for example H2O and CO2, and thereby have a spectrum consisting of broad smooth profiles. If the concentration of such a molecule differs greatly from one data set to the other, this would show up as low p-values in a broad band.
Thus to find these molecules, the p-values corresponding to two data sets need to be below a certain threshold for a certain region of consecutive wavenumber values. For these so-called p-regions, a list of compounds is checked for their absorptivity. Compounds with an average absorptivity above a certain threshold within any of these regions are likely candidates for the difference in the measured intensity between two sample groups, and are listed to be checked for their concentration in the data set. This list includes not only the molecules, but also the wavenumber regions in which they are present.

Two p-regions are found using a minimum of 10 consecutive data points with p-values of 0.05 or lower, one located between 1181.80 cm−1 and 1182.55 cm−1 and another from 1261.40 cm−1 to 1262.05 cm−1. The complete set of compounds from the HITRAN and PNNL databases is checked for average molar absorptivity above 10−4 ppm−1 m−1 in these regions. A table with the regions and a list of molecules found this way can be seen in Appendix A.2.

The p-values between the absorbance spectra of the healthy and the asthmatic patients vary over narrow wavenumber regions, as can be seen in Figure 4.1. The red line in Figure 4.1 indicates the upper limit of 0.05 below which p-values are taken to indicate a significant difference in absorbance between healthy and asthmatic samples. The p-region from 1181.80 cm−1 to 1182.55 cm−1 can be identified using the red line. The p-values vary over wavenumber regions that are narrow compared to the width of the absorbance features expected from molecules. This is explained by the mean-to-standard deviation ratio
of 0.7707 (see footnote 1) over the wavenumber range of ten absorbance spectra, as calculated from a single gas sample of a healthy person. Even though the p-values are calculated between two groups of 70 samples each, the large variations in p-values as observed in Figure 4.1 seem to indicate that either the samples within each group do not have much in common with each other, or that the noise on the signal is too large to make out significant differences between the two groups. Hence it seems that the p-values might not be accurate indicators of subtle differences in concentrations of molecules between the healthy and asthmatic groups when using measurements with a large uncertainty. Regardless of this, relevant candidate molecules have been found using this method. An example is Freon-114, which is identified as being present in both p-regions and is used as a propellant in metered-dose inhalers, which are the most commonly used delivery systems for treating asthma[19].

[Figure 4.1: P-values as calculated between 70 absorbance spectra of healthy patients and 70 absorbance spectra of asthmatic patients. The plot shows the p-value (0 to 1) against wavenumber (1172 to 1182 cm−1), with one curve for the p-values of healthy vs. asthmatic and one line for the p-region selection threshold.]

Footnote 1: The mean-to-standard deviation ratio over the entire spectrum of one sample is 0.7707. It is worth mentioning that the mean-to-standard deviation ratio in the regions with high absorbance of CO2 and H2O is lower. The mean-to-standard deviation ratio can also vary considerably over the different measurements.
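The p-region selection of subsection 4.1.2 can be illustrated with a minimal sketch. It assumes the p-values are already available on a uniform wavenumber grid; the function name `find_p_regions` and the synthetic data are hypothetical and are not taken from the thesis code, which obtains its p-values through the LIMMA package in R.

```python
def find_p_regions(wavenumbers, p_values, threshold=0.05, min_points=10):
    """Return (start, end) wavenumber pairs for every run of at least
    min_points consecutive p-values at or below threshold."""
    regions = []
    run_start = None  # index where the current run of low p-values began
    for i, p in enumerate(p_values):
        if p <= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_points:
                regions.append((wavenumbers[run_start], wavenumbers[i - 1]))
            run_start = None
    # Close a run that extends to the end of the spectrum.
    if run_start is not None and len(p_values) - run_start >= min_points:
        regions.append((wavenumbers[run_start], wavenumbers[-1]))
    return regions


# Synthetic example on a 0.05 cm^-1 grid (values are made up):
# one run of 12 low p-values is kept, one run of 4 is too short.
grid = [round(1180.0 + 0.05 * k, 2) for k in range(40)]
p = [0.5] * 40
for k in range(5, 17):
    p[k] = 0.01
for k in range(25, 29):
    p[k] = 0.02
regions = find_p_regions(grid, p)
print(regions)
```

Loosening `threshold` or `min_points` yields more p-regions, which is the adjustment discussed later in section 6.3.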
4.1.3 Selection based on measurable concentration

The list of molecules can be shortened by checking their absorbances. Since molecules with a very low absorbance in the wavenumber region of interest cannot be estimated accurately at the current noise level, these molecules can be eliminated. It is known from Z. Hou[3, p. 71] that the sensitivity of the system is in the order of 1 ppmv, and from A. Reyes Reyes[5] that the sensitivity can even reach the order of hundreds of ppbv. This sensitivity is then used to establish a lower limit on the concentration for all molecules. In this case the chosen limit is even lower, at 0.001 ppmv, to make sure no molecules are eliminated unnecessarily. Absorbances are calculated for all molecules from this concentration limit and their molar absorptivities.

From an initial concentration estimation using only the standard molecules, a lower limit for the absorbance is set to 50 · 10−7. For all molecules the absorbance calculated using the lower limit of concentration is then checked against this lower limit of absorbance. Molecules whose absorbance does not exceed the lower limit of absorbance at any point are dismissed, and the other molecules are put on the list and further checked for their concentrations.

A list of 199 molecules that are reported to be present in breath is checked this way[17]. When the entire spectrum is checked this way, some 130 molecules pass. Adapting this method for the wavenumber range of 1020 to 1100 cm−1, which is the major absorptivity region for ethanol, some 69 molecules pass, as listed in Appendix B.1. This list is used for determining the concentration of ethanol, as further explained in section 4.3.

Since the absorptivity of a molecule that is present might be lower than that of an absent molecule, there is no right choice for the absorbance and concentration limits, and the selection based on these limits ends up being a trade-off.
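The screening on measurable concentration can be sketched as follows. The concentration and absorbance limits are the ones quoted above; the path length, the molecule names, and the absorptivity values are hypothetical placeholders, since the actual values come from the setup and the databases.

```python
CONCENTRATION_LIMIT_PPMV = 0.001  # lower concentration limit from the text
ABSORBANCE_LIMIT = 50e-7          # lower absorbance limit from the text
PATH_LENGTH_M = 50.0              # hypothetical optical path length (m)


def passes_screening(absorptivity,
                     concentration=CONCENTRATION_LIMIT_PPMV,
                     path_length=PATH_LENGTH_M,
                     limit=ABSORBANCE_LIMIT):
    """Keep a molecule if its absorbance A = eps * c * l exceeds the
    limit at any wavenumber of its absorptivity profile."""
    return any(eps * concentration * path_length > limit
               for eps in absorptivity)


# Hypothetical molar absorptivity profiles in ppm^-1 m^-1.
library = {
    "strong absorber": [2e-5, 3e-4, 1e-4],
    "weak absorber": [1e-6, 5e-6, 2e-6],
}
selected = [name for name, eps in library.items() if passes_screening(eps)]
print(selected)
```

Restricting the profile to a narrower wavenumber range before the check corresponds to the ethanol-specific selection between 1020 and 1100 cm−1.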
The limits can be chosen more strictly. This carries the risk of excluding molecules that are present in the gas, which decreases the accuracy of the determined concentrations, but has the benefit of excluding molecules that are not actually present in the gas, which increases that accuracy. Since the method for finding differences in concentration between the healthy and asthmatic group filters on minimum absorbance and on more specific wavenumber ranges, it is stricter and therefore preferred when possible.

4.2 Concentration determination

4.2.1 Method

Henceforth only the list of molecules found in section 4.1 is used to determine the concentrations. At this stage the strictness of the list can be adjusted by only allowing molecules present in a certain number or more of the previously determined regions of low p-values to be processed further. In order to process the molecules further and compare them to the measured data, the information on the molecules from the databases is first converted to match the format of the measured data, i.e. in wavenumber range and resolution. This comes down to removing all the information outside of the relevant wavenumber region, and matching the wavenumber spacing as used in the measured data
through interpolation. Furthermore the measured data and the molecule data need to be expressed in the same physical quantity, for which absorbance is the most useful in this case. The aggregated absorbance of the gas is determined by the terms of Equation 2.2, and that of a single molecule by Equation 4.1:

$$A(\nu, c_{\mathrm{mol}}) = \varepsilon_{\mathrm{mol}}(\nu)\, c_{\mathrm{mol}}\, l_{\mathrm{path}}, \tag{4.1}$$

where the absorbance $A(\nu, c_{\mathrm{mol}})$ is determined by the concentration $c_{\mathrm{mol}}$, the optical path length of the laser in the gas $l_{\mathrm{path}}$, and the molar absorptivity $\varepsilon_{\mathrm{mol}}$. Since the path length is measured from the setup and the molar absorptivity of the molecules is obtained from the databases, the absorbance is left as a function of the concentration. All calculations are done over the set wavenumber region of 832 cm−1 to 1263 cm−1 as determined by the range of the QCL.

The concentrations are determined through what is essentially a curve-fitting problem, as the absorbances of the molecules can be laid over the measured absorbance and then fitted to it by adjusting the concentrations of the molecules. This curve-fitting problem can be expressed as a non-linear least squares problem as in Equation 4.2:

$$r(\nu, C) = A_{\mathrm{spectrum}}(\nu) - \sum_{i=1}^{n} \varepsilon_i(\nu)\, c_i\, l_{\mathrm{path}}, \tag{4.2}$$

where $C = (c_1, \dots, c_n)$ is the set of concentrations, and the terms on the right-hand side are, from left to right, the measured spectrum $A_{\mathrm{spectrum}}(\nu)$ and the sum that forms the spectrum to be fitted to it. Now take

$$S(C) = \sum_{\nu = 832\,\mathrm{cm}^{-1}}^{1263\,\mathrm{cm}^{-1}} r(\nu, C)^2, \tag{4.3}$$

where $r$ is the residual and $S$ is the quantity to be minimized. The spacing for the sum is 0.05 cm−1. The square of the residual $r$ is taken so that larger residuals per wavenumber influence $S$ more heavily, and are therefore avoided by the minimization algorithm. This non-linear least squares problem is solved using the Levenberg–Marquardt algorithm[20], which adjusts the concentrations starting from a given set of initial concentrations.
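As an illustration of Equations 4.2 and 4.3, the sketch below fits concentrations to a synthetic spectrum. It is not the thesis implementation: because the model of Equation 4.2 is linear in the concentrations, this sketch solves the normal equations directly instead of running the Levenberg–Marquardt algorithm. The function name `fit_concentrations`, the two absorptivity profiles, and the path length are assumptions made for the example.

```python
def fit_concentrations(measured, absorptivities, path_length):
    """Concentrations minimising S(C) = sum_nu r(nu, C)^2 with
    r(nu, C) = measured(nu) - sum_i eps_i(nu) * c_i * path_length."""
    n = len(absorptivities)
    # Columns of the design matrix: absorbance per unit concentration.
    cols = [[eps * path_length for eps in profile]
            for profile in absorptivities]
    # Normal equations (X^T X) c = X^T a, stored as an augmented matrix.
    aug = [
        [sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(n)]
        + [sum(x * a for x, a in zip(cols[i], measured))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for k in range(n):
        pivot = max(range(k, n), key=lambda row: abs(aug[row][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for row in range(k + 1, n):
            factor = aug[row][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[row][col] -= factor * aug[k][col]
    # Back substitution.
    conc = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(aug[k][j] * conc[j] for j in range(k + 1, n))
        conc[k] = (aug[k][n] - tail) / aug[k][k]
    return conc


# Hypothetical absorptivity profiles (ppm^-1 m^-1) on five wavenumbers.
eps_a = [1e-5, 8e-5, 2e-5, 1e-6, 1e-6]
eps_b = [1e-6, 2e-6, 3e-5, 9e-5, 4e-5]
path = 10.0                # hypothetical path length in metres
true_conc = [400.0, 2.0]   # ppmv, used only to synthesise the spectrum
measured = [path * (true_conc[0] * a + true_conc[1] * b)
            for a, b in zip(eps_a, eps_b)]
fitted = fit_concentrations(measured, [eps_a, eps_b], path)
print(fitted)
```

An iterative solver such as the one used in the thesis becomes useful once constraints or a non-linear model are added; for noise-free synthetic data like this, both approaches should recover the concentrations used to build the spectrum.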
4.2.2 Input and results

The method described in the previous subsection is applied to the molecules found with the p-region analysis, where all molecules present in one or more regions are checked for their concentration against the measured absorbance. The initial concentrations of Table 4.2 are used.

Table 4.2: Table of initial concentrations used to calculate the concentrations as given in Appendix A.3.
Molecule | Concentration (ppmv)
Carbon dioxide | 1000
Water | 1000
Standard molecules | 1
Other molecules | 0.01
The Levenberg–Marquardt algorithm is implemented through the lsqnonlin function in MATLAB, with its options argument set as: options = optimset('Display','off','TolFun',1e-15). The absorbances of six molecules with some of the highest concentrations are plotted over the measured absorbance in Figure 4.2. These are the molecules benzene, carbon dioxide, ethane, ethyl alcohol, ethylene, and water, which are also reported to be present in breath by B. de Lacy Costello[17]. Concentrations found for all molecules for this sample are given in Appendix A.3.

[Figure 4.2: Measured absorbance of breath of a healthy patient. The measured absorbance is overlaid with absorbances of molecules with some of the highest concentrations: benzene, carbon dioxide, ethane, ethyl alcohol, ethylene, and water. Axes: absorbance (base-10) against wavenumber (cm−1).]

4.3 Ethanol estimation by CO2 removal

4.3.1 Motivation for CO2 removal

As discussed in subsection 3.2.2, the peaks of the measured absorbance and the peaks of the absorbance of CO2 from the calculated concentrations do not match in height as
seen in Figure 4.3.

[Figure 4.3: Measured absorbance of the breath of a healthy patient, shown from 1076 to 1084 cm−1. The experimental data are overlaid with the spectra of water, carbon dioxide, and ethyl alcohol. This figure shows how the measured carbon dioxide peaks are far from the calculated ones. Axes: absorbance (base-10) against wavenumber (cm−1).]

This difference in height between the calculated CO2 peaks and the measured peaks also affects the estimation of the concentration of the other major molecule present in this region, namely ethanol. Hence it could prove useful to remove CO2 and its uncertainties in order to get a better estimate of the ethanol concentration, much in the same way as A. Reyes Reyes[21] did for acetone.

4.3.2 Implementation of CO2 removal for ethanol estimation

The method described in section 4.2 is used to determine the concentrations of the molecules found using subsection 4.1.1 and subsection 4.1.3. The calculated absorbance contributions of all molecules except for that of CO2 are subtracted from the measured absorbance, as seen in Figure 4.4b. This is done to single out the absorbance contribution of CO2 as well as possible, so it can more easily be removed. All that is left is noise around zero absorbance and the absorbance contribution of CO2. The peaks from CO2 are now
identified using a peak detection algorithm, and then fitted by normal distributions. This way the actual measured CO2 peaks are fitted, including any deviations they have from the CO2 spectrum according to the database. The fits are then subtracted from the signal, and all that is left is the difference between the measured absorbance and the absorbance contributions of all molecules as calculated using section 4.2, which is mainly noise around zero absorbance as seen in Figure 4.4c. The contributions of all molecules other than CO2 are now added back to this in order to recover the full measurement minus any absorbance by CO2, as shown in Figure 4.4d. From this absorbance spectrum the concentrations are once again determined using the method described in section 4.2. The newly calculated ethanol concentration is 2% lower than that calculated before the use of this method. Since this method of ethanol estimation has not been tested on any samples for which the ethanol concentration is known beforehand, not much can be said about the accuracy of this method. The same method has however been applied successfully, by removing water peaks, to determine the acetone concentration in the region 1150 cm−1 to 1250 cm−1 [21, 22].
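The four steps of the peak removal can be sketched on synthetic data. This is only an illustration under simplifying assumptions: peaks are detected as local maxima above a hypothetical height threshold rather than with the peak finder used in the thesis, and each Gaussian is fitted through the three samples around its maximum rather than by a full fit. All names and numbers below are invented for the example.

```python
import math


def fit_gaussian_3pt(x, y, i):
    """Fit a Gaussian through samples i-1, i, i+1 of a uniform grid.
    The logarithm of a Gaussian is a parabola, so three points fix
    its amplitude, centre, and width."""
    step = x[i] - x[i - 1]
    la, lb, lc = math.log(y[i - 1]), math.log(y[i]), math.log(y[i + 1])
    curvature = la - 2.0 * lb + lc        # negative at a peak
    offset = 0.5 * (la - lc) / curvature  # centre shift in grid steps
    sigma = step * math.sqrt(-1.0 / curvature)
    amplitude = math.exp(lb - 0.125 * (la - lc) ** 2 / curvature)
    return amplitude, x[i] + offset * step, sigma


def remove_peaks(x, measured, others, min_height):
    """Remove narrow peaks from `measured`, given the summed
    contribution `others` of all molecules except the peaked one."""
    # Step 1: isolate the peaks by subtracting the other molecules.
    isolated = [m - o for m, o in zip(measured, others)]
    # Step 2: detect local maxima above min_height and fit each one.
    fits = [fit_gaussian_3pt(x, isolated, i)
            for i in range(1, len(x) - 1)
            if isolated[i] > min_height
            and isolated[i] >= isolated[i - 1]
            and isolated[i] > isolated[i + 1]]

    def fitted_peaks(v):
        return sum(a * math.exp(-0.5 * ((v - c) / s) ** 2)
                   for a, c, s in fits)

    # Step 3: subtract the fitted peaks, leaving the residual.
    residual = [iso - fitted_peaks(v) for v, iso in zip(x, isolated)]
    # Step 4: add the other molecules back.
    return [r + o for r, o in zip(residual, others)]


# Synthetic spectrum: smooth background plus two narrow peaks.
x = [1076.0 + 0.05 * k for k in range(161)]
background = [0.02 + 0.0001 * (v - 1076.0) for v in x]
true_peaks = [(0.15, 1078.03, 0.10), (0.20, 1081.0, 0.12)]
measured = [b + sum(a * math.exp(-0.5 * ((v - c) / s) ** 2)
                    for a, c, s in true_peaks)
            for v, b in zip(x, background)]
cleaned = remove_peaks(x, measured, background, min_height=0.01)
print(max(abs(c - b) for c, b in zip(cleaned, background)))
```

On real spectra the isolated signal contains noise, so a fit over more than three points per peak would likely be more robust than this shortcut.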
[Figure 4.4: Step by step removal of the CO2 peaks. Each panel shows absorbance (base-10) against wavenumber from 1020 to 1100 cm−1: (a) experimental data; (b) experimental data minus database molecules, with CO2; (c) experimental data minus database molecules, removed CO2; (d) experimental data with database molecules, removed CO2.]
5. Disease prediction by machine learning

For the breath samples measured, an analytical study based on p-values was performed to predict and categorize the asthmatic and healthy samples[18]. This study did not yield definitive results for prediction. Multiple articles in the literature propose that methods more sophisticated than p-values alone could give better results[23][24]. Finding the molecules that differentiate between healthy and non-healthy subjects and predicting the diseases of a patient are both mentioned as possibilities. Since neither of these sources applies machine learning to actually predict disease, and the application of machine learning to breath data is not widespread[17], an attempt to do so is made. The analysis is performed on 70 absorbance spectra from healthy patients and 70 absorbance spectra from asthmatic patients.

5.1 Machine learning for classification

Algorithms performing classification by machine learning look for patterns in the data of the different samples. Because for the current set of samples it is known whether they originate from healthy or asthmatic patients, supervised learning methods are applied. This means the machine learning algorithm is trained to recognize patterns of healthy and asthmatic samples by feeding it a part of all samples. In this case 98 of the 140 samples are used to train the algorithms and the other 42 to test the accuracy of the algorithms.

5.2 Classification by trial-and-error

Since the scope of this project does not allow for a study of the details of machine learning and the construction of an algorithm specifically made for these datasets, existing algorithms are used. Based on an extensive study of what kinds of algorithms are most useful in the field of breath analysis, the focus initially lies on support vector machines (SVMs)[23].
Use is made of the website mlcomp.org, which was made specifically for uploading datasets, uploading machine learning algorithms, and running these algorithms on the datasets[25]. This website expresses the rate of success in a single number, the error, which is the number of falsely classified samples divided by the total number of samples. Since most algorithms run in less than 5 minutes and they can run simultaneously, many different algorithms can be tested in a short amount of time. Aside
from the complete set of absorbances of 140 samples over the wavenumber range of 832 to 1263 cm−1, various subsets of this are tested as well. Among these are the set of 36 wavenumbers for which the p-values are below 0.005, the set of concentrations of 167 compounds as derived from the 140 absorbances, and the set of 30 compounds with the highest concentrations as derived from the 140 absorbances.

5.2.1 Basic explanation of some algorithms used

The basic concept of some of the better-performing algorithms listed in Table 5.1 is explained here. Commonly, these algorithms reduce each spectrum to a set of features, which are then used to represent the various spectra as points in a feature space. The entire space is then divided into volumes with borders based on the geometrical distances between the training spectra: these are the classifiers. The test spectra are then represented as points in this space in the same way as the training spectra, and classified according to which volume they inhabit. The way the spectra are represented as points in a space, and the way the classifiers are built up from these points, is different for each algorithm.

Method 1 in Table 5.1, the Normalized Gaussian radial basis function network, uses at its core a method of k-means clustering. k-means clustering aims to partition the absorbance spectra into k clusters in which each spectrum belongs to the cluster with the nearest mean.

Possibly the simplest classifiers, the nearest neighbour algorithms of methods 2 and 3 in Table 5.1, classify the test spectra based on the closest training spectra in the feature space.

Methods 4, 5, and 6 in Table 5.1 are support vector machines. They represent the spectra as points in space, mapped so that the spectra of the separate categories are divided by a clear gap that is as wide as possible. New spectra are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
Table 5.1: Best error rates for eight algorithms out of 41 run on the entire dataset of absorbances of 70 healthy and 70 asthmatic patients. Errors have possible values between 0 and 1. Names of algorithms are as stated on the website[25].
Method | Description of algorithm | Name of algorithm | Error
1 | Normalized Gaussian radial basis function network | RBFNetwork weka nominal | 0.167
2 | K-nearest neighbours classifier | IBk weka nominal | 0.190
3 | Nearest neighbours classifier | IB1 weka nominal | 0.190
4 | Linear support vector machine for multiclass classification | svmlight multiclass-linear | 0.190
5 | Support vector machine for multiclass classification | svmlight multiclass | 0.190
6 | Support vector machine for binary classification | svmlight-linear | 0.214
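To make the nearest neighbour idea and the error measure concrete, here is a minimal sketch. It is not the IB1 or IBk implementation run on mlcomp.org; it assumes plain Euclidean distance on the raw absorbance values, and the short three-point "spectra" and their labels are invented.

```python
import math


def nearest_neighbour(train_spectra, train_labels, spectrum):
    """Label of the training spectrum closest (Euclidean) to `spectrum`."""
    distances = [math.dist(spectrum, t) for t in train_spectra]
    return train_labels[distances.index(min(distances))]


def error_rate(predicted, actual):
    """Number of falsely classified samples divided by the total."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Invented training spectra (three wavenumbers each) and labels.
train = [[0.10, 0.20, 0.10], [0.12, 0.18, 0.11],
         [0.30, 0.05, 0.25], [0.28, 0.07, 0.27]]
labels = ["healthy", "healthy", "asthmatic", "asthmatic"]

# Invented test spectra with their true labels.
test = [[0.11, 0.19, 0.10], [0.29, 0.06, 0.26], [0.25, 0.10, 0.22]]
actual = ["healthy", "asthmatic", "healthy"]

predicted = [nearest_neighbour(train, labels, s) for s in test]
error = error_rate(predicted, actual)
print(predicted, error)
```

In this toy case the third test spectrum lies closest to an asthmatic training spectrum although it is labelled healthy, so one of three samples is misclassified.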
5.3 Results

Results over the various datasets seem to generally be best for algorithms such as nearest neighbour and support vector machines. A trend is also seen in that the larger data sets get fewer misclassifications. The smallest errors are found for the unmodified data set of all absorbances over the entire wavenumber range, which are given in Table 5.1. The similarity of the errors in Table 5.1 can be explained by the small test set of 42 samples. The various nearest neighbour algorithms function quite similarly, and the same can be said for the support vector machines. As the results in Table 5.1 are the best from a total of 41 different algorithms, they corroborate the use of SVMs as advised by the study on machine learning algorithms for breath analysis[23].

The best algorithm is the Normalized Gaussian radial basis function network with an error of 0.167, or 7 misclassified samples out of the 42 test samples. The exact reason why this algorithm performs better than the others has not been determined.

The errors stated are the combined errors of healthy samples falsely classified as asthmatic and the other way around. By running the algorithms outside of mlcomp.org the separate errors of both types of misclassification can likely be obtained, which would give more insight into the accuracy of each type of classification.

Without in-depth knowledge of the algorithms and further testing, there is no guarantee that they will also work on further datasets. The importance of these preliminary results lies in the clear quantitative results expressed as the error. Since a success rate of 83.3% is obtained with little knowledge of the algorithms, further research in machine learning applied to breath classification could yield more successful algorithms that can be applied with the current QCL spectroscopy setup in medical diagnosis.
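The arithmetic behind these error values, and the split into the two types of misclassification, can be sketched as follows. With 42 test samples the error can only change in steps of 1/42, which is consistent with the values in Table 5.1. The per-class counts used for the split are purely hypothetical, since the thesis reports only the combined error; the assumed 21 healthy and 21 asthmatic test samples are likewise an illustration.

```python
TEST_SET_SIZE = 42

# Errors for 7, 8, and 9 misclassified samples out of 42.
errors = {n: round(n / TEST_SET_SIZE, 3) for n in (7, 8, 9)}
success_rate = round(100 * (1 - 7 / TEST_SET_SIZE), 1)
print(errors, success_rate)


def split_errors(predicted, actual, positive="asthmatic"):
    """Return (false positive rate, false negative rate)."""
    fp = sum(p == positive and a != positive
             for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive
             for p, a in zip(predicted, actual))
    negatives = sum(a != positive for a in actual)
    positives = len(actual) - negatives
    return fp / negatives, fn / positives


# Hypothetical split of the 7 errors: 3 false positives, 4 false negatives.
actual = ["healthy"] * 21 + ["asthmatic"] * 21
predicted = (["asthmatic"] * 3 + ["healthy"] * 18
             + ["healthy"] * 4 + ["asthmatic"] * 17)
fp_rate, fn_rate = split_errors(predicted, actual)
print(fp_rate, fn_rate)
```

Reporting the two rates separately would show whether a "healthy" or an "asthmatic" verdict from a classifier is the more trustworthy one.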
6. Conclusions and outlook

Here the main problems from the previous chapters and their solutions are discussed. Additionally, improvements for further research are suggested.

6.1 QCL setup

The intensity fluctuations in the light emitted by the QCL caused by mode hopping are resolved by normalisation using the monitor and main detector as described in chapter 2. One way to improve the system and forego the need for all the calibrations described in this thesis is to use the monitor signal to measure the actual wavenumber. A Michelson-Morley interferometer was considered for this purpose when the setup was still being built. It was omitted since there were plans to move the setup and a Michelson-Morley interferometer would be too sensitive to vibrations. Both a diffraction spectrometer and a reference cell can be used as well, the latter being favourable as it is the cheaper option.

The sensitivity can also be improved by increasing the delay the multi-pass cavity applies to the beam such that both the beams going to the monitor and the main detector can instead be measured using one detector. This improves the signal since measuring on two separate detectors introduces two independent sources of noise, whereas a single detector does not[26].

6.2 Data calibration

The wavenumber uncertainty caused by noise in the voltage over the piezo is resolved by calibration done in post-processing as described in chapter 3. Despite this calibration, the standard deviation over ten absorbance spectra from a single sample is still relatively large. Even after the wavenumber calibration the resulting absorbance spectrum does not match the CO2 and H2O peaks from the databases. This is caused by the high variation of the twenty monitor measurements whose mean is the basis of the wavenumber calibration.
Regardless, wavenumber calibration is still useful as an intermediate calibration step, as it calibrates more details than the calibration to CO2 and H2O performed afterwards. One cause for this variation between measurements is expected to be the noise on the voltage applied over the piezo. Other contributions are temperature fluctuations that influence the laser performance, and the diffraction grating that reflects a small range of wavelengths rather than a single wavelength back into the laser diode.
The height of the measured absorbance peaks does not generally match the heights of the CO2 peaks from the database, as seen in Figure 3.12. It can be seen that the height of each peak is about equal between samples. This indicates that the difference in height between the various measured absorbance peaks and the CO2 peaks is not a matter of noise, but rather a consistent wavenumber-dependent influence from a part of the setup, or possibly a molecule in breath. A source for these mismatching absorbances has not yet been pinpointed.

6.3 Determination of molecules

A general disadvantage of choosing molecules based on their molar absorptivity is that some molecules might have a large impact on the eventual absorbance due to a high concentration, despite having a low absorptivity. They would therefore be falsely eliminated from the list of molecules that is checked.

There is also the drawback of excluding molecules which might be an important part of the absorbance, but are not selected since they do not have adequate molar absorptivity in the p-regions. This makes the subsequent determination of concentrations less accurate. It is possible to allow for more p-regions, either by decreasing the minimum number of consecutive points needed or by raising the maximum p-value that defines these regions. This way more molecules will have some absorptivity in one of the regions and therefore will not be discarded.

Both problems have to do with selecting molecules based on intensity, and they can be resolved in advance by putting molecules that play an important role in the eventual absorbance in the standard list of molecules. This requires a priori knowledge of the important molecules, which is obtained from literature[17] and simply by performing the least squares method to get the concentrations on a large list of molecules.
The concentrations will not be as accurate for a large list, but all molecules that contribute considerably to the spectrum can be picked out for later runs with strict molecule lists.

6.4 Determination of concentrations

From the shapes of CO2 and H2O in Figure 4.2 it can be seen that the concentrations calculated using the lsqnonlin function are in the right order of magnitude. Since the standard deviation over several measured absorbances of a single sample is still relatively high even after wavenumber calibration, the calculated concentrations for the molecules with lower concentrations are not very accurate.

The average concentrations found for several molecules over all samples are given in Figure 6.1. The difference in the amount of H2O present can be explained by a filter in the setup which catches the moisture in breath before it enters the cavity. The large amount of ethanol measured for the samples can be explained by the cleaning with ethanol of the mouthpieces through which the breath samples are collected. The standard deviations shown are combinations of the uncertainties from the measurements and the variation between the breath of the patients.
[Figure 6.1: Average concentrations of molecules for healthy and asthmatic children compared to VSL data of healthy adults. Bar chart of concentration (ppmv, logarithmic scale from 0.001 to 100,000) for CO2, H2O, acetone, NH3, CH4, and ethanol, with series for healthy samples, asthmatic samples, and VSL healthy.]

6.5 Categorizing health status by classification

An algorithm with a classification success rate of 83.3% is obtained. Further research in machine learning applied to breath classification could yield more successful algorithms that can be applied with the current QCL spectroscopy setup in medical diagnosis.

The accuracy of the classification can be improved by looking more closely into the algorithms used, and improving them using a priori knowledge of the samples, such as regions of relatively high signal-to-noise ratio. Wavenumber regions can be given weights based on the signal-to-noise ratios so the algorithm depends more heavily on these accurate regions.

The error rate would also be more informative if the numbers of false positives and false negatives were given. This would give separate error rates based on whether the algorithm judges the sample as healthy or asthmatic. It is possible that one of these error rates is much lower than the other, in which case that type of result would be more certain than the total error rate suggests.
Bibliography

[1] I. Horvath and J.C. de Jongste. Exhaled Biomarkers. European Respiratory Society, 2010.
[2] Erik Swietlicki, Sanjiv Puri, Hans-Christen Hansson, and Hans Edner. Urban air pollution source apportionment using a combination of aerosol and gas monitoring techniques. Atmospheric Environment, 30(15):2795–2809, 1996.
[3] Zhe Hou. Breath analysis using infrared spectroscopy. August 2013.
[4] Ellen Holthoff, John Bender, Paul Pellegrino, and Almon Fisher. Quantum cascade laser-based photoacoustic spectroscopy for trace vapor detection and molecular discrimination. Sensors, 10(3):1986–2002, 2010.
[5] A Reyes-Reyes, Z Hou, E Van Mastrigt, R C Horsten, J C De Jongste, M W Pijnenburg, H P Urbach, and N Bhattacharya. Multicomponent gas analysis using broadband quantum cascade laser spectroscopy. Optics express, 2014.
[6] DF Swinehart. The beer-lambert law. Journal of chemical education, 39(7):333, 1962.
[7] Silvia E Braslavsky. Glossary of terms used in photochemistry, (iupac recommendations 2006). Pure and Applied Chemistry, 79(3):293–465, 2007.
[8] Northwest infrared https://secure2.pnl.gov/nsd/nsd.nsf/welcome.
[9] Hitran database http://www.cfa.harvard.edu/hitran/.
[10] Block engineering laserscope for gas analysis.
[11] Jerome Faist, Federico Capasso, Deborah L Sivco, Carlo Sirtori, Albert L Hutchinson, and Alfred Y Cho. Quantum cascade laser. Science, 264(5158):553–556, 1994.
[12] J Barry McManus, Paul L Kebabian, and MS Zahniser. Astigmatic mirror multipass absorption cells for long-path-length spectroscopy. Applied Optics, 34(18):3336–3348, 1995.
[13] Mct ir detector module.
[14] Pvi-4te detector specifications.
[15] Matlab peakfinder code http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder.
[16] Atmosphere of earth - http://en.wikipedia.org/wiki/atmosphere of earth.
[17] B de Lacy Costello, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Osborne, and N M Ratcliffe. A review of the volatiles from the healthy human body. Journal of Breath Research, 8(1):014001, 2014.
[18] Esther Van Mastrigt, Adonis Reyes-Reyes, Karl Brand, Nandini Bhattacharya, H.P. Urbach, Andrew Stubbs, Johan C. De Jongste, and Marielle W. Pijnenburg. Exhaled Breath Profiling In Children With Asthma And Cystic Fibrosis: A Laser Spectroscopy-Based Method Focused On Hydrocarbons, pages A5639–A5639. American Thoracic Society, May 2014.
[19] Chet L. Leach. Approaches and challenges to use freon propellant replacements. Aerosol Science and Technology, 22(4):328–334, 1995.
[20] Jorge J. Moré. The levenberg-marquardt algorithm: Implementation and theory. In G.A. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics, pages 105–116. Springer Berlin Heidelberg, 1978.
[21] Adonis Reyes-Reyes, Roland C. Horsten, H. Paul Urbach, and Nandini Bhattacharya. Study of the exhaled acetone in type 1 diabetes using quantum cascade laser spectroscopy. Analytical Chemistry, 87(1):507–512, 2015. PMID: 25506743.
[22] Florian Adler, Piotr Maslowski, Aleksandra Foltynowicz, Kevin C. Cossel, Travis C. Briles, Ingmar Hartl, and Jun Ye. Mid-infrared fourier transform spectroscopy with a broadband frequency comb. Opt. Express, 18(21):21861–21872, Oct 2010.
[23] A Smolinska, A-Ch Hauschild, R R R Fijten, J W Dallinga, J Baumbach, and F J van Schooten. Current breathomics—a review on data pre-processing techniques and machine learning in metabolomics breath analysis. Journal of Breath Research, April 2014.
[24] J. D. Malley, A. Dasgupta, and J. H. Moore. The limits of p-values for biological data mining. BioData Min, 6(1):10, 2013.
[25] Mlcomp - http://mlcomp.org/.
[26] D.D. Nelson, J.H. Shorter, J.B. McManus, and M.S. Zahniser. Sub-part-per-billion detection of nitric oxide in air using a thermoelectrically cooled mid-infrared quantum cascade laser spectrometer. Applied Physics B, 75(2-3):343–350, 2002.
  • 50. A. Results for a sample of a healthy patient A.1 Standard molecules Table A.1: List of standard molecules used. Molecule Acetic acid Acetone Ammonia anhydrous Benzene Carbon dioxide Carbon monoxide Ethane Ethyl alcohol Ethylene Formaldehyde monomer Methane Pentane Propane Water A.2 Molecules found by p-region analysis Table A.2: Table of molecules that are present in either or both of the p-regions. The p-regions were found using a maximum p-value of 05 for at least 10 consecutive data points or 0.5 cm−1 . These molecules have a mean absorptivity in this region of at least 10−4 . 1181.8-1182.55 1261.4-1262.05 1-Chloro-1 1 2 2-tetrafluoroethane (R124A) 1 1 1-Hexanoic acid 1 0 1 1 1-Chlorodifluoroethane (Freon-142B) 1 0 49
  • 51. Table A.2: Table of molecules that are present in either or both of the p-regions. The p-regions were found using a maximum p-value of 05 for at least 10 consecutive data points or 0.5 cm−1 . These molecules have a mean absorptivity in this region of at least 10−4 . 1181.8-1182.55 1261.4-1262.05 1 1 1-Trifluoroacetone 1 0 1 1 1-Trifluoroethane (R143A) 1 1 1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 1 1 1 2-Dibomotetrafluoroethane (Freon-114B2) 1 0 1 2-Dibromoethane (EDB) 1 0 1 2-Dichloro-1-fluoroethane (R141) 1 0 1 2-Dichloro-1 1 2-trifluoroethane (F132A) 1 0 1 2-Dimethoxyethane 1 0 1 4-Dioxane 0 1 2-Chloro-1 1 1-trifluoroethane (R133A) 1 1 2-Chloropropane 0 1 2-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 1 0 2-Ethoxyethyl acetate 1 1 2-Methoxyethanol 1 0 2H 3H-Perfluoropentane 1 1 2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 1 1 2 2-Difluorotetrachloroethane (F112A) 1 0 2 2 2-Trifluoroethanol 1 1 2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 1 1 3-Methyl-2-pentanone 1 0 Acetic acid 1 1 Acetic anhydride 1 0 Acetid acid dimer 0 1 Acetone cyanohydrin 1 0 Aniline 0 1 Benzyl chloride 0 1 Bromochlorodifluoromethane (Freon-12B1) 1 0 Butyl acetate 0 1 Butyric acid 1 0 Carbonyl fluoride 0 1 Chloromethyl ethyl ether 1 1 Chloromethyl methyl ether 1 0 Chloropentafluoroethane (R115) 1 1 Chlorosulfonyl isocyanate (CSI) 1 0 Diborane 1 0 Dichloromethane 0 1 Diethyl ether 1 0 50
  • 52. Table A.2 (continued)
    Molecule | 1181.8-1182.55 cm−1 | 1261.4-1262.05 cm−1
    Diethyl sulfide | 0 | 1
    Diisopropyl ether | 1 | 0
    Diisopropylamine | 1 | 0
    Diketene | 0 | 1
    Dimethoxymethane | 1 | 0
    Dimethyl carbonate | 0 | 1
    Dimethyl ether | 1 | 0
    Dimethylcarbamoyl chloride | 0 | 1
    Dipropylene glycol methyl ether | 1 | 1
    Ethyl acetate | 0 | 1
    Ethyl acrylate | 1 | 1
    Ethyl bromide (Halon-2001) | 0 | 1
    Ethyl butyrate | 1 | 1
    Ethyl chloroformate | 1 | 0
    Ethyl formate | 1 | 0
    Ethyl methyl ether | 1 | 0
    Ethyl tert-butyl ether | 0 | 1
    Ethyl trifluoroacetate | 1 | 1
    Ethylvinyl ether (EVE) | 1 | 0
    Formic acid dimer | 1 | 0
    Freon-113 (1 1 2-Trichlorotrifluoroethane) | 1 | 0
    Freon-114 (1 2-dichlorotetrafluoroethane) | 1 | 1
    Freon-134a (1 1 1 2-tetrafluoroethane) | 1 | 0
    Freon-218 (octafluoropropane) | 0 | 1
    Freon-C318 (octafluorocyclobutane) | 0 | 1
    Furan | 1 | 0
    Glycolaldehyde | 0 | 1
    Guaiacol | 1 | 1
    Hexachloro-1 3-butadiene | 1 | 0
    Hexafluoroacetone | 1 | 1
    Hexafluoroethane (Freon-116) | 0 | 1
    Hexafluoroisobutylene | 1 | 1
    Hexafluoropropene | 1 | 0
    Hydrogen peroxide | 0 | 1
    Isobutyl acetate | 0 | 1
    Isopentyl acetate | 0 | 1
    Isopropyl acetate | 0 | 1
  • 53. Table A.2 (continued)
    Molecule | 1181.8-1182.55 cm−1 | 1261.4-1262.05 cm−1
    Methacryloyl chloride | 0 | 1
    Methanesulfonyl chloride | 1 | 0
    Methyl acetate | 0 | 1
    Methyl acrylate | 1 | 1
    Methyl benzoate | 1 | 1
    Methyl formate | 1 | 0
    Methyl iodide | 0 | 1
    Methyl isoamyl ketone | 1 | 0
    Methyl isobutyl ketone (MIBK) | 1 | 0
    Methyl isobutyrate | 1 | 1
    Methyl methacrylate | 1 | 0
    Methyl pivalate | 1 | 0
    Methyl propionate | 1 | 1
    Methyl propyl ketone | 1 | 0
    Methyl salicylate | 1 | 1
    Methylchloroformate (MCF) | 1 | 0
    Methyldichlorodisilanes (mixed isomers) | 0 | 1
    Methylethyl ketone | 1 | 0
    Methyltrichlorosilane | 0 | 1
    Methylvinyl ketone | 1 | 1
    N N’-Dimethyl formamide (DMF) | 0 | 1
    N N-Diethylaniline | 0 | 1
    N N-diethyl formamide | 0 | 1
    Nitrogen dioxide and dinitrogen tetroxide | 0 | 1
    Octanoic acid | 1 | 0
    Paraldehyde | 1 | 0
    Pentafluoroethane (f125) | 1 | 0
    Perfluorobutane | 1 | 1
    Perfluoroisobutylene (PFIB) | 1 | 0
    Phenol | 1 | 1
    Propargyl chloride | 0 | 1
    Propionic acid (and some dimer) | 1 | 0
    Propyl acetate | 0 | 1
    Propylene glycol | 0 | 1
    Sulfuryl chloride | 1 | 0
    Sulfuryl fluoride | 0 | 1
    Tetrafluoromethane | 0 | 1
  • 54. Table A.2 (continued)
    Molecule | 1181.8-1182.55 cm−1 | 1261.4-1262.05 cm−1
    Trichlorofluoroethylene | 1 | 0
    Trifluoroacetic acid | 1 | 1
    Trifluoroacetic anhydride | 1 | 1
    Trifluoroacetyl chloride | 1 | 1
    Trifluoromethane (Freon-23) | 1 | 0
    Trifluoromethylsulfur pentafluoride | 0 | 1
    Trifluoronitrosomethane | 1 | 1
    Trimethylamine | 1 | 0
    Vinyl acetate | 0 | 1
    bis(2-Chloroethyl) ether | 0 | 1
    cis-1 3-Dichloropropene | 0 | 1
    m-Cresol | 1 | 0
    n-Amyl acetate | 0 | 1
    o-Toluidine | 0 | 1
    tert-Amyl methyl ether | 1 | 0

    A.3 Concentrations found for a single healthy patient

    To find the concentrations, the following options argument is passed to the lsqnonlin function in MATLAB:
        options = optimset('Display','off','TolFun',1e-15)

    Table A.3: Initial concentrations used to calculate the concentrations given in Appendix A.3.
    Molecule | Concentration (ppmv)
    Carbon dioxide | 1000
    Water | 1000
    Standard molecules | 1
    Other molecules | 0.01
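The starting values of Table A.3 can be assembled into the initial guess handed to the solver. The following is a minimal Python sketch, not the thesis code (which uses MATLAB). The helper `initial_concentrations`, the constant names and the list-based layout are assumptions made for illustration; only the numbers and the standard molecule names come from Tables A.1 and A.3.

```python
# Starting values in ppmv, copied from Table A.3.
SPECIAL_PPMV = {"Carbon dioxide": 1000.0, "Water": 1000.0}
STANDARD_PPMV = 1.0
OTHER_PPMV = 0.01

# Standard molecules from Table A.1.
STANDARD_MOLECULES = [
    "Acetic acid", "Acetone", "Ammonia anhydrous", "Benzene",
    "Carbon dioxide", "Carbon monoxide", "Ethane", "Ethyl alcohol",
    "Ethylene", "Formaldehyde monomer", "Methane", "Pentane",
    "Propane", "Water",
]


def initial_concentrations(standard, others):
    """Return (names, x0): molecule names and matching initial guesses in ppmv.

    Hypothetical helper. Names that already occur in `standard` are skipped
    when they reappear in `others`, so each molecule gets one entry.
    """
    names, x0 = [], []
    for name in standard:
        names.append(name)
        x0.append(SPECIAL_PPMV.get(name, STANDARD_PPMV))
    seen = set(standard)
    for name in others:
        if name in seen:
            continue
        seen.add(name)
        names.append(name)
        x0.append(OTHER_PPMV)
    return names, x0


if __name__ == "__main__":
    # Three molecules from Table A.2; "Acetic acid" is also a standard molecule.
    names, x0 = initial_concentrations(
        STANDARD_MOLECULES, ["1-Hexanoic acid", "Acetic acid", "Aniline"]
    )
    for name, value in zip(names, x0):
        print(f"{name}: {value} ppmv")
```

Acetic acid occurs in both the standard list and the p-region list, and Table A.4 reports it once per list. The sketch keeps a single entry instead, which is a design choice of this example rather than something stated in the thesis.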
  • 55. Table A.4: Concentrations found using the method described in section 4.2 for a single sample from a healthy patient. The standard list of molecules given in section A.1 and the list of molecules given in section A.2 are used.
    Molecule | Concentration (ppmv)
    Acetic acid | 2.22045e-14
    Acetone | 2.22045e-14
    Ammonia anhydrous | 2.22045e-14
    Benzene | 1.79671
    Carbon dioxide | 0.912382
    Carbon monoxide | 1538.28
    Ethane | 2.66622
    Ethyl alcohol | 0.946815
    Ethylene | 0.109119
    Formaldehyde monomer | 2.22045e-14
    Methane | 2.22045e-14
    Pentane | 2.22045e-14
    Propane | 2.22045e-14
    Water | 0.0988024
    1-Chloro-1 1 2 2-tetrafluoroethane (R124A) | 2.22045e-14
    1-Hexanoic acid | 2.22045e-14
    1 1 1-Chlorodifluoroethane (Freon-142B) | 0.00478773
    1 1 1-Trifluoroacetone | 2.22045e-14
    1 1 1-Trifluoroethane (R143A) | 2.22045e-14
    1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) | 2.22045e-14
    1 2-Dibromotetrafluoroethane (Freon-114B2) | 2.22045e-14
    1 2-Dibromoethane (EDB) | 2.22045e-14
    1 2-Dichloro-1-fluoroethane (R141) | 2.22045e-14
    1 2-Dichloro-1 1 2-trifluoroethane (F132A) | 2.22045e-14
    1 2-Dimethoxyethane | 2.22045e-14
    1 4-Dioxane | 2.22045e-14
    2-Chloro-1 1 1-trifluoroethane (R133A) | 2.22045e-14
    2-Chloropropane | 2.22045e-14
    2-Chloro-1 1 1 2-tetrafluoroethane (HCFC-124) | 2.22045e-14
    2-Ethoxyethyl acetate | 2.22045e-14
    2-Methoxyethanol | 2.30625e-14
    2H 3H-Perfluoropentane | 2.22045e-14
    2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) | 2.22045e-14
    2 2-Difluorotetrachloroethane (F112A) | 0.0104979
    2 2 2-Trifluoroethanol | 2.22045e-14
    2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) | 2.22045e-14
    3-Methyl-2-pentanone | 2.22045e-14
    Acetic acid | 2.22045e-14
  • 56. Table A.4 (continued)
    Molecule | Concentration (ppmv)
    Acetic anhydride | 2.22045e-14
    Acetic acid dimer | 2.22045e-14
    Acetone cyanohydrin | 2.22045e-14
    Aniline | 0.692194
    Benzyl chloride | 2.22045e-14
    Bromochlorodifluoromethane (Freon-12B1) | 2.22045e-14
    Butyl acetate | 2.22045e-14
    Butyric acid | 2.22045e-14
    Carbonyl fluoride | 2.22045e-14
    Chloromethyl ethyl ether | 2.22045e-14
    Chloromethyl methyl ether | 2.22045e-14
    Chloropentafluoroethane (R115) | 0.0115797
    Chlorosulfonyl isocyanate (CSI) | 2.22045e-14
    Diborane | 0.114448
    Dichloromethane | 2.22045e-14
    Diethyl ether | 0.0152475
    Diethyl sulfide | 2.22045e-14
    Diisopropyl ether | 2.22045e-14
    Diisopropylamine | 2.22045e-14
    Diketene | 2.22045e-14
    Dimethoxymethane | 2.22045e-14
    Dimethyl carbonate | 2.22045e-14
    Dimethyl ether | 2.22045e-14
    Dimethylcarbamoyl chloride | 2.22045e-14
    Dipropylene glycol methyl ether | 2.22045e-14
    Ethyl acetate | 2.22045e-14
    Ethyl acrylate | 0.039846
    Ethyl bromide (Halon-2001) | 2.22046e-14
    Ethyl butyrate | 2.22045e-14
    Ethyl chloroformate | 2.22045e-14
    Ethyl formate | 2.22045e-14
    Ethyl methyl ether | 2.22045e-14
    Ethyl tert-butyl ether | 0.195303
    Ethyl trifluoroacetate | 2.22045e-14
    Ethylvinyl ether (EVE) | 2.22045e-14
    Formic acid dimer | 2.22045e-14
    Freon-113 (1 1 2-Trichlorotrifluoroethane) | 2.22045e-14
    Freon-114 (1 2-dichlorotetrafluoroethane) | 2.22045e-14
  • 57. Table A.4 (continued)
    Molecule | Concentration (ppmv)
    Freon-134a (1 1 1 2-tetrafluoroethane) | 2.22045e-14
    Freon-218 (octafluoropropane) | 2.22045e-14
    Freon-C318 (octafluorocyclobutane) | 2.22045e-14
    Furan | 2.22045e-14
    Glycolaldehyde | 2.22045e-14
    Guaiacol | 2.22045e-14
    Hexachloro-1 3-butadiene | 2.22045e-14
    Hexafluoroacetone | 2.22045e-14
    Hexafluoroethane (Freon-116) | 2.22045e-14
    Hexafluoroisobutylene | 2.22045e-14
    Hexafluoropropene | 2.22045e-14
    Hydrogen peroxide | 2.22045e-14
    Isobutyl acetate | 2.22045e-14
    Isopentyl acetate | 2.22045e-14
    Isopropyl acetate | 2.22045e-14
    Methacryloyl chloride | 0.101246
    Methanesulfonyl chloride | 0.0190093
    Methyl acetate | 2.22045e-14
    Methyl acrylate | 2.22045e-14
    Methyl benzoate | 2.22045e-14
    Methyl formate | 2.22045e-14
    Methyl iodide | 2.22045e-14
    Methyl isoamyl ketone | 2.22045e-14
    Methyl isobutyl ketone (MIBK) | 2.22045e-14
    Methyl isobutyrate | 2.22045e-14
    Methyl methacrylate | 2.22045e-14
    Methyl pivalate | 0.0277697
    Methyl propionate | 2.22045e-14
    Methyl propyl ketone | 2.22045e-14
    Methyl salicylate | 2.22045e-14
    Methylchloroformate (MCF) | 2.22046e-14
    Methyldichlorodisilanes (mixed isomers) | 2.22045e-14
    Methylethyl ketone | 2.22045e-14
    Methyltrichlorosilane | 2.22046e-14
    Methylvinyl ketone | 2.22045e-14
    N N’-Dimethyl formamide (DMF) | 2.22045e-14
    N N-Diethylaniline | 2.22045e-14
    N N-diethyl formamide | 2.22045e-14
  • 58. Table A.4 (continued)
    Molecule | Concentration (ppmv)
    Nitrogen dioxide and dinitrogen tetroxide | 2.22045e-14
    Octanoic acid | 2.22045e-14
    Paraldehyde | 2.22045e-14
    Pentafluoroethane (f125) | 2.22045e-14
    Perfluorobutane | 2.22045e-14
    Perfluoroisobutylene (PFIB) | 0.000842043
    Phenol | 2.22045e-14
    Propargyl chloride | 0.250046
    Propionic acid (and some dimer) | 2.22045e-14
    Propyl acetate | 2.22045e-14
    Propylene glycol | 2.22045e-14
    Sulfuryl chloride | 2.22045e-14
    Sulfuryl fluoride | 2.22046e-14
    Tetrafluoromethane | 2.22045e-14
    Trichlorofluoroethylene | 2.22045e-14
    Trifluoroacetic acid | 2.22045e-14
    Trifluoroacetic anhydride | 0.00520583
    Trifluoroacetyl chloride | 2.22045e-14
    Trifluoromethane (Freon-23) | 2.22045e-14
    Trifluoromethylsulfur pentafluoride | 2.22045e-14
    Trifluoronitrosomethane | 2.22045e-14
    Trimethylamine | 2.22045e-14
    Vinyl acetate | 2.22045e-14
    bis(2-Chloroethyl) ether | 2.22045e-14
    cis-1 3-Dichloropropene | 2.22045e-14
    m-Cresol | 2.22045e-14
    n-Amyl acetate | 2.22045e-14
    o-Toluidine | 2.22045e-14
    tert-Amyl methyl ether | 2.22045e-14
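The molecules fitted in Table A.4 include those selected through the p-regions of section A.2. The selection rule stated in the caption of Table A.2 can be sketched in Python as below. This is an illustration under stated assumptions, not the thesis code: the p-value threshold is taken as 0.05, a p-region is read as a run of at least 10 consecutive data points at or below that threshold (the 0.5 cm−1 width alternative is not modelled), and the function names `find_p_regions` and `present_in_region` and the list-based input layout are hypothetical. How the p-values themselves are obtained is outside the scope of this sketch.

```python
def find_p_regions(wavenumbers, p_values, max_p=0.05, min_points=10):
    """Return (first, last) wavenumber pairs of runs with p <= max_p.

    A run counts as a p-region only if it holds at least `min_points`
    consecutive data points (assumed reading of the Table A.2 caption).
    """
    regions = []
    run_start = None
    for i, p in enumerate(p_values):
        if p <= max_p:
            if run_start is None:
                run_start = i  # a new run begins here
            continue
        # The run (if any) ended at index i - 1.
        if run_start is not None and i - run_start >= min_points:
            regions.append((wavenumbers[run_start], wavenumbers[i - 1]))
        run_start = None
    # Close a run that reaches the end of the data.
    if run_start is not None and len(p_values) - run_start >= min_points:
        regions.append((wavenumbers[run_start], wavenumbers[-1]))
    return regions


def present_in_region(wavenumbers, absorptivity, region, min_mean=1e-4):
    """True if the mean absorptivity inside `region` is at least `min_mean`."""
    first, last = region
    inside = [a for w, a in zip(wavenumbers, absorptivity) if first <= w <= last]
    return bool(inside) and sum(inside) / len(inside) >= min_mean


if __name__ == "__main__":
    # Synthetic data: one run of 12 low p-values and one run of 4 (too short).
    wn = [1181.0 + 0.05 * i for i in range(40)]
    p = [0.5] * 5 + [0.01] * 12 + [0.5] * 3 + [0.02] * 4 + [0.5] * 16
    for region in find_p_regions(wn, p):
        print(region, present_in_region(wn, [2e-4] * 40, region))
```

Under this reading, each column of Table A.2 would correspond to one returned region, and a 1 in that column to `present_in_region` returning true for the molecule's reference spectrum.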
  • 59. B. List of molecules

    B.1 Breath molecules with absorbance in the 1020 to 1100 cm−1 region

    Table B.1: Molecules obtained by applying the filter of subsection 4.1.3, in the 1020 to 1100 cm−1 region, to 199 molecules that should be present in breath [17].
    Molecules
    1-Hexanoic acid
    1-Pentanol (n-amyl alcohol)
    1-Propanol
    1 2-Dichlorobenzene
    1 3-Dichlorobenzene
    1 4-Dichlorobenzene
    1 4-Dioxane
    2-Butoxyethanol
    2-Ethoxyethyl acetate
    2-Ethyl-1-hexanol
    2-Methoxyethanol
    2-methyl-1-propanol (IBA isobutanol)
    2-Methyl-2-propanal (Isobutenal)
    2-Methylfuran
    2 3-Butanedione
    3-methylfuran
    Acetic acid
    Acetic anhydride
    Acetol
    Allene (1 2-propadiene)
    Allyl alcohol
    Ammonia anhydrous
    Benzene
    Benzyl alcohol
    Butyl acetate
    Butyric acid
  • 60. Table B.1 (continued)
    Molecules
    Cadaverine
    Chlorobenzene
    Cyclopentene
    Cyclopropane
    Diethyl ether
    Dimethoxymethane
    Dimethyl ether
    Dimethyl sulfoxide
    Dipropylene glycol methyl ether
    Ethyl acetate
    Ethyl acrylate
    Ethyl alcohol
    Ethyl butyrate
    Ethyl tert-butyl ether
    Ethylene
    Ethylvinyl ether (EVE)
    Furan
    Furfural
    Furfuryl alcohol
    Glycolaldehyde
    Guaiacol
    Isobutyl acetate
    Isopentyl acetate
    Isopropyl acetate
    Isopropyl alcohol
    Methyl acetate
    Methyl alcohol
    Methyl methacrylate
    Methyl propionate
    Methylamine
    n-Butyl alcohol
    N N’-Dimethyl formamide (DMF)
    N N-diethyl formamide
    Octanoic acid
    Propionic acid (and some dimer)
    Propylene glycol
    Propylene sulfide
    Pyridine
    sec-Butyl alcohol
  • 61. Table B.1 (continued)
    Molecules
    tert-Butyl methyl ether
    Thiophene
    trans-Crotonaldehyde
    Trimethylamine