The Presence of Prevotella intermedia 17
within the human lung and its relationship
with lung cancer & COPD: a metagenomic
analysis of the human lung microbiome
Student Name: Holly Davies
Student ID: 130023847
Submitted in part candidature for the degree of B.Sc. Biology (Genetics)
Institute of Biological, Environmental and Rural Sciences
Aberystwyth University
Submitted April 2016
Contents Page
0. Preface
0.1 Declaration
0.2 Acknowledgements
0.3 Abstract
1. Introduction
1.1 Outline and Objectives
1.2 Lung cancer & COPD
1.2.1 Lung cancer
1.2.2 COPD
1.3 Prevotella intermedia 17
1.3.1 The Prevotella genus
1.3.2 Prevotella intermedia
1.3.3 Prevotella intermedia 17
1.4 Lung microbiome research
1.5 Previous work
2. Materials and Methods
2.1 Aims and objectives
2.2 Initial analysis
2.2.1 Largest contig assembly
2.2.2 NCBI BLAST search
2.2.3 Alignment
2.3 Individual samples
2.3.1 Import and sampling
2.3.2 De novo assembly
2.3.3 Read mapping
3. Results
3.1 NCBI BLAST results
3.2 NODE alignment
3.3 Mapping of individual samples
4. Discussion
4.1 Discussion of Results
4.1.1 NCBI MegaBlast and NODE alignment
4.1.2 Individual sample data
4.2 Limitations & implications
4.3 Further study
4.4 Conclusions
5. References
6. Word Count
7. List of Figures/Tables/Images
8. Appendix
0.1 Declaration
Module BR32330
I certify that all material in this paper is the result of my own investigation, except
where indicated, and references used in preparation of the text have been cited. This
paper has not been previously submitted as part of any other assessed module (with the
exception of the project proposal submitted for this paper), or submitted for any other
degree or diploma.
NAME: HOLLY DAVIES
DATE: 13/04/2016
0.2 Acknowledgements
I would like to take this opportunity to thank Dr Justin Pachebat for allowing me to be part of this research, and for the constant and helpful advice and support throughout this entire project.
I would also like to thank everyone involved in the MEDLUNG project, specifically Joe
Healey, Simon Cameron and Tom Hitch for providing the background and basis necessary
for me to be able to conduct this research.
Finally, I would like to thank Michael Best and Louise Denny for providing motivation and support throughout this project; it has been invaluable to me.
0.3 Abstract
The aim of this project was to analyse the bacterial DNA present in the sputum of lung
cancer and COPD (Chronic Obstructive Pulmonary Disease) patients to further research into
developing a biomarker for these diseases in association with the MEDLUNG Project
(Metabolic Biomarkers for the Detection of Lung cancer) – a multicentre study on behalf of
the National Health Service (NHS). The initial analysis was conducted on an Illumina
metagenome contig assembly of data collected from 30 patients (10 healthy, 10 lung cancer,
10 COPD) using NCBI (National Center for Biotechnology Information) BLAST (Basic Local
Alignment Search Tool) searches. From this analysis Prevotella intermedia 17 was identified
within the contig assembly.
Prevotella intermedia had previously been found orally in periodontal diseases (Maeda et al, 1998), periapical periodontitis (Jacinto et al, 2003) and noma (an acute gangrenous disease) (Bolivar et al, 2012), and had also been associated with cystic fibrosis (Ulrich et al, 2010) and with an increased risk of pneumonia in mice (Nagaoka et al, 2014). Specifically, Prevotella intermedia 17 is a clinical strain of the species that had only been isolated from the periodontal pocket (Ruan et al, 2015).
This analysis was conducted using CLC Genomics Workbench 8 (CLC bio, 2016) and included performing a de novo assembly of the initial patient data from the MEDLUNG collection and mapping it to the P. intermedia 17 reference genome. This confirmed that P. intermedia 17 is present in the lungs, and also showed that its abundance is markedly reduced in lung cancer and COPD patients, with 85-99% fewer mapped reads than in the healthy control group.
This study has discovered the presence of Prevotella intermedia 17 in the lungs for the
first time, and also that P. intermedia 17 does have a relationship with both lung cancer and
COPD in humans. This could lead to the development of a new diagnostic test for lung cancer
or COPD, or possibly further the knowledge surrounding these diseases and how they manifest
in the human lung. Developing a new diagnostic test and providing early screening for patients
is vitally important for lung cancer and COPD, as it would have the capacity to save countless
lives by giving more people access to curative treatment at an earlier stage where it can be
effective.
1. Introduction
1.1 OUTLINE AND OBJECTIVES
The aim of this project was to analyse the bacterial DNA present in the sputum of lung
cancer and COPD (Chronic Obstructive Pulmonary Disease) patients to further research into
developing a biomarker (a biological molecule specific to these diseases) for these
diseases in association with the MEDLUNG Project (Metabolic Biomarkers for the Detection
of Lung cancer) – a multicentre study on behalf of the National Health Service (NHS). The
initial analysis was conducted on an Illumina metagenome contig assembly of data collected
from 30 patients (10 healthy, 10 lung cancer, 10 COPD) using NCBI (National Center for
Biotechnology Information) BLAST (Basic Local Alignment Search Tool) searches. From this
analysis Prevotella intermedia 17 was identified within the contig assembly.
Prevotella intermedia had previously been found orally in periodontal diseases (Maeda et al, 1998), periapical periodontitis (Jacinto et al, 2003) and noma (an acute gangrenous disease) (Bolivar et al, 2012). Outside of oral diseases, Prevotella intermedia had been associated with cystic fibrosis (Ulrich et al, 2010) and with an increased risk of pneumonia in mice (Nagaoka et al, 2014). Specifically, Prevotella intermedia 17 is a clinical strain of the species that had only been isolated from the periodontal pocket (Ruan et al, 2015), with no known links to lung cancer/COPD, or even to the lungs in general.
From this, the Prevotella intermedia 17 reference genome was aligned with the raw
individual patients’ data to confirm its presence within the lungs, and to determine whether it
is linked to lung cancer and COPD. It was hoped that a link between Prevotella intermedia and these diseases would be established, leading to the development of a new diagnostic test in further study and enabling earlier diagnosis and higher survival rates among lung cancer and COPD sufferers.
Developing a new diagnostic test and providing early screening for patients is vitally
important for lung cancer and COPD, as two-thirds of lung cancer cases are diagnosed at
advanced stages, at which point curative treatment is no longer available (CancerResearchUK, 2015a)
and COPD is regularly under- and mis-diagnosed (W.H.O., 2015a). If an early diagnostic test
could be developed, it would have the capacity to save countless lives by giving more people
access to early treatment.
1.2 LUNG CANCER AND COPD
Lung cancer and COPD are two of the most prevalent respiratory tract disorders (CancerResearchUK, 2015a), both having extremely high morbidity and mortality (Eddy, 1989, Mallia et al., 2007). Lung cancer is the most common cause of cancer death in the UK (CancerResearchUK, 2015b), and COPD causes 6% of deaths globally (W.H.O., 2015a). These diseases are not mutually exclusive, as a high risk of lung cancer usually accompanies a high risk of COPD (Raviv et al, 2011). It is therefore hoped that a biomarker for one disease could provide pointers towards a biomarker for the other.
1.2.1 LUNG CANCER
Lung cancer is the most common cause of cancer death in the UK, accounting for 22% of all deaths from cancer, and is the second most common cancer in the UK (CancerResearchUK, 2015c). In 2012, 58% of lung cancer cases worldwide occurred in less developed countries (Ferlay et al, 2014), and lung cancer accounted for 1.59 million deaths globally (W.H.O., 2014).
In many cases the cause of the disease is clear, with tobacco smoking accounting for more than 8 out of 10 cases; other risk factors include exposure to carcinogens and radiation, air pollution, family history and poor immunity (CancerResearchUK, 2015c). Ageing is another factor in the development of lung cancer: the effects of risk factors accumulate over a lifetime (overall risk accumulation), and this is combined with cellular repair mechanisms becoming less effective as a person grows older (W.H.O. 2015b). Nevertheless, the World Health Organisation states that “more than 30% of cancer deaths could be prevented by modifying or avoiding key risk factors” (W.H.O. 2015b).
There are many preventative measures currently in place to reduce the incidence of lung cancer. The main focus of these is to decrease smoking levels in populations, but there are also some measures to address the rarer risk factors. Smoking cessation is the main method of preventing lung cancer, as after 10 years of smoking cessation there is a 30-50% reduction in lung cancer mortality risk compared with persistent smokers (Fiore et al, 1996); cessation is supported by government campaigns such as the one shown in Image 1. To help a person achieve smoking cessation, the Agency for Healthcare Research and Quality (formerly the Agency for Health Care Policy and Research [AHCPR]) developed a set of clinical smoking-cessation guidelines for the benefit of both the patient and the health care provider (Fiore et al, 1996), including documenting the patient’s tobacco use and offering one or more effective smoking cessation treatments (nicotine replacement, social support, skills training/problem solving etc.). Another method of prevention is reducing occupational exposure to lung carcinogens such as chromium, arsenic, nickel and asbestos, which together account for 9-15% of all lung cancers (Alberg et al, 2007).
Image 1: Government campaign supporting smoking cessation (Parry, 2010)
There are two main classifications of lung cancer: non-small cell and small cell. Non-small cell lung cancer accounts for approximately 85% of lung cancers and occurs in three types: adenocarcinoma, squamous cell carcinoma and large cell carcinoma (CancerCare®, 2016). Small cell lung cancer accounts for the remaining 15% of lung cancer cases, and these tumours tend to grow more quickly than non-small cell tumours (CancerCare®, 2016). The most common symptoms associated with lung cancer are coughing, shortness of breath, fatigue and blood in the sputum (CancerResearchUK, 2015c). Other symptoms can include weight loss, recurrent infections such as bronchitis and pneumonia, and chest pain (American Cancer Society, 2016). Lung cancer can also produce hormone-like substances which enter the bloodstream, causing paraneoplastic syndromes that affect various tissues and organs, such as hypercalcemia (high blood calcium levels), blood clots, gynecomastia (excess breast growth in men) and various nervous system problems (American Cancer Society, 2016).
Despite all this, there is no national screening programme for lung cancer in the UK, so most cases are discovered via x-ray, by which point the cancer is usually too advanced for curative treatment (CancerResearchUK, 2015c). There have been some attempts to introduce a screening programme in the UK, such as the UK Lung Cancer Screening Trial (UKLS), which aims to screen the people most at risk (e.g. those aged 50-75) using various clinical tests, the most promising being CT scanning, to help diagnose lung cancer earlier (UKLS, 2012). There are some screening programmes in the US; however, they are very selective in who they screen and also use CT scanning to determine diagnosis (CDC, 2016). The problem with the current focus of lung cancer screening is that it requires CT scanning, which exposes the patient to radiation, possibly increasing the risk of cancer (Brenner, 2003). Cancer Research UK states that the essential criteria for a screening programme are that it should be simple, quick, relatively inexpensive and not harmful (CancerResearchUK, 2015c). The current candidate screening programmes do not meet these criteria, as they can cause harm through radiation exposure or a possible allergic reaction to the dye used in the CT scan (NHS, 2016). The discovery of a biomarker for lung cancer could save lives through the development of a new diagnostic test that detects lung cancer before it can be seen on a CT scan, whilst also meeting the essential criteria for a screening programme.
1.2.2 COPD
Chronic Obstructive Pulmonary Disease (COPD) is a lung disease which interferes with normal breathing via a persistent blockage of airflow. It causes 25,000 deaths per year in the UK and caused more than 3 million deaths globally in 2012, approximately 6% of all deaths recorded (W.H.O., 2015a), making it the third most common cause of death in the world (Lozano et al, 2013). However, these numbers do not accurately represent how prevalent COPD is, with an estimated 24 million people in the US suffering from the disease without knowing it (American Lung Association, 2016), highlighting the need for better diagnosis/screening programmes and more public awareness.
As with lung cancer, the leading cause of COPD is cigarette smoking (NIH, 2013a). As many as 8 out of 10 COPD-related deaths are caused by smoking (US Department of Health and Human Services, 2014), and tobacco use accounted for approximately 5.4 million deaths in 2005 (W.H.O. 2016). There are also other risk factors for COPD, mainly prevalent in low-income countries (W.H.O. 2016). Exposure to indoor air pollution, mainly caused by the use of biomass fuels for cooking and heating, is the biggest risk factor in these countries due to the limited resources available, with approximately 3 billion people relying on these fuels for cooking and heating (W.H.O. 2016). Other risk factors include exposure to certain types of dust and chemicals at work (e.g. coal and cadmium) and possibly urban air pollution (not conclusive) (NHS, 2014). The preventative measures in place for COPD are the same as those for lung cancer, as both diseases share the same main risk factors.
The poor airflow associated with COPD is the result of the combined contributions of two conditions: emphysema (the breakdown of lung tissue) and obstructive bronchiolitis (small airways disease) (Vestbo et al, 2013a), which create structural changes within the lungs, as seen in Image 2. The main symptoms associated with COPD include breathlessness, abnormal sputum and a chronic cough (W.H.O. 2015a). However, COPD may initially present with no symptoms, or only mild ones, making early diagnosis difficult (NIH, 2013b).
Image 2: Structural changes in human lungs with COPD (Houghton, 2013)
There is a diagnostic test for COPD called spirometry, which is only considered for someone over the age of 35-40 who presents with relevant symptoms and has a history of exposure to the risk factors (Vestbo et al, 2013b). Spirometry involves the use of a bronchodilator (a drug to open the airways) and works by measuring the amount of airflow obstruction present (Qaseem et al, 2011). To make a diagnosis, two measurements are made: the forced expiratory volume in one second (FEV1) (the greatest volume of air expelled in the first second of a forced breath out), and the forced vital capacity (FVC) (the greatest volume of air expelled in one full breath) (Young & Vincent, 2010). From these two measurements an FEV1/FVC ratio can be calculated and compared against medical guidelines (usually a ratio lower than 70% in someone with COPD-like symptoms) to determine whether or not the patient has the disease; however, this can lead to over-diagnosis of COPD in elderly patients (Qaseem et al, 2011). The issue with spirometry as a diagnostic tool is that using it on people who do not present symptoms of COPD has “evidence of uncertain effect, and therefore is currently not recommended” (Vestbo et al, 2013a). As a result, there is no early diagnostic method for people with COPD, so by the time the disease is diagnosed it is too advanced for curative treatment to be successful (W.H.O., 2015a). Developing a diagnostic tool based on a biomarker would be highly beneficial to COPD sufferers, as it could detect the disease before symptoms manifest, making treatment more successful. As with lung cancer, this could be introduced as a national screening programme to reduce the deaths caused by COPD, as a large number of people with COPD are not diagnosed correctly (American Lung Association, 2016).
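The FEV1/FVC comparison described in this section is simple arithmetic. The Python sketch below illustrates it under stated assumptions: both measurements are in litres, the fixed 70% cut-off mentioned in the text is used, and the function names are hypothetical. It does not model symptoms, exposure history or any age adjustment, so it only flags a ratio consistent with obstruction and is not a diagnosis.

```python
def fev1_fvc_ratio(fev1_litres: float, fvc_litres: float) -> float:
    """Return FEV1/FVC as a fraction (e.g. 0.65 means 65%)."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres


def suggests_obstruction(fev1_litres: float, fvc_litres: float,
                         threshold: float = 0.70) -> bool:
    """Fixed-ratio rule: a ratio strictly below the threshold is consistent
    with airflow obstruction. Clinical context is still required."""
    return fev1_fvc_ratio(fev1_litres, fvc_litres) < threshold


if __name__ == "__main__":
    # Hypothetical post-bronchodilator readings, in litres
    print(round(fev1_fvc_ratio(1.8, 3.0), 2))  # ratio of about 0.6
    print(suggests_obstruction(1.8, 3.0))      # below 0.70
    print(suggests_obstruction(3.2, 4.0))      # ratio 0.80, above the cut-off
```

A fixed cut-off keeps the rule easy to apply, which is also why it can over-diagnose COPD in elderly patients, as noted above.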
1.3 PREVOTELLA INTERMEDIA 17
1.3.1 THE PREVOTELLA GENUS
The Prevotella genus is a group of anaerobic Gram-negative rod-shaped bacteria most commonly found in association with periodontal diseases (Maeda et al, 1998). It is classified among the ‘black pigmented bacteria’ because it forms smooth, shiny, black/grey colonies when grown on a blood agar plate (Shah & Collins, 1990). These bacteria were originally classified as Bacteroides melaninogenicus, until they were reclassified and split into Prevotella melaninogenica and Prevotella intermedia (Brook, 2015). The Prevotella genus is very versatile, having been found in various habitats such as the oral cavity, upper respiratory tract, urogenital tract (Eiring et al, 1998), rumen and human faeces (Hayashi et al, 2007). Many species of Prevotella are potential/opportunistic pathogens (Yunfeng et al, 2015) under a wide range of conditions and are known to invade host tissues (Nadkarni et al, 2012).
1.3.2 PREVOTELLA INTERMEDIA
Because it has been isolated from lesions in patients, Prevotella intermedia is considered a putative periodontal pathogen, specifically in early periodontitis, advanced periodontitis and acute necrotizing ulcerative gingivitis (Haffajee & Socransky, 1994). It has also been found to invade human coronary artery endothelial and smooth muscle cells in vitro (Dorn et al, 1999) and has been found in atheromatous plaques (Haraszthy et al, 1998). A significant finding in relation to this study is that “P. intermedia plays a critical role in the complex pathophysiology of lung disease in patients with cystic fibrosis” when present in anaerobic sputum plugs (Ulrich et al, 2010). The results of this study could show that this role is not limited to cystic fibrosis patients, but extends to people suffering from lung cancer and COPD.
1.3.3 PREVOTELLA INTERMEDIA 17
P. intermedia 17 is a strain of P. intermedia clinically isolated from a human periodontal pocket (Fukushima et al, 1992). It is differentiated from other strains of P. intermedia (for example 27 and ATCC 25611) by the diameter of the fimbriae (curlin protein appendages carrying adhesins) present on its cell surface (Leung et al, 1989). P. intermedia 17 presents type C (8nm diameter) fimbriae, unlike other strains of this species (Dorn et al, 1998). Dorn et al (1998) found that, in a human oral epithelial cell line, P. intermedia 17 is able to invade host cells whereas strains 27 and ATCC 25611 cannot; it also possesses strong agglutinating activity for human erythrocytes and binds to human buccal epithelial cells more avidly than other strains. The authors further speculate that “the type C fimbriae could promote invasion by providing a means for the bacteria to attach to the cell surface” (Dorn et al, 1998). Fan et al (2006) further state that P. intermedia 17 possesses a cell surface protein with broad-spectrum extracellular-matrix binding ability, which probably mediates its binding through adhesins. Given these abilities, it is possible that this strain can also invade the cells of the human lung through the epithelial layer and the extracellular matrix of the alveoli. If this is shown to be true, it would be the first time this strain has been found in the lungs, and it could further indicate a pathological relationship with lung cancer/COPD.
1.4 LUNG MICROBIOME RESEARCH
Lung microbiome research is a relatively new field in which the bacterial contents of the human lung are analysed, mostly for the purpose of disease investigation. Many factors can influence the environment in the lungs, such as oxygen, pH, hydrophobicity, temperature, salinity, predators and nutrient scarcity (Dickson et al, 2015), and disease can readily alter these factors. The microbiome of the lungs is determined by three ecological factors: “microbial immigration into the airways, elimination of microbes from the airways and the relative reproduction rates of its community members, as determined by regional growth conditions” (Dickson et al, 2015). During disease these three ecological factors change, which in turn changes the bacterial species present in the lungs. Examining these changes in bacterial species gives insight into the effects the disease is having on the lungs, and may open the door to new diagnostic tests and treatments.
Examples of successful lung microbiome projects include the identification of a core set of common bacteria found in the lungs of COPD patients (Erb-Downward et al, 2011), the discovery that certain members of Staphylococcus and Streptococcus are linked to the progression of idiopathic pulmonary fibrosis (Han et al, 2014), and the discovery that P. intermedia plays a critical role in the pathophysiology of lung diseases in patients with cystic fibrosis (Ulrich et al, 2010). Using the techniques set out in these and other papers, this study analyses the microbiome of the lung to identify a bacterial species with a link to lung cancer or COPD.
1.5 PREVIOUS WORK
Thirty sputum samples were obtained for the MEDLUNG project (10 from healthy patients, 10 from patients suffering from lung cancer, 10 from patients suffering from COPD) along with the patients’ medical histories (all data were collected and treated confidentially, in compliance with ethical guidelines). Genomic DNA was extracted from these samples and used to create barcoded Illumina sequencing libraries for each individual sample. These were subsequently paired-end sequenced on an Illumina HiSeq2000 platform by Simon Cameron as part of his PhD (Cameron, 2015). From this, a de novo contig assembly was produced by Tom Hitch (IBERS PhD student), which forms the starting point for this study.
2. Materials and Methods
2.1 AIMS AND OBJECTIVES
The aim of this project was to analyse the de novo contig assembly, provided by Tom Hitch (IBERS PhD student), of the DNA samples obtained by the MEDLUNG project, in order to find a bacterial species with a relationship to lung cancer or COPD, whether through its presence or its absence, that could be developed into a biomarker in future research. The discovery of a successful biomarker for these diseases could lead to the development of a new diagnostic test for lung cancer or COPD. One example of how this could happen would be to develop a primer for the biomarker tagged with a fluorescent marker, so that if the biomarker is present it would be visible under ultraviolet light, indicating that the patient has one of these diseases (depending on the nature of the biomarker). This would support the global initiative to reduce the suffering caused by these diseases by enabling diagnosis before symptoms manifest, making curative treatment more available.
2.2 INITIAL ANALYSIS
For this part of the analysis, the metagenome contig assembly produced by Tom Hitch (IBERS PhD student) was used to identify bacterial species present within the sputum samples of the MEDLUNG collection. This research was conducted using the CLC Genomics Workbench 8 software (CLC bio, 2016).
2.2.1 LARGEST CONTIG ASSEMBLY
The first stage of the initial analysis involved sorting the metagenome contig assembly by size, from the largest number of base pairs (bp) to the smallest. Due to the time constraints of the project, only the 10 largest contigs were searched, as these represented the largest portion of the contig assembly that could be analysed in the time available. The 10 largest contigs were saved as a separate sequence list, and then as individual sequences to allow for analysis.
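The contig selection above was carried out in CLC’s graphical interface. Purely as an illustration of the same idea, the following Python sketch assumes the assembly has been exported as a FASTA file; it parses the file, sorts contigs by length and writes the n largest to separate files. The function names and the output file naming scheme are hypothetical and not part of the original workflow.

```python
from typing import Dict, List, Tuple


def read_fasta(path: str) -> Dict[str, str]:
    """Minimal FASTA reader returning {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                current_id = line[1:].split()[0]  # first word of the header
                chunks = []
            else:
                chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def largest_contigs(contigs: Dict[str, str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n longest contigs as (id, length in bp), longest first."""
    by_length = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)
    return [(cid, len(seq)) for cid, seq in by_length[:n]]


def write_each(contigs: Dict[str, str], ids: List[str], prefix: str = "") -> None:
    """Write each selected contig to its own FASTA file (hypothetical naming)."""
    for cid in ids:
        with open(f"{prefix}{cid}.fasta", "w") as out:
            out.write(f">{cid}\n{contigs[cid]}\n")
```

For example, `largest_contigs(read_fasta("assembly.fasta"))` would list the ten longest contigs with their lengths; `assembly.fasta` is a placeholder file name.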
2.2.2 NCBI BLAST SEARCH
These individual sequences were then subjected to a BLAST (Basic Local Alignment Search Tool) (Altschul et al, 1990) search to identify their components by aligning them against reference sequences. For this study, the NCBI (National Center for Biotechnology Information) BLAST database was used due to its extensive collection of reference genomes and genes, and its easy-to-use interface (NCBI, 2016). The individual sequences were run through the NCBI nucleotide BLAST function using the MegaBlast algorithm (default parameters) (Morgulis et al, 2008), which compares a query sequence against reference sequences and is intended for sequence identification and intra-species comparison (NCBI, 2015). It was noticed that Prevotella intermedia 17 had hit 5 of the 10 largest contigs at relatively high query cover levels, which was unusual as this strain of Prevotella intermedia had not yet been found in the lungs. It was therefore decided to continue with this line of research for the remainder of this project. The MegaBlast results for these 5 contigs were also saved for later reference.
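MegaBlast reports a “query cover” value for each hit, which NCBI describes as the percentage of the query sequence covered by the aligned segments. The Python sketch below approximates that idea, and is not NCBI’s own code: it merges the query ranges of a hit’s HSPs (high-scoring segment pairs) and divides the covered length by the query length. The coordinates in the usage note are invented for illustration.

```python
from typing import List, Tuple


def query_cover(hsp_intervals: List[Tuple[int, int]], query_length: int) -> float:
    """Percentage of the query covered by the union of HSP query ranges.

    Intervals are 1-based and inclusive (start, end); reversed coordinates
    (start > end) are normalised first so minus-strand HSPs are handled.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    spans = sorted((min(s, e), max(s, e)) for s, e in hsp_intervals)
    covered = 0
    cur_start, cur_end = None, None
    for start, end in spans:
        if cur_end is None or start > cur_end + 1:
            # Gap before this HSP: close the current merged block
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent HSP: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length
```

For instance, two HSPs covering query positions 1-50 and 40-80 of a 100 bp contig give a query cover of 80%, because the overlap is counted once.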
2.2.3 ALIGNMENT
To display the relationship between the 5 contigs which hit P. intermedia 17 and the reference genome itself, CLC Genomics Workbench 8 was used to create a visual alignment of these contigs against the reference genome obtained from the NCBI genome database. To do this, the P. intermedia 17 reference genome was imported into CLC using standard import and then used as the reference for assembling each of the 5 contigs (Toolbox > Molecular Biology Tools > Sequencing Data Analysis > Assemble Sequences to Reference), displaying the query cover data obtained from the MegaBlast search results. Using the graphics function in CLC, these alignments were then exported for use in the results.
2.3 INDIVIDUAL SAMPLES
Following the discovery of P. intermedia 17 in the metagenome contig assembly, the next step was to use its reference genome to search the individual patient samples obtained by MEDLUNG, to provide further evidence of the presence of this bacterium and to determine whether this strain is linked to lung cancer/COPD. This was also performed in CLC Genomics Workbench 8. The data consisted of 30 samples (labelled B, C or D depending on which collection of patients the sample was from, and numbered 2-11). Each sample consisted of 4 files: 2 lanes, each with 2 read files (forward and reverse).
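The file counts in this section follow directly from the sample layout. The short Python sketch below enumerates the 30 sample labels (groups B, C and D, numbered 2-11) and the files implied by 2 lanes with 2 read directions each. The `_L1_R1.fastq`-style file names are hypothetical; the original does not give the actual naming scheme.

```python
from itertools import product

GROUPS = ["B", "C", "D"]          # collection labels used in the text
NUMBERS = range(2, 12)            # samples numbered 2-11 inclusive
LANES = [1, 2]
DIRECTIONS = ["R1", "R2"]         # forward and reverse reads (hypothetical suffixes)

samples = [f"{g}{n}" for g, n in product(GROUPS, NUMBERS)]
raw_files = [f"{s}_L{lane}_{d}.fastq"
             for s in samples for lane in LANES for d in DIRECTIONS]
# After paired import, each lane's forward/reverse pair becomes one data set
paired_datasets = [f"{s}_L{lane}" for s in samples for lane in LANES]

print(len(samples), len(raw_files), len(paired_datasets))
```

The 60 paired data sets match the number of files reported after import in the next section.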
2.3.1 IMPORT AND SAMPLING
To start this section of the research, the individual patient data had to be imported into CLC. The data were in FASTQ format, so they were imported using the Illumina import function, with the paired reads option selected to merge the 2 read files of each lane into one. After this there were 60 files: 30 samples containing 2 lanes of data each. Then, due to time and computing constraints, it was decided to sample 500,000 reads from each file (according to a sample size calculation, on average more than ~2,000 reads were needed for a result to be significant). To do this in CLC: Toolbox > NGS Core Tools > Sample Reads, specifying 500,000 reads to be sampled.
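The sampling step used CLC’s Sample Reads tool, whose internal method is not described here. One standard way to draw a fixed number of reads uniformly from a large file without loading it all into memory is reservoir sampling (Algorithm R), sketched below in Python as an assumption-labelled illustration, not as CLC’s implementation. Read pairs are kept together as tuples so that mates are never separated; `k` plays the role of the 500,000 reads used above.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R).

    Each item ends up in the sample with probability k/N, where N is the
    stream length. If the stream has fewer than k items, all are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: keep mates together by sampling (read1, read2) pairs
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
subset = reservoir_sample(pairs, 100, seed=42)
```

A fixed seed makes the subsample reproducible, which is useful when the same sampled data must be re-analysed later.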
2.3.2 DE NOVO ASSEMBLY
To merge the 2 ‘lane’ files for each sample and to make further analysis easier, it was decided to perform a de novo assembly for each sample. In CLC, with the 2 files of the sample selected: Toolbox > De Novo Sequencing > De Novo Assembly. Default parameters were used, with the exception that reads were not mapped back to the contigs, due to time constraints.
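The CLC assembler was used here with default settings. Many short-read de novo assemblers are built on de Bruijn graphs, and the toy Python sketch below shows the core idea under heavy simplifying assumptions: it ignores sequencing errors, reverse complements, coverage and repeats, and it is not CLC’s algorithm. Reads are split into k-mers, each distinct k-mer becomes an edge between its (k-1)-mer prefix and suffix, and maximal non-branching paths are spelled out as contigs.

```python
from collections import defaultdict
from typing import Dict, Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_debruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def simple_contigs(graph: Dict[str, List[str]]) -> List[str]:
    """Spell out maximal non-branching paths as contig sequences.

    Simplification: isolated cycles (every node with one in-edge and one
    out-edge) are not reported.
    """
    indegree: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1

    def one_in_one_out(node: str) -> bool:
        return indegree.get(node, 0) == 1 and len(graph.get(node, [])) == 1

    contigs = []
    for node in list(graph):
        if one_in_one_out(node):
            continue  # interior node: reached while walking from a start/branch
        for nxt in graph[node]:
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = graph[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return sorted(contigs)
```

For example, the overlapping reads `ACGTTG` and `GTTGCA` with k = 4 assemble into the single contig `ACGTTGCA`.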
2.3.3 READ MAPPING
Once the de novo assembly had completed, the next stage was to map the assemblies to the P. intermedia 17 reference genome to identify any reads from the bacterial genome. To do this, the Map Reads to Reference function (default parameters) was used (CLC > Toolbox > NGS Core Tools > Map Reads to Reference), with the P. intermedia 17 genome selected as the reference. Once all the mapping was completed for a full sample set (e.g. B), a track list was created of all the mapping graphs, with the maximum graph coverage set at 3 across all sample sets to allow comparison. These track lists were then exported using the graphics function in CLC. Furthermore, the number of mapped reads for each sample set, broken down by individual sample, was plotted as graphs for easy interpretation. Because these graphs summarise the track lists of mapping graphs, the track lists were placed in the appendix.
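The per-sample counts and group summaries described above can be computed with a few lines of code. This is a hypothetical Python sketch: it assumes the mapped and total read counts for each sample have been exported from CLC into a dictionary keyed by sample ID (e.g. "B2"), and it groups samples by their leading letter. It reports both the total mapped reads per group and the mean of the per-sample percentages; the original does not say whether its average percentage was computed this way or by pooling reads, so this is one reasonable choice rather than the author’s method.

```python
from statistics import mean
from typing import Dict, List, Tuple


def percent_mapped(mapped: int, total: int) -> float:
    """Percentage of a sample's reads that mapped to the reference."""
    return 100.0 * mapped / total if total else 0.0


def summarise_groups(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Map each group letter to (total mapped reads, mean % mapped per sample).

    counts maps a sample ID such as "B2" to (mapped reads, total reads).
    """
    groups: Dict[str, List[Tuple[int, int]]] = {}
    for sample_id, (mapped, total) in counts.items():
        groups.setdefault(sample_id[0], []).append((mapped, total))
    return {
        group: (sum(m for m, _ in rows),
                mean([percent_mapped(m, t) for m, t in rows]))
        for group, rows in groups.items()
    }


# Placeholder counts for illustration only (not the study's data)
example = {"B2": (10, 1000), "B3": (30, 1000), "C2": (0, 500)}
summary = summarise_groups(example)
```

Averaging per-sample percentages gives each patient equal weight, whereas pooling reads would let samples with more reads dominate the group value.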
3. Results
3.1 NCBI BLAST RESULTS
The first set of results shows the NCBI MegaBlast search results from the initial analysis, indicating the presence of P. intermedia 17 in 5 of the 10 largest contigs (nodes). The full NCBI MegaBlast results are available in appendix (1).
NODE ID      Description                      Max score  Total score  Query cover  E value  Identity  Accession
NODE_54069   P. intermedia 17 chromosome II   7491       24979        90%          0.0      83%       CP003503.1
NODE_28947   P. intermedia 17 chromosome I    3517       10209        40%          0.0      81%       CP003502.1
NODE_13609   P. intermedia 17 chromosome II   6259       18001        77%          0.0      83%       CP003503.1
NODE_12098   P. intermedia 17 chromosome II   5068       18113        63%          0.0      81%       CP003503.1
NODE_18381   P. intermedia 17 chromosome II   2372       5668         33%          0.0      81%       CP003503.1
Table 1: NCBI MegaBlast results from inputting the 10 largest contigs (nodes) from the initial
analysis. Only the 5 contigs that hit P. intermedia 17 are displayed, including the chromosome
they hit, the max score, total score, query cover, E value, identity and accession ID.
All 5 nodes input into MegaBlast hit at above 80% identity, with at least 33% query cover and an E value of 0.0 (meaning that matches of this quality are effectively never expected by chance). P. intermedia 17 was therefore found with high confidence within 5 of the 10 largest contigs from the initial analysis, justifying the search for this bacterium within the individual samples.
To confirm the presence of this bacterium in the contigs, the P. intermedia 17 reference sequence of the corresponding chromosome found in the MegaBlast results was aligned with each contig that returned a P. intermedia 17 hit.
3.2 NODE ALIGNMENT
Figure 1: The alignment of the 5 contigs with the P. intermedia 17 reference genome
corresponding to the chromosomes which hit each individual node. The pink areas of the
coverage graph display the areas which align with the node.
As seen in Figure 1, P. intermedia 17 is present within the metagenome contig assembly, and therefore in the lungs of the individual patients. Some nodes contain more of the P. intermedia 17 genome than others: the best match is NODE_54069, with 90% query cover and 83% identity to the bacterial genome, and the weakest is NODE_18381, with 33% query cover and 81% identity, as displayed in the visual alignment in Figure 1. These results provide strong evidence for the presence of P. intermedia 17 within the lungs.
3.3 MAPPING OF INDIVIDUAL SAMPLES
The next stage of the analysis mapped the reads from each individual patient's data to the P. intermedia 17 reference genome, to further test the hypothesis that P. intermedia 17 is present within the lung and to identify any relationship between P. intermedia 17 and lung cancer/COPD. The full mapping graphs from this part of the analysis are available in Appendix 2, and the individual patient mapping data in Appendix 3.
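The mapping software's own output is not reproduced here. Assuming the read mappings were exported in SAM format (an assumption, not a step described in this study), the number of mapped reads per P. intermedia 17 chromosome could be tallied from the FLAG and RNAME columns as sketched below. Unmapped records (FLAG bit 0x4) are skipped, as are secondary (0x100) and supplementary (0x800) alignments, so that each read (or each mate, for paired-end data) is counted once. The example records are hypothetical; the reference names reuse the strain 17 accessions from Appendix 1.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM specification, FLAG field).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped_reads(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped records per reference sequence (SAM column 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, reference = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # unmapped, or an extra alignment of an already-counted read
        counts[reference] += 1
    return counts


# Hypothetical records: two primary hits, one unmapped read, one supplementary hit.
example_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tCP003503.1\t1200\t60\t100M\t*\t0\t0\t*\t*",
    "read2\t16\tCP003502.1\t800\t60\t100M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read2\t2064\tCP003503.1\t5000\t60\t40M60S\t*\t0\t0\t*\t*",
]
print(count_mapped_reads(example_sam))
```

Summing the per-chromosome counts gives the per-sample total of mapped reads; grouping samples by patient group gives totals of the kind compared in Figures 2 and 4.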
The first measure examined was the total number of mapped reads in each of the three patient groups: Control, Lung Cancer and COPD.
Figure 2: The total number of mapped P. intermedia 17 reads across the three patient groups;
Control, Lung Cancer, and COPD. The raw data values are displayed above the data bars.
[Figure 2 chart: "Total Number of Mapped P. intermedia 17 Reads"; x-axis: Patient Group; y-axis: Number of mapped reads. Control: 1622; Lung Cancer: 226; COPD: 6.]
Figure 2 shows that the highest number of mapped P. intermedia 17 reads was found in the control group (healthy patients), with the number decreasing markedly in lung cancer patients and further still in COPD patients.
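The decreases in Figure 2 can be expressed as percentage changes from the group totals. The short sketch below does this; the function name is illustrative.

```python
# Total mapped P. intermedia 17 reads per patient group (Figure 2).
total_mapped = {"Control": 1622, "Lung Cancer": 226, "COPD": 6}


def percent_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative = decrease)."""
    return 100.0 * (after - before) / before


for group in ("Lung Cancer", "COPD"):
    change = percent_change(total_mapped["Control"], total_mapped[group])
    print(f"{group} vs Control: {change:.1f}%")
print(f"COPD vs Lung Cancer: "
      f"{percent_change(total_mapped['Lung Cancer'], total_mapped['COPD']):.1f}%")
```

This prints decreases of 86.1% (lung cancer) and 99.6% (COPD) relative to the control group, and 97.3% from the lung cancer group to the COPD group. These are raw counts, not adjusted for the amount of sequence data per group.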
To check that the trend in Figure 2 was not caused by differing numbers of reads or bases between samples, the average percentage of mapped reads in each patient group was also examined to see whether the same trend appeared.
Figure 3: The average percentage of reads from the individual patient groups that mapped to
P. intermedia 17. The actual percentage is displayed to the left of each marker.
Figure 3 closely mirrors the trend in Figure 2: the proportion of P. intermedia 17 reads in human sputum is highest in healthy people (control group), markedly lower in lung cancer patients and lowest in COPD patients.
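The text does not specify how the "average percentage of mapped reads" was calculated. Two common options are the mean of each patient's own percentage, which weights every patient equally, and a pooled percentage (all mapped reads divided by all reads in the group), which weights patients by sequencing depth. The sketch below computes both; the per-sample counts are hypothetical, not data from this study.

```python
from statistics import mean


def percent_mapped(mapped: int, total: int) -> float:
    """Percentage of a sample's reads that mapped to the reference."""
    return 100.0 * mapped / total


def group_summary(samples: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (mean of per-sample percentages, pooled percentage).

    Each sample is a (mapped_reads, total_reads) pair. Samples with zero
    mapped reads must be kept, otherwise the group average is inflated.
    """
    per_sample = [percent_mapped(mapped, total) for mapped, total in samples]
    pooled = percent_mapped(sum(m for m, _ in samples), sum(t for _, t in samples))
    return mean(per_sample), pooled


# Hypothetical group of three patients: (mapped reads, total reads).
mean_pct, pooled_pct = group_summary([(10, 1000), (0, 500), (30, 1500)])
print(f"mean of per-sample %: {mean_pct:.3f}, pooled %: {pooled_pct:.3f}")
```

With these hypothetical counts the two measures differ (1.000% versus 1.333%), so the choice should be stated when comparing groups.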
The distribution of mapped reads between the two chromosomes of the P. intermedia 17 genome was also examined, to determine which chromosome is more prevalent among the mapped reads.
[Figure 3 chart: "Average % of Reads Mapped to P. intermedia 17"; x-axis: Patient Group; y-axis: Average % of mapped reads. Control: 0.756%; Lung Cancer: 0.289%; COPD: 0.022%.]
Figure 4: The distribution of mapped reads in the patient groups. Chromosome 1 is shown by the solid black bars and chromosome 2 by the patterned bars, with the number of reads displayed above each bar. The COPD bars are too small to be seen: the values are 2 reads for chromosome 1 and 5 reads for chromosome 2.
Figure 4 shows that chromosome 2 of P. intermedia 17 accounts for considerably more of the mapped reads than chromosome 1, especially in the control group, where chromosome 2 has over 4 times as many mapped reads as chromosome 1 (1323 versus 299). However, this could be because chromosome 2 is approximately 3.7 times longer than chromosome 1. On the other hand, chromosome 2 was also hit by 4 of the 5 contigs from the initial analysis, compared with a single contig for chromosome 1, although the number of contig hits may also scale with chromosome length.
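The contig counts can be tallied from Appendix 1, restricted to the entries explicitly labelled "Prevotella intermedia 17" (CP003502.1, chromosome I; CP003503.1, chromosome II). For comparison, the sketch also computes how many of the 5 contigs would be expected to hit chromosome 2 if hits were distributed purely in proportion to chromosome length, using the lengths given in the Discussion and assuming that "chromosome 2" in the text corresponds to chromosome II (CP003503.1).

```python
from collections import Counter

# Accessions of the "Prevotella intermedia 17" entries in Appendix 1.
CHROMOSOME = {"CP003502.1": "I", "CP003503.1": "II"}

# Strain 17 hit for each contig, transcribed from Appendix 1.
strain17_hits = {
    "NODE_54069": "CP003503.1",
    "NODE_28947": "CP003502.1",
    "NODE_13609": "CP003503.1",
    "NODE_12098": "CP003503.1",
    "NODE_18381": "CP003503.1",
}

contigs_per_chromosome = Counter(CHROMOSOME[acc] for acc in strain17_hits.values())

# Chromosome lengths in bp as given in the text; assumes chromosome 2 = chromosome II.
length_bp = {"I": 579_647, "II": 2_119_790}
share_ii = length_bp["II"] / sum(length_bp.values())
expected_ii = share_ii * len(strain17_hits)

print(contigs_per_chromosome)  # Counter({'II': 4, 'I': 1})
print(f"expected hits on chromosome II by length alone: {expected_ii:.1f} of 5")
```

Under this simple length-proportional model about 3.9 of the 5 contigs would be expected on chromosome 2, which is worth bearing in mind when interpreting the observed 4-to-1 split.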
[Figure 4 chart: "Distribution of Mapped Reads in the Patient Groups"; x-axis: Patient Group; y-axis: Number of mapped reads; series: Chromosome 1, Chromosome 2. Control: 299 (chromosome 1), 1323 (chromosome 2); Lung Cancer: 39, 187; COPD: 2, 5.]
4. Discussion
4.1 DISCUSSION OF RESULTS
4.1.1 NCBI MEGABLAST AND NODE ALIGNMENT
The NCBI MegaBlast search and the subsequent node alignment show that P. intermedia 17 is present within the human lung. This is the first time that this strain of P. intermedia has been located in the human lung, and it opens the door to various further studies, such as the molecular relationship between P. intermedia 17 and lung cancer/COPD, and whether the decrease in the bacterium is directly caused by the disease. P. intermedia has previously been found in the human lung in relation to cystic fibrosis (Ulrich et al, 2010), so its presence there is not unprecedented; however, strain 17 had not previously been reported there.
4.1.2 INDIVIDUAL SAMPLE DATA
The individual sample data support several conclusions. Firstly, together with the NCBI MegaBlast and node alignment results, they further confirm the presence of P. intermedia 17 in the human lung. Secondly, they show a striking trend across the control, lung cancer and COPD groups. Many species of Prevotella are potential or opportunistic pathogens (Yunfeng et al, 2015) in a wide range of environments and are known to invade host tissues (Nadkarni et al, 2012). It would therefore be expected that, if P. intermedia 17 were linked to lung cancer/COPD, its level would increase in lung cancer/COPD patients, as it would be a pathogen linked to the diseases (greater bacterial presence = higher chance of the disease developing). However, the individual sample data show the opposite trend. The highest level of P. intermedia 17 was found in the control group (healthy patients); the total number of mapped reads was approximately 86% lower in the lung cancer group (226 versus 1622 reads) and over 99% lower in the COPD group (6 reads). To ensure that this was not due to variation in sample size between patient groups, the average percentage of mapped reads across each group's data was calculated, and this closely mirrored the trend in total mapped reads. These trends indicate that P. intermedia 17 is less abundant when lung cancer or COPD is present in the patient. This could be due to many factors, direct or indirect. The manifestation of these diseases could directly cause the death of P. intermedia 17 cells, for example by phagocytosis or toxin/hormone release. Alternatively, the diseases could destroy P. intermedia 17 cells indirectly, possibly by increasing the growth or presence of other bacterial species that compete with P. intermedia 17, or by changing the environment in the lungs and making it uninhabitable for the bacterium. This would be a question for further study, and could lead to a wider understanding of the mechanisms of lung cancer and/or COPD in the human lung.
The distribution of mapped reads across the P. intermedia 17 genome was also examined to see whether lung cancer/COPD affected it. Chromosome 1 of P. intermedia 17 contains 579647 base pairs and chromosome 2 contains 2119790 base pairs, approximately 3.7 times more. This is broadly reflected in the distribution of mapped reads in the control group, with 299 reads mapped to chromosome 1 and 1323 to chromosome 2. A similar proportion is seen in the lung cancer group (39 and 187 reads), while the COPD counts (2 and 5 reads) are too small to compare reliably. The diseases therefore do not appear to affect the relative representation of chromosomes 1 and 2 within the human lung.
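Because the two chromosomes differ in length, a length-normalised comparison makes the distribution easier to interpret. The sketch below divides each group's mapped reads (Figure 4) by chromosome length (as given above) to obtain reads per megabase, then takes the chromosome 2 : chromosome 1 ratio; a ratio near 1 means reads are spread in proportion to length. This normalisation is an illustrative addition, not part of the original analysis.

```python
# Chromosome lengths in base pairs, as stated in the text.
LENGTH_BP = {"chr1": 579_647, "chr2": 2_119_790}

# Mapped reads per chromosome for each patient group (Figure 4).
MAPPED = {
    "Control": {"chr1": 299, "chr2": 1323},
    "Lung Cancer": {"chr1": 39, "chr2": 187},
    "COPD": {"chr1": 2, "chr2": 5},
}


def reads_per_mb(reads: int, length_bp: int) -> float:
    """Mapped reads per million base pairs of reference sequence."""
    return reads / (length_bp / 1_000_000)


def chr2_to_chr1_ratio(counts: dict[str, int]) -> float:
    """Length-normalised chromosome 2 : chromosome 1 read density ratio."""
    density = {c: reads_per_mb(n, LENGTH_BP[c]) for c, n in counts.items()}
    return density["chr2"] / density["chr1"]


ratios = {group: chr2_to_chr1_ratio(counts) for group, counts in MAPPED.items()}
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

With the Figure 4 counts this gives ratios of about 1.21 (control) and 1.31 (lung cancer), so chromosome 2 receives slightly more reads than its length alone would predict in both groups. The COPD ratio (about 0.68) rests on only 7 reads and carries little weight.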
4.2 LIMITATIONS & IMPLICATIONS
Limited time, computer memory and processing power were major limitations during the research, and led to the sampling of 500000 reads from each individual dataset. For example, a de novo assembly of the original data took up to 18 hours per file, and some attempts aborted because of insufficient disk space or memory, so the analysis took considerably longer than expected owing to the size of the files. To analyse the full individual sample data in future, a computer with a very large amount of memory and a powerful processor would be required.
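The method used to draw the 500000-read subsample is not described in this section. One memory-bounded way to draw a uniform random subsample from a FASTQ file of unknown length is reservoir sampling (Algorithm R), sketched below as an illustration of the technique rather than the procedure used in this study. Only the reservoir of k records is held in memory, however large the input file is. The file name in the usage comment is hypothetical, and the FASTQ reader assumes unwrapped 4-line records.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")
FastqRecord = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def fastq_records(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group a FASTQ stream into 4-line records (assumes no line wrapping)."""
    it = iter(lines)
    # zip() over the same iterator four times takes four consecutive lines per
    # record; an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep item with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage (hypothetical file name):
# with open("patient_sample.fastq") as handle:
#     subset = reservoir_sample(fastq_records(handle), 500_000, seed=1)
```

For paired-end data the same read indices would need to be kept from both mate files, which this single-file sketch does not handle.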
4.3 FURTHER STUDY
There are several routes that further research could follow from this analysis. One would be to calculate a minimum threshold of P. intermedia 17 presence within the lung for the two diseases, i.e. if a patient falls below this threshold, further investigation would be required or a diagnosis reached. For this to work, a diagnostic test would have to be developed. This could be achieved, for example, by developing a probe specific to P. intermedia 17 and tagging it with a fluorescent marker. When mixed with a patient's sputum, the fluorescently tagged probe would bind to any P. intermedia 17 present, which would then be visible under ultraviolet light. The less fluorescence visible, the higher the patient's chance of having lung cancer or COPD.
Another route for further study would be to research how lung cancer/COPD interact with P. intermedia 17 to reduce its prevalence in affected lungs, distinguishing between the direct mechanisms (such as phagocytosis or toxin/hormone release) and the indirect mechanisms (such as increased competition from other bacterial species, or a lung environment made uninhabitable for the bacterium) suggested in Section 4.1.2.
Other P. intermedia strains were found in the NCBI MegaBlast results, so researching whether these also occur in the human lung would be another route to follow. It would also be worth investigating whether P. intermedia 17 is related to other diseases that predominantly affect the lungs, or even to other types of cancer. Additionally, investigating whether P. intermedia 17 has any effect on the progression of the diseases themselves could be a promising option.
4.4 CONCLUSIONS
This study has not only detected Prevotella intermedia 17 in the lungs for the first time, but has also found that P. intermedia 17 has an inverse relationship with both lung cancer and chronic obstructive pulmonary disease in humans, being less abundant in patients with either disease. This could lead to the development of a new diagnostic test for lung cancer or COPD, or further the knowledge of these diseases and how they manifest in the human lung. Developing a new diagnostic test and providing early screening is vitally important for lung cancer and COPD, as it could save countless lives by giving more people access to curative treatment at an early stage, when it can be effective.
5. References
ALBERG AJ, FORD JG, SAMET JM. Epidemiology of lung cancer: ACCP evidence-based clinical practice guideline (2nd edition). Chest. 2007;132:29S-55S.
ALTSCHUL SF, GISH W, MILLER W, MYERS EW, LIPMAN DJ. Basic local alignment search tool.
J Mol Biol. 1990;215(3):403-10.
AMERICAN CANCER SOCIETY. 2016. Signs and Symptoms of Lung Cancer [Online]. American Cancer Society. Available: http://www.cancer.org/cancer/lungcancer-non-smallcell/moreinformation/lungcancerpreventionandearlydetection/lung-cancer-prevention-and-early-detection-signs-and-symptoms [Accessed 4th April 2016].
AMERICAN LUNG ASSOCIATION. 2016. How Serious is COPD [Online]. American Lung Association. Available: http://www.lung.org/lung-health-and-diseases/lung-disease-lookup/copd/learn-about-copd/how-serious-is-copd.html?referrer=https://www.google.co.uk/ [Accessed 4th April 2016].
BOLIVAR I, WHITESON K, STADELMANN B, BARATTI-MAYER D, GIZARD Y, MOMBELLI
A. Bacterial diversity in oral samples of children in Niger with acute noma, acute necrotizing
gingivitis, and healthy controls. PLoS Negl Trop Dis. 2012;6(3):e1556.
BRENNER DJ. Radiation Risks Potentially Associated with Low-Dose CT Screening of Adult Smokers
for Lung Cancer. RSNA Radiology. 2004;231(2):030-880.
BROOK I. 2015. Bacteroides Infection: Background [Online]. Medscape. Available:
http://emedicine.medscape.com/article/233339-overview [Accessed 6th April 2016].
CAMERON S. Charting Human Microbiome and Metabolome Changes in Disease and Stress.
Aberystwyth University. 2015. PhD thesis.
CANCERCARE®. 2016. Types and Staging of Lung Cancer [Online]. Lungcancer.org (A program of CancerCare®). Available: http://www.lungcancer.org/find_information/publications/163-lung_cancer_101/268-types_and_staging [Accessed 4th April 2016].
CANCERRESEARCHUK. 2015a. Lung Cancer Survival Statistics [Online]. CancerResearchUK. Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/survival/lung-cancer-survival-statistics [Accessed 23rd March 2015].
CANCERRESEARCHUK. 2015b. Lung Cancer Mortality Statistics [Online]. CancerResearchUK. Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/mortality/uk-lung-cancer-mortality-statistics [Accessed 23rd March 2015].
CANCERRESEARCHUK. 2015c. General Factsheet for Lung Cancer [Online]. CancerResearchUK. Available: http://www.cancerresearchuk.org/prod_consump/groups/cr_common/@cah/@gen/documents/generalcontent/cr_120625.pdf [Accessed 23rd March 2015].
CENTRES FOR DISEASE CONTROL AND PREVENTION (CDC). 2016. Lung Cancer – Basic Information – What Screening tests are there? [Online]. Centres for Disease Control and Prevention. Available: http://www.cdc.gov/cancer/lung/basic_info/screening.htm [Accessed 4th April 2016].
CLC BIO. 2016. CLC Genomics Workbench 8 [Software]. Qiagen.
DICKSON RP, HUFFNAGLE GB. The Lung Microbiome: New Principles for Respiratory
Bacteriology in Health and Disease. PLoS Pathog. 2015;11(7):e1004923.
DORN BR, DUNN WA JR, PROGULSKE-FOX A. Invasion of human coronary cells by periodontal pathogens. Infect Immun. 1999;67(11):5792-8.
DORN BR, LEUNG KP, PROGULSKE-FOX A. Invasion of Human Oral Epithelial Cells by Prevotella
intermedia. Infect Immun. 1998;66(12):6054-6057.
EDDY, D. Screening for lung cancer. Annals of internal medicine. 1989;111:232-237.
EIRING P, WALLER K, WIDMANN A, WERNER H. Fibronectin and laminin binding of urogenital
and oral Prevotella species. Zentralbl Bakteriol. 1998;288(3):361-72.
FAN Y, DIVYA I, CECILIA A, JANINA P, LEWIS DR. Identification and characterisation of a cell
surface protein of Prevotella intermedia 17 with broad-spectrum binding activity for
extracellular matrix proteins. Proteomics. 2006;6(22):6023-32.
FERLAY J, SOERJOMATARAM I, ERVIK M, DIKSHIT R, ESER S, MATHERS C, REBELO M,
PARKIN DM, FORMAN D, BRAY F. 2014. Cancer Incidence and Mortality Worldwide:
IARC CancerBase No. 11. Globocan 2012 v1.1. 2014
FIORE MC, BAILEY WC, COHEN SJ. Smoking Cessation: Clinical Practice Guideline No 18. US
Department of Health and Human Services, Public Health Service, Agency for Health Care
Policy and Research. AHCPR Publ. 1996;96:0692.
FUKUSHIMA H, MOROI H, INOUE J, ONOE T, EZAKI T, YABUUCHI E, LEUNG KP, WALKER
CB, CLARK WB, SAGAWA H. Phenotypic characteristics and DNA relatedness in Prevotella
intermedia and similar organisms. Oral Microbiol Immunol. 1992;7(1):60-4.
HAFFAJEE AD, SOCRANSKY SS. Review: Microbial etiological agents of destructive periodontal
diseases. Periodontol 2000. 1994;5:78-111.
HAN MK, ZHOU Y, MURRAY S, TAYOB N, NOTH I, LAMA VN, MOORE BB, WHITE ES,
FLAHERTY KR, HUFFNAGLE GB, MARTINEZ FJ. Lung microbiome and disease
progression in idiopathic pulmonary fibrosis: an analysis of the COMET study. The Lancet
Respiratory Medicine. 2014;2(7):548-556.
HARASZTHY VI, ZAMBON JJ, TREVISAN M, SHAH R, ZEID M, GENCO RJ. Identification of
pathogens in atheromatous plaques. J Dent Res. 1998;77:666.
HAYASHI H, SHIBATA K, SAKAMOTO M, TOMITA S, BENNO Y. Prevotella copri sp. nov. and Prevotella stercorea sp. nov., isolated from human faeces. Int J Syst Evol Microbiol. 2007;57(Pt 5):941-6.
HOUGHTON AM. Mechanistic links between COPD and lung cancer. Nature Reviews Cancer.
2013;13:233-245.
JACINTO RC, GOMES BP, FERRAZ CC, ZAIA AA, FILHO FJ. Microbiological analysis of infected
root canals from symptomatic and asymptomatic teeth with periapical periodontitis and the
antimicrobial susceptibility of some isolated anaerobic bacteria. Oral Microbiol Immunol.
2003;18(5):285-92.
ERB-DOWNWARD JR, THOMPSON DL, HAN MK, FREEMAN CM, MCCLOSKY L, SCHMIDT
LA, YOUNG VB, TOEWS GB, CURTIS JL, SUNDARAM B, MARTINEZ FJ, HUFFNAGLE
GB. Analysis of the Lung Microbiome in the ‘Healthy’ Smoker and in COPD. PLoS ONE.
2011;6(2):e16384.
LEUNG KP, FUKUSHIMA H, SAGAWA H, WALKER CB, CLARK WB. Surface appendages,
hemagglutination, and adherence to human epithelial cells of Bacteroides intermedius. Oral
Microbiol Immunol. 1989;4(4):204-10.
LOZANO R, NAGHAVI M, FOREMAN K. Global and regional mortality from 235 causes of death
age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study
2010. Lancet. 2013;380:2095-128.
MAEDA N, OKAMOTO M, KONDO K, ISHIKAWA H, OSADA R, TSURUMOTO A. Incidence of
Prevotella intermedia and Prevotella nigrescens in periodontal health and disease. Microbiol
Immunol. 1998;42(9):583-9.
MALLIA P, CONTOLI M, CARAMORI G, PANDIT A, JOHNSTON S, PAPI A. Exacerbations of
asthma and chronic obstructive pulmonary disease (COPD): focus on virus induced
exacerbations. Current pharmaceutical design. 2003;13:73-97.
MORGULIS A, COULOURIS G, RAYTSELIS Y, MADDEN TL, AGARWALA R, SCHÄFFER AA. Database indexing for production MegaBLAST searches. Bioinformatics. 2008;24(16):1757-64.
NADKARNI MA, BROWNE GV, CHHOUR K, BYUN R, NGUYEN K, CHAPPLE CC. Pattern of
distribution of Prevotella species/phylotypes associated with healthy gingiva and periodontal
disease. Eur J Clin Microbiol Infect Dis. 2012;31(11):2989-99.
NAGAOKA K, YANAGIHARA K, MORINAGA Y, NAKAMURA S, HARADA T. Prevotella
intermedia Induces Severe Bacteremic Pneumococcal Pneumonia in Mice with Upregulated
Platelet-Activating Factor Receptor Expression. Infection and Immunity. 2014;82(2):587-593.
NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2016. BLAST®
[Online]. National Centre for Biotechnology Information, U.S. National Library of Medicine.
Available: http://blast.ncbi.nlm.nih.gov/Blast.cgi [Accessed 9th April 2016].
NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2015. BLAST
Homepage and Selected Search Pages: Introducing the BLAST homepage and form
elements/functions of selected search pages [Online]. National Centre For Biotechnology
Information. Available: ftp://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_BLASTGuide.pdf
[Accessed 9th April 2016].
NATIONAL HEALTH SERVICE (NHS). 2014. Chronic obstructive pulmonary disease – Causes of COPD [Online]. NHS Choices. Available: http://www.nhs.uk/Conditions/Chronic-obstructive-pulmonary-disease/Pages/Causes.aspx [Accessed 5th April 2016].
NATIONAL HEALTH SERVICE (NHS). 2016. CT Scan – Introduction [Online]. NHS Choices.
Available: http://www.nhs.uk/conditions/ct-scan/Pages/Introduction.aspx [Accessed 4th April
2016].
NATIONAL INSTITUTES OF HEALTH (NIH). 2013a. What is COPD? [Online]. National Heart, Lung, and Blood Institute. Available: http://www.nhlbi.nih.gov/health/health-topics/topics/copd/ [Accessed 5th April 2016].
NATIONAL INSTITUTES OF HEALTH (NIH). 2013b. What are the signs and symptoms of COPD?
[Online]. National Heart, Lung, and Blood Institute. Available:
https://www.nhlbi.nih.gov/health/health-topics/topics/copd/signs [Accessed 5th April 2016].
QASEEM A, WILT TJ, WEINBERGER SE, HANANIA NA, CRINER G, VAN DER MOLEN T, MARCINIUK DD. Diagnosis and Management of Stable Chronic Obstructive Pulmonary Disease: A Clinical Practice Guideline Update from the American College of Physicians, American College of Chest Physicians, American Thoracic Society and European Respiratory Society. Annals of Internal Medicine. 2011;155(3):179-91.
PARRY. 2010. Use and abuse of drugs – the link between smoking and lung cancer [Image][Online]. Available: http://www.corescience.co.uk/index.php?option=com_content&view=article&id=58%3Ause-and-abuse-of-drugs&catid=43%3Adrugs&Itemid=41&limitstart=3 [Accessed 13th April 2016].
RAVIV S, HAWKINS K, DECAMP M, KALHAN R. Lung cancer in chronic obstructive pulmonary
disease: enhancing surgical options and outcomes. American journal of respiratory and critical
care medicine. 2011;176:532-555.
RUAN Y, SHEN L, ZOU Y, QI Z, YIN J, JIANG J, GUO L, HE L, CHEN Z, TANG Z, QIN S.
Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal
reveals features related to pathogenicity and adaptation. BMC Genomics. 2015;16(1):1.
SHAH HN, COLLINS DM. NOTES: Prevotella, a new genus to include bacteroides melaninogenicus
and related species formerly classified in the genus bacteroides. Int J Systematic.
1990;40(2):205-8.
UK LUNG CANCER SCREENING TRIAL (UKLS). 2012. Background to UKLS [Online]. UKLS.
Available: https://www.ukls.org/index.html [Accessed 4th April 2016].
ULRICH M, BEER I, BRAITMAIER P, DIERKES M, KUMMER F, KRISMER B. Relative
contribution of Prevotella intermedia and Pseudomonas aeruginosa to lung pathology in
airways of patients with cystic fibrosis. Thorax. 2010;65(11):978-84.
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES. 2014. The Health Consequences of Smoking – 50 years of progress: A report of the surgeon general [Online]. Centres for Disease Control and Prevention. Available: http://www.cdc.gov/tobacco/data_statistics/sgr/50th-anniversary/index.htm [Accessed 5th April 2016].
VESTBO, JORGEN. Definition and Overview: Global Strategy for the Diagnosis, Management, and
Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic
Obstructive Lung Disease. 2013:pp(1-7).
VESTBO, JORGEN. Diagnosis and Assessment: Global Strategy for the Diagnosis, Management, and
Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic
Obstructive Lung Disease. 2013:pp(9-17).
WORLD HEALTH ORGANISATION (W.H.O.). 2016. Chronic respiratory diseases – Causes of COPD [Online]. World Health Organisation. Available: http://www.who.int/respiratory/copd/causes/en/ [Accessed 5th April 2016].
WORLD HEALTH ORGANISATION (W.H.O.). 2015a. Chronic Obstructive Pulmonary Disease
(COPD) Factsheet [Online]. World Health Organisation. Available:
http://www.who.int/mediacentre/factsheets/fs315/en/ [Accessed 23rd March 2015].
WORLD HEALTH ORGANISATION (W.H.O.). 2015b. Cancer Factsheet [Online]. World Health
Organisation. Available: http://www.who.int/mediacentre/factsheets/fs297/en/ [Accessed 4th
April 2016].
WORLD HEALTH ORGANISATION (W.H.O.). 2014. World Cancer Report 2014 [Online]. World Health Organisation. Available: http://apps.who.int/bookorders/anglais/detart1.jsp?codlan=1&codcol=76&codcch=31# [Accessed 4th April 2016].
YOUNG, VINCENT B. (2010). Blueprints Medicine (5th Ed.). Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins. p. 69. ISBN: 978-0-7817-8870-0.
YUNFENG R, LU S, YAN Z, ZHENGNAN Q, JUN Y, JIE J, LIANG G, LIN H, ZIJIANG C, ZISHENG T, SHENGYING Q. Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal reveals features related to pathogenicity and adaptation. BMC Genomics. 2015;16:122.
6. Word Count
The final word count for this study, excluding the final list of references,
acknowledgements, tables, table of contents, and figure/image legends is:
6692
7. List of Figures/Tables/Images
Table 1: NCBI MegaBlast Search Results for the 5 largest contigs in relation to P.
intermedia 17
Figure 1: Node alignment of the 5 contigs with the P. intermedia 17 reference
genome
Figure 2: Bar chart displaying the total number of mapped reads found in each of
the patient groups
Figure 3: Line chart displaying the average percentage of mapped reads from the
total genomic data in the patient groups
Figure 4: Bar chart displaying the distribution of mapped reads across the two
chromosomes of the P. intermedia 17 genome for each of the patient
groups
Image 1: Government campaign supporting smoking cessation
Image 2: Structural changes in human lungs with COPD
8. Appendix
APPENDIX 1 – NCBI MEGABLAST RESULTS (FULL)
NODE_54069
Description | Max Score | Total Score | Query Cover | E value | Identity | Accession
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome 1 | 7413 | 24943 | 91% | 0.0 | 83% | AP014597.1
Prevotella intermedia DNA. Chromosome 2. Complete genome. Strain: 17-2 | 7491 | 24979 | 90% | 0.0 | 83% | AP014925.1
Prevotella intermedia 17 chromosome II. Complete sequence. | 7491 | 24979 | 90% | 0.0 | 83% | CP003503.1
NODE_28947
Description | Max Score | Total Score | Query Cover | E value | Identity | Accession
Prevotella intermedia DNA, complete genome. Strain: OMA14. Chromosome II | 4071 | 9268 | 32% | 0.0 | 83% | AP014598.1
Prevotella intermedia DNA, chromosome 1. Complete genome. Strain: 17-2 | 3517 | 10209 | 40% | 0.0 | 81% | AP014926.1
Prevotella intermedia 17 chromosome I. Complete sequence | 3517 | 10209 | 40% | 0.0 | 81% | CP003502.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I. | 122 | 122 | 0% | 3e-22 | 94% | AP014597.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain 17-2 | 121 | 121 | 0% | 1e-21 | 94% | AP014925.1
NODE_13609
Description | Max Score | Total Score | Query Cover | E value | Identity | Accession
Prevotella intermedia 17 chromosome II. Complete sequence | 6259 | 18001 | 77% | 0.0 | 83% | CP003503.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 6255 | 17997 | 77% | 0.0 | 83% | AP014925.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 6325 | 15926 | 66% | 0.0 | 83% | AP014597.1
NODE_12098
Description | Max Score | Total Score | Query Cover | E value | Identity | Accession
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 5265 | 16356 | 54% | 0.0 | 82% | AP014597.1
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 5068 | 18108 | 63% | 0.0 | 81% | AP014952.1
Prevotella intermedia 17 chromosome II. Complete sequence | 5068 | 18113 | 63% | 0.0 | 81% | CP003503.1
NODE_18381
Description | Max Score | Total Score | Query Cover | E value | Identity | Accession
Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 2372 | 5663 | 33% | 0.0 | 81% | AP014925.1
Prevotella intermedia 17 chromosome II. Complete sequence | 2372 | 5668 | 33% | 0.0 | 81% | CP003503.1
Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 2287 | 3579 | 20% | 0.0 | 80% | AP014597.1
APPENDIX 2 – MAPPING GRAPHS OF INDIVIDUAL SAMPLE DATA
Blue areas represent regions matching the P. intermedia 17 reference genome.
SAMPLE B – CHROMOSOME 1
N.B. B11 is omitted because no reads mapped to either chromosome.
SAMPLE B – CHROMOSOME 2
N.B. B11 is omitted because no reads mapped to either chromosome.
SAMPLE C – CHROMOSOME 1
N.B. C2, C3 and C8 are omitted because no reads mapped to either chromosome.
SAMPLE C – CHROMOSOME 2
N.B. C2, C3 and C8 are omitted because no reads mapped to either chromosome.
SAMPLE D – CHROMOSOME 1
N.B. D2, D4, D8, D10 and D11 are omitted because no reads mapped to either chromosome.
SAMPLE D – CHROMOSOME 2
N.B. D2, D4, D8, D10 and D11 are omitted because no reads mapped to either chromosome.
APPENDIX 3 – INDIVIDUAL PATIENT MAPPING DATA
SAMPLE B
B2
B3
B4
B5
B6
B7
B8
B9
B10
SAMPLE C
C4
C5
C6
C7
C9
C10
C11
SAMPLE D
D3
D5
D6
D7
D9

James Fingleton PhD thesis amended FINAL version 14th November
 
A Descriptive Study to Assess the Knowledge and Practices Regarding COPD Prev...
A Descriptive Study to Assess the Knowledge and Practices Regarding COPD Prev...A Descriptive Study to Assess the Knowledge and Practices Regarding COPD Prev...
A Descriptive Study to Assess the Knowledge and Practices Regarding COPD Prev...
 
Impact of pulmonary rehabilitation program on health outcomes of patients wit...
Impact of pulmonary rehabilitation program on health outcomes of patients wit...Impact of pulmonary rehabilitation program on health outcomes of patients wit...
Impact of pulmonary rehabilitation program on health outcomes of patients wit...
 
MRSA poster ASB 2016SP
MRSA poster ASB 2016SPMRSA poster ASB 2016SP
MRSA poster ASB 2016SP
 
Periodontal Pathogens & Cardiovascular Diseases
Periodontal Pathogens &  Cardiovascular DiseasesPeriodontal Pathogens &  Cardiovascular Diseases
Periodontal Pathogens & Cardiovascular Diseases
 
Clinico-epidemiological study of cutaneous tuberculosis in a tertiary care ho...
Clinico-epidemiological study of cutaneous tuberculosis in a tertiary care ho...Clinico-epidemiological study of cutaneous tuberculosis in a tertiary care ho...
Clinico-epidemiological study of cutaneous tuberculosis in a tertiary care ho...
 

Viewers also liked

Talk for #FOGM15: Challenges and Opportunities in Microbiome Studies and th...
Talk for #FOGM15: Challenges and Opportunities  in Microbiome Studies  and th...Talk for #FOGM15: Challenges and Opportunities  in Microbiome Studies  and th...
Talk for #FOGM15: Challenges and Opportunities in Microbiome Studies and th...Jonathan Eisen
 
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicine
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicineBRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicine
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicinebrnmomentum
 
Sample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchSample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchQIAGEN
 
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...Kitty Gifford
 
The Human Microbiome in Sports Performance and Health
The Human Microbiome in Sports Performance and HealthThe Human Microbiome in Sports Performance and Health
The Human Microbiome in Sports Performance and Healthctorgan
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAGEN
 

Viewers also liked (7)

Talk for #FOGM15: Challenges and Opportunities in Microbiome Studies and th...
Talk for #FOGM15: Challenges and Opportunities  in Microbiome Studies  and th...Talk for #FOGM15: Challenges and Opportunities  in Microbiome Studies  and th...
Talk for #FOGM15: Challenges and Opportunities in Microbiome Studies and th...
 
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicine
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicineBRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicine
BRN Symposium 03/06/16 The respiratory microbiome: a new frontier in medicine
 
Mesa 2.2. dr. carlos cabrera
Mesa 2.2. dr. carlos cabreraMesa 2.2. dr. carlos cabrera
Mesa 2.2. dr. carlos cabrera
 
Sample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchSample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome Research
 
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...
Science Cabaret by Dr. Rodney Dietert "How to train your super organism..via ...
 
The Human Microbiome in Sports Performance and Health
The Human Microbiome in Sports Performance and HealthThe Human Microbiome in Sports Performance and Health
The Human Microbiome in Sports Performance and Health
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
 

Similar to Holly_Davies_Dissertation

Debate on mucolytics
Debate on mucolyticsDebate on mucolytics
Debate on mucolyticsPrem Chand
 
MEDICINE DISEASE ARTICLE FOR STUDENT PEP
MEDICINE DISEASE ARTICLE FOR STUDENT PEPMEDICINE DISEASE ARTICLE FOR STUDENT PEP
MEDICINE DISEASE ARTICLE FOR STUDENT PEPAya Faroug
 
Community Acquired Pneumonia
Community Acquired PneumoniaCommunity Acquired Pneumonia
Community Acquired PneumoniaBhargav Kiran
 
jnci.oxfordjournals.org JNCI Articles 1DOI 10.1093jn.docx
jnci.oxfordjournals.org   JNCI  Articles 1DOI 10.1093jn.docxjnci.oxfordjournals.org   JNCI  Articles 1DOI 10.1093jn.docx
jnci.oxfordjournals.org JNCI Articles 1DOI 10.1093jn.docxchristiandean12115
 
Atypical Presentations of lung cancers.pdf
Atypical Presentations of lung cancers.pdfAtypical Presentations of lung cancers.pdf
Atypical Presentations of lung cancers.pdfKimberly Pulley
 
2016-Crawford-BMC Pulm Med published
2016-Crawford-BMC Pulm Med published2016-Crawford-BMC Pulm Med published
2016-Crawford-BMC Pulm Med publishedJi-Youn Yeo
 
A systematic review of the association between ptb and the development of chr...
A systematic review of the association between ptb and the development of chr...A systematic review of the association between ptb and the development of chr...
A systematic review of the association between ptb and the development of chr...EArl Copina
 
The Case for Lung Cancer Screening ASRT presentation
The Case for Lung Cancer Screening ASRT presentationThe Case for Lung Cancer Screening ASRT presentation
The Case for Lung Cancer Screening ASRT presentationKimberly Luse
 
Asthma In General Practice
Asthma In General PracticeAsthma In General Practice
Asthma In General PracticeSherri Cost
 
Maths final report
Maths final reportMaths final report
Maths final reportJian Leo
 
FNBE0115- MATH SATISTICS Final Report
FNBE0115- MATH SATISTICS Final ReportFNBE0115- MATH SATISTICS Final Report
FNBE0115- MATH SATISTICS Final Reportbarbaraxchang
 
Proefschrift Annerika Slok
Proefschrift Annerika SlokProefschrift Annerika Slok
Proefschrift Annerika SlokAnnerika Slok
 
Fishbone Diagram Template Name            
Fishbone Diagram Template   Name                             Fishbone Diagram Template   Name             
Fishbone Diagram Template Name             ShainaBoling829
 
Non-animal models of NSCLC, Dr Dania Movia
Non-animal models of NSCLC, Dr Dania MoviaNon-animal models of NSCLC, Dr Dania Movia
Non-animal models of NSCLC, Dr Dania MoviaKen Rogan
 

Similar to Holly_Davies_Dissertation (20)

Debate on mucolytics
Debate on mucolyticsDebate on mucolytics
Debate on mucolytics
 
Intersticial disease
Intersticial diseaseIntersticial disease
Intersticial disease
 
MEDICINE DISEASE ARTICLE FOR STUDENT PEP
MEDICINE DISEASE ARTICLE FOR STUDENT PEPMEDICINE DISEASE ARTICLE FOR STUDENT PEP
MEDICINE DISEASE ARTICLE FOR STUDENT PEP
 
Bmj.i5813.full
Bmj.i5813.fullBmj.i5813.full
Bmj.i5813.full
 
Community Acquired Pneumonia
Community Acquired PneumoniaCommunity Acquired Pneumonia
Community Acquired Pneumonia
 
jnci.oxfordjournals.org JNCI Articles 1DOI 10.1093jn.docx
jnci.oxfordjournals.org   JNCI  Articles 1DOI 10.1093jn.docxjnci.oxfordjournals.org   JNCI  Articles 1DOI 10.1093jn.docx
jnci.oxfordjournals.org JNCI Articles 1DOI 10.1093jn.docx
 
Atypical Presentations of lung cancers.pdf
Atypical Presentations of lung cancers.pdfAtypical Presentations of lung cancers.pdf
Atypical Presentations of lung cancers.pdf
 
2016-Crawford-BMC Pulm Med published
2016-Crawford-BMC Pulm Med published2016-Crawford-BMC Pulm Med published
2016-Crawford-BMC Pulm Med published
 
A systematic review of the association between ptb and the development of chr...
A systematic review of the association between ptb and the development of chr...A systematic review of the association between ptb and the development of chr...
A systematic review of the association between ptb and the development of chr...
 
The Case for Lung Cancer Screening ASRT presentation
The Case for Lung Cancer Screening ASRT presentationThe Case for Lung Cancer Screening ASRT presentation
The Case for Lung Cancer Screening ASRT presentation
 
Asthma In General Practice
Asthma In General PracticeAsthma In General Practice
Asthma In General Practice
 
Maths final report
Maths final reportMaths final report
Maths final report
 
FNBE0115- MATH SATISTICS Final Report
FNBE0115- MATH SATISTICS Final ReportFNBE0115- MATH SATISTICS Final Report
FNBE0115- MATH SATISTICS Final Report
 
PM2012-808260
PM2012-808260PM2012-808260
PM2012-808260
 
Proefschrift Annerika Slok
Proefschrift Annerika SlokProefschrift Annerika Slok
Proefschrift Annerika Slok
 
TB recurrence in Abbottabad.pptx
TB recurrence in Abbottabad.pptxTB recurrence in Abbottabad.pptx
TB recurrence in Abbottabad.pptx
 
White Paper BC
White Paper BCWhite Paper BC
White Paper BC
 
Fishbone Diagram Template Name            
Fishbone Diagram Template   Name                             Fishbone Diagram Template   Name             
Fishbone Diagram Template Name            
 
Non-animal models of NSCLC, Dr Dania Movia
Non-animal models of NSCLC, Dr Dania MoviaNon-animal models of NSCLC, Dr Dania Movia
Non-animal models of NSCLC, Dr Dania Movia
 
Cell-based Therapy_COPD
 Cell-based Therapy_COPD  Cell-based Therapy_COPD
Cell-based Therapy_COPD
 

Holly_Davies_Dissertation

0.3 Abstract
The aim of this project was to analyse the bacterial DNA present in the sputum of lung cancer and COPD (Chronic Obstructive Pulmonary Disease) patients, to further research into developing a biomarker for these diseases in association with the MEDLUNG Project (Metabolic Biomarkers for the Detection of Lung cancer), a multicentre study on behalf of the National Health Service (NHS). The initial analysis was conducted on an Illumina metagenome contig assembly of data collected from 30 patients (10 healthy, 10 lung cancer, 10 COPD) using NCBI (National Center for Biotechnology Information) BLAST (Basic Local Alignment Search Tool) searches. This analysis identified Prevotella intermedia 17 within the contig assembly. Prevotella intermedia had previously been found orally in periodontal diseases (Maeda et al, 1998), periapical periodontitis (Jacinto et al, 2003) and noma, an acute gangrenous disease (Bolivar et al, 2012). It had also been associated with cystic fibrosis (Ulrich et al, 2010) and with an increased risk of pneumonia in mice (Nagaoka et al, 2014). Prevotella intermedia 17 specifically is a clinical strain of the species that had only been isolated from the periodontal pocket (Ruan et al, 2015). The subsequent analysis was conducted using CLC Genomics Workbench 8 (CLC bio, 2016). It involved performing a de novo assembly on the initial patient data from the MEDLUNG collection and mapping the result to the P. intermedia 17 reference genome. This confirmed that P. intermedia 17 is present in the lungs. It also showed that levels of P. intermedia 17 were 85-99% lower in the lung cancer and COPD groups than in the healthy control group. This study therefore reports the presence of Prevotella intermedia 17 in the lungs for the first time, and shows that P. intermedia 17 has a relationship with both lung cancer and COPD in humans.
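The 85-99% figure compares each disease group with the healthy controls. A minimal sketch of how such a percentage reduction could be computed follows. It rests on two assumptions that the abstract does not state. First, P. intermedia 17 levels are taken to be reads mapped to its reference genome per million sequenced reads, so that samples sequenced to different depths are comparable. Second, each group is summarised by its mean. The function names and read counts are hypothetical placeholders, not MEDLUNG data.

```python
from statistics import mean

def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Normalise a mapped-read count by sequencing depth (assumed metric)."""
    return mapped_reads * 1_000_000 / total_reads

def percent_reduction(control: float, disease: float) -> float:
    """Percentage by which the disease-group level falls below the control level."""
    return (control - disease) / control * 100

# Hypothetical (mapped_reads, total_reads) per sample: placeholders, NOT study data.
groups = {
    "healthy": [(1200, 2_000_000), (900, 1_500_000)],
    "lung_cancer": [(60, 2_000_000), (30, 1_000_000)],
    "copd": [(24, 2_000_000), (12, 1_000_000)],
}

# Summarise each group by the mean of its per-sample normalised levels (assumption).
group_rpm = {name: mean(reads_per_million(m, t) for m, t in samples)
             for name, samples in groups.items()}

for disease in ("lung_cancer", "copd"):
    drop = percent_reduction(group_rpm["healthy"], group_rpm[disease])
    print(f"{disease}: {drop:.1f}% lower than healthy controls")
```

With these placeholder counts the script prints a 95.0% reduction for the lung cancer group and a 98.0% reduction for the COPD group. Normalising by depth matters because raw mapped-read counts would partly reflect how deeply each sample happened to be sequenced.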
This could lead to the development of a new diagnostic test for lung cancer or COPD, or further the understanding of these diseases and how they manifest in the human lung. Developing a new diagnostic test and providing early screening for patients is vitally important for lung cancer and COPD. Such a test could save many lives by giving more patients access to curative treatment at an earlier stage, when it can still be effective.
1. Introduction
1.1 OUTLINE AND OBJECTIVES
The aim of this project was to analyse the bacterial DNA present in the sputum of lung cancer and COPD (Chronic Obstructive Pulmonary Disease) patients. The purpose was to further research into developing a biomarker (a biological molecule specific to these diseases) in association with the MEDLUNG Project (Metabolic Biomarkers for the Detection of Lung cancer), a multicentre study on behalf of the National Health Service (NHS). The initial analysis was conducted on an Illumina metagenome contig assembly of data collected from 30 patients (10 healthy, 10 lung cancer, 10 COPD) using NCBI (National Center for Biotechnology Information) BLAST (Basic Local Alignment Search Tool) searches. This analysis identified Prevotella intermedia 17 within the contig assembly. Prevotella intermedia had previously been found orally in periodontal diseases (Maeda et al, 1998), periapical periodontitis (Jacinto et al, 2003) and noma, an acute gangrenous disease (Bolivar et al, 2012). Outside of oral diseases, Prevotella intermedia had been associated with cystic fibrosis (Ulrich et al, 2010) and with an increased risk of pneumonia in mice (Nagaoka et al, 2014). Prevotella intermedia 17 specifically is a clinical strain of the species that had only been isolated from the periodontal pocket (Ruan et al, 2015). It had no known links to lung cancer or COPD, or even to the lungs in general. The raw sequencing data from individual patients were therefore mapped to the Prevotella intermedia 17 reference genome. This was done to confirm the strain's presence within the lungs and to determine whether it is linked to lung cancer and COPD. It was hoped that a link between Prevotella intermedia and these diseases would be established. Such a link would allow a new diagnostic test to be developed in further study, enabling earlier diagnosis and higher survival rates for lung cancer and COPD sufferers.
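Mapping raw reads to a reference genome was carried out in CLC Genomics Workbench 8. The sketch below is a conceptual illustration only and is not CLC's algorithm. It maps short reads to a reference by exact k-mer seeding, extends each seed hit while tolerating a few mismatches, and reports the fraction of reads that mapped. The toy sequences, k = 4 and the two-mismatch tolerance are all illustrative assumptions. Real mappers also handle reverse complements, insertions/deletions and base quality scores.

```python
from __future__ import annotations
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every start position of each k-mer in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int) -> int | None:
    """Return the first reference position the read aligns to, or None.

    The read's first k bases act as a seed; each exact seed hit is then
    checked over the full read length, tolerating a few mismatches.
    """
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) == len(read) and hamming(read, window) <= max_mismatches:
            return pos
    return None

def fraction_mapped(reads: list[str], reference: str, k: int = 4,
                    max_mismatches: int = 2) -> tuple[float, list[int | None]]:
    """Map every read and return (fraction mapped, per-read positions)."""
    if not reads:
        return 0.0, []
    index = build_kmer_index(reference, k)
    hits = [map_read(r, reference, index, k, max_mismatches) for r in reads]
    return sum(h is not None for h in hits) / len(reads), hits

# Hypothetical toy data, not the P. intermedia 17 genome or patient reads.
reference = "ATGCGTACGTTAGCCGATAGGCTTACGA"
reads = ["GCGTACGTTA", "CCGATAGGCA", "TTTTTTTTTT"]
frac, hits = fraction_mapped(reads, reference)
print(f"{frac:.2f} of reads mapped; positions {hits}")
```

Here the first read matches exactly at position 2 and the second maps at position 13 with one mismatch. The third, standing in for a read from an unrelated organism, finds no seed and stays unmapped, giving a mapped fraction of about 0.67.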
Developing a new diagnostic test and providing early screening for patients is vitally important for lung cancer and COPD. Two-thirds of lung cancer cases are diagnosed at an advanced stage, when curative treatment is no longer available (CancerResearchUK, 2015a), and COPD is regularly underdiagnosed and misdiagnosed (W.H.O., 2015a). An early diagnostic test could therefore save many lives by giving more people access to early treatment.
1.2 LUNG CANCER AND COPD
Lung cancer and COPD are two of the most prevalent respiratory tract disorders (CancerResearchUK, 2015a), and both have extremely high morbidity and mortality (Eddy, 1989; Mallia et al., 2007). Lung cancer is the most common cause of cancer death in the UK (CancerResearchUK, 2015b), and COPD causes 6% of deaths globally (W.H.O., 2015a). The two diseases are closely linked, as people at high risk of lung cancer are usually also at high risk of COPD (Raviv et al, 2011). A biomarker developed for one disease could therefore provide pointers towards a biomarker for the other.
1.2.1 LUNG CANCER
Lung cancer is the most common cause of cancer death in the UK, accounting for 22% of all deaths from cancer, and is the second most common cancer in the UK (CancerResearchUK, 2015c). Globally, 58% of lung cancer cases in 2012 occurred in less developed countries (Ferlay et al, 2014), and lung cancer accounted for 1.59 million deaths (W.H.O., 2014). In many cases the cause of the disease is clear: tobacco smoking accounts for more than 8 out of 10 cases. Other risk factors include exposure to carcinogens and radiation, air pollution, family history and poor immunity (CancerResearchUK, 2015c). Ageing is also involved in the development of lung cancer. The effects of risk factors accumulate over time (overall risk accumulation), while cellular repair mechanisms become less effective as a person grows older (W.H.O. 2015b). Nevertheless, the World Health Organisation states that “more than 30% of cancer deaths could be prevented by modifying or avoiding key risk factors” (W.H.O. 2015b). Many preventative measures currently aim to reduce the incidence of lung cancer. Their main focus is to decrease smoking levels in populations, but some measures also address the rarer risk factors.
Smoking cessation is the main method of preventing lung cancer. After 10 years of cessation, lung cancer mortality risk is 30-50% lower than in persistent smokers (Fiore et al, 1996). Cessation is supported by government campaigns such as the one shown in Image 1. To help people stop smoking, the Agency for Healthcare Research and Quality (formerly the Agency for Health Care Policy and Research [AHCPR]) developed a set of clinical smoking-cessation guidelines for the benefit of both the patient and the health care provider (Fiore et al, 1996). These include documenting the patient’s tobacco use and offering one or more effective smoking-cessation treatments (nicotine replacement, social support, skills training/problem solving etc.). Another method of prevention is to reduce occupational exposure to lung carcinogens such as chromium, arsenic, nickel and asbestos, which together account for 9-15% of all lung cancers (Alberg et al, 2007).
Image 1: Government campaign supporting smoking cessation (Parry, 2010)
There are two main classifications of lung cancer: non-small cell and small cell. Non-small cell lung cancer accounts for approximately 85% of lung cancers and occurs in three types: adenocarcinoma, squamous cell carcinoma and large cell carcinoma (CancerCare®, 2016). Small cell lung cancer accounts for the remaining 15% of cases and tends to grow more quickly than non-small cell tumours (CancerCare®). The most common symptoms associated with lung cancer are coughing, shortness of breath, fatigue and blood in the sputum (CancerResearchUK, 2015c). Other symptoms can include weight loss, recurrent infections such as bronchitis and pneumonia, and chest pain (American Cancer Society, 2016). Lung cancer can also produce hormone-like substances which enter the bloodstream and cause paraneoplastic syndromes in various tissues and organs. Examples include hypercalcemia (high blood calcium levels), blood clots, gynecomastia (excess breast growth in men) and various nervous system problems (American Cancer Society, 2016). Despite this, there is no national screening programme for lung cancer in the UK. Most cases are therefore discovered by X-ray, by which point the cancer is usually too advanced for curative treatment (CancerResearchUK, 2015c). There have been some attempts to introduce a screening programme in the UK, such as the UK Lung Cancer Screening Trial (UKLS). This trial aims to screen the people most at risk (e.g. those aged 50-75) using various clinical tests, the most promising being CT scanning, to help diagnose lung cancer earlier (UKLS, 2012). Some screening programmes exist in the US, but they are very selective about whom they screen and also rely on CT scanning for diagnosis (CDC, 2016). The problem with the current approach to lung cancer screening is that it requires CT scanning, which exposes the patient to radiation and may increase the risk of cancer (Brenner, 2003). Cancer Research UK state that a screening programme must be simple, quick, relatively inexpensive and not harmful (CancerResearchUK, 2015c). The current candidate programmes do not meet these criteria, as they can cause harm through radiation exposure or a possible allergic reaction to the dye used in the CT scan (NHS, 2016). A biomarker for lung cancer could save lives by enabling a new diagnostic test that detects lung cancer before it is visible on a CT scan, while also meeting the essential criteria for a screening programme.
1.2.2 COPD
Chronic Obstructive Pulmonary Disease (COPD) is a lung disease which interferes with normal breathing through a persistent blockage of airflow. It causes 25,000 deaths per year in the UK and caused more than 3 million deaths globally in 2012, approximately 6% of all deaths recorded (W.H.O., 2015a). This makes it the third most common cause of death in the world (Lozano et al, 2013). However, these numbers do not accurately represent how prevalent COPD is. An estimated 24 million people in the US have the disease without knowing it (American Lung Association, 2016), which argues for better diagnosis and screening and greater public awareness. As with lung cancer, the leading cause of COPD is cigarette smoking (NIH, 2013a). As many as 8 out of 10 COPD-related deaths are caused by smoking (US Department of Health and Human Services, 2014), and smoking as a whole was responsible for approximately 5.4 million deaths in 2005 (W.H.O. 2016). Other risk factors for COPD are mainly prevalent in low-income countries (W.H.O. 2016). The biggest risk factor in these countries is exposure to indoor air pollution, mainly from the use of biomass fuels for cooking and heating. Owing to the limited resources available, approximately 3 billion people rely on these methods (W.H.O. 2016). Other risk factors include exposure to certain types of dust and chemicals at work (e.g. coal and cadmium) and possibly urban air pollution, although the evidence for this is not conclusive (NHS, 2014). The preventative measures in place for COPD are the same as those for lung cancer, as the two diseases share the same risk factors.
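Spirometry, the diagnostic test for COPD discussed in this section, reduces to a threshold on the ratio of two measurements. These are the forced expiratory volume in one second (FEV1) and the forced vital capacity (FVC). A ratio below 0.70 in someone with COPD-like symptoms is the usual guideline cut-off (Qaseem et al, 2011). The sketch below is a simplification under stated assumptions, not a clinical tool:

- volumes are in litres;
- symptoms are reduced to a single yes/no flag;
- the class and function names, and the two example patients, are hypothetical.

In practice the measurement is taken with a bronchodilator. The fixed 0.70 cut-off can also over-diagnose COPD in elderly patients (Qaseem et al, 2011).

```python
from dataclasses import dataclass

# Usual guideline cut-off for the FEV1/FVC ratio (Qaseem et al, 2011).
FIXED_RATIO_THRESHOLD = 0.70

@dataclass
class SpirometryResult:
    fev1_litres: float           # greatest volume of air expelled in one second
    fvc_litres: float            # greatest volume of air expelled in one full breath
    has_copd_like_symptoms: bool  # simplified stand-in for clinical assessment

    @property
    def ratio(self) -> float:
        """FEV1/FVC ratio; FVC must be positive."""
        if self.fvc_litres <= 0:
            raise ValueError("FVC must be positive")
        return self.fev1_litres / self.fvc_litres

def suggests_obstruction(result: SpirometryResult,
                         threshold: float = FIXED_RATIO_THRESHOLD) -> bool:
    """Simplified rule: symptomatic patient whose FEV1/FVC falls below the threshold."""
    return result.has_copd_like_symptoms and result.ratio < threshold

# Hypothetical patients, for illustration only.
patient_a = SpirometryResult(fev1_litres=2.0, fvc_litres=3.5, has_copd_like_symptoms=True)
patient_b = SpirometryResult(fev1_litres=3.2, fvc_litres=4.0, has_copd_like_symptoms=True)
print(round(patient_a.ratio, 2), suggests_obstruction(patient_a))
print(round(patient_b.ratio, 2), suggests_obstruction(patient_b))
```

Patient A (ratio about 0.57) is flagged and patient B (ratio 0.80) is not. Gating on symptoms mirrors the point that spirometry in people without symptoms is not currently recommended.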
The poor airflow associated with COPD results from two contributing conditions: emphysema (the breakdown of lung tissue) and obstructive bronchiolitis (small airways disease) (Vestbo et al, 2013a). Together these create structural changes within the lungs, as seen in Image 2. The main symptoms associated with COPD include breathlessness, abnormal sputum and a chronic cough (W.H.O. 2015a). However, COPD may at first present no symptoms, or only mild ones, making early diagnosis difficult (NIH, 2013b).
Image 2: Structural changes in human lungs with COPD (Houghton, 2013)
The diagnostic test for COPD is spirometry. It is only considered for someone over the age of 35-40 who presents with relevant symptoms and has a history of exposure to the risk factors (Vestbo et al, 2013b). Spirometry is performed with a bronchodilator (a drug that opens the airways) and measures the degree of airflow obstruction present (Qaseem et al, 2011). To make a diagnosis, two measurements are taken. The first is the forced expiratory volume in one second (FEV1), the greatest volume of air expelled in one second. The second is the forced vital capacity (FVC), the greatest volume of air expelled in one full breath (Young & Vincent, 2010). From these, the FEV1/FVC ratio is calculated and compared against medical guidelines to determine whether the patient has the disease. Usually, a ratio lower than 70% in someone with COPD-like symptoms indicates COPD; however, this can lead to over-diagnosis of COPD in elderly patients (Qaseem et al, 2011). The problem with spirometry as a diagnostic tool is that using it on people who do not present symptoms of COPD has “evidence of uncertain effect, and therefore is currently not recommended” (Vestbo et al, 2013a). As a result, there is no early diagnostic method for COPD, and by the time the disease is diagnosed it is too advanced for curative treatment to be successful (W.H.O., 2015a). A diagnostic tool based on a biomarker would be highly beneficial to COPD sufferers, as it could detect the disease before symptoms have manifested, making treatment more successful. As with lung cancer, such a test could be introduced as a national screening programme to reduce the deaths caused by COPD. This matters because many people with COPD are not diagnosed correctly (American Lung Association, 2016).
1.3 PREVOTELLA INTERMEDIA 17
1.3.1 THE PREVOTELLA GENUS
The Prevotella genus is a group of anaerobic Gram-negative rod-shaped bacteria most commonly found in association with periodontal diseases (Maeda et al, 1998). It is classified among the ‘black-pigmented bacteria’ because it forms smooth, shiny, black/grey colonies when grown on a blood agar plate (Shah & Collins, 1990). These bacteria were originally classified as Bacteroides melaninogenicus, until they were reclassified and split into Prevotella melaninogenica and Prevotella intermedia (Brook, 2015). The Prevotella genus is very versatile. It has been found in various environments such as the oral cavity, upper respiratory tract, urogenital tract (Eiring et al, 1998), rumen and human faeces (Hayashi et al, 2007). Many species of Prevotella are potential or opportunistic pathogens (Yunfeng et al, 2015) under a wide range of environments and are known to invade host tissues (Nadkarni et al, 2012).
1.3.2 PREVOTELLA INTERMEDIA
Because it has been isolated from patients' lesions, Prevotella intermedia is considered a putative periodontal pathogen. This applies specifically in early periodontitis, advanced periodontitis and acute necrotizing ulcerative gingivitis (Haffajee & Socransky, 1994). It has also been shown to invade human coronary artery endothelial and smooth muscle cells in vitro (Dorn et al, 1999) and has been found in atheromatous plaques (Haraszthy et al, 1998). A significant finding in relation to this study is that “P. intermedia plays a critical role in the complex pathophysiology of lung disease in patients with cystic fibrosis” when present in anaerobic sputum plugs (Ulrich et al, 2010). The results of this study could show that this situation is not limited to cystic fibrosis patients, but also applies to people suffering from lung cancer and COPD.
1.3.3 PREVOTELLA INTERMEDIA 17
P. intermedia 17 is a strain of P. intermedia clinically isolated from a human periodontal pocket (Fukushima et al, 1992). It is distinguished from other strains of P. intermedia (for example 27 and ATCC 25611) by the diameter of the fimbriae (curlin protein appendages carrying adhesins) present on its cell surface (Leung et al, 1989). P. intermedia 17 presents type C (8 nm diameter) fimbriae, unlike other strains of this species (Dorn et al, 1998). Dorn et al (1998) studied a human oral epithelial cell line. They found that P. intermedia 17 is able to invade host cells, whereas strains 27 and ATCC 25611 cannot. P. intermedia 17 also has strong agglutinating activity for human erythrocytes and binds to human buccal epithelial cells more avidly than other strains. Dorn et al further speculate that “the type C fimbriae could promote invasion by providing a means for the bacteria to attach to the cell surface” (Dorn et al, 1998). Fan et al (2006) further state that P. intermedia 17 possesses a cell surface protein with broad-spectrum extracellular-matrix binding ability, which probably mediates its binding through adhesins. Given this ability, this strain may also be able to invade the cells of human lungs through the epithelial layer and the extracellular matrix of the alveoli. If this is shown to be true, it would be the first time this strain has been found in the lungs. It could also point to a pathological relationship with lung cancer/COPD.
1.4 LUNG MICROBIOME RESEARCH
Lung microbiome research is a relatively new field in which the bacterial contents of the human lung are analysed, mostly for the purpose of disease investigation.
Many factors can influence the environment in the lungs, such as oxygen, pH, hydrophobicity, temperature, salinity, predators and nutrient scarcity (Dickson et al, 2015), and disease can readily alter many of them. The lung microbiome is determined by three ecological factors: "microbial immigration into the airways, elimination of microbes from the airways and the relative reproduction rates of its community members, as determined by regional growth conditions" (Dickson et al, 2015). During disease these three factors change, which in turn changes the bacterial species present in the lungs. Examining these changes in bacterial species therefore gives insight into the effects a disease is having on the lungs, and may open the door to new diagnostic tests and treatments. Examples of successful lung microbiome projects include the identification of a core set of common bacteria in the lungs of COPD patients (Erb-Downward et al, 2011), the discovery that certain members of Staphylococcus and Streptococcus are linked to the progression of idiopathic pulmonary fibrosis (Han et al, 2014), and the discovery that P. intermedia plays a critical role in the pathophysiology of lung disease in patients with cystic fibrosis (Ulrich et al, 2010). Building on the techniques set out in these and other papers, this study analyses the lung microbiome to identify a bacterial species with a possible link to lung cancer or COPD.
1.5 PREVIOUS WORK
Thirty sputum samples were obtained for the MEDLUNG project (10 from healthy patients, 10 from patients with lung cancer and 10 from patients with COPD), along with the patients' medical histories (all data were collected and handled in compliance with ethical guidelines and confidentiality requirements). Genomic DNA was extracted from these samples and used to create barcoded Illumina sequencing libraries for each individual sample. These were subsequently paired-end sequenced on an Illumina HiSeq2000 platform by Simon Cameron as part of his PhD (Cameron, 2015). From these data a de novo contig assembly was produced by Tom Hitch (IBERS PhD student), which forms the starting point for this study.
2. Materials and Methods
2.1 AIMS AND OBJECTIVES
The aim of this project was to analyse the de novo contig assembly, provided by Tom Hitch (IBERS PhD student), of the DNA samples obtained by the MEDLUNG project. The goal was to find a bacterial species whose presence or absence is associated with lung cancer or COPD, as a candidate biomarker for future research. A successful biomarker for these diseases could lead to the development of a new diagnostic test for lung cancer or COPD. For example, a primer for the biomarker could be tagged with a fluorescent marker, so that if the biomarker is present it would be visible under ultraviolet light, indicating (depending on the nature of the biomarker) that the patient may have one of these diseases. This would support the global effort to reduce the burden of these diseases by enabling diagnosis before symptoms manifest, when curative treatment is more likely to be available.
2.2 INITIAL ANALYSIS
For this part of the analysis, the metagenome contig assembly produced by Tom Hitch (IBERS PhD student) was searched for bacterial species present within the sputum samples of the MEDLUNG collection. This was done using the CLC Genomics Workbench 8 software (CLC bio, 2016).
2.2.1 LARGEST CONTIG ASSEMBLY
The first stage of the initial analysis involved sorting the metagenome contig assembly by size, from the largest number of base pairs (bp) to the smallest. Because of time constraints on the project, only the 10 largest contigs were searched, as these represented the largest portion of the contig assembly that could be analysed within the available time. The 10 largest contigs were saved as a separate sequence list, and then as individual sequences to allow for analysis.
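The sorting and selection described above was carried out in CLC Genomics Workbench 8. As a minimal sketch of the same logic outside CLC, assuming the contig assembly has been exported as a FASTA file, the functions below parse the records and keep the 10 longest contigs by base-pair count. The function names and the FASTA export are assumptions for illustration, not part of the original workflow.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into (contig_id, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Keep only the first whitespace-delimited token as the contig ID.
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def largest_contigs(records: List[Tuple[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n longest contigs as (contig_id, length_bp), longest first."""
    lengths = [(contig_id, len(seq)) for contig_id, seq in records]
    lengths.sort(key=lambda item: item[1], reverse=True)
    return lengths[:n]


if __name__ == "__main__":
    # Toy records for illustration only (hypothetical IDs and sequences).
    example = [">NODE_1 len=5", "ACGTA", ">NODE_2", "ACG", "TACGT", ">NODE_3", "AC"]
    print(largest_contigs(parse_fasta(example), n=2))
```

With a real export, `parse_fasta(open("contigs.fa"))` (a hypothetical file name) would feed the same selection; ties in length keep their original order because Python's sort is stable.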
2.2.2 NCBI BLAST SEARCH
Each of these sequences was then searched using BLAST (Basic Local Alignment Search Tool) (Altschul et al, 1990) to identify its components by alignment against reference genomes. The NCBI (National Centre for Biotechnology Information) BLAST database was used because of its extensive collection of reference genomes and genes and its easy-to-use interface (NCBI, 2016). The sequences were run through the NCBI nucleotide BLAST function using the MegaBlast algorithm with default parameters (Morgulis et al, 2008), which compares a query sequence with reference sequences and is used for sequence identification and intra-species comparison (NCBI, 2015). Prevotella intermedia 17 hit 5 of the 10 largest contigs at relatively high query cover levels, which was unusual because this strain of Prevotella intermedia had not previously been found in the lungs. This line of research was therefore pursued for the remainder of the project, and the MegaBlast search results for these 5 contigs were saved for later reference.
2.2.3 ALIGNMENT
To display the relationship between the 5 contigs that hit P. intermedia 17 and the reference genome itself, CLC Genomics Workbench 8 was used to create a visual alignment of these contigs against the reference genome obtained from the NCBI genome database. The P. intermedia 17 reference genome was imported into CLC using standard import and then, via Toolbox > Molecular Biology Tools > Sequencing Data Analysis > Assemble Sequences to Reference, used as the reference onto which each of the 5 contigs was assembled, displaying the query cover found in the MegaBlast search. These alignments were then exported using the CLC graphics function for use in the results.
2.3 INDIVIDUAL SAMPLES
From the discovery of P.
intermedia 17 in the metagenome contig assembly, the next step was to use this reference genome to search the individual patient samples obtained by MEDLUNG, both to provide further evidence of the presence of this bacterium and to determine whether this strain is linked to lung cancer/COPD. This was also performed in CLC Genomics Workbench 8. The data consisted of 30 samples (labelled B, C or D according to the patient group they came from, and numbered 2-11). Each sample consisted of 4 files: 2 lanes, each with 2 reads (forward and reverse).
2.3.1 IMPORT AND SAMPLING
The individual patient data were first imported into CLC. As the data were in FASTQ format, they were imported using the Illumina import function, with the paired-read option selected to merge each pair of read files into one. This produced 60 files: 30 samples, each containing 2 lanes of data. Because of time and computing constraints, 500000 reads were then sampled from each file (a sample size calculation indicated that, on average, more than ~2000 reads were needed for a significant result). In CLC this was done via Toolbox > NGS Core Tools > Sample Reads, specifying 500000 reads.
2.3.2 DE NOVO ASSEMBLY
To merge the 2 lane files for each sample and to simplify further analysis, a de novo assembly was performed for each sample. In CLC, with both files of the sample selected: Toolbox > De Novo Sequencing > De Novo Assembly. Default parameters were used, except that mapping the reads back to the contigs was disabled because of time constraints.
2.3.3 READ MAPPING
Once the de novo assemblies had completed, they were mapped to the P. intermedia 17 reference genome to identify any reads from the bacterial genome. This used the Map Reads to Reference function with default parameters (CLC > Toolbox > NGS Core Tools > Map Reads to Reference), with the P. intermedia 17 genome selected as the reference. Once all mappings for a full sample set (e.g. B) were complete, a track list of all the mapping graphs was created, with the maximum graph coverage set to 3 across all sample sets to allow comparison. These track lists were then exported using the CLC graphics function.
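The read subsampling in Section 2.3.1 was done with CLC's Sample Reads tool, whose internal method is not described here. As a sketch of one standard way to draw a uniform random subsample of paired reads, the code below applies reservoir sampling (Algorithm R) to read pairs, so that each forward read stays with its reverse mate. The function names, the optional seed, and the representation of a pair as a tuple of read IDs are assumptions for illustration.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_read_pairs(
    forward: Iterable[str], reverse: Iterable[str], k: int, seed: Optional[int] = None
) -> List[Tuple[str, str]]:
    """Sample k read pairs, keeping each forward read together with its reverse mate."""
    return reservoir_sample(zip(forward, reverse), k, seed)
```

Reservoir sampling needs only one pass and memory proportional to k, which suits large FASTQ files; a call such as `sample_read_pairs(fwd_reads, rev_reads, k=500000, seed=1)` would mirror the 500000-read subsample, with a fixed seed making the draw reproducible.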
In addition, the number of mapped reads for each sample set, broken down by individual sample, was plotted as graphs for ease of interpretation. The track lists of mapping graphs were placed in the appendix, as they are summarised by these graphs.
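Summarising per-sample counts into the group totals used in the results is a simple tally. The sketch below assumes each sample ID is its group letter (B, C or D, as described in Section 2.3) followed by its number; because the letter-to-group assignment is not restated in this section, totals are keyed by letter. The input layout and function name are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Mapping

# Assumed sample ID format: group letter followed by the sample number, e.g. "B2" or "D11".
SAMPLE_ID = re.compile(r"^([BCD])(\d+)$")


def totals_by_group(mapped_reads: Mapping[str, int]) -> Dict[str, int]:
    """Sum mapped-read counts per sample group letter (B, C or D)."""
    totals: Dict[str, int] = defaultdict(int)
    for sample_id, count in mapped_reads.items():
        match = SAMPLE_ID.match(sample_id)
        if match is None:
            raise ValueError(f"Unexpected sample ID: {sample_id!r}")
        totals[match.group(1)] += count
    return dict(totals)
```

Failing loudly on an unrecognised ID, rather than skipping it, helps ensure that no sample is silently dropped from a group total.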
3. Results
3.1 NCBI BLAST RESULTS
The first set of results shows the NCBI MegaBlast search results from the initial analysis, indicating the presence of P. intermedia 17 in 5 of the 10 largest contigs (nodes). The full NCBI MegaBlast results are available in Appendix (1).

NODE ID      Description                       Max Score   Total Score   Query Cover   E value   Identity   Accession
NODE_54069   P. intermedia 17 chromosome II    7491        24979         90%           0.0       83%        CP003503.1
NODE_28947   P. intermedia 17 chromosome I     3517        10209         40%           0.0       81%        CP003502.1
NODE_13609   P. intermedia 17 chromosome II    6259        18001         77%           0.0       83%        CP003503.1
NODE_12098   P. intermedia 17 chromosome II    5068        18113         63%           0.0       81%        CP003503.1
NODE_18381   P. intermedia 17 chromosome II    2372        5668          33%           0.0       81%        CP003503.1

Table 1: NCBI MegaBlast results from inputting the 10 largest contigs (nodes) from the initial analysis. Only the 5 contigs that hit P. intermedia 17 are displayed, including the chromosome they hit, the max score, total score, query cover, E value, identity and accession ID.

All 5 nodes hit P. intermedia 17 at above 80% identity, with at least 33% query cover and an E value of 0.0. P. intermedia 17 sequence was therefore strongly represented within 5 of the 10 largest contigs of the initial analysis, justifying a search for this bacterium within the individual samples. To confirm the presence of this bacterium in the contigs, the P. intermedia 17 reference sequence of the corresponding chromosome identified in the MegaBlast results was aligned with each contig that returned a P. intermedia 17 hit.
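The screening criteria stated in Section 3.1 (identity above 80%, query cover of at least 33%, E value of 0.0) can be written as a simple filter over the rows of Table 1. This is a sketch only: the thresholds are taken from the text, the rows are transcribed from Table 1, and the record layout and function name are assumptions rather than an NCBI output format.

```python
from typing import List, NamedTuple


class MegaBlastHit(NamedTuple):
    node_id: str
    chromosome: str
    max_score: int
    total_score: int
    query_cover_pct: float
    e_value: float
    identity_pct: float
    accession: str


# Rows transcribed from Table 1.
TABLE_1 = [
    MegaBlastHit("NODE_54069", "II", 7491, 24979, 90.0, 0.0, 83.0, "CP003503.1"),
    MegaBlastHit("NODE_28947", "I", 3517, 10209, 40.0, 0.0, 81.0, "CP003502.1"),
    MegaBlastHit("NODE_13609", "II", 6259, 18001, 77.0, 0.0, 83.0, "CP003503.1"),
    MegaBlastHit("NODE_12098", "II", 5068, 18113, 63.0, 0.0, 81.0, "CP003503.1"),
    MegaBlastHit("NODE_18381", "II", 2372, 5668, 33.0, 0.0, 81.0, "CP003503.1"),
]


def passing_hits(
    hits: List[MegaBlastHit],
    min_identity: float = 80.0,
    min_query_cover: float = 33.0,
    max_e_value: float = 0.0,
) -> List[MegaBlastHit]:
    """Keep hits with identity above min_identity, query cover of at least
    min_query_cover and an E value no greater than max_e_value."""
    return [
        h
        for h in hits
        if h.identity_pct > min_identity
        and h.query_cover_pct >= min_query_cover
        and h.e_value <= max_e_value
    ]
```

With the default thresholds all 5 rows pass; raising `min_query_cover` to 50 would keep only NODE_54069, NODE_13609 and NODE_12098, which shows how sensitive the contig count is to the coverage cut-off chosen.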
3.2 NODE ALIGNMENT
Figure 1: The alignment of the 5 contigs with the P. intermedia 17 reference genome corresponding to the chromosomes which hit each individual node. The pink areas of the coverage graph display the areas which align with the node.
Figure 1 confirms that P. intermedia 17 sequence is present within the metagenome contig assembly, and therefore in the lungs of the individual patients. Some nodes share more sequence with the P. intermedia 17 genome than others: the closest match is NODE_54069, with 90% query cover and 83% identity to the bacterial genome, and the weakest is NODE_18381, with 33% query cover and 81% identity, as displayed in the visual alignment in Figure 1. These results strongly support the presence of P. intermedia 17 within the lungs.
3.3 MAPPING OF INDIVIDUAL SAMPLES
The next stage of the analysis involved mapping the individual patient data to the P. intermedia 17 genome, to further support the hypothesis that P. intermedia 17 is present within the lung and to identify any relationship between P. intermedia 17 and lung cancer/COPD. The full mapping graphs from this part of the analysis are available in Appendix (2), with individual patient mapping data in Appendix (3). The first measure examined was the total number of mapped reads in each of the three patient groups: Control, Lung Cancer and COPD.
Figure 2: The total number of mapped P. intermedia 17 reads across the three patient groups; Control, Lung Cancer, and COPD. The raw data values are displayed above the data bars.
[Figure 2 chart: Total Number of Mapped P. intermedia 17 Reads. x-axis: Patient Group; y-axis: Number of mapped reads. Control: 1622; Lung Cancer: 226; COPD: 6.]
Figure 2 shows that the highest number of mapped P. intermedia 17 reads was found in the control group (healthy patients), with the number decreasing markedly in lung cancer patients and further still in COPD patients. To check that this trend was not caused by differing numbers of reads/bases in the sample data, the average percentage of mapped reads across each patient group was also examined.
Figure 3: The average percentage of reads from the individual patient groups that mapped to P. intermedia 17. The actual percentage is displayed to the left of each marker.
[Figure 3 chart: Average % of Reads Mapped to P. intermedia 17. x-axis: Patient Group; y-axis: Average % of mapped reads. Control: 0.756; Lung Cancer: 0.289; COPD: 0.022.]
Figure 3 closely mirrors the trend shown in Figure 2: the amount of P. intermedia 17 in human sputum is highest in healthy people (control group), markedly lower in lung cancer patients, and lower still in COPD patients. The distribution of mapped reads between the two chromosomes of the P. intermedia 17 genome was also examined, to determine which chromosome is more prevalent among the mapped reads.
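The text does not state whether the group figures in Figure 3 are the mean of per-sample percentages or a pooled percentage (all mapped reads over all reads in the group), and the two can differ when samples contribute different numbers of reads. The sketch below computes both, assuming each sample is given as a hypothetical (mapped_reads, total_reads) pair.

```python
from statistics import mean
from typing import List, Tuple

# Each sample is represented as (mapped_reads, total_reads).
Sample = Tuple[int, int]


def mean_of_sample_percentages(samples: List[Sample]) -> float:
    """Average the per-sample percentages; every sample carries equal weight."""
    return mean(100.0 * mapped / total for mapped, total in samples)


def pooled_percentage(samples: List[Sample]) -> float:
    """Percentage of all reads in the group that mapped; larger samples weigh more."""
    mapped = sum(m for m, _ in samples)
    total = sum(t for _, t in samples)
    return 100.0 * mapped / total
```

For two toy samples (10 of 100 reads mapped, and 0 of 900), the first function gives 5% and the second 1%, so it matters which definition is reported when sample sizes vary.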
Figure 4: The distribution of mapped reads in the patient groups. Chromosome 1 is displayed by the solid black data bars and chromosome 2 is represented by the patterned data bars. The actual number of reads is displayed above the data bars. For COPD (due to the low numbers) the data bars cannot be seen: 2 reads map to chromosome 1 and 5 to chromosome 2.
[Figure 4 chart: Distribution of Mapped Reads in the Patient Groups. x-axis: Patient Group; y-axis: Number of mapped reads. Chromosome 1: Control 299, Lung Cancer 39, COPD 2. Chromosome 2: Control 1323, Lung Cancer 187, COPD 5.]
Figure 4 shows that chromosome 2 of P. intermedia 17 appears considerably more often in the individual patient data than chromosome 1, especially in the control group, where chromosome 2 has approximately 4.4 times as many mapped reads as chromosome 1 (1323 vs 299). However, this could be because chromosome 2 is approximately 3.7 times longer than chromosome 1 (2119790 bp vs 579647 bp). Chromosome 2 also matched 4 of the 5 contigs that hit P. intermedia 17, whereas chromosome 1 matched only 1, although this 4:1 split is itself close to the ratio of chromosome lengths and so may also partly reflect sequence length.
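One way to separate the effect of chromosome length from genuine over-representation is to normalise read counts by length, for example as reads per megabase. The sketch below uses the chromosome lengths and mapped-read counts reported in this study; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
from typing import Dict

# Chromosome lengths (bp) and mapped-read counts as reported in this study.
CHROM_LENGTH_BP: Dict[str, int] = {"chromosome 1": 579_647, "chromosome 2": 2_119_790}
MAPPED_READS: Dict[str, Dict[str, int]] = {
    "Control": {"chromosome 1": 299, "chromosome 2": 1323},
    "Lung Cancer": {"chromosome 1": 39, "chromosome 2": 187},
    "COPD": {"chromosome 1": 2, "chromosome 2": 5},
}


def reads_per_mb(reads: int, length_bp: int) -> float:
    """Mapped reads per megabase of reference sequence."""
    return reads / (length_bp / 1_000_000)


def density_ratio(counts: Dict[str, int]) -> float:
    """Ratio of chromosome 2 to chromosome 1 read density (reads per Mb)."""
    d1 = reads_per_mb(counts["chromosome 1"], CHROM_LENGTH_BP["chromosome 1"])
    d2 = reads_per_mb(counts["chromosome 2"], CHROM_LENGTH_BP["chromosome 2"])
    return d2 / d1


length_ratio = CHROM_LENGTH_BP["chromosome 2"] / CHROM_LENGTH_BP["chromosome 1"]
for group, counts in MAPPED_READS.items():
    raw_ratio = counts["chromosome 2"] / counts["chromosome 1"]
    print(f"{group}: raw ratio {raw_ratio:.2f}, length-normalised ratio {density_ratio(counts):.2f}")
```

A length-normalised ratio near 1 means reads fall on the two chromosomes roughly in proportion to their length. For the control group the ratio is about 1.2, so most of the raw 4.4-fold difference is accounted for by length; the COPD ratio rests on only 7 reads and is too noisy to interpret.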
4. Discussion
4.1 DISCUSSION OF RESULTS
4.1.1 NCBI MEGABLAST AND NODE ALIGNMENT
The NCBI MegaBlast search and the subsequent node alignment show that P. intermedia 17 is present within the human lung. This appears to be the first time that this strain of P. intermedia has been located in the human lung, opening the door to various further studies, such as of P. intermedia 17's molecular relationship with lung cancer/COPD and of whether the decrease in the bacterium is directly caused by the presence of the disease. P. intermedia has already been found in the human lung in relation to cystic fibrosis (Ulrich et al, 2010), so the species' presence there is not unprecedented; however, strain 17 has not previously been reported there.
4.1.2 INDIVIDUAL SAMPLE DATA
The individual sample data support several conclusions. Firstly, together with the NCBI MegaBlast and node alignment results, they further confirm the presence of P. intermedia 17 in the human lung. They also reveal a striking trend across the control, lung cancer and COPD groups. Many species of Prevotella are potential/opportunistic pathogens (Yunfeng et al, 2015) in a wide range of environments and are known to invade host tissues (Nadkarni et al, 2012). It would therefore be expected that, if P. intermedia 17 were linked to lung cancer/COPD, its level would be higher in lung cancer/COPD patients, with P. intermedia 17 acting as a pathogen linked to the diseases (more bacteria present = higher chance of disease developing). However, the individual sample data show the opposite trend. The highest level of P. intermedia 17 was found in the control group (healthy patients); the total number of mapped reads was approximately 86% lower in the lung cancer group (226 vs 1622) and approximately 99.6% lower in the COPD group (6 vs 1622), a further ~14 percentage points.
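Percentage decreases depend on the baseline they are measured against, which is easy to blur when comparing three groups. The short calculation below uses the total mapped-read counts from Figure 2 and expresses each group relative to the control group; the function name is an illustrative choice.

```python
from typing import Dict

# Total mapped P. intermedia 17 reads per patient group (Figure 2).
TOTAL_MAPPED: Dict[str, int] = {"Control": 1622, "Lung Cancer": 226, "COPD": 6}


def percent_decrease(value: float, baseline: float) -> float:
    """Percentage decrease of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline


for group in ("Lung Cancer", "COPD"):
    drop = percent_decrease(TOTAL_MAPPED[group], TOTAL_MAPPED["Control"])
    print(f"{group}: {drop:.1f}% below control")
```

Relative to the control group, the lung cancer group is about 86.1% lower and the COPD group about 99.6% lower. Measured against the lung cancer group instead, the COPD count (6 vs 226) is about 97.3% lower, so "a further decrease" should always state its baseline.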
To check that this was not due to variation in sample size between patient groups, the average percentage of mapped reads across each group's data was calculated, and this trend closely mirrored the trend in the total mapped reads graph. Together, these trends suggest that P. intermedia 17 is less abundant when lung cancer or COPD is present in the patient. This could be due to many factors, direct or indirect. The diseases could directly cause the death of P. intermedia 17 cells, for example through phagocytosis or toxin/hormone release. Alternatively, they could reduce P. intermedia 17 indirectly, for example by increasing the growth/presence of other bacterial species that compete with it, or by changing the environment in the lungs so that it becomes uninhabitable for the bacterium. This is a question for further study, and could lead to a wider understanding of the mechanisms of lung cancer and/or COPD in the human lung.
The distribution of mapped reads across the P. intermedia 17 genome was also examined to see whether lung cancer/COPD affected it. Chromosome 1 of P. intermedia 17 contains 579647 base pairs and chromosome 2 contains 2119790 base pairs, approximately 3.7 times as many. This is reflected in the distribution of mapped reads in the control group, with 299 reads mapped to chromosome 1 and 1323 to chromosome 2. A similar ratio holds in the lung cancer group (39 vs 187), while the COPD counts (2 vs 5) are too small to compare reliably; the diseases therefore do not appear to alter the relative representation of chromosomes 1 and 2 within the human lung.
4.2 LIMITATIONS & IMPLICATIONS
Limited time and insufficient computer memory and processing power were major limitations during the research, and led to the sampling of 500000 reads from the individual data files. For example, a de novo assembly of the original data took up to 18 hours per file, and some attempts aborted because of insufficient disk space and memory, so the analysis took much longer than expected given the size of the files. Analysing the full individual sample data in future would require a computer with a very large amount of memory and a powerful processor.
4.3 FURTHER STUDY
There are several routes that further research could follow from this analysis. One would be to calculate a minimum threshold of P. intermedia 17 presence within the lung for each of the two diseases, i.e.,
if a patient falls below this threshold, further investigation would be required or a diagnosis could be made. For this to work, a diagnostic test would have to be developed, for example by developing a biomarker for P. intermedia 17 tagged with a fluorescent marker. When mixed with a patient's sputum, the fluorescently tagged biomarker would bind to any P. intermedia 17 present and be visible under ultraviolet light: the less fluorescence visible, the higher the patient's likelihood of having lung cancer or COPD.
Another route would be to research how lung cancer/COPD interact with P. intermedia 17 to reduce its prevalence in affected lungs, testing the direct (e.g. phagocytosis or toxin/hormone release) and indirect (e.g. competing bacterial species or a lung environment made uninhabitable for the bacterium) mechanisms proposed in Section 4.1.2. Other P. intermedia strains were found in the NCBI MegaBlast results, so researching whether these also occur in the human lung is another possible route. It would also be worth investigating whether P. intermedia 17 is related to other diseases that predominantly affect the lungs, or even to other types of cancer, and whether it has any adverse effects upon the diseases themselves.
4.4 CONCLUSIONS
This study has not only found Prevotella intermedia 17 in the lungs for the first time, but has also found that P. intermedia 17 has a relationship with both lung cancer and chronic obstructive pulmonary disease in humans, being markedly less abundant in the sputum of affected patients. This could lead to the development of a new diagnostic test for lung cancer or COPD, or further the knowledge of how these diseases manifest in the human lung. Developing a new diagnostic test and providing early screening is vitally important for lung cancer and COPD, as it could save many lives by giving more people access to curative treatment at an early stage, when it can be effective.
  • 26. Page 26 of 47 5. References ALBERG AJ, FORD JG, SAMET JM. Epidemiology of lung cancer: ACCP evidence-based clinical practice guideline (2nd edition). Chest. 2007;132(29S-55S). ALTSCHUL DF, GISH W, MILLER W, MYERS EW, LIPMAN DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403-10. AMERICAN CANCER SOCIETY. 2016. Signs and Symptoms of Lung Cancer [Online]. American Cancer Society. Available: http://www.cancer.org/cancer/lungcancer-non- smallcell/moreinformation/lungcancerpreventionandearlydetection/lung-cancer-prevention- and-early-detection-signs-and-symptoms [Accessed 4th April 2016]. AMERICAN LUNG ASSOCIATION. 2016. How Serious is COPD [Online]. American Lung Association. Available: http://www.lung.org/lung-health-and-diseases/lung-disease- lookup/copd/learn-about-copd/how-serious-is-copd.html?referrer=https://www.google.co.uk/ [Accessed 4th April 2016]. BOLIVAR I, WHITESON K, STADELMANN B, BARATTI-MAYERD, GIZARD Y, MOMBELLI A. Bacterial diversity in oral samples of children in Niger with acute noma, acute necrotizing gingivitis, and healthy controls. PLoS Negl Trop Dis. 2012;6(3):e1556. BRENNER DJ. Radiation Risks Potentially Associated with Low-Dose CT Screening of Adult Smokers for Lung Cancer. RSNA Radiology. 2004;231(2):030-880. BROOK I. 2015. Bacteroides Infection: Background [Online]. Medscape. Available: http://emedicine.medscape.com/article/233339-overview [Accessed 6th April 2016]. CAMERON S. Charting Human Microbiome and Metabolome Changes in Disease and Stress. Aberystwyth University. 2015. PhD thesis. CANCERCARE®. 2016. Types and Staging of Lung Cancer [Online]. Lungcancer.org (A program of CancerCare®). Available: http://www.lungcancer.org/find_information/publications/163- lung_cancer_101/268-types_and_staging [Accessed 4th April 2016]. CANCERRESEARCHUK. 2015a. Lung Cancer Survival Statistics [Online]. CancerResearchUK. 
Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/survival/lung- cancer-survival-statistics [Accessed 23rd March 2015] CANCERRESEARCHUK. 2015b. Lung Cancer Mortality Statistics [Online]. CancerResearchUK. Available: http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/mortality/uk- lung-cancer-mortality-statistics [Accessed 23rd March 2015] CANCERRESEARCHUK. 2015c. General Factsheet for Lung Cancer [Online]. CancerResearchUK. Available: http://www.cancerresearchuk.org/prod_consump/groups/cr_common/@cah/@gen/documents/ generalcontent/cr_120625.pdf [Accessed 23rd March 2015] CENTRES FOR DISEASE CONTROL AND PREVENTION (CDC). 2016. Lung Cancer – Basic Information – What Screening tests are there? [Online]. Centres for Disease Control and
  • 27. Page 27 of 47 Prevention. Available: http://www.cdc.gov/cancer/lung/basic_info/screening.htm [Accessed 4th April 2016]. CLC BIO. 2016. CLC Genomics Workbench 8 [Software]. Qiagen. DICKSON RP, HUFFNAGLE GB. The Lung Microbiome: New Principles for Respiratory Bacteriology in Health and Disease. PLoS Pathog. 2015;11(7):e1004923. DORN BR, DUNN WA JR, PROGULSKE-FOX A. Invasion of human coronary cells by periodontal pathogens. Infect Immun. 1999:67(11);5792-8. DORN BR, LEUNG KP, PROGULSKE-FOX A. Invasion of Human Oral Epithelial Cells by Prevotella intermedia. Infect Immun. 1998;66(12):6054-6057. EDDY, D. Screening for lung cancer. Annals of internal medicine. 1989;111:232-237. EIRING P, WALLER K, WIDMANN A, WERNER H. Fibronectin and laminin binding of urogenital and oral Prevotella species. Zentralbl Bakteriol. 1998;288(3):361-72. FAN Y, DIVYA I, CECILIA A, JANINA P, LEWIS DR. Identification and characterisation of a cell surface protein of Prevotella intermedia 17 with broad-spectrum binding activity for extracellular matrix proteins. Proteomics. 2006;6(22):6023-32. FERLAY J, SOERJOMATARAM I, ERVIK M, DIKSHIT R, ESER S, MATHERS C, REBELO M, PARKIN DM, FORMAN D, BRAY F. 2014. Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 11. Globocan 2012 v1.1. 2014 FIORE MC, BAILEY WC, COHEN SJ. Smoking Cessation: Clinical Practice Guideline No 18. US Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research. AHCPR Publ. 1996;96:0692. FUKUSHIMA H, MOROI H, INOUE J, ONOE T, EZAKI T, YABUUCHI E, LEUNG KP, WALKER CB, CLARK WB, SAGAWA H. Phenotypic characteristics and DNA relatedness in Prevotella intermedia and similar organisms. Oral Microbiol Immunol. 1992;7(1):60-4. HAFFAJEE AD, SOCRANSKY SS. Review: Microbial etiological agents of destructive periodontal diseases. Periodontol 2000. 1994;5:78-111. HAN MK, ZHOU Y, MURRAY S, TAYOB N, NOTH I, LAMA VN, MOORE BB, WHITE ES, FLAHERTY KR, HUFFNAGLE GB, MARTINEZ FJ. 
Lung microbiome and disease progression in idiopathic pulmonary fibrosis: an analysis of the COMET study. The Lancet Respiratory Medicine. 2014;2(7):548-556. HARASZTHY VI, ZAMBOM JJ, TREVISAN M, SHAH R, ZEID M, GENCO RJ. Identification of pathogens in atheromatous plaques. J Dent Res. 1998;77:666. HAYASHI H, SHIBATA K, SAKAMOTO M, TOMITA S, BENNO Y. Prevotella copri sp. nov. and Prevotella stercorea sp. nov., isolated from human faeces. Int J Syst Evol Microbiol. 2007;57(Pt 5):941-6. HOUGHTON AM. Mechanistic links between COPD and lung cancer. Nature Reviews Cancer. 2013;13:233-245.
  • 28. Page 28 of 47 JACINTO RC, GOMES BP, FERRAZ CC, ZAIA AA, FILHO FJ. Microbiological analysis of infected root canals from symptomatic and asymptomatic teeth with periapical periodontitis and the antimicrobial susceptibility of some isolated anaerobic bacteria. Oral Microbiol Immunol. 2003;18(5):285-92. ERB-DOWNWARD JR, THOMPSON DL, HAN MK, FREEMAN CM, MCCLOSKY L, SCHMIDT LA, YOUNG VB, TOEWS GB, CURTIS JL, SUNDARAM B, MARTINEZ FJ, HUFFNAGLE GB. Analysis of the Lung Microbiome in the ‘Healthy’ Smoker and in COPD. PLoS ONE. 2011;6(2):e16384. LEUNG KP, FUKUSHIMA H, SAGAWA H, WALKER CB, CLARK WB. Surface appendages, hemagglutination, and adherence to human epithelial cells of Bacteroides intermedius. Oral Microbiol Immunol. 1989;4(4):204-10. LOZANO R, NAGHAVI M, FOREMAN K. Global and regional mortality from 235 causes of death age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2013;380:2095-128. MAEDA N, OKAMOTO M, KONDO K, ISHIKAWA H, OSADA R, TSURUMOTO A. Incidence of Prevotella intermedia and Prevotella nigrescens in periodontal health and disease. Microbiol Immunol. 1998;42(9):583-9. MALLIA P, CONTOLI M, CARAMORI G, PANDIT A, JOHNSTON S, PAPI A. Exacerbations of asthma and chronic obstructive pulmonary disease (COPD): focus on virus induced exacerbations. Current pharmaceutical design. 2003;13:73-97. MORGULIS A, COLOURIS G, RAYTSELIS Y, MADDEN TL, AGARWALA R, SHAFFER AA. Database indexing for production MegaBLAST searches. Bioinformatics. 2008;24(16):1757- 64. NADKANI MA, BROWNE GV, CHHOUR K, BYUN R, NGUYEN K, CHAPPLE CC. Pattern of distribution of Prevotella species/phylotypes associated with healthy gingiva and periodontal disease. Eur J Clin Microbiol Infect Dis. 2012;31(11):2989-99. NAGAOKA K, YANAGIHARA K, MORINAGA Y, NAKAMURA S, HARADA T. Prevotella intermedia Induces Severe Bacteremic Pneumococcal Pneumonia in Mice with Upregulated Platelet-Activating Factor Receptor Expression. 
Infection and Immunity. 2014;82(2):587-593. NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2016. BLAST® [Online]. National Centre for Biotechnology Information, U.S. National Library of Medicine. Available: http://blast.ncbi.nlm.nih.gov/Blast.cgi [Accessed 9th April 2016]. NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION (NCBI). 2015. BLAST Homepage and Selected Search Pages: Introducing the BLAST homepage and form elements/functions of selected search pages [Online]. National Centre For Biotechnology Information. Available: ftp://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_BLASTGuide.pdf [Accessed 9th April 2016]. NATIONAL HEALTH SERVICE (NHS). 2014. Chronic obstructive pulmonary disease – Causes of COPD [Online]. NHS Choices. Available: http://www.nhs.uk/Conditions/Chronic-obstructive- pulmonary-disease/Pages/Causes.aspx [Accessed 5th April 2016].
  • 29. Page 29 of 47 NATIONAL HEALTH SERVICE (NHS). 2016. CT Scan – Introduction [Online]. NHS Choices. Available: http://www.nhs.uk/conditions/ct-scan/Pages/Introduction.aspx [Accessed 4th April 2016]. NATIONAL INSTITUTES OF HEALTH (NIH). 2013a. What is COPD? [Online]. National Heart, Lung, and Blood Institute. Available: http://www.nhlbi.nih.gov/health/health- topics/topics/copd/ [Accessed 5th April 2016] NATIONAL INSTITUTES OF HEALTH (NIH). 2013b. What are the signs and symptoms of COPD? [Online]. National Heart, Lung, and Blood Institute. Available: https://www.nhlbi.nih.gov/health/health-topics/topics/copd/signs [Accessed 5th April 2016]. QASEEM A, WILT TJ, WEINBERGER SE, HANANIA NA, CRINER G, VAN BER MOLEN T, MARCINIUK DD. Diagnosis and Management of Stable Chronic Obstructive Pulmonary Disease: A Clinical Practice Guideline Update from the American College of Physicians, American College of Chest Physicians, American Thoracic Society and European Respiratory Society. Annals of Internal Medicine. 2011;155(3):179-91 PARRY. 2010. Use and abuse of drugs – the link between smoking and lung cancer [Image][Online] Available: http://www.corescience.co.uk/index.php?option=com_content&view=article&id=58%3Ause- and-abuse-of-drugs&catid=43%3Adrugs&Itemid=41&limitstart=3 [Accessed 13th April 2016] RAVIV S, HAWKINS K, DECAMP M, KALHAN R. Lung cancer in chronic obstructive pulmonary disease: enhancing surgical options and outcomes. American journal of respiratory and critical care medicine. 2011;176:532-555. RUAN Y, SHEN L, ZOU Y, QI Z, YIN J, JIANG J, GUO L, HE L, CHEN Z, TANG Z, QIN S. Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal reveals features related to pathogenicity and adaptation. BMC Genomics. 2015;16(1):1. SHAH HN, COLLINS DM. NOTES: Prevotella, a new genus to include bacteroides melaninogenicus and related species formerly classified in the genus bacteroides. Int J Systematic. 1990;40(2):205-8. 
UK LUNG CANCER SCREENING TRIAL (UKLS). 2012. Background to UKLS [Online]. UKLS. Available: https://www.ukls.org/index.html [Accessed 4th April 2016]. ULRICH M, BEER I, BRAITMAIER P, DIERKES M, KUMMER F, KRISMER B. Relative contribution of Prevotella intermedia and Pseudomonas aeruginosa to lung pathology in airways of patients with cystic fibrosis. Thorax. 2010;65(11):978-84. US. DEPARTMENT OF HEALTH AND HUMAN SERVICES. 2014. The Health Consequences of Smoking – 50 years of progress: A report of the surgeon general [Online]. Centres for Disease Control and Prevention. Available: http://www.cdc.gov/tobacco/data_statistics/sgr/50th- anniversary/index.htm [Accessed 5th April 2016]. VESTBO, JORGEN. Definition and Overview: Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic Obstructive Lung Disease. 2013:pp(1-7).
VESTBO, JORGEN. Diagnosis and Assessment: Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease. Global Initiative for Chronic Obstructive Lung Disease. 2013. pp. 9-17.
WORLD HEALTH ORGANISATION (W.H.O.). 2016. Chronic respiratory diseases – Causes of COPD [Online]. World Health Organisation. Available: http://www.who.int/respiratory/copd/causes/en/ [Accessed 5th April 2016].
WORLD HEALTH ORGANISATION (W.H.O.). 2015a. Chronic Obstructive Pulmonary Disease (COPD) Factsheet [Online]. World Health Organisation. Available: http://www.who.int/mediacentre/factsheets/fs315/en/ [Accessed 23rd March 2015].
WORLD HEALTH ORGANISATION (W.H.O.). 2015b. Cancer Factsheet [Online]. World Health Organisation. Available: http://www.who.int/mediacentre/factsheets/fs297/en/ [Accessed 4th April 2016].
WORLD HEALTH ORGANISATION (W.H.O.). 2014. World Cancer Report 2014 [Online]. World Health Organisation. Available: http://apps.who.int/bookorders/anglais/detart1.jsp?codlan=1&codcol=76&codcch=31# [Accessed 4th April 2016].
YOUNG, VINCENT B. 2010. Blueprints Medicine (5th Ed.). Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins. p. 69. ISBN: 978-0-7817-8870-0.
YUNFENG R, LU S, YAN Z, ZHENGNAN Q, JUN Y, JIE J, LIANG G, LIN H, ZIJIANG C, ZISHENG T, SHENGYING Q. Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal reveals features related to pathogenicity and adaptation. BMC Genomics. 2015;16:122.
6. Word Count
The final word count for this study, excluding the list of references, acknowledgements, tables, table of contents and figure/image legends, is 6,692.
7. List of Figures/Tables/Images
Table 1: NCBI MegaBlast Search Results for the 5 largest contigs in relation to P. intermedia 17
Figure 1: Node alignment of the 5 contigs with the P. intermedia 17 reference genome
Figure 2: Bar chart displaying the total number of mapped reads found in each of the patient groups
Figure 3: Line chart displaying the average percentage of mapped reads from the total genomic data in the patient groups
Figure 4: Bar chart displaying the distribution of mapped reads across the two chromosomes of the P. intermedia 17 genome for each of the patient groups
Image 1: Government campaign supporting smoking cessation
Image 2: Structural changes in human lungs with COPD
8. Appendix
APPENDIX 1 – NCBI MEGABLAST RESULTS (FULL)

NODE_54069
| Description | Max Score | Total Score | Query Cover | E value | Identity | Accession |
|---|---|---|---|---|---|---|
| Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome 1 | 7413 | 24943 | 91% | 0.0 | 83% | AP014597.1 |
| Prevotella intermedia DNA. Chromosome 2. Complete genome. Strain: 17-2 | 7491 | 24979 | 90% | 0.0 | 83% | AP014925.1 |
| Prevotella intermedia 17 chromosome II. Complete sequence. | 7491 | 24979 | 90% | 0.0 | 83% | CP003503.1 |

NODE_28947
| Description | Max Score | Total Score | Query Cover | E value | Identity | Accession |
|---|---|---|---|---|---|---|
| Prevotella intermedia DNA, complete genome. Strain: OMA14. Chromosome II | 4071 | 9268 | 32% | 0.0 | 83% | AP014598.1 |
| Prevotella intermedia DNA, chromosome 1. Complete genome. Strain: 17-2 | 3517 | 10209 | 40% | 0.0 | 81% | AP014926.1 |
| Prevotella intermedia 17 chromosome I. Complete sequence | 3517 | 10209 | 40% | 0.0 | 81% | CP003502.1 |
| Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I. | 122 | 122 | 0% | 3e-22 | 94% | AP014597.1 |
| Prevotella intermedia DNA, chromosome 2. Complete genome. Strain 17-2 | 121 | 121 | 0% | 1e-21 | 94% | AP014925.1 |

NODE_13609
| Description | Max Score | Total Score | Query Cover | E value | Identity | Accession |
|---|---|---|---|---|---|---|
| Prevotella intermedia 17 chromosome II. Complete sequence | 6259 | 18001 | 77% | 0.0 | 83% | CP003503.1 |
| Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 6255 | 17997 | 77% | 0.0 | 83% | AP014925.1 |
| Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 6325 | 15926 | 66% | 0.0 | 83% | AP014597.1 |

NODE_12098
| Description | Max Score | Total Score | Query Cover | E value | Identity | Accession |
|---|---|---|---|---|---|---|
| Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 5265 | 16356 | 54% | 0.0 | 82% | AP014597.1 |
| Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 5068 | 18108 | 63% | 0.0 | 81% | AP014952.1 |
| Prevotella intermedia 17 chromosome II. Complete sequence | 5068 | 18113 | 63% | 0.0 | 81% | CP003503.1 |

NODE_18381
| Description | Max Score | Total Score | Query Cover | E value | Identity | Accession |
|---|---|---|---|---|---|---|
| Prevotella intermedia DNA, chromosome 2. Complete genome. Strain: 17-2 | 2372 | 5663 | 33% | 0.0 | 81% | AP014925.1 |
| Prevotella intermedia 17 chromosome II. Complete sequence | 2372 | 5668 | 33% | 0.0 | 81% | CP003503.1 |
| Prevotella intermedia DNA. Complete genome. Strain: OMA14. Chromosome I | 2287 | 3579 | 20% | 0.0 | 80% | AP014597.1 |
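The MegaBlast tables report NCBI's per-subject summary columns. Max Score is the bit score of the single best high-scoring segment pair (HSP) against a subject sequence, Total Score is the sum of the bit scores of all HSPs against that subject, and Query Cover is the percentage of the query contig covered by those HSPs. This is why, for instance, the NODE_54069 hits have Total Scores roughly three times their Max Scores: several separate regions of the contig align to the same chromosome. The Python sketch below computes these three values from a list of HSPs. It is illustrative only: the HSP coordinates are hypothetical, and rounding Query Cover to a whole percent is an assumption rather than NCBI's documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One high-scoring segment pair between a query contig and a subject sequence."""
    q_start: int      # 1-based start of the aligned region on the query
    q_end: int        # 1-based, inclusive end of the aligned region on the query
    bit_score: float  # bit score of this HSP


def summarise_subject(hsps: list[HSP], query_length: int) -> tuple[float, float, int]:
    """Return (Max Score, Total Score, Query Cover %) for one subject sequence.

    Assumes at least one HSP. Overlapping HSPs are merged on the query so that
    shared bases are counted once; the cover is rounded to a whole percent
    (an assumption made for this sketch).
    """
    max_score = max(h.bit_score for h in hsps)
    total_score = sum(h.bit_score for h in hsps)

    # Sort the query intervals and merge any that overlap or touch.
    intervals = sorted((min(h.q_start, h.q_end), max(h.q_start, h.q_end)) for h in hsps)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:       # overlapping or adjacent: extend the block
            cur_end = max(cur_end, end)
        else:                          # gap: close the current block, start a new one
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1

    query_cover = round(100 * covered / query_length)
    return max_score, total_score, query_cover


if __name__ == "__main__":
    # Hypothetical HSPs for a 10,000 bp contig against one subject chromosome.
    hsps = [HSP(1, 4000, 7000.0), HSP(3500, 6000, 3000.0), HSP(8000, 9000, 1500.0)]
    print(summarise_subject(hsps, 10_000))  # (7000.0, 11500.0, 70)
```

Merging the intervals before summing their lengths is the key design choice: adding raw HSP lengths would double-count the overlap between the first two HSPs and overstate the cover.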
APPENDIX 2 – MAPPING GRAPHS OF INDIVIDUAL SAMPLE DATA
Blue areas represent regions matching the P. intermedia 17 reference genome.
SAMPLE B – CHROMOSOME 1
N.B. B11 is omitted because no reads mapped to either chromosome.
SAMPLE B – CHROMOSOME 2
N.B. B11 is omitted because no reads mapped to either chromosome.
SAMPLE C – CHROMOSOME 1
N.B. C2, C3 and C8 are omitted because no reads mapped to either chromosome.
SAMPLE C – CHROMOSOME 2
N.B. C2, C3 and C8 are omitted because no reads mapped to either chromosome.
SAMPLE D – CHROMOSOME 1
N.B. D2, D4, D8, D10 and D11 are omitted because no reads mapped to either chromosome.
SAMPLE D – CHROMOSOME 2
N.B. D2, D4, D8, D10 and D11 are omitted because no reads mapped to either chromosome.
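Each mapping graph in this appendix shows where reads align along one P. intermedia 17 chromosome, and a sample is left out when no reads map to either chromosome. As an illustration of how such a coverage track and that omission rule can be computed, the Python sketch below derives per-base read depth from alignment intervals using a difference array. The function names, the 0-based half-open interval convention and the toy read positions are hypothetical and are not taken from the mapping software used in this study.

```python
def coverage_depth(reads: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Per-base read depth along a reference of length genome_length.

    Each read is a 0-based, half-open (start, end) alignment interval
    (a convention assumed for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def covered_fraction(depth: list[int]) -> float:
    """Fraction of reference bases covered by at least one read."""
    return sum(1 for d in depth if d > 0) / len(depth)


def is_omitted(chrom1_reads: list[tuple[int, int]], chrom2_reads: list[tuple[int, int]]) -> bool:
    """Omit a sample when no reads map to either chromosome, as in the notes above."""
    return not chrom1_reads and not chrom2_reads


if __name__ == "__main__":
    # Hypothetical reads on a 10 bp toy reference.
    reads = [(0, 3), (2, 5), (8, 10)]
    depth = coverage_depth(reads, 10)
    print(depth)                    # [1, 1, 2, 1, 1, 0, 0, 0, 1, 1]
    print(covered_fraction(depth))  # 0.7
    print(is_omitted([], []))       # True
```

The difference array touches each read only twice, so the cost grows with the number of reads plus the genome length rather than with the total number of aligned bases.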
APPENDIX 3 – INDIVIDUAL PATIENT MAPPING DATA
SAMPLE B: B2, B3, B4, B5, B6, B7, B8, B9, B10
SAMPLE C: C4, C5, C6, C7, C9, C10, C11
SAMPLE D: D3, D5, D6, D7, D9