1. Identification of Viral Biomarkers for Healthy Water
Mitchell Webb1*, Miguel Uyaguari-Diaz1, Matthew Croxen1,2, Natalie Prystajecky1,2, Judy Isaac-Renton1,2 and Patrick Tang1,2.
1- University of British Columbia, 2- BCCDC Public Health Microbiology & Reference Laboratory
Abstract
Using a case-control study design, viral markers were investigated for their use in
predicting water health and contamination source. Samples were taken from three
watersheds: agricultural, urban, and reference. Within each watershed, sub-samples
were taken relative to an identified contamination source: upstream, at the site of
contamination, and downstream. Each sample was filtered to isolate viral particles,
and viral genetic material (DNA and RNA) was shotgun sequenced on the MiSeq
benchtop sequencer. Reads were quality filtered and matched to a database to
identify the viruses from which they came. Samples were then compared with one
another to identify significant differences in viral communities.
Acknowledgements
This work is funded by Genome Canada, Genome British Columbia, Simon Fraser University, and the Public Health Agency of
Canada. It is carried out with co-investigators at the University of British Columbia, Simon Fraser University, the University of
Saskatchewan, McGill University, and Boreal Genomics. The authors thank the staff at Environmental Microbiology Water, and
Molecular Services laboratories (BCCDC Public Health Microbiology & Reference Laboratory). We also thank Joe
Pennimpede (Capital Regional District), James Hibbert (University of South Carolina), and Jan Finke (University of British Columbia)
for sample collection, GIS assistance, and flow cytometry analysis, respectively.
CONTACT INFORMATION
Mitchell.Webb@alumni.ubc.ca
Patrick.Tang@bccdc.ca
http://www.watersheddiscovery.ca/
[Figure panels: Reference Watershed, Urban Watershed, Rural Watershed]
Future Research
The field of metagenomics is developing rapidly. With the continued sequencing of
genomes and collaboration among researchers, reference databases are maturing and their utility in
identifying micro-organisms continues to increase. However, matching every read to a specific
organism may prove too difficult to be of practical use. In the context of water sample analysis,
it may instead be more practical to group reads into operational taxonomic units (OTUs). This
technique would retain reads that are currently lost during the database match step of this workflow.
Additionally, further sampling and characterization of the microbial fingerprint of healthy
water samples and contamination sources will offer researchers better insight into predictive
patterns of water health. This research focused only on viral markers. However, bacteria
and eukaryotes also have the potential to offer valuable insight into water health and
potential contamination source.
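As a rough illustration of what grouping reads into OTUs could look like, the sketch below uses greedy clustering: each read joins the first cluster whose seed sequence it resembles closely enough, and otherwise starts a new cluster. The function names, the ungapped identity measure, and the 97% default threshold are assumptions made for the example, not part of this study's workflow.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence (ungapped)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def cluster_otus(reads, threshold=0.97):
    """Greedy clustering of reads into OTUs.

    Reads are visited longest first. Each read joins the first OTU whose
    seed (centroid) it matches at or above `threshold`; otherwise it
    becomes the seed of a new OTU.
    """
    otus = []  # list of (centroid, members) pairs
    for read in sorted(reads, key=len, reverse=True):
        for centroid, members in otus:
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            # No existing OTU was close enough: seed a new one.
            otus.append((read, [read]))
    return otus
```

Because no reference database is consulted, reads from viruses that have never been sequenced still end up in a cluster and can be counted.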
This work aims to integrate metagenomic profiles with physical, chemical and biological
indicator data to identify A) novel markers of watershed health and B) novel microbial
pollution profiles, suggestive of pollution source.
Results
[Figure: bar chart by watershed (Rural, Urban1, Urban2, Reference); vertical axis in thousands, 0 to 14; series: Upstream, Polluted, Downstream]
Genus  Rural Up  Rural Pol  Rural Dwn  Urban Pol 1  Urban Pol 2  Urban Dwn 1  Urban Dwn 2  Ref Up  Ref Dwn
Betacoronavirus 560 364 506 351 369 410 616 77 681
Alphacoronavirus 467 275 402 389 288 290 448 163 508
Gammacoronavirus 282 227 277 102 213 184 281 25 376
Siphoviridae 116 94 77 449 152 356 40 607 40
Viruses_unclassified 197 211 223 134 269 185 158 231 114
T4-like viruses 210 141 176 151 197 166 272 71 324
Potyvirus 185 140 160 162 144 162 181 44 119
Flavivirus 97 105 177 166 109 144 84 192 67
Endornavirus 136 98 95 46 89 77 130 85 92
Sobemovirus 7 641 133 0 0 0 0 2 0
Coronavirinae 103 104 111 40 71 65 89 9 136
Podoviridae 48 41 50 20 140 41 24 211 15
Torovirus 69 29 57 21 49 32 89 70 99
Viruses 41 46 35 49 66 33 90 16 99
Bafinivirus 53 30 50 35 55 47 50 45 45
Varicellovirus 29 32 30 25 105 24 22 111 7
Simplexvirus 30 36 31 10 119 13 26 97 9
Coronavirinae_unclassified 62 39 60 16 28 27 42 2 42
Betabaculovirus 1 1 2 205 4 39 3 58 4
Closterovirus 16 13 9 26 16 39 6 163 6
Tospovirus 27 14 46 19 30 61 38 3 46
Okavirus 33 36 21 16 31 26 44 13 44
Cyprinivirus 11 22 18 19 55 13 16 54 14
Pestivirus 23 20 17 22 28 26 29 8 19
N4-like viruses 12 18 17 21 14 11 21 50 11
Iflavirus 14 21 5 19 14 33 18 10 23
Nairovirus 15 11 18 17 13 16 22 8 30
Dianthovirus 4 108 29 0 0 0 0 4 0
Caudovirales_unclassified 11 16 20 5 31 11 6 27 3
Carlavirus 11 5 5 28 4 32 5 31 1
Tobamovirus 1 2 5 9 1 90 0 4 1
Alphavirus 17 5 4 16 15 10 5 38 2
Potexvirus 2 20 65 3 5 8 4 4 1
Tombusvirus 0 5 2 4 0 98 0 0 0
Ipomovirus 7 6 15 26 15 4 13 5 17
Ascovirus 4 0 1 87 6 1 2 3 0
Arterivirus 15 12 5 25 11 9 7 11 1
Caudovirales 4 5 7 8 26 1 3 34 7
Hepacivirus 10 17 8 19 10 8 4 17 0
Inovirus 0 0 0 2 1 1 0 88 1
Myoviridae 7 8 8 17 15 8 8 9 7
Tymovirus 6 4 2 14 13 18 14 11 5
Betaherpesvirinae 4 9 9 10 11 3 7 33 0
Hypovirus 10 3 4 7 11 15 5 21 9
Lymphocryptovirus 5 8 8 10 11 16 6 16 4
Alphaherpesvirinae_unclassified 4 7 16 0 24 6 5 19 0
Arenavirus 11 8 10 10 12 9 14 3 3
Cytomegalovirus 2 2 2 6 9 1 7 50 0
Rhadinovirus 6 10 8 7 12 6 12 11 6
Nepovirus 10 2 10 14 4 25 7 0 1
Herpesviridae_unclassified 5 12 18 1 16 1 4 14 0
Marafivirus 6 7 5 4 8 26 6 7 2
Muromegalovirus 2 4 0 30 11 4 3 12 1
Tritimovirus 12 11 6 8 5 7 4 1 12
Crinivirus 9 4 7 9 5 11 9 1 9
Iltovirus 4 1 5 12 10 8 2 17 3
Aphthovirus 1 0 0 4 3 3 1 49 0
Herpesvirales_unclassified 3 5 10 2 24 2 2 12 1
Rymovirus 3 4 5 20 6 8 3 7 2
Mardivirus 7 2 1 11 8 9 5 7 5
Coccolithovirus 2 2 2 2 6 0 12 0 28
Batrachovirus 6 5 5 7 11 5 6 5 3
Waikavirus 9 5 3 6 4 10 9 1 6
Badnavirus 7 6 3 14 3 0 9 1 8
Capripoxvirus 8 1 1 8 2 26 2 0 2
Lambda-like viruses 4 7 0 1 12 2 2 18 4
Coronaviridae_unclassified 6 7 3 2 3 4 5 0 18
phiKZ-like viruses 1 2 3 7 6 2 7 20 0
Betaretrovirus 1 0 0 2 0 1 0 43 0
Alphabaculovirus 4 7 0 7 5 10 1 6 6
Avipoxvirus 2 4 16 3 9 2 6 0 4
Tenuivirus 7 2 6 8 3 6 8 0 6
Ophiovirus 5 3 4 3 7 4 11 1 7
Cripavirus 4 2 1 4 4 3 4 17 4
Enterovirus 6 2 3 9 6 5 3 0 2
Cytorhabdovirus 2 6 2 2 1 18 1 2 1
Ampelovirus 2 0 0 11 2 12 5 2 0
Kobuvirus 3 1 3 6 2 4 9 4 2
Molluscipoxvirus 9 3 1 4 1 4 0 12 0
Chlorovirus 5 0 0 4 4 13 1 2 1
Orthobunyavirus 4 0 12 0 4 0 6 0 4
c2-like viruses 3 1 1 13 0 9 0 1 1
Hepatovirus 0 2 1 5 0 18 1 2 0
Poacevirus 2 0 5 4 1 6 4 4 2
Cardiovirus 0 2 2 10 0 5 0 8 1
Brambyvirus 3 3 5 2 3 2 4 3 1
Fabavirus 2 3 0 15 2 3 1 0 0
I3-like viruses 2 5 4 0 7 1 0 6 1
Rubivirus 1 1 1 7 7 0 0 9 0
Cosavirus 4 1 1 5 4 10 0 0 0
Orthopoxvirus 4 1 0 2 13 1 1 0 3
Flaviviridae 3 1 2 7 3 1 0 7 0
Iridovirus 4 1 1 0 2 0 5 7 4
Percavirus 3 2 3 3 6 1 1 5 0
Pneumovirus 3 0 8 0 0 1 5 0 7
Paraturdivirus 6 1 1 4 0 10 1 0 0
Enterococcus 2 1 1 3 5 1 2 3 4
Hantavirus 4 1 2 2 1 2 6 0 4
Respirovirus 0 2 4 2 3 0 6 0 5
T7-like viruses 2 1 1 4 1 2 7 0 3
Phycodnaviridae 1 4 1 4 1 4 4 1 0
Aparavirus 5 3 7 1 0 0 2 0 1
Leporipoxvirus 0 1 0 4 1 0 1 11 1
Bpp-1-like viruses 1 0 2 2 4 0 2 7 0
Phlebovirus 1 1 1 3 1 5 4 1 0
Introduction
Clean water is a tremendous resource for both Canadian health and the economy. In addition,
water quality plays a particularly important role in the general health of our many coastal
ecosystems. Unfortunately, urbanization and agricultural land use threaten the cleanliness of
water and thus increase the importance of appropriate treatment and testing. The current
culture-based approach to water quality assessment could be improved. It lacks sensitivity,
because a large proportion of pathogens cannot be cultured or are expensive to test for, and it
is reactive, testing positive only after contamination has occurred. To explore these
microbiomes more thoroughly, researchers have begun to apply high-throughput sequencing
technology in the developing field of metagenomics. Metagenomics is the simultaneous study of
all genetic material recovered directly from a sample. In this way, a community of viruses,
bacteria, and protists can be analyzed as a microbial fingerprint. In the present research, we
use metagenomics to identify novel biomarkers of watershed health and to develop a tool for
matching the microbial fingerprint of a contaminated site to a specific source.
Materials and Methods
Sampling
- Upstream
- Contamination site
- Downstream
Work Flow
Database Match: USEARCH
Sample site combination: Sample data, containing the organisms identified in all water samples, were compared using a Python script.
[Figure titles: Viral Community Heat Map; Workflow Attrition; Viral Population per Watershed; Correlation Coefficient Matrix. Axis labels: Population, Watershed Site]
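The sample comparison script is not reproduced on the poster. As a minimal sketch of one way sites could be compared, the code below builds a pairwise correlation matrix from genus read counts. The choice of Pearson correlation and all function names are assumptions. The counts are copied from the genus table on this poster for four sites and five genera, chosen only to keep the example short.

```python
from math import sqrt

# Read counts for five genera, in this order:
# Betacoronavirus, Alphacoronavirus, Siphoviridae, Sobemovirus, Podoviridae
counts = {
    "Rural Up":  [560, 467, 116, 7, 48],
    "Rural Pol": [364, 275, 94, 641, 41],
    "Ref Up":    [77, 163, 607, 2, 211],
    "Ref Dwn":   [681, 508, 40, 0, 15],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(samples):
    """Return a nested dict: matrix[site_a][site_b] = Pearson r."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names}
            for a in names}


matrix = correlation_matrix(counts)
for site, row in matrix.items():
    print(site, " ".join(f"{r:+.2f}" for r in row.values()))
```

Pearson correlation is unaffected by multiplying a profile by a constant, so differences in total read depth between sites do not change the coefficients. With only five genera the values are illustrative, not a result.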
Quality Filter: Raw data produced
by Illumina’s MiSeq sequencer were
analyzed using a nucleotide-based quality
filtering script written in Python.
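The filtering script itself is not shown on the poster. As an illustration only, the sketch below applies one common nucleotide-level approach to FASTQ reads: decode the Phred+33 quality characters, truncate each read at the first base below a quality threshold, and discard reads that end up too short. The function names, the Q20 threshold, and the 50-base minimum length are assumptions, not the settings used in this study.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, qual, min_q=20):
    """Truncate the read at the first base whose quality is below min_q."""
    for pos, q in enumerate(phred_scores(qual)):
        if q < min_q:
            return seq[:pos], qual[:pos]
    return seq, qual


def quality_filter(reads, min_q=20, min_len=50):
    """Trim every read, then keep only reads of at least min_len bases."""
    kept = []
    for header, seq, qual in reads:
        seq, qual = trim_read(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((header, seq, qual))
    return kept
```

A call such as `quality_filter(parse_fastq(open("reads.fastq").readlines()))` would return the surviving reads, where `reads.fastq` is a hypothetical file name.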
40 L Sample
Viral Retentate
Algorithm: This algorithm is faster than a simple BLAST search by orders of magnitude. It exploits
short sequences of fixed length, called k-mers, that a read shares with database entries, and uses them to compile a preliminary list of possible matches.
Once the list is compiled, a refining match chooses the best result.
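The two-stage idea described here, a fast k-mer screen followed by a refining comparison, can be sketched in a few lines. This is a toy illustration and not USEARCH's implementation: the value k = 4, the ranking by number of shared k-mers, and the ungapped identity score used for refinement are all assumptions, and the reference names and sequences in the usage note are made up.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the names of the references that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def candidates(read, index, k, top_n=3):
    """Stage 1: shortlist the references sharing the most k-mers with the read."""
    hits = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return [name for name, _ in hits.most_common(top_n)]


def identity(read, ref):
    """Best ungapped identity of the read over all offsets in the reference."""
    if not read or len(read) > len(ref):
        return 0.0
    best = 0
    for off in range(len(ref) - len(read) + 1):
        window = ref[off:off + len(read)]
        best = max(best, sum(a == b for a, b in zip(read, window)))
    return best / len(read)


def best_match(read, references, index, k=4):
    """Stage 2: score only the shortlisted references and return the best one."""
    shortlist = candidates(read, index, k)
    if not shortlist:
        return None
    return max(shortlist, key=lambda name: identity(read, references[name]))
```

For example, with `references = {"virus_A": "ACGTACGTTTGACCA", "virus_B": "GGGCCCTTTAAAGGG"}` and `index = build_index(references, 4)`, the call `best_match("CGTTTGAC", references, index)` returns `"virus_A"`. The speed-up comes from stage 1: the expensive comparison runs only against the shortlist, not the whole database.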
Shotgun Sequencing