Novel Computational Approaches to
Investigate Microbial Diversity
Qingpeng Zhang
Department of Computer Science and Engineering
Michigan State University
Advisor: Dr. Titus Brown
Outline
• Background and motivation
• An efficient k-mer counting approach
• Novel method to investigate microbial
diversity
– Concept of IGS (informative genomic segment)
– Testing IGS method on simulated data sets
– Testing IGS method on real metagenome data
• Summary
Background and motivation
Microbial diversity: Sorcerer II Global Ocean Sampling Expedition
• How many different species are in a seawater sample? What is their abundance distribution? (alpha diversity) [Hint: ~10,000 different types of microorganisms]
• Are samples from tropical and temperate areas different? (beta diversity) [Hint: Yes, but how different?]
“...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”
- J. Handelsman et al., 1998
Most microorganisms cannot be cultured, isolated, and sequenced in the traditional way.
In a gram of soil, there are approximately a billion microbial cells, containing an estimated 4 petabase pairs of DNA (Jack A. Gilbert, 2013)
+
Improvement of next-generation sequencing = data deluge
+
Decreasing cost of sequencing
Metagenomics and “big data”
http://www.e-janco.com/
Bacterial genome size: 139,000 bps to 13,000,000 bps
Human genome size: 3,000,000,000 bps
Tools friendly to big data are required.
Sorcerer II Global Ocean Sampling Expedition
• How many different species are in a seawater sample? What is their abundance distribution? (alpha diversity) [Hint: ~10,000 different types of microorganisms; 10 million viruses and one million bacteria in a drop.]
• Are samples from tropical and temperate areas different? (beta diversity) [Hint: Yes, but how different?]
Analogy: a genome is a book, and a sample is a library.
Which library has more titles?
With only shredded pieces of pages, with (partial) sentences on them, to read.
How similar are the libraries in their collections?
With only shredded pieces of pages, with (partial) sentences on them, to read.
A typical pipeline of metagenome diversity analysis
(Loosely based on a slide by Mads Albertsen)
[Figure: Sequencing -> Reads (100-150 bp) -> Assembly -> Contigs (1000+ bp) -> Mapping (abundance: Species A 3x, Species B 1x, Species C 1x, Species D 2x) -> Binning -> Species, repeated for Sample1 to Sample5]

Species-by-sample table:
           Sample1  Sample2  Sample3  Sample4  Sample5
Species A     3        3        4        4        0
Species B     1        3        0        2        2
Species C     1        2        2        2        3
Species D     2        1        2        1        1
Species-by-sample table:
           Sample1  Sample2  Sample3  Sample4  Sample5  Sample6
Species A     3        3        4        4        0        0
Species B     1        3        0        2        2        2
Species C     1        2        2        2        3        3
Species D     2        1        2        1        1        1
Species E     4        1        3        0        2        5
Species F     5        1        2        2        2        5
Species G     2        2        1        2        1        3

Alpha diversity:
“How many different species in a seawater sample?”
Beta diversity:
“How different are the seawater samples?”

Sample-by-sample dissimilarity matrix:
          Sample1  Sample2  Sample3  Sample4  Sample5  Sample6
Sample1    0.0      0.25     0.5      0.5      0.75     1.0
Sample2    0.25     0.0      0.25     0.25     0.75     1.0
Sample3    0.5      0.25     0.0      0.25     0.75     1.0
Sample4    0.5      0.25     0.25     0.0      0.75     1.0
Sample5    0.75     0.75     0.75     0.75     0.0      0.25
Sample6    1.0      1.0      1.0      1.0      0.25     0.0

Tools: QIIME, mothur, etc.
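The two questions above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes richness (the number of species with a nonzero count) as the alpha diversity measure and Bray-Curtis dissimilarity as the beta diversity measure. The slides do not say which metric produced the matrix shown, so the function names and metric choices here are assumptions and the output is not expected to reproduce those numbers.

```python
# Species-by-sample counts copied from the example table (rows: Species A..G).
TABLE = {
    "Sample1": [3, 1, 1, 2, 4, 5, 2],
    "Sample2": [3, 3, 2, 1, 1, 1, 2],
    "Sample3": [4, 0, 2, 2, 3, 2, 1],
    "Sample4": [4, 2, 2, 1, 0, 2, 2],
    "Sample5": [0, 2, 3, 1, 2, 2, 1],
    "Sample6": [0, 2, 3, 1, 5, 5, 3],
}


def richness(counts):
    """Alpha diversity (assumed measure): number of species observed."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(u, v):
    """Beta diversity (assumed measure): Bray-Curtis dissimilarity in [0, 1]."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(table):
    """Pairwise dissimilarities between all samples, as nested dicts."""
    return {
        s: {t: bray_curtis(table[s], table[t]) for t in table}
        for s in table
    }


if __name__ == "__main__":
    for name, counts in TABLE.items():
        print(name, "richness =", richness(counts))
    matrix = dissimilarity_matrix(TABLE)
    print("Sample1 vs Sample2:", round(matrix["Sample1"]["Sample2"], 3))
```

Packages such as QIIME, mothur, and scikit-bio provide many more alpha and beta diversity measures; all of them start from a table like the one above.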
We need this: a samples-by-species table.
Challenges in a typical pipeline of metagenome diversity analysis:
• Assembly is difficult (e.g., only a small portion of the reads can be assembled).
• Binning is difficult (e.g., it may need reference genomes and is computationally expensive).
• Mapping is difficult (e.g., limited accuracy, computationally expensive).
• There are 4 petabase pairs of DNA in a gram of soil (Jack A. Gilbert, 2013).
Reads -> k-mers
K-mer: a DNA segment of length k.
>24288227
TTTGCCTGTTGAAAACCTGTAAACGGGATGTTTCCGGGGTTTTCCGCCCCGGTAGGGTTGAGAC
GTGCCCCGCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAGAGCGATCCGCAACACACCCCG
GAGCCTCTACCCCCCCCTCACGC
TTTGCCTGTT
TTGCCTGTTG
TTTCCGGGGT
TTCCGGGGTT
TTCCGGGGTT
ATCCGCGGTT
TTCCGGGGTT
TTCCGGGGTT
TTCCGGCCGTT
GGTTTTTCCG
Reads – sample A K-mers – sample A
Reads – sample B K-mers – sample B
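Breaking a read into k-mers is a sliding window over the sequence. The sketch below uses the first 12 bases of the example read with k = 10; the function name is hypothetical, and reverse complements (which real k-mer counters usually merge) are deliberately ignored here.

```python
def extract_kmers(read, k):
    """Return every length-k substring of the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# First 12 bases of the example read shown above.
read_prefix = "TTTGCCTGTTGA"
kmers = extract_kmers(read_prefix, 10)

# A read of length L yields L - k + 1 k-mers (here 12 - 10 + 1 = 3).
unique_kmers = set(kmers)

if __name__ == "__main__":
    for kmer in kmers:
        print(kmer)
    print("total:", len(kmers), "unique:", len(unique_kmers))
```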
Counting (unique) k-mers to evaluate diversity
Analogy: counting (unique) words to evaluate diversity
Challenges in counting k-mers
• Handle a large number of k-mers (a simple hash table will not work).
• Retrieve the count of any k-mer by random access (in memory).
– To check the occurrence of a k-mer in multiple samples to find shared k-mers

Soil metagenome sample   Number of reads   Number of k-mers (k=29)   Number of unique k-mers
IOHH.2301.7.1859         273,021,397       32,762,567,744            22,333,571,795
An efficient k-mer counting approach
• The Count-Min Sketch consists of one or more hash tables of different sizes.
• Each entry in the hash tables is a counter representing the number of k-mers that hash to that location.
• To retrieve the count for a given k-mer, we select the minimum count across all of its hash entries.
• Highly scalable: constant memory consumption, independent of k and dataset size.
• Probabilistic properties are well suited to next-generation sequencing datasets.
• Memory usage is fixed and low, with a certain rate of false-positive counts as the tradeoff because of collisions.
• The one-sided counting error is low and predictable.
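The data structure described above can be sketched as follows. This is a toy illustration, not khmer's implementation: the class name, the 2-bit base encoding, the use of a modulo by each table size as the hash function, and the tiny table sizes are all assumptions chosen to keep the example short.

```python
class CountMinSketch:
    """Toy Count-Min Sketch for k-mers over the alphabet A, C, G, T."""

    def __init__(self, table_sizes):
        # One counter table per size; distinct (ideally prime) sizes make
        # the per-table collisions largely independent of each other.
        self.table_sizes = list(table_sizes)
        self.tables = [[0] * size for size in self.table_sizes]

    @staticmethod
    def _encode(kmer):
        """Encode a k-mer as an integer, 2 bits per base (assumed scheme).

        Raises ValueError for characters other than A, C, G, T.
        """
        value = 0
        for base in kmer:
            value = value * 4 + "ACGT".index(base)
        return value

    def add(self, kmer):
        """Increment the counter that the k-mer hashes to in every table."""
        code = self._encode(kmer)
        for table, size in zip(self.tables, self.table_sizes):
            table[code % size] += 1

    def count(self, kmer):
        """Return the minimum counter across tables.

        Collisions can only inflate counters, so the result is never
        below the true count (one-sided error).
        """
        code = self._encode(kmer)
        return min(
            table[code % size]
            for table, size in zip(self.tables, self.table_sizes)
        )


if __name__ == "__main__":
    sketch = CountMinSketch([11, 13, 17])
    for kmer in ["TTTGCCTGTT", "TTGCCTGTTG", "TTTGCCTGTT"]:
        sketch.add(kmer)
    print(sketch.count("TTTGCCTGTT"), sketch.count("TTGCCTGTTG"))
```

Memory is fixed by the table sizes chosen up front, which is why it does not grow with k or with the number of reads; larger tables lower the collision rate.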
khmer - An efficient k-mer counting approach
• My contributions:
• Algorithm design and analysis: exploring the mathematics behind the approach and the choice of optimal parameters
• Contributing code, including unique k-mer counting, overlap k-mer counting, optimal parameter choice, and other code related to my specific research project
• Benchmarking and testing
• Exploring applications such as error trimming, filtering low-abundance reads, and digital normalization; suggesting features
• Working on the khmer manuscript
An efficient k-mer counting approach
>24288227
TTTGCCTGTTGAAAACCTGAAAACGGGATGTTTCCG
GGGTTTTCCGCCCCGGTAGGGTTGAGACGTGCCCC
GCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAG
AGCGATCCGCAACACACCCCGGAGCCTCTACCCCCC
CCTCACGC
K-mers spanning one sequencing error in the read:
GAAAACCTGA
AAAACCTGAA
AAACCTGAAA
Sequencing errors cause a serious problem.
In a high-coverage read data set, most of the unique k-mers are erroneous, caused by sequencing errors.
Can we count k-mers to measure diversity directly?
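To see why errors inflate the number of unique k-mers, the sketch below compares a 30-base prefix of the example read with a copy carrying a single T -> A substitution at the position where the two reads shown in these slides differ. Every k-mer window that overlaps the error is new, so one wrong base adds up to k unique k-mers. The variable and function names are hypothetical.

```python
K = 10


def extract_kmers(read, k):
    """Return the set of length-k substrings of the read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


# 30-base prefix of the example read, and a copy with one substitution.
correct_read = "TTTGCCTGTT" + "GAAAACCTGT" + "AAACGGGATG"
error_position = 19
error_read = correct_read[:error_position] + "A" + correct_read[error_position + 1:]

# K-mers that exist only because of the sequencing error.
novel = extract_kmers(error_read, K) - extract_kmers(correct_read, K)

if __name__ == "__main__":
    print("k-mers introduced by one error:", len(novel))
    for kmer in sorted(novel):
        print(kmer)
```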
Concept of IGS (informative genomic segment)
• IGS (informative genomic segment)
– Can represent the novel information of a genome
– Can be used as a new unit for diversity analysis, in place of species
– Assembly-free
– Reference-free
– Efficient and scalable
Replace the samples-by-species table with a samples-by-IGSs table.
[Figure: reads, the genome they come from, and the IGSs (informative genomic segments) derived from them]
From reads to IGSs: four tricks
1. We can use the median k-mer abundance to estimate the sequencing coverage of a read without a reference assembly.
2. We can retrieve the coverage of a read across different samples.
3. If a read has a coverage of C, there will be approximately C-1 other reads with the same coverage covering the same DNA region.
4. We can estimate the size of the covered genomic region from the number of reads and the corresponding coverage.
Coverage of a read and how to get it without a reference assembly
Coverage of a read: the sequencing depth of the DNA region that the read originates from.
The abundances of the k-mers in a read tend to be similar, and close to the coverage of the read.
[Figure: reads from sample A and the k-mers of sample A]
K-mer abundance
>24288227
TTTGCCTGTTGAAAACCTGTAAACGGGATGTTTCCGGGGTTTTCCGCCCCGGTAGGGTTGAGAC
GTGCCCCGCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAGAGCGATCCGCAACACACCCCG
GAGCCTCTACCCCCCCCTCACGC
[Figure: k-mers of the read (GAAAACCTGT, AAAACCTGTA, AAACCTGTAA, ...) looked up among the k-mers of sample A; k-mer abundances along the read: 4, 5, 4, 5, 4, 5, 4, 4, 3, 4, 5, 4, 5, 4]
K-mer abundance for a read with a sequencing error
>24288227
TTTGCCTGTTGAAAACCTGAAAACGGGATGTTTCCGGGGTTTTCCGCCCCGGTAGGGTTGAGA
CGTGCCCCGCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAGAGCGATCCGCAACACACCCC
GGAGCCTCTACCCCCCCCTCACGC
[Figure: the same lookup for a read with one sequencing error; the k-mers GAAAACCTGA, AAAACCTGAA and AAACCTGAAA span the error; k-mer abundances along the read: 4, 5, 4, 5, 4, 5, 4, 4, 3, 4, 5, 1, 1, 1]
Use the median k-mer abundance to represent the sequencing coverage of the read.
Trick 1: We can use the median k-mer abundance to estimate the sequencing coverage of a read without a reference assembly.
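Trick 1 can be sketched with the abundance values read off the two example reads in the slides. The median stays at 4 even when three k-mers drop to an abundance of 1 because of a sequencing error, whereas the mean is pulled down. The function name and the plain-dict k-mer table are assumptions for illustration; a real pipeline would query a k-mer counting structure instead.

```python
from statistics import mean, median


def estimate_read_coverage(read, k, kmer_counts):
    """Median abundance of the read's k-mers, or 0 if the read is shorter than k."""
    abundances = [
        kmer_counts.get(read[i:i + k], 0)
        for i in range(len(read) - k + 1)
    ]
    return median(abundances) if abundances else 0


# K-mer abundances shown for the error-free read and for the read with one error.
clean_abundances = [4, 5, 4, 5, 4, 5, 4, 4, 3, 4, 5, 4, 5, 4]
error_abundances = [4, 5, 4, 5, 4, 5, 4, 4, 3, 4, 5, 1, 1, 1]

if __name__ == "__main__":
    print("median (clean):", median(clean_abundances))
    print("median (error):", median(error_abundances))
    print("mean   (error):", round(mean(error_abundances), 2))
```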
[Figure: reads from sample A looked up among the k-mers of sample B]
Trick 2: We can estimate the sequencing coverage of a read across different samples.
Read coverage profile:
[Figure: one read with its coverage in Sample A, Sample B and Sample C; coverage values shown: 4, 6 and 2]
[Figure: reads labeled with their coverage profiles across Sample A : Sample B : Sample C; the profiles shown are 3:2:2, 2:0:3 and 2:4:3]
“Contigs with similar coverage profiles are
likely to have originated from the same
microbial population” (Imelfort et al. 2014)
Reads with similar coverage profiles are
likely to have originated from the same
species.
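Grouping reads by coverage profile can be sketched as a dictionary keyed on the profile. This is a simplification under stated assumptions: it groups only reads whose profiles match exactly, while real coverage estimates are noisy and would need some tolerance or clustering. The read identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def bin_by_profile(read_profiles):
    """Group read IDs by their coverage profile (coverage in A, B, C)."""
    bins = defaultdict(list)
    for read_id, profile in read_profiles.items():
        bins[tuple(profile)].append(read_id)
    return dict(bins)


# Hypothetical reads carrying the three profiles shown in the figure.
read_profiles = {
    "read1": (3, 2, 2),
    "read2": (3, 2, 2),
    "read3": (2, 0, 3),
    "read4": (2, 4, 3),
    "read5": (2, 0, 3),
}

if __name__ == "__main__":
    for profile, reads in bin_by_profile(read_profiles).items():
        print(":".join(map(str, profile)), reads)
```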
[Figure: reads with coverage profiles 3:2:2, 2:0:3 and 2:4:3 grouped into bin 1, bin 2 and bin 3, shown against the genomic content of Sample A, Sample B and Sample C]
Do the reads in each bin cover the same size of genomic region?
6 reads with coverage of 3 in the sample where they are from
VS
2 reads with coverage of 2 in the sample where they are from
What is the size of the genomic region covered by the reads in each bin?
What we know:
C: coverage of the reads in the bin, in the sample they come from
N: number of reads in the bin
Trick 3: If a read has a coverage of C, there will be approximately C-1 other reads with the same coverage covering the same DNA region.
Trick 4: We can estimate the size of the covered genomic region from the number of reads (N) and the corresponding coverage (C).
[Figure: reads from sample A forming two groups]
N ~= 30, C ~= 10: size of covered DNA region = N/C = 30/10 = 3
N ~= 7, C ~= 7: size of covered DNA region = N/C = 7/7 = 1
Unit of size of covered DNA region: IGS (informative genomic segment)
Do the reads in each bin cover the same size of genomic region?
Number of IGSs per bin (N/C): 6/3 = 2, 4/2 = 2, 2/2 = 1
[Figure: reads labeled with coverage profiles 3:2:2, 2:0:3 and 2:4:3]

Samples-by-IGSs table:
        SampleA  SampleB  SampleC
IGS1       3        2        2
IGS2       3        2        2
IGS3       2        0        3
IGS4       2        0        3
IGS5       2        4        3
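The step from bins to the samples-by-IGSs table can be sketched as follows, using the numbers from the slide: each bin contributes N/C IGSs, where N is the number of reads in the bin and C is their coverage in the sample they come from, and every IGS inherits the bin's coverage profile. The function name, the tuple layout, and the rounding of N/C to a whole number are assumptions made for this illustration.

```python
def build_igs_table(bins):
    """Expand bins into IGS rows.

    bins: list of (profile, n_reads, origin_index), where profile holds the
    coverage in each sample and origin_index points at the sample the reads
    come from. Returns one profile per IGS.
    """
    rows = []
    for profile, n_reads, origin_index in bins:
        coverage = profile[origin_index]      # C: coverage in the source sample
        n_igs = round(n_reads / coverage)     # N / C, rounded (assumption)
        rows.extend([profile] * n_igs)
    return rows


# Bins from the example: reads from Sample A (index 0) with their counts.
example_bins = [
    ((3, 2, 2), 6, 0),  # 6 reads, coverage 3 -> 2 IGSs
    ((2, 0, 3), 4, 0),  # 4 reads, coverage 2 -> 2 IGSs
    ((2, 4, 3), 2, 0),  # 2 reads, coverage 2 -> 1 IGS
]

if __name__ == "__main__":
    print("       SampleA SampleB SampleC")
    for number, row in enumerate(build_igs_table(example_bins), start=1):
        print(f"IGS{number}   ", *row, sep="   ")
```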
IGSs (informative genomic segments) can represent the novel information of a genome
IGS (informative genomic segment): genomic segments, each the length of a read, that do not share genomic content with each other
• The more IGSs there are in a sample, the higher the diversity and richness of the sample.
• The more IGSs two samples share, the more similar the two samples are.
(Meta)genome size = number of IGSs x IGS length (read length)
Testing IGS method on simulated data sets
Simulated data sets with different species composition
1. The estimated (meta)genome size matches the real size.
2. The evenness metric represents the species distribution correctly.
No sequencing error was introduced in this simulated data.
Samples-by-species table (ground truth) vs. samples-by-IGSs table (using the IGS method)
Mantel correlation: 0.9714
The dissimilarity matrix calculated with the IGS method matches the matrix calculated from the species composition.
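The Mantel statistic compares two distance matrices over the same samples by taking the Pearson correlation of their corresponding off-diagonal entries. The sketch below computes only that statistic on small made-up matrices; in practice its significance is usually assessed by permuting sample labels, which is left out here. The function names and the example values are illustrative assumptions.

```python
from math import sqrt


def upper_triangle(matrix):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel_statistic(dist_a, dist_b):
    """Correlation between the off-diagonal entries of two distance matrices."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Made-up 3-sample matrices: a "ground truth" and a slightly different estimate.
truth = [[0.0, 0.25, 0.5], [0.25, 0.0, 0.25], [0.5, 0.25, 0.0]]
estimate = [[0.0, 0.3, 0.55], [0.3, 0.0, 0.2], [0.55, 0.2, 0.0]]

if __name__ == "__main__":
    print("Mantel r:", round(mantel_statistic(truth, estimate), 4))
```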
[Figure: ordination (PCoA) and clustering of the simulated samples labeled AAAB, ABCD, AABC, ABCE, FGHI and AFGH]
With an accurate dissimilarity matrix, it is straightforward to perform PCoA or clustering analysis to show the relationships between samples.
Influence of error rate and coverage on alpha diversity analysis
The estimated metagenome size is more accurate with higher coverage and a lower error rate.
Influence of error rate and coverage on beta diversity analysis
Beta diversity is less sensitive to error rate and low coverage.
Testing IGS method on real metagenome data
Sorcerer II Global Ocean Sampling Expedition
It has been calculated that microbes in the ocean account for about half of the biomass on Earth.
In a drop (one millilitre) of seawater, one can find 10 million viruses and one million bacteria.
[Figure: samples grouped as Tropical - Galapagos, Tropical - Open Ocean, Sargasso, Temperate - North, Temperate - South, and Estuary]
Samples can be separated by location using the IGS-based method, with results as good as or better than those of the original research, which used an alignment-based method.
[Figure: Temperate samples vs. Tropical and Sargasso samples]
Alpha diversity analysis shows that samples from tropical areas have higher richness.
ARMO (Amazon Rain Forest Microbial Observatory) project

Soil metagenome sample   Number of reads   Number of k-mers (k=29)   Number of unique k-mers
IOHH.2301.7.1859         273,021,397       32,762,567,744            22,333,571,795

21 samples, 1 TB of data, 5 billion reads.
Blue: Prairie
Red: Forest
1. Conversion (forest -> pasture) affects composition.
2. Prairie samples are more similar in space.
3. Subsamples with 8M reads are used.
[Figure: clustering tree of samples ITXB_1, IOHH_1, IOAH_1, ITXB_2, IOAH_2, INPP_2, INPO_1, ITXC, INPP_1, INPN_2, IOHH_2, INPN_1, INPI_2, INPO_2, IOAG_1, IOAG_2, INPI_1; scale bar 0.0080]
Samples can be clustered by treatment and by sampling distance using the IGS method.
[Figure: ordination of samples labeled by sampling distance (0.01 m, 0.1 m, 1 m, 10 m, 100 m); blue: prairie, red: forest]
Using subsamples can give decent beta diversity results.
[Figure: Mantel correlation to the similarity matrix calculated from the subsample with 8M reads; blue: prairie, red: forest]
Alpha diversity analysis shows that samples from prairie areas have higher richness than samples from forest areas.
[Figure: alpha diversity by sample; blue: prairie, red: forest]
Summary
• A whole framework with potential:
• Takes all information into account, including rare species
• Alpha diversity (richness, evenness; Chao1, ACE estimators, etc.)
• Beta diversity (abundance-based, incidence-based)
• Integrates with existing packages (khmer, mothur, QIIME, scikit-bio, etc.)
• Scalable and efficient:
• Based on efficient k-mer counting
• Using subsamples may be enough for specific tasks
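As one example of the richness estimators named above, the classic Chao1 estimator adds a correction for unseen units based on how many were observed exactly once (F1) and exactly twice (F2): S_obs + F1^2 / (2 * F2), with the bias-corrected form S_obs + F1 * (F1 - 1) / 2 used when F2 is zero. The sketch below applies it to a plain list of abundances; treating IGS abundances as the input is an assumption about how the framework would use it, and the example numbers are made up.

```python
def chao1(abundances):
    """Classic Chao1 richness estimate from a list of per-unit abundances."""
    observed = [a for a in abundances if a > 0]
    s_obs = len(observed)
    f1 = sum(1 for a in observed if a == 1)  # units seen exactly once
    f2 = sum(1 for a in observed if a == 2)  # units seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form for the case with no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Made-up abundances: two singletons, one doubleton, one abundant unit.
    print(chao1([1, 1, 2, 5]))
```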
What's next
• Extend applications beyond diversity analysis
– Extract interesting reads by coverage profile
– Read binning/classification based on read coverage profile
– Genome size estimation/sequencing effort evaluation
• Refine the pipeline and optimize performance
– Adjust IGS resolution
– Improve methods to handle large numbers of IGSs more efficiently
– Iterative beta diversity analysis
Publications
• Using the concept of informative genomic segment to investigate microbial diversity of metagenomic sample (in preparation)
• [stream] Crossing the streams: a framework for streaming analysis of short DNA sequencing reads. Qingpeng Zhang, Sherine Awad, C. Titus Brown. (submitted) https://peerj.com/preprints/890/
• [khmer] These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. Qingpeng Zhang, Jason Pell, Rosangela Canino-Koning, Adina Chuang Howe, C. Titus Brown. PLOS ONE, July 25, 2014, DOI: 10.1371/journal.pone.0101271 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101271
• [diginorm] A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data. C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, Timothy H. Brom. Submitted. http://arxiv.org/abs/1203.4802
• [Osedax] Genomic versatility and functional variation between two dominant heterotrophic symbionts of deep-sea Osedax worms. Shana K Goffredi, Hana Yi, Qingpeng Zhang, Jane E Klann, Isabelle A Struve, Robert C Vrijenhoek and C Titus Brown. The ISME Journal, doi:10.1038/ismej.2013.201
Topics covered: IGS method, seaworm symbiont, digital normalization, khmer, error analysis
Acknowledgement
Dr. Titus Brown
Lab members of GED
Dr. Sherine Awad
Michael Crusoe
Jiarong Guo
Luiz Irber
Camille Scott
Former members of GED
Dr. Adina Howe
Eric McDonald
Dr. Jason Pell
Dr. Likit Preeyanon
Dr. Elijah Lowe
Collaborators
Dr. James Tiedje
Dr. Joan Ross
Dr. Shana K Goffredi
Yiseul Kim
Committee members
Jaimes Cole
Richard Enbody
Yanni Sun
Eric Torng
khmer developers

 
16S classifier
16S classifier16S classifier
16S classifier
 
16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence Analysis16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence Analysis
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overview
 
Future of metagenomics
Future of metagenomicsFuture of metagenomics
Future of metagenomics
 
HOME MARKETING STRATEGY
HOME MARKETING STRATEGYHOME MARKETING STRATEGY
HOME MARKETING STRATEGY
 
Follett destiny
Follett destinyFollett destiny
Follett destiny
 
Cum să găteşti friptura perfectă
Cum să găteşti friptura perfectăCum să găteşti friptura perfectă
Cum să găteşti friptura perfectă
 
Cloudwords Perspectives - A book for the global marketing professional
Cloudwords Perspectives - A book for the global marketing professionalCloudwords Perspectives - A book for the global marketing professional
Cloudwords Perspectives - A book for the global marketing professional
 
Brand Newsrooms - Real-time Marketing Strategy, Operations and Execution
Brand Newsrooms - Real-time Marketing Strategy, Operations and ExecutionBrand Newsrooms - Real-time Marketing Strategy, Operations and Execution
Brand Newsrooms - Real-time Marketing Strategy, Operations and Execution
 
Genre survey results
Genre survey resultsGenre survey results
Genre survey results
 
Tuuli Toivonen, Opetushallitus 24.1.2013
Tuuli Toivonen, Opetushallitus 24.1.2013 Tuuli Toivonen, Opetushallitus 24.1.2013
Tuuli Toivonen, Opetushallitus 24.1.2013
 
Itg investor presentation_1mar16 web
Itg investor presentation_1mar16 webItg investor presentation_1mar16 web
Itg investor presentation_1mar16 web
 
Freelance survey 2015 (voor Interim Managers in Finance)
Freelance survey 2015 (voor Interim Managers in Finance)Freelance survey 2015 (voor Interim Managers in Finance)
Freelance survey 2015 (voor Interim Managers in Finance)
 
Alfavit
AlfavitAlfavit
Alfavit
 
LavaCon Portland 2013 - Cloudwords
LavaCon Portland 2013 - CloudwordsLavaCon Portland 2013 - Cloudwords
LavaCon Portland 2013 - Cloudwords
 

Similar to Novel Computational Approaches to Investigate Microbial Diversity

Comprehensive Exam Slides 11/13/2013
Comprehensive Exam Slides 11/13/2013Comprehensive Exam Slides 11/13/2013
Comprehensive Exam Slides 11/13/2013Qingpeng "Q.P." Zhang
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
Target Inducing Local Lesions In Genome (Tilling)
Target Inducing Local Lesions In Genome (Tilling)Target Inducing Local Lesions In Genome (Tilling)
Target Inducing Local Lesions In Genome (Tilling)Ankit R. Chaudhary
 
genomics-180323095216.pptx
genomics-180323095216.pptxgenomics-180323095216.pptx
genomics-180323095216.pptxBharatlalmeena6
 
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...c.titus.brown
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...VHIR Vall d’Hebron Institut de Recerca
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101Ino de Bruijn
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assemblyc.titus.brown
 
Theory and practice of graphical population analysis
Theory and practice of graphical population analysisTheory and practice of graphical population analysis
Theory and practice of graphical population analysisGenome Reference Consortium
 
Microbiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteMicrobiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteQIAGEN
 
DNA barcoding and Insect Diversity Coservation
DNA barcoding and Insect Diversity CoservationDNA barcoding and Insect Diversity Coservation
DNA barcoding and Insect Diversity Coservationvishnugm
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing priyanka raviraj
 

Similar to Novel Computational Approaches to Investigate Microbial Diversity (20)

Comprehensive Exam Slides 11/13/2013
Comprehensive Exam Slides 11/13/2013Comprehensive Exam Slides 11/13/2013
Comprehensive Exam Slides 11/13/2013
 
2014 nyu-bio-talk
2014 nyu-bio-talk2014 nyu-bio-talk
2014 nyu-bio-talk
 
2014 sage-talk
2014 sage-talk2014 sage-talk
2014 sage-talk
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
Target Inducing Local Lesions In Genome (Tilling)
Target Inducing Local Lesions In Genome (Tilling)Target Inducing Local Lesions In Genome (Tilling)
Target Inducing Local Lesions In Genome (Tilling)
 
genomics-180323095216.pptx
genomics-180323095216.pptxgenomics-180323095216.pptx
genomics-180323095216.pptx
 
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
2014 talk at NYU CUSP: "Biology Caught the Bus: Now what? Sequencing, Big Dat...
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
 
2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
 
phd-defense41
phd-defense41phd-defense41
phd-defense41
 
2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Genomics and Plant Genomics
Genomics and Plant GenomicsGenomics and Plant Genomics
Genomics and Plant Genomics
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Theory and practice of graphical population analysis
Theory and practice of graphical population analysisTheory and practice of graphical population analysis
Theory and practice of graphical population analysis
 
Bes meeting spain 2015_alfredo garcia fernandez
Bes meeting spain 2015_alfredo garcia fernandezBes meeting spain 2015_alfredo garcia fernandez
Bes meeting spain 2015_alfredo garcia fernandez
 
Microbiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteMicrobiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro Suite
 
DNA barcoding and Insect Diversity Coservation
DNA barcoding and Insect Diversity CoservationDNA barcoding and Insect Diversity Coservation
DNA barcoding and Insect Diversity Coservation
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing
 

More from Qingpeng "Q.P." Zhang (9)

VenmoPlus
VenmoPlusVenmoPlus
VenmoPlus
 
Qingpeng zhang 0713
Qingpeng zhang 0713Qingpeng zhang 0713
Qingpeng zhang 0713
 
Qingpeng zhang 0711
Qingpeng zhang 0711Qingpeng zhang 0711
Qingpeng zhang 0711
 
VenmoPlus0708
VenmoPlus0708VenmoPlus0708
VenmoPlus0708
 
VenmoPlus demo week6
VenmoPlus demo week6VenmoPlus demo week6
VenmoPlus demo week6
 
0629venmoplus
0629venmoplus0629venmoplus
0629venmoplus
 
Qingpeng zhang week5
Qingpeng zhang week5Qingpeng zhang week5
Qingpeng zhang week5
 
Introducing VenmoPlus.com 6/27 version
Introducing VenmoPlus.com 6/27 versionIntroducing VenmoPlus.com 6/27 version
Introducing VenmoPlus.com 6/27 version
 
committee_meeting_1031
committee_meeting_1031committee_meeting_1031
committee_meeting_1031
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Novel Computational Approaches to Investigate Microbial Diversity

  • 1. Novel Computational Approaches to Investigate Microbial Diversity. Qingpeng Zhang, Department of Computer Science and Engineering, Michigan State University. Advisor: Dr. Titus Brown
  • 2. Outline (current section: Background and motivation) • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary
  • 3. Microbial diversity: Sorcerer II Global Ocean Sampling Expedition • How many different species are in a seawater sample? What is their abundance distribution? (alpha-diversity) [Hint: ~10,000 different types of microorganisms] • Are samples from tropical and temperate areas different? (beta-diversity) [Hint: Yes, but how different?]
  • 4. Metagenomics and “big data”: “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” (J. Handelsman et al., 1998) Most microorganisms cannot be cultured, isolated, and sequenced in the traditional way. A gram of soil contains approximately a billion microbial cells, holding an estimated 4 petabase pairs of DNA (Jack A. Gilbert, 2013) + improvement of next-generation sequencing + decreasing cost of sequencing = data deluge.
  • 6. Metagenomics and “big data”: “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” (J. Handelsman et al., 1998) Most microorganisms cannot be cultured, isolated, and sequenced in the traditional way. A gram of soil contains approximately a billion microbial cells, holding an estimated 4 petabase pairs of DNA (Jack A. Gilbert, 2013) + improvement of next-generation sequencing + decreasing cost of sequencing = data deluge. (http://www.e-janco.com/)
  • 7. Metagenomics and “big data”: Bacterial genome size: 139,000 bp to 13,000,000 bp. Human genome size: 3,000,000,000 bp. Tools friendly to big data are required.
  • 8. Sorcerer II Global Ocean Sampling Expedition • How many different species are in a seawater sample? What is their abundance distribution? (alpha-diversity) [Hint: ~10,000 different types of microorganisms, 10 million viruses, one million bacteria in a drop.] • Are samples from tropical and temperate areas different? (beta-diversity) [Hint: Yes, but how different?]
  • 11. Which library has more titles?
  • 12. Which library has more titles? With only shredded pieces of pages with (partial) sentences on them to read.
  • 13. How similar are the libraries by their collection? With only shredded pieces of pages with (partial) sentences on them to read.
  • 14. A typical pipeline of metagenome diversity analysis (loosely based on a slide by Mads Albertsen): Sequencing -> Reads (100-150 bp) -> Assembly -> Contigs (1000+ bp) -> Mapping -> Abundance (3x, 1x, 1x, 2x) -> Binning -> Species (A, B, C, D). Resulting species-by-sample table:
      Species     Sample1  Sample2  Sample3  Sample4  Sample5
      Species A   3        3        4        4        0
      Species B   1        3        0        2        2
      Species C   1        2        2        2        3
      Species D   2        1        2        1        1
  • 15. Species-by-sample table:
      Species    Sample1  Sample2  Sample3  Sample4  Sample5  Sample6
      SpeciesA   3        3        4        4        0        0
      SpeciesB   1        3        0        2        2        2
      SpeciesC   1        2        2        2        3        3
      SpeciesD   2        1        2        1        1        1
      SpeciesE   4        1        3        0        2        5
      SpeciesF   5        1        2        2        2        5
      SpeciesG   2        2        1        2        1        3
    Alpha diversity: “How many different species in a seawater sample?” Beta diversity: “How different are the seawater samples?” Sample-by-sample dissimilarity matrix:
                 Sample1  Sample2  Sample3  Sample4  Sample5  Sample6
      Sample1    0.0      0.25     0.5      0.5      0.75     1.0
      Sample2    0.25     0.0      0.25     0.25     0.75     1.0
      Sample3    0.5      0.25     0.0      0.25     0.75     1.0
      Sample4    0.5      0.25     0.25     0.0      0.75     1.0
      Sample5    0.75     0.75     0.75     0.75     0.0      0.25
      Sample6    1.0      1.0      1.0      1.0      0.25     0.0
    Tools: QIIME, mothur, etc. We need this!
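    To make the two questions on slide 15 concrete, here is a minimal Python sketch that computes richness (a simple alpha-diversity measure) and a pairwise dissimilarity from the species-by-sample counts above. The slide does not say which dissimilarity metric produced its matrix; Bray-Curtis is used here purely as an assumed, common choice, so the values will not reproduce the slide's illustrative matrix. The function names are made up for this example.

```python
# Species counts per sample (SpeciesA..SpeciesG), copied from the slide's table.
table = {
    "Sample1": [3, 1, 1, 2, 4, 5, 2],
    "Sample2": [3, 3, 2, 1, 1, 1, 2],
    "Sample3": [4, 0, 2, 2, 3, 2, 1],
    "Sample4": [4, 2, 2, 1, 0, 2, 2],
    "Sample5": [0, 2, 3, 1, 2, 2, 1],
    "Sample6": [0, 2, 3, 1, 5, 5, 3],
}

def richness(counts):
    """Alpha diversity as richness: number of species observed at least once."""
    return sum(1 for c in counts if c > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity (assumed metric): 0 = identical, 1 = nothing shared."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

alpha = {name: richness(counts) for name, counts in table.items()}
beta_1_2 = bray_curtis(table["Sample1"], table["Sample2"])
```

    Tools such as QIIME and mothur compute these and many other diversity measures, but they need a table like this one as input, which is the motivation for the rest of the talk.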
  • 16. A typical pipeline of metagenome diversity analysis: Sequencing -> Reads (100-150 bp) -> Assembly -> Contigs (1000+ bp) -> Mapping -> Abundance (3x, 1x, 1x, 2x) -> Binning -> Species (A, B, C, D) -> samples-by-species table. Challenges: • Assembly? Difficult (e.g. only small portions of reads can be assembled) • Binning? Difficult (e.g. may need reference genomes, computationally expensive) • Mapping? Difficult (e.g. accuracy, computationally expensive) • 4 petabase pairs of DNA in a gram of soil (Jack A. Gilbert, 2013)
  • 17. A typical pipeline of metagenome diversity analysis: Sequencing -> Reads (100-150 bp) -> Assembly -> Contigs (1000+ bp) -> Mapping -> Abundance (3x, 1x, 1x, 2x) -> Binning -> Species (A, B, C, D) -> species-by-sample table.
  • 18. Reads -> k-mers. K-mer: a DNA segment of length k. Example read >24288227: TTTGCCTGTTGAAAACCTGTAAACGGGATGTTTCCGGGGTTTTCCGCCCCGGTAGGGTTGAGACGTGCCCCGCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAGAGCGATCCGCAACACACCCCGGAGCCTCTACCCCCCCCTCACGC. K-mers shown: TTTGCCTGTT TTGCCTGTTG TTTCCGGGGT TTCCGGGGTT TTCCGGGGTT ATCCGCGGTT TTCCGGGGTT TTCCGGGGTT TTCCGGCCGTT GGTTTTTCCG
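    The read-to-k-mer step on slide 18 can be sketched in a few lines of Python. This is an illustration only, not khmer's implementation; the function name `kmers` is made up here.

```python
def kmers(read, k):
    """Yield every overlapping substring of length k from a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

read = "TTTGCCTGTTGAAAACCTG"  # first 19 bases of the example read on the slide
all_10mers = list(kmers(read, 10))
first_two = all_10mers[:2]      # the first two 10-mers listed on the slide
n_kmers = len(all_10mers)       # a read of length L yields L - k + 1 k-mers
```

    Adjacent k-mers overlap by k-1 bases, so one read of 100-150 bp produces roughly a hundred k-mers; this is why k-mer counts grow so quickly with data set size.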
  • 19. Counting (unique) k-mers to evaluate diversity. [Figure: reads and k-mers from sample A; reads and k-mers from sample B]
  • 20. Counting (unique) words to evaluate diversity. [Figure: reads and k-mers from sample A; reads and k-mers from sample B]
  • 21. Challenges in counting k-mers • Handle a large number of k-mers (a simple hash table will not work). • Retrieve the count of any k-mer by random access (in memory). – To check the occurrence of a k-mer in multiple samples to find shared k-mers. Example:
      Soil metagenome sample   Number of reads   Number of k-mers (k=29)   Number of unique k-mers
      IOHH.2301.7.1859         273,021,397       32,762,567,744            22,333,571,795
  • 22. Outline (current section: An efficient k-mer counting approach) • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary
  • 23. Khmer: an efficient k-mer counting approach • The Count-Min Sketch consists of one or more hash tables of different sizes. • Each entry in the hash tables is a counter representing the number of k-mers that hash to that location. • To retrieve the count for a given k-mer, we select the minimum count across all of its hash entries. Properties: • Highly scalable: constant memory consumption, independent of k and dataset size. • Probabilistic properties well suited to next-generation sequencing datasets. • Memory usage is fixed and low; the tradeoff is a certain counting false-positive rate caused by collisions. • The one-sided counting error is low and predictable.
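    The three rules on slide 23 (several tables of different sizes, one counter per hash slot, minimum across tables on lookup) can be sketched as a toy Count-Min Sketch in Python. This is not khmer's code: the class name, the use of SHA-1, and the choice to reduce a single hash value modulo each table size are all assumptions made for brevity.

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min Sketch: several counter tables of different sizes."""

    def __init__(self, table_sizes):
        # Memory is fixed here and never grows with the number of k-mers added.
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # One hash value, reduced modulo each table size, gives one slot per table.
        h = int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        return [h % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def count(self, kmer):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(t[s] for t, s in zip(self.tables, self._slots(kmer)))

cms = CountMinSketch([1009, 1013, 1019])  # hypothetical (prime) table sizes
for kmer in ["TTTGCCTGTT", "TTTGCCTGTT", "TTTGCCTGTT", "TTGCCTGTTG"]:
    cms.add(kmer)
```

    Because every collision adds to a counter and nothing ever subtracts, `count` can overestimate but never underestimate. That is the one-sided error mentioned on the slide.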
  • 24. An efficient k-mer counting approach. My contributions: • algorithm design and analysis, exploring the underlying mathematics and the choice of optimal parameters • contributing code, including unique k-mer counting, overlap k-mer counting, optimal parameter choice, and other code related to my specific research project • benchmarking and testing • exploring applications such as error trimming, filtering low-abundance reads, and digital normalization, and suggesting features • work on the khmer manuscript
  • 25. Sequencing errors cause serious problems. Example read >24288227: TTTGCCTGTTGAAAACCTGAAAACGGGATGTTTCCGGGGTTTTCCGCCCCGGTAGGGTTGAGACGTGCCCCGCGGTCGCGAACAGCTCGCCGCTTACCGGCGTAAGAGCGATCCGCAACACACCCCGGAGCCTCTACCCCCCCCTCACGC. Highlighted k-mers: GAAAACCTGA AAAACCTGAA AAACCTGAAA. In a high-coverage read data set, most of the unique k-mers are erroneous, caused by sequencing errors. Can we count k-mers to measure diversity directly?
  • 27. Outline (current section: Concept of IGS (informative genomic segment)) • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary
  • 28. A typical pipeline of metagenome diversity analysis: Sequencing -> Reads (100-150 bp) -> Assembly -> Contigs (1000+ bp) -> Mapping -> Abundance (3x, 1x, 1x, 2x) -> Binning -> Species (A, B, C, D) -> species-by-sample table.
  • 29. IGS (informative genomic segment) • can represent the novel information of a genome • can be used as a new unit for diversity analysis, replacing species • Assembly-free • Reference-free • Efficient and scalable. Replace the samples-by-species table with a samples-by-IGSs table. [Figure: genome, reads, and IGSs (informative genomic segments)]
  • 30. From reads to IGSs: four tricks! 1. We can use median k-mer abundance to estimate the sequencing coverage of a read without a reference assembly. 2. We can retrieve the coverage of a read across different samples. 3. If a read has coverage C, there will be approximately C-1 other reads with the same coverage covering the same DNA region. 4. We can estimate the size of the covered genomic region from the number of reads and the corresponding coverage.
  • 31. Coverage of a read and how to get it without a reference assembly. Coverage of a read: the sequencing depth of the DNA region that the read originates from. The abundances of the k-mers in a read tend to be similar and close to the coverage of the read. [Figure: reads and k-mers from sample A]
  • 35. Trick 1: We can use median k-mer abundance to estimate the sequencing coverage of a read without a reference assembly. The median k-mer abundance represents the sequencing coverage of the read.
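    Trick 1 can be illustrated with a small Python sketch. It uses an exact `Counter` instead of a Count-Min Sketch to keep the example short, and the reads, k value, and function names are hypothetical. Three error-free copies of a region plus one copy with a single base error are counted; the median k-mer abundance still recovers a coverage of 4 for both kinds of read, because one error only disturbs a minority of a read's k-mers.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def count_kmers(reads, k):
    """Exact k-mer counts (a real pipeline would use a compact sketch instead)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def estimated_coverage(read, counts, k):
    """Median abundance of the read's k-mers, used as its coverage estimate."""
    return median(counts[kmer] for kmer in kmers(read, k))

K = 4
clean = "ACGTTGCAAGGCTA"
with_error = "ACGTTGGAAGGCTA"      # same region, one substituted base
reads = [clean, clean, clean, with_error]
counts = count_kmers(reads, K)

cov_clean = estimated_coverage(clean, counts, K)
cov_error = estimated_coverage(with_error, counts, K)
```

    Passing counts built from a different sample to `estimated_coverage` gives the read's coverage in that other sample, which is the idea behind Trick 2.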
  • 36. Trick 2: We can estimate the sequencing coverage of a read across different samples. Trick 1: We can use median k-mer abundance to estimate the sequencing coverage of a read without a reference assembly. [Figure: reads from sample A, k-mers from sample B]
  • 37. Read coverage profile across Sample A, Sample B, and Sample C. [Figure values: Coverage = 4, Coverage = 6, Coverage = 2]
  • 39. Reads labeled with their coverage profiles across Sample A, Sample B, and Sample C: 3:2:2, 2:0:3, and 2:4:3. “Contigs with similar coverage profiles are likely to have originated from the same microbial population” (Imelfort et al. 2014). Reads with similar coverage profiles are likely to have originated from the same species.
  • 40. Reads grouped by coverage profile (3:2:2, 2:0:3, 2:4:3) into bin 1, bin 2, and bin 3, set against the genomic content of Sample A, Sample B, and Sample C. Do the reads in each bin cover the same size of genomic region? 6 reads with coverage of 3 in the sample they come from vs. 2 reads with coverage of 2 in the sample they come from.
  • 41. What is the size of the genomic region covered by the reads in each bin? What we know: C: coverage of the read in the sample in the bin. N: number of reads in the bin.
  • 42. Trick 3: If a read has coverage C, there will be approximately C-1 other reads with the same coverage covering the same DNA region. [Figure: reads and k-mers from sample A]
  • 43. Trick 4: We can estimate the size of the covered genomic region from the number of reads (N) and the corresponding coverage (C). [Figure: reads from sample A]
  • 44. Trick 4: We can estimate the size of the covered genomic region from the number of reads (N) and the corresponding coverage (C). Examples from sample A: N ~= 7, C ~= 7; N ~= 30, C ~= 10.
  • 45. Trick 4: We can estimate the size of the covered genomic region from the number of reads (N) and the corresponding coverage (C). Examples from sample A: N ~= 7, C ~= 7; N ~= 30, C ~= 10. Size of covered DNA region: N/C = 7/7 = 1; N/C = 30/10 = 3.
  • 46. Trick 4: We can estimate the size of the covered genomic region from the number of reads (N) and the corresponding coverage (C). Examples from sample A: N ~= 7, C ~= 7; N ~= 30, C ~= 10. Size of covered DNA region: N/C = 7/7 = 1; N/C = 30/10 = 3. Unit of size of the covered DNA region: IGS (informative genomic segment).
  • 47. Do the reads in each bin cover the same size of genomic region? 6 reads with coverage of 3 in the sample they come from vs. 2 reads with coverage of 2 in the sample they come from. 6/3 = 2; 4/2 = 2; 2/2 = 1.
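    The N/C arithmetic on slides 44 to 47 can be sketched in Python as follows. Reads are grouped into bins by coverage profile, and each bin's size in IGSs is its read count N divided by the coverage C in the sample the reads come from. The read counts below are chosen to mirror the slide's 6/3, 4/2, and 2/2; pairing those counts with these particular profiles is an assumption for illustration, as is the function name.

```python
from collections import Counter

def igs_per_bin(read_profiles, sample_index):
    """Estimate IGSs per coverage-profile bin for reads taken from one sample.

    read_profiles: one coverage-profile tuple per read, e.g. (3, 2, 2) for
    coverage 3 in sample A, 2 in sample B, 2 in sample C.
    sample_index: position in the tuple of the sample the reads come from.
    """
    bins = Counter(read_profiles)  # N = number of reads sharing a profile
    return {
        profile: n / profile[sample_index]  # N / C
        for profile, n in bins.items()
        if profile[sample_index] > 0
    }

# Hypothetical reads from sample A (index 0).
reads_from_a = [(3, 2, 2)] * 6 + [(2, 0, 3)] * 4 + [(2, 4, 3)] * 2
igs = igs_per_bin(reads_from_a, sample_index=0)
total_igs = sum(igs.values())
```

    Summing over bins gives the number of IGSs in the sample, and profiles with nonzero coverage in two samples mark IGSs that those samples share.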
  • 49. A typical pipeline of metagenome diversity analysis. [Figure: pipeline stages Sequencing, Reads (100-150 bp), Assembly, Contigs (1000+ bp), Mapping, Abundance (3x, 1x, 1x, 2x for Species A, B, C, D), Binning, and a species-by-sample table] Challenges: • Assembly is difficult (e.g. only a small portion of the reads can be assembled) • Binning is difficult (e.g. may need reference genomes; computationally expensive) • Mapping is difficult (e.g. accuracy; computationally expensive) • 4 petabase pairs of DNA in a gram of soil (Jack A. Gilbert, 2013)
  • 50. IGSs (informative genomic segments) can represent the novel information in a genome. IGS (informative genomic segment): genomic segments, each the length of a read, that do not share genomic content with one another. • The more IGSs a sample contains, the higher its diversity and richness. • The more IGSs two samples share, the more similar the two samples are. (Meta)genome size = number of IGSs x IGS length (read length)
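One way to read the size formula on this slide together with Tricks 3 and 4 is that a read with coverage C contributes roughly 1/C of an IGS, so summing 1/C over all reads gives the number of IGSs. The sketch below works under that assumption; the function names, the per-read coverage list, and the read length of 100 bp are illustrative and not taken from the talk.

```python
import math


def count_igs(read_coverages):
    """Estimate the number of IGSs as the sum of 1/C over all reads.

    Reads with zero or negative coverage are skipped.
    """
    return math.fsum(1.0 / c for c in read_coverages if c > 0)


def estimate_genome_size(read_coverages, read_length):
    """(Meta)genome size = number of IGSs x IGS length (read length)."""
    return count_igs(read_coverages) * read_length


# Hypothetical sample: 30 reads at coverage 10 plus 7 reads at coverage 7,
# i.e. the two groups used in the Trick 4 slides.
coverages = [10] * 30 + [7] * 7
num_igs = count_igs(coverages)
size_bp = estimate_genome_size(coverages, read_length=100)
print(round(num_igs, 3), round(size_bp, 1))
```

With these inputs the estimate is about 4 IGSs, or about 400 bp, which matches adding the two N/C values from the earlier slides.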
  • 51. Outline • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary. Current section: Testing IGS method on simulated data sets
  • 52. Simulated data sets with different species composition. 1. The estimated (meta)genome size matches the real size perfectly. 2. The evenness metric represents the species distribution correctly. No sequencing errors were introduced in these simulated data.
  • 53. [Figure: samples-by-species table (ground truth) compared with samples-by-IGSs table (using IGS method)] Mantel correlation: 0.9714. The dissimilarity matrix calculated with the IGS method matches the matrix calculated from the species composition.
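The comparison on this slide can be sketched as follows: build a dissimilarity matrix from each table, then correlate the two matrices. The slide does not say which dissimilarity measure was used, so Bray-Curtis is an assumption here, and the two small tables are made up. Only the Mantel statistic (the Pearson correlation of the off-diagonal entries) is computed; a full Mantel test would also permute the samples to obtain a p-value.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(table):
    """Pairwise dissimilarities between the rows (samples) of a table."""
    n = len(table)
    return [[bray_curtis(table[i], table[j]) for j in range(n)]
            for i in range(n)]


def mantel_r(d1, d2):
    """Pearson correlation between the upper triangles of two matrices."""
    pairs = list(combinations(range(len(d1)), 2))
    x = [d1[i][j] for i, j in pairs]
    y = [d2[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical samples-by-species table (3 samples, 3 species).
species_table = [[10, 0, 5], [8, 2, 5], [0, 9, 1]]
# Hypothetical samples-by-IGSs table: each species contributes two IGSs
# whose abundance equals the species abundance.
igs_table = [[a for a in row for _ in range(2)] for row in species_table]

d_species = dissimilarity_matrix(species_table)
d_igs = dissimilarity_matrix(igs_table)
r = mantel_r(d_species, d_igs)
print(round(r, 4))
```

In this idealized toy case every species has the same number of IGSs, so the two matrices agree and the correlation is essentially 1; real data with unequal genome sizes and sequencing noise would give a lower value.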
  • 54. [Figure: ordination (PCoA) and clustering of samples AAAB, ABCD, AABC, ABCE, FGHI, AFGH] With an accurate dissimilarity matrix, it is straightforward to perform PCoA or clustering analysis to show the relationships between samples.
  • 55. Influence of error rate and coverage on alpha diversity analysis: the metagenome size estimate is more accurate with higher coverage and a lower error rate.
  • 56. Influence of error rate and coverage on beta diversity analysis: beta diversity is less sensitive to error rate and low coverage.
  • 57. Outline • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary. Current section: Testing IGS method on real metagenome data
  • 58. Sorcerer II Global Ocean Sampling Expedition. It has been calculated that microbes in the ocean account for about half of the biomass on Earth. In a drop (one millilitre) of seawater, one can find 10 million viruses and one million bacteria.
  • 59. [Figure legend: Tropical - Galapagos, Tropical - Open Ocean, Sargasso, Temperate - North, Temperate - South, Estuary; groups labeled "Temperate" and "Tropical and Sargasso"] Samples can be separated by location using the IGS-based method, with results as good as or better than those of the alignment-based method in the original research.
  • 60. [Figure legend: Estuary, Temperate - North, Temperate - South, Tropical - Open Ocean, Tropical - Galapagos, Sargasso; groups labeled "Temperate" and "Tropical and Sargasso"] Alpha diversity analysis shows that samples from tropical areas have higher richness.
  • 61. ARMO (Amazon Rain Forest Microbial Observatory) project: 21 samples, 1 TB of data, 5 billion reads. Example soil metagenome sample IOHH.2301.7.1859: number of reads 273,021,397; number of k-mers (k=29) 32,762,567,744; number of unique k-mers 22,333,571,795.
  • 62. [Figure: Blue: Prairie; Red: Forest] 1. Conversion (forest -> pasture) affects composition. 2. Prairie samples are more similar in space. 3. Subsamples with 8M reads are used.
  • 63. [Figure: clustering tree (scale bar 0.0080) of samples ITXB_1, IOHH_1, IOAH_1, ITXB_2, IOAH_2, INPP_2, INPO_1, ITXC, INPP_1, INPN_2, IOHH_2, INPN_1, INPI_2, INPO_2, IOAG_1, IOAG_2, INPI_1, annotated with sampling distances of 0.01 m, 0.1 m, 1 m, 10 m, and 100 m; Blue: Prairie; Red: Forest] Samples can be clustered by treatment and distance using the IGS method.
  • 64. Using subsamples can still give decent beta diversity results. [Figure: Mantel correlation to the similarity matrix calculated from the subsample with 8M reads; Blue: Prairie; Red: Forest]
  • 65. Alpha diversity analysis shows that samples from the prairie area have higher richness than samples from the forest area. [Figure: Blue: Prairie; Red: Forest]
  • 66. Outline • Background and motivation • An efficient k-mer counting approach • Novel method to investigate microbial diversity – Concept of IGS (informative genomic segment) – Testing IGS method on simulated data sets – Testing IGS method on real metagenome data • Summary. Current section: Summary
  • 67. Summary • A complete framework with potential: – Takes all information into account, including rare species – Alpha diversity (richness, evenness; Chao1, ACE estimators, etc.) – Beta diversity (abundance-based, incidence-based) – Can integrate with existing packages (khmer, mothur, QIIME, scikit-bio, etc.) • Scalable and efficient: – Based on efficient k-mer counting – Using subsamples may be enough for specific tasks
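The summary names Chao1 as one richness estimator that could be applied to IGS data. As a hedged illustration, the sketch below uses the standard bias-corrected Chao1 formula, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of items seen exactly once and exactly twice. Treating rounded IGS abundances as the counts is an assumption, and the input list is made up.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate.

    abundances -- integer abundance of each observed item
                  (here, hypothetically, each IGS in one sample)
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                       # items observed at least once
    f1 = sum(1 for a in counts if a == 1)     # singletons
    f2 = sum(1 for a in counts if a == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical IGS abundances in a single sample.
igs_abundances = [5, 3, 1, 1, 1, 2, 2, 0]
estimate = chao1(igs_abundances)
print(estimate)
```

Here 7 IGSs are observed, 3 of them once and 2 of them twice, so the estimate adds one unseen IGS to the observed richness.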
  • 68. What’s next • Extend applications beyond diversity analysis – Extract interesting reads by coverage profile – Read binning/classification based on read coverage profiles – Genome size estimation/sequencing effort evaluation • Refine the pipeline and optimize performance – Adjust IGS resolution – Improve methods to handle large numbers of IGSs more efficiently – Iterative beta diversity analysis
  • 69. Publications • Using the concept of informative genomic segment to investigate microbial diversity of metagenomic sample (in preparation) • [stream] Crossing the streams: a framework for streaming analysis of short DNA sequencing reads. Qingpeng Zhang, Sherine Awad, C. Titus Brown. (submitted) https://peerj.com/preprints/890/ • [khmer] These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. Qingpeng Zhang, Jason Pell, Rosangela Canino-Koning, Adina Chuang Howe, C. Titus Brown. PLOS ONE, July 25, 2014, DOI: 10.1371/journal.pone.0101271 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0101271 • [diginorm] A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data. C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, Timothy H. Brom. Submitted. http://arxiv.org/abs/1203.4802 • [Osedax] Genomic versatility and functional variation between two dominant heterotrophic symbionts of deep-sea Osedax worms. Shana K Goffredi, Hana Yi, Qingpeng Zhang, Jane E Klann, Isabelle A Struve, Robert C Vrijenhoek and C Titus Brown. The ISME Journal doi:10.1038/ismej.2013.201 [Diagram labels: IGS method, seaworm symbiont, digital normalization, khmer, error analysis]
  • 70. Acknowledgement • Dr. Titus Brown • Lab members of GED: Dr. Sherine Awad, Michael Crusoe, Jiarong Guo, Luiz Irber, Camille Scott • Former members of GED: Dr. Adina Howe, Eric McDonald, Dr. Jason Pell, Dr. Likit Preeyanon, Dr. Elijah Lowe • Committee members: James Cole, Richard Enbody, Yanni Sun, Eric Torng • Collaborators: Dr. James Tiedje, Dr. Joan Ross, Dr. Shana K Goffredi, Yiseul Kim • khmer developers