GLBIO/CCBC Microbiome Analysis
Workshop: Metagenomics
Morgan G.I. Langille
Assistant Professor
Dalhousie University
May 16, 2016
Learning Objectives
• Contrast 16S and metagenomic sequencing
• Taxonomy from metagenomes
• Function from metagenomes
• Applicability of assembling and gene calling with metagenomic data
• Metagenomic inference and limitations
• Tutorial on processing metagenomic data to determine functional
and taxonomic profiles
16S vs Metagenomics
• 16S is targeted sequencing of a single gene which acts as a
marker for identification
• Pros
– Well established
– Sequencing costs are relatively cheap (~50,000 reads/sample)
– Only amplifies what you want (no host contamination)
• Cons
– Primer choice can bias results towards certain organisms
– Usually not enough resolution to identify to the strain level
– Different primers are needed for archaea & eukaryotes (18S)
– Doesn’t identify viruses
16S vs Metagenomics
• Metagenomics: sequencing all the DNA in a sample
• Pros
– No primer bias
– Can identify all microbes (euks, viruses, etc.)
– Provides functional information (“What are they doing?”)
• Cons
– More expensive (millions of sequences needed)
– Host/site contamination can be significant
– May not be able to sequence “rare” microbes
– Complex bioinformatics
TAXONOMIC PROFILES
Who is there?
Metagenomics: Who is there?
• Goal: Identify the relative abundance of different
microbes in a given sample using metagenomics
• Problems:
– Reads are all mixed together
– Reads can be short (~100bp)
– Lateral gene transfer
• Two broad approaches
1. Binning Based
2. Marker Based
Binning Based
• Attempts to group or “bin” reads into the
genome from which they originated
• Composition-based
– Uses sequence composition such as GC%, k-mers (e.g.
Naïve Bayes Classifier)
– Generally not very precise
• Sequence-based
– Compare reads to large reference database using
BLAST (or some other similarity search method)
– Reads are assigned based on “Best-hit” or “Lowest
Common Ancestor” approach
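The composition features mentioned above (GC% and k-mer counts) can be sketched in a few lines. This is only an illustration of the features a composition-based binner might use, not the method of any particular tool; the function names and the example read are made up.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


read = "ATGCGCGATA"  # made-up read, for illustration only
gc = gc_fraction(read)
dimers = kmer_counts(read, 2)
print(gc, dict(dimers))
```

A classifier such as a Naïve Bayes model would then compare these feature vectors against profiles built from reference genomes.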
LCA: Lowest Common Ancestor
• Use all BLAST hits above a threshold and assign taxonomy at the
lowest level in the tree which covers these taxa.
• Notable Examples:
– MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/
• One of the first metagenomic tools
• Does functional profiling too!
– MG-RAST: https://metagenomics.anl.gov/
• Web-based pipeline (might need to wait awhile for results)
– Kraken: https://ccb.jhu.edu/software/kraken/
• Fastest binning approach to date and very accurate.
• Large computing requirements (e.g. >128GB RAM)
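The Lowest Common Ancestor idea can be sketched as finding the deepest taxonomic rank shared by all hits that pass the threshold. The sketch below assumes each hit has already been converted to a root-to-leaf lineage; the function name and the example hits are hypothetical and do not reproduce how MEGAN or Kraken are implemented.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest shared prefix of root-to-leaf lineages."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:  # the hits disagree at this rank
            break
        shared.append(ranks[0])
    return shared


# Hypothetical hits for one read that all passed the score threshold
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Shigella", "Shigella flexneri"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
]
lca = lowest_common_ancestor(hits)
print(lca[-1])
```

Here the read would be assigned at the family level, because the hits disagree at the genus level.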
Marker Based
• Single Gene
• Identify and extract reads hitting a single marker gene (e.g.
16S, cpn60, or other “universal” genes)
• Use existing bioinformatics pipeline (e.g. QIIME, etc.)
• Multiple Gene
• Several universal genes
– PhyloSift (Darling et al., 2014)
» Uses 37 universal single-copy genes
• Clade specific markers
– MetaPhlAn2 (Truong et al., 2015)
Marker or Binning?
• Binning approaches
– Similarity search is computationally intensive
– Varying genome sizes and LGT can bias results
• Marker approaches
– Doesn’t allow functions to be linked directly to
organisms
– Genome reconstruction/assembly is not possible
– Dependent on choice of markers
MetaPhlAn2
• Uses “clade-specific” gene markers
• A clade represents a set of genomes that can be
as broad as a phylum or as specific as a species
• Uses ~1 million markers derived from 17,000
genomes
– ~13,500 bacterial and archaeal, ~3,500 viral, and ~110
eukaryotic
• Can identify down to the species level (and
possibly even strain level)
• Can handle millions of reads on a standard
computer within a few minutes
MetaPhlAn Marker Selection
Using MetaPhlAn
• MetaPhlAn uses Bowtie2 for sequence similarity searching
(nucleotide sequences vs. nucleotide database)
• Paired-end data can be used directly
• Each sample is processed individually and then multiple
samples can be combined together at the last step
• Output is relative abundances at different taxonomic levels
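The "process each sample, then combine" pattern can be sketched as a simple table merge. This is not MetaPhlAn's own merging code; the function name, sample names, and abundance values are made up for illustration, and taxa missing from a sample are assumed to have an abundance of zero.

```python
def merge_profiles(profiles):
    """Combine {sample: {taxon: relative abundance}} into one table.

    Taxa that are missing from a sample are filled in with 0.0.
    """
    samples = sorted(profiles)
    taxa = sorted({taxon for profile in profiles.values() for taxon in profile})
    table = {
        taxon: [profiles[sample].get(taxon, 0.0) for sample in samples]
        for taxon in taxa
    }
    return samples, table


# Hypothetical per-sample outputs (relative abundance in percent)
profiles = {
    "sampleA": {"Bacteroides": 60.0, "Escherichia": 40.0},
    "sampleB": {"Bacteroides": 90.0, "Prevotella": 10.0},
}
samples, table = merge_profiles(profiles)
for taxon, row in table.items():
    print(taxon, row)
```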
Absolute vs. Relative Abundance
• Absolute abundance: Numbers represent real
abundance of thing being measured (e.g. the
actual quantity of a particular gene or organism)
• Relative abundance: Numbers represent
proportion of thing being measured within
sample
• In almost all cases microbiome studies are
measuring relative abundance
– This is due to DNA amplification during sequencing
library preparation not being quantitative
Relative Abundance Use Case
• Sample A:
– Has 10^8 bacterial cells (but we don’t know this from sequencing)
– 25% of the microbiome from this sample is classified as Shigella
• Sample B:
– Has 10^6 bacterial cells (but we don’t know this from sequencing)
– 50% of the microbiome from this sample is classified as Shigella
• “Sample B contains twice as much Shigella as Sample A”
– WRONG! (If we quantified it, we would find that Sample A has more Shigella)
• “Sample B contains a greater proportion of Shigella compared to
Sample A”
– Correct!
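The arithmetic behind this use case can be checked directly. The sketch below uses the cell counts and percentages from the example above; the variable names are illustrative.

```python
# Totals and percentages taken from the Sample A / Sample B example
samples = {
    "A": {"total_cells": 10**8, "shigella_percent": 25},
    "B": {"total_cells": 10**6, "shigella_percent": 50},
}

# Absolute abundance = total cells x relative abundance
shigella_cells = {
    name: s["total_cells"] * s["shigella_percent"] // 100
    for name, s in samples.items()
}
fold = shigella_cells["A"] // shigella_cells["B"]
print(shigella_cells, fold)
```

Sample A holds 25,000,000 Shigella cells against 500,000 in Sample B, a 50-fold difference in the opposite direction from the proportions.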
FUNCTIONAL COMPOSITION
What are they doing?
What do we mean by function?
• General categories
– Photosynthesis
– Nitrogen metabolism
– Glycolysis
• Specific gene families
– nifH
– EC: 1.1.1.1 (alcohol dehydrogenase)
– K00929 (butyrate kinase)
Various Functional Databases
• COG
– Well known, but the original classification has not been updated since 2003
• SEED
– Used by the RAST and MG-RAST systems
• PFAM
– Focused more on protein domains
• EggNOG
– Very comprehensive (~190k groups)
• UniRef
– Has clustering at different levels (e.g. UniRef100, UniRef90, UniRef50)
– Most comprehensive and is constantly updated
• KEGG
– Very popular, each entry is well annotated, and often linked into “Modules” or “Pathways”
– Full access now requires a license fee
• MetaCyc
– Becoming more widely used.
– More microbe focused than KEGG
KEGG
• We will focus on using the
KEGG database during this
workshop
• KEGG Orthologs (KOs)
– Most specific. Thought to be
homologs and doing the same
exact “function”
– ~12,000 KOs in the database
– These can be linked into KEGG
Modules and KEGG Pathways
– Identifiers: K01803, K00231, etc.
KEGG (cont.)
• KEGG Modules
– Manually defined functional units
– Small groups of KOs that function together
– ~750 KEGG Modules
– Identifiers: M00002, M00011, etc.
KEGG (cont.)
• KEGG Pathways
– Groups KOs into large pathways (~230)
– Each pathway has a graphical map
– Individual KOs or Modules can be
highlighted within these maps
– Pathways can be collapsed into very
general functional terms (e.g. Amino Acid
Metabolism, Carbohydrate Metabolism,
etc.)
Metagenomic Annotation Systems
• Web-based
– Provide functional and taxonomic analysis, plus hosts your data.
– EBI Metagenomics Server
– MG-RAST
– IMG/M
• GUI based
– MEGAN
• Taxonomy and functional annotation
– CloVR
• Virtual Machine based, contains SOP, hasn’t been updated recently
• Command-line based
– MetAMOS
• Built in assembly, highly customizable, some features can be buggy
– HUMAnN
• Functional annotation
– DIY
• Set up your own in-house custom computational pipeline
HUMAnN
(Abubucker et al. 2012)
HUMAnN Step 1
• Reads are searched against a protein database (e.g. KEGG)
– Can use BLASTX, but much faster methods now available (e.g. BLAT,
USEARCH, RapSearch2, DIAMOND)
Buchfink et al., 2015
HUMAnN Step 2
• Normalize and weight search results
• The relative abundance of each KO is
calculated:
– Number of reads mapping to a gene sequence in
that KO
– Weighted by the inverse p-value of each mapping
– Normalized by the average length of the KO
HUMAnN Step 3
• Reduce number of pathways
• A KO can map to one or more KEGG Pathways
– Just because a KO is found in a pathway doesn’t mean
that complete pathway exists in the community
– If a pathway has 20 KOs and only 2 KOs are observed
in the community (but at high abundances) what
should be the abundance of the pathway?
– MinPath (Ye, 2009) attempts to estimate the
abundance of these pathways and remove spurious
noise
HUMAnN Step 4
• Reduce false positive pathways further and
normalize by KO copy number
• Using the organism information from the KEGG
hits
– Pathways that are not found to be in any of the
observed organisms AND are made up mostly of KOs
mapping to a different pathway are removed
– KO abundance can be divided by the estimated copy
number of that KO as observed from the KEGG
organism database
HUMAnN Step 5
• Smoothing pathways by gap filling
– Sequencing depth or poor sequence searches
could lead to some KOs within pathways being
absent or in low abundance
– KOs more than 1.5 interquartile ranges below the
pathway median are raised to the pathway
median
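The gap-filling rule can be sketched as follows. This is a minimal illustration of the rule as stated on the slide, not HUMAnN's actual code; the function name, the KO labels, the abundances, and the choice of quartile method are all assumptions.

```python
from statistics import median, quantiles


def smooth_pathway(ko_abundance):
    """Raise unusually low KOs in one pathway to the pathway median.

    A KO counts as unusually low when it lies more than 1.5 interquartile
    ranges below the median. Needs at least two KOs in the pathway.
    """
    values = list(ko_abundance.values())
    mid = median(values)
    # Quartile method is an assumption made for this sketch
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    cutoff = mid - 1.5 * (q3 - q1)
    return {ko: (mid if a < cutoff else a) for ko, a in ko_abundance.items()}


# Hypothetical KO abundances for a single pathway
pathway = {"K_a": 10, "K_b": 12, "K_c": 11, "K_d": 9, "K_e": 0}
smoothed = smooth_pathway(pathway)
print(smoothed)
```

In this example the median is 10 and the interquartile range is 2, so the cutoff is 7 and only the missing KO (`K_e`) is raised.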
What about assembly?
• Assembly is often
used in genomics to
join raw reads into
longer contigs and
scaffolds
[Figure: “Genome assembly stitches together a genome” (Michael Schatz, Cold Spring Harbor)]
1. Fragment DNA and sequence
2. Find overlaps between reads
…AGCCTAGACCTACAGGATGCGCGACACGT
               GGATGCGCGACACGTCGCATATCCGGT…
3. Assemble overlaps into contigs
4. Assemble contigs into scaffolds
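Steps 2 and 3 of the figure can be sketched with the two read fragments it shows. This is a naive suffix-prefix overlap search for illustration only; real assemblers use far more efficient data structures and tolerate sequencing errors. The function names and the `min_len` parameter are hypothetical.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_len=3):
    """Join two reads into a contig if they overlap, else return None."""
    size = longest_overlap(left, right, min_len)
    return left + right[size:] if size else None


# Read fragments copied from the figure
read1 = "AGCCTAGACCTACAGGATGCGCGACACGT"
read2 = "GGATGCGCGACACGTCGCATATCCGGT"
overlap = longest_overlap(read1, read2)
contig = merge_reads(read1, read2)
print(overlap, contig)
```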
Assembly for Metagenomics?
• Pros
– Less computation time for similarity search (sequences are collapsed)
– Can allow annotation when reads are too short (<100bp)
– Can sometimes (partially) reconstruct genomes
• Cons
– Assembly is computationally intensive (high memory machines
needed)
– Collapsed reads must be added back to get relative abundances (not
all assemblers do this natively)
– Low read depth and high diversity can cause assemblers to fail
– Reads are not all from the same genome so chimeras are possible
– Some organisms/genes will assemble easier (e.g. more abundant)
which could lead to annotation bias
What about gene calling?
• In genomics, normally you would predict the start and stop
positions of genes using a gene prediction program before
annotating the genes
• In metagenomics:
– Pros:
• May result in fewer false positives from annotating “non-real” genes
• Lowers the number of similarity searches
– Cons
• Computationally intensive
• No good learning dataset
• Raw reads will not cover an entire gene
• Often requires assembled data
– Possible tools: FragGeneScan, MetaGeneAnnotator
– Alternative: Do 6 frame-translation (e.g. BLASTX)
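The six-frame alternative can be sketched as translating a read in three offsets on each strand. The sketch assumes the standard genetic code and marks codons containing ambiguous bases as `X`; it illustrates the idea only and is not how BLASTX is implemented. All names are made up.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(read):
    """Return three forward-strand frames, then three reverse-strand frames."""
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (read, reverse)
        for offset in range(3)
    ]


frames = six_frame_translation("ATGGCC")  # made-up read
print(frames)
```

Each translated frame would then be searched against a protein database, which avoids having to call genes on short reads first.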
Community Function Potential
• It is important to remember that this is metagenomics, not
metatranscriptomics and not metaproteomics
• These annotations suggest the functional
potential of the community
• The presence of these genes/functions does not
mean that they are biologically active (e.g. may
not be transcribed)
PICRUSt
Predicting function from 16S profiles
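[Figure: workflow linking 16S rRNA gene data (QIIME, PICRUSt) and shotgun metagenomics data (MetaPhlAn, HUMAnN) to taxonomic and functional tables that are analyzed in STAMP]
Example taxonomic table:
         Sample 1  Sample 2  Sample 3
OTU 1        4         0         2
OTU 2        1         0         0
OTU 3        2         4         2
Example functional table:
         Sample 1  Sample 2  Sample 3
K00001      20        15        18
K00002       1         2         0
K00003       4         5         4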
PICRUSt
• Phylogenetic Investigation of Communities by
Reconstruction of Unobserved States
• http://picrust.github.com
PICRUSt: How does it work?
Predicting the abundance of a single function
[Figure legend: known gene abundance, ancestral gene abundance, predicted gene abundance]
Repeat for each function (~8000X)
Repeat for all unknown tips (>100,000)
PICRUSt: Predicting Metagenomes
OTU Table
         S1   S2   S3
12345    10    0    5
67890     1    0    0
66666     4    8    2
PICRUSt 16S Predictions (16S Copy Number)
12345     5
67890     1
66666     2
Normalized OTU Table
         S1   S2   S3
12345     2    0    1
67890     1    0    0
66666     2    4    1
PICRUSt KEGG Predictions
         K0001  K0002  K0003
12345      4      0      2
67890      1      0      0
66666      2      4      2
Metagenome Prediction
         S1   S2   S3
K0001    13    8    6
K0002     8   16    4
K0003     8    8    4
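The numbers on this slide can be reproduced with a short calculation: divide each OTU count by its predicted 16S copy number, then multiply by the predicted KO counts for that OTU and sum over OTUs. The sketch below uses the values from the tables on the slide; the variable and function names are illustrative and this is not PICRUSt's own code.

```python
# Values copied from the slide (samples are in the order S1, S2, S3)
otu_table = {"12345": [10, 0, 5], "67890": [1, 0, 0], "66666": [4, 8, 2]}
copy_number = {"12345": 5, "67890": 1, "66666": 2}
ko_per_otu = {
    "12345": {"K0001": 4, "K0002": 0, "K0003": 2},
    "67890": {"K0001": 1, "K0002": 0, "K0003": 0},
    "66666": {"K0001": 2, "K0002": 4, "K0003": 2},
}

# Step 1: normalize OTU counts by predicted 16S copy number
normalized = {
    otu: [count / copy_number[otu] for count in counts]
    for otu, counts in otu_table.items()
}


def predict_metagenome(normalized, ko_per_otu, n_samples):
    """Sum (normalized OTU abundance x KO copies) over all OTUs."""
    kos = sorted({ko for genes in ko_per_otu.values() for ko in genes})
    prediction = {ko: [0.0] * n_samples for ko in kos}
    for otu, abundances in normalized.items():
        for ko, copies in ko_per_otu[otu].items():
            for i, abundance in enumerate(abundances):
                prediction[ko][i] += abundance * copies
    return prediction


# Step 2: predict the KO table for the three samples
metagenome = predict_metagenome(normalized, ko_per_otu, n_samples=3)
for ko, row in metagenome.items():
    print(ko, row)
```

For example, K0001 in S1 is 2×4 + 1×1 + 2×2 = 13, matching the Metagenome Prediction table.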
PICRUSt predictions across body sites
Langille et al., 2013, Nature Biotechnology
VISUALIZATION AND STATISTICS
What is important?
Visualization and Statistics
• Various tools are available to determine
statistically significant taxonomic differences
across groups of samples
– Excel
– SigmaPlot
– Past
– R (many libraries)
– Python (matplotlib)
– STAMP
STAMP
STAMP Plots
STAMP
• Input
1. “Profile file”: Table of features (samples by OTUs,
samples by functions, etc.)
• Features can form a hierarchy (e.g. Phylum, Class, Order,
etc.) to allow data to be collapsed within the program
2. “Group file”: Contains different metadata for
grouping samples
• Can be two groups (e.g. Healthy vs Sick) or multiple groups
(e.g. water depth at 2 m, 4 m, and 6 m)
• Output
– PCA, heatmap, box, and bar plots
– Tables of significantly different features
METAGENOMICS WORKFLOW
Putting it all together
Microbiome Helper
• Standard Operating Procedures (SOPs)
– 16S
– Shotgun Metagenomics
• Scripts to wrap and integrate existing tools
– Available as an Ubuntu VirtualBox image
• Tutorials/Walkthroughs
• https://github.com/mlangill/microbiome_helper/wiki
IMR: Integrated Microbiome Resource
• Offers sequencing and bioinformatics for
microbiome projects (http://cgeb-imr.ca)
QUESTIONS?
Tutorial

More Related Content

What's hot

Genome annotation
Genome annotationGenome annotation
Genome annotation
Rezwana Nishat
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
Surender Rawat
 
qRT PCR
qRT PCRqRT PCR
qRT PCR
MANDEEP KAUR
 
sequencing of genome
sequencing of genomesequencing of genome
sequencing of genomeNaveen Gupta
 
NGS File formats
NGS File formatsNGS File formats
NGS File formats
HARSHITHA EBBALI
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
VHIR Vall d’Hebron Institut de Recerca
 
Gene prediction methods vijay
Gene prediction methods  vijayGene prediction methods  vijay
Gene prediction methods vijay
Vijay Hemmadi
 
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
VHIR Vall d’Hebron Institut de Recerca
 
Real Time PCR
Real Time PCRReal Time PCR
Real Time PCR
ASHIKH SEETHY
 
Fasta
FastaFasta
Metagenomics
MetagenomicsMetagenomics
Metagenomics
HIMANSHU JAIN
 
Sequence Submission Tools
Sequence Submission ToolsSequence Submission Tools
Sequence Submission Tools
RishikaMaji
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
Dhananjay Desai
 
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Torsten Seemann
 
DNA SEQUENCING METHOD
DNA SEQUENCING METHODDNA SEQUENCING METHOD
DNA SEQUENCING METHODMusa Khan
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
Despoina Kalfakakou
 
shotgun sequncing
 shotgun sequncing shotgun sequncing
shotgun sequncing
SAIFALI444
 

What's hot (20)

Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
qRT PCR
qRT PCRqRT PCR
qRT PCR
 
sequencing of genome
sequencing of genomesequencing of genome
sequencing of genome
 
NGS File formats
NGS File formatsNGS File formats
NGS File formats
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Gene prediction methods vijay
Gene prediction methods  vijayGene prediction methods  vijay
Gene prediction methods vijay
 
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
 
Real Time PCR
Real Time PCRReal Time PCR
Real Time PCR
 
16s
16s16s
16s
 
Phylogenetic analysis
Phylogenetic analysisPhylogenetic analysis
Phylogenetic analysis
 
Fasta
FastaFasta
Fasta
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Sequence Submission Tools
Sequence Submission ToolsSequence Submission Tools
Sequence Submission Tools
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
 
DNA SEQUENCING METHOD
DNA SEQUENCING METHODDNA SEQUENCING METHOD
DNA SEQUENCING METHOD
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
 
shotgun sequncing
 shotgun sequncing shotgun sequncing
shotgun sequncing
 

Viewers also liked

Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Jonathan Eisen
 
Metagenomics and it’s applications
Metagenomics and it’s applicationsMetagenomics and it’s applications
Metagenomics and it’s applications
Sham Sadiq
 
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
Larry Smarr
 
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
Fundación Ramón Areces
 
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan EisenPhylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Jonathan Eisen
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
Rutger Vos
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics Researchers
Larry Smarr
 
[13.07.07] albertsen mewe13 metagenomics
[13.07.07] albertsen mewe13 metagenomics[13.07.07] albertsen mewe13 metagenomics
[13.07.07] albertsen mewe13 metagenomicsMads Albertsen
 
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
QIAGEN
 
Reframing Phylogenomics
Reframing PhylogenomicsReframing Phylogenomics
Reframing Phylogenomics
Joe Parker
 
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Mick Watson
 
Microbial Metagenomics and Human Health
Microbial Metagenomics and Human HealthMicrobial Metagenomics and Human Health
Microbial Metagenomics and Human Health
Larry Smarr
 
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
Mahdi Ghanbari
 
Metagenomics newer approach in understanding Microbes
Metagenomics newer approach in understanding Microbes  Metagenomics newer approach in understanding Microbes
Metagenomics newer approach in understanding Microbes
Society for Microbiology and Infection care
 
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
QIAGEN
 
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
Vall d'Hebron Institute of Research (VHIR)
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
VHIR Vall d’Hebron Institut de Recerca
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAGEN
 
MicrobeDB Overview
MicrobeDB OverviewMicrobeDB Overview
MicrobeDB Overview
Morgan Langille
 

Viewers also liked (20)

Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
 
Metagenomics and it’s applications
Metagenomics and it’s applicationsMetagenomics and it’s applications
Metagenomics and it’s applications
 
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research...
 
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
Hervé Blottiere-El impacto de las ciencias ómicas en la medicina, la nutrició...
 
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan EisenPhylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics Researchers
 
[13.07.07] albertsen mewe13 metagenomics
[13.07.07] albertsen mewe13 metagenomics[13.07.07] albertsen mewe13 metagenomics
[13.07.07] albertsen mewe13 metagenomics
 
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
 
Reframing Phylogenomics
Reframing PhylogenomicsReframing Phylogenomics
Reframing Phylogenomics
 
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
 
Microbial Metagenomics and Human Health
Microbial Metagenomics and Human HealthMicrobial Metagenomics and Human Health
Microbial Metagenomics and Human Health
 
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
Next Generation Sequencing of Fish Microbiome- AquaCyprus 2014
 
Microbiome 2013
Microbiome 2013Microbiome 2013
Microbiome 2013
 
Metagenomics newer approach in understanding Microbes
Metagenomics newer approach in understanding Microbes  Metagenomics newer approach in understanding Microbes
Metagenomics newer approach in understanding Microbes
 
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
 
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
Gut microbiota for health: lessons of a metagenomic scan (by Joel Doré)
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
 
MicrobeDB Overview
MicrobeDB OverviewMicrobeDB Overview
MicrobeDB Overview
 

Similar to GLBIO/CCBC Metagenomics Workshop

Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to functionAbhik Seal
 
Lecture on the annotation of transposable elements
Lecture on the annotation of transposable elementsLecture on the annotation of transposable elements
Lecture on the annotation of transposable elements
fmaumus
 
ECCMID 2015 - So I have sequenced my genome ... what now?
ECCMID 2015 - So I have sequenced my genome ... what now?ECCMID 2015 - So I have sequenced my genome ... what now?
ECCMID 2015 - So I have sequenced my genome ... what now?
Nick Loman
 
Intro to in silico drug discovery 2014
Intro to in silico drug discovery 2014Intro to in silico drug discovery 2014
Intro to in silico drug discovery 2014
Lee Larcombe
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Prof. Wim Van Criekinge
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
Ranjan Jyoti Sarma
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seq
Timothy Tickle
 
Experimental methods and the big data sets
Experimental methods and the big data sets Experimental methods and the big data sets
Experimental methods and the big data sets
improvemed
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
Genome Reference Consortium
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
Genome Reference Consortium
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
University of California, Davis
 
CRISPR presentation extended Mouse Modeling
CRISPR presentation extended Mouse ModelingCRISPR presentation extended Mouse Modeling
CRISPR presentation extended Mouse ModelingTristan Kempston
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
RuthMWinnie
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
EdizonJambormias2
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
Monica Munoz-Torres
 
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_prediction
Bas van Breukelen
 
High Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genomeHigh Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genome
Brian Krueger
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Surya Saha
 
proteomic and Genomics and the available proteomic technologies and the data ...
proteomic and Genomics and the available proteomic technologies and the data ...proteomic and Genomics and the available proteomic technologies and the data ...
proteomic and Genomics and the available proteomic technologies and the data ...
SamiMohamed28
 
Sequencedatabases
SequencedatabasesSequencedatabases
SequencedatabasesAbhik Seal
 

GLBIO/CCBC Metagenomics Workshop

  • 1. GLBIO/CCBC Microbiome Analysis Workshop: Metagenomics Morgan G.I. Langille Assistant Professor Dalhousie University May 16, 2016
  • 2. Learning Objectives • Contrast 16S and metagenomic sequencing • Taxonomy from metagenomes • Function from metagenomes • Applicability of assembling and gene calling with metagenomic data • Metagenomic inference and limitations • Tutorial on processing metagenomic data to determine functional and taxonomic profiles
  • 3. 16S vs Metagenomics • 16S is targeted sequencing of a single gene which acts as a marker for identification • Pros – Well established – Sequencing costs are relatively cheap (~50,000 reads/sample) – Only amplifies what you want (no host contamination) • Cons – Primer choice can bias results towards certain organisms – Usually not enough resolution to identify to the strain level – Different primers are needed for archaea & eukaryotes (18S) – Doesn’t identify viruses
  • 4. 16S vs Metagenomics • Metagenomics: sequencing all the DNA in a sample • Pros – No primer bias – Can identify all microbes (euks, viruses, etc.) – Provides functional information (“What are they doing?”) • Cons – More expensive (millions of sequences needed) – Host/site contamination can be significant – May not be able to sequence “rare” microbes – Complex bioinformatics
  • 6. Metagenomics: Who is there? • Goal: Identify the relative abundance of different microbes in a sample using metagenomics • Problems: – Reads are all mixed together – Reads can be short (~100bp) – Lateral gene transfer • Two broad approaches 1. Binning Based 2. Marker Based
  • 7. Binning Based • Attempts to group or “bin” reads into the genome from which they originated • Composition-based – Uses sequence composition such as GC%, k-mers (e.g. Naïve Bayes Classifier) – Generally not very precise • Sequence-based – Compare reads to large reference database using BLAST (or some other similarity search method) – Reads are assigned based on “Best-hit” or “Lowest Common Ancestor” approach
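The composition features named on this slide (GC% and k-mer counts) can be sketched as follows. This is a minimal illustration, not the feature extraction of any particular binning tool; the function names `gc_fraction` and `kmer_counts` and the default `k=4` are assumptions made for the example.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of bases in the read that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k=4):
    """Count every overlapping k-mer in the read (k=4 is an arbitrary choice)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example read fragment (taken from the assembly figure later in the deck).
read = "AGCCTAGACCTACAGGATGCGCGACACGT"
print(gc_fraction(read))
print(kmer_counts(read).most_common(3))
```

A composition-based classifier would compare such feature vectors against profiles built from reference genomes; with ~100bp reads the vectors are sparse, which is one reason these methods tend to be imprecise.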
  • 8. LCA: Lowest Common Ancestor • Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree which covers these taxa. • Notable Examples: – MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/ • One of the first metagenomic tools • Does functional profiling too! – MG-RAST: https://metagenomics.anl.gov/ • Web-based pipeline (might need to wait awhile for results) – Kraken: https://ccb.jhu.edu/software/kraken/ • Fastest binning approach to date and very accurate. • Large computing requirements (e.g. >128GB RAM)
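The LCA rule described on this slide can be sketched as taking the longest shared prefix of the lineages of all hits that pass a score threshold. This is a simplified illustration and not the implementation used by MEGAN, MG-RAST, or Kraken; the hit list, the bit scores, and the `min_bitscore` cutoff are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


def assign_read(hits, min_bitscore):
    """Keep hits at or above the threshold, then take their LCA."""
    kept = [lineage for lineage, bitscore in hits if bitscore >= min_bitscore]
    return lowest_common_ancestor(kept)


# Hypothetical hits for a single read: (lineage from domain to species, bit score).
gamma = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
entero = gamma + ["Enterobacterales", "Enterobacteriaceae"]
hits = [
    (entero + ["Escherichia", "Escherichia coli"], 180.0),
    (entero + ["Shigella", "Shigella flexneri"], 178.0),
    (entero + ["Salmonella", "Salmonella enterica"], 170.0),
    (gamma + ["Vibrionales", "Vibrionaceae", "Vibrio", "Vibrio cholerae"], 60.0),
]

strict = assign_read(hits, min_bitscore=100.0)
loose = assign_read(hits, min_bitscore=50.0)
print(strict[-1])
print(loose[-1])
```

Raising the threshold keeps only the three close hits, so the read is assigned at the family level; lowering it admits the weak hit and pushes the assignment up to the class level.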
  • 9. Marker Based • Single Gene • Identify and extract reads hitting a single marker gene (e.g. 16S, cpn60, or other “universal” genes) • Use existing bioinformatics pipeline (e.g. QIIME, etc.) • Multiple Gene • Several universal genes – PhyloSift (Darling et al., 2014) » Uses 37 universal single-copy genes • Clade specific markers – MetaPhlAn2 (Truong et al., 2015)
  • 10. Marker or Binning? • Binning approaches – Similarity search is computationally intensive – Varying genome sizes and LGT can bias results • Marker approaches – Doesn’t allow functions to be linked directly to organisms – Genome reconstruction/assembly is not possible – Dependent on choice of markers
  • 11. MetaPhlAn2 • Uses “clade-specific” gene markers • A clade represents a set of genomes that can be as broad as a phylum or as specific as a species • Uses ~1 million markers derived from 17,000 genomes – ~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic • Can identify down to the species level (and possibly even strain level) • Can handle millions of reads on a standard computer within a few minutes
  • 14. Using MetaPhlAn • MetaPhlAn uses Bowtie2 for sequence similarity searching (nucleotide sequences vs. nucleotide database) • Paired-end data can be used directly • Each sample is processed individually and then multiple samples can be combined at the last step • Output is relative abundances at different taxonomic levels
  • 15. Absolute vs. Relative Abundance • Absolute abundance: Numbers represent real abundance of thing being measured (e.g. the actual quantity of a particular gene or organism) • Relative abundance: Numbers represent proportion of thing being measured within sample • In almost all cases microbiome studies are measuring relative abundance – This is due to DNA amplification during sequencing library preparation not being quantitative
  • 16. Relative Abundance Use Case • Sample A: – Has 10^8 bacterial cells (but we don’t know this from sequencing) – 25% of the microbiome from this sample is classified as Shigella • Sample B: – Has 10^6 bacterial cells (but we don’t know this from sequencing) – 50% of the microbiome from this sample is classified as Shigella • “Sample B contains twice as much Shigella as Sample A” – WRONG! (If we quantified it, we would find that Sample A has more Shigella) • “Sample B contains a greater proportion of Shigella compared to Sample A” – Correct!
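The arithmetic behind this use case can be worked through directly. The sketch below assumes the cell counts on the slide are 10^8 for Sample A and 10^6 for Sample B (the exponents appear to have been flattened in the transcript); the dictionary layout is purely illustrative.

```python
# Total cell counts are normally unknown from sequencing; they are assumed here
# only to show why relative abundances cannot be compared as absolute amounts.
samples = {
    "A": {"total_cells": 1e8, "shigella_fraction": 0.25},
    "B": {"total_cells": 1e6, "shigella_fraction": 0.50},
}

absolute = {
    name: s["total_cells"] * s["shigella_fraction"] for name, s in samples.items()
}

for name in samples:
    print(name, samples[name]["shigella_fraction"], absolute[name])
```

Under these assumptions Sample A holds 2.5 × 10^7 Shigella cells and Sample B holds 5 × 10^5, so A has 50 times more Shigella even though its relative abundance is half that of B.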
  • 18. What do we mean by function? • General categories – Photosynthesis – Nitrogen metabolism – Glycolysis • Specific gene families – nifH – EC: 1.1.1.1 (alcohol dehydrogenase) – K00929 (butyrate kinase)
  • 19. Various Functional Databases • COG – Well known but original classification (not updated since 2003) • SEED – Used by the RAST and MG-RAST systems • PFAM – Focused more on protein domains • EggNOG – Very comprehensive (~190k groups) • UniRef – Has clustering at different levels (e.g. UniRef100, UniRef90, UniRef50) – Most comprehensive and is constantly updated • KEGG – Very popular, each entry is well annotated, and often linked into “Modules” or “Pathways” – Full access now requires a license fee • MetaCyc – Becoming more widely used. – More microbe focused than KEGG
  • 20. KEGG • We will focus on using the KEGG database during this workshop • KEGG Orthologs (KOs) – Most specific level; members are thought to be homologs that perform exactly the same “function” – ~12,000 KOs in the database – These can be linked into KEGG Modules and KEGG Pathways – Identifiers: K01803, K00231, etc.
  • 21. KEGG (cont.) • KEGG Modules – Manually defined functional units – Small groups of KOs that function together – ~750 KEGG Modules – Identifiers: M00002, M00011, etc.
  • 22. KEGG (cont.) • KEGG Pathways – Groups KOs into large pathways (~230) – Each pathway has a graphical map – Individual KOs or Modules can be highlighted within these maps – Pathways can be collapsed into very general functional terms (e.g. Amino Acid Metabolism, Carbohydrate Metabolism, etc.)
  • 23. Metagenomic Annotation Systems • Web-based – Provide functional and taxonomic analysis, plus host your data. – EBI Metagenomics Server – MG-RAST – IMG/M • GUI based – MEGAN • Taxonomy and functional annotation – CloVR • Virtual Machine based, contains SOP, hasn’t been updated recently • Command-line based – MetAMOS • Built in assembly, highly customizable, some features can be buggy – HUMAnN • Functional annotation – DIY • Set up your own in-house custom computational pipeline
  • 25. HUMAnN Step 1 • Reads are searched against a protein database (e.g. KEGG) – Can use BLASTX, but much faster methods are now available (e.g. BLAT, USEARCH, RapSearch2, DIAMOND; Buchfink et al., 2015)
  • 27. HUMAnN Step 2 • Normalize and weight search results • The relative abundance of each KO is calculated: – Number of reads mapping to a gene sequence in that KO – Weighted by the inverse p-value of each mapping – Normalized by the average length of the KO
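The three ingredients listed on this slide (read hits per KO, a per-hit weight derived from the p-value, and normalization by average KO length) can be combined as in the sketch below. This is not HUMAnN's code: the slide does not define "inverse p-value" precisely, so the weight `1 - p` used here is an assumption and is exposed as a replaceable parameter. The hits and KO lengths are made up for illustration.

```python
from collections import defaultdict


def ko_relative_abundance(hits, ko_mean_length, weight=lambda p: 1.0 - p):
    """Weight each hit, sum per KO, divide by mean KO length, scale to sum to 1.

    hits: iterable of (ko_id, p_value) pairs, one per read mapping.
    ko_mean_length: dict of ko_id -> average gene length in that KO.
    weight: maps a p-value to a hit weight (assumed to be 1 - p here).
    """
    weighted = defaultdict(float)
    for ko, p_value in hits:
        weighted[ko] += weight(p_value)
    per_length = {ko: total / ko_mean_length[ko] for ko, total in weighted.items()}
    grand_total = sum(per_length.values())
    return {ko: value / grand_total for ko, value in per_length.items()}


# Hypothetical search results: two reads hit K00001, one read hits K00002.
hits = [("K00001", 1e-10), ("K00001", 1e-12), ("K00002", 1e-8)]
ko_mean_length = {"K00001": 1000.0, "K00002": 500.0}

abundance = ko_relative_abundance(hits, ko_mean_length)
print(abundance)
```

K00001 receives twice as many reads but its genes are twice as long on average, so after length normalization the two KOs end up with nearly equal relative abundance.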
  • 29. HUMAnN Step 3 • Reduce number of pathways • A KO can map to one or more KEGG Pathways – Just because a KO is found in a pathway doesn’t mean that complete pathway exists in the community – If a pathway has 20 KOs and only 2 KOs are observed in the community (but at high abundances) what should be the abundance of the pathway? – MinPath (Ye, 2009) attempts to estimate the abundance of these pathways and remove spurious noise
  • 31. HUMAnN Step 4 • Reduce false positive pathways further and normalize by KO copy number • Using the organism information from the KEGG hits – Pathways that are not found to be in any of the observed organisms AND are made up mostly of KOs mapping to a different pathway are removed – KO abundance can be divided by the estimated copy number of that KO as observed from the KEGG organism database
  • 33. HUMAnN Step 5 • Smoothing pathways by gap filling – Insufficient sequencing depth or poor sequence searches could lead to some KOs within pathways being absent or in low abundance – KOs more than 1.5 interquartile ranges below the pathway median are raised to the pathway median
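The gap-filling rule on this slide can be sketched for a single pathway as follows. This is an illustration of the stated rule only, not HUMAnN's implementation; the quartile method (Python's default "exclusive" method in `statistics.quantiles`) and the KO abundances are assumptions.

```python
import statistics


def smooth_pathway(ko_abundance):
    """Raise KOs lying more than 1.5 IQR below the pathway median to the median."""
    values = list(ko_abundance.values())
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    cutoff = median - 1.5 * (q3 - q1)
    return {
        ko: (median if value < cutoff else value)
        for ko, value in ko_abundance.items()
    }


# Hypothetical pathway with six KOs, one of which was not observed at all.
pathway = {
    "K00001": 10.0,
    "K00002": 11.0,
    "K00003": 12.0,
    "K00004": 13.0,
    "K00005": 14.0,
    "K00006": 0.0,
}

smoothed = smooth_pathway(pathway)
print(smoothed)
```

Only the unobserved KO falls below the cutoff, so it is lifted to the pathway median while the other five values are left untouched.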
  • 35. What about assembly? • Assembly is often used in genomics to join raw reads into longer contigs and scaffolds [Figure: “Genome assembly stitches together a genome” (Michael Schatz, Cold Spring Harbor). Steps shown: 1. Fragment DNA and sequence; 2. Find overlaps between reads (…AGCCTAGACCTACAGGATGCGCGACACGT / GGATGCGCGACACGTCGCATATCCGGT…); 3. Assemble overlaps into contigs; 4. Assemble contigs into scaffolds]
  • 36. Assembly for Metagenomics? • Pros – Less computation time for similarity search (sequences are collapsed) – Can allow annotation when reads are too short (<100bp) – Can sometimes (partially) reconstruct genomes • Cons – Assembly is computationally intensive (high memory machines needed) – Collapsed reads must be added back to get relative abundances (not all assemblers do this natively) – Low read depth and high diversity can cause assemblers to fail – Reads are not all from the same genome so chimeras are possible – Some organisms/genes will assemble more easily (e.g. more abundant ones) which could lead to annotation bias
  • 37. What about gene calling? • In genomics, normally you would predict the start and stop positions of genes using a gene prediction program before annotating the genes • In metagenomics: – Pros: • May result in fewer false positives from annotating “non-real” genes • Lowers the number of similarity searches – Cons: • Computationally intensive • No good learning dataset • Raw reads will not cover an entire gene • Often requires assembled data – Possible tools: FragGeneScan, MetaGeneAnnotator – Alternative: Do 6-frame translation (e.g. BLASTX)
  • 38. Community Function Potential • It is important to remember that this is metagenomics, not metatranscriptomics, and not metaproteomics • These annotations suggest the functional potential of the community • The presence of these genes/functions does not mean that they are biologically active (e.g. may not be transcribed)
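  • 40. [Figure: workflow linking 16S rRNA gene data (QIIME) and shotgun metagenomics data (MetaPhlAn, HUMAnN) to taxonomic and functional tables, with PICRUSt and STAMP]
        Taxonomic table:
                  Sample 1  Sample 2  Sample 3
          OTU 1       4         0         2
          OTU 2       1         0         0
          OTU 3       2         4         2
        Functional table:
                  Sample 1  Sample 2  Sample 3
          K00001     20        15        18
          K00002      1         2         0
          K00003      4         5         4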
  • 41. PICRUSt • Phylogenetic Investigation of Communities by Reconstruction of Unobserved States • http://picrust.github.com
  • 42. PICRUSt: How does it work?
  • 43. Predicting the abundance of a single function [Figure legend: Known gene abundance; Ancestral gene abundance; Predicted gene abundance]
  • 44. Predicting the abundance of a single function [Figure legend: Known gene abundance; Ancestral gene abundance; Predicted gene abundance] • Repeat for each function (~8000X) • Repeat for all unknown tips (>100,000)
  • 45. PICRUSt: Predicting Metagenomes
        OTU Table:
                   S1   S2   S3
          12345    10    0    5
          67890     1    0    0
          66666     4    8    2
        PICRUSt 16S Predictions (16S Copy Number):
          12345     5
          67890     1
          66666     2
        Normalized OTU Table:
                   S1   S2   S3
          12345     2    0    1
          67890     1    0    0
          66666     2    4    1
  • 46. PICRUSt: Predicting Metagenomes
        OTU Table:
                   S1   S2   S3
          12345    10    0    5
          67890     1    0    0
          66666     4    8    2
        PICRUSt 16S Predictions (16S Copy Number):
          12345     5
          67890     1
          66666     2
        Normalized OTU Table:
                   S1   S2   S3
          12345     2    0    1
          67890     1    0    0
          66666     2    4    1
        PICRUSt KEGG Predictions:
                   K0001  K0002  K0003
          12345      4      0      2
          67890      1      0      0
          66666      2      4      2
        Metagenome Prediction:
                   S1   S2   S3
          K0001    13    8    6
          K0002     8   16    4
          K0003     8    8    4
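The numbers on this slide can be reproduced with a few lines of arithmetic: divide each OTU's counts by its predicted 16S copy number, then multiply by the predicted KO counts per OTU and sum over OTUs. The sketch below only mirrors the worked example on the slide; it is not PICRUSt's code, and the variable names are chosen for illustration.

```python
sample_names = ["S1", "S2", "S3"]

# OTU table from the slide: OTU id -> counts in S1, S2, S3.
otu_table = {"12345": [10, 0, 5], "67890": [1, 0, 0], "66666": [4, 8, 2]}

# Predicted 16S copy number per OTU.
copy_number_16s = {"12345": 5, "67890": 1, "66666": 2}

# Predicted KO copy number per OTU genome.
ko_per_otu = {
    "12345": {"K0001": 4, "K0002": 0, "K0003": 2},
    "67890": {"K0001": 1, "K0002": 0, "K0003": 0},
    "66666": {"K0001": 2, "K0002": 4, "K0003": 2},
}

# Step 1: normalize OTU counts by 16S copy number.
normalized = {
    otu: [count / copy_number_16s[otu] for count in counts]
    for otu, counts in otu_table.items()
}

# Step 2: sum (normalized OTU abundance x KO copies) over OTUs, per sample.
ko_ids = ["K0001", "K0002", "K0003"]
metagenome = {
    ko: [
        sum(normalized[otu][s] * ko_per_otu[otu][ko] for otu in otu_table)
        for s in range(len(sample_names))
    ]
    for ko in ko_ids
}

for ko in ko_ids:
    print(ko, metagenome[ko])
```

For example, K0001 in S1 is 2 × 4 + 1 × 1 + 2 × 2 = 13, which matches the Metagenome Prediction table on the slide.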
  • 47. PICRUSt predictions across body sites (Langille et al., 2013, Nature Biotechnology)
  • 52. Visualization and Statistics • Various tools are available to determine statistically significant taxonomic differences across groups of samples – Excel – SigmaPlot – Past – R (many libraries) – Python (matplotlib) – STAMP
  • 53. STAMP
  • 56. STAMP • Input 1. “Profile file”: Table of features (samples by OTUs, samples by functions, etc.) • Features can form a hierarchy (e.g. Phylum, Class, Order, etc.) to allow data to be collapsed within the program 2. “Group file”: Contains different metadata for grouping samples • Can be two groups (e.g. Healthy vs Sick) or multiple groups (e.g. water depth at 2 m, 4 m, and 6 m) • Output – PCA, heatmap, box, and bar plots – Tables of significantly different features
  • 58. Microbiome Helper • Standard Operating Procedures (SOPs) – 16S – Shotgun Metagenomics • Scripts to wrap and integrate existing tools – Available as an Ubuntu Virtualbox • Tutorials/Walkthroughs • https://github.com/mlangill/microbiome_helper/wiki
  • 59. IMR: Integrated Microbiome Resource • Offers sequencing and bioinformatics for microbiome projects (http://cgeb-imr.ca)