Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
0.1 Halorhodospira halophila SL1 Buchnera aphidicol...
ORF00955 gi73748443 gi57234592 gi86609871 gi8660512...
Search with HMMs
What Else Needed for This to Be Better?
What Else Needed for This to Be Better? • More Phyl...
Zorro - Automated Masking cetoTrueTree 0.0 1.0 2.0 ...
Representative Genomes Extract Protein Annotation A...
Kembel Combiner typically used as a qualitative mea...
Phylosift - Mining the Global Metagenome Jonathan E...
Output 1: Taxonomy Taxonomic summary plots in Krona...
Tree Reconciliation in PhyloSift
Tree Reconciliation in PhyloSift Environmental Sequ...
Output 2: Phylogenetic Tree of Reads Placement tree...
Edge PCA: Identify lineages that explain most varia...
QIIME and Edge PCA on 110 fecal metagenomes from Ya...
Output 3: Edge PCA " Edge PCA for exploratory data ...
Phylosift/ pplacer Workflow Aaron Darling, Guillaum...
Phylosift Markers • PMPROK – Dongying Wu’s Bac/Arch...
More Markers Phylogenetic group Genome Number Gene ...
Better Reference Tree Lang JM, Darling AE, Eisen JA...
Stalking the Fourth Domain in Metagenomic Data: Sea...
Figure 1. Phylogenetic tree of the RecA superfamily...
Five RecA subfamilies were identified as being nove...
Methods Identification of deeply-branching ss-rRNA ...
Slides for Lecture 16 in EVE 161 Course by Jonathan Eisen at UC Davis

UC Davis EVE161 Lecture 16 by @phylogenomics

  1. Lecture 16: EVE 161: Microbial Phylogenomics. Lecture #16: Era IV: Shotgun Metagenomics. UC Davis, Winter 2014. Instructor: Jonathan Eisen
  2. Where we are going and where we have been • Previous lecture: 15: Era IV: Shotgun Metagenomics • Current lecture: 16: Era IV: Metagenomic Diversity and Binning • Next lecture: 17: Era IV: Metagenomic Case Study
  3. Era IV: Genomes in the environment. Era IV: Diversity from Metagenomics
  4. Era IV: Genomes in the environment. A Very Simple System
  5. Pierce’s Disease
  6. Glassy-winged sharpshooter • Obligate xylem feeder • Transmits Xylella between plants • Much like mosquitoes transmit the malarial pathogen • Only animal listed as possible “bioterror” agent by US DHS. [Brochure: “Glassy-Winged Sharpshooter: A Serious Threat to California Agriculture”, from the University of California’s Pierce’s Disease Research and Emergency Response Task Force.] This informational brochure was produced by ANR Communication Services for the University of California Pierce’s Disease Research and Emergency Response Task Force. You may download a copy of the brochure from the Division of Agriculture and Natural Resources web site at http://danr.ucop.edu or from the Communication Services web site at http://danrcs.ucdavis.edu. For local information, contact your UC Cooperative Extension farm advisor. [Figure: Glassy-winged Sharpshooter Generalized Lifecycle; adults and egg masses, scale 0 to 100, Jan. to Nov.] Glassy-winged sharpshooters overwinter as adults and begin laying egg masses in late February through May. This first generation matures as adults in late May through late August. Second-generation egg masses are laid starting in mid-June through late September, which develop into overwintering adults.
  7. Xylem feeding insects also very successful
  8. Xylem and Phloem. From Lodish et al. 2000
  9. Animal nutrition • Xylem is frequently missing essential amino acids, vitamins and cofactors, and has only small amounts of carbon skeletons
  10. Plant response to sap feeders • Possible solutions to no amino acids, vitamins, etc. in xylem – Eat other things – Evolve metabolic pathways to synthesize missing nutrients – Find some poor sap to make the stuff for you
  11. Moran N. A. PNAS 2007;104:8627-8633 ©2007 by National Academy of Sciences. Sharpshooter: Cuerna sayi. Symbionts in their bacteriomes (Moran et al. 2003 Environ. Microbiol.; Moran et al. 2005 Appl. Environ. Microbiol.): Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria); Candidatus “Sulcia muelleri” (Bacteroidetes). Photo: D Takiya. [Micrograph, scale bar 10 µm: “Candidatus Baumannia cicadellinicola” (Gammaproteobacteria) in “red” portion of bacteriome of Homalodisca vitripennis. N = host nucleus; B = bacteriocyte membrane; E = Endosym... Irregu... ~2 µm]
  12. PCR and phylogenetic analysis of rRNA genes: DNA extraction; PCR (makes lots of copies of the rRNA genes in sample); sequence rRNA genes; sequence alignment = data matrix; phylogenetic tree. rRNA1 5’ ...TACAGTATAGGTG GAGCTAGCGATCGAT CGA... 3’. [Figure: data matrix (A T T A G A A C A T C A C A A C A G G A G T T C) and tree for rRNA1, Yeast, E. coli, and Humans.] Moran et al. 2003 Environ. Microbiol.; Moran et al. 2005 Appl. Environ. Microbiol. [Partly visible inset: sharpshooter bacteriomes; photo D Takiya.]
  13. Baumannia is a close relative of Buchnera symbionts of aphids. [Tree labels: Sharpshooters, Aphids, Aphids, Aphids, Ants, Flies.] Wu et al. 2006 PLoS Biology 4: e188.
  14. Baumannia is a close relative of Buchnera symbionts of aphids. [Tree labels: Sharpshooters, Aphids, Aphids, Aphids, Ants, Flies.] Wu et al. 2006 PLoS Biology 4: e188.
  15. [no slide text]
  16. [no slide text]
  17. Sharpshooter Shotgun Sequencing. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
  18. [no slide text]
  19. [no slide text]
  20. [no slide text]
  21. Higher Evolutionary Rates in Endosymbionts. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
  22. Variation in Evolution Rates Correlated with Repair Gene Presence. [Figure annotation: MutS MutL + + + + + + + + _ _ _ _] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
  23. Polymorphisms in Metapopulation • Data from ~200 hosts – 104 SNPs – 2 indels • PCR surveys show that this is between-host variation • Much lower ratio of transitions:transversions than in Blochmannia • Consistent with absence of MMR (mismatch repair) from Blochmannia
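
The transitions:transversions ratio on slide 23 can be computed from a list of SNPs. The following is a minimal sketch, not the analysis used in Wu et al. 2006; the function names and the example SNP list are hypothetical and chosen only for illustration.

```python
# Transitions swap purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True if the substitution ref -> alt is a transition."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


# Hypothetical SNP calls (reference base, variant base); not lecture data.
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
ratio = ts_tv_ratio(example_snps)
print(f"Ts/Tv = {ratio:.2f}")
```

A genome that has lost mismatch repair is expected to show a shifted ratio, which is the comparison the slide draws with Blochmannia.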
  24. Predict metabolic networks from Genome. Wu et al. 2006 PLoS Biology 4: e188.
  25. Baumannia is a Vitamin and Cofactor Producing Machine. Wu et al. 2006 PLoS Biology 4: e188.
  26. Baumannia is a Vitamin and Cofactor Producing Machine. Wu et al. 2006 PLoS Biology 4: e188. No pathways for essential amino acid synthesis
  27. [no slide text]
  28. [no slide text]
  29. ???????
  30. Commonly Used Binning Methods Did Not Work Well • Assembly – Only Baumannia generated good contigs • Depth of coverage – Everything else 0-1X coverage • Nucleotide composition – No detectable peaks in any vector we looked at
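
Nucleotide-composition binning, as named on slide 30, compares per-sequence composition vectors such as GC content or k-mer frequencies. The sketch below shows one common way to build such vectors; it is an illustration under assumed inputs, not the procedure the authors ran, and the read names and sequences are made up.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G+C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_frequencies(seq, k=4):
    """Frequency vector over all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)  # ignores k-mers containing N etc.
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Hypothetical reads; real binning would use assembled contigs or long reads.
reads = {"read1": "ATATATATTTAAATAT", "read2": "GCGCGGCCGCGGGCCC"}
composition = {name: (gc_content(seq), kmer_frequencies(seq, k=2))
               for name, seq in reads.items()}
for name, (gc, _) in composition.items():
    print(name, f"GC = {gc:.2f}")
```

Sequences from the same genome tend to cluster in this composition space. With short reads at 0-1X coverage the vectors are noisy, which is consistent with the slide's remark that no peaks were detectable.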
  31. Binning challenge. [Figure labels: A B C D E F G; T U V W X Y Z.]
  32. Binning challenge. [Figure labels: A B C D E F G; T U V W X Y Z.] Best binning method: reference genomes
  33. Binning challenge. [Figure labels: A B C D E F G; T U V W X Y Z.] No reference genome? What do you do? Phylogeny ....
  34. PCR and phylogenetic analysis of rRNA genes: DNA extraction; PCR (makes lots of copies of the rRNA genes in sample); sequence rRNA genes; sequence alignment = data matrix; phylogenetic tree. rRNA1 5’ ...ACACACATAGGTG GAGCTAGCGATCGAT CGA... 3’. rRNA2 5’ ...TACAGTATAGGTG GAGCTAGCGATCGAT CGA... 3’. [Figure: data matrix (A T T A G A A C A T C A C A A C A G G A G T T C) and tree for rRNA1, rRNA2, E. coli, and Humans.] Moran et al. 2003 Environ. Microbiol.; Moran et al. 2005 Appl. Environ. Microbiol. [Partly visible inset: sharpshooter bacteriomes; photo D Takiya.]
  35. [no slide text]
  36. Wu et al. 2006 PLoS Biology 4: e188.
  37. [no slide text]
  38. ???????
  39. CFB Phyla. Wu et al. 2006 PLoS Biology 4: e188.
  40. Sulcia makes essential amino acids
  41. Sulcia makes essential amino acids. Essential amino acid producing machine
  42. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors. Sulcia makes amino acids
  43. Era IV: Genomes in the environment. Phylogeny & Metagenomes
  44. • Why analyze protein coding “marker” genes in metagenomic data?
  45. AMPHORA: AutoMated PHylogenOmic infeRence. Wu and Eisen, 2008. Genome Biology 2008, 9:R151. doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
  46. Choosing Marker Genes • How to choose the marker genes to use?
  47. Choosing Marker Genes • How to choose the marker genes to use? • “We selected these proteins because they are universally distributed in bacteria; the vast majority of them exist as single copy genes within each genome; and they are housekeeping genes that are involved in information processing (replication, transcription, and translation) or central metabolism, and thus are thought to be relatively recalcitrant to lateral gene transfer [19].” Wu and Eisen, 2008. Genome Biology 2008, 9:R151. doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
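
Two of the criteria quoted on slide 47 (universal distribution, mostly single copy) can be checked mechanically from a table of gene-family copy numbers per genome. The sketch below is a hypothetical illustration, not AMPHORA's actual selection code: the function name, the 0.95 threshold, and the example table are all assumptions. The third criterion (housekeeping function, resistance to lateral gene transfer) needs curation and is not modeled here.

```python
def select_marker_families(copy_counts, min_single_copy_fraction=0.95):
    """Pick gene families present in every genome and single copy in most.

    copy_counts maps family name -> {genome name: number of copies}.
    The threshold is an assumed stand-in for "the vast majority".
    """
    genomes = set()
    for per_genome in copy_counts.values():
        genomes.update(per_genome)
    if not genomes:
        return []

    markers = []
    for family, per_genome in sorted(copy_counts.items()):
        universal = all(per_genome.get(g, 0) >= 1 for g in genomes)
        single = sum(1 for g in genomes if per_genome.get(g, 0) == 1)
        if universal and single / len(genomes) >= min_single_copy_fraction:
            markers.append(family)
    return markers


# Hypothetical copy-number table for three genomes; not real annotation data.
example_counts = {
    "rpoB": {"genome1": 1, "genome2": 1, "genome3": 1},  # universal, single copy
    "tufA": {"genome1": 1, "genome2": 2, "genome3": 1},  # universal, one duplicate
    "nifH": {"genome1": 1},                              # not universal
}
strict = select_marker_families(example_counts)
relaxed = select_marker_families(example_counts, min_single_copy_fraction=0.6)
print(strict, relaxed)
```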
  48. Identification and alignment of metagenomic reads? • How to identify and rapidly align reads that correspond to each of the markers selected?
  49. Identification and alignment of metagenomic reads? • How to identify and rapidly align reads that correspond to each of the markers selected? • Prealign representatives from genomes • Manually curate to build a “mask” • Create a “hidden Markov model” of the alignment • Use the HMM to search for, and align, reads from metagenomes • For more on HMMs see “What is a hidden Markov model?” by Sean Eddy. http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html Wu and Eisen, 2008. Genome Biology 2008, 9:R151. doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
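
The steps on slide 49 (prealign, mask, build a model, search) can be illustrated with a much simpler stand-in for a profile HMM: an ungapped position-specific log-odds profile. This is a sketch under stated assumptions, not AMPHORA or HMMER. A real profile HMM also has insert and delete states; the toy alignment, the mask, the uniform background, and the function names here are hypothetical. The query is assumed to be a peptide already translated from a read.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, mask, pseudocount=1.0):
    """Log-odds scores for each alignment column kept by the mask.

    alignment: equal-length aligned peptide strings ('-' for gaps).
    mask: one boolean per column; False columns are dropped.
    """
    background = 1.0 / len(ALPHABET)  # assumed uniform background
    profile = []
    for col, keep in enumerate(mask):
        if not keep:
            continue
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_hit(profile, peptide):
    """Best ungapped match as (score, offset), or None if peptide is too short."""
    width = len(profile)
    best = None
    for start in range(len(peptide) - width + 1):
        window = peptide[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy prealigned representatives and a curated mask (last column excluded).
alignment = ["MKVLA", "MKILA", "MRVLG"]
mask = [True, True, True, True, False]
profile = build_profile(alignment, mask)
hit = best_hit(profile, "GGMKVLTT")
print(hit)
```

Dropping unreliable columns with the mask before building the model mirrors the manual curation step on the slide. Tools such as HMMER build and search full profile HMMs and are what one would use in practice.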
  50. HMMs and Profiles • Useful guide here http://compbio.soe.ucsc.edu/ismb99.handouts/KK185FP.html [Figure: Profile HMM]
  51. [no slide text]
  52. 0.2 Kineococcus radiotolerans SRS30216 Nitrosomonas eutropha C91 Flavobacterium psychrophilum JIP02 86 Saccharopolyspora erythraea NRRL 2338 Magnetospirillum magneticum AMB 1 Bacillus subtilis subsp subtilis str 168 Acidobacteria bacterium Ellin345 Rhodopirellula baltica SH 1 Mycobacterium bovis BCG str Pasteur 1173P2 Escherichia coli CFT073 Bacillus thuringiensis serovar konkukian str 97 27 Synechococcus sp CC9902 Leptospira interrogans serovar Lai str 56601 Rickettsia conorii str Malish 7 Helicobacter pylori J99 Shigella flexneri 5 str 8401 Bordetella pertussis Tohama I Mycobacterium sp JLS Colwellia psychrerythraea 34H Lawsonia intracellularis PHE MN1 00 Prochlorococcus marinus str MIT 9303 Streptomyces coelicolor A3 2 Lactobacillus helveticus DPC 4571 Yersinia pestis Nepal516 Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis Haemophilus influenzae 86 028NP Campylobacter jejuni subsp jejuni 81116 Burkholderia pseudomallei K96243 Borrelia burgdorferi B31 Serratia proteamaculans 568 Tropheryma whipplei TW08 27 Clostridium perfringens SM101 Anaplasma phagocytophilum HZ Psychrobacter arcticus 273 4 Fusobacterium nucleatum subsp nucleatum ATCC 25586 Rickettsia bellii OSU 85 389 Dehalococcoides ethenogenes 195 Buchnera aphidicola str Cc Cinara cedri Candidatus Ruthia magnifica str Cm Calyptogena magnifica Chlamydophila caviae GPIC Roseobacter denitrificans OCh 114 Shewanella denitrificans OS217 Buchnera aphidicola str APS Acyrthosiphon pisum Geobacillus thermodenitrificans NG80 2 Parabacteroides distasonis ATCC 8503 Corynebacterium jeikeium K411 Yersinia pseudotuberculosis IP 32953 Francisella tularensis subsp tularensis SCHU S4 Streptococcus pyogenes MGAS315 Zymomonas mobilis subsp mobilis ZM4 Bacillus anthracis str Ames Shewanella frigidimarina NCIMB 400 Streptococcus gordonii str Challis substr CH1 Thiomicrospira crunogena XCL 2 Streptococcus pyogenes str Manfredo 
Pseudomonas putida KT2440 Mycobacterium leprae TN Staphylococcus saprophyticus subsp saprophyticus ATCC 15305 Shewanella sp MR 7 Rhodopseudomonas palustris HaA2 Geobacter sulfurreducens PCA Lactococcus lactis subsp cremoris SK11 Nitrosococcus oceani ATCC 19707 Rhodopseudomonas palustris BisB18 Synechococcus elongatus PCC 6301 Pelotomaculum thermopropionicum SI Synechocystis sp PCC 6803 Sorangium cellulosum So ce 56 Wolinella succinogenes DSM 1740 Moorella thermoacetica ATCC 39073 Gluconobacter oxydans 621H Mycoplasma capricolum subsp capricolum ATCC 27343 Actinobacillus succinogenes 130Z Baumannia cicadellinicola str Hc Homalodisca coagulata Pseudomonas syringae pv syringae B728a Hahella chejuensis KCTC 2396 Gramella forsetii KT0803 Brucella abortus biovar 1 str 9 941 Ochrobactrum anthropi ATCC 49188 Campylobacter fetus subsp fetus 82 40 Burkholderia mallei ATCC 23344 Anaplasma marginale str St Maries Frankia sp CcI3 Dehalococcoides sp BAV1 Mycoplasma genitalium G37 Mycoplasma hyopneumoniae 7448 Synechococcus sp JA 3 3Ab Parvibaculum lavamentivorans DS 1 Lactobacillus johnsonii NCC 533 Onion yellows phytoplasma OY M Mycoplasma mycoides subsp mycoides SC str PG1 Desulfococcus oleovorans Hxd3 Escherichia coli 536 Methylobacterium extorquens PA1 Nitrosomonas europaea ATCC 19718 Campylobacter curvus 525 92 Nitrosospira multiformis ATCC 25196 Pseudomonas aeruginosa PAO1 Burkholderia multivorans ATCC 17616 Chlamydophila abortus S26 3 Shewanella loihica PV 4 Candidatus Blochmannia pennsylvanicus str BPEN Xylella fastidiosa 9a5c Chlamydophila felis Fe C 56 Caulobacter crescentus CB15 Orientia tsutsugamushi Boryong Leptospira borgpetersenii serovar Hardjo bovis JB197 Rhodopseudomonas palustris BisA53 Marinomonas sp MWYL1 Chlamydophila pneumoniae TW 183 Brucella suis 1330 Renibacterium salmoninarum ATCC 33209 Citrobacter koseri ATCC BAA 895 Mycobacterium bovis AF2122 97 Lactococcus lactis subsp lactis Il1403 Campylobacter hominis ATCC BAA 381 Polynucleobacter sp QLW P1DMWA 1 
Coxiella burnetii RSA 331 Bifidobacterium longum NCC2705 Rickettsia rickettsii str Iowa Bacillus cereus ATCC 10987 Synechococcus sp CC9605 Burkholderia xenovorans LB400 Prosthecochloris vibrioformis DSM 265 Streptococcus pyogenes MGAS10270 Listeria monocytogenes EGD e Burkholderia cenocepacia AU 1054 Trichodesmium erythraeum IMS101 Streptococcus thermophilus CNRZ1066 Haemophilus somnus 129PT Bacteroides fragilis YCH46 Streptococcus pyogenes SSI 1 Shewanella sp MR 4 Chlamydophila pneumoniae CWL029 Myxococcus xanthus DK 1622 Pelobacter carbinolicus DSM 2380 Actinobacillus pleuropneumoniae serovar 3 str JL03 Rickettsia prowazekii str Madrid E Listeria welshimeri serovar 6b str SLCC5334 Salmonella enterica subsp enterica serovar Paratyphi B str SPB7 Bartonella henselae str Houston 1 Shewanella oneidensis MR 1 Rubrobacter xylanophilus DSM 9941 Marinobacter aquaeolei VT8 Neisseria meningitidis FAM18 Acidovorax avenae subsp citrulli AAC00 1 Staphylococcus aureus subsp aureus N315 Prochlorococcus marinus str MIT 9301 Oenococcus oeni PSU 1 Clostridium botulinum A str ATCC 19397 Streptococcus sanguinis SK36 Burkholderia pseudomallei 1710b Bradyrhizobium sp BTAi1 Helicobacter pylori HPAG1 Staphylococcus aureus subsp aureus MW2 Pseudomonas aeruginosa PA7 Arthrobacter aurescens TC1 Candidatus Sulcia muelleri GWSS Propionibacterium acnes KPA171202 Clostridium botulinum A str Hall Psychrobacter cryohalolentis K5 Ralstonia solanacearum GMI1000 Polaromonas naphthalenivorans CJ2 Mycoplasma synoviae 53 Neisseria meningitidis MC58 Campylobacter jejuni subsp jejuni 81 176 Lactobacillus reuteri F275 Prochlorococcus marinus subsp marinus str CCMP1375 Rickettsia canadensis str McKiel Lactococcus lactis subsp cremoris MG1363 Prochlorococcus marinus str MIT 9515 Campylobacter concisus 13826 Bifidobacterium adolescentis ATCC 15703 Lactobacillus gasseri ATCC 33323 Pseudomonas putida F1 Borrelia garinii PBi Ralstonia metallidurans CH34 Alcanivorax borkumensis SK2 Streptococcus thermophilus LMG 
18311 Bacillus thuringiensis str Al Hakam Herpetosiphon aurantiacus ATCC 23779 Azoarcus sp EbN1 Corynebacterium glutamicum R Bacteroides thetaiotaomicron VPI 5482 Bordetella bronchiseptica RB50 Synechococcus sp JA 2 3B a 2 13 Xanthomonas oryzae pv oryzae MAFF 311018 Treponema denticola ATCC 35405 Carboxydothermus hydrogenoformans Z 2901 Campylobacter jejuni RM1221 Mycoplasma pulmonis UAB CTIP Candidatus Protochlamydia amoebophila UWE25 Lactobacillus sakei subsp sakei 23K Aeromonas hydrophila subsp hydrophila ATCC 7966 Shewanella sp W3 18 1 Thermus thermophilus HB8 Pseudomonas syringae pv phaseolicola 1448A Rickettsia massiliae MTU5 Helicobacter hepaticus ATCC 51449 Mesoplasma florum L1 Mycoplasma hyopneumoniae 232 Bacillus cereus E33L Staphylococcus aureus subsp aureus MSSA476 Azoarcus sp BH72 Silicibacter pomeroyi DSS 3 Campylobacter jejuni subsp doylei 269 97 Borrelia afzelii PKo Ehrlichia canis str Jake Bordetella petrii Desulfotomaculum reducens MI 1 Gloeobacter violaceus PCC 7421 Escherichia coli O157 H7 EDL933 Staphylococcus aureus subsp aureus JH9 Nitrobacter hamburgensis X14 Rickettsia typhi str Wilmington Novosphingobium aromaticivorans DSM 12444 Staphylococcus aureus subsp aureus USA300 TCH1516 Clostridium beijerinckii NCIMB 8052 Bacillus clausii KSM K16 Xanthomonas oryzae pv oryzae KACC10331 Clostridium botulinum F str Langeland Lactobacillus salivarius subsp salivarius UCC118 Pediococcus pentosaceus ATCC 25745 Burkholderia mallei NCTC 10229 Francisella tularensis subsp tularensis FSC198 Symbiobacterium thermophilum IAM 14863 Pasteurella multocida subsp multocida str Pm70 Bacillus weihenstephanensis KBAB4 Sphingopyxis alaskensis RB2256 Chlamydia trachomatis A HAR 13 Solibacter usitatus Ellin6076 Delftia acidovorans SPH 1 Sinorhizobium medicae WSM419 Psychrobacter sp PRwf 1 Candidatus Pelagibacter ubique HTCC1062 Shewanella baltica OS195 Synechococcus sp CC9311 Aster yellows witches broom phytoplasma AYWB Salinispora arenicola CNS 205 Dinoroseobacter 
[Figure: genome tree; the remaining leaf labels (sequenced bacterial strains) are omitted. Phylum legend: Gammaproteobacteria, Betaproteobacteria, Alphaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Acidobacteria, Bacteroidetes/Chlorobi, Chlamydiae/Planctomycetes, Spirochaetes, Firmicutes, Cyanobacteria, Chloroflexi, Actinobacteria, Aquificae, Thermotogae, Deinococcus/Thermus, Fusobacteria]
  54. [Figure: two phylogenetic trees of Gammaproteobacteria with node support values (panels A and B; panel labels "Proteins" and "rRNA"; scale bars 0.1 and 0.07). Taxa include the insect endosymbionts Baumannia cicadellinicola str. Hc (Homalodisca coagulata), Buchnera aphidicola (strains APS, Sg, Bp and Cc), Wigglesworthia glossinidia, Candidatus Blochmannia floridanus and Candidatus Blochmannia pennsylvanicus str. BPEN, alongside other genomes such as Escherichia coli O157:H7 str. Sakai, Sodalis glossinidius str. 'morsitans' and Xylella fastidiosa Temecula1.]
  55. [Figure: phylogenetic tree. Leaf labels include ORF00955, ZAVAM73TF and gi accession numbers, with reference taxa Thermomicrobium roseum DSM 5159, Dehalococcoides (sp. CBDB1, ethenogenes 195) and Cyanobacteria (Synechococcus, Prochlorococcus, Nostoc sp. PCC 7120, Anabaena variabilis ATCC 29413, Trichodesmium erythraeum IMS101, Synechocystis sp. PCC 6803, Thermosynechococcus elongatus BP 1, Gloeobacter violaceus PCC 7421).]
  56. Search with HMMs
  62. What Else Is Needed for This to Be Better?
• More Phylogenetically Diverse Genomes
• More Markers, esp. Taxon-Specific Markers
• Better / Faster Searching
• Longer reads
• Many sequences at once
• Many genes at once
• Alpha and Beta Diversity
• Automated Masking
  63. Zorro - Automated Masking
[Figure: distance to true tree (y-axis, 0.0 to 9.0) versus sequence length (x-axis: 200, 400, 800, 1600, 3200) for three treatments: no masking, zorro, gblocks]
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
  64. Improving II: More Families
[Figure 1 of Sharpton et al. 2013, panels A to C. Flowchart labels: Representative Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); SFams; Align & Build HMMs; HMMs; Screen for Homologs; New Genomes; Extract Protein Annotation]
  65. Kembel Combiner
… typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a …
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
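The two measures in the FIG. 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the UniFrac software: the `Branch` class, the function names and the toy tree are hypothetical, and the weighted version is the raw (unnormalized) sum of branch length times the absolute difference in relative abundance, which is an assumption about which variant is meant. Each branch is assumed to already carry the number of square and circle sequences that descend from it.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    length: float  # branch length
    n_square: int  # "square" sequences descending from this branch
    n_circle: int  # "circle" sequences descending from this branch


def unweighted_unifrac(branches):
    """Fraction of branch length leading to only one of the two environments."""
    unique = sum(b.length for b in branches if (b.n_square > 0) != (b.n_circle > 0))
    total = sum(b.length for b in branches if b.n_square > 0 or b.n_circle > 0)
    return unique / total if total else 0.0


def weighted_unifrac(branches, total_square, total_circle):
    """Raw weighted UniFrac: sum of length * |relative abundance difference|."""
    if total_square <= 0 or total_circle <= 0:
        raise ValueError("each environment needs at least one sequence")
    return sum(
        b.length * abs(b.n_square / total_square - b.n_circle / total_circle)
        for b in branches
    )


# Hypothetical toy tree with 2 square and 4 circle sequences, so each square
# sequence counts twice as much as each circle sequence after normalization.
TOY_TREE = [
    Branch(1.0, 1, 0),  # leaf: square
    Branch(1.0, 0, 1),  # leaf: circle
    Branch(1.0, 0, 1),  # leaf: circle
    Branch(0.5, 1, 2),  # internal branch above the three leaves listed so far
    Branch(1.0, 1, 0),  # leaf: square
    Branch(1.0, 0, 1),  # leaf: circle
    Branch(1.0, 0, 1),  # leaf: circle
    Branch(0.5, 1, 2),  # internal branch above the last three leaves
]

if __name__ == "__main__":
    print("unweighted:", unweighted_unifrac(TOY_TREE))
    print("weighted:", weighted_unifrac(TOY_TREE, total_square=2, total_circle=4))
```

In the toy tree the two internal branches each lead to one square and two circle sequences, so they count as shared in the unweighted measure and contribute nothing to the weighted one, as described for Branch 3 in the caption.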
  66. PhyloSift - Mining the Global Metagenome
Jonathan Eisen, Erick Matsen (FHCRC), Todd Treangen (BNBI, NBACC), Holly Bik, Tiffanie Nelson, Mark Brown, Aaron Darling, Guillaume Jospin
Students and other staff:
- Eric Lowe, John Zhang, David Coil
Open source community:
- BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc.
PhyloSift is open source software:
- http://phylosift.wordpress.org
- http://github.com/gjospin/phylosift
Supported by DHS Grant
  67. Output 1: Taxonomy
Taxonomic summary plots in Krona (Ondov et al. 2011)
  69. Tree Reconciliation in PhyloSift
[Figure labels: Environmental Sequences, Named Taxa]
  70. Output 2: Phylogenetic Tree of Reads
Placement tree from 2-week-old infant gut data
  71. Output: Edge PCA
Edge PCA: Identify lineages that explain most variation among samples
Matsen and Evans 2013; Darling et al., submitted.
  72. Edge PCA vs. UniFrac PCA
QIIME and Edge PCA on 110 fecal metagenomes from Yatsunenko et al. 2012, Nature, sequenced with 454 to about 150 Mbp per metagenome (Darling et al., submitted).
Edge PCA: Matsen and Evans 2013
  73. Output 3: Edge PCA
• Edge PCA for exploratory data analysis (Matsen and Evans 2013)
• Given E edges and S samples:
  − For each edge, calculate the difference in placement mass between the two sides of the edge
  − Results in E x S matrix
  − Calculate E x E covariance matrix
  − Calculate eigenvectors, eigenvalues of covariance matrix
• Eigenvector: each value indicates how “important” an edge is in explaining differences among the S samples
Example calculating a matrix entry for an edge: with mass=5 on one side and mass=2 on the other, this edge gets 5-2=3
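The Edge PCA steps listed on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not pplacer's implementation: the function names and the toy masses are hypothetical, raw masses are used as in the slide's 5-2=3 example, and power iteration stands in for a full eigendecomposition, so only the leading eigenvector is recovered.

```python
import math


def edge_difference_matrix(masses):
    """masses[e][s] = (mass on one side of edge e, mass on the other side) in sample s.

    Returns the E x S matrix of differences.
    """
    return [[side_a - side_b for side_a, side_b in row] for row in masses]


def covariance(matrix):
    """E x E sample covariance between the rows (edges) of an E x S matrix."""
    n_samples = len(matrix[0])
    centered = []
    for row in matrix:
        mean = sum(row) / n_samples
        centered.append([x - mean for x in row])
    return [
        [sum(x * y for x, y in zip(r1, r2)) / (n_samples - 1) for r2 in centered]
        for r1 in centered
    ]


def leading_eigenpair(cov, iterations=200):
    """Power iteration for the largest eigenvalue and its eigenvector."""
    vec = [1.0 / math.sqrt(len(cov))] * len(cov)
    for _ in range(iterations):
        product = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break  # covariance is all zeros; keep the current vector
        vec = [p / norm for p in product]
    eigenvalue = sum(
        v * sum(c * u for c, u in zip(row, vec)) for v, row in zip(vec, cov)
    )
    return eigenvalue, vec


# Hypothetical toy data: 3 edges x 4 samples, total placement mass 7 per sample.
TOY_MASSES = [
    [(5, 2), (5, 2), (2, 5), (2, 5)],  # edge 0: separates samples 0-1 from 2-3
    [(4, 3), (3, 4), (4, 3), (3, 4)],  # edge 1: small alternating differences
    [(3.5, 3.5)] * 4,                  # edge 2: balanced in every sample
]

if __name__ == "__main__":
    diffs = edge_difference_matrix(TOY_MASSES)  # first entry is 5 - 2 = 3
    value, vector = leading_eigenpair(covariance(diffs))
    print("leading eigenvalue:", value)
    print("edge loadings:", vector)
```

In the toy data almost all of the loading falls on edge 0, the edge whose mass difference flips sign between the two groups of samples, which is the sense in which an eigenvector entry marks an edge as "important".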
  74. PhyloSift / pplacer Workflow
Aaron Darling, Guillaume Jospin, Holly Bik, Erick Matsen, Eric Lowe, and others
[Workflow diagram; recoverable labels:]
• Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow); parallel option
• LAST: fast candidate search (search input against references)
• hmmalign: multiple alignment (profile HMMs used to align candidates to reference alignment)
• Infernal: multiple alignment (rRNA workflow; branch labels <600 bp and >600 bp)
• pplacer: phylogenetic placement
• Taxonomic Summaries: Krona plots, number of reads placed for each marker gene
• Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests
  75. PhyloSift Markers
• PMPROK – Dongying Wu’s Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria – protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families
  76. More Markers

Phylogenetic group       Genome Number   Gene Number   Marker Candidates
Archaea                  62              145415        106
Actinobacteria           63              267783        136
Alphaproteobacteria      94              347287        121
Betaproteobacteria       56              266362        311
Gammaproteobacteria      126             483632        118
Deltaproteobacteria      25              102115        206
Epsilonproteobacteria    18              33416         455
Bacteroidetes            25              71531         286
Chlamydiae               13              13823         560
Chloroflexi              10              33577         323
Cyanobacteria            36              124080        590
Firmicutes               106             312309        87
Spirochaetes             18              38832         176
Thermi                   5               14160         974
Thermotogae              9               17037         684

Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
  77. Better Reference Tree
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510
  79. Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees
Dongying Wu (1), Martin Wu (1,4), Aaron Halpern (2,3), Douglas B. Rusch (2,3), Shibu Yooseph (2,3), Marvin Frazier (2,3), J. Craig Venter (2,3), Jonathan A. Eisen (1)*
(1) Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California Davis, Davis, California, United States of America; (2) The J. Craig Venter Institute, Rockville, Maryland, United States of America; (3) The J. Craig Venter Institute, La Jolla, California, United States of America; (4) University of Virginia, Charlottesville, Virginia, United States of America

Abstract
Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species.
Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.
Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.

Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America
Received October 25, 2010; Accepted February 20, 2011; Published March 18, 2011
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
Funding: The development and main work on this project was supported by the National Science Foundation via an “Assembling the Tree of Life” grant (number 0228651) to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (through …
  80. Figure 1. Phylogenetic tree of the RecA superfamily. All RecA sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecA superfamily are shaded and given a name on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initial analysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1 and Unknown 2) and are highlighted by colored shading. As noted on the tree and in the text, sequences from two Archaea that were released after our initial analysis group in the Unknown 2 subfamily. doi:10.1371/journal.pone.0018011.g001
  81. Five RecA subfamilies were identified as being novel (i.e., only seen in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members of these subfamilies were identified and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hits against the NRAA database; taxonomy assignments are based on their closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits. doi:10.1371/journal.pone.0018011.t002
Figure 2. The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamily Unknown 2). This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity to known archaeal genes (e.g., DNA primase, translation initiation factor 2, Table 2). The arrow indicates a novel recA homolog from the Unknown 2 subfamily (cluster ID 9). doi:10.1371/journal.pone.0018011.g002
  82. Methods
Identification of deeply-branching ss-rRNA sequences
… these 340 sequences were extracted from the European Ribosomal RNA database [66] and then manually curated to remove columns with more than 90% gaps or with poor alignment …
Figure 3. Phylogenetic tree of the RpoB superfamily. All RpoB sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoB superfamily are shaded and given a name on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the colored panels. doi:10.1371/journal.pone.0018011.g003
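The gap-based part of the column filtering mentioned in the Methods excerpt on this slide can be illustrated with a short Python sketch. It is an assumption-laden example, not the authors' procedure: the curation described in the paper was manual, the function name and toy alignment are hypothetical, and the "poor alignment" criterion is a judgment call that this code does not attempt to reproduce.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.9, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment is a list of equal-length aligned sequences (strings).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Hypothetical toy alignment: column 2 is 50% gaps (kept at the 90% cutoff),
# column 3 is 100% gaps (removed).
TOY_ALIGNMENT = ["AC-G", "A--G", "AT-G", "A--C"]

if __name__ == "__main__":
    for row in drop_gappy_columns(TOY_ALIGNMENT):
        print(row)
```

Lowering `max_gap_fraction` makes the filter stricter; with a cutoff of 0.4 the half-gapped second column of the toy alignment would be removed as well.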
