Jonathan Eisen @phylogenomics talk for #LAMG12

Transcript

  • 1. Phylogenomic Approaches to the Study of Microbial Diversity. September 16, 2012. Lake Arrowhead Microbial Genomes #LAMG12. Jonathan A. Eisen, University of California, Davis, @phylogenomics. Sunday, September 16, 12
  • 2. A Bit of History • For the real story about the Lake Arrowhead Microbial Genomes meetings see http://tinyurl.com/LAMG12 • But the key to LAMG meetings is ...
  • 3. Quotes
  • 4. Quotes • Space-time continuum of genes and genomes
  • 5. Quotes • Space-time continuum of genes and genomes • Microbes not only have a lot of sex, they have a lot of weird sex
  • 6. Quotes • Space-time continuum of genes and genomes • Microbes not only have a lot of sex, they have a lot of weird sex • Gene sequences are the wormhole that allows one to tunnel into the past
  • 7. Quotes • Space-time continuum of genes and genomes • Microbes not only have a lot of sex, they have a lot of weird sex • Gene sequences are the wormhole that allows one to tunnel into the past • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars
  • 8. Quotes • Space-time continuum of genes and genomes • Microbes not only have a lot of sex, they have a lot of weird sex • Gene sequences are the wormhole that allows one to tunnel into the past • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars • The human guts are a real milieu of stuff
  • 9. Quotes • Space-time continuum of genes and genomes • Microbes not only have a lot of sex, they have a lot of weird sex • Gene sequences are the wormhole that allows one to tunnel into the past • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars • The human guts are a real milieu of stuff • Antibiotics do not kill things, they corrupt them
  • 10. Quotes • There comes a point in life when you have to bring chemists into the picture
  • 11. Quotes • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color
  • 12. Quotes • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • If I have time I will tell you about a dream
  • 13. Quotes • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • If I have time I will tell you about a dream • "Another thing you need to know" [pause] "Actually, you don’t NEED to know any of this"
  • 14. Quotes • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • If I have time I will tell you about a dream • "Another thing you need to know" [pause] "Actually, you don’t NEED to know any of this" • I have been influenced by Fisher Price throughout my life
  • 15. Quotes • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • If I have time I will tell you about a dream • "Another thing you need to know" [pause] "Actually, you don’t NEED to know any of this" • I have been influenced by Fisher Price throughout my life • This is going to be ironic coming from someone who studies circumcision
  • 16. Quotes • And we will bring out the unused cheese from yesterday
  • 17. Quotes • And we will bring out the unused cheese from yesterday • A paper came out next year
  • 18. Quotes • And we will bring out the unused cheese from yesterday • A paper came out next year • It takes 1000 nanobiologists to make one microbiologist
  • 19. Quotes • And we will bring out the unused cheese from yesterday • A paper came out next year • It takes 1000 nanobiologists to make one microbiologist • In an engineering sense, the vagina is a simple plug flow reactor
  • 20. Phylogenomic Approaches to Studying Microbial Diversity. Example 1: Phylotyping and Phylogenetic Diversity
  • 21. rRNA Phylotyping. Workflow shown on the slide: DNA extraction → PCR (makes lots of copies of the rRNA genes in the sample) → sequence the rRNA genes → sequence alignment = data matrix.
    Example sequences:
    rRNA1  5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
    rRNA2  5’...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
    rRNA3  5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
    rRNA4  5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
    Data matrix:
    rRNA1    A C A C A C
    rRNA2    T A C A G T
    rRNA3    C A C T G T
    rRNA4    C A C A G T
    E. coli  A G A C A G
    Humans   T A T A G T
    Yeast    T A C A G T
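The data matrix on slide 21 can be treated as a tiny alignment. As a rough illustration only (the talk does not say which distance or tree method was used), the sketch below counts mismatched columns (Hamming distance) between every pair of rows. The function names and the choice of a plain mismatch count are assumptions made for this example.

```python
from itertools import combinations

# Rows of the data matrix as shown on slide 21 (six aligned columns each).
ALIGNMENT = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "E. coli": "AGACAG",
    "Humans": "TATAGT",
    "Yeast": "TACAGT",
}


def hamming(a: str, b: str) -> int:
    """Count the aligned columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(alignment: dict) -> dict:
    """Return {(name1, name2): mismatches} for every unordered pair of rows."""
    return {
        (n1, n2): hamming(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    for (n1, n2), d in distance_matrix(ALIGNMENT).items():
        print(f"{n1:8s} {n2:8s} {d}")
```

A matrix like this is the kind of input a distance-based tree builder or an OTU clustering step would start from; real analyses use full-length alignments and explicit substitution models rather than six columns and raw mismatch counts.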
  • 22. rRNA Phylotyping
  • 23. rRNA Phylotyping E. coli Humans Yeast
  • 24. rRNA Phylotyping E. coli Humans Yeast OTU2 OTU1 OTU4 OTU3 E. coli Humans Yeast
  • 25. rRNA Phylotyping B A Cluster C
  • 26. rRNA Phylotyping B A Cluster C B A OTUs C
  • 27. rRNA Phylotyping B A Cluster C B A OTUs C OTU1 OTU2 OTU3 OTU4
  • 28. rRNA Phylotyping B A Cluster C B A OTUs C OTU2 OTU1 OTU1 OTU4 OTU3 OTU2 OTU3 E. coli Humans OTU4 Yeast
  • 29. rRNA Phylotyping E. coli Humans Yeast
  • 30. rRNA Phylotyping Just E. coli Humans Phylogeny Yeast
  • 31. rRNA Phylotyping B A Cluster C Just B E. coli Humans Phylogeny A Yeast OTUs C OTU2 OTU1 OTU1 OTU4 OTU3 OTU2 OTU3 E. coli Humans OTU4 Yeast
  • 32. rRNA Phylotyping • OTUs • Taxonomic lists • Relative abundance of taxa • Ecological metrics (alpha and beta diversity) • Phylogenetic metrics • Binning • Identification of novel groups • Clades • Rates of change • LGT • Convergence • PD • Phylogenetic ecology (e.g., UniFrac)
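Two of the outputs listed on slide 32, relative abundance of taxa and an alpha-diversity metric, can be computed directly from a table of OTU read counts. The sketch below is a minimal example, not something from the talk: the counts are hypothetical and the Shannon index is just one common choice of alpha-diversity metric.

```python
import math


def relative_abundance(counts: dict) -> dict:
    """Convert raw OTU counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {otu: n / total for otu, n in counts.items()}


def shannon_index(counts: dict) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with p_i > 0."""
    fractions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in fractions if p > 0)


def richness(counts: dict) -> int:
    """Number of OTUs observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical read counts for the four OTUs named on the slides.
    sample = {"OTU1": 50, "OTU2": 30, "OTU3": 15, "OTU4": 5}
    print(relative_abundance(sample))
    print("richness:", richness(sample))
    print("Shannon H':", round(shannon_index(sample), 3))
```

Phylogenetic metrics such as PD or UniFrac additionally need a tree; the count table alone is not enough for those.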
  • 33. What’s New in Phylotyping
  • 34. What’s New in Phylotyping I • More PCR products • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Time series • Spatially diverse sampling • Fine scale sampling
  • 35. Beta-Diversity [Slide shows overlapping page excerpts from: Jennifer B. H. Martiny, Jonathan A. Eisen, Kevin Penn, Steven D. Allison, and M. Claire Horner-Devine. Drivers of bacterial β-diversity depend on spatial scale. PNAS, May 10, 2011, vol. 108, no. 19, 7850–7854.] Points legible in the excerpts: β-diversity, and therefore distance-decay patterns, could be driven solely by differences in environmental conditions across space; dispersal limitation can also give rise to β-diversity, as it permits historical contingencies to influence present-day biogeographic patterns. Across all samples, 4,931 quality Nitrosomonadales sequences were identified, which grouped into 176 OTUs (operational taxonomic units) using an arbitrary 99% sequence similarity cutoff. Pairwise community similarity between the samples was calculated based on the presence or absence of each OTU using a rarefied Sørensen’s index. The Nitrosomonadales display a significant, negative distance-decay curve (slope = −0.08). Geographic distance contributed the largest partial regression coefficient (b = 0.40, P < 0.0001), with sediment moisture, nitrate concentration, plant cover, salinity, and air and water temperature contributing smaller, but significant, partial regression coefficients (b = 0.09–0.17, P < 0.05) (Table 1). Fig. 1. The 13 marshes sampled (see Table S1 for details). Marshes compared with one another within regions are circled. (Inset) The arrangement of sampling points within marshes. Six points were sampled along a 100-m transect, and a seventh point was sampled ∼1 km away. Two marshes in the Northeast United States (outlined stars) were sampled more intensively, along four 100-m transects in a grid pattern. Fig. 2. Distance-decay curves for the Nitrosomonadales communities. The dashed, blue line denotes the least-squares linear regression across all spatial scales. The solid lines denote separate regressions within each of the three spatial scales: within marshes, regional (across marshes within regions circled in Fig. 1), and continental (across regions). The slopes of all lines (except the solid light blue line) are significantly less than zero. The slopes of the solid red lines are significantly different from the slope of the all scale (blue dashed) line.
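The excerpt on slide 35 rests on two calculations: a presence/absence Sørensen similarity between pairs of samples, and a least-squares regression of similarity against geographic distance (the distance-decay curve). The sketch below shows both in their simplest form. It is not the paper's analysis: it omits the rarefaction step mentioned in the excerpt, and the OTU names, distances, and similarities in the demo are hypothetical.

```python
def sorensen_similarity(otus_a, otus_b) -> float:
    """Sørensen index on presence/absence: 2 * shared / (|A| + |B|)."""
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def least_squares_slope(xs, ys) -> float:
    """Slope of the ordinary least-squares line of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical OTU sets for two sampling points.
    point_1 = {"OTU1", "OTU2", "OTU3"}
    point_2 = {"OTU2", "OTU3", "OTU4"}
    print("Sørensen similarity:", sorensen_similarity(point_1, point_2))

    # Hypothetical pairwise distances and similarities.
    distances = [0.0, 1.0, 2.0, 3.0]
    similarities = [1.0, 0.9, 0.8, 0.7]
    print("distance-decay slope:", least_squares_slope(distances, similarities))
```

A negative slope means that samples farther apart share fewer OTUs, which is the pattern the excerpt reports for the Nitrosomonadales.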
  • 36. Earth Microbiome Project
  • 37. Microbial Range Maps
  • 38. Things You Could Do • Mississippi River: 2320 miles long
  • 39. Things You Could Do • Mississippi River: 2320 miles long • 1 site / mile • 3 samples / site • 6960 samples • rRNA PCR w/ barcodes • metagenomics w/ barcodes • MiSeq run: • 30 million sequence reads • 4310 sequences / sample • HiSeq 2000: • 6 billion sequence reads • 862,068 sequences / sample
  • 40. Things You Could Do • Mississippi River: 12,249,600 feet long • 1 site / 500 feet • 3 samples / site • 73,497 samples • rRNA PCR w/ barcodes • metagenomics w/ barcodes • MiSeq run: • 30 million sequence reads • 408 sequences / sample • HiSeq 2000: • 6 billion sequence reads • 81,635 sequences / sample
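The per-sample read depths on slides 39 and 40 follow from simple integer arithmetic. The sketch below reproduces them; the river length, site spacing, samples per site, and reads per run are taken from the slides, while the function and constant names are made up for this example and fractional results are assumed to be rounded down.

```python
FEET_PER_MILE = 5280
RIVER_MILES = 2320            # Mississippi River length used on the slides
SAMPLES_PER_SITE = 3
MISEQ_READS = 30_000_000      # "30 million sequence reads"
HISEQ_READS = 6_000_000_000   # "6 billion sequence reads"


def sampling_plan(river_length: int, site_spacing: int, reads_per_run: int):
    """Return (number of samples, reads per sample), rounding down.

    river_length and site_spacing must be in the same unit (miles or feet).
    """
    sites = river_length // site_spacing
    samples = sites * SAMPLES_PER_SITE
    return samples, reads_per_run // samples


if __name__ == "__main__":
    river_feet = RIVER_MILES * FEET_PER_MILE
    print("river length in feet:", river_feet)
    print("1 site / mile, MiSeq:", sampling_plan(RIVER_MILES, 1, MISEQ_READS))
    print("1 site / mile, HiSeq:", sampling_plan(RIVER_MILES, 1, HISEQ_READS))
    print("1 site / 500 ft, MiSeq:", sampling_plan(river_feet, 500, MISEQ_READS))
    print("1 site / 500 ft, HiSeq:", sampling_plan(river_feet, 500, HISEQ_READS))
```

Going from one site per mile to one site per 500 feet multiplies the sample count by roughly ten, so the depth per sample on a single run drops by the same factor.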
  • 41. What’s New in Phylotyping II • Metagenomics (shotgun sequencing) avoids biases of rRNA PCR
  • 42. Metagenomic Phylotyping B A Cluster C Just B E. coli Humans Phylogeny A Yeast OTUs C OTU2 OTU1 OTU1 OTU4 OTU3 OTU2 OTU3 E. coli Humans OTU4 Yeast
  • 43. Phylogenetic Challenge ??
  • 44. Phylogenetic Challenge ??
  • 45. Phylogenetic Challenge: Multiple approaches
  • 46. Method 1: Each is an island
  • 47. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 48. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 49. Method 1: Each is an island • Build alignment, models, trees for full length seqs • Analyze fragmented reads one at a time
  • 50. STAP. Each sequence analyzed separately. [Slide shows page excerpts from Wu et al. 2008 PLoS ONE.] Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001. Excerpt: "... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment." Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs. doi:10.1371/journal.pone.0002566.g002
  • 51. AMPHORA. Guide tree. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  • 52. Phylotyping w/ Proteins. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  • 53. Method 2: Most in the Family
  • 54. Phylogenetic Challenge xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx ??
  • 55. Method 2: Most in family xxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxx One tree for those w/ overlap
  • 56. rRNA in Sargasso Metagenome. Venter et al., Science 304: 66. 2004
  • 57. RecA Phylotyping in Sargasso Data. Venter et al., Science 304: 66. 2004
  • 58. Sargasso Phylotyping [Bar chart "Sargasso Phylotypes": weighted % of clones (axis 0 to 0.500) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66. 2004
  • 59. STAP, QIIME, Mothur. Combine all into one alignment. Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001
  • 60. WATERS [Slide shows overlapping page excerpts from: Hartman et al. 2010. W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics 2010, 11:317 doi:10.1186/1471-2105-11-317.] Excerpt: "To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advantage that it keeps track of the data lineage and provenance of data products [23,24]." Boxes in the Figure 1 schema: Align, Check chimeras, Cluster, Build Tree, Assign Taxonomy, Tree w/ Taxonomy, Diversity statistics & graphs, Cytoscape network, OTU table, Unifrac files. Figure 1. Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WATERS. Quality control files are generated for white boxes, but not otherwise routinely analyzed. Black arrows indicate that metadata (e.g., sample type) has been overlaid on the data for downstream interpretation. Colored boxes indicate different types of results files that are generated for the user for further use and biological interpretation. Colors indicate different types of WATERS actors from Fig. 2 which were used: green, Diversity metrics, WriteGraphCoordinates, Diversity graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees; CreateUnifrac; yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile; white, remaining unnamed actors. Figure 2. Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-clicking on any actor or connector allows it to be manipulated and re-arranged.
  • 61. One Major Issue with rRNA • Copy number varies greatly between taxa • Can lead to significant errors in estimates of relative abundance from numbers of reads
  • 62. Kembel Correction. Kembel, Wu, Eisen, Green. In press. PLoS Computational Biology. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance
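The problem stated on slide 61 can be shown with a toy calculation: if one taxon carries seven rRNA gene copies per genome and another carries one, equal numbers of cells yield very different read counts. The sketch below divides each taxon's reads by its copy number and renormalizes. It is a simplified illustration, not the procedure from the Kembel et al. paper, whose details are not given on the slide; it assumes the copy number of every taxon is already known, and all names and numbers are hypothetical.

```python
def correct_for_copy_number(read_counts: dict, copy_numbers: dict) -> dict:
    """Estimate relative cell abundance from rRNA read counts.

    Each taxon's read count is divided by its (assumed known) rRNA gene
    copy number, then the results are rescaled to sum to 1.
    """
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical sample: taxon_A has 7 rRNA copies per genome, taxon_B has 1.
    reads = {"taxon_A": 700, "taxon_B": 100}
    copies = {"taxon_A": 7, "taxon_B": 1}
    total_reads = sum(reads.values())
    print("uncorrected:", {t: n / total_reads for t, n in reads.items()})
    print("corrected:  ", correct_for_copy_number(reads, copies))
```

In this toy sample the uncorrected read fractions are 0.875 and 0.125, while the corrected estimate is 0.5 for each taxon.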
  • 63. Method 3: All in the family
  • 64. Phylogenetic Challenge ??
  • 65. Phylogenetic Challenge: A single tree with everything?
  • 66. rRNA analysis B A Cluster C Just B E. coli Humans Phylogeny A Yeast OTUs C OTU2 OTU1 OTU1 OTU4 OTU3 OTU2 OTU3 E. coli Humans OTU4 Yeast
  • 67. PhylOTU [Slide shows a page excerpt from the paper.] Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in the workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001. Excerpt: "... used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads." Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  • 68. RecA, RpoB in GOS [Figure labels: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5] Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
  • 69. Phylosift / pplacer. Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
  • 70. Phylosift • Probabilistic Phylogenetic Ecology • https://github.com/gjospin/PhyloSift • http://phylosift.wordpress.com
  • 71. Method 4: All in the genome
  • 72. Multiple Genes? A single tree with everything?
  • 73. Kembel Combiner. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
  • 74. Kembel Combiner [Slide shows a page excerpt with a figure on UniFrac.] Excerpt: "... typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call 'weighted UniFrac.' ..." FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
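The figure caption on slide 74 describes both UniFrac measures in words, and they can be written down compactly once a tree is stored as a list of branches. The sketch below is a minimal illustration under stated assumptions: each branch is a hypothetical (length, set of descendant leaves) pair, the toy tree and communities are invented, and the weighted version is the raw, unnormalized sum of branch length times the absolute difference in relative abundance. It is not the implementation used in any of the cited tools.

```python
def unweighted_unifrac(branches, community_a, community_b) -> float:
    """Fraction of branch length leading to members of only one community.

    branches: iterable of (length, frozenset_of_descendant_leaves).
    Only branches with a descendant in at least one community are counted.
    """
    a, b = set(community_a), set(community_b)
    unique = 0.0
    total = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & a)
        in_b = bool(leaves & b)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    if total == 0:
        raise ValueError("no branch leads to either community")
    return unique / total


def weighted_unifrac(branches, counts_a, counts_b) -> float:
    """Raw weighted UniFrac: sum of length * |A_i/A_T - B_i/B_T| per branch."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each community needs at least one sequence")
    score = 0.0
    for length, leaves in branches:
        frac_a = sum(counts_a.get(leaf, 0) for leaf in leaves) / total_a
        frac_b = sum(counts_b.get(leaf, 0) for leaf in leaves) / total_b
        score += length * abs(frac_a - frac_b)
    return score


# Hypothetical four-leaf tree ((a,b),(c,d)) written as branches.
EXAMPLE_BRANCHES = [
    (1.0, frozenset({"a"})),
    (1.0, frozenset({"b"})),
    (0.5, frozenset({"a", "b"})),
    (1.0, frozenset({"c"})),
    (1.0, frozenset({"d"})),
    (0.5, frozenset({"c", "d"})),
]
# Hypothetical sequence counts per leaf in two environments.
SQUARES = {"a": 2, "b": 1, "c": 1}
CIRCLES = {"c": 1, "d": 3}

if __name__ == "__main__":
    print("unweighted:", unweighted_unifrac(EXAMPLE_BRANCHES, SQUARES, CIRCLES))
    print("weighted:  ", weighted_unifrac(EXAMPLE_BRANCHES, SQUARES, CIRCLES))
```

Passing dictionaries to `unweighted_unifrac` works because only their keys (the leaves present) are used; abundance matters only in the weighted version, where a branch whose descendants are equally abundant in both communities contributes nothing.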
  • 75. Uses of Phylogeny in Genomics and Metagenomics. Example 2: Functional Diversity and Functional Predictions
  • 76. Phylogenomics [Figure: Phylogenetic prediction of gene function, worked through for Example A and Example B.] Method: choose gene(s) of interest → identify homologs → align sequences → calculate gene tree → overlay known functions onto tree → infer likely function of gene(s) of interest. The figure compares the inference with the actual evolution (assumed to be unknown), including gene duplication, across Species 1, Species 2, and Species 3. Eisen, 1998 Genome Res 8: 163-167.
  • 77. Diversity of Proteorhodopsins. Venter et al., 2004. Science 304: 66.
  • 78. Improving Functional Predictions • Same methods discussed for phylotyping improve phylogenomic functional prediction for protein families • Increase in sequence diversity helps too
  • 79. Phylosift / pplacer. Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
  • 80. Carboxydothermus sporulates. Wu et al. 2005 PLoS Genetics 1: e65.
  • 81. Wu et al. 2005 PLoS Genetics 1: e65.
  • 82. NMF in Metagenomes: Characterizing the niche-space distributions of components. Functional biogeography of ocean microbes revealed through non-negative matrix factorization. w/ Weitz, Dushoff, Langille, Neches, Levin, etc. Jiang et al. In press PLoS One. Comes out 9/18. [Figure: heat maps with one row per ocean sampling site (site labels such as Sargasso Sea_GS001c_Open Ocean and North American East Coast_GS002_Coastal); columns are Component 1 to Component 5 in panel (a), and Chlorophyll, Salinity, Temperature, Water Depth, Sample Depth, Insolation in panel (c); legend levels High, Medium, Low, NA; water depth bins >4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m.] Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
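A minimal sketch of the factorization named on the slide above, assuming a small made-up matrix of protein-family counts (rows) by sites (columns). It uses standard Lee-Seung multiplicative updates in NumPy, which may differ from the procedure in the paper; the toy data, the function name `nmf`, the choice of k = 2, and the unit-length column normalization used to form Ĥ are all assumptions for illustration.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V (features x sites) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    W is features x k, H is k x sites; both stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n_features, n_sites = V.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_sites)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 protein families x 4 sites.
# Sites 0 and 1 share one set of families, sites 2 and 3 another.
V = np.array([
    [5, 4, 0, 0],
    [3, 3, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 6, 5],
    [0, 1, 4, 4],
    [0, 0, 5, 6],
], dtype=float)

W, H = nmf(V, k=2)

# Rows of H.T give each site's weights on the k components (panel a).
site_profiles = H.T

# Assumption: H-hat is H with each site's column scaled to unit length,
# so the site-similarity matrix (panel b) holds cosine similarities.
H_hat = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
similarity = H_hat.T @ H_hat

print(np.round(site_profiles, 2))
print(np.round(similarity, 2))
```

With real data, V would hold functional abundances per metagenomic sample, and the number of components would need to be chosen by some model-selection criterion rather than fixed by hand.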
  • 83. Uses of Phylogeny in Genomics and Metagenomics Example 3: Selecting Organisms for Study
  • 84. GEBA http://www.jgi.doe.gov/programs/GEBA/pilot.html
  • 85. GEBA THAT IS SO LAMG10 http://www.jgi.doe.gov/programs/GEBA/pilot.html
  • 86. How To Keep Up? • IMG • Genomes Online • MicrobeDB • http://github.com/mlangill/microbedb/ • Langille MG, Laird MR, Hsiao WW, Chiu TA, Eisen JA, Brinkman FS. MicrobeDB: a locally maintainable database of microbial genomic sequences. Bioinformatics. 2012 28(14):1947-8.
  • 87. Improving Phylotyping
  • 88. More Markers
      Phylogenetic group       Genome Number   Gene Number   Marker Candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroides              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
  • 89. Better Reference Tree. Morgan et al. submitted
  • 90. Improving Functional Predictions
  • 91. Sifting Families. [Figure 1: workflow diagram with panels A, B, C; box labels: Representative Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); SFams; Align & Build HMMs; New Genomes; Extract Protein Annotation; Screen for Homologs; HMMs.] Sharpton et al. submitted
  • 92. Zorro - Automated Masking. [Plot: Distance to True Tree (y-axis, 0.0 to 9.0) against Sequence Length (x-axis: 200, 400, 800, 1600, 3200); series: no masking, zorro, gblocks.] Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
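To make "masking" concrete, here is a minimal sketch of the general idea only: drop alignment columns whose confidence score falls below a cutoff before building a tree. This is not ZORRO's algorithm and does not call ZORRO or Gblocks; the per-column scores, the 0-to-1 scale, the threshold, and the function name `mask_alignment` are all hypothetical. A hard threshold is also a simplification compared with weighting columns by their confidence.

```python
def mask_alignment(seqs, scores, threshold=0.5):
    """Keep only alignment columns whose score is >= threshold.

    seqs: dict mapping sequence name -> aligned sequence (equal lengths).
    scores: one hypothetical confidence value per alignment column.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    if lengths.pop() != len(scores):
        raise ValueError("need exactly one score per alignment column")
    keep = [i for i, score in enumerate(scores) if score >= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}

# Hypothetical toy alignment; columns 2 and 3 are ambiguously aligned.
alignment = {"taxon_a": "ACG-T", "taxon_b": "AC-GT"}
column_scores = [0.9, 0.8, 0.2, 0.3, 0.95]

masked = mask_alignment(alignment, column_scores, threshold=0.5)
print(masked)
```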
  • 93. Phylogenetic Contrasts
  • 94. GEBA Lesson: We have still only scratched the surface of microbial diversity
  • 95. PD: All. From Wu et al. 2009 Nature 462, 1056-1060
  • 96. Families/PD not uniform
  • 97. GEBA uncultured. Number of SAGs from Candidate Phyla
      Site                                       OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent                  4     1      -     -
      Site B: Gold Mine                          6     13     2     -
      Site C: Tropical gyres (Mesopelagic)       -     -      -     2
      Site D: Tropical gyres (Photic zone)       1     -      -     -
    Sample collections at 4 additional sites are underway. Phil Hugenholtz
  • 98. GEBA Lesson: Need Experiments from Across the Tree of Life too
  • 99. Conclusion
  • 100.
  • 101. MICROBES
  • 102. Acknowledgements • $$$ • DOE • NSF • GBMF • Sloan • DARPA • DSMZ • DHS • People, places • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk