Jonathan Eisen talk at Lake Arrowhead Microbial Genomics Mtg #LAMG10

Talk by Jonathan Eisen at Lake Arrowhead Microbial Genomes Meeting. Sept. 14, 2010

Jonathan Eisen talk at Lake Arrowhead Microbial Genomics Mtg #LAMG10 Presentation Transcript

  • 1. The Importance of History (and other obsessions) Jonathan A. Eisen UC Davis Talk for Lake Arrowhead Microbial Genomes 2010 (#LAMG10) Wednesday, September 15, 2010
  • 3. Social Networking in Science
  • 4. Bacteria evolve
  • 5. Evolution of Lake Arrowhead
  • 6. Blast Peptide LAKEARROWHEAD
  • 10. Homework • Do a blastp search with the names of other famous people associated with the Lake Arrowhead meeting • JEFFREYHMILLER • SARAHPALIN and her relationship to the fungus B. fuckeliana • see http://phylogenomics.blogspot.com/2008/09/tracing-evolutionary-history-of-sarah.html
  • 11. 2010
  • 12. 2008
  • 13. 2006
  • 14. 2004
  • 15. No 2002
  • 16. Wayback Machine
  • 17. 2002
  • 19. Quotes 2004 • Space-time continuum of genes and genomes • Gene sequences are the wormhole that allows one to tunnel into the past • The human mind can conceive of things with no basis in physical reality • Thoughts can go faster than the speed of light
  • 21. Quotes 2006 • The human guts are a real milieu of stuff • You better kiss everybody • Microbes not only have a lot of sex, they have a lot of weird sex • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars
  • 22. Quotes 2008 • Antibiotics do not kill things, they corrupt them • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • And there's Jeffrey Dahmer • We are the environment. We live the phenotype. • If I have time I will tell you about a dream • A paper came out next year
  • 23. Quotes 2010 • We have been using this word for many years without actually realizing it was correct • "Another thing you need to know" (pause) "Actually you don't NEED to know any of this" • I have been influenced by Fisher Price throughout my life • Don't take that away from us • It takes 1000 nanobiologists to make one microbiologist • I am going to wrap up as I hear the crickets chirping • And we will bring out the unused cheese from yesterday • In an engineering sense, the vagina is a simple plug flow reactor • This is going to be ironic coming from someone who studies circumcision • A little bit about time, but I am going to spend a lot less time on time than on space
  • 24. Keywords I remember from 2010 • Penis • Vagina • Anthrax • Acne • Ulcer (multiple kinds) • Global warming • Antibiotic resistance • Virulence
  • 27. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 28. 2002 • At least 40 phyla of bacteria [Figure: phylum-level tree of Bacteria, based on Hugenholtz, 2002; phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmatimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
  • 29. 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: bacterial phylum tree as in slide 28, based on Hugenholtz, 2002]
  • 30. 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: bacterial phylum tree as in slide 28, based on Hugenholtz, 2002]
  • 32. Why Increase Phylogenetic Coverage? • Common approach within some eukaryotic groups (FGP, NHGRI, etc) • Many successful small projects to fill in bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature • Many potential benefits
  • 33. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen & Ward, PIs) [Figure: bacterial phylum tree]
  • 35. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen & Ward, PIs) [Figure: bacterial phylum tree]
  • 36. Major Lineages of Actinobacteria (2.5 Actinobacteria)
    – 2.5.1 Acidimicrobidae: 2.5.1.1 Unclassified; 2.5.1.2 "Microthrixineae"; 2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified, 2.5.1.3.2 Acidimicrobiaceae); 2.5.1.4 BD2-10; 2.5.1.5 EB1017
    – 2.5.2 Actinobacteridae: 2.5.2.1 Unclassified; 2.5.2.2 Actinomyces; 2.5.2.3 Actinomycineae; 2.5.2.4 Actinosynnemataceae; 2.5.2.5 Bifidobacteriaceae; 2.5.2.6 Brevibacteriaceae; 2.5.2.7 Cellulomonadaceae; 2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified, 2.5.2.8.2 Corynebacteriaceae, 2.5.2.8.3 Dietziaceae, 2.5.2.8.4 Gordoniaceae, 2.5.2.8.5 Mycobacteriaceae, 2.5.2.8.6 Rhodococcus, 2.5.2.8.7 Rhodococcus, 2.5.2.8.8 Rhodococcus); 2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified, 2.5.2.9.2 Brachybacterium, 2.5.2.9.3 Dermabacter); 2.5.2.10 Ellin306/WR160; 2.5.2.11 Ellin5012; 2.5.2.12 Ellin5034; 2.5.2.13 Frankineae (2.5.2.13.1 Unclassified, 2.5.2.13.2 Acidothermaceae, 2.5.2.13.3 Ellin6090, 2.5.2.13.4 Frankiaceae, 2.5.2.13.5 Geodermatophilaceae, 2.5.2.13.6 Microsphaeraceae, 2.5.2.13.7 Sporichthyaceae); 2.5.2.14 Glycomyces; 2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified, 2.5.2.15.2 Dermacoccus, 2.5.2.15.3 Intrasporangiaceae); 2.5.2.16 Kineosporiaceae; 2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified, 2.5.2.17.2 Agrococcus, 2.5.2.17.3 Agromyces); 2.5.2.18 Micrococcaceae; 2.5.2.19 Micromonosporaceae; 2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified, 2.5.2.20.2 Kribbella, 2.5.2.20.3 Nocardioidaceae, 2.5.2.20.4 Propionibacteriaceae); 2.5.2.21 Pseudonocardiaceae; 2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified, 2.5.2.22.2 Kitasatospora, 2.5.2.22.3 Streptacidiphilus); 2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified, 2.5.2.23.2 Ellin5129, 2.5.2.23.3 Nocardiopsaceae, 2.5.2.23.4 Streptosporangiaceae, 2.5.2.23.5 Thermomonosporaceae)
    – 2.5.3 Coriobacteridae: 2.5.3.1 Unclassified; 2.5.3.2 Atopobiales; 2.5.3.3 Coriobacteriales; 2.5.3.4 Eggerthellales
    – 2.5.4 OPB41
    – 2.5.5 PK1
    – 2.5.6 Rubrobacteridae: 2.5.6.1 Unclassified; 2.5.6.2 "Thermoleiphilaceae" (2.5.6.2.1 Unclassified, 2.5.6.2.2 Conexibacter, 2.5.6.2.3 XGE514); 2.5.6.3 MC47; 2.5.6.4 Rubrobacteraceae
  • 37. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen & Ward, PIs) [Figure: bacterial phylum tree]
  • 38. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen & Ward, PIs) [Figure: bacterial phylum tree]
  • 39. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen & Ward, PIs) [Figure: bacterial phylum tree]
  • 40. At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really fill in the tree • GEBA: a genomic encyclopedia of bacteria and archaea [Figure: bacterial phylum tree; label: Eisen & Ward, PIs]
  • 41. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
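The first two selection steps above amount to a filter over branches of the rRNA tree. Below is a minimal sketch, assuming each major branch has already been flagged as having or lacking a sequenced genome and that the culture collection can be queried as a set of branch names. The `Lineage` type, the `select_targets` function, the branch names and the stand-in collection are all hypothetical; this is not how the GEBA pipeline was actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    name: str          # a major branch of the rRNA tree (hypothetical label)
    has_genome: bool   # True if any sequenced genome already falls in this branch


def select_targets(lineages, cultured):
    """Return branches with no genome yet that have a cultured representative."""
    return [
        lin.name
        for lin in lineages
        if not lin.has_genome and lin.name in cultured
    ]


if __name__ == "__main__":
    tree = [
        Lineage("branch_A", has_genome=True),
        Lineage("branch_B", has_genome=False),
        Lineage("branch_C", has_genome=False),
    ]
    in_collection = {"branch_B"}  # stands in for a culture-collection catalogue
    print(select_targets(tree, in_collection))  # ['branch_B']
```

Branch C is skipped because nothing cultured is available for it. That is the gap the "dark matter" slides later return to.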
  • 42. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow, Tanya Woyke) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, DSMZ, GBMF)
  • 43. GEBA and Openness • All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc. • Data also available in Biotorrents (http://biotorrents.net) • Individual genome reports published in OA “Standards in Genome Sciences (SIGS)” • 1st GEBA paper in Nature freely available and published using Creative Commons License
  • 44. GEBA Lesson 1: rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms
  • 45. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 46. Network of Life? Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 47. Compare phylogenetic diversity (PD) in rRNA and whole-genome trees (WGT)
  • 48. PD of rRNA and Genome Trees Similar (from Wu et al. 2009 Nature 462, 1056-1060)
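To make "PD" concrete, here is a sketch that assumes Faith's phylogenetic diversity: the total length of the branches connecting a set of taxa to the root of a tree. Under that assumption, comparing rRNA and genome trees means computing this quantity for the same taxa on each tree. The parent-map tree encoding, the `faith_pd` name and the toy tree are illustrative assumptions, not the data or code behind Wu et al. 2009.

```python
def faith_pd(parent, tips):
    """Sum of branch lengths on the union of tip-to-root paths.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    """
    seen = set()
    total = 0.0
    for node in tips:
        # Walk towards the root. Stop early at a branch already counted,
        # because everything above it has been counted too.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


if __name__ == "__main__":
    # Toy rooted tree ((A:1,B:2)X:0.5,C:3)root
    toy = {
        "A": ("X", 1.0),
        "B": ("X", 2.0),
        "X": ("root", 0.5),
        "C": ("root", 3.0),
    }
    print(faith_pd(toy, ["A", "B"]))       # 3.5
    print(faith_pd(toy, ["A", "B", "C"]))  # 6.5
```

This is the rooted variant, which includes the path to the root. An unrooted variant would count only the minimal subtree spanning the chosen tips.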
  • 49. GEBA Lesson 2: Phylogeny-driven genome selection helps discover new genetic diversity
  • 50. Network of Life? Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 51. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
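The three steps above can be sketched as follows. The sketch assumes the MCL clustering has already been run, so each genome is represented by the set of protein-family IDs it contains. The function name, the averaging over random genome orders and the toy data are illustrative choices, not the method used for the curves in the talk.

```python
import random


def rarefaction(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1..N genomes.

    genome_families: list of sets, one set of family IDs per genome
    (e.g. MCL clusters; the clustering step is assumed to be done already).
    """
    rng = random.Random(seed)
    n = len(genome_families)
    totals = [0] * n
    for _ in range(n_permutations):
        order = genome_families[:]
        rng.shuffle(order)            # add genomes in a random order
        seen = set()
        for k, families in enumerate(order):
            seen |= families          # families observed after k + 1 genomes
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]


if __name__ == "__main__":
    toy = [{"f1", "f2"}, {"f2", "f3"}, {"f4"}]
    curve = rarefaction(toy, n_permutations=50)
    print(curve[-1])  # 4.0: all three genomes together always contain 4 families
```

Plotting `curve` against 1..N gives the "# of genomes vs. # of protein families" curve. A curve that is still rising steeply when the last genome is added suggests that more genomes would keep adding new families.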
  • 57. Synapomorphies exist
  • 58. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: tree of actin-related proteins including Haliangium ochraceum DSM 14365; other labels not recoverable] Patrik D’haeseleer, Adam Zemla, Victor Kunin. See also Guljamow et al. 2007 Current Biology.
  • 59. GEBA Lesson 3: Phylogeny-driven genome selection improves genome annotation
  • 60. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction • Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
  • 61. GEBA Lesson 4: Metadata and individual genome papers important
  • 62. SIGS http://standardsingenomics.org/
  • 63. GEBA Lesson 5: Phylogeny-driven genome selection improves analysis of metagenome data
  • 64. Uses of phylogenetic classification in metagenomics • Assigning reads to phylogenetic groups using multiple genes • Phylogenetic binning • Phylogenetic ecology - especially important if no reference genomes [Figure: Sargasso phylotypes, weighted % of clones per major phylogenetic group (e.g. Alpha-, Beta- and Gammaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria), one series per marker gene: frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB]
  • 65. Uses of phylogenetic classification in metagenomics • Assigning reads to phylogenetic groups using multiple genes • Phylogenetic binning • Phylogenetic ecology - especially important if no reference genomes • Limited by poor genomic sampling in past [Figure: Sargasso phylotypes, weighted % of clones per major phylogenetic group, one series per marker gene]
  • 66. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in – Gene identification / confirmation – Functional prediction – Binning – Phylogenetic classification
  • 67. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in – Gene identification / confirmation – Functional prediction – Binning – Phylogenetic classification • But not a lot ...
  • 68. GEBA Future 1: Need to adapt genomic and metagenomic methods to make use of GEBA data
  • 69. Phylogenetic Binning Using AMPHORA • Improves with better phylogenetic methods [Figure: per-gene phylogenetic classification of reads across major phylogenetic groups, one series per marker gene (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf)] AMPHORA - each read on its own tree
  • 70. Improving Phylogeny for Metagenomic Reads • Examples using reference trees – AMPHORA (Wu and Eisen) – pplacer (Erick Matsen) – FastTree (Morgan Price) • Variants – Use concatenated alignment of markers not just individual genes (Steven Kembel) – Apply to OTU identification not just classification (Thomas Sharpton) – CoBinning: look for linkage among fragments/genes (Aaron Darling)
  • 71. Phylogenetic Binning Using AMPHORA • Improves with more gene families [Figure: per-gene phylogenetic classification of reads across major phylogenetic groups, one series per marker gene (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf)] AMPHORA - each read on its own tree
  • 72. Identifying new markers • Take all genomes • All vs. all search • Identify protein families • For each family measure – Evenness in copy number – Universality – Phylogenetic congruence with WGT – Monophyly for superfamilies
  • 73. Distances between gene trees and the AMPHORA concatenated genome tree [Figure: two bar charts ranking candidate marker genes (e.g. rpmA, coaE, trmD, rplL, rpsS, ..., rRNA16S, nusA) by NODAL distance (axis 0 to 6) and SPLIT distance (axis 0 to 0.9) from the genome tree; categories: AMPHORA marker, ribosomal protein, transcription/translation related protein, DNA repair protein, protein of other function; also shown: distance between the genome tree and 100 random trees (average ± standard deviation)]
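The SPLIT distance on this slide is presumably a Robinson-Foulds-style comparison of tree bipartitions. That reading is an assumption. Below is a minimal sketch of a normalised Robinson-Foulds distance for unrooted trees written as nested tuples, which is a hypothetical encoding chosen for brevity. Each internal branch splits the taxa into two sides, and the distance is the fraction of splits not shared by both trees.

```python
def _collect(node, out):
    """Return the taxa under `node`, appending each internal node's taxa to `out`."""
    if isinstance(node, str):
        return frozenset([node])
    below = frozenset().union(*(_collect(child, out) for child in node))
    out.append(below)
    return below


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples."""
    internal = []
    taxa = _collect(tree, internal)
    ref = min(taxa)  # orient each split by the side lacking this reference taxon
    result = set()
    for clade in internal:
        side = clade if ref not in clade else taxa - clade
        # Skip trivial splits (one taxon vs the rest) and the empty root split.
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
    return result


def split_distance(t1, t2):
    """Robinson-Foulds distance normalised to [0, 1]; assumes the same taxon set."""
    s1, s2 = splits(t1), splits(t2)
    if not s1 and not s2:
        return 0.0
    return len(s1 ^ s2) / (len(s1) + len(s2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(split_distance(t1, t1))  # 0.0
    print(split_distance(t1, t2))  # 0.5
```

With this normalisation, identical topologies score 0 and trees that share no splits score 1. The nodal distance on the slide is a different metric and is not sketched here.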
  • 74. Identifying new phylogenetic markers within phyla • Take all genomes within a phylum • All vs. all search • Identify protein families • For each family measure – Evenness in copy number – Universality – Phylogenetic congruence with WGT – Monophyly for superfamilies
  • 75. Keep only the families with: Universality × Evenness × Monophyly ≥ 90 × 90 × 90
    Phylogenetic group | Genomes | Genes | Marker candidates
    Archaea | 62 | 145,415 | 102
    Actinobacteria | 63 | 267,783 | 136
    Alphaproteobacteria | 94 | 347,287 | 142
    Betaproteobacteria | 56 | 266,362 | 294
    Gammaproteobacteria | 126 | 483,632 | 141
    Deltaproteobacteria | 25 | 102,115 | 44
    Epsilonproteobacteria | 18 | 33,416 | 446
    Bacteroides | 25 | 71,531 | 179
    Chlamydiae | 13 | 13,823 | 561
    Chloroflexi | 10 | 33,577 | 140
    Cyanobacteria | 36 | 124,080 | 532
    Firmicutes | 106 | 312,309 | 80
    Spirochaetes | 18 | 38,832 | 72
    Thermi | 5 | 14,160 | 727
    Thermotogae | 9 | 17,037 | 646
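The filtering rule above multiplies three per-family scores. The sketch below assumes each score is a fraction in [0, 1], so the cutoff becomes 0.9³ ≈ 0.729. Two definitions are also assumptions: universality as the share of genomes carrying the family, and evenness as the share of carriers with exactly one copy. The monophyly score is taken as precomputed because calculating it requires a gene tree. All function names are hypothetical.

```python
THRESHOLD = 0.9 ** 3  # the slide's 90 x 90 x 90, expressed with fractions


def universality(copies, n_genomes):
    """Assumed definition: fraction of genomes carrying at least one copy."""
    return sum(1 for c in copies.values() if c > 0) / n_genomes


def evenness(copies):
    """Assumed definition: fraction of carrying genomes with exactly one copy."""
    present = [c for c in copies.values() if c > 0]
    if not present:
        return 0.0
    return sum(1 for c in present if c == 1) / len(present)


def keep_family(copies, n_genomes, monophyly):
    """copies maps genome ID -> copy number; monophyly is a precomputed score in [0, 1]."""
    score = universality(copies, n_genomes) * evenness(copies) * monophyly
    return score >= THRESHOLD


if __name__ == "__main__":
    n = 10
    single_copy = {f"g{i}": 1 for i in range(n)}
    patchy = {f"g{i}": 1 for i in range(6)}
    paralogous = {f"g{i}": 2 if i < 5 else 1 for i in range(n)}
    print(keep_family(single_copy, n, 0.95))  # True
    print(keep_family(patchy, n, 0.95))       # False (universality 0.6)
    print(keep_family(paralogous, n, 0.95))   # False (evenness 0.5)
```

Because the scores are multiplied, a family cannot make up for poor universality with perfect evenness. Each factor has to be high for the product to clear the cutoff.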
  • 76. Phylogenetic Binning Using AMPHORA • Other needs? [Figure: per-gene phylogenetic classification of reads across major phylogenetic groups, one series per marker gene (frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB)] AMPHORA - each read on its own tree
  • 77. Other Ways to Make Better Use of GEBA Data • Rebuild protein family models • Experiments from across the tree needed • Need better phylogenies, including HGT • Improved tools for using distantly related genomes in metagenomic analysis • Better recording and sharing of metadata about organisms
  • 78. GEBA Future 2: The dark matter of the biological universe
  • 79. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 80. Phylogenetic Diversity: Sequenced Bacteria & Archaea From Wu et al. 2009
  • 81. Phylogenetic Diversity with GEBA From Wu et al. 2009
  • 82. Phylogenetic Diversity: Isolates From Wu et al. 2009
  • 83. Phylogenetic Diversity: All From Wu et al. 2009
  • 84. Fantasy analysis of # PFAMs, based on GEBA genomes: PD/genome ~0.1; PFAMs/genome ~1000; PFAMs/PD ~10,000; total PFAMs ~10,000,000 (from Wu et al. 2009)
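The slide's ratios are linked by simple division, shown below. The second line is an inference from the slide's own numbers (the total PD those figures imply). The talk does not state it.

```latex
\begin{align*}
\frac{\text{PFAMs/genome}}{\text{PD/genome}} &\approx \frac{1000}{0.1} = 10^{4}\ \text{PFAMs per unit PD} \\
\text{implied total PD} &\approx \frac{\text{total PFAMs}}{\text{PFAMs/PD}} \approx \frac{10^{7}}{10^{4}} = 10^{3}
\end{align*}
```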
  • 85. Conclusions • Sequencing phylogenetically novel genomes has many benefits • To obtain the most benefits, we need to change and adapt: computationally and experimentally • Most of the phylogenetic diversity of microbes remains to be sampled • Long live the Lake Arrowhead Microbial Genomes meeting
  • 87. MICROBES
  • 89. Thanks • Institutions and $$$$: JGI etc, DOE, UC Davis, NSF, DSMZ, GBMF, TIGR • People: Dongying Wu, Phil Hugenholtz, Nikos Kyrpides, Hans-Peter Klenk, Eddy Rubin [Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.]