Jonathan Eisen talk on "Phylogenomics of Microbes" at Lake Arrowhead Small Genomes Meeting 2002

Talk by Jonathan Eisen on Phylogenomics of microbes at Lake Arrowhead Small Genomes meeting in 2002.

Published in: Technology

  1. TIGR
  2. “Nothing in biology makes sense except in the light of evolution.” T. Dobzhansky (1973)
  3. Topics of Discussion • Introduction to phylogenomics • Uses of evolutionary analysis in genomics: selection of species, functional prediction, gene duplication, gene loss, genome rearrangements, lateral transfer, uncultured species, specialization
  4. Phylogenomic Analysis: Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.
  5. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Intragenomic movement • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  6. Strain Selection and Evolution • Increasing phylogenetic representation • Determining relatedness to model organism • Understanding major evolutionary transitions • Identifying taxa with unusual (high or low) rates of evolution • Identifying source of DNA from uncultured species • Species naming and type strains (e.g., see Ward et al. 2001)
  7. Evolutionary Diversity Still Poorly Represented in Complete Genomes (figure; panels: Bacteria, Archaea)
  8. S. pombe Genome Analysis: Eukaryotes vs. Prokaryotes (tree figure; groups: Eukaryotes, Archaea, Bacteria; eukaryotic taxa: S. pombe, S. cerevisiae, Encephalitozoon, Worm, Fly, Humans, Dictyostelium, Arabidopsis, Chlamydomonas, Phytophthora, Tetrahymena, Plasmodium, Trypanosoma, Euglena, Naegleria, Trichomonas, Giardia)
  9. Single vs. Multi-celled (tree figure; taxa with group labels: S. pombe and S. cerevisiae, Fungi; Encephalitozoon, Microsporidia; Worm, Fly, and Humans, Animals; Dictyostelium, Dictyostelia; Arabidopsis and Chlamydomonas, Plants; Phytophthora, Heterokonts; Tetrahymena, Ciliates; Plasmodium, Apicomplexa; Trypanosoma, Kinetoplastids; Euglena, Euglenas; Naegleria, Acrasidae; Trichomonas, Parabasalia; Giardia, Diplomonads)
  10. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Intragenomic movement • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  11. Predicting Function • Identification of motifs • Homology/similarity based methods: highest hit, top hit, HMMs, threading • Evolutionary methods: phylogenetic trees, Ds/Dn, phylogenetic profiles
  12. (Phylogenetic tree figure of the MutS/MSH protein family, panels A to D; subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6; functional labels mention MMR in bacteria, MMR of mismatches, small loops, and large loops in the nucleus, MMR in mitochondria, and segregation and crossover in the nucleus)
  13. rRNA and Uncultured Microbes
  14. Evolutionary Rate Variation (figure; labels 1 to 6)
  15. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  16. Why Duplications Are Useful to Identify • Allows division into orthologs and paralogs • Improves functional predictions • Helps identify mechanisms of duplication • Can be used to study mutation processes in different parts of a genome • Lineage specific duplications may be indicative of species-specific adaptations
  17. Lineage Specific Duplications in Wolbachia wMel (table of gene annotations; entries include hypothetical and conserved hypothetical proteins, ankyrin repeat domain proteins, prophage LambdaW1, LambdaW2, LambdaW4, and LambdaW5 genes, transposases of the IS4 and IS5 families, reverse transcriptases, regulatory protein RepA, transcriptional regulators, major facilitator family transporters, sodium/alanine symporter family proteins, DNA repair protein RadC, MutL, and translation elongation factor Tu (tuf); several entries are marked FRAMESHIFT, POINT MUTATION, truncation, interruption, or degenerate)
  18. MutL Duplication in Wolbachia wMel: ORF01096, DNA mismatch repair protein MutL (mutL); ORF00446, MutL family protein
  19. MutL Duplication in Wolbachia wMel
  20. Older Duplication of UVDE (tree figure, scale bar 0.1; taxa: Schizosaccharomyces pombe GP139, Neurospora crassa PIRS55262S552, Clostridium perfringens GP18145, Bacillus subtilis SPP45864YWJD, Bacillus cereus GP6759487embCAB, B BACAN 01914 UV endonuclease, Bacillus halodurans OMNINTL01BH, B BACAN 01459 UV endonuclease, Deinococcus radiodurans GP61167, Nostoc sp. PCC 7120 GP17130610d)
  21. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Intragenomic movement • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  22. X-files. Eisen et al. 2000. Genome Biology 1(6): 11.1-11.9. Also see Tillier and Collins. 2000. Nature Genetics 26(2): 195-7 and Suyama and Bork. 2001. Trends Genetics 17: 10-13.
  23. C. trachomatis vs C. pneumoniae Dot Plot (axes: C. trachomatis MoPn and C. pneumoniae AR39; origin and terminus marked). Read et al. 2000
  24. StrpB vs. StrpA: All (scatter plot)
  25. StrpB vs. StrpA: Orthologs (scatter plot)
  26. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Intragenomic movement • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  27. Most ‘Evidence’ for Gene Transfer has Alternative Explanations (table; columns: Observation, Other Causes, Always Occurs) • Unusual distribution; other causes: sampling bias; always occurs: not if recipient already has gene • Unusual GC/codons; other causes: selection; always occurs: not if donor/recipient similar, not if it occurred long ago • High hit to "distant" species; other causes: selection, rate variation, gene loss; always occurs: usually • Incongruent trees; other causes: bad trees, missed paralogs; always occurs: usually • Correlation of above with neighbors; other causes: selection; always occurs: only if genes keep order after transfer
  28. Steps in Lateral Gene Transfer (figure; panels A to D; steps numbered 1 to 6)
  29. Mitochondrial Genome Integration into A. thaliana chrII (figure, panels A to D; labels include Chromosome II, Mitochondrial Genome, Insertion Point, and a possible alternative mitochondrial form). Lin et al., 1999
  30. Number of pBVTs Depends on # of Genomes Analyzed (bar chart; x-axis: number of protein sets; series: Fruit fly, C. elegans, Arabidopsis, Yeast, Parasites, Other). Salzberg et al. 2001
  31. Trees Don’t Support Transfer II
  32. Uses of Phylogenomics • Selection of species • Functional prediction • Gene duplication • Intragenomic movement • Gene loss • Lateral transfer • Genome rearrangements • Uncultured species
  33. Beja O, et al., Science (2000) 289: 1902-6; Nature (2001) 411: 786-789
  34. Puf Operons from Uncultured Bacteria
  35. Puf Operons vs. Cultured Species
  36. Alternative Phylogenetic Anchors (tree figure; taxa: Chlorobium tepidum, Cytophaga hutchinsonii, Prevotella ruminicola, Bacteroides fragilis, Porphyromonas gingivalis, MBBAD68TR, MBBAD65TR)
  37. Acknowledgements • Outside TIGR: A. Stoltzfus, H. Ochman, D. Bryant, W. F. Doolittle, M. Eisen, M-I Benito • $$$: NSF, NIH, ONR, DOE, NEB
  38. B. anthracis lineage specific duplications • ORF04205, ORF05907, ORF02636: molybdopterin biosynthesis protein MoeA (moeA) • ORF04204, ORF05908, ORF02634: molybdopterin biosynthesis protein MoeB, putative • ORF05904, ORF02639: molybdopterin converting factor, subunit 1 (moaD) • ORF04206, ORF05905, ORF02638: molybdopterin converting factor, subunit 2 (moaE). Based on Read et al. submitted
  39. (Untitled tree figure with the same taxa as slide 20, scale bar 0.1: Schizosaccharomyces pombe GP139, Neurospora crassa PIRS55262S552, Clostridium perfringens GP18145, Bacillus subtilis SPP45864YWJD, Bacillus cereus GP6759487embCAB, B BACAN 01914 UV endonuclease, Bacillus halodurans OMNINTL01BH, B BACAN 01459 UV endonuclease, Deinococcus radiodurans GP61167, Nostoc sp. PCC 7120 GP17130610d)
  40. TIGR
  41. C. pneumoniae Paralogs by Position (scatter plot; x-axis: Query Orf Position, y-axis: Subject Orf Position, both 0 to 1250000)
  42. C. pneumoniae Paralogs - Lineage Specific (scatter plot; x-axis: Query Orf Position, y-axis: Subject Orf Position, both 0 to 1250000)
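
Slide 11 lists phylogenetic profiles among the evolutionary methods for predicting function. The slides do not show how the talk computed them, so the following is only a minimal sketch of the general idea: each gene family gets a presence/absence vector across genomes, and families with matching vectors are flagged as candidates for a shared function. The gene names, the five genomes, the profile values, and the helper names `hamming` and `linked_pairs` are all hypothetical and are not data or code from the talk.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four made-up gene families
# across five made-up genomes; these values are NOT from the talk.
PROFILES = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (0, 1, 0, 0, 1),
    "geneD": (1, 1, 1, 0, 0),
}


def hamming(p, q):
    """Count the genomes in which two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        if hamming(p1, p2) <= max_distance:
            pairs.append((g1, g2))
    return pairs


print(linked_pairs(PROFILES))  # [('geneA', 'geneB')]
```

Raising `max_distance` loosens the match; with `max_distance=2` the toy data also pairs geneD with geneA and geneB. Matching profiles only suggest a functional link, since shared ancestry and sampling of genomes can produce the same pattern.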
