Phylogenomics

                         Jonathan A. Eisen
                            UC Davis

              Bodega Applied Phylogenetics Workshop
                          March 7, 2011
Tuesday, March 8, 2011
Fleischmann et al. 1995. Science 269:496-512

Whole Genome Shotgun Sequencing

[Figure: two steps, labeled "shotgun" and "sequence". Image credit: Warner Brothers, Inc.]




Assemble Fragments

[Figure: sequencer output, followed by the steps "assemble fragments" and "Closure & Annotation"]
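
The fragment-assembly step can be illustrated with a toy sketch. This is not the method used for any real genome project: it is a minimal greedy overlap merge, assuming error-free reads from a single strand and no repeats. The function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example sequence are all hypothetical, chosen only for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_k = min(len(a), len(b))
    for k in range(max_k, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy assumptions: reads are error-free, come from one strand,
    and the underlying sequence has no repeats.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical example: three overlapping reads from a 15-base sequence.
reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real assemblers also have to cope with sequencing errors, reads from both strands, and repeated regions, which is part of why the closure step shown on the slide is needed.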




From http://genomesonline.org

Genome Sequences Have Revolutionized Microbiology

• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional studies
• New enzymes and materials for engineering and synthetic biology

General Steps in Analysis of Complete Genomes

• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
• Comparative genomics

Genome Size




Genome Structure: More Variable than Once Thought




Why Completeness is Important

• Improves characterization of genome features
    – Gene order, replication origins
• Better comparative genomics
    – Genome duplications, inversions
• Presence and absence of particular genes can be very important
• Missing sequence might be important (e.g., centromere)
• Allows researchers to focus on biology, not sequencing


Vibrio cholerae Metabolism




From http://genomesonline.org

Phylogenomic Analysis

• Evolutionary reconstructions greatly improve genome analyses
• Genome analysis greatly improves evolutionary reconstructions
• There is a feedback loop such that these should be integrated



Outline

• Phylogenomic Tales
    – Selecting genomes for sequencing
    – Species evolution
    – Predicting functions of genes
    – Uncultured microbes
    – Searching for novel organisms and genes
• All of these will be told in the context of a recent project, “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA)

GEBA Introduction

Knowing What We Don’t Know




Major Microbial Sequencing Efforts

• Coordinated, top-down efforts
    – Fungal Genome Initiative (Broad/Whitehead)
    – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project
    – Sanger Center Pathogen Sequencing Unit
    – NHGRI Human Gut Microbiome Project
    – NIH Human Microbiome Program
• White paper or grant systems
    – NIAID Microbial Sequencing Centers
    – DOE/JGI Community Sequencing Program
    – DOE/JGI BER Sequencing Program
    – NSF/USDA Microbial Genome Sequencing
• Covers lots of ground and biological diversity



As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled

Phyla shown (based on Hugenholtz, 2002): Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups
• Many small projects funded to fill in some bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature


• NSF-funded Tree of Life Project
• A genome from each of eight phyla

Eisen, Ward, Robb, Nelson, et al.

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla

[Figure: the same list of bacterial phyla as on the "As of 2002" slide]

Organisms Selected

Phylum                  Species selected
Chrysiogenes            Chrysiogenes arsenatis (GCA)
Coprothermobacter       Coprothermobacter proteolyticus (GCBP)
Dictyoglomi             Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria   Thermodesulfobacterium commune (GTC)
Nitrospirae             Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia          Thermomicrobium roseum (GTR)
Deferribacteres         Geovibrio thiophilus (GGT)
Synergistes             Synergistes jonesii (GSJ)

• NSF-funded Tree of Life Project
• A genome from each of eight phyla

Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree

[Figure: the same list of bacterial phyla as on the "As of 2002" slide]

Major Lineages of Actinobacteria

2.5          Actinobacteria
2.5.1        Acidimicrobidae
2.5.1.1      Unclassified
2.5.1.2      "Microthrixineae"
2.5.1.3      Acidimicrobineae
2.5.1.3.1    Unclassified
2.5.1.3.2    Acidimicrobiaceae
2.5.1.4      BD2-10
2.5.1.5      EB1017
2.5.2        Actinobacteridae
2.5.2.1      Unclassified
2.5.2.10     Ellin306/WR160
2.5.2.11     Ellin5012
2.5.2.12     Ellin5034
2.5.2.13     Frankineae
2.5.2.13.1   Unclassified
2.5.2.13.2   Acidothermaceae
2.5.2.13.3   Ellin6090
2.5.2.13.4   Frankiaceae
2.5.2.13.5   Geodermatophilaceae
2.5.2.13.6   Microsphaeraceae
2.5.2.13.7   Sporichthyaceae
2.5.2.14     Glycomyces
2.5.2.15     Intrasporangiaceae
2.5.2.15.1   Unclassified
2.5.2.15.2   Dermacoccus
2.5.2.15.3   Intrasporangiaceae
2.5.2.16     Kineosporiaceae
2.5.2.17     Microbacteriaceae
2.5.2.17.1   Unclassified
2.5.2.17.2   Agrococcus
2.5.2.17.3   Agromyces
2.5.2.18     Micrococcaceae
2.5.2.19     Micromonosporaceae
2.5.2.2      Actinomyces
2.5.2.20     Propionibacterineae
2.5.2.20.1   Unclassified
2.5.2.20.2   Kribbella
2.5.2.20.3   Nocardioidaceae
2.5.2.20.4   Propionibacteriaceae
2.5.2.21     Pseudonocardiaceae
2.5.2.22     Streptomycineae
2.5.2.22.1   Unclassified
2.5.2.22.2   Kitasatospora
2.5.2.22.3   Streptacidiphilus
2.5.2.23     Streptosporangineae
2.5.2.23.1   Unclassified
2.5.2.23.2   Ellin5129
2.5.2.23.3   Nocardiopsaceae
2.5.2.23.4   Streptosporangiaceae
2.5.2.23.5   Thermomonosporaceae
2.5.2.3      Actinomycineae
2.5.2.4      Actinosynnemataceae
2.5.2.5      Bifidobacteriaceae
2.5.2.6      Brevibacteriaceae
2.5.2.7      Cellulomonadaceae
2.5.2.8      Corynebacterineae
2.5.2.8.1    Unclassified
2.5.2.8.2    Corynebacteriaceae
2.5.2.8.3    Dietziaceae
2.5.2.8.4    Gordoniaceae
2.5.2.8.5    Mycobacteriaceae
2.5.2.8.6    Rhodococcus
2.5.2.8.7    Rhodococcus
2.5.2.8.8    Rhodococcus
2.5.2.9      Dermabacteraceae
2.5.2.9.1    Unclassified
2.5.2.9.2    Brachybacterium
2.5.2.9.3    Dermabacter
2.5.3        Coriobacteridae
2.5.3.1      Unclassified
2.5.3.2      Atopobiales
2.5.3.3      Coriobacteriales
2.5.3.4      Eggerthellales
2.5.4        OPB41
2.5.5        PK1
2.5.6        Rubrobacteridae
2.5.6.1      Unclassified
2.5.6.2      "Thermoleiphilaceae"
2.5.6.2.1    Unclassified
2.5.6.2.2    Conexibacter
2.5.6.2.3    XGE514
2.5.6.3      MC47
2.5.6.4      Rubrobacteraceae



• NSF-funded Tree of Life Project
• A genome from each of eight phyla

Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
• Same trend in Eukaryotes
• Same trend in Viruses

[Figure: the same list of bacterial phyla as on the "As of 2002" slide]

Tuesday, March 8, 2011
• GEBA
• A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree

Phyla shown in the tree figure: Proteobacteria, TM6, OS-K, Acidobacteria,
Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres,
Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria,
OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19,
Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes,
Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7,
Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria,
Thermotogae, OP1, OP11

Tuesday, March 8, 2011
http://www.jgi.doe.gov/programs/GEBA/pilot.html
Tuesday, March 8, 2011
GEBA Pilot Project: Components
      • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
        Eisen, Eddy Rubin, Jim Bristow)
      • Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
      • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
      • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
        Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
      • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
        al.)
      • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
        Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
        D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
        Ivanova, Athanasios Lykidis, Adam Zemla)
      • Adopt a microbe education project (Cheryl Kerfeld)
      • Outreach (David Gilbert)
      • $$$ (DOE, Eddy Rubin, Jim Bristow)
Tuesday, March 8, 2011
rRNA Tree of Life




                          Figure from Barton, Eisen et al.
                             “Evolution”, CSHL Press.
                         Based on tree from Pace NR, 2003.

Tuesday, March 8, 2011
GEBA Pilot Target List
[Bar chart: # of Genomes (axis 0 to 35) by Phyla, for bacterial (B:) and archaeal (A:) groups]
Tuesday, March 8, 2011
GEBA Pilot Project Overview

        • Identify major branches in rRNA tree for
          which no genomes are available
        • Identify those with a cultured representative
          in DSMZ
        • DSMZ grew > 200 of these and prepped
          DNA
        • Sequence and finish 200+
        • Annotate, analyze, release data
        • Assess benefits of tree-guided sequencing
        • First paper: Wu et al., Nature, Dec 2009
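
One way to picture tree-guided target selection is a greedy ranking by how much new branch length each candidate genome would add to what is already sequenced. The sketch below is illustrative only and is not the GEBA pipeline: the toy tree, its branch lengths, and the helper names (`path_to_root`, `pick_targets`) are all assumptions made for this example.

```python
# Toy rooted tree: child -> (parent, branch_length). All values are made up.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 3.0),
    "D": ("n3", 5.0),
    "n1": ("n2", 2.0),
    "n2": ("n3", 1.0),
    "n3": (None, 0.0),  # root
}

def path_to_root(tree, leaf):
    """Return the set of branches (named by their child node) from leaf to root."""
    branches = set()
    node = leaf
    while node is not None:
        parent, _ = tree[node]
        if parent is not None:
            branches.add(node)
        node = parent
    return branches

def pick_targets(tree, sequenced, candidates, n):
    """Greedily pick n candidates, each adding the most uncovered branch length."""
    covered = set()
    for leaf in sequenced:
        covered |= path_to_root(tree, leaf)
    remaining = set(candidates)
    picks = []
    for _ in range(min(n, len(remaining))):
        def gain(leaf):
            # Branch length on this leaf's path not yet covered by any genome.
            return sum(tree[b][1] for b in path_to_root(tree, leaf) - covered)
        best = max(sorted(remaining), key=gain)
        picks.append((best, gain(best)))
        covered |= path_to_root(tree, best)
        remaining.remove(best)
    return picks

# Hypothetical case: only "A" has a genome; rank B, C, and D as targets.
print(pick_targets(TREE, sequenced=["A"], candidates=["B", "C", "D"], n=2))
```

In this toy case the long, unsampled branches are chosen first, which is the intuition behind picking targets from branches of the rRNA tree that have no genomes.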
Tuesday, March 8, 2011
GEBA Phylogenomic Lesson 1

                 The rRNA Tree of Life is a Useful Tool
                 for Identifying Phylogenetically Novel
                                Genomes



Tuesday, March 8, 2011
rRNA Tree of Life
                         Bacteria




                                                                Archaea




                          Eukaryotes

                             Figure from Barton, Eisen et al.
                             “Evolution”, CSHL Press. 2007.
                          Based on tree from Pace 1997 Science
                                      276:734-740
Tuesday, March 8, 2011
The Core Gets Small ...




Tuesday, March 8, 2011
The Pangenome




Tuesday, March 8, 2011
Islands Among Synteny




Tuesday, March 8, 2011
The Pangenome




Tuesday, March 8, 2011
Network of Life
                         Bacteria




                                                                Archaea




                          Eukaryotes

                             Figure from Barton, Eisen et al.
                                “Evolution”, CSHL Press.
                           Based on tree from Pace NR, 2003.

Tuesday, March 8, 2011
Using the Core




Tuesday, March 8, 2011
  Whole genome tree built using AMPHORA
  by Martin Wu and Dongying Wu


Tuesday, March 8, 2011
Four Models for Rooting TOL
                         from Lake et al. doi: 10.1098/rstb.2009.0035




Tuesday, March 8, 2011
GEBA Phylogenomic Lesson 2

                      rRNA Tree is good but not perfect
                    and better genomic sampling improves
                            phylogenetic inference



Tuesday, March 8, 2011
16S Says Hyphomonas is in Rhodobacterales




Badger et al.
2005


Tuesday, March 8, 2011
WGT and individual gene trees:
                         It's Related to Caulobacterales




Badger et al.
2005


Tuesday, March 8, 2011
16S                                          WGT, 23S




  Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
Tuesday, March 8, 2011
Caveats: ignoring LGT and using
               concatenated alignments
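
For readers unfamiliar with concatenated alignments, the sketch below shows the basic bookkeeping: per-gene alignments are joined end to end for each taxon, and a taxon missing from a gene is padded with gaps. The gene names, sequences, and the `concatenate` helper are hypothetical. The sketch does nothing about the LGT caveat named on this slide.

```python
def concatenate(gene_alignments):
    """Join per-gene alignments (dict: gene -> {taxon: aligned_seq}) into one
    supermatrix, padding taxa that lack a gene with gap characters."""
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    supermatrix = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not the same aligned length")
        width = lengths.pop()
        for t in taxa:
            # Missing gene for this taxon: fill the whole block with gaps.
            supermatrix[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in supermatrix.items()}

# Made-up toy alignments for two marker genes.
genes = {
    "geneX": {"taxon1": "MKV-L", "taxon2": "MKIAL", "taxon3": "MRV-L"},
    "geneY": {"taxon1": "GGA", "taxon3": "GCA"},
}
for taxon, seq in concatenate(genes).items():
    print(taxon, seq)
```

The resulting supermatrix would then be passed to a tree-building program. Which program and which model to use is outside the scope of this sketch.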




Tuesday, March 8, 2011
Concatenated Alignment ML Tree




Tuesday, March 8, 2011
Green Non-Sulfur Bacteria




Tuesday, March 8, 2011
Chlamydia-Verrucomicrobia




Tuesday, March 8, 2011
Proteobacteria




Tuesday, March 8, 2011
Zimmer. New York Times. 2009
Tuesday, March 8, 2011
GEBA Phylogenomic Lesson 3

                      Phylogenetics-guided genome
                     selection (and phylogenetics in
                  general) improves genome annotation



Tuesday, March 8, 2011
Predicting Function

         • Key step in genome projects
         • More accurate predictions help guide
           experimental and computational analyses
         • Many diverse approaches
         • All are improved by “phylogenomic” type
           analyses that integrate evolutionary
           reconstructions and an understanding of how
           new functions evolve


Tuesday, March 8, 2011
From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Tuesday, March 8, 2011
Blast Search of H. pylori “MutS”




         • Blast search pulls up Syn. sp MutS#2 with a much more
           significant p value than other MutS homologs
         • Based on this, TIGR predicted this species had mismatch
           repair
         • Assumes functional constancy

         Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Tuesday, March 8, 2011
Predicting Function
         • Identification of motifs
               – Short regions of sequence similarity that are indicative of
                 general activity
               – e.g., ATP binding
         • Homology/similarity based methods
               – Gene sequence is searched against a database of other
                 sequences
               – If significantly similar genes are found, their functional
                 information is used
         • Problem
               – Genes frequently have similarity to hundreds of motifs
                 and multiple genes, not all with the same function
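
As a concrete instance of motif identification, the ATP-binding example can be sketched with a regular expression for the Walker A (P-loop) pattern, commonly written as [AG]-x(4)-G-K-[ST]. The short sequences below are made up for illustration and `find_walker_a` is a hypothetical helper. A match suggests only a general activity such as nucleotide binding, which is the limitation the slide points out.

```python
import re

# Walker A / P-loop consensus: [AG]-x(4)-G-K-[ST]
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

def find_walker_a(protein):
    """Return (start, matched_text) for each non-overlapping Walker A match."""
    return [(m.start(), m.group()) for m in WALKER_A.finditer(protein)]

# Made-up fragments, used only to exercise the pattern.
print(find_walker_a("MSLTGPNMGGKSTYLRQ"))  # contains a match
print(find_walker_a("MSLTPPNMPPKDDYLRQ"))  # no match
```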


Tuesday, March 8, 2011
MutL??




     From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
Tuesday, March 8, 2011
Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins; leaves are labeled with species abbreviations such as Ecoli, Bacsu, Yeast, and Human]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Tuesday, March 8, 2011
MutS Subfamilies
[Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Tuesday, March 8, 2011
Overlaying Functions onto Tree
[Figure: MutS family tree with subfamily labels MutS1, MutS2, and MSH1 through MSH6]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Tuesday, March 8, 2011
Functional Prediction Using Tree
[Figure: MutS family tree with a function assigned to each subfamily]
         • MutS1 - Bacterial Mismatch and Loop Repair
         • MutS2 - Unknown Functions
         • MSH1 - Mitochondrial Repair
         • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
         • MSH3 - Nuclear Repair Of Loops
         • MSH4 - Meiotic Crossing Over
         • MSH5 - Meiotic Crossing Over
         • MSH6 - Nuclear Repair Of Mismatches
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Tuesday, March 8, 2011
PHYLOGENETIC PREDICTION OF GENE FUNCTION

METHOD
         1. CHOOSE GENE(S) OF INTEREST
         2. IDENTIFY HOMOLOGS
         3. ALIGN SEQUENCES
         4. CALCULATE GENE TREE
         5. OVERLAY KNOWN FUNCTIONS ONTO TREE
         6. INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
         ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)

EXAMPLE A: genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2, and Species 3; the A and B groups are separated by a duplication
EXAMPLE B: genes 1 through 6; the inferred function is marked Ambiguous

Based on Eisen, 1998 Genome Res 8: 163-167.
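
The final steps of the method (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure from the cited paper: the tree is a hand-written nested tuple, the function labels are invented, and the rule used here (take the smallest clade that contains the query plus at least one annotated gene, and report "ambiguous" if its annotations disagree) is one simple choice among many.

```python
def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def infer_function(tree, known, query):
    """Predict query's function from the smallest clade that contains it and
    at least one annotated gene; return 'ambiguous' if that clade disagrees."""
    if isinstance(tree, str):
        return None
    for child in tree:
        if query in leaves(child):
            result = infer_function(child, known, query)
            if result is not None:
                return result
            break
    else:
        return None  # query is not in this subtree
    functions = {known[l] for l in leaves(tree) if l != query and l in known}
    if not functions:
        return None  # no annotated relatives here; the caller widens the clade
    return functions.pop() if len(functions) == 1 else "ambiguous"

# Hypothetical gene tree in the spirit of Example A: a duplication separates
# the A paralogs from the B paralogs. Function labels are invented.
tree_a = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_a = {"1A": "function A", "2A": "function A",
           "1B": "function B", "3B": "function B"}
print(infer_function(tree_a, known_a, "3A"))
print(infer_function(tree_a, known_a, "2B"))

# A case where the nearest annotated relatives disagree.
tree_b = (("g1", "g2"), "g3")
known_b = {"g1": "function X", "g2": "function Y"}
print(infer_function(tree_b, known_b, "g3"))
```

A top-hit similarity search would instead copy the annotation of whichever sequence scores best, regardless of where the duplication falls in the tree.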
Tuesday, March 8, 2011
Phylogenetic Prediction of Function


         • Termed phylogenomics (Eisen et al. 1997)
         • Greatly improves accuracy of functional
           predictions compared to similarity-based
           methods (e.g., BLAST)
         • Automated methods now available
               – Sean Eddy, Steven Brenner, Kimmen Sjölander,
                 etc.
         • But …

Tuesday, March 8, 2011
Example 2: Recent Changes
        • Phylogenomic functional prediction
          may not work well for very newly
          evolved functions

[Figure: NJ tree of V. cholerae genes; leaf labels include VC0512, VCA1034, VCA0974, VCA0068, VC0825, VC0282, VCA0906, VCA0979, VCA1056, VC1643, VC2161, VCA0923, VC0514, VC1868, VCA0773, VC1313]
                                                                                                   V.cholerae
                                                                                                           VC
                                                                                                            1859
                                                                                                V.cholerae
                                                                                                        VC1413
                                                                                              V.cholerae
                                                                                                       VCA0268
                                                                      **                                V.cholerae
                                                                                                                VC
                                                                                                                 A0658
                                                                                                   V.cholerae
                                                                                                           VC
                                                                                                            1405
                                                                    *                             V.cholerae
                                                                                                          VC1298
                                                                                                    V.cholerae
                                                                                                            VC1248
                                                                                             V.cholerae
                                                                                                      VCA0864
                                                                                             V.cholerae
                                                                                                      VCA0176
                                                                           **                   V.cholerae
                                                                                                        VCA0220
                                                                                               V.cholerae
                                                                                                        VC
                                                                                                         1289
                                                                              **                   V.cholerae
                                                                                                           VC1069
                                                                                                             A
                                                                                                 V.cholerae
                                                                                                         VC2439


        • Can use understanding of origin of
                                                                                                    V.cholerae
                                                                                                            VC967
                                                                                                             1
                                                                                                    V.cholerae
                                                                                                            VC
                                                                                                             A0031
                                                                                                V.cholerae
                                                                                                        VC1898
                                                                                                    V.cholerae
                                                                                                            VC
                                                                                                             A0663
                                                                                             V.cholerae
                                                                                                     VC0988
                                                                                                       A
                                                                                             V.cholerae
                                                                                                      VC0216
                                                                      *                      V.cholerae
                                                                                                      VC0449
                                                                                            V.cholerae
                                                                                                     VCA0008
                                                                                             V.cholerae
                                                                                                      VC1406
                                                                                                      V.cholerae
                                                                                                              VC
                                                                                                               1535


          novelty to better interpret these cases?
                                                                                               V.cholerae
                                                                                                       VC0840
                                                                                                          B.subtilis
                                                                                                                gi2633766
                                                                                                      Synechocystis
                                                                                                                sp.
                                                                                                                  gi1001299
                                                                         *                   Synechocystis
                                                                                                        sp.gi1001300
                                                                    *                                 Synechocystis
                                                                                                                sp.
                                                                                                                  gi1652276
                                                                          *                     Synechocystis
                                                                                                           sp.
                                                                                                             gi1652103
                                                                                               H.pylori
                                                                                                     gi2313716
                                                                     **                     **H.pylori
                                                                                                    99 gi4155097
                                                                                               C.jejuni
                                                                                                     Cj1190c
                                                                                           C.jejuni
                                                                                                 Cj1110c
                                                                                             A.fulgidus
                                                                                                     gi2649560
                                                                                             A.fulgidus
                                                                                                     gi2649548
                                                                                           ** B.subtilis
                                                                                                       gi2634254


        • Screen genomes for genes that have
                                                                                             B.subtilis
                                                                                                    gi2632630
                                                                                             B.subtilis
                                                                                                     gi2635607
                                                                                             B.subtilis
                                                                                                    gi2635608
                                                                                   **         B.subtilis
                                                                                                     gi2635609
                                                                                 ** ** B.subtilisgi2635882
                                                                                                    gi2635610
                                                                                                  B.subtilis
                                                                                           E.coligi1788195
                                                                                           E.coli
                                                                                                gi2367378
                                                                                * **       E.coligi1788194
                                                                                               E.coli A1092
                                                                                                    gi1787690
                                                                                             V.cholerae
                                                                                                      VC


          changed recently
                                                                                              V.cholerae
                                                                                                       VC
                                                                                                        0098
                                                                                              E.coli
                                                                                                   gi1789453
                                                                                                 H.pylori
                                                                                                       gi2313186
                                                                                                 H.pylori
                                                                                                      99 gi4154603
                                                                                             ** C.jejuni   Cj0144
                                                                                                     C.jejuni
                                                                                                           Cj1564
                                                                                                   **C.jejuni
                                                                                                 C.jejuni
                                                                                                           Cj0262c
                                                                                                      Cj1506c
                                                                                      **          H.pylori
                                                                                                        gi2313163
                                                                                *              ** H.pylori
                                                                                                       99 gi4154575
                                                                                   **            H.pylori
                                                                                                      gi2313179
                                                                                                 H.pylori
                                                                                                      99 gi4154599

         –   Pseudogenes and gene loss
                                                                                              ** C.jejuni Cj0019c
                                                                                                         C.jejuni
                                                                                                               Cj0951c
                                                                                                      C.jejuni
                                                                                                            Cj0246c
                                                                                                     B.subtilis
                                                                                                            gi2633374
                                                                                                      T.maritima
                                                                                                              TM0014
                                                                                                           V.cholerae
                                                                                                                  VC1403
                                                                                                         V.cholerae
                                                                                                                VCA1088
                                                                                                          T.pallidum
                                                                                                                 gi3322777
                                                                                **                               T.pallidum
                                                                                                                        gi3322939
                                                                              **                          T.pallidum
                                                                                                                 gi3322938
                                                                                                           B.burgdorferi
                                                                                                                    gi2688522

         –   Contingency Loci
                                                                                                             T.pallidum
                                                                                                                    gi3322296
                                                                                                         B.burgdorferi
                                                                                                                  gi2688521
                                                                     *                          T.maritima
                                                                                                        TM0429
                                                                                              **T.maritima
                                                                                                        TM0918
                                                                       *                     **T.maritima
                                                                                            T.maritima
                                                                                                        TM0023
                                                                                                     TM1428
                                                                                               T.maritima
                                                                                                       TM1143
                                                                                            T.maritima
                                                                                                     TM1146
                                                                                               P.abyssi
                                                                                                      PAB1308
                                                                                               P.horikoshii
                                                                                                       gi3256846
                                                                                          ** P.abyssiPAB1336


         –   Acquisition (e.g., LGT)
                                                                               **             P.horikoshii
                                                                                                       gi3256896
                                                                      **                   **P.abyssi
                                                                                                    PAB2066
                                                               **                            P.horikoshii
                                                                                        ** P.abyssi   gi3258290
                                                                    *                                PAB1026
                                                                                        ** P.horikoshii DRA00354
                                                                                                        gi3256884
                                                                                                         D.radiodurans
                                                                                                        D.radiodurans
                                                                                                  ** D.radioduransDRA0353
                                                                            **                                   DRA0352
                                                          **                                        V.cholerae
                                                                                                             VC
                                                                                                              1394
                                                                                                   P.abyssi
                                                                                                         PAB1189
                                                                                                   P.horikoshii
                                                                                                           gi3258414


         –   Unusual dS/dN ratios
                                                                                            ** B.burgdorferi
                                                                                                         gi2688621
                                                                                                       M.tuberculosis
                                                                                                                 gi1666149
                                                                                                         V.cholerae
                                                                                                                 VC
                                                                                                                  0622




         –   Rapid evolutionary rates
         –   Recent duplications
Tuesday, March 8, 2011
Example 3: Non homology
                             methods

         • Many genes have homologs in other species
           but no homologs have ever been studied
           experimentally
         • Non-homology methods can make functional
           predictions for these
         • Example: phylogenetic profiling




Phylogenetic profiling basis

         • Microbial genes are lost rapidly when not
           maintained by selection
         • Genes can be acquired by lateral transfer
         • Gain and loss frequently occur for entire
           pathways/processes
         • Thus correlated presence/absence information
           might be used to identify genes with
           similar functions

Non-Homology Predictions: Phylogenetic Profiling

          • Step 1: Search all genes in
            organisms of interest against all
            other genomes

          • Step 2: For each gene, record yes or no:
            is it found in each other species?

          • Step 3: Cluster genes by distribution
            patterns (profiles)
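
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene and genome names are made up, the search results are assumed to already exist as a gene-to-genomes mapping, and the greedy Hamming-distance grouping is a simple stand-in for whatever clustering method is actually used.

```python
def build_profiles(hits, genomes):
    """Turn search results into yes/no profiles.

    hits: dict mapping gene -> set of genomes where a homolog was found
          (hypothetical input; in practice this comes from a search step).
    genomes: ordered list of genome names defining the profile columns.
    """
    return {gene: tuple(1 if g in present else 0 for g in genomes)
            for gene, present in hits.items()}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def cluster_by_profile(profiles, max_distance=0):
    """Greedy grouping: a gene joins the first cluster whose
    representative profile is within max_distance, else starts a new one."""
    clusters = []  # list of (representative_profile, [genes])
    for gene in sorted(profiles):
        prof = profiles[gene]
        for rep, members in clusters:
            if hamming(rep, prof) <= max_distance:
                members.append(gene)
                break
        else:
            clusters.append((prof, [gene]))
    return [members for _, members in clusters]


# Toy data (all names are hypothetical).
genomes = ["genome1", "genome2", "genome3", "genome4"]
hits = {
    "geneA": {"genome1", "genome3"},
    "geneB": {"genome1", "genome3"},
    "geneC": {"genome2", "genome4"},
    "geneD": {"genome1", "genome2", "genome3", "genome4"},
}
profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
print(clusters)
```

Genes that land in the same cluster share a distribution pattern and become candidates for a shared function; a gene present everywhere (geneD here) carries little signal.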

Carboxydothermus hydrogenoformans


   • Isolated from a Russian hot spring
   • Thermophile (grows at 80°C)
   • Anaerobic
   • Grows very efficiently on CO
     (carbon monoxide)
   • Produces hydrogen gas
   • Low GC Gram positive
     (Firmicute)
   • Genome sequenced (Wu et al.
     2005 PLoS Genetics 1: e65)

Homologs of Sporulation Genes




                                    Wu et al. 2005 PLoS Genetics 1: e65.

Carboxydothermus sporulates




                         Wu et al. 2005 PLoS Genetics 1: e65.

Phylogenetic Profiling Works Better Using Orthology




GEBA Lesson 3:
              Phylogeny-driven genome selection (and
             phylogenetics) improves genome annotation
          • Took 56 GEBA genomes and compared results vs. 56
            randomly sampled new genomes
          • Better definition of protein family sequence “patterns”
          • Greatly improves “comparative” and “evolutionary”
            based predictions
          • Conversion of hypotheticals into conserved hypotheticals
          • Linking distantly related members of protein families
          • Improved non-homology prediction




GEBA Lesson 4:
                          Metadata Important




GEBA Phylogenomic Lesson 5

                    Phylogeny-driven genome selection
                    helps discover new genetic diversity




Network of Life
                         Bacteria




                                                                Archaea




                          Eukaryotes

                             Figure from Barton, Eisen et al.
                                “Evolution”, CSHL Press.
                           Based on tree from Pace NR, 2003.

Protein Family Rarefaction


         • Take data set of multiple complete genomes
         • Identify all protein families using MCL
         • Plot # of genomes vs. # of protein families
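
A rarefaction curve of this kind can be sketched as follows. The sketch assumes the protein families have already been identified (for example by MCL) and are supplied as one set of family IDs per genome; the genome names, family IDs, and the choice to average over random genome orderings are illustrative assumptions, not details taken from the slides.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding
    1, 2, ..., N genomes, averaged over random genome orderings.

    genome_families: dict mapping genome name -> set of family IDs
                     (assumed to come from an upstream clustering step).
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Toy input (hypothetical genomes and family IDs).
genome_families = {
    "genomeA": {1, 2, 3},
    "genomeB": {2, 3, 4},
    "genomeC": {5},
}
curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting the number of genomes against these values gives the curve; a curve that keeps climbing means each added genome still contributes new families.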




Wu et al. 2009 Nature 462, 1056-1060

Synapomorphies exist




Wu et al. 2009 Nature 462, 1056-1060

Families/PD (phylogenetic diversity) not uniform


Structural Novelty

         • Of the 17,000 protein families in the GEBA56, 1,800
           are novel in sequence (Wu)


         • Structural modeling suggests many are structurally
           novel too (D'haeseleer)


         • 372 being crystallized by the PSI (Kerfeld)




GEBA Phylogenomic Lesson 6

                         Improves analysis of genome data
                            from uncultured organisms




Great Plate Count Anomaly




                         Culturing       Microscope

                          Count      <<<< Count

Environmental DNA Analysis

                                                      DNA




                         Culturing       Microscope

                          Count      <<<< Count

rRNA Phylotyping

                                   • Collect DNA from
                                     environment
                                   • PCR amplify rRNA
                                     genes using broad
                                     (so-called universal) primers
                                   • Sequence
                                   • Align to others
                                   • Infer evolutionary tree
                                   • Unknowns “identified”
                                     by placement on tree
                                   • Some use BLAST, but it is
                                     not as good as phylogeny
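
The "placement on tree" step can be illustrated with a toy example. The sketch below assumes a pre-aligned set of sequences and a rooted binary reference tree written as nested tuples, and it uses Fitch parsimony as a simple stand-in for real phylogenetic inference; the sequences, leaf names, and the parsimony criterion are all assumptions made for illustration, not the method of any particular tool.

```python
def _fitch_site(node, seqs, i):
    """Fitch parsimony for one alignment column.
    Returns (possible ancestral states, number of changes)."""
    if isinstance(node, str):  # leaf: its observed base
        return {seqs[node][i]}, 0
    left, left_cost = _fitch_site(node[0], seqs, i)
    right, right_cost = _fitch_site(node[1], seqs, i)
    common = left & right
    if common:
        return common, left_cost + right_cost
    return left | right, left_cost + right_cost + 1


def parsimony(tree, seqs):
    """Total number of changes the tree requires across all columns."""
    length = len(next(iter(seqs.values())))
    return sum(_fitch_site(tree, seqs, i)[1] for i in range(length))


def insertions(tree, query):
    """Yield every tree obtained by attaching query to one branch."""
    yield (tree, query)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, query):
            yield (new_left, right)
        for new_right in insertions(right, query):
            yield (left, new_right)


def place(tree, seqs, query):
    """Return the attachment of query that needs the fewest changes."""
    return min(insertions(tree, query), key=lambda t: parsimony(t, seqs))


# Toy aligned sequences (hypothetical); "Q" is the unknown.
seqs = {"A": "AACC", "B": "AATT", "C": "GGCC", "D": "GGCC", "Q": "AATT"}
reference_tree = (("A", "B"), ("C", "D"))
placed = place(reference_tree, seqs, "Q")
print(placed)
```

The unknown is then "identified" by the named reference sequences it groups with, rather than by its single best similarity hit.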
rRNA PCR

     The Hidden Majority                   Richness estimates




                         Hugenholtz 2002         Bohannan and Hughes 2003


rRNA data increasing exponentially too

rRNA phylotyping issues

         • Massive amounts of data
               – 1 x 10^6 new partial sequences with new 454
               – 2 x 10^6 full length sequences in DB
         • Alignments of new sequences not always
           straightforward
         • Solutions:
               – Reliance on similarity scores (bad)
               – High throughput automated phylogenetic tools
                  • STAP
                  • WATERS

Perna et al. 2003

Diversity of Proteorhodopsins by PCR




                                de la Torre et al. 2003


Metagenomics



                                 shotgun
                                      sequence




Massive Diversity of Proteorhodopsins




                                           Venter et al., 2004
Tuesday, March 8, 2011
Tuesday, March 8, 2011
Applied Phylogenetics




Example I: Functional Diversity




Functional Diversity of Proteorhodopsins?




                                         Venter et al., Science 304: 66. 2004

Example II: Phylotyping with many genes




rRNA Phylotyping in Sargasso Sea




                                 Venter et al., Science 304: 66. 2004

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)




                                             Venter et al., Science
                                             304: 66. 2004
Shotgun Sequencing Allows Use of Other Markers

[Figure: Sargasso Phylotypes. Y axis: Weighted % of Clones (0 to 0.5).
 X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria,
 Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria,
 Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi,
 Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota,
 Crenarchaeota). Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]

                         Venter et al., Science 304: 66-74. 2004
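
A per-marker phylotype profile like the one plotted for the Sargasso data can be tallied as below. This is a minimal sketch under stated assumptions: the slide does not say how clones were weighted, so each assignment simply carries a positive numeric weight, and the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def weighted_fractions(assignments):
    """Turn (marker, group, weight) records into per-marker profiles.

    Returns {marker: {group: share of that marker's total weight}}.
    Assumes every marker has a positive total weight.
    """
    totals = defaultdict(float)
    per_group = defaultdict(lambda: defaultdict(float))
    for marker, group, weight in assignments:
        totals[marker] += weight
        per_group[marker][group] += weight
    return {
        marker: {g: w / totals[marker] for g, w in groups.items()}
        for marker, groups in per_group.items()
    }


# Invented toy records, not values from Venter et al. 2004.
toy = [
    ("rRNA", "Alphaproteobacteria", 3.0),
    ("rRNA", "Cyanobacteria", 1.0),
    ("RecA", "Alphaproteobacteria", 2.0),
    ("RecA", "Gammaproteobacteria", 2.0),
]

profile = weighted_fractions(toy)
for marker, groups in profile.items():
    print(marker, groups)
```

Putting the profiles for several markers side by side makes it easy to see where they agree and where a single marker, such as rRNA, gives a different picture of the community.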
[Figure: Sargasso Phylotypes. Y axis: Weighted % of Clones (0 to 0.5).
 X axis: Major Phylogenetic Group.
 Overlay text: "Cannot be done without good sampling of genomes"]
                                                                                                                     eo
                                                                                                                         ta
                                                                                                                                                                                                                Shotgun Sequencing Allows Use of Other Markers




                                                                                                                                                          EFG




                         Venter et al., Science 304: 66-74. 2004
                                                                                                                                                          EFTu



                                                                                                                                                          rRNA
                                                                                                                                                          RecA
                                                                                                                                                          RpoB
                                                                                                                                                          HSP70
Example III:
                           Binning




Metagenomics Challenge




Binning challenge

                         Best binning method: reference genomes

                         No reference genome? What do you do?
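
The reference-genome approach named on the binning slide can be illustrated with a toy sketch. This is not the method used in the talk: shared k-mer counting stands in for a real alignment step, and the names `kmers`, `bin_reads`, `k`, and `min_shared` are hypothetical. It also shows the failure case the slide asks about, where a read with no matching reference stays unbinned.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, references, k=4, min_shared=1):
    """Assign each read to the reference genome sharing the most k-mers.

    reads:      dict of read name -> sequence
    references: dict of genome name -> sequence
    Reads sharing fewer than min_shared k-mers with every reference
    are placed in the "unbinned" bin. Ties go to the first reference seen.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for read_name, read_seq in reads.items():
        read_set = kmers(read_seq, k)
        best_name, best_score = "unbinned", 0
        for ref_name, ref_set in ref_kmers.items():
            score = len(read_set & ref_set)
            if score > best_score:
                best_name, best_score = ref_name, score
        if best_score < min_shared:
            best_name = "unbinned"
        bins[best_name].append(read_name)
    return dict(bins)


# Toy sequences invented for illustration only.
references = {"genome_A": "ATGCGTACGTTAGC", "genome_B": "TTTTGGGGCCCCAAAA"}
reads = {"read1": "GCGTACG", "read2": "GGGGCCCC", "read3": "ACACACAC"}
print(bin_reads(reads, references))
```

Here `read3` matches neither reference, which is the situation a metagenome from poorly sampled lineages produces for most reads.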



CFB Phyla




Phylogenetic Binning

                         [Figure: bar chart, "Sargasso Phylotypes". y-axis: Weighted % of Clones (ticks 0, 0.1250, 0.2500, 0.3750, 0.5000); x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); legend: EFG, EFTu, rRNA, RecA, RpoB, HSP70]

                         Venter et al., Science 304: 66-74. 2004

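
The chart above reports, for each marker gene, the weighted share of clones falling in each major phylogenetic group. A minimal sketch of that kind of tally follows. The slide does not define the weighting, so the per-sequence `weight` here is a hypothetical value (for example, coverage depth of the source contig). The function name `weighted_fractions` and the input rows are invented for illustration and are not data from the figure.

```python
from collections import defaultdict


def weighted_fractions(assignments):
    """Tally the weighted fraction of sequences per group for each marker.

    assignments: iterable of (marker, group, weight) tuples, one per
                 marker-gene sequence already placed in a phylogenetic group.
                 Weights are assumed to be positive.
    Returns {marker: {group: fraction}}; fractions for a marker sum to 1.
    """
    totals = defaultdict(float)
    per_group = defaultdict(lambda: defaultdict(float))
    for marker, group, weight in assignments:
        totals[marker] += weight
        per_group[marker][group] += weight
    return {
        marker: {group: w / totals[marker] for group, w in groups.items()}
        for marker, groups in per_group.items()
    }


# Invented example rows: (marker, assigned group, hypothetical weight).
rows = [
    ("RecA", "Alphaproteobacteria", 3.0),
    ("RecA", "Cyanobacteria", 1.0),
    ("rRNA", "Alphaproteobacteria", 1.0),
    ("rRNA", "CFB", 1.0),
]
for marker, fractions in weighted_fractions(rows).items():
    print(marker, fractions)
```

Computing the fractions separately per marker is what lets the markers be compared side by side, as the legend in the chart does.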
Shotgun Sequencing Allows Use of Other Markers

                         [Figure: bar chart, "Sargasso Phylotypes". y-axis: Weighted % of Clones (ticks 0, 0.1250, 0.2500, 0.3750, 0.5000); x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); legend: EFG, EFTu, rRNA, RecA, RpoB, HSP70]

                         Cannot be done without good sampling of genomes

                         Venter et al., Science 304: 66-74. 2004

                         [Figure, beginning: bar chart. y-axis: Weighted % of Clones (ticks 0, 0.1250, 0.2500, 0.3750, 0.5000); x-axis groups so far: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes]
                                                                                                                               ic
                                                                                                                                  u  te
                                                                                                                                                          improves
                                                                                                             Ac                        s
                                                                                                               tin
                                                                                                                      ob
                                                                                                                              ac
                                                                                                                                 t   er
                                                                                                                          C             ia
                                                                                                                              hl
                                                                                                                                or
                                                                                                                                  ob
                                                                                                                                     i
                                                                                                                                 C
                                                                                                                                                          GEBA Project




                                                                                                                                      FB




                                                                   Major Phylogenetic Group
                                                                                                                                                                                                    Sargasso Phylotypes




                                                                                                                   C
                                                                                                                        hl
                                                                                                                          or
                                                                                                                            of
                                                                                                                               le
                                                                                                              Sp          xi
                                                                                                             iro
                                                                                                                 ch
                                                                                                                    ae
                                                                                                                        te
                                                                                                          Fu
                                                                                                             so            s
                                                                                              De                 ba
                                                                                                 in                 ct
                                                                                                                       er
                                                                                                    oc                    ia
                                                                                                       oc
                                                                                                          cu
                                                                                                                                                          metagenomic analysis




                                                                                                             s-
                                                                                                        Eu The
                                                                                                           ry       r
                                                                                                              ar mu
                                                                                                                 ch s
                                                                                                                    ae
                                                                                                        C              ot
                                                                                                          re               a
                                                                                                            na
                                                                                                                rc
                                                                                                                   ha
                                                                                                                     eo
                                                                                                                         ta
                                                                                                                                                                                                                 Shotgun Sequencing Allows Use of Other Markers




                                                                                                                                                          EFG




                         Venter et al., Science 304: 66-74. 2004
                                                                                                                                                          EFTu



                                                                                                                                                          rRNA
                                                                                                                                                          RecA
                                                                                                                                                          RpoB
                                                                                                                                                          HSP70
GEBA Cyano
  Sequencing status (as of 01/14):
              Awaiting Material            11
              Library                      12
              Production                   22
              Finishing                     5
              Grand Total                  50


  Ongoing / Planned Activities:
        - Building Cyanobacterial Metadatabase (IMG-GOLD)
        - 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10)
          --> Cheryl will host: Workshop training as prep for virtual Jamboree




GEBA RNB
  Plan:
  Sequence multiple Root Nodule Bacteria (RNBs) from across the
     planet. Pilot: 100 RNBs.

  Goal:
  •   Understand biogeographical effects on species evolution
      and understand host specificity.

  Rationale:
  •   N2 fixation by legume pastures and crops provides 65% of the N
      currently utilized in agricultural production.
  •   Contributes 25 to 90 million metric tons of N per year.
  •   Symbioses save US$6-10 billion annually on N fertilizer.
  •   Grain and animal production is enhanced by fixed nitrogen supplied
      by the symbiosis.

  Beta RNB: Cupriavidus, Burkholderia
  Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium,
      Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium,
      Balneimonas-like

                                                                       Nikos Kyrpides

Haloarchaeal GEBA-like




• NSF-funded Tree of Life Project
• A genome from each of eight phyla

Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still not happy

[Figure: tree of bacterial phyla. Tip labels, top to bottom:]
                         Proteobacteria
                         TM6
                         OS-K
                         Acidobacteria
                         Termite Group
                         OP8
                         Nitrospira
                         Bacteroides
                         Chlorobi
                         Fibrobacteres
                         Marine Group A
                         WS3
                         Gemmimonas
                         Firmicutes
                         Fusobacteria
                         Actinobacteria
                         OP9
                         Cyanobacteria
                         Synergistes
                         Deferribacteres
                         Chrysiogenetes
                         NKB19
                         Verrucomicrobia
                         Chlamydia
                         OP3
                         Planctomycetes
                         Spirochaetes
                         Coprothermobacter
                         OP10
                         Thermomicrobia
                         Chloroflexi
                         TM7
                         Deinococcus-Thermus
                         Dictyoglomus
                         Aquificae
                         Thermodesulfobacteria
                         Thermotogae
                         OP1
                         OP11

Shotgun Sequencing Allows Use of Other Markers

[Figure: Sargasso Phylotypes. Bar chart of Weighted % of Clones (y-axis, 0 to 0.5000) by Major Phylogenetic Group (x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

GEBA Project improves metagenomic analysis, but only a little

Venter et al., Science 304: 66-74. 2004

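The chart on this slide plots, for each marker gene, the weighted share of clones assigned to each major phylogenetic group. A minimal sketch of that kind of tabulation follows. The function name, the (marker, group, weight) input format, and the toy values are all assumptions made for illustration; they are not the data or the weighting scheme behind the figure.

```python
from collections import defaultdict


def weighted_fractions(assignments):
    """Tabulate the weighted share of clones per group for each marker.

    assignments: iterable of (marker, group, weight) tuples, where weight
    is a positive per-clone weight (hypothetical input format).
    Returns {marker: {group: fraction}}; fractions sum to 1 per marker.
    """
    totals = defaultdict(float)
    by_group = defaultdict(lambda: defaultdict(float))
    for marker, group, weight in assignments:
        totals[marker] += weight
        by_group[marker][group] += weight
    return {
        marker: {group: w / totals[marker] for group, w in groups.items()}
        for marker, groups in by_group.items()
    }


# Toy input invented for this example; not values from the slide.
toy = [
    ("RecA", "Alphaproteobacteria", 2.0),
    ("RecA", "Cyanobacteria", 1.0),
    ("RecA", "Gammaproteobacteria", 1.0),
    ("rRNA", "Alphaproteobacteria", 1.0),
    ("rRNA", "Cyanobacteria", 1.0),
]
fractions = weighted_fractions(toy)
for marker, groups in sorted(fractions.items()):
    print(marker, dict(sorted(groups.items())))
```

Comparing the per-marker dictionaries side by side is one way to see whether different markers give a consistent picture of community composition.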
Phylogenomics Future 1

                       Need to adapt genomic and
                   metagenomic methods to make better
                               use of data



Improving Metagenomic Analysis

       • Methods
             – More automation
             – Better phylogenetic methods for short reads
             – Improved tools for using distantly related genomes
               in metagenomic analysis
       • Data sets
             – Rebuild protein family models
             – New phylogenetic markers
             – Need better reference phylogenies, including HGT
       • More simulations
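
As one illustration of the "More simulations" item, the sketch below draws error-free short reads at random positions from a known sequence, so that a method for short reads can be scored against the known origin of each read. The function names, the uniform sampling model, and the toy genome are assumptions for illustration only; real simulations would also model sequencing error and uneven coverage.

```python
import random


def simulate_reads(genome, read_length, n_reads, seed=0):
    """Sample n_reads substrings of read_length at uniform random starts.

    Returns a list of (start, sequence) pairs so the true origin of each
    read is known. No sequencing errors are modeled (a simplification).
    """
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append((start, genome[start:start + read_length]))
    return reads


def mean_coverage(genome_length, read_length, n_reads):
    """Expected average depth: total sequenced bases / genome length."""
    return n_reads * read_length / genome_length


# Toy random "genome" invented for this example.
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(1000))
reads = simulate_reads(toy_genome, read_length=100, n_reads=50)
coverage = mean_coverage(len(toy_genome), 100, 50)
print(len(reads), coverage)
```

Because each read carries its true start position, any downstream placement or classification step can be checked against ground truth.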
Automation




AMPHORA




                         Guide tree

[Figure: bar chart by phylogenetic group; value axis ticks 0, 0.1750, 0.3500, 0.5250, 0.7000. Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi]
                                                               le
                                                       C          x
                                  U                      hl i
                                                            or
                                    nc                         ob
                                         la                          i
                                            ss
                                               ifi
                                                   ed
                                                       Ba
                                                            ct
                                                               er
                                                                  ia
[Figure labels, marker genes: frr, tsf, rpsI, pgk, rplL, rplT, rplF, rplE, rplS, rplP, rplA, infC, rplK, rplB, rplD, rplN, rplC, rpsJ, rplM, rpsE, rpsS, rpsK, rpsB, rpsC, rpoB, pyrG, nusA, rpsM, rpmA, dnaG, smpB]
We have more than 700 complete genome sequences:

          • Select 100 representatives
          • Build gene families
          • Identify families that are present in all organisms in equal numbers
          • HMM building and phylogenetic analysis to identify the true markers
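The family-filtering step above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the pipeline used for the talk: the function name `universal_equal_copy`, the input layout (family name mapped to a list of genome IDs, one entry per gene copy), and the toy data are all hypothetical.

```python
from collections import Counter

def universal_equal_copy(families, genomes):
    """Return families present in every genome with the same copy number.

    families: dict mapping family name -> list of genome IDs,
              one entry per gene copy (hypothetical input layout).
    genomes:  iterable of all genome IDs under consideration.
    """
    genomes = set(genomes)
    keep = []
    for name, members in families.items():
        counts = Counter(members)
        present_everywhere = set(counts) == genomes
        equal_numbers = len(set(counts.values())) == 1
        if present_everywhere and equal_numbers:
            keep.append(name)
    return sorted(keep)

# Toy data (hypothetical): three genomes, three gene families.
genomes = ["g1", "g2", "g3"]
families = {
    "rpoB": ["g1", "g2", "g3"],        # one copy everywhere: kept
    "tufA": ["g1", "g1", "g2", "g3"],  # extra copy in g1: rejected
    "nifH": ["g1", "g3"],              # missing from g2: rejected
}
candidates = universal_equal_copy(families, genomes)
print(candidates)
```

Families that pass this filter are only candidates; the slide's last step (HMM building and phylogenetic analysis) is what decides which of them behave as true markers.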




                  δ ε    α        β          γ
                         Proteobacteria                    Firmicutes

           Phylogenetic Tree of Bacteria (built from 31 concatenated marker alignments)


More Markers




AMPHORA 2 Coming w/ More Markers

                   Phylogenetic group      Genome   Gene     Marker
                                           Number   Number   Candidates
                   Archaea                 62       145415   106
                   Actinobacteria          63       267783   136
                   Alphaproteobacteria     94       347287   121
                   Betaproteobacteria      56       266362   311
                   Gammaproteobacteria     126      483632   118
                   Deltaproteobacteria     25       102115   206
                   Epsilonproteobacteria   18       33416    455
                   Bacteroidetes           25       71531    286
                   Chlamydiae              13       13823    560
                   Chloroflexi             10       33577    323
                   Cyanobacteria           36       124080   590
                   Firmicutes              106      312309   87
                   Spirochaetes            18       38832    176
                   Thermi                  5        14160    974
                   Thermotogae             9        17037    684



Distances between gene trees and the AMPHORA concatenated genome tree
       [Figure: two panels, one row per gene.]

       Left panel, x-axis: NODAL distance (0 to 6). Genes, top to bottom:
         rpmA, coaE, trmD, rpsS, radA, rplD, tsf, frr, ttf, rplR, rplM, rplI, rpsB, rpsO, mraW, rpsH, rplQ, rplL,
         rplT, rplE, rpsP, rplC, rplV, rplS, infC, rpsM, rplO, rplU, rpsL, rpsQ, guaA, rpsG, smpB, priA, rpsK, rplK,
         serS, rplA, rplF, ruvA, rpsC, rplN, rplP, rpsE, pyrH, rpsI, secY, rpsJ, purA, rplB, nusA, ruvB, rRNA16S

       Right panel, x-axis: SPLIT distance (0 to 0.9). Genes, top to bottom:
         coaE, rpmA, rplL, rpsQ, rplR, rplQ, rpsH, smpB, rpsO, rplP, rpsS, rplV, rplT, rplO, rpsP, rpsK, rplU, tsf,
         trmD, rplS, ttf, rpsI, mraW, rpsL, rpsG, rplM, rplI, pyrH, rpsM, ruvA, radA, purA, rplK, rplD, infC, rplC,
         rplE, rplA, frr, rplF, serS, rplN, guaA, ruvB, rpsB, rpsJ, rRNA16S, secY, rplB, priA, rpsE, rpsC, nusA

       Legend: AMPHORA marker; ribosomal protein; transcription/translation-related protein; DNA repair protein;
               protein of other function; distance between the genome tree and 100 random trees (average ± standard deviation)
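To make the idea of a split distance between a gene tree and the genome tree concrete, here is a minimal sketch. It assumes the distance is the normalized symmetric difference of non-trivial splits (bipartitions); the slide does not say which tool or normalization was used, so that choice, the nested-tuple tree encoding, and the function names are assumptions for illustration only.

```python
def leaves(tree):
    """Leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial splits, each stored as the side that lacks the anchor leaf."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # fixed leaf so each split has one canonical side
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # keep only splits with at least two leaves on each side
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    walk(tree)
    return found

def split_distance(tree_a, tree_b):
    """Fraction of splits found in only one of the two trees (0 = same topology)."""
    a, b = splits(tree_a), splits(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0

# Toy trees (hypothetical) over the same five taxa.
genome_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
print(split_distance(genome_tree, gene_tree))
```

Both trees must contain the same set of taxa for the comparison to be meaningful; a gene tree missing some genomes would first need the genome tree pruned to the shared taxa.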

Fragments




Phylogenetic challenge




                               A single tree with everything




PhylOTU: A High-Throughput Procedure Quantifies
  Microbial Community Diversity and Resolves Novel Taxa
  from Metagenomic Data
  Thomas J. Sharpton1*, Samantha J. Riesenfeld1, Steven W. Kembel2, Joshua Ladau1, James P.
  O’Dwyer2,3, Jessica L. Green2, Jonathan A. Eisen4, Katherine S. Pollard1,5
  1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary
  Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom,
  4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics,
  University of California San Francisco, San Francisco, California, United States of America



        Abstract
        Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic
        units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in
        priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids
        amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-
        finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs
        from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles.
        Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons
        of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries
        identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In
        addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by
        analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the
        biosphere currently hidden from PCR-based surveys of diversity.

     Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community
     Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
     Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
     Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
     Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
     unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

     Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized
     workflow of PhylOTU. See Results section for details.
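The abstract describes clustering reads into OTUs even when the reads do not overlap. One way to picture this is to cluster on pairwise distances measured through a tree rather than on direct sequence overlap. The sketch below is not PhylOTU's implementation: it assumes single-linkage clustering at a fixed distance cutoff, and the function name, read names, distances, and threshold are all hypothetical.

```python
def cluster_otus(names, dist, threshold):
    """Single-linkage clustering: join reads whose distance is <= threshold.

    names: list of read identifiers.
    dist:  dict mapping (read_a, read_b) -> pairwise distance, for example
           a distance measured along a tree (hypothetical input).
    """
    parent = {n: n for n in names}

    def find(x):
        # follow parent links to the cluster representative, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Toy distances (hypothetical) among four reads.
reads = ["r1", "r2", "r3", "r4"]
distances = {
    ("r1", "r2"): 0.01, ("r1", "r3"): 0.20, ("r1", "r4"): 0.21,
    ("r2", "r3"): 0.19, ("r2", "r4"): 0.20, ("r3", "r4"): 0.02,
}
otus = cluster_otus(reads, distances, threshold=0.03)
print(otus)
```

Because the distances come from positions in a shared tree, two reads covering different parts of the SSU-rRNA gene can still be compared, which is the situation the abstract highlights.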
AMPHORA ALL
   • Build reference tree with concatenated alignment
   • Align reads that match any of the HMMs to concatenated alignment
   • Place reads into reference tree one at a time
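The second step above (aligning a read that matches one marker to the concatenated alignment) can be sketched as padding the read's per-marker alignment with gaps for every other marker. This is an illustrative assumption about the bookkeeping, not the AMPHORA code; the function name, marker column counts, and read are hypothetical.

```python
def embed_in_concatenation(marker_lengths, marker, aligned_read):
    """Place a read aligned to one marker into concatenated-alignment coordinates.

    marker_lengths: list of (marker name, number of alignment columns), in the
                    order used to build the concatenated alignment.
    marker:         name of the marker whose HMM the read matched.
    aligned_read:   the read aligned to that marker, one character per column.
    """
    total = sum(length for _, length in marker_lengths)
    offset = 0
    for name, length in marker_lengths:
        if name == marker:
            if len(aligned_read) != length:
                raise ValueError("read alignment does not span the marker columns")
            # gaps for all columns before and after this marker's block
            return "-" * offset + aligned_read + "-" * (total - offset - length)
        offset += length
    raise KeyError(marker)

# Toy layout (hypothetical column counts) with a read matching rpoB.
layout = [("rplA", 5), ("rpoB", 8), ("tsf", 4)]
row = embed_in_concatenation(layout, "rpoB", "MK-LV-AT")
print(row)
```

Each read then becomes one extra row of the concatenated alignment, which is the form needed to place it into the reference tree on its own.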

Phylogenomics Future 2

                         We have still only scratched the
                          surface of microbial diversity




rRNA Tree of Life
                         Bacteria




                                                                Archaea




                          Eukaryotes

                             Figure from Barton, Eisen et al.
                             “Evolution”, CSHL Press. 2007.
                          Based on tree from Pace 1997 Science
                                      276:734-740
Phylogenetic Diversity: Genomes




From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity with GEBA




From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity: Isolates




                             From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity: All




                                 From Wu et al. 2009 Nature 462, 1056-1060
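The phylogenetic diversity comparisons on these slides can be pictured with a short sketch. It assumes Faith's phylogenetic diversity in its rooted form, meaning the total length of all branches on the paths from the chosen taxa to the root; the slides do not state the exact measure, and the function name, tree encoding, and branch lengths here are hypothetical.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the branch lengths on the paths from the given taxa up to the root.

    parent: dict mapping node -> (parent node, branch length to parent);
            the root has no entry (hypothetical tree encoding).
    taxa:   list of leaf names to include.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        # climb toward the root, counting each branch only once
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Toy tree (hypothetical): root -> X -> (A, B), and root -> C on a long branch.
tree = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 1.0),
    "C": ("root", 3.0),
}
pd_close = phylogenetic_diversity(tree, ["A", "B"])
pd_all = phylogenetic_diversity(tree, ["A", "B", "C"])
print(pd_close, pd_all, pd_all - pd_close)
```

In this toy tree, adding the distant taxon C contributes its entire long branch, which is the kind of gain that phylogenetically guided genome selection aims for.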
Uncultured Lineages:


         •   Get into culture
         •   Enrichment cultures
         •   If abundant in low diversity ecosystems
         •   Flow sorting
         •   Microbeads
         •   Microfluidic sorting
         •   Single cell amplification

GEBA uncultured
    Number of SAGs from Candidate Phyla

                                              OD1   OP11   OP3   SAR406
    Site A: Hydrothermal vent                 4     1      -     -
    Site B: Gold Mine                         6     13     2     -
    Site C: Tropical gyres (Mesopelagic)      -     -      -     2
    Site D: Tropical gyres (Photic zone)      1     -      -     -




 Sample collections at 4 additional sites are underway.




                                                                       Phil Hugenholtz




Phylogenomics Future 3

                     Need Experiments from Across the
                             Tree of Life too




As of 2002

         • At least 40 phyla of bacteria
         • Experimental studies are mostly from three phyla
         • Some studies in other phyla

         [Figure: tree of bacterial phyla. Labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8,
         Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria,
         Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia,
         Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7,
         Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

         Based on Hugenholtz, 2002

As of 2002

         • At least 40 phyla of bacteria
         • Genome sequences are mostly from three phyla
         • Some other phyla are only sparsely sampled
         • Same trend in Eukaryotes
         • Same trend in Viruses

         [Figure: same tree of bacterial phyla as on the preceding "As of 2002" slide (Proteobacteria through OP11)]

         Based on Hugenholtz, 2002

Need experimental studies from across the tree too

         [Figure: same tree of bacterial phyla as on the "As of 2002" slides (Proteobacteria through OP11); scale bar 0.1]

         Tree based on Hugenholtz (2002) with some modifications.

Adopt a Microbe

         [Figure: same tree of bacterial phyla as on the "As of 2002" slides (Proteobacteria through OP11); scale bar 0.1]

         Tree based on Hugenholtz (2002) with some modifications.

Conclusion


         • Phylogenetic sampling of genomes
           improves our understanding of microbial
           diversity in many ways
         • Still need
               – More biogeography
               – More phenotypic/experimental data
               – Deeper phylogenetic sampling



MICROBES




A Happy Tree of Life




Acknowledgements


         • GEBA: DOE-JGI, DSMZ
         • GWSS: Nancy Moran & lab, Dongying Wu
         • iSEEM: Katie Pollard, Jessica Green,
           Martin Wu
         • RecA: Dongying Wu, Craig Venter, Doug
           Rusch, et al.



Tuesday, March 8, 2011

Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

  • 1.
    Phylogenomics Jonathan A. Eisen UC Davis Bodega Applied Phylogenetics Workshop March 7, 2011 Tuesday, March 8, 2011
  • 2.
    Fleischmann et al. 1995 Science 269:496-512 Tuesday, March 8, 2011
  • 3.
    Whole Genome Shotgun Sequencing
  • 4.
    Whole Genome Shotgun Sequencing
  • 5.
    Whole Genome Shotgun Sequencing Warner Brothers, Inc.
  • 6.
    Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  • 7.
    Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  • 8.
    Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc. sequence
  • 9.
    Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc. sequence
  • 10.
  • 11.
    Assemble Fragments sequencer output Tuesday, March 8, 2011
  • 12.
    Assemble Fragments sequencer output Tuesday, March 8, 2011
  • 13.
    Assemble Fragments sequencer output assemble fragments Tuesday, March 8, 2011
  • 14.
    Assemble Fragments sequencer output assemble fragments Closure & Annotation Tuesday, March 8, 2011
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
    Genome Sequences Have Revolutionized Microbiology • Predictions of metabolic processes • Better vaccine and drug design • New insights into mechanisms of evolution • Genomes serve as template for functional studies • New enzymes and materials for engineering and synthetic biology Tuesday, March 8, 2011
  • 21.
    General Steps in Analysis of Complete Genomes • Identification/prediction of genes • Characterization of gene features • Characterization of genome features • Prediction of gene function • Prediction of pathways • Integration with known biological data • Comparative genomics
  • 22.
  • 23.
    Genome Structure: More Variable than Once Thought Tuesday, March 8, 2011
  • 24.
  • 25.
    Why Completeness is • Improves characterization of genome features – Gene order, replication origins • Better comparative genomics – Genome duplications, inversions • Presence and absence of particular genes can be very important • Missing sequence might be important (e.g., centromere) • Allows researchers to focus on biology not sequencing Tuesday, March 8, 2011
  • 26.
  • 27.
  • 28.
  • 29.
    Phylogenomic Analysis • Evolutionary reconstructions greatly improve genome analyses • Genome analysis greatly improves evolutionary reconstructions • There is a feedback loop such that these should be integrated Tuesday, March 8, 2011
  • 30.
    Outline • Phylogenomic Tales – Selecting genomes for sequencing – Species evolution – Predicting functions of genes – Uncultured microbes – Searching for novel organisms and genes Tuesday, March 8, 2011
  • 31.
    Outline • Phylogenomic Tales – Selecting genomes for sequencing – Species evolution – Predicting functions of genes – Uncultured microbes – Searching for novel organisms and genes • All of these are going to be told in the context of a recent project, “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA)
  • 32.
    GEBA Introduction Knowing What We Don’t Know Tuesday, March 8, 2011
  • 33.
    Major Microbial Sequencing Efforts • Coordinated, top-down efforts – Fungal Genome Initiative (Broad/Whitehead) – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project – Sanger Center Pathogen Sequencing Unit – NHGRI Human Gut Microbiome Project – NIH Human Microbiome Program • White paper or grant systems – NIAID Microbial Sequencing Centers – DOE/JGI Community Sequencing Program – DOE/JGI BER Sequencing Program – NSF/USDA Microbial Genome Sequencing • Covers lots of ground and biological diversity Tuesday, March 8, 2011
  • 34.
    As of 2002
  • 35.
    As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla with tips Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11. Based on Hugenholtz, 2002]
  • 36.
    As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002]
  • 37.
    As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002]
  • 38.
    As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002]
  • 39.
    Need for Tree Guidance Well Established • Common approach within some eukaryotic groups • Many small projects funded to fill in some bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  • 40.
    • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen, Ward, Robb, Nelson, et al) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla [Figure: tree of bacterial phyla, Proteobacteria through OP11]
  • 41.
    Organisms Selected
      Phylum                   Species selected
      Chrysiogenes             Chrysiogenes arsenatis (GCA)
      Coprothermobacter        Coprothermobacter proteolyticus (GCBP)
      Dictyoglomi              Dictyoglomus thermophilum (GDT)
      Thermodesulfobacteria    Thermodesulfobacterium commune (GTC)
      Nitrospirae              Thermodesulfovibrio yellowstonii (GTY)
      Thermomicrobia           Thermomicrobium roseum (GTR)
      Deferribacteres          Geovibrio thiophilus (GGT)
      Synergistes              Synergistes jonesii (GSJ)
  • 42.
    • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen & Ward, PIs) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree [Figure: tree of bacterial phyla, Proteobacteria through OP11]
  • 43.
    Major Lineages of Actinobacteria
      2.5 Actinobacteria
      2.5.1 Acidimicrobidae: 2.5.1.1 Unclassified; 2.5.1.2 "Microthrixineae"; 2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified, 2.5.1.3.2 Acidimicrobiaceae); 2.5.1.4 BD2-10; 2.5.1.5 EB1017
      2.5.2 Actinobacteridae: 2.5.2.1 Unclassified; 2.5.2.2 Actinomyces; 2.5.2.3 Actinomycineae; 2.5.2.4 Actinosynnemataceae; 2.5.2.5 Bifidobacteriaceae; 2.5.2.6 Brevibacteriaceae; 2.5.2.7 Cellulomonadaceae; 2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified, 2.5.2.8.2 Corynebacteriaceae, 2.5.2.8.3 Dietziaceae, 2.5.2.8.4 Gordoniaceae, 2.5.2.8.5 Mycobacteriaceae, 2.5.2.8.6 Rhodococcus, 2.5.2.8.7 Rhodococcus, 2.5.2.8.8 Rhodococcus); 2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified, 2.5.2.9.2 Brachybacterium, 2.5.2.9.3 Dermabacter); 2.5.2.10 Ellin306/WR160; 2.5.2.11 Ellin5012; 2.5.2.12 Ellin5034; 2.5.2.13 Frankineae (2.5.2.13.1 Unclassified, 2.5.2.13.2 Acidothermaceae, 2.5.2.13.3 Ellin6090, 2.5.2.13.4 Frankiaceae, 2.5.2.13.5 Geodermatophilaceae, 2.5.2.13.6 Microsphaeraceae, 2.5.2.13.7 Sporichthyaceae); 2.5.2.14 Glycomyces; 2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified, 2.5.2.15.2 Dermacoccus, 2.5.2.15.3 Intrasporangiaceae); 2.5.2.16 Kineosporiaceae; 2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified, 2.5.2.17.2 Agrococcus, 2.5.2.17.3 Agromyces); 2.5.2.18 Micrococcaceae; 2.5.2.19 Micromonosporaceae; 2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified, 2.5.2.20.2 Kribbella, 2.5.2.20.3 Nocardioidaceae, 2.5.2.20.4 Propionibacteriaceae); 2.5.2.21 Pseudonocardiaceae; 2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified, 2.5.2.22.2 Kitasatospora, 2.5.2.22.3 Streptacidiphilus); 2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified, 2.5.2.23.2 Ellin5129, 2.5.2.23.3 Nocardiopsaceae, 2.5.2.23.4 Streptosporangiaceae, 2.5.2.23.5 Thermomonosporaceae)
      2.5.3 Coriobacteridae: 2.5.3.1 Unclassified; 2.5.3.2 Atopobiales; 2.5.3.3 Coriobacteriales; 2.5.3.4 Eggerthellales
      2.5.4 OPB41
      2.5.5 PK1
      2.5.6 Rubrobacteridae: 2.5.6.1 Unclassified; 2.5.6.2 "Thermoleiphilaceae" (2.5.6.2.1 Unclassified, 2.5.6.2.2 Conexibacter, 2.5.6.2.3 XGE514); 2.5.6.3 MC47; 2.5.6.4 Rubrobacteraceae
  • 44.
    • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen & Ward, PIs) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea [Figure: tree of bacterial phyla, Proteobacteria through OP11]
  • 45.
    • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen & Ward, PIs) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: tree of bacterial phyla, Proteobacteria through OP11]
  • 46.
    • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen & Ward, PIs) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: tree of bacterial phyla, Proteobacteria through OP11]
  • 47.
    • GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Figure: tree of bacterial phyla, Proteobacteria through OP11] Eisen & Ward, PIs
  • 48.
  • 49.
    GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • 50.
    rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 51.
  • 52.
  • 53.
  • 54.
    GEBA Pilot Target List [Figure: bar chart of # of Genomes (axis 0 to 35) by Phyla, with bacterial groups prefixed "B:" and archaeal groups prefixed "A:"]
  • 55.
    GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 200+ • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
  • 56.
    GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  • 57.
    rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 58.
    The Core Gets Small ...
  • 59.
  • 60.
  • 61.
  • 62.
    Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003. Tuesday, March 8, 2011
  • 63.
  • 64.
    Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu
  • 65.
  • 66.
    Four Models for Rooting TOL, from Lake et al. doi: 10.1098/rstb.2009.0035
  • 67.
    GEBA Phylogenomic Lesson 2: rRNA Tree is good but not perfect, and better genomic sampling improves phylogenetic inference
  • 68.
    16S Says Hyphomonas is in Rhodobacterales. Badger et al. 2005
  • 69.
    WGT and individual gene trees: It's Related to Caulobacterales. Badger et al. 2005
  • 70.
    16S, WGT, 23S. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
  • 71.
    Caveats: ignoring LGT and using concatenated alignments
  • 72.
    Concatenated Alignment ML Tree
  • 73.
    Green Non Sulfur Bacteria
  • 74.
  • 75.
  • 76.
    Zimmer. New York Times. 2009
  • 77.
    GEBA Phylogenomic Lesson 3: Phylogenetics guided genome selection (and phylogenetics in general) improves genome annotation
  • 78.
    Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All are improved by “phylogenomic” type analyses that integrate evolutionary reconstructions and an understanding of how new functions evolve
  • 79.
    From Eisen et al. 1997 Nature Medicine 3: 1076-1078. Tuesday, March 8, 2011
  • 80.
    Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs • Based on this TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 81.
    Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  • 82.
    MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html Tuesday, March 8, 2011
  • 83.
    Phylogenetic Tree of MutS Family [Figure: gene tree of MutS homologs, tips labeled by species (e.g., Ecoli, Bacsu, Synsp, Helpy, Yeast, Human, Mouse, Arath)]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 84.
    MutS Subfamilies [Figure: the MutS gene tree with subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 85.
    Overlaying Functions onto Tree [Figure: the MutS gene tree with subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 86.
    Functional Prediction Using Tree • MSH5 - Meiotic Crossing Over • MutS2 - Unknown Functions • MSH6 - Nuclear Repair Of Mismatches • MSH4 - Meiotic Crossing Over • MSH3 - Nuclear Repair Of Loops • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair • MSH1 - Mitochondrial Repair • MutS1 - Bacterial Mismatch and Loop Repair [Figure: the MutS gene tree with these functions overlaid on the subfamilies]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 87.
  • 88.
    PHYLOGENETIC PREDICTION OF GENE FUNCTION • METHOD: (1) CHOOSE GENE(S) OF INTEREST (2) IDENTIFY HOMOLOGS (3) ALIGN SEQUENCES (4) CALCULATE GENE TREE (5) OVERLAY KNOWN FUNCTIONS ONTO TREE (6) INFER LIKELY FUNCTION OF GENE(S) OF INTEREST [Figure: the method applied to EXAMPLE A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2 and 3, with a possible duplication) and EXAMPLE B (genes 1 to 6, one inference ambiguous); ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)]. Based on Eisen, 1998 Genome Res 8: 163-167.
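    The last two steps of the method (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched in Python. This is only an illustrative sketch, not the procedure from Eisen 1998: the nested-tuple tree encoding, the gene names, the function labels, and the rule "take the smallest clade around the query that has any annotated member, and call the result ambiguous if those members disagree" are all assumptions made for the example.

```python
def leaves(node):
    """Return the leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def path_to(node, query):
    """Return the clades that contain `query`, ordered from the root inward.

    Returns None when `query` is not found under `node`.
    """
    if isinstance(node, str):
        return [] if node == query else None
    for child in node:
        sub = path_to(child, query)
        if sub is not None:
            return [node] + sub
    return None


def predict_function(tree, known, query):
    """Infer a function for `query` from annotated relatives on the gene tree.

    Walks outward from the smallest clade containing the query. The first
    clade with annotated members decides: one shared function is returned,
    conflicting functions give None (ambiguous).
    """
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    for clade in reversed(path):  # smallest clade first
        funcs = {known[name] for name in leaves(clade)
                 if name != query and name in known}
        if len(funcs) == 1:
            return funcs.pop()
        if len(funcs) > 1:
            return None  # annotated relatives disagree
    return None  # no annotated relatives anywhere on the tree


# Hypothetical gene tree and annotations, for illustration only.
example_tree = (
    (("A1", "A2"), "queryA"),
    ("queryB", (("B1", "B2"), ("C1", "C2"))),
)
example_known = {
    "A1": "mismatch repair", "A2": "mismatch repair",
    "B1": "mismatch repair", "B2": "mismatch repair",
    "C1": "meiotic crossing over", "C2": "meiotic crossing over",
}

print(predict_function(example_tree, example_known, "queryA"))
print(predict_function(example_tree, example_known, "queryB"))
```

    In this toy tree queryA sits inside a clade whose annotated members agree, so it inherits their function, while queryB branches off before a duplication-like split into two differently annotated groups and is reported as ambiguous.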
  • 89.
    Phylogenetic Prediction of • Termed phylogenomics (Eisen, et al 1997) • Greatly improves accuracy of functional predictions compared to similarity based methods (e.g., blast) • Automated methods now available – Sean Eddy, Steven Brenner, Kimmen Sjölander, etc. • But … Tuesday, March 8, 2011
  • 90.
    Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency Loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: NJ tree of a gene family with many V. cholerae genes plus homologs from B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans and M. tuberculosis]
  • 91.
    Example 3: Nonhomology methods • Many genes have homologs in other species but no homologs have ever been studied experimentally • Non-homology methods can make functional predictions for these • Example: phylogenetic profiling Tuesday, March 8, 2011
  • 92.
    Phylogenetic profiling basis • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus might be able to use correlated presence/absence information to identify genes with similar functions
  • 93.
    Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles) Tuesday, March 8, 2011
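    The profiling steps on this slide can be sketched in a few lines of Python. The sketch assumes the all-against-all search has already been run and summarized as, for each gene, the set of genomes in which a match was found; the gene and genome names are hypothetical, and grouping genes with identical profiles is the simplest possible stand-in for the clustering step (real analyses would more likely cluster on a distance between profiles).

```python
from collections import defaultdict


def build_profiles(presence, genomes):
    """Turn per-gene hit sets into yes/no profiles.

    presence: dict mapping gene name -> set of genomes with a match
    genomes:  ordered list of genome names (fixes the profile columns)
    Returns a dict mapping gene name -> tuple of 0/1 flags.
    """
    return {
        gene: tuple(int(genome in found) for genome in genomes)
        for gene, found in presence.items()
    }


def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return {profile: sorted(genes) for profile, genes in groups.items()}


# Hypothetical search results, for illustration only.
genomes = ["sp1", "sp2", "sp3"]
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2"},
    "geneD": {"sp1", "sp2", "sp3"},
}

profiles = build_profiles(presence, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in sorted(clusters.items()):
    print(profile, genes)
```

    Here geneA and geneB share the profile (1, 0, 1), which is the kind of correlated presence/absence pattern that would flag them as candidates for a shared pathway or process.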
  • 94.
    Carboxydothermus hydrogenoformans • Isolated from a Russian hotspring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65. ) Tuesday, March 8, 2011
  • 95.
    Homologs of Sporulation Genes. Wu et al. 2005 PLoS Genetics 1: e65.
  • 96.
    Carboxydothermus sporulates. Wu et al. 2005 PLoS Genetics 1: e65.
  • 97.
    Wu et al. 2005 PLoS Genetics 1: e65.
  • 98.
    PG Profiling Works Better Using Orthology
  • 99.
    GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction Tuesday, March 8, 2011
  • 100.
    GEBA Lesson 4: Metadata Important Tuesday, March 8, 2011
  • 101.
    GEBA Phylogenomic Lesson 5: Phylogeny-driven genome selection helps discover new genetic diversity
  • 102.
    Network of Life (Bacteria, Archaea, Eukaryotes). Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 103.
    Protein Family Rarefaction • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families Tuesday, March 8, 2011
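    The rarefaction described on this slide can be sketched as follows. The sketch assumes the protein families have already been identified (for example with MCL) and are available as a set of family identifiers per genome; the genome names, family identifiers, and the choice to average over random genome orders are assumptions made for illustration, not details taken from the GEBA analysis.

```python
import random


def accumulation_curve(genome_families, order):
    """Cumulative number of distinct protein families as genomes are added.

    genome_families: dict mapping genome name -> set of family identifiers
    order:           sequence of genome names giving the order of addition
    """
    seen = set()
    curve = []
    for genome in order:
        seen |= genome_families[genome]
        curve.append(len(seen))
    return curve


def mean_rarefaction(genome_families, n_shuffles=100, seed=0):
    """Average the accumulation curve over random genome orders."""
    rng = random.Random(seed)
    genomes = sorted(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_shuffles):
        rng.shuffle(genomes)
        for i, count in enumerate(accumulation_curve(genome_families, genomes)):
            totals[i] += count
    return [total / n_shuffles for total in totals]


# Hypothetical family assignments, for illustration only.
families = {
    "g1": {"f1", "f2", "f3"},
    "g2": {"f2", "f3", "f4"},
    "g3": {"f5"},
}

fixed_order_curve = accumulation_curve(families, ["g1", "g2", "g3"])
mean_curve = mean_rarefaction(families)
print(fixed_order_curve)
print(mean_curve)
```

    Plotting the number of genomes (1, 2, 3, ...) against the values in `mean_curve` gives the "# of genomes vs. # of protein families" curve; a curve that keeps climbing indicates that each added genome is still contributing new families.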
  • 104.
    Wu et al. 2009 Nature 462, 1056-1060
  • 105.
    Wu et al. 2009 Nature 462, 1056-1060
  • 106.
    Wu et al. 2009 Nature 462, 1056-1060
  • 107.
    Wu et al. 2009 Nature 462, 1056-1060
  • 108.
    Wu et al. 2009 Nature 462, 1056-1060
  • 109.
    Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  • 110.
    Families/PD not uniform
  • 111.
    Structural Novelty • Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu) • Structural modeling suggests many are structurally novel too (D'haeseleer) • 372 being crystallized by the PSI (Kerfeld) Tuesday, March 8, 2011
  • 112.
    GEBA Phylogenomic Lesson 6: Improves analysis of genome data from uncultured organisms
  • 113.
    Great Plate Count Anomaly: Culturing Count vs. Microscope Count
  • 114.
    Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
  • 115.
    Environmental DNA Analysis: DNA; Culturing Count <<<< Microscope Count
  • 116.
    rRNA Phylotyping • Collect DNA from environment • PCR amplify rRNA genes using broad (so-called universal) primers • Sequence • Align to others • Infer evolutionary tree • Unknowns “identified” by placement on tree • Some use BLAST, but not as good as phylogeny
  • 117.
    rRNA PCR The Hidden Majority Richness estimates Hugenholtz 2002 Bohannan and Hughes 2003 Tuesday, March 8, 2011
  • 118.
  • 119.
    rRNA data increasing exponentially too
  • 120.
    rRNA phylotyping issues • Massive amounts of data – 1 x 10^6 new partial sequences with new 454 – 2 x 10^6 full length sequences in DB • Alignments of new sequences not always straightforward • Solutions: – Reliance on similarity scores (bad) – High throughput automated phylogenetic tools • STAP • WATERs
  • 121.
    Perna et al. 2003
  • 122.
  • 123.
  • 124.
  • 125.
    Diversity of Proteorhodopsins by PCR. de la Torre et al 2003
  • 126.
    Metagenomics shotgun sequence
  • 127.
    Massive Diversity of Proteorhodopsins. Venter et al., 2004
  • 128.
  • 129.
  • 130.
    Example I: Functional Diversity
  • 131.
    Functional Diversity of Proteorhodopsins? Venter et al., Science 304: 66. 2004
  • 132.
    Example II: Phylotyping w/ many genes Tuesday, March 8, 2011
  • 133.
    rRNA Phylotyping in Sargasso Sea. Venter et al., Science 304: 66. 2004
  • 134.
    Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA). Venter et al., Science 304: 66. 2004
  • 135.
    Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso Phylotypes, bar chart of Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70]. Venter et al., Science 304: 66. 2004
  • 136.
    Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso Phylotypes, bar chart of Weighted % of Clones by Major Phylogenetic Group for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70]. Venter et al., Science 304: 66-74. 2004
  • 137.
    Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Figure: Sargasso Phylotypes, bar chart of Weighted % of Clones by Major Phylogenetic Group for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70]. Venter et al., Science 304: 66-74. 2004
  • 138.
    Example III: Binning
  • 139.
  • 140.
  • 141.
    Binning challenge. Best binning method: reference genomes
  • 142.
  • 143.
    Binning challenge. No reference genome? What do you do?
  • 144.
  • 145.
    Phylogenetic Binning. Venter et al., Science 304: 66-74. 2004
    [Figure: Sargasso Phylotypes. Weighted % of clones (axis 0 to 0.5) per major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one bar series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]
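The per-group bars in a phylotyping figure like this one come down to a tally: each read carrying a marker gene is assigned to a phylogenetic group, and the assignments are summed per marker. A minimal Python sketch of that tally follows. The record layout, the function name `weighted_fractions`, and the toy weights are assumptions made for illustration; the slides do not say how the weighting in Venter et al. was computed.

```python
from collections import defaultdict


def weighted_fractions(assignments):
    """Return {marker: {group: fraction of total weight}}.

    `assignments` is an iterable of (marker, group, weight) records.
    Weights are assumed to be positive; how they are derived is left open.
    """
    totals = defaultdict(float)
    per_group = defaultdict(lambda: defaultdict(float))
    for marker, group, weight in assignments:
        totals[marker] += weight
        per_group[marker][group] += weight
    return {
        marker: {group: w / totals[marker] for group, w in groups.items()}
        for marker, groups in per_group.items()
    }


# Hypothetical toy records: (marker, phylogenetic group, weight).
reads = [
    ("RecA", "Alphaproteobacteria", 2.0),
    ("RecA", "Cyanobacteria", 1.0),
    ("RecA", "Alphaproteobacteria", 1.0),
    ("rRNA", "Alphaproteobacteria", 1.0),
    ("rRNA", "Gammaproteobacteria", 1.0),
]
fractions = weighted_fractions(reads)
```

Keeping the tally separate per marker is what lets the profiles from RecA, RpoB, rRNA and the others be compared side by side.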
  • 146.
    Shotgun Sequencing Allows Use of Other Markers. Venter et al., Science 304: 66-74. 2004
    [Figure: Sargasso Phylotypes. Weighted % of clones (axis 0 to 0.5) per major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one bar series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA. Overlay text: Cannot be done without good sampling of genomes.]
  • 147.
    Shotgun Sequencing Allows Use of Other Markers. Venter et al., Science 304: 66-74. 2004
    [Same figure as slide 146, with overlay text: GEBA Project improves metagenomic analysis.]
  • 148.
    GEBA Cyano
    Sequencing status (as of 01/14):
      Awaiting Material   11
      Library             12
      Production          22
      Finishing            5
      Grand Total         50
    Ongoing/Planned Activities:
    - Building Cyanobacterial Metadatabase (IMG-GOLD)
    - 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10) --> Cheryl will host: Workshop training as prep for virtual Jamboree
  • 149.
    GEBA RNB
    Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs.
    Goal:
    • Understand biogeographical effects on species evolution and understand host-specificity.
    Rationale:
    • N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production.
    • Contributes 25 to 90 million metric tonnes N pa.
    • Symbioses save $US 6-10 billion annually on N fertilizer.
    • Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis.
    Beta RNB: Cupriavidus, Burkholderia
    Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium, Balneimonas-like
    Nikos Kyrpides
  • 150.
  • 151.
    NSF-funded Tree of Life Project
    • At least 40 phyla of bacteria
    • Genome sequences are mostly from three phyla
    • Some other phyla are only sparsely sampled
    • A genome from each of eight phyla
    • Still not happy
    Eisen & Ward, PIs
    [Tree figure; tips labeled: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
  • 152.
    Shotgun Sequencing Allows Use of Other Markers. Venter et al., Science 304: 66-74. 2004
    [Figure: Sargasso Phylotypes. Weighted % of clones (axis 0 to 0.5) per major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one bar series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA. Overlay text: GEBA Project improves metagenomic analysis, but only a little.]
  • 153.
    Phylogenomics Future 1: Need to adapt genomic and metagenomic methods to make better use of data
  • 154.
    Improving Metagenomic Analysis
    • Methods
      – More automation
      – Better phylogenetic methods for short reads
      – Improved tools for using distantly related genomes in metagenomic analysis
    • Data sets
      – Rebuild protein family models
      – New phylogenetic markers
      – Need better reference phylogenies, including HGT
    • More simulations
  • 155.
  • 156.
    AMPHORA Guide tree
  • 157.
    [Figure: bar chart (axis 0 to 0.7) per phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria), one bar series per marker gene: frr, tsf, rpsI, pgk, rplL, rplT, rplF, rplE, rplS, rplP, rplA, infC, rplK, rplB, rplD, rplN, rplC, rpsJ, rplM, rpsE, rpsS, rpsK, rpsB, rpsC, rpoB, pyrG, nusA, rpsM, rpmA, dnaG, smpB.]
  • 158.
    We have more than 700 complete genome sequences:
    • Select 100 representatives
    • Build gene families
    • Identify families that are present in all organisms with equal numbers
    • HMM building and phylogenetic analysis to identify the true markers
    [Figure: Phylogenetic Tree of Bacteria (built from 31 concatenated marker alignments); labeled groups include δ, ε, α, β, and γ Proteobacteria and Firmicutes.]
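The third step above, keeping only families present in every organism with equal numbers, can be sketched as a filter over a family-by-genome copy-number table. This is an illustrative Python sketch, not the AMPHORA code: the table layout, the function name `universal_equal_copy`, and the family and genome names are all hypothetical.

```python
def universal_equal_copy(family_counts, genomes):
    """Return families found in every genome with the same copy number.

    `family_counts` maps family -> {genome: copy number}; a genome missing
    from the inner dict is treated as having zero copies.
    """
    selected = []
    for family, counts in family_counts.items():
        copies = {counts.get(genome, 0) for genome in genomes}
        # One distinct, non-zero copy number means present everywhere, equally.
        if len(copies) == 1 and 0 not in copies:
            selected.append(family)
    return sorted(selected)


# Hypothetical toy table of copy numbers per representative genome.
genomes = ["genomeA", "genomeB", "genomeC"]
family_counts = {
    "fam_rpoB": {"genomeA": 1, "genomeB": 1, "genomeC": 1},
    "fam_transposase": {"genomeA": 4, "genomeB": 0, "genomeC": 2},
    "fam_recA": {"genomeA": 1, "genomeB": 1, "genomeC": 1},
    "fam_patchy": {"genomeA": 1, "genomeB": 1},
}
candidates = universal_equal_copy(family_counts, genomes)
```

Families that pass this filter are only candidates; the slide's last step (HMM building and phylogenetic analysis) is what decides which of them behave as true markers.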
  • 159.
  • 160.
    AMPHORA 2 Coming w/ More Markers
    Phylogenetic group      Genome Number   Gene Number   Marker Candidates
    Archaea                 62              145415        106
    Actinobacteria          63              267783        136
    Alphaproteobacteria     94              347287        121
    Betaproteobacteria      56              266362        311
    Gammaproteobacteria     126             483632        118
    Deltaproteobacteria     25              102115        206
    Epsilonproteobacteria   18              33416         455
    Bacteroidetes           25              71531         286
    Chlamydiae              13              13823         560
    Chloroflexi             10              33577         323
    Cyanobacteria           36              124080        590
    Firmicutes              106             312309        87
    Spirochaetes            18              38832         176
    Thermi                  5               14160         974
    Thermotogae             9               17037         684
  • 161.
    Distances between gene trees and the AMPHORA concatenated genome tree
    [Figure: two panels ranking gene trees by NODAL distance (axis 0 to 6) and by SPLIT distance (axis 0 to 0.9). Genes shown: rpmA, coaE, trmD, rplL, rpsS, rpsQ, radA, rplR, rplD, rplQ, tsf, rpsH, frr, smpB, ttf, rpsO, rplP, rplM, rplI, rplV, rpsB, rplT, rplO, mraW, rpsP, rpsK, rplU, rplE, rplS, rplC, rpsI, rpsL, infC, rpsG, rpsM, pyrH, ruvA, guaA, purA, rplK, priA, serS, rplA, rplF, rpsC, rplN, rpsE, ruvB, rpsJ, secY, rRNA16S, rplB, nusA. Legend: AMPHORA marker; ribosomal protein; transcription/translation related protein; DNA repair protein; protein of other function; distance between the genome tree and 100 random trees (average ± standard deviation).]
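A split distance compares two trees by the bipartitions of the taxa that their internal branches induce. The Python sketch below computes one common normalization, the number of splits found in only one tree divided by the total number of splits in both. Whether this matches the exact normalization used for the figure is an assumption, and the nested-tuple tree encoding and toy trees are hypothetical.

```python
def leaf_sets(tree, out):
    """Append the leaf set under every internal node to `out`; return all leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # canonical side: the one without this leaf
    result = set()
    for clade in clades:
        side = clade if anchor not in clade else all_leaves - clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def split_distance(tree_a, tree_b):
    """Fraction of splits not shared by both trees (0 = same topology)."""
    a, b = splits(tree_a), splits(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


# Hypothetical toy trees over the same five taxa.
genome_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
distance = split_distance(genome_tree, gene_tree)
```

Both trees must carry the same taxon set for the comparison to be meaningful; in practice a gene tree is usually pruned to the shared taxa first.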
  • 162.
  • 163.
    Phylogenetic challenge: A single tree with everything
  • 164.
    Finding Metagenomic OTUs
    PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data
    Thomas J. Sharpton1*, Samantha J. Riesenfeld1, Steven W. Kembel2, Joshua Ladau1, James P. O’Dwyer2,3, Jessica L. Green2, Jonathan A. Eisen4, Katherine S. Pollard1,5
    1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom, 4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America
    Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
    Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
    Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
    Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
    Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details.
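Once every pair of reads has a distance (in PhylOTU's setting, one derived from a phylogeny, since the reads themselves may not overlap), grouping reads into OTUs is a clustering step. The Python sketch below uses single-linkage clustering with a union-find structure as a simple stand-in. It is not the PhylOTU implementation; the linkage rule, the 0.03 cutoff, the function name `cluster_otus`, and the toy distances are all assumptions for illustration.

```python
def cluster_otus(names, distance, cutoff=0.03):
    """Single-linkage clustering: link any two reads within `cutoff`.

    `distance` maps frozenset({a, b}) -> pairwise distance for every pair.
    Returns a sorted list of sorted clusters.
    """
    parent = {name: name for name in names}

    def find(x):
        # Walk to the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance[frozenset((a, b))] <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy distances between four reads.
names = ["r1", "r2", "r3", "r4"]
distance = {
    frozenset(("r1", "r2")): 0.01,
    frozenset(("r2", "r3")): 0.02,
    frozenset(("r1", "r3")): 0.05,
    frozenset(("r1", "r4")): 0.20,
    frozenset(("r2", "r4")): 0.20,
    frozenset(("r3", "r4")): 0.20,
}
otus = cluster_otus(names, distance)
```

Note that r1 and r3 end up in the same OTU even though they are 0.05 apart, because both are close to r2; average or complete linkage would behave differently, which is one reason the choice of linkage matters.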
  • 165.
    • Build AMPHORA ALL reference tree with concatenated alignment
    • Align reads that match any of the HMMs to concatenated alignment
    • Place reads into reference tree one at a time
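The key property of the last step is that each read is handled independently against a fixed reference. The Python sketch below illustrates only that property, using a much simpler stand-in than tree placement: each aligned read is assigned to the reference sequence with the lowest proportion of mismatches over the columns both cover. It does not insert reads into a tree, and the function names, sequences, and taxon labels are hypothetical.

```python
def p_distance(read, ref):
    """Proportion of mismatches over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(read, ref) if a != "-" and b != "-"]
    if not pairs:
        return None  # no shared columns, so no distance can be computed
    return sum(a != b for a, b in pairs) / len(pairs)


def nearest_reference(read, references):
    """Return (name, distance) of the closest reference, or None if none overlap."""
    best = None
    for name, seq in references.items():
        d = p_distance(read, seq)
        if d is not None and (best is None or d < best[1]):
            best = (name, d)
    return best


# Hypothetical toy reference alignment and one short read aligned to it.
references = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTTTGT",
}
read = "--GTTTG-"  # gaps mark columns the short read does not cover
hit = nearest_reference(read, references)
```

Because a short read covers only a few columns of the concatenated alignment, distances are computed over shared columns only; a real placement method would additionally use the reference tree and an evolutionary model.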
  • 166.
    Phylogenomics Future 2: We have still only scratched the surface of microbial diversity
  • 167.
    rRNA Tree of Life
    [Figure: tree with three labeled domains: Bacteria, Archaea, Eukaryotes.]
    Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 168.
    Phylogenetic Diversity: Genomes. From Wu et al. 2009 Nature 462, 1056-1060
  • 169.
    Phylogenetic Diversity with GEBA. From Wu et al. 2009 Nature 462, 1056-1060
  • 170.
    Phylogenetic Diversity: Isolates. From Wu et al. 2009 Nature 462, 1056-1060
  • 171.
    Phylogenetic Diversity: All. From Wu et al. 2009 Nature 462, 1056-1060
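Phylogenetic diversity is commonly measured as the total branch length of the tree spanned by a set of taxa (Faith's PD). Assuming that definition is the one behind these slides, the Python sketch below shows how adding a genome from an unsampled lineage raises the total. The tree encoding (child mapped to parent and branch length), the function name, and the toy branch lengths are hypothetical.

```python
def phylogenetic_diversity(tree, tips):
    """Sum branch lengths on the union of root-to-tip paths for `tips`.

    `tree` maps node -> (parent, branch length to parent); the root's
    parent is None and its branch length is 0.
    """
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Climb towards the root, stopping once a branch was already counted.
        while node is not None and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical toy tree: A and B are close relatives, C is a deep lineage.
tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "A": ("n1", 0.25),
    "B": ("n1", 0.25),
    "C": ("root", 1.0),
}
pd_before = phylogenetic_diversity(tree, ["A", "B"])
pd_after = phylogenetic_diversity(tree, ["A", "B", "C"])
gain = pd_after - pd_before
```

In this toy case the single deep lineage C adds as much branch length as A and B together, which is the intuition behind choosing genomes by phylogenetic novelty.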
  • 172.
    Uncultured Lineages:
    • Get into culture
    • Enrichment cultures
    • If abundant in low diversity ecosystems
    • Flow sorting
    • Microbeads
    • Microfluidic sorting
    • Single cell amplification
  • 173.
    GEBA uncultured
    Number of SAGs from Candidate Phyla
                                             OD1   OP11   OP3   SAR406
    Site A: Hydrothermal vent                4     1      -     -
    Site B: Gold Mine                        6     13     2     -
    Site C: Tropical gyres (Mesopelagic)     -     -      -     2
    Site D: Tropical gyres (Photic zone)     1     -      -     -
    Sample collections at 4 additional sites are underway.
    Phil Hugenholtz
  • 174.
  • 175.
  • 176.
  • 177.
  • 178.
    Phylogenomics Future 3: Need Experiments from Across the Tree of Life too
  • 179.
    As of 2002
    • At least 40 phyla of bacteria
    Based on Hugenholtz, 2002
    [Tree figure; tips labeled: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
  • 180.
    As of 2002
    • At least 40 phyla of bacteria
    • Experimental studies are mostly from three phyla
    Based on Hugenholtz, 2002
    [Same tree figure as slide 179.]
  • 181.
    As of 2002
    • At least 40 phyla of bacteria
    • Experimental studies are mostly from three phyla
    • Some studies in other phyla
    Based on Hugenholtz, 2002
    [Same tree figure as slide 179.]
  • 182.
    As of 2002
    • At least 40 phyla of bacteria
    • Genome sequences are mostly from three phyla
    • Some other phyla are only sparsely sampled
    • Same trend in Eukaryotes
    Based on Hugenholtz, 2002
    [Same tree figure as slide 179.]
  • 183.
    As of 2002
    • At least 40 phyla of bacteria
    • Genome sequences are mostly from three phyla
    • Some other phyla are only sparsely sampled
    • Same trend in Viruses
    Based on Hugenholtz, 2002
    [Same tree figure as slide 179.]
  • 184.
    Need experimental studies from across the tree too
    Tree based on Hugenholtz (2002) with some modifications.
    [Tree figure, scale bar 0.1; tips labeled: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
  • 185.
    Adopt a Microbe
    Tree based on Hugenholtz (2002) with some modifications.
    [Same tree figure as slide 184.]
  • 186.
    Conclusion
    • Phylogenetic sampling of genomes improves our understanding of microbial diversity in many ways
    • Still need
      – More biogeography
      – More phenotypic/experimental data
      – Deeper phylogenetic sampling
  • 187.
  • 188.
  • 189.
    A Happy Tree of Life
  • 190.
    Acknowledgements
    • GEBA: DOE-JGI, DSMZ
    • GWSS: Nancy Moran & lab, Dongying Wu
    • iSEEM: Katie Pollard, Jessica Green, Martin Wu
    • RecA: Dongying Wu, Craig Venter, Doug Rusch, et al.