Phylogenomics and the
Origin of Novelty in Microbes

        Jonathan A. Eisen
           UC Davis

  MBL Microbial Diversity Course
          July 9, 2011
My Obsessions

Social Networking in Science




Scientist Reveals Secret of the Ocean: It's Him
By NICHOLAS WADE
Published: April 1, 2007

Maverick scientist J. Craig Venter has done it again. It was just a few years
ago that Dr. Venter announced that the human genome sequenced by Celera
Genomics was in fact, mostly his own. And now, Venter has revealed a second
twist in his genomic self-examination. Venter was discussing his Global
Ocean Voyage, in which he used his personal yacht to collect ocean water
samples from around the world. He then used large filtration units to collect
microbes from the water samples which were then brought back to his high
tech lab in Rockville, MD where he used the same methods that were used to
sequence the human genome to study the genomes of the 1000s of ocean
dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his
latest attack on the standards of science – some of the samples were in fact not from the ocean, but
were from microbial habitats in and on his body.

“The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover.
My main goal has always been to work on the microbes that live in and on people. And now that my
genome is nearly complete, why not use myself as the model for human microbiome studies as well.”

It is certainly true that in the last few years, the microbes that live in and on people have become a
hot research topic. So hot that the same people who were involved in the race to sequence the human
Bacteria evolve
T. H. Dobzhansky (1973)


“Nothing in biology makes sense
except in the light of evolution.”
Evolutionary Perspective and
        Comparative Biology

• Comparative biology is the analysis of differences
  and similarities between species.
• An evolutionary perspective is useful in such studies
  because it allows one to focus on how and why
  similarities and differences came to be.
• In other words, biological objects have a history and
  understanding that history is important
Phylogenomic Analysis

• Evolutionary reconstructions greatly
  improve genome analyses
• Genome analysis greatly improves
  evolutionary reconstructions
• There is a feedback loop such that these
  should be integrated
Phylogenomics of Novelty



• Mechanisms of Origin of New Functions
• Variation in Mechanisms: Patterns, Causes and Effects
• Species Evolution
rRNA Tree of Life




   Figure from Barton, Eisen et al.
   “Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
            276:734-740
Limited Sampling of RRR Studies

Labeled on the tree: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga

   Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
   Based on tree from Pace 1997 Science 276:734-740
   Fleischmann et al. 1995 Science 269:496-512
TIGR Genome Projects

Labeled on the tree: Methanococcus, Chlorobium, Deinococcus, Thermotoga

   Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
   Based on tree from Pace 1997 Science 276:734-740
   Fleischmann et al. 1995 Science 269:496-512
Whole Genome Shotgun Sequencing

shotgun
sequence

Warner Brothers, Inc.
Assemble Fragments

sequencer output
assemble fragments
Closure & Annotation
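The shotgun, sequence, and assemble steps can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the assembler used in any real genome project: it samples random substrings ("reads") from a made-up sequence and merges them greedily by longest suffix/prefix overlap. The function names (`shotgun`, `overlap`, `greedy_assemble`), the example sequence, and the `min_len` threshold are all hypothetical, and real assemblers must also handle sequencing errors, both strands, and repeats.

```python
import random


def shotgun(genome, read_len, n_reads, rng):
    """Toy 'shotgun' step: sample n_reads random substrings of length read_len."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove exact duplicates and sequences fully contained in another sequence."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != o and s in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave a gap between contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


# Hypothetical example: three overlapping reads from a made-up 15-base sequence.
example_reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
example_contigs = greedy_assemble(example_reads)
```

When overlaps run out before a single contig remains, the function returns several contigs, which is the toy analogue of the gaps that the closure step has to resolve.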
From http://genomesonline.org
General Steps in Analysis of
         Complete Genomes
•   Identification/prediction of genes
•   Characterization of gene features
•   Characterization of genome features
•   Prediction of gene function
•   Prediction of pathways
•   Integration with known biological data
•   Comparative genomics
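As a rough illustration of the first step, identification/prediction of genes, the sketch below scans all six reading frames for open reading frames (ATG through an in-frame stop codon). It is a minimal sketch under stated assumptions, not a real gene finder: the function names, the `min_codons` cutoff, and the example sequence are hypothetical, and practical gene prediction also relies on statistical models of coding sequence, alternative start codons, and other signals.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Find ORFs in all six frames.

    Returns (strand, frame, start, end, orf) tuples. Coordinates are
    0-based, end-exclusive, and refer to the strand that was scanned.
    min_codons counts the start and stop codons.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical example sequence with a single short ORF on the forward strand.
example_orfs = find_orfs("CCATGAAATTTTAGGG")
```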
Genome Sequences Have
  Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
  studies
• New enzymes and materials for engineering
  and synthetic biology
From http://genomesonline.org
Outline


• Phylogenomic Tales
  –   Selecting genomes for sequencing
  –   Species evolution
  –   Predicting functions of genes
  –   Uncultured microbes
  –   Searching for novel organisms and genes
• All of these are going to be told in the context of a
  recent project, “A Genomic Encyclopedia of
  Bacteria and Archaea” (aka GEBA)
GEBA Introduction

Knowing What We Don’t Know
Major Microbial Sequencing
                  Efforts
•   Coordinated, top-down efforts
     – Fungal Genome Initiative (Broad/Whitehead)
     – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing
       Project
     – Sanger Center Pathogen Sequencing Unit
     – NHGRI Human Gut Microbiome Project
     – NIH Human Microbiome Program
•   White paper or grant systems
     –   NIAID Microbial Sequencing Centers
     –   DOE/JGI Community Sequencing Program
     –   DOE/JGI BER Sequencing Program
     –   NSF/USDA Microbial Genome Sequencing
•   Covers lots of ground and biological diversity
As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled

Phyla on the tree (based on Hugenholtz, 2002):
  Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira,
  Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas,
  Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes,
  Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3,
  Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia,
  Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae,
  Thermodesulfobacteria, Thermotogae, OP1, OP11
Need for Tree Guidance Well Established

• Common approach within some eukaryotic
  groups

• Many small projects funded to fill in some
  bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal
  projects commonly lamented in literature
NSF-funded Tree of Life Project (Eisen, Ward, Robb, Nelson, et al.)

• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla

[Figure: tree of bacterial phyla, as on the “As of 2002” slide]
Organisms Selected

Phylum                  Species selected
Chrysiogenes            Chrysiogenes arsenatis (GCA)
Coprothermobacter       Coprothermobacter proteolyticus (GCBP)
Dictyoglomi             Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria   Thermodesulfobacterium commune (GTC)
Nitrospirae             Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia          Thermomicrobium roseum (GTR)
Deferribacteres         Geovibrio thiophilus (GGT)
Synergistes             Synergistes jonesii (GSJ)
NSF-funded Tree of Life Project (Eisen & Ward, PIs)

• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree

[Figure: tree of bacterial phyla, as on the “As of 2002” slide]
Major Lineages of Actinobacteria

2.5 Actinobacteria
  2.5.1 Acidimicrobidae
    2.5.1.1 Unclassified
    2.5.1.2 "Microthrixineae"
    2.5.1.3 Acidimicrobineae
      2.5.1.3.1 Unclassified
      2.5.1.3.2 Acidimicrobiaceae
    2.5.1.4 BD2-10
    2.5.1.5 EB1017
  2.5.2 Actinobacteridae
    2.5.2.1 Unclassified
    2.5.2.10 Ellin306/WR160
    2.5.2.11 Ellin5012
    2.5.2.12 Ellin5034
    2.5.2.13 Frankineae
      2.5.2.13.1 Unclassified
      2.5.2.13.2 Acidothermaceae
      2.5.2.13.3 Ellin6090
      2.5.2.13.4 Frankiaceae
      2.5.2.13.5 Geodermatophilaceae
      2.5.2.13.6 Microsphaeraceae
      2.5.2.13.7 Sporichthyaceae
    2.5.2.14 Glycomyces
    2.5.2.15 Intrasporangiaceae
      2.5.2.15.1 Unclassified
      2.5.2.15.2 Dermacoccus
      2.5.2.15.3 Intrasporangiaceae
    2.5.2.16 Kineosporiaceae
    2.5.2.17 Microbacteriaceae
      2.5.2.17.1 Unclassified
      2.5.2.17.2 Agrococcus
      2.5.2.17.3 Agromyces
    2.5.2.18 Micrococcaceae
    2.5.2.19 Micromonosporaceae
    2.5.2.2 Actinomyces
    2.5.2.20 Propionibacterineae
      2.5.2.20.1 Unclassified
      2.5.2.20.2 Kribbella
      2.5.2.20.3 Nocardioidaceae
      2.5.2.20.4 Propionibacteriaceae
    2.5.2.21 Pseudonocardiaceae
    2.5.2.22 Streptomycineae
      2.5.2.22.1 Unclassified
      2.5.2.22.2 Kitasatospora
      2.5.2.22.3 Streptacidiphilus
    2.5.2.23 Streptosporangineae
      2.5.2.23.1 Unclassified
      2.5.2.23.2 Ellin5129
      2.5.2.23.3 Nocardiopsaceae
      2.5.2.23.4 Streptosporangiaceae
      2.5.2.23.5 Thermomonosporaceae
    2.5.2.3 Actinomycineae
    2.5.2.4 Actinosynnemataceae
    2.5.2.5 Bifidobacteriaceae
    2.5.2.6 Brevibacteriaceae
    2.5.2.7 Cellulomonadaceae
    2.5.2.8 Corynebacterineae
      2.5.2.8.1 Unclassified
      2.5.2.8.2 Corynebacteriaceae
      2.5.2.8.3 Dietziaceae
      2.5.2.8.4 Gordoniaceae
      2.5.2.8.5 Mycobacteriaceae
      2.5.2.8.6 Rhodococcus
      2.5.2.8.7 Rhodococcus
      2.5.2.8.8 Rhodococcus
    2.5.2.9 Dermabacteraceae
      2.5.2.9.1 Unclassified
      2.5.2.9.2 Brachybacterium
      2.5.2.9.3 Dermabacter
  2.5.3 Coriobacteridae
    2.5.3.1 Unclassified
    2.5.3.2 Atopobiales
    2.5.3.3 Coriobacteriales
    2.5.3.4 Eggerthellales
  2.5.4 OPB41
  2.5.5 PK1
  2.5.6 Rubrobacteridae
    2.5.6.1 Unclassified
    2.5.6.2 "Thermoleiphilaceae"
      2.5.6.2.1 Unclassified
      2.5.6.2.2 Conexibacter
      2.5.6.2.3 XGE514
    2.5.6.3 MC47
    2.5.6.4 Rubrobacteraceae
NSF-funded Tree of Life Project (Eisen & Ward, PIs)

• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
• Same trend in Eukaryotes
• Same trend in Viruses

[Figure: tree of bacterial phyla, as on the “As of 2002” slide]
GEBA: A genomic encyclopedia of bacteria and archaea (Eisen & Ward, PIs)

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree

[Figure: tree of bacterial phyla, as on the “As of 2002” slide]
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
  Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
  Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
  al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
  Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
  D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
  Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
rRNA Tree of Life




 Figure from Barton, Eisen et al.
    “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
GEBA Pilot Target List
[Figure: bar chart of # of Genomes (axis 0 to 35) by Phyla, with phylum labels prefixed "B:" or "A:"]
GEBA Pilot Project Overview

• Identify major branches in rRNA tree for
  which no genomes are available
• Identify those with a cultured representative
  in DSMZ
• DSMZ grew > 200 of these and prepped
  DNA
• Sequence and finish 200+
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
Assess Benefits of GEBA


• All genomes have some value

• But what, if any, is the benefit of tree-guided
  sequencing over other selection methods?

• Lessons for other large scale microbial
  genome projects?
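
One way the benefit of tree-guided selection could be made concrete is phylogenetic diversity: the total branch length of the tree spanned by the genomes already sequenced. The sketch below is only an illustration under stated assumptions. The toy tree, its branch lengths, and the names `toy_tree`, `phylogenetic_diversity`, and `pick_next` are all hypothetical, and the slides do not say this is the measure the project used.

```python
# Toy tree (hypothetical): each node maps to (parent, branch length to parent).
# "root" has no entry, so upward walks stop there.
toy_tree = {
    "A": ("root", 1.0),
    "B": ("root", 1.0),
    "a1": ("A", 0.1),
    "a2": ("A", 0.2),
    "b1": ("B", 0.5),
}


def path_edges(parent, leaf):
    """Return the nodes whose parent-edges lie on the path from leaf to root."""
    edges = set()
    node = leaf
    while node in parent:
        edges.add(node)
        node = parent[node][0]
    return edges


def phylogenetic_diversity(parent, leaves):
    """Sum the branch lengths covered by the root-to-leaf paths of `leaves`."""
    covered = set()
    for leaf in leaves:
        covered |= path_edges(parent, leaf)
    return sum(parent[node][1] for node in covered)


def pick_next(parent, sequenced, candidates):
    """Greedy choice: the candidate that adds the most new branch length."""
    base = phylogenetic_diversity(parent, sequenced)
    return max(
        candidates,
        key=lambda c: phylogenetic_diversity(parent, set(sequenced) | {c}) - base,
    )


if __name__ == "__main__":
    already_sequenced = {"a1"}
    # b1 sits on an unsampled branch, so it adds more than the close relative a2.
    print(pick_next(toy_tree, already_sequenced, {"a2", "b1"}))
```

In this toy case a genome from the unsampled branch adds far more branch length than a second genome from an already sampled group, which is the intuition behind filling in the tree.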
GEBA Phylogenomic Lesson 1

 The rRNA Tree of Life is a Useful Tool
 for Identifying Phylogenetically Novel
                Genomes
rRNA Tree of Life
Bacteria




                                       Archaea




 Eukaryotes

    Figure from Barton, Eisen et al.
    “Evolution”, CSHL Press. 2007.
 Based on tree from Pace 1997 Science
             276:734-740
The Core Gets Small ...
The Pangenome
Islands Among Synteny
Network of Life
Bacteria




                                       Archaea




 Eukaryotes

    Figure from Barton, Eisen et al.
       “Evolution”, CSHL Press.
  Based on tree from Pace NR, 2003.
T. roseum mobile motility element




                 Wu et al doi:10.1371/journal.pone.0004207
Phylogenetic Distribution Novelty:
                Bacterial Actin Related Protein
[Figure: phylogenetic tree placing the bacterial actin related protein]



   Haliangium ochraceum DSM 14365                   Patrik D’haeseleer, Adam Zemla, Victor Kunin

Wu et al. 2009 Nature 462, 1056-1060   See also Guljamow et al. 2007 Current Biology.
articles

Analysis of the genome sequence of the
flowering plant Arabidopsis thaliana
The Arabidopsis Genome Initiative
  Authorship of this paper should be cited as 'The Arabidopsis Genome Initiative'. A full list of contributors appears at the end of this paper

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions.
Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the
125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication,
followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene
transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000
families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular
eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets
of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first
complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes
in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify
genes for crop improvement.




[Figure labels: C. elegans, Drosophila, Arabidopsis thaliana; panel title: Overview of sequencing strategy]
Using the Core
Whole genome tree built using AMPHORA
by Martin Wu and Dongying Wu
GEBA Phylogenomic Lesson 2

   rRNA Tree is good but not perfect
 and better genomic sampling improves
         phylogenetic inference
16S Says Hyphomonas is in Rhodobacterales




Badger et al.
2005
WGT and individual gene trees:
                It's Related to Caulobacterales




Badger et al.
2005
16S                                              WGT, 23S




Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
Zimmer. New York Times. 2009
GEBA Phylogenomic Lesson 3

     Phylogenetics guided genome
    selection (and phylogenetics in
 general) improves genome annotation
Predicting Function

• Key step in genome projects
• More accurate predictions help guide
  experimental and computational analyses
• Many diverse approaches
• All are improved by “phylogenomic” type
  analyses that integrate both evolutionary
  reconstructions and an understanding of how
  new functions evolve
From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Blast Search of H. pylori “MutS”




• Blast search pulls up Syn. sp MutS#2 with much higher p
  value than other MutS homologs
• Based on this TIGR predicted this species had mismatch
  repair
• Assumes functional constancy
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
MutL??




From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
Phylogenetic Tree of MutS Family
[Figure: tree of MutS family proteins; tips labeled by species code: Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Borbu, Metth, mSaco, Yeast, Human, Mouse, Celeg, Arath, Spombe, Fly, Xenla, Rat, Neucr, Trepa, Chltr, Theaq, Thema, Ecoli, Neigo]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
MutS Subfamilies
[Figure: MutS family tree with subfamily labels MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Overlaying Functions onto Tree
[Figure: MutS family tree with subfamily labels MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Functional Prediction Using Tree
[Figure: MutS family tree with a function assigned to each subfamily]
• MSH1 - Mitochondrial Repair
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MSH3 - Nuclear Repair Of Loops
• MSH4 - Meiotic Crossing Over
• MSH5 - Meiotic Crossing Over
• MSH6 - Nuclear Repair Of Mismatches
• MutS1 - Bacterial Mismatch and Loop Repair
• MutS2 - Unknown Functions
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
PHYLOGENETIC PREDICTION OF GENE FUNCTION

METHOD
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
• ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)

EXAMPLE A: genes 1A, 1B (Species 1), 2A, 2B (Species 2), 3A, 3B (Species 3); gene tree node marked "Duplication?", actual evolution panel marked "Duplication"
EXAMPLE B: genes 1 to 6; inferred function marked "Ambiguous"

Based on Eisen, 1998 Genome Res 8: 163-167.
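
The last two steps of the method (overlay known functions onto the tree, then infer the likely function) can be sketched in a few lines. This is a minimal illustration, not the procedure from the cited paper: it assumes the gene tree is already computed and given as nested tuples, the function labels "X" and "Y" are hypothetical, and the rule used (take the smallest clade around the query that contains any annotated gene) is a simplification chosen for this example.

```python
def leaves(tree):
    """Return the leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def path_to(tree, target):
    """Return the clades from the root down to the target leaf, or None."""
    if isinstance(tree, str):
        return [tree] if tree == target else None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def predict_function(tree, known, query):
    """Infer a function for `query` from the nearest annotated relatives."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query} is not in the tree")
    # Walk outward from the query: smallest enclosing clade first.
    for clade in reversed(path[:-1]):
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Example A: a duplication separates the "A" genes from the "B" genes.
tree_a = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known_a = {"1A": "X", "3A": "X", "1B": "Y", "2B": "Y"}

# Example B style case: the closest annotated relatives disagree.
tree_b = (("1", "2"), ("3", ("4", "5")))
known_b = {"4": "X", "5": "Y"}

if __name__ == "__main__":
    print(predict_function(tree_a, known_a, "2A"))
    print(predict_function(tree_a, known_a, "3B"))
    print(predict_function(tree_b, known_b, "3"))
```

A top-hit similarity search would instead copy the annotation of the single best match, which is where the assumption of functional constancy enters. Here the prediction depends on where the gene sits relative to the duplication.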
Evolutionary Rate Variation

[Figure: tree with tips 1 to 6]
Phylogenetic Prediction of Function

• Greatly improves accuracy of functional
  predictions compared to similarity alone
  (e.g., blast)
• Many surrogate methods (e.g., COGs)
• Automated phylogenetic methods now
  available
  – Sean Eddy, Steven Brenner, Kimmen Sjölander,
    etc.
• But …
Example 2: Recent Changes
• Phylogenomic functional prediction
  may not work well for very newly
  evolved functions
[Figure: NJ tree of V. cholerae genes, some branches marked * or **; tips include VC0825, VCA0906, VCA1056, VC1643, VCA0923, VC0514, VC1868, VCA0773, VC1313, VC1859, VCA0268, VC1405, VCA0864, VCA0176, VCA0220, VC1289, VC1069, VC2439]
• Can use understanding of origin of
                                                                                            V.cholerae
                                                                                                    VC967
                                                                                                      1
                                                                                            V.cholerae
                                                                                                    VCA0031
                                                                                        V.cholerae
                                                                                                 VC
                                                                                                  1898
                                                                                            V.cholerae
                                                                                                    VCA0663
                                                                                     V.cholerae
                                                                                             VC0988
                                                                                               A
                                                                                     V.cholerae
                                                                                              VC0216
                                                                                     V.cholerae
                                                                                              VC0449
                                                              *                     V.cholerae
                                                                                             VCA0008
                                                                                     V.cholerae
                                                                                              VC1406
                                                                                              V.cholerae
                                                                                                       VC
                                                                                                        1535


  novelty to better interpret these cases?
                                                                                       V.cholerae
                                                                                                VC
                                                                                                 0840
                                                                                                  B.subtilis
                                                                                                        gi2633766
                                                                                              Synechocystis
                                                                                                        sp.
                                                                                                          gi1001299
                                                                                     Synechocystis
                                                                                                sp.gi1001300
                                                                 *                            Synechocystis
                                                                                                        sp.
                                                                                                          gi1652276
                                                            *                           Synechocystis
                                                                  *                    H.pylori sp.  gi1652103
                                                                                             gi2313716
                                                                                       H.pylori
                                                                                            99 gi4155097
                                                                                    **C.jejuni
                                                             **                    C.jejuniCj1190c
                                                                                         Cj1110c
                                                                                     A.fulgidus
                                                                                             gi2649560
                                                                                     A.fulgidus
                                                                                             gi2649548
                                                                                   ** B.subtilis
                                                                                               gi2634254


• Screen genomes for genes that have
                                                                                     B.subtilis
                                                                                            gi2632630
                                                                                     B.subtilis
                                                                                             gi2635607
                                                                                     B.subtilis
                                                                                            gi2635608
                                                                                      B.subtilis
                                                                           ** ** B.subtilis  gi2635609
                                                                         **                 gi2635610
                                                                                          B.subtilis
                                                                                   E.coli        gi2635882
                                                                                   E.coligi1788195
                                                                                        gi2367378
                                                                        * **       E.coligi1788194
                                                                                       E.coli A1092
                                                                                            gi1787690
                                                                                     V.cholerae
                                                                                              VC


  changed recently
                                                                                      V.cholerae
                                                                                               VC0098
                                                                                      E.coli
                                                                                           gi1789453
                                                                                         H.pylori
                                                                                               gi2313186
                                                                                         H.pylori
                                                                                              99 gi4154603
                                                                                             C.jejuni
                                                                                     ** C.jejuni   Cj0144
                                                                                                   Cj1564
                                                                                             C.jejuni
                                                                              **         C.jejuniCj0262c
                                                                                           ** Cj1506c
                                                                                          H.pylori
                                                                                                gi2313163
                                                                        *                 H.pylori
                                                                                               99 gi4154575
                                                                                       **H.pylori
                                                                                               gi2313179
                                                                           **            H.pylori
                                                                                              99 gi4154599

–   Pseudogenes and gene loss                                                         ** C.jejuni Cj0019c
                                                                                                  C.jejuni
                                                                                              C.jejuni Cj0951c
                                                                                                    Cj0246c
                                                                                             B.subtilis
                                                                                                    gi2633374
                                                                                              T.maritima
                                                                                                      TM0014
                                                                                                   V.cholerae
                                                                                                          VC
                                                                                                 V.cholerae
                                                                                                         VC
                                                                                                            1403
                                                                                                          A1088
                                                                                                  T.pallidum
                                                                                                         gi3322777
                                                                                                         T.pallidum
                                                                        **                        T.pallidum gi3322939
                                                                                                         gi3322938
                                                                      **                           B.burgdorferi
                                                                                                            gi2688522

–   Contingency Loci
                                                                                                      T.pallidum
                                                                                                             gi3322296
                                                                                                  B.burgdorferi
                                                             *                          T.maritima gi2688521
                                                                                                TM0429
                                                                                        T.maritima
                                                                                      **T.maritima
                                                                                                TM0918
                                                                                     ** TM1428
                                                                                    T.maritima  TM0023
                                                               *                       T.maritima
                                                                                               TM1143
                                                                                    T.maritima
                                                                                             TM1146
                                                                                       P.abyssi
                                                                                              PAB1308
                                                                                       P.horikoshii
                                                                                                gi3256846
                                                                                  ** P.horikoshii
                                                                                      P.abyssi
                                                                                             PAB1336

–   Acquisition (e.g., LGT)
                                                                       **                      gi3256896
                                                              **                   **P.abyssi
                                                                                            PAB2066
                                                       **                            P.horikoshii
                                                                                              gi3258290
                                                            *                   ** P.abyssi  PAB1026
                                                                                       P.horikoshii
                                                                                                gi3256884
                                                                                **               D.radiodurans
                                                                                                          DRA00354
                                                                                                D.radiodurans
                                                                                                          DRA0353
                                                                                          ** D.radiodurans
                                                  **                **                               VC DRA0352
                                                                                            V.cholerae 1394
                                                                                           P.abyssi
                                                                                                 PAB1189
                                                                                           P.horikoshii
                                                                                                    gi3258414


–   Unusual dS/dN ratios
                                                                                    ** B.burgdorferi
                                                                                                 gi2688621
                                                                                               M.tuberculosis
                                                                                                         gi1666149
                                                                                                 V.cholerae
                                                                                                         VC
                                                                                                          0622




–   Rapid evolutionary rates
–   Recent duplications
Example 3: Non-homology methods

• Many genes have homologs in other species
  but no homologs have ever been studied
  experimentally
• Non-homology methods can make functional
  predictions for these
• Example: phylogenetic profiling (extension of
  prior work of Koonin, Tatusov, Ragan, et al.)
Phylogenetic profiling basis

• Microbial genes are lost rapidly when not
  maintained by selection
• Genes can be acquired by lateral transfer
• Gain and loss frequently occur for entire
  pathways/processes
• Thus correlated presence/absence
  information might identify genes with
  similar functions
Non-Homology Predictions:
    Phylogenetic Profiling

• Step 1: Search all genes in
  organisms of interest against all
  other genomes

• Step 2: Ask, yes or no: is each gene
  found in each other species?

• Step 3: Cluster genes by distribution
  patterns (profiles)
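
The profiling steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the work cited on these slides. It assumes the all-vs-all search has already been run and reduced to a set of genomes in which each gene has a hit. The gene names, genome names, and the `hits` table are hypothetical toy data, and grouping genes by identical profile (with Hamming distance for near matches) is just one simple way to cluster.

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Turn {gene: set of genomes with a hit} into {gene: tuple of 0/1}."""
    return {gene: tuple(int(g in present) for g in genomes)
            for gene, present in hits.items()}

def cluster_by_profile(profiles):
    """Group genes that share an identical presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

# Hypothetical toy data: which genomes contain a hit for each gene.
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "gene1": {"genomeA", "genomeC"},
    "gene2": {"genomeA", "genomeC"},
    "gene3": {"genomeB", "genomeD"},
    "gene4": {"genomeA", "genomeB", "genomeC", "genomeD"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

In this toy case gene1 and gene2 share a profile, so they would be candidates for a shared pathway or process; gene4 is present everywhere and carries little signal.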
Carboxydothermus hydrogenoformans


• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
  (Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive
  (Firmicute)
• Genome determined (Wu et al.
  2005 PLoS Genetics 1: e65)
Homologs of Sporulation Genes




                         Wu et al. 2005
                         PLoS Genetics 1:
                         e65.
Carboxydothermus sporulates




       Wu et al. 2005 PLoS Genetics 1: e65.
PG Profiling Works Better Using
          Orthology
GEBA Lesson 3:
  Phylogeny driven genome selection (and
 phylogenetics) improves genome annotation
• Took 56 GEBA genomes and compared results vs. 56
  randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
  based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
GEBA Lesson 4:
 Metadata Important
GEBA Phylogenomic Lesson 5

  Phylogeny-driven genome selection
  helps discover new genetic diversity
Network of Life
Bacteria




                                       Archaea




 Eukaryotes

    Figure from Barton, Eisen et al.
       “Evolution”, CSHL Press.
  Based on tree from Pace NR, 2003.
Protein Family Rarefaction


• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
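
As a rough sketch of the rarefaction idea above, the Python below averages the number of distinct protein families seen as genomes are added in random order. It assumes families have already been assigned to each genome (for example by a clustering step such as MCL, which is not run here). The `genome_families` table, the function name, and the permutation count are hypothetical choices for illustration only.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct families after sampling 1..N genomes.

    genome_families maps a genome name to the set of protein family
    identifiers found in that genome.
    """
    rng = random.Random(seed)
    names = sorted(genome_families)
    totals = [0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)          # random order of genome addition
        seen = set()
        for k, name in enumerate(names):
            seen |= genome_families[name]
            totals[k] += len(seen)  # families seen after k + 1 genomes
    return [t / n_permutations for t in totals]

# Hypothetical toy data: family assignments per genome.
genome_families = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam2", "fam3", "fam4"},
    "genome3": {"fam1", "fam5"},
}

curve = rarefaction_curve(genome_families)
for k, mean_families in enumerate(curve, start=1):
    print(k, mean_families)
```

Plotting the genome count against these means gives the curve; a curve that keeps rising as genomes are added suggests that new genomes are still contributing new families.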
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist




Wu et al. 2009 Nature 462, 1056-1060
GEBA Phylogenomic Lesson 6

  Improves analysis of genome data
     from uncultured organisms
rRNA Phylotyping

          • Collect DNA from
            environment
          • PCR amplify rRNA
            genes using broad (so-
            called universal) primers
          • Sequence
          • Align to others
          • Infer evolutionary tree
          • Unknowns “identified”
            by placement on tree
          • Some use BLAST, but
            not as good as phylogeny
rRNA PCR

The Hidden Majority            Richness estimates




             Hugenholtz 2002         Bohannan and Hughes 2003
Metagenomics



        shotgun
             sequence
Example I:
Phylotyping w/ many genes
rRNA Phylotyping in Sargasso Sea




                          Venter et al., Science
                          304: 66. 2004
Shotgun Sequencing Allows Use of
 Alternative Anchors (e.g., RecA)




                             Venter et al., Science
                             304: 66. 2004
Shotgun Sequencing Allows Use of Other Markers

[Bar chart "Sargasso Phylotypes": Weighted % of Clones (axis ticks 0 to 0.5) by Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Legend: EFG, EFTu, rRNA, RecA, RpoB, HSP70]

                             Venter et al., Science
                             304: 66. 2004
Shotgun Sequencing Allows Use of Other Markers
[Figure: Sargasso Phylotypes. Y axis: Weighted % of Clones (0 to 0.5). X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
Shotgun Sequencing Allows Use of Other Markers
Cannot be done without good sampling of genomes
[Figure: Sargasso Phylotypes. Weighted % of Clones (0 to 0.5) by Major Phylogenetic Group, for the markers EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
Example II:
 Binning
Metagenomics Challenge
Binning challenge
Binning challenge




Best binning method: reference genomes
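
As a rough illustration of binning against reference genomes, the sketch below assigns each read to the reference with which it shares the most k-mers. This is a toy stand-in, not the method used in the talk: the function names, the sequences, and the `k` and `min_shared` parameters are all hypothetical, and real pipelines use alignment or more robust composition models.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_by_reference(reads, references, k=4, min_shared=1):
    """Assign each read to the reference sharing the most k-mers with it.

    Reads that share fewer than min_shared k-mers with every reference
    go to an "unbinned" bin (the no-reference-genome case).
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        scores = {name: len(read_kmers & ks) for name, ks in ref_kmers.items()}
        best = max(scores, key=scores.get)
        label = best if scores[best] >= min_shared else "unbinned"
        bins[label].append(read_id)
    return dict(bins)


# Made-up toy data: two "reference genomes" and three reads.
references = {
    "genome_A": "ATGGCGTACGTTAGCCGATAGGCTA",
    "genome_B": "TTTTAAAACCCCGGGGTTTTAAAAC",
}
reads = {
    "read1": "GCGTACGTTAGC",   # taken from genome_A
    "read2": "AAAACCCCGGGG",   # taken from genome_B
    "read3": "CACACACACACA",   # matches neither reference
}
result = bin_by_reference(reads, references)
print(result)
```

The third read ends up unbinned, which is the situation the next slide asks about: with no reference genome, matching against references has nothing to match.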
Binning challenge




No reference genome? What do you do?
Glassy Winged Sharpshooter
• Obligate xylem feeder
• Can transmit the agent of Pierce’s disease
• Potential bioterror agent
• Needs to get amino acids and other nutrients from symbionts, as aphids do
Wu et al. 2006 PLoS Biology 4: e188.
CFB Phyla
Phylogenetic Binning
[Figure: Sargasso Phylotypes. Y axis: Weighted % of Clones (0 to 0.5). X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
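
One way to picture phylogenetic binning is as a vote: each gene on a contig is placed in a phylogenetic group, and the contig goes to the group most of its genes agree on. The sketch below assumes the per-gene group labels have already been obtained from phylogenetic trees built elsewhere. The function name, the `min_fraction` threshold, and the example contigs are hypothetical and are not taken from the cited papers.

```python
from collections import Counter


def phylogenetic_bins(contig_gene_groups, min_fraction=0.5):
    """Assign each contig to the group supported by most of its genes.

    contig_gene_groups maps a contig ID to a list of phylogenetic group
    labels, one per gene (assumed to come from per-gene trees).
    A contig is binned only if the top group holds strictly more than
    min_fraction of its genes; otherwise it is left "unassigned".
    """
    bins = {}
    for contig, groups in contig_gene_groups.items():
        if not groups:
            bins[contig] = "unassigned"
            continue
        group, votes = Counter(groups).most_common(1)[0]
        if votes / len(groups) > min_fraction:
            bins[contig] = group
        else:
            bins[contig] = "unassigned"
    return bins


# Made-up example: per-gene group labels for four contigs.
example = {
    "contig1": ["Gammaproteobacteria", "Gammaproteobacteria", "CFB"],
    "contig2": ["CFB", "CFB", "CFB"],
    "contig3": ["CFB", "Gammaproteobacteria"],  # tie, no majority
    "contig4": [],                              # no placeable genes
}
bins = phylogenetic_bins(example)
print(bins)
```

Requiring a strict majority is a deliberate choice in this sketch: a contig with conflicting signals stays unassigned rather than being forced into a bin.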
[Figure: Weighted % of Clones (0 to 0.5) by phylogenetic group: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, ...]
                                                                                                             te
                                                                                                      C            ria
                                                                                                          hl
                                                                                                             o  ro
                                                                                                                    bi
                                                                                                                               without good


                                                                                                               C
                                                                                                                   FB




                                          Major Phylogenetic Group
                                                                                                                                                                                 Sargasso Phylotypes




                                                                                             C
                                                                                                                               Cannot be done




                                                                                                  hl
                                                                                                     o     ro
                                                                                                                fle
                                                                                         Sp                           xi
                                                                                                 iro
                                                                                                      ch
                                                                                                           ae
                                                                                         Fu                      te
                                                                                                 so                   s
                                                                     D                                ba
                                                                         ei                                ct
                                                                           no                                    er
                                                                              c                      ia
                                                                                                                               sampling of genomes




                                                                                    oc
                                                                                      cu
                                                                                         s-
                                                                                    Eu Th
                                                                                       ry erm
                                                                                          ar
                                                                                             ch us
                                                                                    C           ae
                                                                                      re           ot
                                                                                         na           a
                                                                                            rc
                                                                                               ha
                                                                                                  eo
                                                                                                     ta
                                                                                                                                                                                              Shotgun Sequencing Allows Use of Other Markers




                                                                                                                                    EFG




Venter et al., Science 304: 66-74. 2004
                                                                                                                                    EFTu



                                                                                                                                    rRNA
                                                                                                                                    RecA
                                                                                                                                    RpoB
                                                                                                                                    HSP70
Shotgun Sequencing Allows Use of Other Markers

[Figure: Sargasso Phylotypes. Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group. Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]

GEBA Project improves metagenomic analysis

Venter et al., Science 304: 66-74. 2004
GEBA Cyano
Sequencing status (as of 01/14):
        Awaiting Material            11
        Library                      12
        Production                   22
        Finishing                     5
        Grand Total                  50


Ongoing/Planned Activities:
    - Building Cyanobacterial Metadatabase (IMG-GOLD)

   - 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10)

   
       --> Cheryl will host: Workshop training as prep for virtual Jamboree




GEBA RNB
Plan:
Sequence multiple Root Nodule Bacteria (RNBs) from across the
   planet. Pilot: 100 RNBs.

Goal:
•   Understand biogeographical effects on species evolution
    and understand host specificity.

Rationale:
•   N2 fixation by legume pastures and crops provides 65% of the N
    currently utilized in agricultural production.
•   Contributes 25 to 90 million metric tonnes N per annum.
•   Symbioses save US$6-10 billion annually on N fertilizer.
•   Grain and animal production enhanced by fixed nitrogen supplied
    by the symbiosis.

Beta RNB: Cupriavidus, Burkholderia
Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium,
    Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium,
    Balneimonas-like

                                                                     Nikos Kyrpides
Haloarchaeal GEBA-like
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still not happy

[Figure: tree of bacterial phyla. Tip labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
Shotgun Sequencing Allows Use of Other Markers

[Figure: Sargasso Phylotypes. Weighted % of Clones (axis 0 to 0.5) by Major Phylogenetic Group. Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]

GEBA Project improves metagenomic analysis, but only a little

Venter et al., Science 304: 66-74. 2004
Phylogenomics Future 1

Need to adapt genomic and metagenomic methods to make better use of data
Improving Metagenomic Analysis

• Methods
  – More automation
  – Better phylogenetic methods for short reads
  – Improved tools for using distantly related genomes
    in metagenomic analysis
• Data sets
  – Rebuild protein family models
  – New phylogenetic markers
  – Need better reference phylogenies, including HGT
• More simulations
AMPHORA




          Guide tree
AMPHORA 2 Coming w/ More Markers

Phylogenetic group      Genome number   Gene number   Marker candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684
Phylogenetic challenge




      A single tree with everything
Finding Metagenomic OTUs

PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data

Thomas J. Sharpton 1*, Samantha J. Riesenfeld 1, Steven W. Kembel 2, Joshua Ladau 1, James P. O'Dwyer 2,3, Jessica L. Green 2, Jonathan A. Eisen 4, Katherine S. Pollard 1,5

1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom, 4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America

Abstract
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
Copyright: 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
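
The idea of grouping reads into OTUs can be illustrated with a toy sketch. This is not the PhylOTU implementation. It assumes pairwise distances between reads are already available (how they are obtained is outside the sketch) and applies a simple single-linkage rule. The read names, distances, and the 0.03 cutoff are all hypothetical.

```python
from itertools import combinations


def cluster_otus(names, dist, cutoff):
    """Single-linkage clustering: reads within `cutoff` of each other share an OTU.

    `dist` maps frozenset({a, b}) -> distance (hypothetical input format).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    otus = {}
    for n in names:
        otus.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in otus.values())


# Made-up reads and distances, for illustration only.
reads = ["r1", "r2", "r3", "r4"]
distances = {
    frozenset(("r1", "r2")): 0.01,
    frozenset(("r1", "r3")): 0.20,
    frozenset(("r1", "r4")): 0.25,
    frozenset(("r2", "r3")): 0.22,
    frozenset(("r2", "r4")): 0.24,
    frozenset(("r3", "r4")): 0.02,
}
print(cluster_otus(reads, distances, cutoff=0.03))  # [['r1', 'r2'], ['r3', 'r4']]
```

Lowering the cutoff splits the reads into more, smaller OTUs. Raising it merges them.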
AMPHORA ALL

• Build reference tree with concatenated alignment
• Align reads that match any of the HMMs to concatenated alignment
• Place reads into reference tree one at a time
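
The first two steps above can be sketched in a few lines. This is a minimal illustration, not AMPHORA code. It assumes each marker already has its own alignment, that a read has already been aligned to one marker's columns, and that taxa missing a marker are filled with gaps. The function names, taxon names, and sequences are hypothetical.

```python
def concatenate_alignments(markers, taxa):
    """Join per-marker alignments column-wise; taxa missing a marker get gaps.

    Returns the concatenated rows, each marker's (start, end) column span,
    and the total alignment width.
    """
    concat = {t: "" for t in taxa}
    spans = {}
    start = 0
    for name, aln in markers.items():
        width = len(next(iter(aln.values())))
        for t in taxa:
            concat[t] += aln.get(t, "-" * width)
        spans[name] = (start, start + width)
        start += width
    return concat, spans, start


def pad_read(aligned_read, marker, spans, total):
    """Place a read aligned to one marker into concatenated-alignment columns."""
    begin, end = spans[marker]
    if len(aligned_read) != end - begin:
        raise ValueError("read must be aligned to the marker's columns")
    return "-" * begin + aligned_read + "-" * (total - end)


# Toy alignments (made-up sequences) for two markers.
markers = {
    "recA": {"taxonA": "MAID", "taxonB": "MAVD"},
    "rpoB": {"taxonA": "KLT", "taxonB": "KIT", "taxonC": "RLT"},
}
taxa = ["taxonA", "taxonB", "taxonC"]
concat, spans, total = concatenate_alignments(markers, taxa)
print(concat["taxonC"])                        # ----RLT
print(pad_read("K-T", "rpoB", spans, total))   # ----K-T
```

Each padded read has the same width as the reference rows, so it can be handed to a placement step one read at a time.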
Phylogenomics Future 2

We have still only scratched the
 surface of microbial diversity
rRNA Tree of Life

[Figure: rRNA tree of life with three labeled groups: Bacteria, Archaea, Eukaryotes]

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science 276:734-740
Phylogenetic Diversity: Genomes




From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity with GEBA




From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity: Isolates




                 From Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Diversity: All




               From Wu et al. 2009 Nature 462, 1056-1060
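
Phylogenetic diversity in these slides can be read as the total branch length of the tree spanning a set of taxa. Below is a minimal sketch of that calculation, assuming the rooted form of the measure (paths are counted back to the root). The toy tree, taxon names, and branch lengths are hypothetical and are not taken from Wu et al. 2009.

```python
def phylogenetic_diversity(tree, tips):
    """Sum branch lengths on the union of root-to-tip paths for `tips`.

    `tree` maps node -> (parent, branch_length); the root has no entry.
    """
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root, counting each branch only once.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Toy tree: ((A:0.1, B:0.2):0.3, C:0.5)
tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}
print(round(phylogenetic_diversity(tree, ["A"]), 2))            # 0.4
print(round(phylogenetic_diversity(tree, ["A", "B"]), 2))       # 0.6
print(round(phylogenetic_diversity(tree, ["A", "B", "C"]), 2))  # 1.1
```

In this toy case, adding the distant taxon C contributes more new branch length (0.5) than adding the close relative B (0.2).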
Uncultured Lineages:


•   Get into culture
•   Enrichment cultures
•   If abundant in low diversity ecosystems
•   Flow sorting
•   Microbeads
•   Microfluidic sorting
•   Single cell amplification
GEBA uncultured
   Number of SAGs from Candidate Phyla

   Site                                      OD1   OP11   OP3   SAR406
   A: Hydrothermal vent                        4      1     -        -
   B: Gold Mine                                6     13     2        -
   C: Tropical gyres (Mesopelagic)             -      -     -        2
   D: Tropical gyres (Photic zone)             1      -     -        -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz
Phylogenomics Future 3

Need Experiments from Across the
        Tree of Life too
As of 2002

• At least 40 phyla of bacteria

[Tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla

[Tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla

[Tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes

[Tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses

[Tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Based on Hugenholtz, 2002
Need experimental studies from across the tree too

[Tree of bacterial phyla (scale bar 0.1): Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Tree based on Hugenholtz (2002) with some modifications.
Adopt a Microbe

[Tree of bacterial phyla (scale bar 0.1): Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]

Tree based on Hugenholtz (2002) with some modifications.
Conclusion


• Phylogenetic sampling of genomes
  improves our understanding of microbial
  diversity in many ways
• Still need
  – More biogeography
  – More phenotypic/experimental data
  – Deeper phylogenetic sampling
MICROBES
A Happy Tree of Life
Acknowledgements
•   GEBA:
     – $$: DOE-JGI, DSMZ
     – Eddy Rubin, Dongying Wu, Phil Hugenholtz, Hans-Peter Klenk, Nikos
       Kyrpides, Tanya Woyke
     – Aaron Darling, Jenna Morgan
•   iSEEM:
     – $$: GBMF
     – Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom
       Sharpton, Morgan Langille, Guillaume Jospin
•   aTOL:
     – $$: NSF
     – Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
•   Others (not mentioned in detail)
     – $$: NSF, NIH, DOE, GBMF, DARPA, Sloan
     – Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran,
       Colleen Cavanaugh, Josh Weitz, Srijak Bhatnagar, Russell Neches,
       Lizzy Wilbanks, Marc Facciotti,

Eisen Talk for MBL Microbial Diversity Course

  • 1.
    Phylogenomics and the Originof Novelty in Microbes Jonathan A. Eisen UC Davis MBL Microbial Diversity Course July 9, 2011
  • 2.
    Phylogenomics and the Originof Novelty in Microbes Jonathan A. Eisen UC Davis MBL Microbial Diversity Course July 9, 2011
  • 3.
    My Obsessions Jonathan A. Eisen UC Davis MBL Microbial Diversity Course July 9, 2011
  • 5.
    Social Networking inScience HOME PAGE MY TIMES TODAY'S PAPER VIDEO MOST POPULAR TIMES TOPICS Welcome, fcollins Member Center Log Out Sunday, April 1, 2007 Health WORLD U.S. N.Y. / REGION BUSINESS TECHNOLOGY SCIENCE HEALTH SPORTS OPINION ARTS STYLE TRAVEL JOBS REAL ESTATE AUTOS FITNESS & NUTRITION HEALTH CARE POLICY MENTAL HEALTH & BEHAVIOR Scientist Reveals Secret of the Ocean: It's Him By NICHOLAS WADE Published: April 1, 2007 PRINT nytimes.com/sports Maverick scientist J. Craig Venter has done it again. It was just a few years SINGLE-PAGE ago that Dr. Venter announced that the human genome sequenced by Celera SAVE Genomics was in fact, mostly his own. And now, Venter has revealed a second SHARE twist in his genomic self-examination. Venter was discussing his Global SHARE Ocean Voyage, in which he used his personal yacht to collect ocean water samples from around the world. He then used large filtration units to collect How good is your bracket? Compare your tournament picks to choices from members of The New York Times sports microbes from the water samples which were then brought back to his high desk and other players. tech lab in Rockville, MD where he used the same methods that were used to Also in Sports: The Bracket Blog - all the news leading up to the Final sequence the human genome to study the genomes of the 1000s of ocean Four dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his Bats Blog: Spring training updates Play Magazine: How to build a super athlete latest attack on the standards of science – some of the samples were in fact not from the ocean, but were from microbial habitats in and on his body. “The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover. My main goal has always been to work on the microbes that live in and on people. And now that my genome is nearly complete, why not use myself as the model for human microbiome studies as well. 
” It is certainly true that in the last few years, the microbes that live in and on people have become a hot research topic. So hot that the same people who were involved in the race to sequence the human
  • 6.
  • 7.
    T. H. Dobzhansky(1973) “Nothing in biology makes sense except in the light of evolution.”
  • 8.
    Evolutionary Perspective and Comparative Biology • Comparative biology is the analysis of differences and similarities between species. • An evolutionary perspective is useful in such studies because it allows one to focus on how and why similarities and differences came to be. • In other words, biological objects have a history and understanding that history is important
  • 9.
    Phylogenomic Analysis • Evolutionaryreconstructions greatly improve genome analyses • Genome analysis greatly improves evolutionary reconstructions • There is a feedback loop such that these should be integrated
  • 10.
    Phylogenomics of Novelty Variation in Mechanisms of Mechanisms: Origin of New Patterns, Causes Functions and Effects Species Evolution
  • 11.
    rRNA Tree ofLife Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 12.
    Limited Sampling ofRRR Studies Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 13.
    Limited Sampling ofRRR Studies Haloferax Methanococcus Chlorobium Deinococcus Thermotoga Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 14.
    Fleischmann et al. 1995Science 269:496-512
  • 15.
    TIGR Genome Projects Methanococcus Chlorobium Deinococcus Thermotoga Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 16.
    Fleischmann et al. 1995Science 269:496-512
  • 17.
  • 18.
  • 19.
    Whole Genome ShotgunSequencing Warner Brothers, Inc.
  • 20.
    Whole Genome ShotgunSequencing shotgun Warner Brothers, Inc.
  • 21.
    Whole Genome ShotgunSequencing shotgun Warner Brothers, Inc.
  • 22.
    Whole Genome ShotgunSequencing shotgun Warner Brothers, Inc. sequence
  • 23.
    Whole Genome ShotgunSequencing shotgun Warner Brothers, Inc. sequence
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
    Assemble Fragments sequencer output assemble fragments Closure & Annotation
  • 29.
  • 34.
    General Steps inAnalysis of Complete Genomes • Identification/prediction of genes • Characterization of gene features • Characterization of genome features • Prediction of gene function • Prediction of pathways • Integration with known biological data • Comparative genomics
  • 35.
    Genome Sequences Have Revolutionized Microbiology • Predictions of metabolic processes • Better vaccine and drug design • New insights into mechanisms of evolution • Genomes serve as template for functional studies • New enzymes and materials for engineering and synthetic biology
  • 36.
  • 37.
    Outline • Phylogenomic Tales – Selecting genomes for sequencing – Species evolution – Predicting functions of genes – Uncultured microbes – Searching for novel organisms and genes
  • 38.
    Outline • Phylogenomic Tales – Selecting genomes for sequencing – Species evolution – Predicting functions of genes – Uncultured microbes – Searching for novel organisms and genes • All of these going to be told in context of a recent project “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA)
  • 39.
  • 40.
    Major Microbial Sequencing Efforts • Coordinated, top-down efforts – Fungal Genome Initiative (Broad/Whitehead) – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project – Sanger Center Pathogen Sequencing Unit – NHGRI Human Gut Microbiome Project – NIH Human Microbiome Program • White paper or grant systems – NIAID Microbial Sequencing Centers – DOE/JGI Community Sequencing Program – DOE/JGI BER Sequencing Program – NSF/USDA Microbial Genome Sequencing • Covers lots of ground and biological diversity
  • 41.
  • 42.
    As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA WS3 Gemmimonas Firmicutes Fusobacteria Actinobacteria OP9 Cyanobacteria Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
  • 43.
    As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
  • 44.
    As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
  • 45.
    As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
  • 46.
    Need for TreeGuidance Well Established • Common approach within some eukaryotic groups • Many small projects funded to fill in some bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  • 47.
    Proteobacteria • NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of OP8 Project Nitrospira Bacteroides bacteria Chlorobi • A genome Fibrobacteres Marine GroupA • Genome WS3 from each of Gemmimonas sequences are Firmicutes eight phyla Fusobacteria mostly from Actinobacteria OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are only Planctomycetes Spriochaetes sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus • Solution I: Dictyoglomus Eisen, Ward, Aquificae Thermudesulfobacteria sequence more Robb, Nelson, et Thermotogae phyla OP1 al OP11
  • 48.
    Organisms Selected Phylum Species selected Chrysiogenes Chrysiogenes arsenatis (GCA) Coprothermobacter Coprothermobacter proteolyticus (GCBP) Dictyoglomi Dictyoglomus thermophilum (GD T ) Thermodesulfobacteria Thermodesulfobacterium commune (GTC) Nitrospirae Thermodesulfovibrio yellowstonii (GTY) Thermomicrobia Thermomicrobium roseum (GTR ) Deferribacteres Geovibrio thiophilus (GGT) Synergistes Synergistes jonesii (GSJ)
  • 50.
    Proteobacteria • NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of bacteria OP8 Project Nitrospira • Genome Bacteroides • A genome Chlorobi Fibrobacteres sequences are Marine GroupA from each of WS3 Gemmimonas mostly from eight phyla Firmicutes Fusobacteria three phyla Actinobacteria OP9 Cyanobacteria • Some other Synergistes Deferribacteres Chrysiogenetes phyla are only NKB19 Verrucomicrobia sparsely Chlamydia OP3 Planctomycetes sampled Spriochaetes Coprothmermobacter • Still highly OP10 Thermomicrobia Chloroflexi biased in terms TM7 Deinococcus-Thermus Dictyoglomus of the tree Aquificae Eisen & Ward, PIs Thermudesulfobacteria Thermotogae OP1 OP11
  • 51.
    Major Lineages ofActinobacteria 2.5 Actinobacteria 2.5.1 Acidimicrobidae 2.5.1 Acidimicrobidae 2.5.1.1 Unclassified 2.5.1.2 "Microthrixineae 2.5.1.1 Unclassified 2.5.1.3 Acidimicrobineae 2.5.1.3.1 Unclassified 2.5.1.2 "Microthrixineae 2.5.1.3.2 Acidimicrobiaceae 2.5.1.4 BD2-10 2.5.1.3 Acidimicrobineae 2.5.1.5 EB1017 2.5.2 Actinobacteridae 2.5.1.4 BD2-10 2.5.2.1 Unclassified 2.5.2.10 Ellin306/WR160 2.5.1.5 EB1017 2.5.2.11 Ellin5012 2.5.2.12 Ellin5034 2.5.2 Actinobacteridae 2.5.2.13 Frankineae 2.5.2.13.1 Unclassified 2.5.2.1 Unclassified 2.5.2.13.2 Acidothermaceae 2.5.2.10 Ellin306/WR160 2.5.2.13.3 2.5.2.13.4 Ellin6090 Frankiaceae 2.5.2.11 Ellin5012 2.5.2.13.5 2.5.2.13.6 Geodermatophilaceae Microsphaeraceae 2.5.2.12 Ellin5034 2.5.2.13.7 2.5.2.14 Sporichthyaceae Glycomyces 2.5.2.13 Frankineae 2.5.2.15 2.5.2.15.1 Intrasporangiaceae Unclassified 2.5.2.14 Glycomyces 2.5.2.15.2 2.5.2.15.3 Dermacoccus Intrasporangiaceae 2.5.2.15 Intrasporangiaceae 2.5.2.16 2.5.2.17 Kineosporiaceae Microbacteriaceae 2.5.2.16 Kineosporiaceae 2.5.2.17.1 2.5.2.17.2 Unclassified Agrococcus 2.5.2.17 Microbacteriaceae 2.5.2.17.3 2.5.2.18 Agromyces Micrococcaceae 2.5.2.18 Micrococcaceae 2.5.2.19 2.5.2.2 Micromonosporaceae Actinomyces 2.5.2.19 Micromonosporaceae 2.5.2.20 2.5.2.20.1 Propionibacterineae Unclassified 2.5.2.2 Actinomyces 2.5.2.20.2 2.5.2.20.3 Kribbella Nocardioidaceae 2.5.2.20 Propionibacterineae 2.5.2.20.4 2.5.2.21 Propionibacteriaceae Pseudonocardiaceae 2.5.2.21 Pseudonocardiaceae 2.5.2.22 2.5.2.22.1 Streptomycineae Unclassified 2.5.2.22 Streptomycineae 2.5.2.22.2 2.5.2.22.3 Kitasatospora Streptacidiphilus 2.5.2.23 Streptosporangineae 2.5.2.23 2.5.2.23.1 Streptosporangineae Unclassified 2.5.2.3 Actinomycineae 2.5.2.23.2 2.5.2.23.3 Ellin5129 Nocardiopsaceae 2.5.2.4 Actinosynnemataceae 2.5.2.23.4 2.5.2.23.5 Streptosporangiaceae Thermomonosporaceae 2.5.2.5 Bifidobacteriaceae 2.5.2.3 Actinomycineae 2.5.2.4 Actinosynnemataceae 2.5.2.6 Brevibacteriaceae 2.5.2.5 Bifidobacteriaceae 
2.5.2.6 Brevibacteriaceae 2.5.2.7 Cellulomonadaceae 2.5.2.7 Cellulomonadaceae 2.5.2.8 Corynebacterineae 2.5.2.8 Corynebacterineae 2.5.2.8.1 Unclassified 2.5.2.8.2 Corynebacteriaceae 2.5.2.9 Dermabacteraceae 2.5.2.8.3 Dietziaceae 2.5.2.8.4 Gordoniaceae 2.5.3 Coriobacteridae 2.5.2.8.5 Mycobacteriaceae 2.5.2.8.6 Rhodococcus 2.5.3.1 Unclassified 2.5.2.8.7 Rhodococcus 2.5.2.8.8 Rhodococcus 2.5.3.2 Atopobiales 2.5.2.9 Dermabacteraceae 2.5.2.9.1 Unclassified 2.5.3.3 Coriobacteriales 2.5.2.9.2 Brachybacterium 2.5.2.9.3 Dermabacter 2.5.3.4 Eggerthellales 2.5.3 Coriobacteridae 2.5.3.1 Unclassified 2.5.4 OPB41 2.5.3.2 Atopobiales 2.5.3.3 Coriobacteriales 2.5.5 PK1 2.5.3.4 Eggerthellales 2.5.4 OPB41 2.5.6 Rubrobacteridae 2.5.5 PK1 2.5.6 Rubrobacteridae 2.5.6.1 Unclassified 2.5.6.1 Unclassified 2.5.6.2 "Thermoleiphilaceae 2.5.6.2 "Thermoleiphilaceae 2.5.6.2.1 Unclassified 2.5.6.2.2 Conexibacter 2.5.6.3 MC47 2.5.6.2.3 XGE514 2.5.6.3 MC47 2.5.6.4 Rubrobacteraceae 2.5.6.4 Rubrobacteraceae
  • 52.
    Proteobacteria • NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of bacteria OP8 Project Nitrospira • Genome Bacteroides • A genome Chlorobi Fibrobacteres sequences are Marine GroupA from each of WS3 Gemmimonas mostly from eight phyla Firmicutes Fusobacteria three phyla Actinobacteria OP9 Cyanobacteria • Some other Synergistes Deferribacteres Chrysiogenetes phyla are only NKB19 Verrucomicrobia sparsely Chlamydia OP3 Planctomycetes sampled Spriochaetes Coprothmermobacter • Same trend in OP10 Thermomicrobia Chloroflexi Archaea TM7 Deinococcus-Thermus Dictyoglomus Aquificae Eisen & Ward, PIs Thermudesulfobacteria Thermotogae OP1 OP11
  • 53.
    Proteobacteria • NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of bacteria OP8 Project Nitrospira • Genome Bacteroides • A genome Chlorobi Fibrobacteres sequences are Marine GroupA from each of WS3 Gemmimonas mostly from eight phyla Firmicutes Fusobacteria three phyla Actinobacteria OP9 Cyanobacteria • Some other Synergistes Deferribacteres Chrysiogenetes phyla are only NKB19 Verrucomicrobia sparsely Chlamydia OP3 Planctomycetes sampled Spriochaetes Coprothmermobacter • Same trend in OP10 Thermomicrobia Chloroflexi Eukaryotes TM7 Deinococcus-Thermus Dictyoglomus Aquificae Eisen & Ward, PIs Thermudesulfobacteria Thermotogae OP1 OP11
  • 54.
    Proteobacteria • NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of bacteria OP8 Project Nitrospira • Genome Bacteroides • A genome Chlorobi Fibrobacteres sequences are Marine GroupA from each of WS3 Gemmimonas mostly from eight phyla Firmicutes Fusobacteria three phyla Actinobacteria OP9 Cyanobacteria • Some other Synergistes Deferribacteres Chrysiogenetes phyla are only NKB19 Verrucomicrobia sparsely Chlamydia OP3 Planctomycetes sampled Spriochaetes Coprothmermobacter • Same trend in OP10 Thermomicrobia Chloroflexi Viruses TM7 Deinococcus-Thermus Dictyoglomus Aquificae Eisen & Ward, PIs Thermudesulfobacteria Thermotogae OP1 OP11
  • 55.
    GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 56.
  • 57.
    GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • 58.
    rRNA Tree of Life Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 62.
    GEBA Pilot Target List [Bar chart: # of genomes (axis 0 to 35) per phylum, for bacterial (B:) and archaeal (A:) groups]
  • 63.
    GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 200+ • Annotate, analyze, release data • Assess benefits of tree-guided sequencing • 1st paper: Wu et al. in Nature, Dec 2009
  • 64.
    Assess Benefits of GEBA • All genomes have some value • But what, if any, is the benefit of tree-guided sequencing over other selection methods? • Lessons for other large-scale microbial genome projects?
  • 65.
    GEBA Phylogenomic Lesson 1 The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  • 66.
    rRNA Tree of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 67.
    The Core Gets Small ...
  • 68.
  • 69.
  • 70.
    Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 71.
    T. roseum mobile motility element Wu et al. doi:10.1371/journal.pone.0004207
  • 72.
    Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin-related proteins, including Haliangium ochraceum DSM 14365] Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  • 73.
    Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. The Arabidopsis Genome Initiative. Authorship of this paper should be cited as `The Arabidopsis Genome Initiative'. A full list of contributors appears at the end of this paper.
    The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and C. elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
  • 74.
  • 75.
    Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu
  • 77.
    GEBA Phylogenomic Lesson 2 The rRNA tree is good but not perfect, and better genomic sampling improves phylogenetic inference
  • 78.
    16S Says Hyphomonas is in Rhodobacterales Badger et al. 2005
  • 79.
    WGT and individual gene trees: It's Related to Caulobacterales Badger et al. 2005
  • 80.
    16S WGT, 23S Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
  • 81.
    Zimmer. New York Times. 2009
  • 82.
    GEBA Phylogenomic Lesson 3 Phylogenetics-guided genome selection (and phylogenetics in general) improves genome annotation
  • 83.
    Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All are improved by “phylogenomic” type analyses that integrate evolutionary reconstructions and an understanding of how new functions evolve
  • 84.
    From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 85.
    BLAST Search of H. pylori “MutS” • BLAST search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs • Based on this, TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 86.
  • 87.
    Phylogenetic Tree of MutS Family [Figure: tree of MutS homologs from bacterial, archaeal, and eukaryotic species] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 88.
    MutS Subfamilies [Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 89.
    Overlaying Functions onto Tree [Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 90.
    Functional Prediction Using Tree • MSH5 - Meiotic Crossing Over • MutS2 - Unknown Functions • MSH6 - Nuclear Repair of Mismatches • MSH4 - Meiotic Crossing Over • MSH3 - Nuclear Repair of Loops • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair • MSH1 - Mitochondrial Repair • MutS1 - Bacterial Mismatch and Loop Repair [Figure: MutS family tree with functions overlaid on subfamilies] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 92.
    PHYLOGENETIC PREDICTION OF GENE FUNCTION • METHOD: 1. Choose gene(s) of interest 2. Identify homologs 3. Align sequences 4. Calculate gene tree 5. Overlay known functions onto tree 6. Infer likely function of gene(s) of interest [Figure: the method applied to Example A (genes 1-6) and Example B (genes 1A-3A, 1B-3B from Species 1-3, with a possible duplication, leaving one inference ambiguous); actual evolution is assumed to be unknown] Based on Eisen, 1998 Genome Res 8: 163-167.
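The last two steps of the method (overlay known functions, then infer the function of the gene of interest) can be sketched in a few lines. This is a minimal illustration and not the procedure used in the cited papers: the nested-tuple tree encoding, the function name `predict_function`, and the "smallest clade containing an annotated relative" rule are all assumptions made for the example, and the gene tree is taken as already computed.

```python
def leaves(tree):
    """Return all leaf names under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def predict_function(tree, query, known):
    """Return the functions found in the smallest clade that contains `query`
    and at least one other annotated gene.

    One function in the result suggests a prediction; several mean the
    inference is ambiguous; an empty set means no annotated relatives.
    `known` maps gene name -> experimentally characterized function.
    """
    if isinstance(tree, str):
        return set()
    for child in tree:
        if query in leaves(child):
            deeper = predict_function(child, query, known)
            if deeper:
                return deeper
            # Nothing annotated closer to the query: use this clade instead.
            return {known[g] for g in leaves(tree) if g in known and g != query}
    return set()


# Toy gene tree shaped like Example B on the slide: a duplication yields
# an "A" and a "B" subfamily in three species (all names are hypothetical).
gene_tree = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known = {"1A": "function A", "2A": "function A",
         "1B": "function B", "2B": "function B"}

print(predict_function(gene_tree, "3A", known))
print(predict_function(gene_tree, "3B", known))
```

The point of the sketch is the one made on the slides: the prediction follows the query gene's position in the tree (its subfamily), not its single best similarity hit.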
  • 93.
  • 94.
    Phylogenetic Prediction of Function • Greatly improves accuracy of functional predictions compared to similarity alone (e.g., BLAST) • Many surrogate methods (e.g., COGs) • Automated phylogenetic methods now available – Sean Eddy, Steven Brenner, Kimmen Sjölander, etc. • But …
  • 95.
    Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: neighbor-joining (NJ) tree of homologous genes from V. cholerae and other bacterial and archaeal genomes]
  • 96.
    Example 3: Nonhomology methods • Many genes have homologs in other species but no homologs have ever been studied experimentally • Non-homology methods can make functional predictions for these • Example: phylogenetic profiling (extension of prior work of Koonin, Tatusov, Ragan, et al.)
  • 97.
    Phylogenetic profiling basis • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus might be able to use correlated presence/absence information to identify genes with similar functions
  • 98.
    Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
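The profiling steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the homology search has already been run, its result is given as a dictionary from gene to the set of genomes with a hit, and genes are grouped only when their profiles are identical. The names `build_profiles` and `cluster_by_profile`, and all gene and genome labels, are hypothetical.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    hits: {gene: set of genomes in which a homolog was found}
    genomes: ordered list of genomes; fixes the column order of each profile
    Returns {gene: tuple of 1 (present) / 0 (absent)}.
    """
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}


def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


genomes = ["G1", "G2", "G3", "G4"]
hits = {
    "geneA": {"G1", "G3"},
    "geneB": {"G1", "G3"},              # same distribution as geneA
    "geneC": {"G1", "G2", "G3", "G4"},  # found everywhere
    "geneD": {"G2"},
}

profiles = build_profiles(hits, genomes)
for profile, genes in cluster_by_profile(profiles).items():
    print(profile, genes)
```

Exact matching is the simplest possible clustering; a real analysis would more likely use a distance between profiles (for example Hamming distance) so that near-identical distributions also group together.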
  • 99.
    Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65)
  • 100.
    Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • 101.
    Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 102.
    Wu et al. 2005 PLoS Genetics 1: e65.
  • 103.
    PG Profiling Works Better Using Orthology
  • 104.
    GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  • 105.
    GEBA Lesson 4: Metadata Important
  • 106.
    GEBA Phylogenomic Lesson 5 Phylogeny-driven genome selection helps discover new genetic diversity
  • 107.
    Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 108.
    Protein Family Rarefaction • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
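The rarefaction idea can be sketched as a cumulative count of distinct protein families as genomes are added one at a time. This is an illustrative sketch only: the family assignments are assumed to come from an upstream clustering step (MCL on the slide), and the function name, the family identifiers, and the genome labels are hypothetical.

```python
import random


def rarefaction_curve(genome_families, order=None, seed=0):
    """Cumulative number of distinct protein families as genomes are added.

    genome_families: {genome: set of protein family ids found in that genome}
    order: optional explicit genome order; if omitted, a seeded random order is used
    Returns a list whose i-th entry is the family count after i + 1 genomes.
    """
    if order is None:
        genomes = list(genome_families)
        random.Random(seed).shuffle(genomes)
    else:
        genomes = list(order)

    seen = set()
    curve = []
    for genome in genomes:
        seen |= genome_families[genome]  # add this genome's families
        curve.append(len(seen))
    return curve


families = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam2", "fam3", "fam4"},  # closely related: one new family
    "genome3": {"fam5", "fam6", "fam7"},  # distant lineage: all new families
}

print(rarefaction_curve(families, order=["genome1", "genome2", "genome3"]))
```

Plotting the returned list against 1, 2, 3, ... genomes gives the curve described on the slide; averaging over many random orders would smooth out the effect of any particular ordering.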
  • 109.
    Wu et al. 2009 Nature 462, 1056-1060
  • 110.
    Wu et al. 2009 Nature 462, 1056-1060
  • 111.
    Wu et al. 2009 Nature 462, 1056-1060
  • 112.
    Wu et al. 2009 Nature 462, 1056-1060
  • 113.
    Wu et al. 2009 Nature 462, 1056-1060
  • 114.
    Synapomorphies exist Wu et al. 2009 Nature 462, 1056-1060
  • 115.
    GEBA Phylogenomic Lesson 6 Improves analysis of genome data from uncultured organisms
  • 116.
    rRNA Phylotyping • Collect DNA from environment • PCR amplify rRNA genes using broad (so-called universal) primers • Sequence • Align to others • Infer evolutionary tree • Unknowns “identified” by placement on tree • Some use BLAST, but not as good as phylogeny
  • 117.
    rRNA PCR The Hidden Majority Richness estimates Hugenholtz 2002 Bohannan and Hughes 2003
  • 118.
    Metagenomics shotgun sequence
  • 119.
  • 120.
    rRNA Phylotyping in Sargasso Sea Venter et al., Science 304: 66. 2004
  • 121.
    Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., Science 304: 66. 2004
  • 122.
    Shotgun Sequencing Allows Use of Other Markers [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66. 2004
  • 123.
    Shotgun Sequencing Allows Use of Other Markers [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 124.
    Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 125.
  • 126.
  • 127.
  • 128.
    Binning challenge Best binning method: reference genomes
  • 129.
    Binning challenge Best binning method: reference genomes
  • 130.
    Binning challenge No reference genome? What do you do?
  • 131.
    Glassy Winged Sharpshooter • Obligate xylem feeder • Can transmit Pierce’s Disease agent • Potential bioterror agent • Like aphids, needs to get amino acids and other nutrients from symbionts
  • 132.
    Wu et al. 2006 PLoS Biology 4: e188.
  • 133.
  • 134.
    Phylogenetic Binning [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 135.
    Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 136.
    Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 137.
    GEBA Cyano • Sequencing status (as of 01/14): Awaiting Material 11; Library 12; Production 22; Finishing 5; Grand Total 50 • Ongoing/Planned Activities: – Building Cyanobacterial Metadatabase (IMG-GOLD) – 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10) --> Cheryl will host: Workshop training as prep for virtual Jamboree
  • 138.
    GEBA RNB • Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs. • Goal: Understand biogeographical effects on species evolution and understand host-specificity. • Rationale: – N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production. – Contributes 25 to 90 million metric tonnes N pa. – Symbioses save $US 6-10 billion annually on N fertilizer. – Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis. • Beta RNB: Cupriavidus, Burkholderia • Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium, Balneimonas-like. Nikos Kyrpides
  • 139.
  • 140.
    NSF-funded Tree of Life Project (Eisen & Ward, PIs) • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still not happy [Figure: tree of bacterial phyla]
  • 141.
    Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis, but only a little [Bar chart: Sargasso Phylotypes; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 142.
    Phylogenomics Future 1 Need to adapt genomic and metagenomic methods to make better use of data
  • 143.
    Improving Metagenomic Analysis • Methods – More automation – Better phylogenetic methods for short reads – Improved tools for using distantly related genomes in metagenomic analysis • Data sets – Rebuild protein family models – New phylogenetic markers – Need better reference phylogenies, including HGT • More simulations
  • 144.
    AMPHORA Guide tree
  • 145.
    AMPHORA 2 Coming w/ More Markers
    Phylogenetic group       Genome Number   Gene Number   Marker Candidates
    Archaea                  62              145415        106
    Actinobacteria           63              267783        136
    Alphaproteobacteria      94              347287        121
    Betaproteobacteria       56              266362        311
    Gammaproteobacteria      126             483632        118
    Deltaproteobacteria      25              102115        206
    Epsilonproteobacteria    18              33416         455
    Bacteroides              25              71531         286
    Chlamydiae               13              13823         560
    Chloroflexi              10              33577         323
    Cyanobacteria            36              124080        590
    Firmicutes               106             312309        87
    Spirochaetes             18              38832         176
    Thermi                   5               14160         974
    Thermotogae              9               17037         684
  • 146.
    Phylogenetic challenge A single tree with everything
  • 147.
    Finding Metagenomic OTUs
    PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data
    Thomas J. Sharpton 1*, Samantha J. Riesenfeld 1, Steven W. Kembel 2, Joshua Ladau 1, James P. O’Dwyer 2,3, Jessica L. Green 2, Jonathan A. Eisen 4, Katherine S. Pollard 1,5
    1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom, 4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America
    Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
    Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
    Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
    Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
    Copyright: 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
  • 148.
    • Build AMPHORA ALL reference tree with concatenated alignment • Align reads that match any of the HMMs to concatenated alignment • Place reads into reference tree one at a time
  • 149.
    Phylogenomics Future 2 We have still only scratched the surface of microbial diversity
  • 150.
    rRNA Tree of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • 151.
    Phylogenetic Diversity: Genomes From Wu et al. 2009 Nature 462, 1056-1060
  • 152.
    Phylogenetic Diversity with GEBA From Wu et al. 2009 Nature 462, 1056-1060
  • 153.
    Phylogenetic Diversity: Isolates From Wu et al. 2009 Nature 462, 1056-1060
  • 154.
    Phylogenetic Diversity: All From Wu et al. 2009 Nature 462, 1056-1060
  • 155.
    Uncultured Lineages: • Get into culture • Enrichment cultures • If abundant in low diversity ecosystems • Flow sorting • Microbeads • Microfluidic sorting • Single cell amplification
  • 156.
    GEBA uncultured: Number of SAGs from Candidate Phyla
    Site                                       OD1   SAR406   OP3   OP11
    Site A: Hydrothermal vent                  4     1        -     -
    Site B: Gold Mine                          6     13       2     -
    Site C: Tropical gyres (Mesopelagic)       -     -        -     2
    Site D: Tropical gyres (Photic zone)       1     -        -     -
    Sample collections at 4 additional sites are underway. Phil Hugenholtz
  • 157.
    Phylogenomics Future 3 Need Experiments from Across the Tree of Life too
  • 158.
    As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] Based on Hugenholtz, 2002
  • 159.
    As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 160.
    As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 161.
    As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 162.
    As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 163.
    Need experimental studies from across the tree too [Figure: tree of bacterial phyla, scale bar 0.1] Tree based on Hugenholtz (2002) with some modifications.
  • 164.
    Adopt a Microbe [Figure: tree of bacterial phyla, scale bar 0.1] Tree based on Hugenholtz (2002) with some modifications.
  • 165.
    Conclusion • Phylogenetic sampling of genomes improves our understanding of microbial diversity in many ways • Still need – More biogeography – More phenotypic/experimental data – Deeper phylogenetic sampling
  • 167.
  • 168.
    A Happy Tree of Life
  • 169.
    Acknowledgements • GEBA: – $$: DOE-JGI, DSMZ – Eddy Rubin, Dongying Wu, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke – Aaron Darling, Jenna Morgan • iSEEM: – $$: GBMF – Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin • aTOL – $$: NSF – Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu • Others (not mentioned in detail) – $$: NSF, NIH, DOE, GBMF, DARPA, Sloan – Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz, Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Marc Facciotti,
