Microbial Phylogenomics (EVE161) Class 9
Jonathan Eisen
Microbial Phylogenomics (EVE161) at UC Davis Spring 2016. Co-taught by Jonathan Eisen and Holly Ganz.
Class 9:
Era II: rRNA Case Study: Built Environment Metaanalysis
Understanding the origin and evolution of the eukaryotic cell and the full diversity of eukaryotes is relevant to many biological disciplines.
However, our current understanding of eukaryotic genomes is extremely biased, leading to a skewed view of eukaryotic biology.
We argue that a phylogeny-driven initiative to cover the full eukaryotic diversity is needed to overcome this bias.
◦There is an important bias in eukaryotic knowledge, affecting cultures and genomes.
◦Eukaryotic genomics are biased towards multicellular organisms and their parasites.
◦A phylogeny-driven initiative is needed to overcome the eukaryotic genomic bias.
◦We propose to sequence neglected cultures and increase culturing efforts.
◦Single-cell genomics should be embraced as a tool to explore eukaryotic diversity.
Microbial Phylogenomics (EVE161) Class 5
1. Lecture 5:
EVE 161:
Microbial Phylogenomics
Lecture #5:
Modern View of Tree of Life
UC Davis, Winter 2016
Instructors: Jonathan Eisen & Holly Ganz
2. Where we are going and where we have been
• Previous lecture:
◦ 4. Background on Phylogeny
• Current Lecture:
◦ 5. Modern view of Tree of Life
• Next Lecture:
◦ 6. rRNA from environments
3. Three papers for today
Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Patterson DJ, Katz LA (2010) Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life. Syst. Biol. 59(5):518–533. doi:10.1093/sysbio/syq037
Williams TA, Foster PG, Nye TMW, Cox CJ, Embley TM (2012) A congruent phylogenomic signal places eukaryotes within the Archaea. Proc. R. Soc. B 279(1749):4870. doi:10.1098/rspb.2012.1795
Spang A, Saw JH, Jørgensen SL, Zaremba-Niedzwiedzka K, Martijn J, Lind AE, van Eijk R, Schleper C, Guy L, Ettema TJG. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature. doi:10.1038/nature14447
4. Parfrey et al.
Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Patterson DJ, Katz LA (2010) Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life. Syst. Biol. 59(5):518–533. doi:10.1093/sysbio/syq037
5. Abstract
An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations
underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes.
Previous work has divided eukaryotic diversity into a small number of high-level “supergroups,”
many of which receive strong support in phylogenomic analyses. However, the abundance of
data in phylogenomic analyses can lead to highly supported but incorrect relationships due to
systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or
fewer) included in these genomic studies may exaggerate systematic error and reduce power to
evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We
show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72
major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic
tree of life. The consistency across analyses with varying numbers of taxa (88–451) and levels
of missing data (17–69%) supports the accuracy of the resulting topologies. The resulting stable
topology emerges without the removal of rapidly evolving genes or taxa, a practice common to
phylogenomic analyses. Several major groups are stable and strongly supported in these
analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup “Chromalveolata”
is rejected. Furthermore, extensive instability among photosynthetic lineages suggests the
presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or
plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary
relationships can be achieved with broad taxonomic sampling and a moderate number of
genes. Finally, taxon-rich analyses such as presented here provide a method for testing the
accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses
and enable placement of the multitude of lineages that lack genome scale data. [Excavata;
microbial eukaryotes; Rhizaria; supergroups; systematic error; taxon sampling.]
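The abstract reports that the topologies were consistent across data sets with 17–69% missing data. As a rough illustration of what such a figure can mean, the sketch below computes the share of empty taxon-by-gene cells in a small presence/absence matrix. This is only one plausible way to count missing data and is not taken from the paper, which may have measured it differently (for example per alignment character). The function name, taxon labels, and gene names are all hypothetical.

```python
def missing_data_fraction(presence, genes):
    """Return the fraction of empty taxon-by-gene cells in a supermatrix.

    presence: dict mapping taxon name -> set of gene names sampled for it.
    genes: list of all gene names (columns) in the matrix.
    Assumption: missing data is counted per cell, not per character.
    """
    total_cells = len(presence) * len(genes)
    if total_cells == 0:
        raise ValueError("matrix needs at least one taxon and one gene")
    gene_set = set(genes)
    # Count only genes that are actual columns of the matrix.
    filled = sum(len(sampled & gene_set) for sampled in presence.values())
    return 1.0 - filled / total_cells


# Hypothetical toy matrix: 4 taxa by 4 genes (labels are illustrative only).
genes = ["SSU", "actin", "tubulin", "hsp90"]
presence = {
    "taxon_A": {"SSU", "actin", "tubulin", "hsp90"},
    "taxon_B": {"SSU", "actin"},
    "taxon_C": {"SSU"},
    "taxon_D": {"SSU", "tubulin", "hsp90"},
}

fraction = missing_data_fraction(presence, genes)
print(f"missing data: {fraction:.1%}")  # 10 of 16 cells filled -> 37.5%
```

Adding more sparsely sampled taxa raises this fraction, which is the trade-off the taxon-rich strategy accepts in exchange for broader lineage coverage.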
43. Williams et al.
Tom A. Williams, Peter G. Foster, Tom M. W. Nye, Cymon J. Cox and T. Martin Embley
A congruent phylogenomic signal places eukaryotes within the Archaea
Proc. R. Soc. B 2012 279, first published online 24 October 2012, doi: 10.1098/rspb.2012.1795
Supplementary data: http://rspb.royalsocietypublishing.org/content/suppl/2012/10/18/rspb.2012.1795.DC1.html
References: http://rspb.royalsocietypublishing.org/content/279/1749/4870.full.html#ref-list-1
This article cites 56 articles, 35 of which can be accessed free
44. Abstract
Determining the relationships among the major groups of cellular life is important for
understanding the evolution of biological diversity, but is difficult given the enormous
time spans involved. In the textbook ‘three domains’ tree based on informational genes,
eukaryotes and Archaea share a common ancestor to the exclusion of Bacteria.
However, some phylogenetic analyses of the same data have placed eukaryotes within
the Archaea, as the nearest relatives of different archaeal lineages. We compared the
support for these competing hypotheses using sophisticated phylogenetic methods and
an improved sampling of archaeal biodiversity. We also employed both new and existing
tests of phylogenetic congruence to explore the level of uncertainty and conflict in the
data. Our analyses suggested that much of the observed incongruence is weakly
supported or associated with poorly fitting evolutionary models. All of our phylogenetic
analyses, whether on small subunit and large subunit ribosomal RNA or concatenated
protein-coding genes, recovered a monophyletic group containing eukaryotes and the
TACK archaeal superphylum comprising the Thaumarchaeota, Aigarchaeota,
Crenarchaeota and Korarchaeota. Hence, while our results provide no support for the
iconic three-domain tree of life, they are consistent with an extended eocyte hypothesis
whereby vital components of the eukaryotic nuclear lineage originated from within the
archaeal radiation.
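The difference between the two competing hypotheses comes down to which groups are monophyletic. The following is a minimal sketch, not taken from the paper, that encodes both topologies as toy nested tuples and checks whether a group of lineages forms a clade. The tree shapes, the unresolved branching inside TACK, and the function names are illustrative assumptions.

```python
# Toy rooted trees as nested tuples. Leaves are lineage names only;
# these are hand-drawn illustrations, not inferred phylogenies.

def leaves(tree):
    """Return the set of leaf names found under a node."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found

def clades(tree):
    """Yield the leaf set of every internal node, root included."""
    if isinstance(tree, str):
        return
    yield frozenset(leaves(tree))
    for child in tree:
        yield from clades(child)

def is_monophyletic(tree, group):
    """True if some internal node contains exactly the lineages in `group`."""
    return frozenset(group) in set(clades(tree))

ARCHAEA = {"Euryarchaeota", "Thaumarchaeota", "Aigarchaeota",
           "Crenarchaeota", "Korarchaeota"}
TACK = ARCHAEA - {"Euryarchaeota"}

# Assumption: branching inside TACK is left unresolved (a four-way split).
tack = ("Thaumarchaeota", "Aigarchaeota", "Crenarchaeota", "Korarchaeota")

# Three domains: Archaea form a clade that is sister to eukaryotes.
three_domains = ("Bacteria", (("Euryarchaeota", tack), "Eukaryota"))

# Eocyte-like: eukaryotes group with TACK, inside the archaeal radiation.
eocyte = ("Bacteria", ("Euryarchaeota", (tack, "Eukaryota")))

print(is_monophyletic(three_domains, ARCHAEA))           # Archaea as a clade
print(is_monophyletic(eocyte, ARCHAEA))                  # Archaea paraphyletic
print(is_monophyletic(eocyte, TACK | {"Eukaryota"}))     # TACK + eukaryotes
```

In the three-domains toy tree Archaea is a clade; in the eocyte-like toy tree it is not, while TACK plus eukaryotes is.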
63. rRNA
[Figure 1: four tree panels, (a) to (d). Clade labels: Bacteria, Euryarchaeota, Korarchaeota, Crenarchaeota, Aigarchaeota, Thaumarchaeota, Eukaryota; panel (b) shows only Bacteria, Euryarchaeota, Crenarchaeota and Eukaryota. Scale bars: 0.2. Tree images, tip labels and branch support values are not reproduced here.]
Figure 1. Phylogenies of Bacteria, Archaea and eukaryotes inferred from concatenated rRNA. (a) A Bayesian phylogeny of Bacteria,
Archaea and eukaryotes inferred under the GTR model, showing an eocyte-like topology in which eukaryotes emerge
from within the Archaea with maximal support (posterior probability (PP) = 1). (b) Removal of recently characterized archaeal
groups (the Thaumarchaeota, Aigarchaeota and Korarchaeota) converts this tree into a canonical three-domains topology,
again with maximal support (PP = 1), indicating that sampling plays an important role in the resolution of these ancient
relationships. Analyses of the full dataset using the better-fitting NDRH + NDCH (c) and CAT (d) models recover maximally
supported eocyte-like topologies; these models also recover eocyte-like topologies on the reduced dataset, without the TAK
sequences (see the electronic supplementary material, figure S1). Branch lengths are proportional to substitutions per site.
Source: T. A. Williams et al., Evolution of eukaryotes from Archaea, Proc. R. Soc. B (2012), p. 4873.
Downloaded from rspb.royalsocietypublishing.org on January 16, 2014.
64. Figure 1. Phylogenies of Bacteria, Archaea and eukaryotes inferred
from concatenated rRNA. (a) A Bayesian phylogeny of Bacteria,
Archaea and eukaryotes inferred under the GTR model, showing an
eocyte-like topology in which eukaryotes emerge from within the
Archaea with maximal support (posterior probability (PP) = 1). (b)
Removal of recently characterized archaeal groups (the
Thaumarchaeota, Aigarchaeota and Korarchaeota) converts this tree
into a canonical three-domains topology, again with maximal support
(PP = 1), indicating that sampling plays an important role in the
resolution of these ancient relationships. Analyses of the full dataset
using the better-fitting NDRH + NDCH (c) and CAT (d) models
recover maximally supported eocyte-like topologies; these models
also recover eocyte-like topologies on the reduced dataset, without
the TAK sequences (see the electronic supplementary material,
figure S1). Branch lengths are proportional to substitutions per site.
72. Concatenated Proteins
[Figure 2: two tree panels. (a) Clade labels: Bacteria, Euryarchaeota, Korarchaeota, Crenarchaeota, Aigarchaeota, Thaumarchaeota, Eukaryota; scale bar 0.2. (b) The same clade labels without Bacteria; scale bar 0.5. Tree images, tip labels and branch support values are not reproduced here.]
Figure 2. Phylogenies of Bacteria, Archaea and eukaryotes inferred from conserved protein-coding genes. (a) A phylogeny
inferred from 29 concatenated proteins conserved between Bacteria, Archaea and eukaryotes. An eocyte topology was recovered
with strong (PP = 0.99) support. In this phylogeny, the eukaryotes emerge as the sister group of Korarchaeum, nested with
the TACK superphylum. (b) A phylogeny inferred from 63 concatenated proteins shared between Archaea and eukaryotes. The
position of the root is not explicitly indicated. However, based on the result from (a) and the electronic supplementary material,
table S4, it is likely to be either within, or on the branch leading to, the Euryarchaea. If this position is correct, then the tree
shows the eukaryotes emerging as the sister group to the TACK superphylum, including Korarchaeum. These trees were
inferred using the CAT model in PHYLOBAYES. Branch lengths are proportional to substitutions per site, except the truncated
bacterial branch in (a).
73. Figure 2. Phylogenies of Bacteria, Archaea and eukaryotes inferred from
conserved protein-coding genes. (a) A phylogeny inferred from 29
concatenated proteins conserved between Bacteria, Archaea and
eukaryotes. An eocyte topology was recovered with strong (PP = 0.99)
support. In this phylogeny, the eukaryotes emerge as the sister group of
Korarchaeum, nested with the TACK superphylum. (b) A phylogeny
inferred from 63 concatenated proteins shared between Archaea and
eukaryotes. The position of the root is not explicitly indicated. However,
based on the result from (a) and the electronic supplementary material,
table S4, it is likely to be either within, or on the branch leading to, the
Euryarchaea. If this position is correct, then the tree shows the
eukaryotes emerging as the sister group to the TACK superphylum,
including Korarchaeum. These trees were inferred using the CAT model
in PHYLOBAYES. Branch lengths are proportional to substitutions per
site, except the truncated bacterial branch in (a).
76. Tree Congruence
[Figure 3: three plot panels. (a) Histogram of frequency against distance. (b) Bar chart of the number of tests passed (P > 0.05) for three tests (saturation and homoplasy; site-specific biochemical diversity; compositional heterogeneity) under the CAT20 and LG models. (c) Density against distance for the CAT20 and LG models. Plots are not reproduced here.]
Figure 3. Analysing incongruence using a novel measure of distance between gene trees. We used distributions of pairwise
geodesic distances between gene trees to compare levels of incongruence inferred under different evolutionary models. (a) The
distribution of distances under a single model (CAT20) can be used to identify obvious outliers corresponding to highly
incongruent gene trees; a single gene was responsible for the peak highlighted in red, and was removed from subsequent analyses.
(b) Overview of model-fitting tests (posterior predictive simulations) for each gene in the 64AE dataset. The height of the bars
indicates the proportion of genes that ‘passed’ a test under a particular model; we said that a test was passed when the value of
the test statistic on the real data fell within the central 95% of the distribution of values produced by posterior predictive
simulation. The results suggest that CAT20 fits better than LG, successfully accounting for the observed levels of saturation and
homoplasy in all but one of the alignments. Both models do a poor job of modelling the site-specific selective constraints in
our dataset, although again CAT20 performs better than LG (13 passes as opposed to 0). (c) Comparison of the distance
distributions inferred under the CAT20 and LG models. The trees inferred under the better-fitting CAT20 model are significantly
more congruent than those inferred under LG (mean distance: 2.68 versus 3.22, p < 0.0001). The significance of this
difference was assessed using a permutation test that took the correlations between pairwise distances into account (see §4). These
results suggest that a significant portion of the incongruence in this dataset of informational genes can be attributed to model
misspecification, rather than genuinely distinct evolutionary histories.
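The pass criterion in panel (b) can be sketched in a few lines: a gene passes a test when the statistic computed on the real alignment lies inside the central 95% of the values obtained from posterior predictive simulations. The sketch below uses made-up numbers and nearest-rank quantiles. The function names, the gene names and the quantile rule are assumptions for illustration, not the authors' implementation.

```python
def central_interval(values, coverage=0.95):
    """Empirical central interval of `values` using nearest-rank quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    lo_idx = int(tail * (n - 1))      # index of the lower cut-off
    hi_idx = n - 1 - lo_idx           # symmetric upper cut-off
    return ordered[lo_idx], ordered[hi_idx]

def passes_test(observed, simulated, coverage=0.95):
    """True if the observed statistic lies inside the central interval."""
    lo, hi = central_interval(simulated, coverage)
    return lo <= observed <= hi

def count_passed(observed_by_gene, simulated_by_gene, coverage=0.95):
    """Number of genes whose observed statistic passes the test."""
    return sum(
        passes_test(observed_by_gene[g], simulated_by_gene[g], coverage)
        for g in observed_by_gene
    )

# Made-up example: the same simulated distribution for three hypothetical genes.
simulated = list(range(1001))                       # stand-in for simulated statistics
observed = {"geneA": 500, "geneB": 990, "geneC": 30}
simulated_by_gene = {g: simulated for g in observed}

print(count_passed(observed, simulated_by_gene))    # genes inside the central 95%
```

A bar in panel (b) would then correspond to this count (or proportion) for one test under one model.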
77. Figure 3. Analysing incongruence using a novel measure of distance between gene trees. We
used distributions of pairwise geodesic distances between gene trees to compare levels of
incongruence inferred under different evolutionary models. (a) The distribution of distances
under a single model (CAT20) can be used to identify obvious outliers corresponding to
highly incongruent gene trees; a single gene was responsible for the peak highlighted in red,
and was removed from subsequent analyses. (b) Overview of model-fitting tests (posterior
predictive simulations) for each gene in the 64AE dataset. The height of the bars indicates the
proportion of genes that ‘passed’ a test under a particular model; we said that a test was
passed when the value of the test statistic on the real data fell within the central 95% of the
distribution of values produced by posterior predictive simulation. The results suggest that
CAT20 fits better than LG, successfully accounting for the observed levels of saturation and
homoplasy in all but one of the alignments. Both models do a poor job of modelling the site-
specific selective constraints in our dataset, although again CAT20 performs better than LG
(13 passes as opposed to 0). (c) Comparison of the distance distributions inferred under the
CAT20 and LG models. The trees inferred under the better-fitting CAT20 model are
significantly more congruent than those inferred under LG (mean distance: 2.68 versus 3.22,
p < 0.0001). The significance of this difference was assessed using a permutation test that took
the correlations between pairwise distances into account (see §4). These results suggest that a
significant portion of the incongruence in this dataset of informational genes can be attributed
to model misspecification, rather than genuinely distinct evolutionary histories.
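The comparison in panel (c) asks whether the mean pairwise distance between gene trees differs between two models. The authors' test is described in §4 of the paper and is not reproduced here. As a rough illustration only, the sketch below shows one simple way to respect the pairing of trees by gene: for each gene, the two model labels are swapped at random and the difference in mean within-group distance is recomputed. The distance matrix is a made-up toy, and the function names and the swapping scheme are assumptions.

```python
import random
from itertools import combinations

def mean_pairwise(dist, members):
    """Mean of dist[i][j] over all unordered pairs of `members`."""
    pairs = list(combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def paired_permutation_test(dist, n_genes, n_perm=2000, seed=0):
    """Index g is gene g under model A; index n_genes + g is gene g under model B.

    Returns (observed difference B minus A, two-sided permutation p-value).
    """
    rng = random.Random(seed)
    group_a = list(range(n_genes))
    group_b = [g + n_genes for g in range(n_genes)]
    observed = mean_pairwise(dist, group_b) - mean_pairwise(dist, group_a)
    hits = 0
    for _ in range(n_perm):
        a, b = [], []
        for g in range(n_genes):
            # Swap the two model labels of this gene with probability 1/2.
            if rng.random() < 0.5:
                a.append(g)
                b.append(g + n_genes)
            else:
                a.append(g + n_genes)
                b.append(g)
        stat = mean_pairwise(dist, b) - mean_pairwise(dist, a)
        if abs(stat) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def toy_distances(n_genes):
    """Made-up matrix: model A trees are close (1), model B trees far (3), mixed pairs 2."""
    size = 2 * n_genes
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                both_a = i < n_genes and j < n_genes
                both_b = i >= n_genes and j >= n_genes
                dist[i][j] = 1.0 if both_a else 3.0 if both_b else 2.0
    return dist

observed, p_value = paired_permutation_test(toy_distances(8), n_genes=8)
print(observed, p_value)
```

Swapping labels per gene keeps each gene's two trees in opposite groups, which is one way of not treating the pairwise distances as independent observations; whether it matches the published procedure is not claimed here.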