This document provides an overview of phylogeny and constructing phylogenetic trees. It defines phylogeny as models of evolutionary relationships among species based on sequence similarities, often illustrated as phylogenetic trees. It describes how to construct phylogenetic trees, including choosing marker genes, aligning sequences, calculating evolutionary distances, performing phylogenetic analysis, and dealing with complexities like long-branch attraction. It also discusses species definitions in microbes and operational species concepts based on metrics like 16S rRNA sequence identity.
Unit 2: Phylogeny
LECTURE LEARNING GOALS
1. Define phylogeny, and describe what a phylogenetic tree can reveal about the species it models.
2. Describe how to construct a phylogenetic tree, and the complexities that create mistakes.
3. Explain how to root a tree, and contrast how to root the tree of life.
2. Lecture Learning Goals
• Define phylogeny, and describe what a phylogenetic tree can reveal about the taxa that it models.
• Explain how phylogenetic methods allow us to make inferences about groups of organisms and their ancestors.
• Describe how to construct a phylogenetic tree, and the complexities that create mistakes.
• Contrast the different phylogenetic marker genes or concatenations of genes that are available depending on the sequencing technology.
• Define the species concept for microbes.
• Make a phylogenetic tree.
3. Phylogeny
• Phylogeny is a model of evolutionary relationships among species based on sequence similarities.
• Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships.
Woesian ToL: Pace NR, Science 1997
5. Read trees like mobiles
In a tree like this, the blue branches have lengths that are meaningful. Each length corresponds to an amount of change, which is read from the scale bar.
In a tree like this, the red distances have lengths that are NOT meaningful. They are spacers whose only purpose is to make room for labels or pictures.
8. The root of the ToL represents the last universal common ancestor
9. The root of the ToL represents the last universal common ancestor
• One cannot rely on nucleotide gene sequences alone because these would have mutated beyond recognition
• Amino acid sequences change more slowly because synonymous (silent) nucleotide mutations leave the amino acid sequence unchanged
• The tertiary folded structure of a protein is even more strongly conserved than the secondary structure
10. Sequence homology
• Homologous genes have a shared ancestry.
• Orthologs arise because of a speciation event.
• Paralogs arise because of a gene duplication event.
11. Paralogs are used to root the ToL
• Elongation factors duplicated prior to the divergence of the three domains
• One gene tree can be rooted with the other gene, because each paralog serves as an outgroup for the other
• Both trees yield the same relationships and are rooted in the same location.
12. Root the tree of life using paralogs
• The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages.
• Most phylogenetic methods place the root of the ToL in the Bacteria
• A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota
Baldauf, Palmer, & Doolittle, 1996
14. Protein-based models of evolution
• Traits here are proteins, NOT DNA sequences
• Based on 420 modern organisms, looking for structures that were common to all.
• 5 to 11 per cent were universal: conserved enough to have originated in LUCA
• This perspective gives us new information about LUCA
• LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment
• LUCA lacked the enzymes for making and reading DNA molecules
16. The root moves depending on whether you use nucleic acids or protein!
• RNA sequence-based rooting of the tree of life puts the root within the Bacteria.
• Usually derived from analyses of the sequences of ancient gene paralogs, e.g., ATPases, elongation factors
• Proteomic analyses of many proteins put the root of the tree of life within the Archaea.
• Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S rRNA, & RNase P
[Figure: two tree diagrams, each labeled Bacteria, Archaea, Eukaryotes]
17. The last universal common ancestor, aka LUCA
• 4–3.5 Ga (Ga = 10^9 years ago)
• Almost certainly a dispersed population of variable cells
• Features
• DNA, the universal code, and most genes
• Transcription and RNA polymerase
• RNAs of all kinds
• Translation and translational machinery
• Most proteins and metabolisms
• Membrane and cellular structure
[Figure: two tree diagrams, each labeled Bacteria, Archaea, Eukaryotes, with the labels "LUCA" and "also LUCA!"]
18. So you’re making a phylogenetic tree…
• Assume you have chosen which species to analyze
• (1) Decide which gene to use …
• Ribosomal RNA genes
• A concatenation of single-copy housekeeping genes
19. SSU ribosomal RNA gene is a common phylogenetic marker
+ Short, only 1500 base pairs
+ Information-dense because it is a non-coding, structural RNA
+ Essential for life so probably not horizontally transferred
- Multiple copies per genome
- Cannot resolve close relationships
Xie, Tian, Qin, Bu, 2008
21. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis
• Distance between trees based on sub-regions (V2 through V8) and trees based on all the sub-regions (VT)
• Sequence analyses including V4 are favored because of this
Yang, Wang, Qian. BMC Bioinformatics, 2016
22. So you’re making a phylogenetic tree…
• (2) Align the gene sequences
23. So you’re making a phylogenetic tree…
• (2) Align the gene sequences
• We want evolutionary distance but it cannot be directly measured, so it must be estimated
• Each vertical column in the alignment is a “trait” in calculating the distance matrix
• Distance matrix is based on observed (measurable) differences, but we assume parsimony
• There can be more than one evolutionary change at a single position (e.g., A → G → U)
• Positions can change and change back (A → G → A)
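The idea that each alignment column is a "trait" can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the lecture: the function names, the gap handling (columns with "-" are skipped), and the three toy sequences are all assumptions made for the example. It counts only observed differences, so a position that changed twice (A → G → U) or changed back (A → G → A) is undercounted, which is why a correction is applied in the next step.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    # Walk the alignment column by column; skip columns with a gap in either row.
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Observed pairwise distances for every ordered pair of taxa."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b]) for a in names for b in names}


# Hypothetical toy alignment (invented for illustration only).
toy = {
    "taxon1": "ACGUACGUAC",
    "taxon2": "ACGUACGAAC",
    "taxon3": "AC-UAGGAAC",
}
matrix = distance_matrix(toy)
for pair, value in matrix.items():
    print(pair, round(value, 3))
```

Skipping gapped columns is only one of several possible conventions; real tools let you choose how gaps are treated.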
24. So you’re making a phylogenetic tree…
• (3) Make an evolutionary distance matrix based on sequence similarity, using the Jukes-Cantor method.
25. So you’re making a phylogenetic tree…
• The Jukes-Cantor method relates sequence similarity to evolutionary distance
• If all sequences are the same, distance is zero
• Distance increases as sequence similarity decreases; between very similar sequences, a difference of one or two bases does not change the distance much
• The lowest sequence similarity is about 0.25 because any two sequences are about 25% similar by chance: there are 4 bases in the genetic code, so the chance that one base will match another is 1 in 4
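The relationship described above can be written as the standard Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites (1 minus similarity). The sketch below is a minimal illustration; the function name and the choice to return infinity at or below 25% similarity are assumptions of this example, not part of the lecture.

```python
import math


def jukes_cantor_distance(similarity: float) -> float:
    """Jukes-Cantor distance (substitutions per site) from fractional similarity."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    p = 1.0 - similarity  # observed proportion of differing sites
    if p == 0.0:
        return 0.0  # identical sequences
    if p >= 0.75:
        # At or below chance-level similarity (0.25) the distance is undefined;
        # this sketch reports it as infinite.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Distance grows slowly near 100% similarity and steeply near 25%.
for s in (1.0, 0.99, 0.9, 0.7, 0.5, 0.3):
    print(s, round(jukes_cantor_distance(s), 4))
```

The steep rise near 0.25 reflects saturation: once sequences are close to chance-level similarity, many substitutions are hidden by repeated changes at the same positions.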
26. So you’re making a phylogenetic tree…
• (4) Perform phylogenetic analysis.
• This is an example of the neighbor-joining method
[Figure: distance matrix (%)]
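The slide's distance matrix did not survive in this transcript, so the sketch below uses an invented five-taxon matrix to show how neighbor-joining decides which pair to join first. It uses the standard Q criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and joins the pair with the lowest Q. The function names and the toy distances are assumptions for illustration.

```python
from itertools import combinations


def q_matrix(names, dist):
    """Neighbor-joining Q values for every unordered pair of taxa."""
    n = len(names)
    # Total distance from each taxon to all the others.
    totals = {i: sum(dist[i][j] for j in names if j != i) for i in names}
    return {
        (i, j): (n - 2) * dist[i][j] - totals[i] - totals[j]
        for i, j in combinations(names, 2)
    }


def first_pair_to_join(names, dist):
    """Pair with the lowest Q value; neighbor-joining joins this pair first."""
    q = q_matrix(names, dist)
    return min(q, key=q.get)


# Hypothetical symmetric distance matrix (not the one from the slide).
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {name: dict(zip(names, row)) for name, row in zip(names, rows)}

q = q_matrix(names, dist)
print(first_pair_to_join(names, dist))
```

In this toy matrix d and e have the smallest raw distance, yet a and b have the lowest Q. The criterion rewards pairs that are close to each other and far from everything else, which is what distinguishes neighbor-joining from simply clustering the closest pair.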
27. So you’re making a phylogenetic tree…
• How can you determine the branch lengths?
• In other words, you need to place the node “u”, which defines a common ancestor
• You know how far apart a & b are from each other
• You know how far apart a is from something else, say c, so measure b from c and you can estimate where node u should be
• (5, optional) Create a visualization of the tree.
• Let’s look at some nice trees …
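The branch-length question above (where to place node u given the distances among a, b and c) can be sketched with the three-point formula. This is a minimal illustration that assumes the three distances are additive, meaning they fit a tree exactly; the function name and the numbers are invented for the example.

```python
def place_node_u(d_ab: float, d_ac: float, d_bc: float) -> tuple[float, float, float]:
    """Branch lengths from a, b and c to the internal node u that joins them.

    Each pairwise distance is the sum of two branches through u, for example
    d_ab = len_a + len_b, so the three equations can be solved directly.
    """
    len_a = (d_ab + d_ac - d_bc) / 2
    len_b = (d_ab + d_bc - d_ac) / 2
    len_c = (d_ac + d_bc - d_ab) / 2
    return len_a, len_b, len_c


# Hypothetical distances: a-b = 5, a-c = 9, b-c = 10.
len_a, len_b, len_c = place_node_u(5, 9, 10)
print(len_a, len_b, len_c)
```

With real data the distances are rarely perfectly additive, so tree-building programs average over all other taxa rather than relying on a single third taxon c.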
28. So you’re making a phylogenetic tree…
• (4) Perform phylogenetic analysis.
Yang & Rannala, Nat Rev Gen, 2012
29. So you’re making a phylogenetic tree…
• (4) Perform phylogenetic analysis.
Yang & Rannala, Nat Rev Gen, 2012
30. Some nice trees: Metatranscriptomic reconstruction reveals RNA viruses with the potential to shape carbon cycling in soil
Starr et al., 2019
31. A nice tree: bacterial isolates in our lab culture collection
The colored branches are unique to each taxonomic family, and the colored labels refer to strains that belong to the same genus.
The outer blue/red color indicates whether each strain is from the heated or control plots.
The stars mean we have a genome sequenced.
Choudoir, unpublished
32. So you’re
making a
phylogenetic
tree…
• There are many
(free) programs
to make trees…
https://evolution.genetics.washington.edu/phylip/software.html
Yang & Rannala, Nat Rev Gen, 2012
33. Tree Construction Complexities
1. Choice of substitution model
2. GC bias
3. Choice of tree-making algorithm
4. Long-branch attraction
5. Bootstrapping
34. Choice of substitution model
• Pairwise sequence distances are calculated assuming a Markov chain model of
nucleotide substitution. Several commonly used models are illustrated in FIG. 1.
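To show how the choice of model changes the distance, here is a minimal sketch of the Kimura two-parameter (K80) distance, a widely used model that, unlike Jukes-Cantor, treats transitions (A↔G, C↔T) and transversions separately. The formula is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the observed fractions of transitions and transversions. The function name and test sequences are assumptions for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA/RNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    # Keep only sites where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    transitions = sum(
        1 for x, y in pairs
        if x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    )
    differences = sum(1 for x, y in pairs if x != y)
    P = transitions / len(pairs)
    Q = (differences - transitions) / len(pairs)
    if P == 0 and Q == 0:
        return 0.0
    term1, term2 = 1 - 2 * P - Q, 1 - 2 * Q
    if term1 <= 0 or term2 <= 0:
        return math.inf  # too divergent for the correction to be defined
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

When transitions and transversions occur at the rates Jukes-Cantor assumes, the two models give similar distances; they diverge when transitions dominate, which is common in real data.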
Yang & Rannala, Nature Reviews Genetics, 2012
35. “GC bias”
• The more GC-rich a
region is, the higher the
recombination rates.
• That means that GC-rich
regions, or GC-rich
genomes, evolve faster
naturally.
• Including High GC gram
positives (like
Actinobacteria) in the
same tree as Low GC
gram positives (like
Firmicutes) can be
misleading.
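One simple check before combining sequences in a single tree is to compare their GC content. The sketch below is an illustration only: the function names are hypothetical, and how large a spread is "too large" is left to the reader's judgment rather than fixed by any threshold.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of one sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_range(sequences):
    """Return (lowest, highest) GC fraction across a dict of name -> sequence."""
    values = [gc_content(s) for s in sequences.values()]
    return min(values), max(values)
```

A wide gap between the lowest and highest values suggests the sequences differ in base composition, which is the situation the slide warns about.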
36. Choice of tree algorithm can affect tree structure
• Neighbor-joining starts with a radial (star) tree and joins
neighbors
• Parsimony makes many candidate trees and finds the
simplest one, usually the one that requires the fewest
mutations
• Maximum likelihood trees are based on probability
• the best & most computationally intensive
• Bayesian inference starts with a random tree structure
& random parameters, then iterates until an
“optimal” tree is found
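As an illustration of the parsimony idea (prefer the tree that needs the fewest mutations), the sketch below scores a fixed tree with Fitch's algorithm, a standard way to count the minimum number of changes per alignment column. The nested-tuple tree encoding, function names and toy alignment are assumptions made for this example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one column.

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree).
    `states` maps each leaf name to its base at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-taxon alignment and two candidate topologies.
alignment = {"a": "AAG", "b": "AAG", "c": "GAT", "d": "GAT"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
score_1 = parsimony_score(tree_1, alignment)
score_2 = parsimony_score(tree_2, alignment)
```

Here tree_1 needs 2 changes and tree_2 needs 4, so parsimony prefers tree_1. A real search would score many topologies rather than two.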
37. Long-branch attraction
• Very long branches can sometimes cluster artificially
• Usually due to bad sequence, poor alignment, or not enough tips
• The erroneous new phylogeny implies a common ancestor and
can result in different rates of evolution
39. Long-branch attraction in theory and in practice
• Panels a and b show the four-species case by Felsenstein. If the correct tree (T1 in a) has
two long branches separated by a short internal branch, parsimony (as well as model-
based methods such as likelihood and Bayesian methods under simplistic models) tends
to recover a wrong tree (T2 in b), in which the two long branches are grouped together.
• Panels c and d show a similar phenomenon in a real data set, concerning the phylogeny
of seed plants. The Gnetales is a morphologically and ecologically diverse group of
Gymnosperms including three genera (Ephedra, Gnetum and Welwitschia), but its
phylogenetic position has been controversial.
• Maximum likelihood analysis of 56 chloroplast proteins produced the GneCup tree (d), in
which the Gnetales are grouped with Cupressophyta, apparently owing to a long-branch
attraction artefact.
• However, the Gnepine tree (c), in which the Gnetales joins the Pinaceae, was inferred by
excluding the fastest-evolving 18 proteins as well as three proteins (namely, psbC, rpl2
and rps7) that had experienced many parallel substitutions between the Cryptomeria
branch and the branch ancestral to the Gnetales. The Gnepine tree (c) is also supported
by two proteins from the nuclear genome and appears to be the correct tree.
• Branch lengths and bootstrap proportions are all calculated using RAxML.
Yang & Rannala, Nature Reviews Genetics, 2012
40. Bootstrapping
• Random resampling of alignment
columns with replacement to
build replicate trees
• A measure of confidence in
each branch of your tree, given
the alignment
• Numbers are from 0-100, with
100 meaning the branch appears
in every replicate
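The resampling step can be sketched as follows. This is a minimal illustration, not a replacement for a real phylogenetics package: `clade_is_recovered` is a hypothetical callback standing in for "build a tree from this replicate and check whether the branch of interest is present", and the toy check used here only compares raw difference counts.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same length as original)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, clade_is_recovered, replicates=100, seed=0):
    """Percentage (0-100) of replicates in which the clade is recovered."""
    rng = random.Random(seed)
    hits = sum(
        bool(clade_is_recovered(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


def a_and_b_are_closest(aln):
    """Toy stand-in for tree building: is a closer to b than either is to c?"""
    def diff(x, y):
        return sum(p != q for p, q in zip(aln[x], aln[y]))
    return diff("a", "b") < min(diff("a", "c"), diff("b", "c"))


# Hypothetical three-taxon alignment.
toy = {"a": "ACGTACGTAC", "b": "ACGTACGTTC", "c": "TCGAACCTAG"}
support = bootstrap_support(toy, a_and_b_are_closest, replicates=200)
```

The returned value plays the role of the number printed on a branch: it reports how often the grouping survives when the alignment columns are resampled.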
42. What is a species?
The following terms represent similar concepts and are sometimes used
interchangeably.
• Species = related organisms that share common characteristics and are capable of
interbreeding
• Taxon (plural: taxa) = a group of one or more populations of an organism, usually with a name and rank,
and seen by taxonomists to form a unit
• Operational taxonomic unit (OTU) = usually defined from distinct 16S ribosomal
RNA sequences (or distinct phylogenetic marker genes or concatenations) grouped at a certain
cut-off level of sequence diversity.
• Lineage = temporal series of populations, organisms, cells, or genes connected by a
continuous line of descent from ancestor to descendant, determined by the techniques
of molecular systematics.
• Strain = a genetic variant, a subtype or a culture within a biological species
43. What is a species?
The species concept in microbes is hotly debated.
• “A species could be described as a monophyletic and genomically coherent
cluster of individual organisms that show a high degree of overall similarity in
many independent characteristics, and is diagnosable by a discriminative
phenotypic property.” (Ref. 9)
• “Species are considered to be an irreducible cluster of organisms diagnosably
different from other such clusters and within which there is a parental pattern of
ancestry and descent.” (Ref. 82)
• “A species is a group of individuals where the observed lateral gene transfer
within the group is much greater than the transfer between groups.” (Ref. 83)
• “Microbes ... do not form natural clusters to which the term ‘species’ can be
universally and sensibly applied.” (Ref. 84)
• “Species are (segments of) metapopulation lineages.” (Ref. 7)
Achtman & Wagner, Nat. Rev. Micro. 2008
44. Species definition should be guided by a method-free
species concept based on cohesive evolutionary forces
Achtman & Wagner, Nat. Rev. Micro. 2008
45. Species definitions
• Five types of ecotype models have been described in detail. E1 and E2 represent ecotypes; G1 and G2
represent genotypes. Colours reflect genetic ancestry. Solid lines indicate extant lineages that exist today,
whereas dotted lines indicate extinct lineages that have disappeared owing to overgrowth during episodes
of periodic selection.
Achtman & Wagner, Nat. Rev. Micro. 2008
46. Species definitions
• …
Achtman & Wagner, Nat. Rev. Micro. 2008
[Figure panels: Salmonella enterica subsp. enterica serovar Typhi; Yersinia pestis; Neisseria meningitidis serogroup A subgroup III]
47. Operational species definitions
• Strains belong to the same species when their pairwise DNA re-association values are ≥70% in DNA–DNA
hybridization experiments under standardized conditions and their
∆Tm (melting temperature) is ≤5°C
• Organisms whose 16S ribosomal RNAs (rRNAs) are ≤98.7% identical are always
members of different species
• strong differences in rRNA correlate with <70% DNA–DNA similarity
• distinct species have been occasionally described with 16S rRNAs that are
>98.7% identical
• multilocus sequence analysis (MLSA) based on multiple (typically 6–8)
protein-coding core genes
• average nucleotide identity (ANI) of all orthologous genes
• …
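The 16S rule on this slide is one-directional: identity at or below 98.7% indicates different species, but identity above it does not prove the same species. A minimal sketch of that logic, assuming two already-aligned 16S sequences (function names and return strings are hypothetical):

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of matching bases over ungapped positions of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def species_call_from_16s(identity, cutoff=98.7):
    """Apply the 16S cutoff from the slide (98.7% by default)."""
    if identity <= cutoff:
        return "different species"
    # Above the cutoff the 16S gene alone cannot decide.
    return "inconclusive: confirm with DNA-DNA hybridization, MLSA or ANI"
```

Keeping the cutoff as a parameter makes it easy to compare against other thresholds quoted in the literature.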
49. NCBI BLAST 16S ribosomal RNA genes
• The Basic Local
Alignment Search Tool (BLAST)
finds regions of local similarity
between sequences.
• Default database is ‘nr/nt’, the
non-redundant nucleotide
collection
• Update date: 2021/08/01
• Number of sequences:
72,191,653
• For phylogeny & taxonomy, we
want to use the ribosomal RNA
(rRNA) / internal transcribed
spacer (ITS) databases
• 21,856 sequences
50. What if we cannot detect the usual phylogenetic
marker genes?
• Inferring phylogeny for genomes newly discovered from
metagenomes is useful for identification (aka genotyping)
• 16S ribosomal RNA genes are the “gold standard,” but sometimes
resist assembly due to high degrees of sequence similarity across
lineages
• Any shared genomic trait is a candidate for a phylogenetic marker
• Single copy marker genes
51. Single copy marker genes
• ezTree is a program that can extract single
copy genes for phylogenetic analysis
Wu, BMC Genomics. 2018
52. Taxonomy versus phylogeny
• Taxonomy bins organisms into ranked levels of classification
• Linnaean classification is still used
• Organisms with 97% or greater 16S rRNA gene identity are treated as the same species
• Organisms with 95% or greater 16S rRNA gene identity are treated as the same genus
• THERE ARE MANY EXCEPTIONS!
• Linnaean ranks
• Kingdom
• Phylum
• Class
• Order
• Family
• Genus
• Species
53. Linnaeus and Race
• Linnaeus’ work forms one of the 18th-century roots of modern scientific racism.
• Linnaeus was the first naturalist to classify man as an animal, in Systema Naturae in 1735
• “Man” was divided into four “varieties” (he did not use the word “race”)
• based on the then known four continents of the world: Europe, America, Asia and Africa
• By the 10th edition, he expanded this idea to add the four “humours” or temperaments, as
well as a hierarchy of the “varieties”
https://www.linnean.org/learning/who-was-linnaeus/linnaeus-and-race
55. Genome Taxonomy Database (GTDB)
• Phylogenomic classification based on a set of conserved proteins
56. Insert Genome into Species Tree
• Builds a species tree using a set of 49 core, universal genes defined by COG
(Clusters of Orthologous Groups) gene families
• COG domains used in the estimate of relatedness are listed on the
website. For example:
• GTPase, tRNA synthetases, Ribosomal proteins, and other proteins involved in
Translation, ribosomal structure and biogenesis
• Nucleotide transport and metabolism
• 3-phosphoglycerate kinase [Carbohydrate transport and metabolism]
57. Lecture Learning Goals
• Define phylogeny, and describe what a phylogenetic tree can reveal
about the taxa that it models.
• Explain how phylogenetic methods can allow us to make inferences
about groups of organisms and ancestors
• Describe how to construct a phylogenetic tree, and the complexities
that create mistakes.
• Contrast the different phylogenetic marker genes or concatenations
of genes that are available depending on the sequencing technology.
• Define the species concept for microbes.
• Make a phylogenetic tree.