PROFESSOR JAYASHANKAR TELANGANA STATE
AGRICULTURAL UNIVERSITY
College of Agriculture, Rajendranagar, Hyderabad- 500030
Presented by,
Ajay Kumar Chandra
RAM/14-97
M.Sc. (Ag) Mol. Biology & Biotechnology
“The time will come, I believe, though I shall not live to see it, when
we shall have fairly true genealogical trees of each great kingdom of
Nature” - Charles Darwin
What is Molecular Phylogenetics?
Phylogenetics is the study of evolutionary relationships.
Example - relationship among species:
[Figure: phylogenetic tree of crocodiles, birds, lizards, snakes, rodents, primates and marsupials]
• Systematics is an analytical approach to understanding the diversity
and relationships of organisms, both present-day and extinct.
• Systematists use morphological, biochemical, and molecular
comparisons to infer evolutionary relationships.
A Brief History of Molecular Phylogenetics
1900s
Immunochemical studies: cross-reactions stronger for closely related
organisms
Nuttall (1902) - apes are closest relatives to humans
1960s - 1970s
Protein sequencing methods, electrophoresis and DNA hybridization,
followed in the 1980s by PCR, contributed to a boom in molecular phylogeny
late 1970s to present
Discoveries using molecular phylogeny:
- Endosymbiosis - Margulis, 1978
- Divergence of phyla and kingdoms - Woese, 1987
- Many Tree of Life projects completed or underway.
Endosymbiosis: Origin of the Mitochondrion
and Chloroplast
Mitochondria and chloroplasts are derived from the α-purple bacteria
and the cyanobacteria respectively, via separate endosymbiotic events.
[Figure: rooted tree with branches labelled Eukaryotes, Archaea, Mitochondria, α-Purple Bacteria, Other bacteria, Cyanobacteria and Chloroplasts]
Universal Tree of Life
• Using rRNA sequences: Woese, 1987
• Makes it possible to study the relationships of uncultivated organisms,
e.g. those obtained from a hot spring in Yellowstone National Park.
Phylogeny and classification
• Phylogenetic (cladistic) classification reflects evolutionary history.
• It is the only objective form of classification: organisms share a true
evolutionary history regardless of our arbitrary decisions of how to
classify them.
[Figure: a phylogeny shown next to the corresponding classification, with nested ranks Class, Order, Family and Genus]
Phylogenetic concepts
- Relationships are illustrated by a Phylogenetic tree / dendrogram
- The branching pattern is called the tree’s topology
- Trees can be represented in several forms:
[Figure: the same tree drawn as a slanted cladogram and as a rectangular cladogram]
Tree Terminology
[Figure: example tree with taxa A-F, labelled with: root, branches, internal nodes, terminal nodes (operational taxonomic units, OTUs / taxa), sister taxa and polytomy]
Phylogenetic trees
Phylogenetic trees: (A) Rooted; (B) Unrooted
These trees show five different evolutionary relationships among the
taxa!
[Figure: five rooted trees for taxa A-D; tip order as drawn]
Rooted tree 1: B, A, C, D
Rooted tree 2: A, B, C, D
Rooted tree 3: A, B, C, D
Rooted tree 4: C, D, A, B
Rooted tree 5: D, C, A, B
Rooting and Tree Interpretation
[Figure: tree of bacteria, archaea (archaebacteria), oak, fruit fly, chicken and human, drawn in several layouts; on the rooted tree the gains "+ Cell nuclei" and "+ Bones" are marked on branches]
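How Many Trees?
For N sequences:
Pairwise distances: $\frac{N(N-1)}{2}$
Unrooted trees: $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$, each with $2N-3$ branches
Rooted trees: $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$, each with $2N-2$ branches

            Pairwise    Unrooted trees            Rooted trees
Sequences   distances   Trees         Branches    Trees         Branches
(N)                                   /tree                     /tree
3           3           1             3           3             4
4           6           3             5           15            6
5           10          15            7           105           8
6           15          105           9           945           10
10          45          2,027,025     17          34,459,425    18
30          435         8.69 × 10^36  57          4.95 × 10^38  58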
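The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not code from the lecture; the function names are chosen here for illustration. It uses the standard formulas for bifurcating trees and exact integer arithmetic.

```python
from math import factorial


def pairwise_distances(n):
    """Number of pairwise distances among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def unrooted_trees(n):
    """Number of unrooted bifurcating trees: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of rooted bifurcating trees: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n), unrooted_trees(n), rooted_trees(n))
```

Each unrooted tree has 2N - 3 branches, and a root can be placed on any of them, which is why the rooted count equals the unrooted count multiplied by 2N - 3.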
Scaled vs. Un-scaled trees
Scaled trees: Branch lengths are proportional to the number of
nucleotide/amino acid changes that occurred on that branch.
Unscaled trees: Branch lengths are not proportional to the number of
nucleotide/amino acid changes (usually used to illustrate evolutionary
relationships only).
Phylogenetic Trees and Timing
• Phylogenetic trees are constructed from shared characteristics.
• The branching of a phylogenetic tree represents the timing of
divergences.
• The length of a branch in a cladogram reflects the number of
genetic changes.
[Figure: tree drawn against a time axis in millions of years ago: Neoproterozoic, 542, Paleozoic, 251, Mesozoic, 65.5, Cenozoic]
An organism’s evolutionary history is documented
in its genome
• Comparing nucleic acids to infer relatedness is a valuable tool for
tracing organisms’ evolutionary history.
• Gene duplication increases the number of genes in the genome,
providing more opportunities for evolutionary changes.
[Figure: sequences diverging from a common ancestor over time]
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT, TGGACTT
-1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
Four stages of Phylogenetic analysis
Molecular phylogenetic analysis may be described
in four stages:
1. Selection of sequences for analysis
2. Multiple sequence alignment
3. Tree building
4. Tree evaluation
• Sequence alignments can provide clues to evolutionary change by
examining the effect of mutations occurring over time in species with
a common ancestor.
Examples: sequences for analysis
• DNA sequence changes in the Cytochrome-c gene reflect
evolutionary distance.
• Alignment of a portion of the casein gene
• Calculating the substitution rate (r) for two sequences that have
changed over time.
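The extracted slide names the substitution rate r but does not show a formula. A commonly used definition, assumed here, is r = K / (2T), where K is the number of substitutions per site separating the two sequences and T is the time since they diverged. The factor 2 accounts for changes accumulating independently along both lineages. The function name and the numbers below are hypothetical and serve only as an illustration.

```python
def substitution_rate(k, t):
    """Substitutions per site per year, assuming r = K / (2T).

    k: substitutions per site between the two sequences
    t: time since divergence, in years
    """
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2 * t)


# Hypothetical example: 0.2 substitutions per site, divergence 50 million years ago.
r = substitution_rate(k=0.2, t=50_000_000)
print(f"r = {r:.2e} substitutions per site per year")
```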
Molecular Phylogenetic tree building methods
• Mathematical and/or statistical methods for inferring the divergence order of
taxa, as well as the lengths of the branches that connect them.
• There are many phylogenetic methods available today. Most can be classified as
follows:

              COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion               Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD      -
Distances     MINIMUM EVOLUTION, LEAST SQUARES   UPGMA, NEIGHBOR-JOINING
Based on lectures by C-B Stewart, and by
Tal Pupko
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA or protein
sequences, directly during tree inference.
Taxa Characters
Species A ATGGCTATTCTTATAGTACG
Species B ATCGCTAGTCTTATATTACA
Species C TTCACTAGACCTGTGGTCCA
Species D TTGACCAGACCTGTGGTCCG
Species E TTGACCAGTTCTCTAGTTCG
Distance-based methods: Transform the sequence data into pairwise distances
(dissimilarities), and then use the matrix during tree building.
           A     B     C     D     E
Species A  ----  0.20  0.50  0.45  0.40
Species B  0.23  ----  0.40  0.55  0.50
Species C  0.87  0.59  ----  0.15  0.40
Species D  0.73  1.12  0.17  ----  0.25
Species E  0.59  0.89  0.61  0.31  ----
Above the diagonal (Example 1): uncorrected “p” distance
(= observed proportion of sites that differ).
Below the diagonal (Example 2): Kimura 2-parameter distance
(estimate of the true number of substitutions between taxa).
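Both kinds of distance can be computed directly from the aligned sequences of Species A-E shown above. The sketch below is illustrative only and its function names are assumptions. The p distance is the fraction of differing sites. The Kimura 2-parameter distance uses the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions.

```python
from math import log

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_distance(s1, s2):
    """Uncorrected p distance: proportion of aligned sites that differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(s1)
    q = transversions / len(s1)
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)


seqs = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}

for x, y in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(x, y, round(p_distance(seqs[x], seqs[y]), 2),
          round(k2p_distance(seqs[x], seqs[y]), 2))
```

The corrected distance is always at least as large as the p distance, because it estimates substitutions that are hidden when the same site changes more than once.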
Character table
CHARACTERS: Vertebral column (backbone), Hinged jaws, Four walking legs, Amniotic (shelled) egg, Hair
TAXA: Lancelet (outgroup), Lamprey, Tuna, Salamander, Turtle, Leopard
Cladogram
[Figure: cladogram of the taxa, top to bottom: Turtle, Leopard, Salamander, Tuna, Lamprey, Lancelet (outgroup); characters marked on the branches, top to bottom: Hair, Amniotic egg, Four walking legs, Hinged jaws, Vertebral column]
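How a character table maps onto a cladogram can be sketched in code. The 0/1 entries of the table did not survive in the extracted slide text, so the values below are an assumption filled in from general biology (1 = derived character present). Each character defines a group of taxa that share it; when those groups nest inside one another without conflict, they give the branching order of the cladogram.

```python
# ASSUMPTION: character states are filled in from general biology,
# not copied from the slide. 1 = derived character present, 0 = absent.
characters = ["Vertebral column", "Hinged jaws", "Four walking legs",
              "Amniotic egg", "Hair"]
table = {
    "Lancelet":   [0, 0, 0, 0, 0],  # outgroup
    "Lamprey":    [1, 0, 0, 0, 0],
    "Tuna":       [1, 1, 0, 0, 0],
    "Salamander": [1, 1, 1, 0, 0],
    "Turtle":     [1, 1, 1, 1, 0],
    "Leopard":    [1, 1, 1, 1, 1],
}


def clades_from_characters(characters, table):
    """Return (character, taxa sharing it) pairs, largest group first."""
    clades = []
    for i, name in enumerate(characters):
        members = frozenset(t for t, states in table.items() if states[i] == 1)
        clades.append((name, members))
    clades.sort(key=lambda c: len(c[1]), reverse=True)
    return clades


def is_nested(clades):
    """True if every group contains the next smaller one (no conflict)."""
    return all(small <= big for (_, big), (_, small) in zip(clades, clades[1:]))


clades = clades_from_characters(characters, table)
for name, members in clades:
    print(f"{name}: {sorted(members)}")
print("nested:", is_nested(clades))
```

Real data sets usually contain conflicting characters, which is why optimality criteria such as parsimony are needed; this sketch only covers the conflict-free case.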
Reality: Not all sites are free to change, and the same sites
can change multiple times.
Distance Matrix Methods
(matrix calculation)
Step 1: compute the pairwise distances of all the proteins.
Get ready to put the numbers 1-5 at the bottom of your
new tree.
[Figure: proteins 1-5 as the tips of the tree to be built]
UPGMA - Unweighted Pair Group Method using
Arithmetic mean.
Distance Matrix Methods
(tree construction)
Tree-building methods: UPGMA
Step 2: Find the two proteins with the smallest pairwise
distance. Cluster them.
[Figure: proteins 1-5; proteins 1 and 2 are joined at new node 6]
Tree-building methods: UPGMA
Step 3: Do it again. Find the next two proteins with the
smallest pairwise distance. Cluster them.
[Figure: proteins 1 and 2 joined at node 6; proteins 4 and 5 are joined at new node 7]
Tree-building methods: UPGMA
Step 4: Keep going. Cluster.
[Figure: nodes 6 (1, 2) and 7 (4, 5) as before; protein 3 is joined to the tree at new node 8]
Tree-building methods: UPGMA
Step 5: Last cluster! This is your tree.
[Figure: complete tree; node 6 joins 1 and 2, node 7 joins 4 and 5, node 8 attaches 3, and node 9 is the final cluster]
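The clustering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the lecture's own code. The distance between two clusters is taken as the arithmetic mean of all pairwise distances between their members, and each new node is placed at half that distance. The input is the uncorrected p distance matrix for Species A-E shown earlier in the deck, so the merge order differs from the numbered proteins in the walk-through. The names `upgma` and `cluster_distance` are chosen here.

```python
from itertools import combinations


def cluster_distance(c1, c2, dist):
    """Arithmetic mean of all pairwise distances between two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(names, dist):
    """Return the UPGMA merges as (members, node height), in merge order."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], dist))
        height = cluster_distance(c1, c2, dist) / 2
        # Replace the pair with their union.
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((c1 | c2, height))
    return merges


names = ["A", "B", "C", "D", "E"]
p = {
    ("A", "B"): 0.20, ("A", "C"): 0.50, ("A", "D"): 0.45, ("A", "E"): 0.40,
    ("B", "C"): 0.40, ("B", "D"): 0.55, ("B", "E"): 0.50,
    ("C", "D"): 0.15, ("C", "E"): 0.40,
    ("D", "E"): 0.25,
}
dist = {frozenset(pair): d for pair, d in p.items()}

merges = upgma(names, dist)
for members, height in merges:
    print(sorted(members), round(height, 4))
```

UPGMA assumes a constant rate of change along all lineages; when that does not hold, methods such as neighbor-joining are usually preferred.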
List of Phylogenetics software
Name | Description | Methods
EzEditor | Allows manipulation of both DNA and protein sequence alignments for phylogenetic analysis | Neighbour joining
BAli-Phy | Simultaneous Bayesian inference of alignment and phylogeny | Bayesian inference, alignment as well as tree search
ClustalW | Progressive multiple sequence alignment | Distance matrix/nearest neighbour
BayesTraits | Analyses trait evolution among groups of species | Trait analysis
BioNumerics | Storage and analysis of all types of biological data | Neighbour-joining, maximum parsimony, UPGMA, maximum likelihood, distance matrix methods
fastDNAml | Optimized maximum likelihood (nucleotides only) | Maximum likelihood
Geneious | Provides genome and proteome research tools | Neighbour-joining, UPGMA, MrBayes plugin, PHYML plugin, RAxML plugin, FastTree plugin
Applications of Phylogenetics
• Forensics:
Did a patient’s HIV infection result from an invasive dental
procedure performed by an HIV+ dentist?
• Conservation:
How much gene flow is there among local populations of
island foxes off the coast of California?
• Medicine:
What are the evolutionary relationships among the various
prion-related diseases?
To be continued…
Why is phylogeny important?
Understanding and classifying the
diversity of life on Earth.
Testing evolutionary hypotheses:
- Trait evolution
- Coevolution
- Mode and pattern of speciation
- Correlated trait evolution
- Biogeography
- Geographic origins
- Age of different taxa
- Nature of molecular evolution
- Disease epidemiology
…And many more applications!
Molecular phylogenetics
