3. Table of contents
■ Phylogenetics
■ Evolution of Bioinformatics tools
■ Phylogenetic tree
■ Terms used to describe a tree
■ Types of phylogenetic trees
■ Methods for constructing phylogenetic trees
■ Phylogenetic tree validation
■ Multiple Sequence Alignment
■ Practical section
5. Description
■ In biology, phylogenetics (from Greek phylé/phylon = tribe, clan, race, and genetikós = origin, source, birth) is a part of systematics that addresses the inference of the evolutionary history and relationships among or within groups of organisms (e.g. species, or more inclusive taxa).
Figure represents the etymology of phylogenetics: phylon (tribe, clan, race) + genetikós (origin, source, birth), i.e. the inference of evolutionary history and relationships.
6. Taxonomy
■ Taxonomy is the identification, naming and classification of organisms. Classifications are now usually based on phylogenetic data, and many systematists contend that only monophyletic taxa should be recognized as named groups.
Figure represents the taxonomy of one example organism, Homo sapiens.
7. Continued…
■ Brief History:-
The term "phylogeny" derives from the
German Phylogenie, introduced by Haeckel in
1866, and the Darwinian approach to
classification became known as the "phyletic"
approach.
1858: Heinrich Georg Bronn
Paleontologist Heinrich Georg Bronn (1800–1862) published a hypothetical tree illustrating the paleontological "arrival" of new, similar species following the extinction of an older species.
Branching tree diagram from
Heinrich Georg Bronn's work (1858)
Figure represents the phylogenetic tree proposed by Haeckel (1866).
8. Evolution
■ Evolution is the change in heritable traits of biological organisms over generations due to natural selection, mutation, gene flow, and genetic drift; it is also known as descent with modification. Over time these evolutionary processes lead to the formation of new species (speciation), changes within lineages (anagenesis), and loss of species (extinction).
Figures A and B show the relationships between various groups of organisms and the concept of evolution.
10. Evolution of Bioinformatics tools
■ Bioinformatics experts have developed a large collection of tools to make sense of
the rapidly growing data related to molecular biology.
Figure represents the growth of computer data storage alongside the evolution of bioinformatics tools.
12. Introduction
■ Computational phylogenetics is the application of
computational algorithms, methods, and
programs to phylogenetic analyses. The goal is to
assemble a phylogenetic tree representing a
hypothesis about the evolutionary ancestry of a
set of genes, species, or other taxa.
Figure represents the root of the tree of life.
13. Computational phylogenetics
■ Example:-
For example, these techniques have been used to explore the family tree of the α-hemoglobin gene and the relationships between specific genes.
Figure represents the gene tree for α-hemoglobin compared with the species tree. Both trees match because the gene evolved from common ancestors.
15. Molecular data: DNA sequences of genes and amino acid sequences of proteins
■ Phylogenetic analysis using molecular data, such as DNA sequences of genes and amino acid sequences of proteins, is very common not only in evolutionary biology but also across the wider field of molecular biology.
17. List of Terms
– Clade
An ancestor (an organism, population, or species) and all of its
descendants.
1. Sister clade
One member of a pair of clades originating when a single lineage
splits into two. Sister clades thus share an exclusive common
ancestry and are mutually most closely related to one another in
terms of common ancestry.
– Ancestor
An entity from which another entity is descended.
– Node
A point or vertex on a tree (in the sense of graph theory). On a phylogenetic tree, a node commonly represents (1) the split of one lineage into two or more lineages (internal node), the extinction of a lineage (terminal node), or the lineage at a specified time, often the present (terminal node); or (2) a taxon, whether ancestral (internal node) or descendant (internal or terminal node).
– Root
The root of the tree represents the ancestral lineage, and the tips of the branches represent the descendants of that ancestor.
– Leaf
Each leaf on a phylogenetic tree represents a taxon.
Figure represents terms used to describe rooted and unrooted trees.
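These terms map directly onto a simple data structure. The sketch below is a minimal illustration, assuming a rooted tree stored as a Python dict from each internal node to its children; the internal node names (root, n1 to n5) are hypothetical labels, and the topology is the one obtained later in the UPGMA worked example for taxa A to G. A clade is a node plus all its descendants, the leaves are the taxa, and a sister is the other child of the same parent.

```python
# Hypothetical rooted tree: each internal node maps to its list of children.
# Leaves (taxa) are the nodes that never appear as keys.
TREE = {
    "root": ["n1", "E"],
    "n1": ["n2", "C"],
    "n2": ["n3", "n4"],
    "n3": ["A", "D"],
    "n4": ["n5", "G"],
    "n5": ["B", "F"],
}


def is_leaf(node):
    """A leaf (terminal node) has no children."""
    return node not in TREE


def clade(node):
    """The clade rooted at `node`: the ancestor itself plus all its descendants."""
    members = [node]
    for child in TREE.get(node, []):
        members.extend(clade(child))
    return members


def leaves(node):
    """Taxa (leaves) contained in the clade rooted at `node`."""
    return [n for n in clade(node) if is_leaf(n)]


def sister(node):
    """Sister clade of `node`, assuming a bifurcating tree; None for the root."""
    for kids in TREE.values():
        if node in kids:
            return next(k for k in kids if k != node)
    return None


print(leaves("n4"))  # taxa in the clade rooted at n4: B, F, G
print(sister("n5"))  # the sister of clade (B, F) is G
```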
19. Scaled and unscaled trees
Scaled branches - branch lengths differ according to the number of evolutionary changes (the evolutionary distance).
Unscaled branches - all branches in the tree are the same length.
Figure represents trees with scaled and unscaled branches.
20. Species trees and gene trees
Species trees
“Species” trees recover the genealogy of taxa, individuals of a population, etc. Species trees should contain sequences from orthologous genes only.
Gene trees
Gene trees represent the evolutionary history of the genes included in the study. Gene trees can provide evidence for gene duplication events as well as speciation events. Sequences from different homologs can be included in a gene tree; the subsequent analyses should cluster orthologs, thus demonstrating the evolutionary history of the orthologs.
Figure represents orthologs, paralogs and homologs.
21. Rooted and Unrooted Trees
Rooted phylogenetic tree: In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants.
Unrooted phylogenetic tree: Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry.
Figure represents rooted versus unrooted trees.
22. Bifurcating and multifurcating trees
■ Bifurcating tree: A rooted bifurcating tree has exactly two descendants arising from each interior node (that is, it forms a binary tree), and an unrooted bifurcating tree takes the form of an unrooted binary tree, a free tree with exactly three neighbors at each internal node.
■ Multifurcating tree: In contrast, a rooted multifurcating tree may have more than two children at some nodes, and an unrooted multifurcating tree may have more than three neighbors at some nodes.
Figure represents bifurcating versus multifurcating trees.
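These definitions can be checked mechanically by counting children (rooted trees) or neighbours (unrooted trees). A minimal sketch, assuming rooted trees are stored as a dict from each internal node to its children and unrooted trees as a list of edges; both representations and the example node names are illustrative choices, not a standard format.

```python
from collections import Counter


def is_bifurcating_rooted(tree):
    """Rooted tree (dict: internal node -> children): every internal node has two children."""
    return all(len(children) == 2 for children in tree.values())


def is_bifurcating_unrooted(edges):
    """Unrooted tree (list of (u, v) edges): every internal node has exactly three neighbours.

    Nodes of degree 1 are leaves; any internal node of degree 2 or 4+ makes the
    tree non-binary.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d == 3 for d in degree.values() if d > 1)


binary_rooted = {"root": ["x", "C"], "x": ["A", "B"]}
polytomy_rooted = {"root": ["A", "B", "C"]}  # multifurcating: three children at the root
quartet = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
star = [("A", "c"), ("B", "c"), ("C", "c"), ("D", "c")]  # four neighbours at c

print(is_bifurcating_rooted(binary_rooted), is_bifurcating_rooted(polytomy_rooted))
print(is_bifurcating_unrooted(quartet), is_bifurcating_unrooted(star))
```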
23. Labeled versus unlabeled
■ Both rooted and unrooted trees can be either
labeled or unlabeled. A labeled tree has
specific values assigned to its leaves, while an
unlabeled tree, sometimes called a tree
shape, defines a topology only. Some
sequence-based trees built from a small
genomic locus, such as Phylotree, feature
internal nodes labeled with inferred ancestral
haplotypes.
25. Special types
Dendrogram
A dendrogram is a general name for a tree, whether phylogenetic or not, and hence also for the diagrammatic representation of a phylogenetic tree.
Cladogram
A cladogram only represents a branching pattern; i.e., its branch lengths do not represent time or relative amount of character change, and its internal nodes do not represent ancestors.
Figure represents a cladogram.
Phylogram
A phylogram is a phylogenetic tree that has branch lengths proportional to the amount of character change.
Figure represents a cladogram (I), a phylogram (II) and a dendrogram (III).
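The difference between a phylogram and a cladogram can be seen in Newick notation, a common text format for trees (using it here is an assumption; the slides do not name a format). A phylogram string carries a branch length after each colon; dropping them leaves only the branching pattern, i.e. the cladogram. The sketch assumes plain numeric branch lengths with no quoted labels or comments; the example is the UPGMA subtree for taxa A, B, D, F and G from the worked example later in this section.

```python
import re

# Branch lengths in Newick follow a colon, e.g. "A:4.0"; plain numbers assumed.
BRANCH_LENGTH = re.compile(r":([0-9.eE+-]+)")


def to_cladogram(newick):
    """Strip branch lengths, keeping only the branching pattern."""
    return BRANCH_LENGTH.sub("", newick)


def total_length(newick):
    """Sum of all branch lengths in a phylogram."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


phylogram = "((A:4.0,D:4.0):4.25,((B:0.5,F:0.5):5.75,G:6.25):2.0);"
print(to_cladogram(phylogram))  # ((A,D),((B,F),G));
print(total_length(phylogram))
```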
30. Distance-matrix methods
■ Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic distance" between the sequences being classified, and therefore they require an MSA (multiple sequence alignment) as input.
■ Distance-matrix methods may produce either rooted or unrooted trees, depending on the algorithm used to calculate them.
■ Distance-matrix methods:
1. UPGMA
2. Transformed distance method
3. Neighbors relation method
4. Neighbor-joining method
5. Fitch-Margoliash method
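Because these methods start from a table of pairwise distances, the first step after alignment is to turn aligned sequences into numbers. The sketch below uses the uncorrected p-distance (the proportion of aligned sites that differ), which is only one of many possible distance measures; real analyses usually apply a substitution-model correction. Skipping any column where either sequence has a gap is an assumption made for simplicity, and the toy alignment is hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict {name: aligned_sequence}."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACG-ACTA"}  # hypothetical
for pair, d in distance_matrix(toy_alignment).items():
    print(pair, round(d, 3))
```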
32. UPGMA worked example: distance matrix
    A      B      C      D      E      F      G
A
B   19.00
C   27.00  31.00
D   8.00   18.00  26.00
E   33.00  36.00  41.00  31.00
F   18.00  1.00   32.00  17.00  35.00
G   13.00  13.00  29.00  14.00  28.00  12.00
UPGMA:
Unweighted Pair-Group Method with Arithmetic mean
Unweighted – all pairwise distances contribute equally.
Pair-Group – groups are combined in pairs (dichotomies only).
Arithmetic mean – pairwise distances to each group (clade) are mean
distances to all members of that group.
(Ultrametric – assumes molecular clock)
Dr Richard Edwards ● University of Southampton ● r.edwards@southampton.ac.uk
33. Joining the closest pair
B and F are joined: tip-to-tip path 0.5 + 0.5 = 1.0, so the depth of the new node is 1.0 / 2 = 0.5.
1. Find the shortest pairwise distance.
2. Join two sequences/groups with shortest distance.
3. Depth of new branch = ½ shortest distance.
4. Tip-to-tip path length = shortest distance.
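Steps 1 to 4 amount to a minimum search over the lower triangle of the matrix. A minimal sketch using the A to G distances from the slide; storing each pair as a frozenset key is just one convenient representation, not part of the method.

```python
# Lower triangle of the worked-example matrix: ROWS[i][j] = d(LABELS[i], LABELS[j]), j < i.
LABELS = "ABCDEFG"
ROWS = [
    [],
    [19],
    [27, 31],
    [8, 18, 26],
    [33, 36, 41, 31],
    [18, 1, 32, 17, 35],
    [13, 13, 29, 14, 28, 12],
]


def pairwise(labels, rows):
    """Expand the lower triangle into {frozenset({x, y}): distance}."""
    return {
        frozenset((labels[i], labels[j])): float(rows[i][j])
        for i in range(len(labels))
        for j in range(i)
    }


def closest_pair(dist):
    """Steps 1-4: the pair to join, the shortest distance, and the depth of the new node."""
    pair, d = min(dist.items(), key=lambda item: item[1])
    return sorted(pair), d, d / 2  # tip-to-tip path = d, branch depth = d / 2


print(closest_pair(pairwise(LABELS, ROWS)))  # (['B', 'F'], 1.0, 0.5)
```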
34. Updated matrix after joining B and F
    A      BF     C      D      E
BF  18.50
C   27.00  31.50
D   8.00   17.50  26.00
E   33.00  35.50  41.00  31.00
G   13.00  12.50  29.00  14.00  28.00
d(BF, A) = (19 + 18) / 2 = 18.5
d(BF, C) = (31 + 32) / 2 = 31.5
d(BF, D) = (18 + 17) / 2 = 17.5
d(BF, E) = (36 + 35) / 2 = 35.5
d(BF, G) = (13 + 12) / 2 = 12.5
5. Calculate the mean pairwise distances to the other sequences in the new matrix.
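Step 5 can be written as the mean of all leaf-to-leaf distances between two groups, which is what the arithmetic above does for BF. A minimal sketch, again using the A to G matrix from the slides; passing groups as strings of labels is purely a convenience.

```python
# Lower triangle of the worked-example matrix: ROWS[i][j] = d(LABELS[i], LABELS[j]), j < i.
LABELS = "ABCDEFG"
ROWS = [
    [],
    [19],
    [27, 31],
    [8, 18, 26],
    [33, 36, 41, 31],
    [18, 1, 32, 17, 35],
    [13, 13, 29, 14, 28, 12],
]
INDEX = {label: i for i, label in enumerate(LABELS)}


def d(x, y):
    """Distance between two single sequences, read from the lower triangle."""
    i, j = INDEX[x], INDEX[y]
    return ROWS[max(i, j)][min(i, j)]


def group_distance(group1, group2):
    """UPGMA distance between groups: mean of all pairwise leaf distances."""
    pairs = [d(x, y) for x in group1 for y in group2]
    return sum(pairs) / len(pairs)


for other in "ACDEG":
    print(f"d(BF, {other}) = {group_distance('BF', other):.2f}")
```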
35. Joining A and D
The shortest distance in the updated matrix is d(A, D) = 8.0, so A and D join at depth 8.0 / 2 = 4.0 (tip-to-tip path 4.0 + 4.0 = 8.0).
6. Repeat the cycle with the new shortest distance.
37. Updated matrix after joining A and D
    AD     BF     C      E
BF  18.00
C   26.50  31.50
E   32.00  35.50  41.00
G   13.50  12.50  29.00  28.00
d(AD, BF) = (19 + 18 + 18 + 17) / 4 = 18.0
d(AD, C) = (27 + 26) / 2 = 26.5
d(AD, E) = (33 + 31) / 2 = 32.0
d(AD, G) = (13 + 14) / 2 = 13.5
38. Joining BF and G
The shortest distance is d(BF, G) = 12.5, so BF and G join at depth 12.5 / 2 = 6.25.
Branch to G = 6.25; branch to BF = 6.25 - 0.5 = 5.75.
Tip-to-tip path B to G: 0.5 + 5.75 + 6.25 = 12.5.
39. Updated matrix after joining BF and G
     AD     BFG    C
BFG  16.50
C    26.50  30.67
E    32.00  33.00  41.00
d(AD, BFG) = (19 + 18 + 13 + 18 + 17 + 14) / 6 = 16.5
New distances are mean values for all possible
pairwise distances between groups.
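Taking the mean over all pairs is equivalent to updating the reduced matrix with a size-weighted average, which avoids returning to the original matrix at each cycle. In the sketch below, i and j are the two groups being joined (here BF with n_i = 2 and G with n_j = 1), k is any other group, and the inputs are values from the matrices above; the results agree with the all-pairs means on the slides.

```latex
\begin{aligned}
d(ij,\,k) &= \frac{n_i\,d(i,k) + n_j\,d(j,k)}{n_i + n_j} \\[4pt]
d(\mathrm{BFG},\,\mathrm{AD}) &= \frac{2 \times 18.0 + 1 \times 13.5}{2 + 1} = \frac{49.5}{3} = 16.5 \\[4pt]
d(\mathrm{BFG},\,\mathrm{C}) &= \frac{2 \times 31.5 + 1 \times 29.0}{2 + 1} = \frac{92.0}{3} \approx 30.67 \\[4pt]
d(\mathrm{BFG},\,\mathrm{E}) &= \frac{2 \times 35.5 + 1 \times 28.0}{2 + 1} = \frac{99.0}{3} = 33.0
\end{aligned}
```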
40. Remaining distances to BFG
d(BFG, C) = (31 + 32 + 29) / 3 = 30.67
d(BFG, E) = (36 + 35 + 28) / 3 = 33.0
41. Joining AD and BFG
The shortest distance is d(AD, BFG) = 16.5, so the two clades join at depth 16.5 / 2 = 8.25.
Branch to AD = 8.25 - 4.0 = 4.25; branch to BFG = 8.25 - 6.25 = 2.0.
Tip-to-tip paths: A to G = 4.0 + 4.25 + 2.0 + 6.25 = 16.5; A to B = 4.0 + 4.25 + 2.0 + 5.75 + 0.5 = 16.5.
42. Updated matrix after joining AD and BFG
       ADBFG  C
C      29.00
E      32.60  41.00
d(ADBFG, C) = (27 + 31 + 26 + 32 + 29) / 5 = 29.00
d(ADBFG, E) = (33 + 36 + 31 + 35 + 28) / 5 = 32.60
43. Joining ADBFG and C
The shortest distance is d(ADBFG, C) = 29.0, so the clades join at depth 29.0 / 2 = 14.5.
Branch to C = 14.5; branch to ADBFG = 14.5 - 8.25 = 6.25.
44. Updated matrix after joining C
       ADBFGC
E      34.00
d(ADBFGC, E) = (33 + 36 + 41 + 31 + 35 + 28) / 6 = 34.00
45. Final join with E
The last distance is d(ADBFGC, E) = 34.0, so E joins at depth 34.0 / 2 = 17.0, which is the root.
Branch to E = 17.0; branch to ADBFGC = 17.0 - 14.5 = 2.5.
UPGMA assumes a molecular clock. The tree
is rooted with the final joining of clades. All
tip-to-tip distances via the root will have the
same total distance, equal to the final mean
distance.
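The molecular-clock property can be checked on the finished tree: every root-to-tip distance should equal the depth of the root (17.0 here), so any two tips are 2 × 17.0 = 34.0 apart via the root. A minimal sketch, assuming the tree is stored as child → (parent, branch length) with the branch lengths from the slides; the internal node names n1 to n5 and root are hypothetical labels.

```python
# Final UPGMA tree from the worked example: child -> (parent, branch length).
PARENT = {
    "B": ("n1", 0.5), "F": ("n1", 0.5),        # (B, F) joined at depth 0.5
    "A": ("n2", 4.0), "D": ("n2", 4.0),        # (A, D) joined at depth 4.0
    "n1": ("n3", 5.75), "G": ("n3", 6.25),     # ((B, F), G) at depth 6.25
    "n2": ("n4", 4.25), "n3": ("n4", 2.0),     # AD + BFG at depth 8.25
    "n4": ("n5", 6.25), "C": ("n5", 14.5),     # + C at depth 14.5
    "n5": ("root", 2.5), "E": ("root", 17.0),  # + E at the root, depth 17.0
}


def root_to_tip(leaf):
    """Sum branch lengths from a leaf up to the root."""
    total, node = 0.0, leaf
    while node in PARENT:
        node, length = PARENT[node]
        total += length
    return total


depths = {leaf: root_to_tip(leaf) for leaf in "ABCDEFG"}
print(depths)
print("ultrametric:", len(set(depths.values())) == 1)
```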
47. Source data
The source data for this worked example is a selection of
Cytochrome C distances from Table 3 of one of the seminal
phylogenetic papers: Fitch WM & Margoliash E (1967).
Construction of phylogenetic trees. Science 155:279-84.
http://www.ncbi.nlm.nih.gov/pubmed/5334057
Sequence labels: A = Turtle, B = Man, C = Tuna, D = Chicken, E = Moth, F = Monkey, G = Dog
          Turtle  Man     Tuna    Chicken Moth    Monkey
Turtle
Man       19
Tuna      27      31
Chicken   8       18      26
Moth      33      36      41      31
Monkey    18      1       32      17      35
Dog       13      13      29      14      28      12
48. Final UPGMA tree
Figure represents the final UPGMA tree for the cytochrome c data (node depths 0.5, 4.0, 6.25, 8.25, 14.5 and 17.0), with clades labeled Primates, Mammals, Reptilia, Amniota and Vertebrates.
The UPGMA tree based on
this Cytochrome C data
supports the known
evolutionary relationships of
these organisms.
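The whole procedure can be run end to end on the cytochrome c distances. The sketch below is a compact, from-scratch UPGMA, assuming the lower-triangular matrix from the slide and writing the result in Newick notation (an output format the slides do not specify). It uses the size-weighted update, which gives the same values as averaging all pairwise distances between groups. Ties are broken by whichever pair min() finds first, which does not affect this data set.

```python
# Cytochrome c distances (Fitch & Margoliash 1967, as used on the slides).
NAMES = ["Turtle", "Man", "Tuna", "Chicken", "Moth", "Monkey", "Dog"]
ROWS = [
    [],
    [19],
    [27, 31],
    [8, 18, 26],
    [33, 36, 41, 31],
    [18, 1, 32, 17, 35],
    [13, 13, 29, 14, 28, 12],
]


def upgma(names, rows):
    """Return (Newick string, list of join depths) for a lower-triangular matrix."""
    # Active clusters: id -> (Newick text, number of leaves, depth of its top node).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    dist = {(j, i): float(rows[i][j]) for i in range(len(names)) for j in range(i)}
    next_id, depths = len(names), []
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])  # shortest distance
        depth = d / 2                                         # depth of the new node
        text_a, n_a, h_a = clusters.pop(a)
        text_b, n_b, h_b = clusters.pop(b)
        # Branch length = new depth minus the depth of each child cluster.
        text = f"({text_a}:{depth - h_a:g},{text_b}:{depth - h_b:g})"
        # Size-weighted mean == mean of all leaf-to-leaf distances between groups.
        new = {
            (k, next_id): (n_a * dist[tuple(sorted((a, k)))]
                           + n_b * dist[tuple(sorted((b, k)))]) / (n_a + n_b)
            for k in clusters
        }
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        dist.update(new)
        clusters[next_id] = (text, n_a + n_b, depth)
        depths.append(depth)
        next_id += 1
    [(tree, _, _)] = clusters.values()
    return tree + ";", depths


newick, depths = upgma(NAMES, ROWS)
print(newick)
print(depths)
```

Reading the Newick string, Man and Monkey pair first, Dog joins them, Turtle and Chicken pair, and Tuna and then Moth join last, with the same node depths as the hand calculation.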
50. Bootstrapping:-
■ Bootstrapping is any test or metric that uses random
sampling with replacement, and falls under the broader
class of resampling methods. Bootstrapping assigns
measures of accuracy (bias, variance, confidence
intervals, prediction error, etc.) to sample
estimates. This technique allows estimation of the
sampling distribution of almost any statistic using
random sampling methods.
■ Bootstrapping and jackknifing are statistical methods for evaluating the confidence of partial hypotheses (“branch support”) contained in a phylogenetic tree, and they have become standard in molecular phylogenetic analyses.
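The standard phylogenetic bootstrap resamples alignment columns (sites) with replacement to build pseudo-replicate alignments of the original length; each replicate is re-analysed, and the support for a branch is the percentage of replicate trees that contain that clade. The sketch below covers only the resampling and the final percentage; building a tree for each replicate is left out, and the toy alignment and replicate clade sets are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: alignment columns sampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]  # same length as original
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def support(clade, replicate_clades):
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


toy_alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AGGTAC"}  # hypothetical
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(3)]
print(replicates[0])

# Hypothetical clade sets from trees built on four replicates.
trees = [{frozenset("BF")}, {frozenset("BF"), frozenset("AD")}, {frozenset("AB")}, {frozenset("BF")}]
print(support("BF", trees))  # 75.0
```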
52. Multiple sequence alignment (MSA)
■ A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship: the sequences share an evolutionary link and are descended from a common ancestor.
53. Workflow
1. Sequence retrieval
2. Download 18 pqqC (pyrroloquinoline quinone biosynthesis gene pqqC) sequences from NCBI.
3. Perform a multiple sequence alignment.
4. Draw a phylogenetic tree.
5. Validate by bootstrapping.
6. Interpret the results and save the image.
57. Introduction
■ ClustalW: Clustal is a series of widely used computer programs in bioinformatics for multiple sequence alignment. The third generation, released in 1994, greatly improved upon the previous versions. It improved the progressive alignment algorithm in various ways, including allowing individual sequences to be weighted down or up according to similarity or divergence, respectively, in a partial alignment.
■ Access:-
ClustalW can be accessed from both NCBI (National Center for Biotechnology Information) and EMBL (European Molecular Biology Laboratory).
Figure represents ClustalW access through NCBI and EMBL.
59. Open ClustalW through the website. When it opens, two different types of distribution are shown.
60. Important information on the homepage
■ Choose the form in which you need the output.
■ Choose the speed/accuracy setting according to need; slow and accurate is recommended.
■ Specify whether the sequence of interest is DNA or protein.
■ Choose a file or paste the sequences to execute.
■ Click directly on Execute.
92. ■ The inference of phylogenies with computational methods has many important applications in medical and biological research, such as drug discovery and conservation biology.
■ A study published by Korber et al., which timed the evolution of the HIV-1 virus, demonstrates that maximum likelihood (ML) techniques can be effective in solving biological problems.
93. ■ Phylogenetic trees have already found applications in numerous practical domains.
■ Due to the rapid growth of available sequence data in recent years and the constant improvement of multiple alignment methods, it has now become feasible to compute very large trees comprising more than 1,000 organisms.
94. ■ Cancer research is considered one of the most significant areas in the medical community.
95. ■ Phylogenetics can capture important mutational events among different cancer types; a network approach can also capture tumour similarities.
■ Phylogenetic methods are also used for generating gene interaction networks.