Phylogenetic tree construction
Two main approaches:
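• Distance-based methods: the tree is built from a matrix of pairwise distances between sequences
Examples: UPGMA, neighbor joining, Fitch-Margoliash method, minimum evolution
• Character-based methods: the tree is built directly from the aligned sites (characters)
Input: Aligned sequences
Output: Phylogenetic tree
Examples: maximum parsimony, maximum likelihood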
UPGMA
UPGMA: Unweighted Pair Group Method with Arithmetic Mean
Developed by Sokal and Michener in 1958.
It is a sequential (agglomerative) clustering method.
A distance-based method for phylogenetic tree construction.
UPGMA is one of the simplest methods for constructing trees.
Generates rooted trees
Generates ultrametric trees from a distance matrix
Uses a simple algorithm
Input: Distance matrix of pairwise distance estimates between aligned sequences
Output: Phylogenetic tree
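As a minimal sketch of where the input matrix can come from, the snippet below computes uncorrected p-distances (the proportion of differing aligned sites) between aligned sequences. The p-distance is only one possible choice and is assumed here for illustration; real analyses often apply a substitution-model correction instead. The function names, the toy alignment and the gap rule (sites with a gap in either sequence are skipped) are all hypothetical.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping sites with a gap ('-') in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    compared = differing = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # gapped site: not compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix keyed by (name1, name2)."""
    names = list(aligned)
    d = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d[(x, y)] = d[(y, x)] = p_distance(aligned[x], aligned[y])
    return d


# Hypothetical toy alignment, for illustration only.
aln = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "ACGAACTTTC"}
dm = distance_matrix(aln)
print(dm[("s1", "s2")], dm[("s1", "s3")], dm[("s2", "s3")])  # 0.1 0.3 0.2
```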
• UPGMA starts with a matrix of pairwise distances.
• Each sample initially forms its own cluster.
• All clusters start out on a star-like tree.
• The algorithm constructs a rooted tree that reflects the structure present in a pairwise distance (dissimilarity) matrix.
• At each step, the two nearest clusters are combined into a higher-level cluster.
• It assumes an ultrametric tree, in which the distance from the root to every branch tip is the same.
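Whether a distance matrix is already ultrametric can be checked with the three-point condition: for every triple of taxa, the two largest of the three pairwise distances must be equal. The helper below is a minimal sketch (its name and tolerance are illustrative choices). The example matrix used later in this section fails the test (A, C, D give 3, 5, 6), so the tree UPGMA builds from it only approximates those distances.

```python
from itertools import combinations


def is_ultrametric(d: list[list[float]], tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle > tol:
            return False
    return True


# Distance matrix from the worked example (taxa A-F).
example = [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
]
# A small matrix that is ultrametric by construction (hypothetical).
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
print(is_ultrametric(example), is_ultrametric(clock_like))  # False True
```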
UPGMA Algorithm
Steps
Find the groups i and j with the smallest distance Dij.
Create a new group (ij) which has n(ij) = ni + nj members.
Connect i and j on the tree to a new node (ij).
Choose the lengths of the edges from i to (ij) and from j to (ij) so that the depth of group (ij) is Dij/2: each edge length is Dij/2 minus the depth of the group it connects (for two single sequences, both edges are Dij/2).
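Compute the distance between the new group and every other group k (other than i and j) as the size-weighted average
$$D_{(ij),k} = \frac{n_i D_{ik} + n_j D_{jk}}{n_i + n_j}$$
(Using the simple mean (Dik + Djk)/2 instead gives the related WPGMA method.)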
Delete columns and rows corresponding to i and j and add one for (ij). If there are
two or more groups left, go back to the first step
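The steps above can be sketched in Python as follows. This is a minimal, unoptimized O(n^3) illustration, not a reference implementation; the function name `upgma`, the slash-joined group labels, the first-pair-wins tie rule and the Newick output are choices made for this sketch. It uses the size-weighted update D(ij),k = (ni·Dik + nj·Djk)/(ni + nj).

```python
def upgma(names, d):
    """Minimal UPGMA sketch (unoptimized, O(n^3)).

    names: taxon labels; d: symmetric distance matrix as a list of lists.
    Returns (newick, merges), where merges lists (group_i, group_j, node_depth).
    """
    # Active groups: label -> (Newick subtree, number of members, node depth)
    groups = {name: (name, 1, 0.0) for name in names}
    dist = {}
    for a, row in zip(names, d):
        for b, value in zip(names, row):
            dist[(a, b)] = float(value)
    merges = []
    while len(groups) > 1:
        labels = list(groups)
        # Step 1: closest pair; on ties the first pair found is kept.
        i, j = min(
            ((p, q) for x, p in enumerate(labels) for q in labels[x + 1:]),
            key=lambda pair: dist[pair],
        )
        depth = dist[(i, j)] / 2
        tree_i, n_i, depth_i = groups.pop(i)
        tree_j, n_j, depth_j = groups.pop(j)
        # Step 2: edge lengths chosen so the new node sits at depth Dij/2.
        subtree = f"({tree_i}:{depth - depth_i:g},{tree_j}:{depth - depth_j:g})"
        new = f"{i}/{j}"
        # Step 3: size-weighted average distance to every remaining group.
        for k in groups:
            value = (n_i * dist[(i, k)] + n_j * dist[(j, k)]) / (n_i + n_j)
            dist[(new, k)] = dist[(k, new)] = value
        groups[new] = (subtree, n_i + n_j, depth)
        merges.append((i, j, depth))
    newick = next(iter(groups.values()))[0]
    return newick + ";", merges


NAMES = ["A", "B", "C", "D", "E", "F"]
MATRIX = [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
]
tree, merges = upgma(NAMES, MATRIX)
print(tree)
for i, j, depth in merges:
    print(f"join {i} + {j} at depth {depth:.3f}")
```

On the example matrix it joins A+B, then D+E, then C with A/B, then the two larger groups, and finally F. Because of the size weighting, the A/B/C + D/E join lands at depth 37/12 ≈ 3.08 rather than the 3 that the simple mean (WPGMA) would give.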
Advantages
Simple algorithm
Fast: even a straightforward implementation needs only O(n^3) time for n sequences
Easy to compute by hand or with a variety of software
Trees reflect overall (phenetic) similarity between the sequences
The order in which the data are entered does not affect the result (apart from how ties are broken)
Generates rooted trees that are easy to interpret
Disadvantages
It assumes the same rate of evolution on all lineages
It frequently produces incorrect tree topologies when rates differ among lineages
The root is fixed by the method; re-rooting is not allowed
The algorithm does not aim to reflect evolutionary descent
It assumes a strict (constant-rate) molecular clock.
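The first two disadvantages can be seen on a small constructed case. In the hypothetical four-taxon tree below, A and B are sister taxa, C and D are sister taxa, and B sits on a much longer branch (B = 6, A = C = D = 1, internal edge = 1; all values invented for illustration). The pairwise distances are path lengths on that tree, so they are tree-additive but not ultrametric. A compact UPGMA, re-implemented here so the block stands alone, groups A with C/D instead of with B.

```python
from itertools import combinations


def upgma_topology(names, d):
    """Compact UPGMA returning the tree as nested tuples (branch lengths omitted)."""
    clusters = {n: (n, 1) for n in names}  # label -> (subtree, size)
    dist = {frozenset((a, b)): d[i][j]
            for (i, a), (j, b) in combinations(enumerate(names), 2)}
    while len(clusters) > 1:
        # Closest pair of active clusters (first one wins on ties).
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        new = f"{i}|{j}"
        for k in clusters:  # size-weighted average distance
            dist[frozenset((new, k))] = (
                ni * dist[frozenset((i, k))] + nj * dist[frozenset((j, k))]
            ) / (ni + nj)
        clusters[new] = ((ti, tj), ni + nj)
    return next(iter(clusters.values()))[0]


# Path lengths on the true tree ((A,B),(C,D)): AB=7, AC=AD=3, BC=BD=8, CD=2.
names = ["A", "B", "C", "D"]
d = [
    [0, 7, 3, 3],
    [7, 0, 8, 8],
    [3, 8, 0, 2],
    [3, 8, 2, 0],
]
print(upgma_topology(names, d))  # ('B', ('A', ('C', 'D')))
```

A is placed next to C/D because its short branch makes it look closer to them than to its fast-evolving sister B. Neighbor joining, which does not assume a molecular clock, recovers the correct grouping for additive distances like these.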
Applications
• In ecology, it is one of the most popular methods for classifying sampling units (such as vegetation plots) on the basis of their pairwise similarities in relevant descriptor variables (such as species composition).[3]
• In bioinformatics, UPGMA is used to create phenetic trees (phenograms). UPGMA was initially designed for use in protein electrophoresis studies, but is currently most often used to produce guide trees for more sophisticated algorithms. For example, it is used in sequence alignment procedures, as it proposes one order in which the sequences will be aligned. Indeed, the guide tree aims at grouping the most similar sequences, regardless of their evolutionary rate or phylogenetic affinities, and that is exactly the goal of UPGMA.[4]
• In phylogenetics, UPGMA assumes a constant rate of evolution (molecular clock hypothesis) and is not a well-regarded method for inferring relationships unless this assumption has been tested and justified for the data set being used.
Example
1. Calculate the pairwise distance matrix
A B C D E F
A 0 1 3 6 7 10
B 1 0 3 6 7 10
C 3 3 0 5 6 9
D 6 6 5 0 1 7
E 7 7 6 1 0 8
F 10 10 9 7 8 0
2. Group the two most closely related sequences: A and B (distance 1)
[Tree: A and B joined at a node of depth 0.5; branch lengths A = 0.5, B = 0.5]
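This step hides a tie: A-B and D-E are both at distance 1. The hypothetical helper `closest_pairs` below lists every pair at the minimum off-diagonal distance. Because the two tied pairs share no sequence, joining A/B first and D/E next gives the same tree as the opposite order.

```python
from itertools import combinations


def closest_pairs(names, d):
    """Return the minimum off-diagonal distance and every pair of taxa at it."""
    pairs = list(combinations(range(len(names)), 2))
    best = min(d[i][j] for i, j in pairs)
    return best, [(names[i], names[j]) for i, j in pairs if d[i][j] == best]


names = ["A", "B", "C", "D", "E", "F"]
d = [
    [0, 1, 3, 6, 7, 10],
    [1, 0, 3, 6, 7, 10],
    [3, 3, 0, 5, 6, 9],
    [6, 6, 5, 0, 1, 7],
    [7, 7, 6, 1, 0, 8],
    [10, 10, 9, 7, 8, 0],
]
print(closest_pairs(names, d))  # (1, [('A', 'B'), ('D', 'E')])
```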
3. Recalculate the distance matrix and take the next smallest distance
A/B C D E F
A/B 0 3 6 7 10
C 3 0 5 6 9
D 6 5 0 1 7
E 7 6 1 0 8
F 10 9 7 8 0
[Tree: A and B joined (branches 0.5 and 0.5); D and E joined at depth 0.5 (branches 0.5 and 0.5)]
3. Recalculate the distance matrix and take the next smallest distance
A/B C D/E F
A/B 0 3 6.5 10
C 3 0 5.5 9
D/E 6.5 5.5 0 7.5
F 10 9 7.5 0
[Tree: C joins the A/B cluster at a node of depth 1.5; branch to C = 1.5, branch to the A/B node = 1]
3. Recalculate the distance matrix and take the next smallest distance
A/B/C D/E F
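A/B/C 0 6.17 9.67
D/E 6.17 0 7.5
F 9.67 7.5 0
(size-weighted: D(A/B/C),(D/E) = (2 × 6.5 + 1 × 5.5)/3 = 37/6 ≈ 6.17; D(A/B/C),F = (2 × 10 + 1 × 9)/3 = 29/3 ≈ 9.67)
[Tree: A/B/C and D/E join at a node of depth 37/12 ≈ 3.08; branch to the A/B/C node ≈ 1.58, branch to the D/E node ≈ 2.58]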
3. Recalculate the distance matrix and take the next smallest distance
A/B/C/D/E F
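A/B/C/D/E 0 8.8
F 8.8 0
(size-weighted: (3 × 29/3 + 2 × 7.5)/5 = 44/5 = 8.8, the mean of the five original distances from F)
[Tree: F joins the A/B/C/D/E cluster at the root, depth 4.4; branch to F = 4.4, branch to the A/B/C/D/E node = 4.4 - 37/12 ≈ 1.32]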