Procedure
1) Decide what sequences to examine
2) Determine the evolutionary distances between the sequences and build a distance matrix
3) Phylogenetic tree construction
(1) Decide what sequences to examine
• Choose homologous sequences in different species.
  o Homologous sequences are, by definition, sequences that share a common evolutionary origin.
  o Homology is not just similarity: two sequences can be similar without sharing an ancestor.
(2) Determine the evolutionary distances and build a distance matrix
• For molecular data, the evolutionary distance between two species can be the observed number of nucleotide differences between their sequences.
• Distance matrix: a table of the evolutionary distances between all pairs of sequences.
(2) Determine the evolutionary distances and build a distance matrix: a simple example

A. A G G C C A T G A A T T A A G A A T A A
B. A G C C C A T G G A T A A A G A G T A A
C. A G G A C A T G A A T T A A G A A T A A
D. A A G C C A A G A A T T A C G G A A A A
E. A G G C T T T C A A G T A A A A A T G A

In this example, the evolutionary distance is expressed as the number of nucleotide differences for each sequence pair. For example, the distance between taxa B and C is 5.

Distance Matrix
     A    B    C    D    E
A    0
B    4    0
C    1    5    0
D    5    9    6    0
E    6   10    7   11    0
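As a minimal sketch, assuming the sequences are already aligned and contain no gaps, the matrix above can be reproduced by counting mismatched positions (the Hamming distance) for every pair. The names `SEQS`, `hamming` and `distance_matrix` are illustrative, not part of the original material.

```python
from itertools import combinations

# The five aligned example sequences, with the spaces removed.
SEQS = {
    "A": "AGGCCATGAATTAAGAATAA",
    "B": "AGCCCATGGATAAAGAGTAA",
    "C": "AGGACATGAATTAAGAATAA",
    "D": "AAGCCAAGAATTACGGAAAA",
    "E": "AGGCTTTCAAGTAAAAATGA",
}


def hamming(s, t):
    """Count the positions at which two aligned sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


def distance_matrix(seqs):
    """Map each unordered pair of names, as a frozenset, to its distance."""
    return {frozenset((x, y)): hamming(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}


if __name__ == "__main__":
    dist = distance_matrix(SEQS)
    names = list(SEQS)
    # Print the lower triangle in the same layout as the slide.
    for i, x in enumerate(names):
        row = [str(dist[frozenset((x, y))]) for y in names[:i]] + ["0"]
        print(x, " ".join(row))
```

Run as a script, it prints the same lower-triangular matrix as the slide.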
(3) Phylogenetic tree construction (using UPGMA)
Repeat the following steps until all taxa are joined:
a) Identify the shortest distance in the matrix.
b) Join the two taxa (or clades) with that distance into a new clade. The tip-to-tip distance between the joined taxa equals the shortest distance, so the distance from each of them to their common ancestor is ½ of this distance.
c) Calculate the average distance between this clade and each remaining taxon, averaging over every original taxon in the clade, to form a new matrix.
d) Identify the shortest distance in the new matrix.
e) Repeat steps (b) to (d).
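The loop in steps (a) to (e) can be sketched in Python as below. This is an illustrative implementation, not taken from the original; the function `upgma` and the pair-keyed dictionary format are assumptions. Step (c) uses a size-weighted average, which equals the mean over all pairs of original taxa and reproduces the numbers in the worked example that follows.

```python
from itertools import combinations


def upgma(names, dist):
    """Minimal UPGMA sketch.

    names: list of taxon labels.
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of taxa.
    Returns the merges (left, right, height) in the order performed; height is
    half the cluster-to-cluster distance at the moment of joining.
    """
    # Each cluster is a tuple of its member taxa, so its size is len(cluster).
    clusters = [(n,) for n in names]
    d = {frozenset((a, b)): dist[frozenset((a[0], b[0]))]
         for a, b in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        # (a) find the closest pair of clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        # (b) join them; each tip lies half the distance below the new node
        height = d[frozenset((a, b))] / 2
        merged = a + b
        merges.append((a, b, height))
        clusters = [c for c in clusters if c not in (a, b)]
        # (c) size-weighted average = mean over all original taxon pairs
        for c in clusters:
            d[frozenset((merged, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / (len(a) + len(b))
        clusters.append(merged)
    return merges


# Distance matrix from the example above.
EXAMPLE = {
    frozenset("AB"): 4, frozenset("AC"): 1, frozenset("BC"): 5,
    frozenset("AD"): 5, frozenset("BD"): 9, frozenset("CD"): 6,
    frozenset("AE"): 6, frozenset("BE"): 10, frozenset("CE"): 7,
    frozenset("DE"): 11,
}

if __name__ == "__main__":
    for left, right, h in upgma(list("ABCDE"), EXAMPLE):
        print("".join(left), "+", "".join(right), "at height", round(h, 2))
```

On the example matrix this joins A with C, then B, then D, then E, at node heights 0.5, 2.25, 3.33 and 4.25. Ties are broken by whichever pair `min` meets first, which is an arbitrary choice.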
UPGMA (Michener & Sokal 1957)
• The shortest distance, between A and C, is 1.
• Join A and C. The tip-to-tip path length is 1, so the distance from the node to each tip = ½ × 1 = 0.50.
[Figure: A and C joined at a node, each on a branch of length 0.50]
• Average distance between B and AC = (4 + 5) / 2 = 4.50
• Average distance between D and AC = (5 + 6) / 2 = 5.50
• Average distance between E and AC = (6 + 7) / 2 = 6.50
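Written as a formula (the notation is ours, not the slides'), the UPGMA distance between two clusters X and Y is the mean of all taxon-to-taxon distances between them:

```latex
d(X, Y) = \frac{1}{|X|\,|Y|} \sum_{x \in X} \sum_{y \in Y} d(x, y),
\qquad\text{e.g.}\qquad
d(B, AC) = \frac{d(B, A) + d(B, C)}{1 \cdot 2} = \frac{4 + 5}{2} = 4.50
```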
New distance matrix
      AC     B      D      E
AC    0
B     4.5    0
D     5.5    9      0
E     6.5   10     11      0
• The shortest distance, between B and AC, is 4.50.
• Join B and AC. The tip-to-tip path length is 4.50, so the distance from the node to each tip = ½ × 4.50 = 2.25.
[Figure: B joined to clade AC; branch lengths A 0.50, C 0.50, AC node to new node 1.75, B 2.25]
• Average distance between D and ACB = (5 + 6 + 9) / 3 = 6.67
• Average distance between E and ACB = (6 + 7 + 10) / 3 = 7.67
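The same averages can be obtained from the previous matrix without going back to the original taxa, by weighting each cluster's distance by its size. Here k is any remaining taxon and X, Y are the two clusters just joined (our notation):

```latex
\begin{gathered}
d(k, X \cup Y) = \frac{|X|\, d(k, X) + |Y|\, d(k, Y)}{|X| + |Y|} \\
d(D, ACB) = \frac{2 \cdot 5.50 + 1 \cdot 9}{3} = \frac{20}{3} \approx 6.67,
\qquad
d(E, ACB) = \frac{2 \cdot 6.50 + 1 \cdot 10}{3} = \frac{23}{3} \approx 7.67
\end{gathered}
```

A plain mean of the two matrix entries, (5.50 + 9) / 2 = 7.25, would give a different value; that update, which weights each cluster rather than each taxon equally, is the WPGMA variant rather than UPGMA.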
New distance matrix
       ACB     D      E
ACB    0
D      6.67    0
E      7.67   11      0
• The shortest distance, between D and ACB, is 6.67.
• Join D and ACB. The tip-to-tip path length is 6.67, so the distance from the node to each tip = ½ × 6.67 = 3.33.
[Figure: D joined to clade ACB; branch lengths A and C 0.50, internal branches 1.75 and 1.08, D 3.33]
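The internal branch lengths in the figure follow from the node heights: a branch is as long as its parent node's height minus its child's height, with leaves at height 0 (h and b are our symbols):

```latex
\begin{gathered}
b(\text{child}) = h(\text{parent}) - h(\text{child}) \\
b(AC) = 2.25 - 0.50 = 1.75, \qquad
b(ACB) = 3.33 - 2.25 = 1.08, \qquad
b(D) = 3.33 - 0 = 3.33
\end{gathered}
```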
• Average distance between E and ACBD = (6 + 7 + 10 + 11) / 4 = 8.5
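New distance matrix
        ACBD    E
ACBD    0
E       8.5     0
• Join E and ACBD. The tip-to-tip path length is 8.50, so the distance from the node to each tip = ½ × 8.50 = 4.25.
• All taxa are joined. Done!
[Figure: final UPGMA tree with root height 4.25; branch lengths A and C 0.50, internal branches 1.75, 1.08 and 0.92, E 4.25]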
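The finished tree can be written in Newick format, the plain-text tree notation that most phylogenetics software reads. The sketch below hard-codes the node heights from this example; the nested-tuple representation and the helpers `height` and `newick` are illustrative assumptions, not part of the original.

```python
def height(node):
    """Leaves (plain strings) sit at height 0; internal nodes store their height."""
    return 0.0 if isinstance(node, str) else node[2]


def newick(node, parent_height=None):
    """Render a (left, right, height) tree as a Newick string.

    Each branch length is the parent's height minus the child's height.
    """
    if isinstance(node, str):
        label = node
    else:
        left, right, h = node
        label = f"({newick(left, h)},{newick(right, h)})"
    if parent_height is None:  # the root has no branch above it
        return label + ";"
    return f"{label}:{parent_height - height(node):.2f}"


# Node heights from the worked example (hypothetical variable names).
AC = ("A", "C", 0.50)
ACB = (AC, "B", 2.25)
ACBD = (ACB, "D", 10 / 3)  # half of 20/3 (6.67), kept exact to avoid rounding drift
ROOT = (ACBD, "E", 4.25)

if __name__ == "__main__":
    print(newick(ROOT))
```

It prints `((((A:0.50,C:0.50):1.75,B:2.25):1.08,D:3.33):0.92,E:4.25);`. Every root-to-tip path sums to 4.25, which is what placing all leaves at the same level means.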
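Weakness of UPGMA
• UPGMA assumes:
  o A constant molecular clock (mutations accumulate at a constant rate in all lineages).
  o All leaves are at the same level (every tip is the same distance from the root).
• When these assumptions fail, UPGMA can return the wrong tree. For example, it does not work in the following case:
[Figure: the correct tree and the tree returned by UPGMA for four taxa labelled 1 to 4]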
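One way to check the "all leaves at the same level" assumption before running UPGMA is the three-point (ultrametric) condition: in every triple of taxa, the two largest pairwise distances are equal. The sketch below is illustrative; `is_ultrametric`, its tolerance parameter `tol`, and `TREE_DIST` are our own names. The example matrix fails the check (for A, B and C the distances are 1, 4 and 5), while the tip-to-tip path lengths on the UPGMA tree built above pass it.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Return True if every triple satisfies the three-point condition.

    dist maps frozenset({x, y}) -> distance. For each triple of taxa, the two
    largest of its three pairwise distances must be equal (within tol).
    """
    for x, y, z in combinations(names, 3):
        _, mid, top = sorted((dist[frozenset((x, y))],
                              dist[frozenset((x, z))],
                              dist[frozenset((y, z))]))
        if top - mid > tol:
            return False
    return True


# Observed nucleotide differences from the example.
EXAMPLE = {
    frozenset("AB"): 4, frozenset("AC"): 1, frozenset("BC"): 5,
    frozenset("AD"): 5, frozenset("BD"): 9, frozenset("CD"): 6,
    frozenset("AE"): 6, frozenset("BE"): 10, frozenset("CE"): 7,
    frozenset("DE"): 11,
}

# Tip-to-tip path lengths on the UPGMA tree from the example:
# twice the height of the node where each pair first meets.
TREE_DIST = {
    frozenset("AC"): 1.0, frozenset("AB"): 4.5, frozenset("BC"): 4.5,
    frozenset("AD"): 20 / 3, frozenset("BD"): 20 / 3, frozenset("CD"): 20 / 3,
    frozenset("AE"): 8.5, frozenset("BE"): 8.5, frozenset("CE"): 8.5,
    frozenset("DE"): 8.5,
}

if __name__ == "__main__":
    print(is_ultrametric("ABCDE", EXAMPLE))    # False: A, B, C give 1, 4, 5
    print(is_ultrametric("ABCDE", TREE_DIST))  # True
```

UPGMA always outputs an ultrametric tree, so when the observed distances are far from ultrametric, as here, its branch lengths can only approximate them.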