PHYLOGENETIC TREE CONSTRUCTION
BY DISTANCE BASED METHOD
INTRODUCTION
 A phylogenetic tree, also known as
a phylogeny, is a diagram that depicts the
lines of evolutionary descent of
different species, organisms, or genes from a
common ancestor.
 Attempts to reconstruct evolutionary ancestors
 Estimates the time of divergence from a common ancestor
 Can be used to solve a number of interesting
problems
 Forensics
• HIV virus mutates rapidly
 Predicting evolution of influenza viruses
 Predicting functions of uncharacterized genes -
ortholog detection
 Drug discovery
 Vaccine development
• Target inferred common ancestor
HOW TO CONSTRUCT A PHYLOGENETIC TREE
 Step 1: Build a multiple sequence alignment from
nucleotide or amino acid sequences (e.g. with
MUSCLE; BLAST finds homologs but does not build
a multiple alignment).
 Step 2: Check whether the multiple alignment
reflects the evolutionary process.
 Step 3: Choose a tree-building method and
compute the pairwise distances, or use the
alignment directly, depending on the method.
 Step 4: Verify the result statistically.
TYPES OF APPROACHES
 CHARACTER BASED APPROACH
It makes use of all known evolutionary
information, i.e. the individual substitutions
among the sequences, to determine the most
likely ancestral sequences.
 DISTANCE BASED APPROACH
Distance-matrix methods of phylogenetic
analysis explicitly rely on a measure of "genetic
distance" between the sequences being
classified, and therefore they require an
MSA (multiple sequence alignment) as input.
 Distance-based methods must transform the
sequence data into a pairwise distance
matrix for use during tree inference.
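The transformation from an alignment to a distance matrix can be sketched as follows. This is a minimal illustration, not a prescribed method: the function names, the toy three-sequence alignment, and the choice of the Jukes-Cantor correction are all assumptions made for the example. Other substitution models would give different distances.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Columns where either sequence has a gap ('-') are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a nucleotide p-distance p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(alignment):
    """Build a symmetric dict-of-dicts distance matrix from {name: aligned sequence}."""
    names = list(alignment)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = jukes_cantor(p_distance(alignment[a], alignment[b]))
        dist[a][b] = dist[b][a] = d
    return dist

# Hypothetical toy alignment (assumed data, for illustration only).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
}
matrix = distance_matrix(alignment)
```

The corrected distance is always at least as large as the raw proportion of differences, because the correction accounts for multiple substitutions at the same site.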
VARIOUS DISTANCE BASED METHODS
1. UPGMA
2. NJ (Neighbor Joining)
3. FM (Fitch-Margoliash)
4. Minimum evolution
UPGMA
• Stands for Unweighted Pair Group Method
with Arithmetic Mean.
• Originally developed for numerical taxonomy in
1958 by Sokal and Michener.
• This method uses a sequential clustering
algorithm.
 This method follows a clustering procedure:
(1) Assume that initially each species is a
cluster on its own.
(2) Join the two closest clusters and recalculate
the distance from the merged cluster to every
other cluster as the average of the pairwise
distances between their members.
(3) Repeat this process until all species are
connected in a single cluster.
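The three steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function name `upgma`, the input format (a dict keyed by pairs of names), and the four-taxon distances are hypothetical and chosen only to make the example small.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA.
    dist maps a pair (a, b) to a distance; returns (nested-tuple tree, root height)."""
    # Step 1: every taxon starts as its own cluster: label -> (subtree, size, height).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Step 2a: find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, size_a, _ = clusters.pop(a)
        tree_b, size_b, _ = clusters.pop(b)
        height = d[frozenset((a, b))] / 2  # the new node sits halfway between a and b
        new = f"({a},{b})"
        # Step 2b: distance to each remaining cluster is the size-weighted mean,
        # which equals the plain average over all member pairs.
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b), size_a + size_b, height)
    # Step 3: a single cluster remains; it is the rooted tree.
    (tree, _, height), = clusters.values()
    return tree, height

# Hypothetical ultrametric distances for four taxa (assumed data).
names = ["A", "B", "C", "D"]
dist = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree, height = upgma(names, dist)
```

With these distances A and B are joined first, then C, then D. Replacing the size-weighted mean in step 2b with a simple mean of the two merged clusters' distances would turn the sketch into WPGMA.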
CONSTRUCTION OF PHYLOGENETIC TREE
DRAWBACK
• Strictly speaking, this algorithm is phenetic:
it does not aim to reflect evolutionary
descent.
• It gives equal weight to every pairwise distance and
assumes a constant molecular clock (equal
evolutionary rates along all lineages).
• WPGMA (Weighted Pair Group Method
with Arithmetic Mean) is a similar algorithm
but weights the distances differently: the two
merged clusters are averaged equally,
regardless of their sizes.
NEIGHBOUR JOINING METHOD
 Neighbor-joining methods apply general data
clustering techniques to sequence analysis
using genetic distance as a clustering metric.
 Developed in 1987 by Saitou and Nei.
 The simple neighbor-joining method produces
unrooted trees and, unlike UPGMA, does not
assume a constant rate of evolution (i.e., a
molecular clock) across lineages.
 It begins with an unresolved, star-like tree.
 Each pair of taxa is evaluated for joining, and the
total branch length of the resulting tree is
calculated.
 The pair that yields the smallest total is
treated as the closest neighbors and is
joined.
 A new internal branch is inserted between the
joined pair and the rest of the tree, and the
branch lengths are recalculated.
 This process is repeated until the tree is
fully resolved.
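The procedure above can be sketched with the standard Q-criterion form of neighbor joining, in which the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j is joined (r_i is the sum of distances from i to all other nodes). The function name, the internal node labels `node0`, `node1`, ..., and the five-taxon distances are assumptions made for this example.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return the unrooted NJ tree as a list of (node, neighbor, branch_length) edges.
    dist maps a pair (a, b) to a distance."""
    d = {frozenset(p): v for p, v in dist.items()}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from node i to every other current node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q value.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{counter}"
        counter += 1
        edges += [(i, u, li), (j, u, lj)]
        # Replace i and j by u and update distances to the remaining nodes.
        nodes = [k for k in nodes if k not in (i, j)]
        for k in nodes:
            d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes.append(u)
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))  # final branch joins the last two nodes
    return edges

# Hypothetical additive distances for five taxa (assumed data).
names = ["a", "b", "c", "d", "e"]
dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
edges = neighbor_joining(names, dist)
```

For n taxa the sketch returns 2n - 3 edges, the number of branches in a fully resolved unrooted tree. Ties in Q are broken by input order here, which is an arbitrary choice.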
DRAWBACKS
 It produces only one tree and neglects other
possible trees, which might be as good as the NJ
tree, if not better.
 Moreover, since errors in distance estimates grow
exponentially with distance, under some
conditions this method yields a biased
tree.
WEIGHTED NEIGHBOUR JOINING(WEIGHBOR)
 Proposed in 2000 by Bruno, Socci, and Halpern.
 The Weighbor criterion consists of two terms:
1. additivity term (of external branches)
2. positivity term (of internal branches), which
quantifies the implications of joining the
pair.
 Weighbor gives less weight to the longer
distances in the distance matrix, so the
resulting trees are less sensitive to specific
biases than NJ and relatively immune to the
"long-branch attraction/distraction"
drawbacks observed with other methods.
FITCH – MARGOLIASH METHOD
 Proposed in 1967 by Fitch and Margoliash
 Produces unrooted trees
 Provides a criterion for fitting trees to distance matrices
 Uses a weighted least squares method for
clustering based on genetic distance.
 Closely related sequences are given more
weight in the tree construction process to
correct for the increased inaccuracy in
measuring distances between distantly related
sequences.
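The weighting can be illustrated with the commonly used form of the Fitch-Margoliash criterion, in which each squared residual is divided by the square of the observed distance. The sketch below only scores a candidate tree; it does not search for one. The function name, the input format, and the toy distances are assumptions for illustration.

```python
def fitch_margoliash_score(observed, patristic):
    """Weighted least-squares misfit between observed distances D and the
    path-length (patristic) distances d of a candidate tree:
    sum over pairs of (D - d)^2 / D^2. Lower is better."""
    return sum(
        (observed[pair] - patristic[pair]) ** 2 / observed[pair] ** 2
        for pair in observed
    )

# Hypothetical observed distances and two candidate trees (assumed data).
observed = {("A", "B"): 2.0, ("A", "C"): 8.0, ("B", "C"): 8.0}
tree_x = {("A", "B"): 3.0, ("A", "C"): 8.0, ("B", "C"): 8.0}  # error of 1 on a short distance
tree_y = {("A", "B"): 2.0, ("A", "C"): 9.0, ("B", "C"): 8.0}  # error of 1 on a long distance

score_x = fitch_margoliash_score(observed, tree_x)
score_y = fitch_margoliash_score(observed, tree_y)
```

Both candidates miss one observed distance by the same absolute amount, but the miss on the short A-B distance is penalized more heavily, which is how closely related sequences receive more weight.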
MINIMUM EVOLUTION
 First described by Kidd & Sgaramella-Zonta
in 1971, then later by Rzhetsky & Nei in
1992.
 Based on the assumption that the tree with
the smallest sum of branch length estimates
is most likely to be the true one.
 Unrooted metric trees
 In ME, the tree with the smallest total length,
i.e. the smallest sum of the lengths of its
branches, is regarded as the estimate of
the phylogeny:
$$S = \sum_{i=1}^{2n-3} v_i$$
where $n$ is the number of taxa in the tree and $v_i$
is the length of the $i$th branch (an unrooted
bifurcating tree with $n$ taxa has $2n-3$ branches).
DRAWBACKS
 In principle, all different tree topologies have
to be investigated to find the minimum tree.
However, this is infeasible in practice for
more than a handful of taxa because of the
explosive increase in the number of tree
topologies.
 Slower than clustering methods.
 Information is lost when characters are
transformed to distances.
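The explosive increase can be made concrete. The number of distinct unrooted bifurcating topologies for n labeled taxa is the double factorial (2n - 5)!! = 3 x 5 x ... x (2n - 5). The short sketch below computes it; the function name is an assumption for illustration.

```python
def unrooted_topologies(n):
    """Number of unrooted bifurcating tree topologies for n >= 3 labeled taxa,
    i.e. the product of the odd numbers from 3 up to 2n - 5."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

# Growth of the search space for a few tree sizes.
counts = {n: unrooted_topologies(n) for n in (4, 5, 6, 10, 20)}
```

Ten taxa already give 2,027,025 topologies, which is why exhaustive minimum-evolution searches are usually replaced by heuristic searches.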
ADVANTAGES OF DISTANCE BASED
APPROACH
 Less sensitive to variations in evolutionary
rate than cluster analysis
 Fast
 Can handle many sequences at a time
 Produces a reasonable estimate of the phylogeny
DISADVANTAGES OF DISTANCE BASED
APPROACH
 More sensitive than Parsimony or Maximum
Likelihood to systematic errors.
 The relationship between the individual
characters and the tree is lost in the process
of reducing characters to distances.
 Strength of the technique is dependent on
accuracy of the distance estimate, and thus
dependent on the model used to obtain the
distance matrix.