Concepts of Dendrogram: Neighbor Joining
Submitted to: Dr. Arti Sharma
Submitted by: Gaurav
Registration no.: 19mslsbf03
Phylogenetics
• Phylogenetics is the study of evolutionary relationships.
• Phylogenetic analysis is the means of inferring or estimating
these relationships.
• The evolutionary history inferred from phylogenetic analysis is
usually depicted as branching, treelike diagrams that represent
an estimated pedigree of the inherited relationships among
molecules ("gene trees"), organisms, or both.
• Phylogenetics is sometimes called cladistics because the word
"clade", a set of descendants from a single ancestor, is derived
from the Greek word for branch.
What is the NJ Method?
● The neighbor-joining (NJ) method reconstructs phylogenetic trees
from evolutionary distance data.
● It was developed by Saitou and Nei (1987).
● The principle is to find pairs of operational taxonomic units
(OTUs), called neighbors, that minimize the total branch length at
each stage of clustering, starting with a starlike tree.
● The input is a matrix of pairwise distances between the n taxa.
● The output is an unrooted tree with branch lengths.
B and D have accumulated mutations at a higher rate than A, so the
three-point criterion is violated. The UPGMA method cannot be used, since
it would group together A and C rather than A and B.
In such a case the neighbor-joining method is one of the recommended methods.
Raw data of the tree. We have 6 OTUs in total (N=6).
      A    B    C    D    E
B     5
C     4    7
D     7   10    7
E     6    9    6    5
F     8   11    8    9    8
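UPGMA begins by merging the pair with the smallest raw distance. The short Python sketch below finds that pair in the table above; the names LABELS, LOWER and pairs are illustrative and not part of the slides.

```python
# Raw distances from the slide, stored as a lower triangle (row i, column j < i).
LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = [[], [5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]

# Map each unordered pair (column label, row label) to its distance.
pairs = {
    (LABELS[j], LABELS[i]): value
    for i, row in enumerate(LOWER)
    for j, value in enumerate(row)
}

# The pair UPGMA would cluster first is the one with the smallest distance.
closest = min(pairs, key=pairs.get)
print(closest, pairs[closest])
```

The smallest entry is d(AC), not d(AB), which is why a method that only looks at raw distances pairs A with C here.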
[Figure: tree by UPGMA method, with tips A, B, C, D, E and F]
Step 1: We calculate the net divergence r(i) for each OTU
from all other OTUs.
r(A) = 5+4+7+6+8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
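Step 1 can be checked with a few lines of Python. This is a minimal sketch that assumes the raw-data table is stored as a lower triangle; build_distances and net_divergence are illustrative names, not from the slides.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = [[], [5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]


def build_distances(labels, lower):
    """Expand the lower triangle into a symmetric table d[i][j]."""
    d = {a: {} for a in labels}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[labels[i]][labels[j]] = value
            d[labels[j]][labels[i]] = value
    return d


def net_divergence(d):
    """r(i) is the sum of the distances from i to every other OTU."""
    return {i: sum(row.values()) for i, row in d.items()}


d = build_distances(LABELS, LOWER)
r = net_divergence(d)
print(r)
```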
Step 2: Now we calculate a new distance matrix, using for
each pair of OTUs the formula:
M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
For example, for the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)] / (N-2) = 5 - 72/4 = -13
       A       B       C       D       E
B    -13
C    -11.5   -11.5
D    -10     -10     -10.5
E    -10     -10     -10.5   -13
F    -10.5   -10.5   -11     -11.5   -11.5
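The M matrix of Step 2 can be reproduced as follows. This is a sketch under the same assumption as before (distances held as a symmetric table); symmetric and m_matrix are hypothetical helper names.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = [[], [5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]


def symmetric(labels, lower):
    """Expand the lower triangle into a symmetric table d[i][j]."""
    d = {a: {} for a in labels}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[labels[i]][labels[j]] = value
            d[labels[j]][labels[i]] = value
    return d


def m_matrix(d):
    """M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every unordered pair."""
    n = len(d)
    r = {i: sum(row.values()) for i, row in d.items()}
    return {
        (i, j): d[i][j] - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(d, 2)
    }


m = m_matrix(symmetric(LABELS, LOWER))
lowest = min(m.values())
best_pairs = [pair for pair, value in m.items() if value == lowest]
print(lowest, best_pairs)
```

Collecting every pair that reaches the minimum makes ties visible instead of silently picking one.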
Step 3: Now we choose as neighbors the two OTUs for which M(ij) is the
smallest. Two pairs tie at -13: A,B and D,E.
Let's take A and B as neighbors and form a new node
called U1. Now we calculate the branch length from
the internal node U1 to the external OTUs A and B.
S(AU1) = d(AB)/2 + [r(A)-r(B)] / [2(N-2)] = 1
S(BU1) = d(AB) - S(AU1) = 4
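The branch-length formula of Step 3 is easy to get wrong because of the 2(N-2) denominator. A small sketch, with branch_lengths as an illustrative function name:

```python
def branch_lengths(d_ij, r_i, r_j, n):
    """Return (S(iU), S(jU)) for neighbors i and j joined at a new node U."""
    s_i = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    s_j = d_ij - s_i
    return s_i, s_j


# A and B in the six-OTU matrix: d(AB) = 5, r(A) = 30, r(B) = 42, N = 6.
s_a, s_b = branch_lengths(5, 30, 42, 6)
print(s_a, s_b)
```

The longer branch goes to B, the OTU with the larger net divergence.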
[Figure: tree with A and B joined at the internal node U1]
Step 4: Now we define new distances from U1 to
each other terminal node:
d(CU1) = [d(AC) + d(BC) - d(AB)] / 2 = 3
d(DU1) = [d(AD) + d(BD) - d(AB)] / 2 = 6
d(EU1) = [d(AE) + d(BE) - d(AB)] / 2 = 5
d(FU1) = [d(AF) + d(BF) - d(AB)] / 2 = 7
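The Step 4 update divides the whole bracket by 2, not just d(AB). A minimal sketch, where distance_to_new_node is an illustrative name:

```python
def distance_to_new_node(d_ik, d_jk, d_ij):
    """Distance from node k to the new node that replaces neighbors i and j."""
    return (d_ik + d_jk - d_ij) / 2


d_ab = 5
to_a = {"C": 4, "D": 7, "E": 6, "F": 8}    # d(A, k) from the raw data
to_b = {"C": 7, "D": 10, "E": 9, "F": 11}  # d(B, k) from the raw data

d_u1 = {k: distance_to_new_node(to_a[k], to_b[k], d_ab) for k in to_a}
print(d_u1)
```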
The reduced distance matrix (N=5):
      U1    C    D    E
C      3
D      6    7
E      5    6    5
F      7    8    9    8
1. r(U1) = 3+6+5+7 = 21
r(C) = 24
r(D) = 27
r(E) = 24
r(F) = 32
2. M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
       U1       C        D        E
C     -12
D     -10      -10
E     -10      -10      -12
F     -10.67   -10.67   -10.67   -10.67
The pairs U1,C and D,E tie at -12. We take D and E as neighbors and form the node U2.
3. S(DU2) = d(DE)/2 + [r(D)-r(E)] / [2(N-2)] = 3
S(EU2) = d(DE) - S(DU2) = 2
[Figure: tree with A and B joined at U1, and D and E joined at U2]
4. d(CU2) = [d(DC) + d(EC) - d(DE)] / 2 = 4
d(U1U2) = [d(DU1) + d(EU1) - d(DE)] / 2 = 3
d(FU2) = [d(DF) + d(EF) - d(DE)] / 2 = 6
The reduced distance matrix (N=4):
      U1    C   U2
C      3
U2     3    4
F      7    8    6
1. r(U1) = 3+3+7 = 13
r(C) = 15
r(U2) = 13
r(F) = 21
2. M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
      U1     C     U2
C    -11
U2   -10    -10
F    -10    -10   -11
The pairs U1,C and U2,F tie at -11. We take C and U1 as neighbors and form the node U3.
3. S(CU3) = d(CU1)/2 + [r(C)-r(U1)] / [2(N-2)] = 2
S(U1U3) = d(CU1) - S(CU3) = 1
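Each round shrinks the matrix by one node. The sketch below collapses D and E into U2, then C and U1 into U3, starting from the five-node matrix; symmetric and join_pair are hypothetical helper names, and the choice of pairs is supplied by hand rather than computed.

```python
def symmetric(labels, lower):
    """Expand a lower triangle into a symmetric table d[i][j]."""
    d = {a: {} for a in labels}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[labels[i]][labels[j]] = value
            d[labels[j]][labels[i]] = value
    return d


def join_pair(d, i, j, new):
    """Return a new table in which neighbors i and j are replaced by node `new`."""
    keep = [k for k in d if k not in (i, j)]
    out = {a: {b: d[a][b] for b in keep if b != a} for a in keep}
    out[new] = {}
    for k in keep:
        dist = (d[i][k] + d[j][k] - d[i][j]) / 2
        out[k][new] = dist
        out[new][k] = dist
    return out


d5 = symmetric(["U1", "C", "D", "E", "F"], [[], [3], [6, 7], [5, 6, 5], [7, 8, 9, 8]])
d4 = join_pair(d5, "D", "E", "U2")   # nodes left: U1, C, F, U2
d3 = join_pair(d4, "C", "U1", "U3")  # nodes left: F, U2, U3
print(d4["U2"])
print(d3["U3"])
```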
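[Figure: tree with A and B joined at U1, D and E joined at U2, and C and U1 joined at U3]
4. d(U2U3) = [d(CU2) + d(U1U2) - d(CU1)] / 2 = 2
d(FU3) = [d(CF) + d(U1F) - d(CU1)] / 2 = 6
The reduced distance matrix (N=3):
      U2   U3
U3     2
F      6    6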
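1. r(U2) = 2+6 = 8
r(U3) = 2+6 = 8
r(F) = 6+6 = 12
2. M(ij) = d(ij) - [r(i) + r(j)] / (N-2)
      U2    U3
U3   -14
F    -14   -14
With three nodes left all pairs tie, and every choice gives the same unrooted tree. We take F and U2 as neighbors and form the node U4.
3. S(FU4) = d(FU2)/2 + [r(F)-r(U2)] / [2(N-2)] = 5
S(U2U4) = d(U2F) - S(FU4) = 1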
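When only three nodes remain, N-2 = 1 and M(ij) reduces to minus the sum of the three distances, so every pair scores the same. The sketch below recomputes this from d(U2U3) = 2, d(FU2) = 6 and d(FU3) = 6, and also derives the three branches that meet at the last internal node; star_branch is an illustrative name.

```python
from itertools import combinations

# Distances among the three remaining nodes.
d = {
    "U2": {"U3": 2, "F": 6},
    "U3": {"U2": 2, "F": 6},
    "F": {"U2": 6, "U3": 6},
}

n = len(d)
r = {i: sum(row.values()) for i, row in d.items()}
m = {(i, j): d[i][j] - (r[i] + r[j]) / (n - 2) for i, j in combinations(d, 2)}


def star_branch(x):
    """Branch length from x to the node where x and the other two nodes meet."""
    y, z = [k for k in d if k != x]
    return (d[x][y] + d[x][z] - d[y][z]) / 2


branches = {x: star_branch(x) for x in d}
print(r, m, branches)
```

The branch lengths do not depend on which of the tied pairs is joined first.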
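[Figure: tree with internal nodes U1, U2, U3 and U4]
Here N-2 = 0, so the formula for M(ij) cannot be evaluated.
Only one pair of nodes remains, U3 and U4, and it is connected directly:
d(U3U4) = [d(U2U3) + d(FU3) - d(U2F)] / 2 = 1
F stays attached to U4 with branch length 5.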
[Figure: final unrooted tree with tips A, B, C, D, E, F and internal nodes U1, U2, U3, U4]
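The whole procedure fits in one loop. The following is a minimal Python sketch, not a reference implementation: the function name neighbor_joining and the edge-list output format are assumptions made for illustration, and ties are broken by taking the first minimal pair found, so the internal nodes may be numbered in a different order than on the slides.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
LOWER = [[], [5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]


def neighbor_joining(labels, lower):
    """Return the tree as a list of (node, attached_to, branch_length) edges."""
    d = {a: {} for a in labels}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[labels[i]][labels[j]] = value
            d[labels[j]][labels[i]] = value

    edges = []
    step = 0
    while len(d) > 2:
        n = len(d)
        r = {i: sum(row.values()) for i, row in d.items()}  # Step 1

        def m(pair):  # Step 2
            i, j = pair
            return d[i][j] - (r[i] + r[j]) / (n - 2)

        # Step 3: min() keeps the first pair it meets when several pairs tie.
        i, j = min(combinations(d, 2), key=m)
        step += 1
        u = f"U{step}"
        s_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, s_i))
        edges.append((j, u, d[i][j] - s_i))

        # Step 4: distances from the new node, then drop i and j.
        others = [k for k in d if k not in (i, j)]
        new_row = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in others}
        d = {a: {b: d[a][b] for b in others if b != a} for a in others}
        for k, dist in new_row.items():
            d[k][u] = dist
        d[u] = new_row

    # Two nodes left: N-2 = 0, so they are simply connected.
    a, b = d
    edges.append((a, b, d[a][b]))
    return edges


edges = neighbor_joining(LABELS, LOWER)
tip_lengths = {node: length for node, _, length in edges if node in LABELS}
print(tip_lengths)
```

The tip branch lengths do not depend on how ties are broken for this data set, so they can be compared directly with the worked example.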
Advantages
● Is fast and thus suited for large datasets.
● Permits lineages with largely different branch lengths.
● Permits correction for multiple substitutions.
Disadvantages
● Sequence information is reduced.
● Gives only one possible tree.
● Strongly dependent on the model of evolution used.
Thank
You!!
