Sequence analysis

Concepts of Dendrogram: Neighbor Joining
Submitted to: Dr. Arti Sharma
Submitted by: Gaurav
Registration no.: 19mslsbf03

Phylogenetics
• Phylogenetics is the study of evolutionary relationships.
• Phylogenetic analysis is the means of inferring or estimating
these relationships.
• The evolutionary history inferred from phylogenetic analysis is
usually depicted as branching, treelike diagrams that represent
an estimated pedigree of the inherited relationships among
molecules ("gene trees"), organisms, or both.
• Phylogenetics is sometimes called cladistics because the word
"clade," a set of descendants from a single ancestor, is derived
from the Greek word for branch.

What is NJ Method?
● The neighbor-joining (NJ) method reconstructs phylogenetic
trees from evolutionary distance data.
● The NJ method was developed by Saitou and Nei (1987).
● The principle of this method is to find pairs of operational
taxonomic units (OTUs), called neighbors, that minimize the total
branch length at each stage of clustering, starting with
a starlike tree.
● The input is a matrix of pairwise distances among n taxa.
● The output is an unrooted tree with branch lengths.

Because B and D have accumulated mutations at a higher rate than A,
the three-point criterion is violated. The UPGMA method cannot be used here, since
it would group together A and C rather than A and B.
In such a case the neighbor-joining method is one of the recommended methods.
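The violation can be checked directly. The sketch below assumes the usual statement of the three-point (ultrametric) condition: for any three OTUs, the two largest of their three pairwise distances must be equal. It uses the six-OTU distances of this example; the names `D`, `dist` and `violates_three_point` are illustrative only.

```python
from itertools import combinations

# Pairwise distances of the six-OTU example.
D = {
    ("A", "B"): 5,
    ("A", "C"): 4, ("B", "C"): 7,
    ("A", "D"): 7, ("B", "D"): 10, ("C", "D"): 7,
    ("A", "E"): 6, ("B", "E"): 9, ("C", "E"): 6, ("D", "E"): 5,
    ("A", "F"): 8, ("B", "F"): 11, ("C", "F"): 8, ("D", "F"): 9, ("E", "F"): 8,
}


def dist(x, y):
    """Look up a distance regardless of the order of the two OTUs."""
    return D.get((x, y), D.get((y, x)))


def violates_three_point(x, y, z):
    """True if the two largest of the three pairwise distances differ."""
    _, middle, largest = sorted([dist(x, y), dist(x, z), dist(y, z)])
    return middle != largest


violations = [t for t in combinations("ABCDEF", 3) if violates_three_point(*t)]
print(("A", "B", "C") in violations)  # the triple A, B, C has distances 4, 5, 7
```

For the triple A, B, C the two largest distances are 5 and 7, so the condition fails and a clock-based method such as UPGMA is not appropriate for these data.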

Raw data of the tree (pairwise distances). We have in total 6 OTUs (N=6).

     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8

[Figure: tree by UPGMA method, with leaves C, A, B, F, E, D]

Step 1: We calculate the net divergence r(i) of each OTU,
the sum of its distances to all other OTUs.
r(A) = 5+4+7+6+8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
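As a quick check of Step 1, the net divergences can be recomputed from the raw distance matrix. This is a minimal sketch; the variable names (`taxa`, `lower`, `d`, `r`) are illustrative and not part of the original slides.

```python
taxa = ["A", "B", "C", "D", "E", "F"]

# Lower triangle of the raw distance matrix, one row per OTU from B to F.
lower = [
    [5],
    [4, 7],
    [7, 10, 7],
    [6, 9, 6, 5],
    [8, 11, 8, 9, 8],
]

# Expand the triangle into a symmetric lookup d[x][y].
d = {t: {} for t in taxa}
for i, row in enumerate(lower, start=1):
    for j, value in enumerate(row):
        d[taxa[i]][taxa[j]] = value
        d[taxa[j]][taxa[i]] = value

# Net divergence: sum of distances from each OTU to all the others.
r = {t: sum(d[t].values()) for t in taxa}
print(r)
```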

Step 2: Now we calculate a new distance matrix, using for
each pair of OTUs the formula:
M(ij) = d(ij) - [r(i) + r(j)]/(N-2)
or, in the case of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)]/(N-2) = 5 - (30 + 42)/4 = -13

      A       B       C       D       E
B   -13
C   -11.5   -11.5
D   -10     -10     -10.5
E   -10     -10     -10.5   -13
F   -10.5   -10.5   -11     -11.5   -11.5
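The M matrix of Step 2 can be reproduced with a few lines. This is a sketch under the same formula, M(ij) = d(ij) - [r(i) + r(j)]/(N-2); the dictionary layout and names are assumptions made for illustration.

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E", "F"]
lower = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]

# Symmetric distance lookup d[x][y] built from the lower triangle.
d = {t: {} for t in taxa}
for i, row in enumerate(lower, start=1):
    for j, value in enumerate(row):
        d[taxa[i]][taxa[j]] = d[taxa[j]][taxa[i]] = value

n = len(taxa)
r = {t: sum(d[t].values()) for t in taxa}  # net divergences from Step 1

# Rate-corrected matrix: one entry per unordered pair of OTUs.
M = {
    (i, j): d[i][j] - (r[i] + r[j]) / (n - 2)
    for i, j in combinations(taxa, 2)
}

smallest = min(M.values())
candidates = [pair for pair, value in M.items() if value == smallest]
print(smallest, candidates)
```

The list `candidates` holds every pair that reaches the minimum, which makes ties visible before a pair is chosen.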

Step 3: Now we choose as neighbors the two OTUs for which M(ij) is the
smallest. Two pairs tie at -13: A and B, and D and E.
Let's take A and B as neighbors and form a new node
called U1. Now we calculate the branch lengths from
the internal node U1 to the external OTUs A and B.
S(AU1) = d(AB)/2 + [r(A) - r(B)]/[2(N-2)] = 5/2 + (30 - 42)/8 = 1
S(BU1) = d(AB) - S(AU1) = 5 - 1 = 4
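The two branch lengths of Step 3 follow from a single formula, sketched here as a small helper. The function name `branch_lengths` and its parameter names are hypothetical.

```python
def branch_lengths(d_ij, r_i, r_j, n):
    """Branch lengths from OTUs i and j to their new internal node.

    d_ij: distance between i and j
    r_i, r_j: net divergences of i and j
    n: number of OTUs in the current matrix
    """
    s_i = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    s_j = d_ij - s_i
    return s_i, s_j


# Pair A,B of the example: d(AB) = 5, r(A) = 30, r(B) = 42, N = 6.
s_a, s_b = branch_lengths(d_ij=5, r_i=30, r_j=42, n=6)
print(s_a, s_b)
```

The OTU with the larger net divergence (here B) receives the longer branch, which is how the method accommodates unequal rates.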

Step 4: Now we define new distances from U1 to
each other terminal node:
d(CU1) = [d(AC) + d(BC) - d(AB)]/2 = 3
d(DU1) = [d(AD) + d(BD) - d(AB)]/2 = 6
d(EU1) = [d(AE) + d(BE) - d(AB)]/2 = 5
d(FU1) = [d(AF) + d(BF) - d(AB)]/2 = 7
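The update of Step 4 can be sketched as a helper that returns the distance from the new node to every remaining OTU. The name `distances_to_new_node` is illustrative, not taken from the slides.

```python
taxa = ["A", "B", "C", "D", "E", "F"]
lower = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]

# Symmetric distance lookup d[x][y] built from the lower triangle.
d = {t: {} for t in taxa}
for i, row in enumerate(lower, start=1):
    for j, value in enumerate(row):
        d[taxa[i]][taxa[j]] = d[taxa[j]][taxa[i]] = value


def distances_to_new_node(d, i, j):
    """Distance from the node joining i and j to each other OTU k."""
    return {
        k: (d[i][k] + d[j][k] - d[i][j]) / 2
        for k in d
        if k not in (i, j)
    }


to_u1 = distances_to_new_node(d, "A", "B")
print(to_u1)
```

Replacing the rows and columns of A and B by this single U1 row gives the reduced matrix with N = 5 that the next round starts from.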

Replacing A and B by U1 gives a reduced matrix with 5 OTUs (N=5):

     U1   C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8

Second round (N=5):
1. r(U1) = 3+6+5+7 = 21
r(C) = 24
r(D) = 27
r(E) = 24
r(F) = 32
2. M(ij) = d(ij) - [r(i) + r(j)]/(N-2)

      U1       C        D        E
C   -12
D   -10      -10
E   -10      -10      -12
F   -10.67   -10.67   -10.67   -10.67

3. The pairs U1,C and D,E tie at -12. We take D and E as neighbors
and form the node U2.
S(DU2) = d(DE)/2 + [r(D) - r(E)]/[2(N-2)] = 3
S(EU2) = d(DE) - S(DU2) = 2
4. d(CU2) = [d(DC) + d(EC) - d(DE)]/2 = 4
d(U1U2) = [d(DU1) + d(EU1) - d(DE)]/2 = 3
d(FU2) = [d(DF) + d(EF) - d(DE)]/2 = 6

     U1   C    U2
C    3
U2   3    4
F    7    8    6

Third round (N=4):
1. r(U1) = 3+3+7 = 13
r(C) = 15
r(U2) = 13
r(F) = 21
2. M(ij) = d(ij) - [r(i) + r(j)]/(N-2)

      U1    C     U2
C   -11
U2  -10   -10
F   -10   -10   -11

3. The pairs U1,C and U2,F tie at -11. We take C and U1 as neighbors
and form the node U3.
S(CU3) = d(CU1)/2 + [r(C) - r(U1)]/[2(N-2)] = 2
S(U1U3) = d(CU1) - S(CU3) = 1
4. d(U2U3) = [d(CU2) + d(U1U2) - d(CU1)]/2 = 2
d(FU3) = [d(CF) + d(U1F) - d(CU1)]/2 = 6

     U2   U3
U3   2
F    6    6

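With three nodes left, U2, U3 and F (N=3):
1. r(U2) = 2+6 = 8
r(U3) = 2+6 = 8
r(F) = 6+6 = 12
2. M(ij) = d(ij) - [r(i) + r(j)]/(N-2)

      U2    U3
U3  -14
F   -14   -14

With N=3 every pair has the same M value, so any pair may be joined.
We take F and U2 as neighbors and form the node U4.
3. S(FU4) = d(FU2)/2 + [r(F) - r(U2)]/[2(N-2)] = 5
S(U2U4) = d(U2F) - S(FU4) = 1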

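Only two nodes now remain, U3 and U4. Here, N-2=0,
so the M(ij) formula can no longer be evaluated.
For the last pair, connect U3 and U4 with branch length
d(U3U4) = [d(U2U3) + d(FU3) - d(U2F)]/2 = (2 + 6 - 6)/2 = 1.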
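The whole procedure can be put together in one loop. The following is a minimal sketch of the steps above, not a reference implementation: ties in M(ij) are broken by taking the first pair found, so the internal nodes may be numbered differently from the slides, and the function name `neighbor_joining` and the `frozenset` keys are choices made for this example.

```python
from itertools import combinations


def neighbor_joining(taxa, d):
    """Return the tree as a list of (node, node, branch_length) edges.

    d maps frozenset({x, y}) to the distance between x and y.
    """
    nodes = list(taxa)
    d = dict(d)  # work on a copy of the distances
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: net divergence of every current node.
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Steps 2 and 3: pair with the smallest M(ij); first pair wins a tie.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2),
        )
        d_ij = d[frozenset((i, j))]
        s_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        s_j = d_ij - s_i
        u = f"U{next_id}"
        next_id += 1
        edges += [(i, u, s_i), (j, u, s_j)]
        # Step 4: distances from the new node to the remaining nodes.
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((k, u))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
            ) / 2
        nodes = rest + [u]
    # Last pair: connect the two remaining nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


taxa = ["A", "B", "C", "D", "E", "F"]
lower = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]
d = {
    frozenset((taxa[i], taxa[j])): value
    for i, row in enumerate(lower, start=1)
    for j, value in enumerate(row)
}

edges = neighbor_joining(taxa, d)
for a, b, length in edges:
    print(a, b, length)
```

Whichever tied pair is joined first, the terminal branch lengths should agree with the worked example (A 1, B 4, C 2, D 3, E 2, F 5), since the example distances fit a tree exactly.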

Advantages
● Is fast and thus suited for large datasets.
● Permits lineages with largely different branch lengths.
● Permits correction for multiple substitutions.
Disadvantages
● Sequence information is reduced.
● Gives only one possible tree.
● Strongly dependent on the model of evolution used.
