2. What is the Needleman-Wunsch algorithm?
The Needleman–Wunsch algorithm is used in bioinformatics to align
protein or nucleotide sequences.
It performs a global alignment of two sequences.
The algorithm was developed by Saul B. Needleman and Christian D.
Wunsch and published in 1970.
It is an example of dynamic programming, and it was one of the first
applications of dynamic programming to compare biological sequences.
3. Even for relatively short sequences, there are many possible alignments.
Assessing each alignment one by one to find the best one would take a
long time.
The Needleman-Wunsch algorithm saves us the trouble of assessing all the
possible alignments to find the best one.
The N-W algorithm takes time proportional to n² to find the best alignment
of two sequences that are both n letters long.
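To get a feel for "lots of possible alignments", the sketch below counts them and compares the count with the number of boxes in the N-W matrix. It assumes every path through the matrix counts as one alignment; the function name count_alignments is made up for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Count global alignments of two sequences of lengths m and n.

    Assumption: each distinct path through the matrix is one alignment.
    """
    if m == 0 or n == 0:
        return 1  # only gaps are left on one side
    # The last column is letter/letter, letter/gap, or gap/letter.
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))


for n in (2, 4, 8, 16):
    # sequence length, number of alignments, number of matrix boxes
    print(n, count_alignments(n, n), (n + 1) ** 2)
```

For two sequences of 4 letters this gives 321 alignments but only 25 boxes to fill, and the gap widens quickly as n grows.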
4. Alignment methods
Alignment: arranging DNA, RNA, or protein sequences to identify
similarities.
2 types:
Global and local sequence alignment methods
Global: Needleman-Wunsch algorithm
Local: Smith-Waterman algorithm
These two dynamic programming alignment algorithms are guaranteed to
give OPTIMAL alignments for a given scoring scheme.
5. Goals of sequence alignment
Measure the similarity
Observe patterns of sequence conservation between related biological
species and variability of sequences over time.
Infer evolutionary relationships.
8. RULES
Put a gap (-) in front of each sequence.
Fill the first column and last row with cumulative gap values.
For every other box, take the largest of these three values:
Value of box beside + gap value
Value of box bottom + gap value
Diagonal value + {match/mismatch}
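The rule for a single box can be sketched as a small function. This is only an illustration: the name box_value is hypothetical, and the parameter values (match +1, mismatch -1, gap -2) are the ones used in the example slides.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # values taken from the example slides


def box_value(beside: int, bottom: int, diagonal: int, a: str, b: str) -> int:
    """Score of one box from its three already-filled neighbours."""
    substitution = MATCH if a == b else MISMATCH
    return max(beside + GAP, bottom + GAP, diagonal + substitution)


# First inner box of the GATC / GAGC example: G against G.
print(box_value(beside=-2, bottom=-2, diagonal=0, a="G", b="G"))  # 1
```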
9. Let's see an example…
Two sequences will be aligned:
GATC (sequence 1)
GAGC (sequence 2)
11. Matrix Fill
Match = +1; mismatch = -1; gap = -2
Fill the first column and last row with gap values.
We get each value by adding the gap value to the value in the box beside it.
C
G
A
G
-    0  -2  -4
     -   G   A   T   C
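The gap row and gap column are just multiples of the gap value. A minimal sketch, with variable names chosen only for this illustration:

```python
GAP = -2
seq1 = "GATC"  # written along the bottom of the matrix
seq2 = "GAGC"  # written up the side of the matrix

# Each box is the box beside (or below) it plus one more gap.
gap_row = [j * GAP for j in range(len(seq1) + 1)]
gap_column = [i * GAP for i in range(len(seq2) + 1)]

print(gap_row)     # [0, -2, -4, -6, -8]
print(gap_column)  # [0, -2, -4, -6, -8]
```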
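13. Scoring
Parameters: match = +1; mismatch = -1; gap = -2
Value of box beside + gap value
Value of box bottom + gap value
Diagonal value + {match/mismatch}
C   -8
G   -6
A   -4
G   -2   1
-    0  -2  -4  -6  -8
     -   G   A   T   C
First box (G against G): beside -2 + (-2) = -4; bottom -2 + (-2) = -4;
diagonal 0 + 1 = +1. The largest value, 1, goes in the box.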
15. Continuing the procedure…
Match = +1; mismatch = -1; gap = -2
C   -8  -5  -2  -1   2
G   -6  -3   0   1  -1
A   -4  -1   2   0  -2
G   -2   1  -1  -3  -5
-    0  -2  -4  -6  -8
     -   G   A   T   C
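The whole fill procedure can be sketched in a few lines. This is an illustrative version, not the only way to write it; fill_matrix is a hypothetical name, and the matrix is stored with the gap row first and printed in reverse so it matches the layout on the slides.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return the score matrix; rows follow seq2, columns follow seq1."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):          # gap row
        score[0][j] = j * gap
    for i in range(1, rows):          # gap column
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # diagonal
                              score[i - 1][j] + gap,      # box below (slide layout)
                              score[i][j - 1] + gap)      # box beside
    return score


matrix = fill_matrix("GATC", "GAGC")
# Print the gap row last, as on the slides.
for label, row in zip(reversed("-GAGC"), reversed(matrix)):
    print(label, " ".join(f"{v:3d}" for v in row))
```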
16. Traceback Step
After scoring is done, the maximum global alignment score is in the last
box filled (the top-right corner here). It may be negative or positive.
The traceback step determines the actual alignment(s) that result in the
maximum score.
In this step we work back from that box toward the starting zero.
Since we have kept pointers to all the predecessors of each box, the
traceback step is simple.
17. We follow the pointers
Boxes on the traceback path are marked with [ ]; every step here is diagonal.
C   -8  -5  -2  -1  [2]
G   -6  -3   0  [1] -1
A   -4  -1  [2]  0  -2
G   -2  [1] -1  -3  -5
-   [0] -2  -4  -6  -8
     -   G   A   T   C
18. It's the optimal alignment
GAGC
GATC
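The fill and traceback steps together can be sketched as below. One assumption to note: instead of storing pointers while filling, this version re-checks at each box which neighbour could have produced its score, which gives the same path. When several neighbours qualify it prefers the diagonal, so it returns only one optimal alignment. The function name is hypothetical.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned seq1, aligned seq2) for one optimal alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: walk from the final box back to the starting zero.
    i, j = len(seq2), len(seq1)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(seq1[j - 1])      # diagonal: two letters aligned
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and score[i][j] == score[i][j - 1] + gap:
            top.append(seq1[j - 1])      # gap in seq2
            bottom.append("-")
            j -= 1
        else:
            top.append("-")              # gap in seq1
            bottom.append(seq2[i - 1])
            i -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATC", "GAGC"))  # (2, 'GATC', 'GAGC')
```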
19. Another example…
AGC and AACC
For the alignment we need to look at the pointers:
diagonal pointer = sequence letters aligned
horizontal or vertical pointer = gap
We get 4 optimal alignments, each scoring -1:
A-GC   AG-C   -AGC   AGC-
AACC   AACC   AACC   AACC
        A   G   C
    0  -2  -4  -6
A  -2   1  -1  -3
A  -4  -1   0  -2
C  -6  -3  -2   1
C  -8  -5  -4  -1
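When a box has more than one valid pointer, each branch leads to a different optimal alignment. The sketch below follows every branch and lists all alignments that reach the maximum score. It is an illustration under the same scoring parameters as above; the function name is hypothetical, and valid pointers are recomputed from the scores rather than stored.

```python
def all_optimal_alignments(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (score, list of (aligned seq1, aligned seq2)) for every optimum."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    results = []

    def walk(i, j, top, bottom):
        # top/bottom are built back to front and reversed at the end.
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            sub = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                walk(i - 1, j - 1, top + seq1[j - 1], bottom + seq2[i - 1])
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            walk(i, j - 1, top + seq1[j - 1], bottom + "-")
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            walk(i - 1, j, top + "-", bottom + seq2[i - 1])

    walk(len(seq2), len(seq1), "", "")
    return score[-1][-1], results


best, alignments = all_optimal_alignments("AGC", "AACC")
print(best)
for top, bottom in alignments:
    print(top, bottom)
```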
20. Checking..!
We can also check whether our alignment is right by scoring it
manually.
E.g.:
GAGC            A-GC
GATC            AACC
+1+1-1+1 = 2    +1-2-1+1 = -1
This score must equal the maximum score that the traceback started from.
If it does, the alignment is optimal.
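The manual check can also be sketched in code. The function name score_alignment is made up for this illustration, and the parameters are the same match/mismatch/gap values used throughout.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score an already-aligned pair of strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("GAGC", "GATC"))  # 2
print(score_alignment("A-GC", "AACC"))  # -1
```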