2. Presented By:
• Proshanta Kumar Shil
ID:141-15-3140
Section:B
Department of CSE
Daffodil International University
11-Dec-2015
3. What is the Needleman-Wunsch algorithm?
The Needleman–Wunsch algorithm is an algorithm used
in bioinformatics to align protein or nucleotide sequences.
• It was one of the first applications of dynamic programming to compare
biological sequences.
• The algorithm was developed by Saul B. Needleman and Christian D.
Wunsch and published in 1970.
4. Alignment methods
• Global and local sequence alignment methods
• Global : Needleman-Wunsch
• Local : Smith-Waterman
• Database Search
• BLAST
• FASTA
5. Goals of sequence databases
• To learn about a newly determined sequence by searching it against a database.
• To find the similarity of a unique sequence to another gene that
has a known function.
• To find the similarity of a new protein in a lower organism to a
protein from another species.
6. Alignment Algorithms
• Global : Needleman-Wunsch
• Local : Smith-Waterman
• These two dynamic programming alignment algorithms are
guaranteed to give OPTIMAL alignments
• But both take O(m*n) time, i.e. quadratic, for sequences of
lengths m and n
7. Needleman-Wunsch Method
• For example, the two hypothetical sequences
• abcdefghajklm
• abbdhijk
• could be aligned like this
• abcdefghajklm
• || |   | ||
• abbd...hijk
• As shown, there are 6 matches,
• 2 mismatches, and one gap of length 3.
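The counts on this slide can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as two strings with "." marking gap positions as on the slide; the function name summarize_alignment is hypothetical and not part of the original material.

```python
def summarize_alignment(top, bottom, gap_char="."):
    """Count matches, mismatches, and gap runs over the paired columns."""
    matches = 0
    mismatches = 0
    gap_lengths = []   # one entry per maximal run of gap columns
    in_gap = False
    # zip stops at the shorter string, so unpaired trailing letters are ignored
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            if in_gap:
                gap_lengths[-1] += 1
            else:
                gap_lengths.append(1)
                in_gap = True
        else:
            in_gap = False
            if a == b:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gap_lengths


print(summarize_alignment("abcdefghajklm", "abbd...hijk"))  # (6, 2, [3])
```

The trailing "lm" of the first sequence has no partner in the second one, so this sketch leaves it out of the counts, which matches the figures quoted on the slide.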
8. Needleman-Wunsch Method
• The alignment is scored according to a payoff matrix
• $payoff = { match      => $match,
              mismatch   => $mismatch,
              gap_open   => $gap_open,
              gap_extend => $gap_extend };
• For correct operation, match must be positive, and the other
entries must be negative.
9. Needleman-Wunsch Method
• Given the payoff matrix
• $payoff = { match      => 4,
              mismatch   => -3,
              gap_open   => -2,
              gap_extend => -1 };
10. Needleman-Wunsch Method
• The sequences
• abcdefghajklm
• abbdhijk
• are aligned and scored like this
•             a  b  c  d  e  f  g  h  a  j  k  l  m
•             |  |     |           |     |  |
•             a  b  b  d  .  .  .  h  i  j  k
• match       4  4     4           4     4  4
• mismatch          -3                -3
• gap_open                -2
• gap_extend              -1 -1 -1
• for a total score of 24-6-2-3 = 13.
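The arithmetic above can be reproduced with a short sketch. It assumes, as the tally on this slide suggests, that a gap run pays gap_open once and gap_extend for every gap column, and that only paired columns are scored (the trailing "lm" contributes nothing). The function name score_alignment is hypothetical, and adjacent gaps in either sequence are treated as a single run, which is a simplification.

```python
def score_alignment(top, bottom, payoff, gap_char="."):
    """Score an existing alignment with match/mismatch/gap_open/gap_extend payoffs."""
    total = 0
    in_gap = False
    for a, b in zip(top, bottom):   # only paired columns are scored
        if a == gap_char or b == gap_char:
            if not in_gap:
                total += payoff["gap_open"]    # paid once per gap run
                in_gap = True
            total += payoff["gap_extend"]      # paid for every gap column
        else:
            in_gap = False
            total += payoff["match"] if a == b else payoff["mismatch"]
    return total


payoff = {"match": 4, "mismatch": -3, "gap_open": -2, "gap_extend": -1}
print(score_alignment("abcdefghajklm", "abbd...hijk", payoff))  # 13
```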
13. Fill in the Table
          A    G    C

A
A
C
C
Two sequences will be aligned.
AGC (sequence #1)
AACC (sequence #2)
A simple scoring scheme will be used
14. Initialization step:
Create a matrix with M + 1 columns and N + 1 rows, where M and N are
the lengths of sequence #1 and sequence #2.
Scoring: match = +1; mismatch = -1; gap = -2
          A    G    C
     0   -2   -4   -6
A   -2
A   -4
C   -6
C   -8
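The initialization step can be sketched as follows. This is an illustrative sketch, not the presenter's code: the function name init_matrix is hypothetical, rows are assumed to follow sequence #2 and columns sequence #1 as in the table, and cells that are not yet filled are held as None.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Build the (N + 1) x (M + 1) table with the first row and column filled."""
    cols = len(seq1) + 1   # M + 1 columns, one per prefix of sequence #1
    rows = len(seq2) + 1   # N + 1 rows, one per prefix of sequence #2
    matrix = [[None] * cols for _ in range(rows)]
    for j in range(cols):
        matrix[0][j] = j * gap   # a prefix of sequence #1 aligned only to gaps
    for i in range(rows):
        matrix[i][0] = i * gap   # a prefix of sequence #2 aligned only to gaps
    return matrix


for row in init_matrix("AGC", "AACC"):
    print(row)
```

The first row and column hold multiples of the gap penalty because aligning a prefix against nothing requires one gap per character.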
15. Fill in the Table
          A    G    C
     0   -2   -4   -6
A   -2    1
A   -4
C   -6
C   -8
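Matrix fill step: Each position M_{i,j} is defined to be the
MAXIMUM score at position i,j:
M_{i,j} = \max [
    M_{i-1,j-1} + s_{i,j}   (match or mismatch in the diagonal)
    M_{i,j-1} + w           (gap in sequence #1)
    M_{i-1,j} + w           (gap in sequence #2) ]
where s_{i,j} is the match or mismatch score for the two letters at
position i,j and w is the gap penalty.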
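The fill step can be sketched in a few lines. This is a minimal sketch under the simple scoring scheme used here (match = +1, mismatch = -1, gap = -2); the function name fill_matrix is hypothetical, and rows are assumed to follow sequence #2 and columns sequence #1, as in the table.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the whole score table row by row using the three-way maximum."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    m = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        m[0][j] = j * gap
    for i in range(1, rows):
        m[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            m[i][j] = max(
                m[i - 1][j - 1] + s,   # diagonal: match or mismatch
                m[i][j - 1] + gap,     # from the left: a gap is introduced
                m[i - 1][j] + gap,     # from above: a gap is introduced
            )
    return m


table = fill_matrix("AGC", "AACC")
for row in table:
    print(row)
print("global alignment score:", table[-1][-1])
```

The bottom-right cell holds the score of the best global alignment, because it covers both sequences in full.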
17. Traceback step:
Start at the bottom-right cell; at each current cell, look at its
direct predecessors (diagonal, left, and above).
          A    G    C
     0   -2   -4   -6
A   -2    1   -1   -3
A   -4   -1    0   -2
C   -6   -3   -2    1
C   -8   -5   -4   -1
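18. Traceback step:
Position at the current cell and look at its direct predecessors.
          A    G    C
     0   -2   -4   -6
A   -2    1   -1   -3
A   -4   -1    0   -2
C   -6   -3   -2    1
C   -8   -5   -4   -1
Following every predecessor that produces the current cell's score
gives the optimal alignments, each with a score of -1:
AG-C
AACC

-AGC
AACC

A-GC
AACC

AGC-
AACC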
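The traceback can be sketched as a recursive walk from the bottom-right cell back to the top-left. This is an illustrative sketch, not the presenter's code: the function name needleman_wunsch_all is hypothetical, the scoring defaults are the simple scheme from the initialization slide, and the walk follows every predecessor that reproduces a cell's score, so it enumerates all co-optimal alignments rather than just one.

```python
def needleman_wunsch_all(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return the optimal score and every alignment that achieves it."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    m = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        m[0][j] = j * gap
    for i in range(1, rows):
        m[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1] + s, m[i][j - 1] + gap, m[i - 1][j] + gap)

    def walk(i, j, top, bottom):
        # top/bottom hold the part of the alignment to the right of cell (i, j)
        if i == 0 and j == 0:
            yield top, bottom
            return
        if i > 0 and j > 0:
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if m[i][j] == m[i - 1][j - 1] + s:      # came from the diagonal
                yield from walk(i - 1, j - 1, seq1[j - 1] + top, seq2[i - 1] + bottom)
        if j > 0 and m[i][j] == m[i][j - 1] + gap:  # came from the left
            yield from walk(i, j - 1, seq1[j - 1] + top, "-" + bottom)
        if i > 0 and m[i][j] == m[i - 1][j] + gap:  # came from above
            yield from walk(i - 1, j, "-" + top, seq2[i - 1] + bottom)

    return m[-1][-1], sorted(walk(rows - 1, cols - 1, "", ""))


score, alignments = needleman_wunsch_all("AGC", "AACC")
print("score:", score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

The number of co-optimal alignments can grow quickly for long sequences, so practical tools usually report a single traceback path; enumerating all of them is only reasonable for small examples like this one.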
19. Summary
• The algorithm essentially divides a large problem into a series of
smaller problems and uses the solutions to the smaller problems to
reconstruct a solution to the larger problem.
• It is also sometimes referred to as the optimal matching algorithm
and the global alignment technique.
• The Needleman–Wunsch algorithm is still widely used for optimal
global alignment, particularly when the quality of the global
alignment is of the utmost importance.
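The first point of the summary, solving small problems and reusing their answers, can be illustrated with a top-down variant. This is a sketch only: the function name global_score is hypothetical, the scoring defaults are the simple scheme used in the worked example, and memoization via functools.lru_cache stands in for the explicit table. It computes the score only, not the alignment, and is suited to short sequences because of Python's recursion limit.

```python
from functools import lru_cache


def global_score(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score via memoized recursion on prefixes."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # best score for aligning the first i letters of seq1
        # with the first j letters of seq2
        if i == 0:
            return j * gap
        if j == 0:
            return i * gap
        s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        return max(
            best(i - 1, j - 1) + s,   # align the two last letters
            best(i - 1, j) + gap,     # last letter of seq1 against a gap
            best(i, j - 1) + gap,     # last letter of seq2 against a gap
        )

    return best(len(seq1), len(seq2))


print(global_score("AGC", "AACC"))  # -1
```

Each (i, j) pair is solved once and cached, so the work is still proportional to the number of table cells.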