4. 4
Why align sequences?
• Useful for discovering
• Functional,
• Structural, and
• Evolutionary relationships
– For example
• To find whether two (or more) genes or proteins are
evolutionarily related to each other
• Two proteins with similar sequences will probably be
structurally or functionally similar
6. 6
Global Vs Local Alignment
• Global Alignment
– A general global alignment technique is the Needleman–Wunsch
algorithm, which is based on dynamic programming.
– Attempts to align every residue across the entire length of both sequences
– Suitable for similar sequences of roughly equal length
• Local Alignment
– Local alignments are more useful for
dissimilar sequences that are suspected to contain regions of similarity or
similar sequence motifs within their larger sequence context.
– Stretches of sequence with the highest density of matches are aligned
– Suitable for sequences that are partially similar, differ in length, or contain conserved regions
9. 9
The Needleman–Wunsch algorithm obtains the optimal global alignment under a linear gap cost by assigning a score to each position of the aligned sequences.
It is based on the dynamic programming technique.
For two sequences of lengths m and n, we define a matrix of dimensions (m+1) × (n+1).
Global Alignment
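The recurrence behind this matrix can be written compactly. The notation is an assumption made here for illustration and does not come from the slides: F(i,j) is the best score for aligning the first i characters of S with the first j characters of T, s(a,b) is the match/mismatch score, and d is the (negative) linear gap penalty.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d,\\
F(i,j) &= \max
\begin{cases}
F(i-1,j-1) + s(S_i, T_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\qquad 1 \le i \le m,\ 1 \le j \le n.
\end{aligned}
```

Under this notation the optimal global alignment score is F(m,n), the bottom-right cell of the matrix.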
10. 10
Global Alignment
Three steps in dynamic programming
Initialization
Matrix fill (scoring)
Traceback (alignment)
The same three steps are used by the Smith–Waterman algorithm for local alignment
11. 11
Sequences:
S: ATTATCT
T: TTTCTA
S/T   _   T   T   T   C   T   A
_     0  -1  -2  -3  -4  -5  -6
A    -1   0  -1  -2  -3  -4  -3
T    -2   1   2   1   0  -1  -2
T    -3   0   3   4   3   2   1
A    -4  -1   2   3   4   3   4
T    -5  -2   1   4   3   6   5
C    -6  -3   0   3   6   5   6
T    -7  -4  -1   2   5   8   7
Match Score = +2
Mismatch Score = 0
Gap Penalty = -1
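The three steps (initialization, matrix fill, traceback) can be sketched in Python for this example. This is a minimal sketch and not code from the slides: the function name `needleman_wunsch` and its parameter names are assumptions chosen for illustration. Ties in the traceback are broken in the order diagonal, up, left, so other equally optimal alignments may exist.

```python
def needleman_wunsch(s, t, match=2, mismatch=0, gap=-1):
    """Global alignment sketch; returns (score, aligned_s, aligned_t)."""
    m, n = len(s), len(t)

    # Initialization: first row and column hold cumulative gap penalties.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap

    # Matrix fill: each cell takes the best of diagonal, up and left.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the top-left.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1]
                + (match if s[i - 1] == t[j - 1] else mismatch)):
            a.append(s[i - 1]); b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")   # gap in T
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])   # gap in S
            j -= 1
    return F[m][n], "".join(reversed(a)), "".join(reversed(b))


score, aligned_s, aligned_t = needleman_wunsch("ATTATCT", "TTTCTA")
print(score)
print(aligned_s)
print(aligned_t)
```

For S = ATTATCT and T = TTTCTA the returned score is 7, the value in the bottom-right cell of the matrix.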
16. 16
S/T  0  A  T  G  A  T  G  T  A  G
0    0  0  0  0  0  0  0  0  0  0
G    0  0  0
A    0
G    0
A    0
T    0
G    0
T    0
G    0
C    0
Match : 2, Mismatch : -1, Gap : -2
Match cell (G, G):
diagonal: 0 + 2 = 2
up: 0 + -2 = -2
left: 0 + -2 = -2
score: max(0, 2, -2, -2) = 2
Mismatch cell (e.g. G, A):
diagonal: 0 + -1 = -1
up: 0 + -2 = -2
left: 0 + -2 = -2
score: max(0, -1, -2, -2) = 0
Matrix fill (scoring)
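The scoring of a single cell can be sketched as a small Python helper. It assumes the local-alignment rule in which a negative result is reset to 0, as the zero-filled first row and column suggest. The name `sw_cell` and its arguments are hypothetical and chosen only for this illustration.

```python
def sw_cell(diag, up, left, a, b, match=2, mismatch=-1, gap=-2):
    """Score one cell from its three neighbours (local-alignment rule)."""
    substitution = match if a == b else mismatch
    candidates = {
        "diagonal": diag + substitution,  # align a with b
        "up": up + gap,                   # gap in the column sequence
        "left": left + gap,               # gap in the row sequence
    }
    # A local alignment may restart anywhere, so the score never drops below 0.
    return max(0, *candidates.values()), candidates


# Match cell: all three neighbours are 0 and the characters agree.
print(sw_cell(0, 0, 0, "G", "G"))
# Mismatch cell: every candidate is negative, so the score is reset to 0.
print(sw_cell(0, 0, 0, "G", "A"))
```

The two calls correspond to the two worked cells: the match yields 2 and the mismatch yields 0.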
17. 17
S/T  0  A  T  G  A  T  G  T  A  G
0    0  0  0  0  0  0  0  0  0  0
G    0  0  0  2  0  0  2  0  0  2
A    0  2  0  0  4  2  0  1  2  0
G    0  0  1  2  2  3  4  2  0  4
A    0  2  0  0  4  2  2  3  4  2
T    0  0  4  2  2  6  4  4  2  3
G    0  0  2  6  4  4  8  6  4  4
T    0  0  2  4  5  6  6 10  8  6
G    0  0  0  4  3  4  8  8  9 10
C    0  0  0  2  3  2  6  7  7  8
Match : 2, Mismatch : -1, Gap : -2
Matrix fill (scoring)
18. Trace back
18
S/T  0  A  T  G  A  T  G  T  A  G
0    0  0  0  0  0  0  0  0  0  0
G    0  0  0  2  0  0  2  0  0  2
A    0  2  0  0  4  2  0  1  2  0
G    0  0  1  2  2  3  4  2  0  4
A    0  2  0  0  4  2  2  3  4  2
T    0  0  4  2  2  6  4  4  2  3
G    0  0  2  6  4  4  8  6  4  4
T    0  0  2  4  5  6  6 10  8  6
G    0  0  0  4  3  4  8  8  9 10
C    0  0  0  2  3  2  6  7  7  8
Traceback starts at a cell holding the highest score, 10, and follows the best-scoring predecessors until a cell with score 0 is reached.
Match : 2, Mismatch : -1, Gap : -2
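Matrix fill and traceback for this local alignment can be sketched together in Python. This is a minimal sketch under stated assumptions: it takes the row sequence to be GAGATGTGC and the column sequence to be ATGATGTAG, as read from the matrix labels. The function name `smith_waterman` is hypothetical. The sketch starts from the first maximum found in row-major order and reports only that one alignment, even when several cells share the maximum.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Local alignment sketch; returns (score, aligned_s, aligned_t)."""
    m, n = len(s), len(t)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Matrix fill: like the global case, but scores are floored at 0.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:                 # keep the first maximum found
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the maximum and stop at the first 0.
    a, b = [], []
    i, j = best_pos
    while H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")  # gap in t
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])  # gap in s
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


print(smith_waterman("GAGATGTGC", "ATGATGTAG"))  # (10, 'GATGT', 'GATGT')
```

Starting from the other cell that holds 10 gives a different alignment with the same score, one that contains a single gap.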
19. Alignment
19
G A T G T A G
| | | | |   |
G A T G T - G
2 2 2 2 2 -2 2
6 x 2 = 12
1 x -2 = -2
Score = 12 + (-2) = 10

G A T G T
| | | | |
G A T G T
2 2 2 2 2
5 x 2 = 10
Score = 10
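The column-by-column scoring of a finished alignment can be checked with a few lines of Python. This is a minimal sketch: the helper name `alignment_score` is hypothetical, and "-" is assumed to mark a gap.

```python
def alignment_score(top, bottom, match=2, mismatch=-1, gap=-2):
    """Sum the per-column scores of two equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # one sequence has a gap in this column
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("GATGTAG", "GATGT-G"))  # 6 matches, 1 gap -> 10
print(alignment_score("GATGT", "GATGT"))      # 5 matches -> 10
```

Both alignments reach the same score of 10, so either is an optimal local alignment under this scoring scheme.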