# Dynamic programming

1. About optimal sequence alignment: a short glimpse into bioinformatics (April 24, 2010)
2. Pairwise sequence alignment

    Assumptions:
    - sequences S1 and S2 are homologous, that is, they share a common ancestor;
    - differences between them are due to only two kinds of events, substitutions and insertion-deletions.

    Strategy:
    - choose a scoring matrix (reward for a match, penalty for a mismatch and for a gap);
    - compute the edit distance (number of matches, mismatches and gaps) to go from one sequence to the other;
    - keep the alignment with the highest score.

3. Needleman-Wunsch algorithm

    Aim: find the optimal global alignment of sequences S1 and S2.

    Recursion rule:

    $$
    D(i, j) = \max
    \begin{cases}
    D(i-1, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
    D(i-1, j) + \mathrm{gap} \\
    D(i, j-1) + \mathrm{gap}
    \end{cases}
    \tag{1}
    $$

    Scoring scheme: identity = 0, transition = -2, transversion = -5, gap = -10.

    Sequences: S1 = TTGT, S2 = CTAGG.

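
The scoring scheme and recursion rule (1) can be sketched in Python as follows. This is only an illustration: the helper name `cell` and the constant names are hypothetical, and the slides do not define "transition" and "transversion", so the usual convention is assumed (a transition swaps two purines, A and G, or two pyrimidines, C and T; any other substitution is a transversion).

```python
# Assumption: DNA alphabet only; purines are A/G, pyrimidines are C/T.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP = -10  # gap penalty from the slide


def score(a: str, b: str) -> int:
    """Substitution score: identity 0, transition -2, transversion -5."""
    if a == b:
        return 0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return -2 if same_class else -5


def cell(D, s1: str, s2: str, i: int, j: int) -> int:
    """Rule (1) for one interior cell; i and j are 1-based positions in s1 and s2."""
    return max(
        D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # align s1[i] with s2[j]
        D[i - 1][j] + GAP,                              # gap in s2
        D[i][j - 1] + GAP,                              # gap in s1
    )


# Top-left corner of the matrix for S1 = TTGT, S2 = CTAGG.
D = [[0, -10], [-10, None]]
first = cell(D, "TTGT", "CTAGG", 1, 1)
print(first)  # -2: T aligned with C is a transition
```
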
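4. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | G |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |

5. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | G |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |

6. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 |   |   |   |   |
    | T |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | G |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |

7. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 |   |   |   |
    | T |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | G |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |

8. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
    | G |   |   |   |   |   |   |
    | T |   |   |   |   |   |   |
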
9. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 |   |   |   |   |   |
    | T | -20 |   |   |   |   |   |
    | G | -30 |   |   |   |   |   |
    | T | -40 |   |   |   |   |   |

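
The first row and first column hold multiples of the gap penalty, since a prefix can only be aligned against nothing by inserting gaps. A minimal sketch of this initialisation, assuming a list-of-lists matrix with S1 along the rows and S2 along the columns (the function name `init_matrix` is hypothetical):

```python
GAP = -10  # gap penalty from the slide


def init_matrix(s1: str, s2: str, gap: int = GAP):
    """Create the (len(s1)+1) x (len(s2)+1) matrix with only the borders filled."""
    n, m = len(s1), len(s2)
    D = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = j * gap  # first row: j gaps
    for i in range(n + 1):
        D[i][0] = i * gap  # first column: i gaps
    return D


D = init_matrix("TTGT", "CTAGG")
print(D[0])                   # [0, -10, -20, -30, -40, -50]
print([row[0] for row in D])  # [0, -10, -20, -30, -40]
```

Interior cells are left as `None` here so that unfilled positions are easy to spot.
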
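10. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 |   |   |   |   |   |
    | T | -20 |   |   |   |   |   |
    | G | -30 |   |   |   |   |   |
    | T | -40 |   |   |   |   |   |

    Next cell to fill: D(1,1), computed from its neighbouring cells with rule (1).

11. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 | -2 |   |   |   |   |
    | T | -20 |   |   |   |   |   |
    | G | -30 |   |   |   |   |   |
    | T | -40 |   |   |   |   |   |

12. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 | -2 | -10 | -20 | -30 | -40 |
    | T | -20 |   |   |   |   |   |
    | G | -30 |   |   |   |   |   |
    | T | -40 |   |   |   |   |   |
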
13. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 | -2 | -10 | -20 | -30 | -40 |
    | T | -20 |   |   |   |   |   |
    | G | -30 |   |   |   |   |   |
    | T | -40 |   |   |   |   |   |

    The remaining cells are filled in the same way.

14. Fill the matrix

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 | -2 | -10 | -20 | -30 | -40 |
    | T | -20 | -12 | -2 | -12 | -22 | -32 |
    | G | -30 | -22 | -12 | -4 | -12 | -22 |
    | T | -40 | -32 | -22 | -14 | -9 | -17 |

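
The whole fill step can be reproduced with a short script. This is a sketch under the same assumptions as the slides' example (DNA alphabet, linear gap penalty, transitions taken to be A/G and C/T swaps); the function names `score` and `fill` are illustrative, not taken from the presentation.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score(a: str, b: str) -> int:
    """Identity 0, transition -2, transversion -5 (scheme from the slides)."""
    if a == b:
        return 0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return -2 if same_class else -5


def fill(s1: str, s2: str, gap: int = -10):
    """Fill the score matrix row by row; s1 indexes rows, s2 indexes columns."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        D[i][0] = i * gap
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # diagonal
                D[i - 1][j] + gap,                              # from above
                D[i][j - 1] + gap,                              # from the left
            )
    return D


D = fill("TTGT", "CTAGG")
for row in D:
    print(row)
# [0, -10, -20, -30, -40, -50]
# [-10, -2, -10, -20, -30, -40]
# [-20, -12, -2, -12, -22, -32]
# [-30, -22, -12, -4, -12, -22]
# [-40, -32, -22, -14, -9, -17]
```
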
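15. Traceback

    |   |   | C | T | A | G | G |
    |---|---|---|---|---|---|---|
    |   | 0 | -10 | -20 | -30 | -40 | -50 |
    | T | -10 | -2 | -10 | -20 | -30 | -40 |
    | T | -20 | -12 | -2 | -12 | -22 | -32 |
    | G | -30 | -22 | -12 | -4 | -12 | -22 |
    | T | -40 | -32 | -22 | -14 | -9 | -17 |

    Starting from the bottom-right cell, step back each time to the neighbour that produced the current value: D(4,5) = -17, D(3,4) = -12, D(2,3) = -12, D(2,2) = -2, D(1,1) = -2, D(0,0) = 0.
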
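
A traceback can be sketched as below. The code recomputes the matrix so that it runs on its own. The tie-breaking order (diagonal first, then the cell above, then the cell to the left) is an assumption, since the slides do not state one; for this example the optimal path is unique, so the order does not affect the result. Function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score(a, b):
    """Identity 0, transition -2, transversion -5 (scheme from the slides)."""
    if a == b:
        return 0
    return -2 if ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) else -5


def fill(s1, s2, gap=-10):
    """Score matrix with s1 along the rows and s2 along the columns."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        D[i][0] = i * gap
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    return D


def traceback(D, s1, s2, gap=-10):
    """Walk from the bottom-right cell to the top-left; return (aligned s2, aligned s1)."""
    i, j = len(s1), len(s2)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            top.append(s2[j - 1]); bottom.append(s1[i - 1])  # diagonal move
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append("-"); bottom.append(s1[i - 1])        # move up: gap in s2
            i -= 1
        else:
            top.append(s2[j - 1]); bottom.append("-")        # move left: gap in s1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


s1, s2 = "TTGT", "CTAGG"
aligned_s2, aligned_s1 = traceback(fill(s1, s2), s1, s2)
print(aligned_s2)  # CTAGG
print(aligned_s1)  # TT-GT
```
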
22. Output

    Plot the optimal alignment (`|` identity, `*` substitution, blank for a gap):

        CTAGG
        *| |*
        TT-GT

    Score: -17

    Complexity in time: O(nm). Complexity in memory: O(nm), where n and m are the lengths of S1 and S2.

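
As a cross-check, the score of the printed alignment can be recomputed column by column, which also yields the marker line. This is a sketch: the function name `describe` is hypothetical, and the marker symbols are inferred from the plotted alignment.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score(a, b):
    """Identity 0, transition -2, transversion -5 (scheme from the slides)."""
    if a == b:
        return 0
    return -2 if ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) else -5


def describe(top: str, bottom: str, gap: int = -10):
    """Return the marker line and the total score of a gapped alignment."""
    markers, total = [], 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            markers.append(" ")  # gap column
            total += gap
        else:
            markers.append("|" if a == b else "*")
            total += score(a, b)
    return "".join(markers), total


markers, total = describe("CTAGG", "TT-GT")
print(repr(markers))  # '*| |*'
print(total)          # -17 = (-2) + 0 + (-10) + 0 + (-5)
```
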
23. Acknowledgments: Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers; Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and Fickett; and many others. Want to know more? Start reading: http://lectures.molgen.mpg.de/online_lectures.html