Dynamic programming


  1. About optimal sequence alignment. A short glimpse into bioinformatics. April 24, 2010
  2. Pairwise sequence alignment. Assumptions: sequences S1 and S2 are homologous, i.e. they share a common ancestor; the differences between them are due to only two kinds of events, substitutions and insertions/deletions (indels). Strategy: choose a scoring scheme (reward for a match, penalty for a mismatch or a gap); compute the edit distance (counting matches, mismatches and gaps) needed to go from one sequence to the other; keep the alignment with the highest score.
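To make the "keep the alignment with the highest score" step concrete, here is a minimal sketch that scores two candidate alignments column by column. It assumes the scoring values and sequences given on the Needleman-Wunsch slide of this deck (identity 0, transition -2, transversion -5, gap -10) and a plain DNA alphabet; the names `column_score` and `alignment_score` are hypothetical helpers, not part of the original slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP = -10  # gap penalty, taken from the deck's scoring scheme


def column_score(x, y):
    """Score one alignment column; '-' denotes a gap."""
    if x == "-" or y == "-":
        return GAP
    if x == y:
        return 0  # identity
    # Transition: both purines or both pyrimidines; otherwise transversion.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return -2 if same_class else -5


def alignment_score(row1, row2):
    """Sum the column scores of two equal-length aligned strings."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(x, y) for x, y in zip(row1, row2))


# Two hand-written candidate alignments of CTAGG and TTGT.
candidates = [("CTAGG", "TT-GT"), ("CTAGG", "TTGT-")]
for top, bottom in candidates:
    print(top, bottom, alignment_score(top, bottom))
best = max(candidates, key=lambda pair: alignment_score(*pair))
```

Enumerating candidates by hand only works for toy cases; the number of possible alignments grows very quickly with sequence length, which is why a dynamic programming algorithm is used instead.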
  3. Needleman-Wunsch algorithm. Aim: find the optimal global alignment of sequences S1 and S2. Recursion rule:

     \[
     D(i, j) = \max
     \begin{cases}
       D(i-1, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
       D(i-1, j) + \mathrm{gap} \\
       D(i, j-1) + \mathrm{gap}
     \end{cases}
     \tag{1}
     \]

     Scoring scheme: identity = 0, transition = -2, transversion = -5, gap = -10. Sequences: S1 = TTGT, S2 = CTAGG.
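The recursion rule can be read as a function of the three neighboring cells. The sketch below is one possible rendering, assuming a DNA alphabet (A, C, G, T) and the scoring scheme of this slide; `substitution_score` and `cell_value` are hypothetical names chosen for illustration.

```python
def substitution_score(a, b):
    """Score for aligning nucleotide a with nucleotide b (assumes A, C, G, T only)."""
    if a == b:
        return 0  # identity
    purines = {"A", "G"}
    # A<->G and C<->T are transitions; any purine/pyrimidine swap is a transversion.
    is_transition = (a in purines) == (b in purines)
    return -2 if is_transition else -5


def cell_value(diag, up, left, a, b, gap=-10):
    """Apply the recursion rule to one cell given its three neighbors."""
    return max(
        diag + substitution_score(a, b),  # align a with b
        up + gap,                         # a aligned with a gap
        left + gap,                       # b aligned with a gap
    )


# First inner cell of the example: T (from S1) against C (from S2),
# with neighbors diag = 0, up = -10, left = -10.
print(cell_value(0, -10, -10, "T", "C"))
```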
  4. Fill the matrix. Start from an empty grid with S2 = CTAGG across the columns and S1 = TTGT down the rows.
  5. Fill the matrix. The top-left cell is set to 0.
  6. Fill the matrix. First row: 0 → -10.
  7. Fill the matrix. First row: 0 → -10 → -20.
  8. Fill the matrix. First row complete: 0 → -10 → -20 → -30 → -40 → -50.
  9. Fill the matrix. First row: 0 → -10 → -20 → -30 → -40 → -50. First column, going down the rows of S1: 0 ↓ -10 (T) ↓ -20 (T) ↓ -30 (G) ↓ -40 (T).
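The border values shown so far are just multiples of the gap penalty: reaching cell (0, j) or (i, 0) means aligning a prefix against nothing but gaps. A minimal sketch of that initialization, assuming the matrix is stored as a list of lists with S1 on the rows (`init_matrix` is a hypothetical name):

```python
def init_matrix(s1, s2, gap=-10):
    """Create a (len(s1)+1) x (len(s2)+1) matrix with the first row and column filled."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[None] * cols for _ in range(rows)]  # None marks cells not computed yet
    for j in range(cols):
        D[0][j] = j * gap  # first row: j gaps
    for i in range(rows):
        D[i][0] = i * gap  # first column: i gaps
    return D


D = init_matrix("TTGT", "CTAGG")
print(D[0])                   # first row
print([row[0] for row in D])  # first column
```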
  10. Fill the matrix. The first inner cell, D(1, 1) for T against C, is computed from its neighboring cells with recursion rule (1).
  11. Fill the matrix. D(1, 1) = -2: T/C is a transition, so the diagonal move gives 0 - 2 = -2, which beats -10 - 10 = -20 from either gap move.
  12. Fill the matrix. First row of S1 (T) complete: -10 → -2 → -10 → -20 → -30 → -40.
  13. Fill the matrix. The remaining rows (T, G, T) start from their first-column values -20, -30 and -40; their other cells are filled the same way, row by row.
  14. Fill the matrix. Completed matrix:

                 C     T     A     G     G
           0   -10   -20   -30   -40   -50
      T  -10    -2   -10   -20   -30   -40
      T  -20   -12    -2   -12   -22   -32
      G  -30   -22   -12    -4   -12   -22
      T  -40   -32   -22   -14    -9   -17
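The completed matrix can be reproduced with a short sketch of the fill step. It assumes a DNA alphabet and the deck's scoring scheme, and stores S1 on the rows and S2 on the columns; `substitution_score` and `fill_matrix` are hypothetical names.

```python
def substitution_score(a, b):
    """identity = 0, transition = -2, transversion = -5 (assumes A, C, G, T only)."""
    if a == b:
        return 0
    purines = {"A", "G"}
    return -2 if (a in purines) == (b in purines) else -5


def fill_matrix(s1, s2, gap=-10):
    """Fill the global alignment matrix row by row with the recursion rule."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, rows):
        D[i][0] = D[i - 1][0] + gap
        for j in range(1, cols):
            D[i][j] = max(
                D[i - 1][j - 1] + substitution_score(s1[i - 1], s2[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    return D


for row in fill_matrix("TTGT", "CTAGG"):
    print(" ".join(f"{value:5d}" for value in row))
```

Python strings are 0-indexed, so the character written S1[i] in the recursion rule is `s1[i - 1]` in the code.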
  15. Traceback. On the completed matrix, start from the bottom-right cell and walk back to the top-left cell along the moves that produced each value: -17 (diagonal, G/T mismatch) ← -12 (diagonal, G/G match) ← -12 (horizontal, A against a gap) ← -2 (diagonal, T/T match) ← -2 (diagonal, C/T mismatch) ← 0.
  22. Output. Plot the optimal alignment:

          CTAGG
          *| |*
          TT-GT

      Score: -17. Time complexity: O(nm). Memory complexity: O(nm), where n and m are the lengths of the two sequences.
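The traceback and the plotted output can be sketched together as follows. This is an illustration under stated assumptions, not the deck's own code: it assumes a DNA alphabet, the deck's scoring scheme, and a tie-breaking order (diagonal first, then vertical, then horizontal) that the slides do not specify. All function names are hypothetical.

```python
def substitution_score(a, b):
    """identity = 0, transition = -2, transversion = -5 (assumes A, C, G, T only)."""
    if a == b:
        return 0
    purines = {"A", "G"}
    return -2 if (a in purines) == (b in purines) else -5


def fill_matrix(s1, s2, gap=-10):
    """Fill the global alignment matrix (s1 on rows, s2 on columns)."""
    D = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for j in range(1, len(s2) + 1):
        D[0][j] = j * gap
    for i in range(1, len(s1) + 1):
        D[i][0] = i * gap
        for j in range(1, len(s2) + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + substitution_score(s1[i - 1], s2[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    return D


def traceback(D, s1, s2, gap=-10):
    """Walk from the bottom-right cell to the top-left cell, rebuilding one optimal alignment."""
    i, j = len(s1), len(s2)
    a1, a2 = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + substitution_score(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1  # diagonal move
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1                # vertical move: gap in s2
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1                # horizontal move: gap in s1
    return "".join(reversed(a1)), "".join(reversed(a2))


def match_line(top, bottom):
    """'|' for a match, '*' for a mismatch, ' ' for a gap column."""
    return "".join(
        " " if "-" in (x, y) else "|" if x == y else "*"
        for x, y in zip(top, bottom)
    )


s1, s2 = "TTGT", "CTAGG"
D = fill_matrix(s1, s2)
aligned1, aligned2 = traceback(D, s1, s2)
markers = match_line(aligned2, aligned1)
print(aligned2)
print(markers)
print(aligned1)
print("Score:", D[-1][-1])
```

Both the fill and the stored matrix touch every one of the (n+1)(m+1) cells once, which is where the O(nm) time and memory figures come from; the traceback itself takes at most n + m steps.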
  23. Acknowledgments. Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers; Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and Fickett; and many others... Want to know more? Start reading! http://lectures.molgen.mpg.de/online_lectures.html
