Dynamic programming

  1. About optimal sequence alignment. A short glimpse into bioinformatics. April 24, 2010 (1 / 23)
  2. Pairwise sequence alignment
     Assumptions:
       • sequences S1 and S2 are homologous, i.e. they share a common ancestor;
       • differences between them are due to only two kinds of events: substitutions and insertions/deletions.
     Strategy:
       • choose a scoring matrix (reward for a match, penalty for a mismatch or a gap);
       • compute the edit distance (counting matches, mismatches and gaps) needed to go from one sequence to the other;
       • keep the alignment with the highest score.
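To make the strategy concrete, here is a minimal Python sketch that scores one already-aligned pair of sequences column by column. The function name `score_alignment` and the default values (match = 1, mismatch = -1, gap = -2) are illustrative assumptions, not taken from the slides.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two equal-length gapped strings.

    A '-' character marks a gap. The default scores are placeholders
    chosen for illustration only.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += gap        # insertion or deletion
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


if __name__ == "__main__":
    print(score_alignment("CTAGG", "TT-GT"))
```

Comparing this total across candidate alignments is what "keep the alignment with the highest score" amounts to; dynamic programming avoids enumerating the candidates explicitly.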
  3. Needleman-Wunsch algorithm
     Aim: find the optimal global alignment of sequences S1 and S2.
     Recursion rule:

     \[
     D(i, j) = \max
     \begin{cases}
     D(i-1, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
     D(i-1, j) + \mathrm{gap} \\
     D(i, j-1) + \mathrm{gap}
     \end{cases}
     \tag{1}
     \]

     Scoring scheme: identity = 0, transition = -2, transversion = -5, gap = -10.
     Sequences: S1 = TTGT, S2 = CTAGG.
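The recursion rule and scoring scheme above can be sketched in Python as follows. The names `score`, `cell`, `PURINES` and `PYRIMIDINES` are hypothetical helpers introduced for illustration; the sketch assumes the usual definition of a transition (purine to purine, or pyrimidine to pyrimidine) and a transversion (any other substitution).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP = -10  # gap penalty from the slide's scoring scheme


def score(a, b):
    """Substitution score: identity 0, transition -2, transversion -5."""
    if a == b:
        return 0
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return -2  # transition: both bases in the same chemical class
    return -5      # transversion: one purine, one pyrimidine


def cell(diag, up, left, a, b):
    """Apply recursion rule (1) to one cell, given its three neighbours."""
    return max(diag + score(a, b), up + GAP, left + GAP)


if __name__ == "__main__":
    # First inner cell of the example: T (from S1) against C (from S2).
    print(cell(0, -10, -10, "T", "C"))
```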
  4. Fill the matrix

                    C    T    A    G    G
               .    .    .    .    .    .
          T    .    .    .    .    .    .
          T    .    .    .    .    .    .
          G    .    .    .    .    .    .
          T    .    .    .    .    .    .
  5. Fill the matrix

                    C    T    A    G    G
               0    .    .    .    .    .
          T    .    .    .    .    .    .
          T    .    .    .    .    .    .
          G    .    .    .    .    .    .
          T    .    .    .    .    .    .
  6. Fill the matrix

                    C    T    A    G    G
               0  -10    .    .    .    .
          T    .    .    .    .    .    .
          T    .    .    .    .    .    .
          G    .    .    .    .    .    .
          T    .    .    .    .    .    .
  7. Fill the matrix

                    C    T    A    G    G
               0  -10  -20    .    .    .
          T    .    .    .    .    .    .
          T    .    .    .    .    .    .
          G    .    .    .    .    .    .
          T    .    .    .    .    .    .
  8. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T    .    .    .    .    .    .
          T    .    .    .    .    .    .
          G    .    .    .    .    .    .
          T    .    .    .    .    .    .
  9. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10    .    .    .    .    .
          T  -20    .    .    .    .    .
          G  -30    .    .    .    .    .
          T  -40    .    .    .    .    .
  10. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10    ?    .    .    .    .
          T  -20    .    .    .    .    .
          G  -30    .    .    .    .    .
          T  -40    .    .    .    .    .
  11. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10   -2    .    .    .    .
          T  -20    .    .    .    .    .
          G  -30    .    .    .    .    .
          T  -40    .    .    .    .    .
  12. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10   -2  -10  -20  -30  -40
          T  -20    .    .    .    .    .
          G  -30    .    .    .    .    .
          T  -40    .    .    .    .    .
  13. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10   -2  -10  -20  -30  -40
          T  -20    ?    ?    ?    ?    ?
          G  -30    ?    ?    ?    ?    ?
          T  -40    ?    ?    ?    ?    ?
  14. Fill the matrix

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10   -2  -10  -20  -30  -40
          T  -20  -12   -2  -12  -22  -32
          G  -30  -22  -12   -4  -12  -22
          T  -40  -32  -22  -14   -9  -17
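The filling procedure walked through in slides 4 to 14 can be sketched in Python as below. The function names `score` and `fill_matrix` are assumptions made for this example; the scoring values and sequences are those given on slide 3, with S1 indexing the rows and S2 the columns.

```python
def score(a, b):
    """Identity 0, transition -2, transversion -5 (scheme from slide 3)."""
    if a == b:
        return 0
    pair = {a, b}
    if pair <= {"A", "G"} or pair <= {"C", "T"}:
        return -2
    return -5


def fill_matrix(s1, s2, gap=-10):
    """Fill the (len(s1)+1) x (len(s2)+1) matrix D, row by row."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and first column: alignments against leading gaps only.
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        D[i][0] = i * gap
    # Inner cells: best of diagonal (match/mismatch), up and left (gaps).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    return D


if __name__ == "__main__":
    for row in fill_matrix("TTGT", "CTAGG"):
        print(" ".join(f"{v:4d}" for v in row))
```

The bottom-right cell holds the score of the optimal global alignment.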
  15. Traceback

                    C    T    A    G    G
               0  -10  -20  -30  -40  -50
          T  -10   -2  -10  -20  -30  -40
          T  -20  -12   -2  -12  -22  -32
          G  -30  -22  -12   -4  -12  -22
          T  -40  -32  -22  -14   -9  -17
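The traceback step can be sketched in Python as follows: start from the bottom-right cell and repeatedly move to whichever neighbour produced the current value, until the top-left cell is reached. The function names and the tie-breaking order (diagonal first, then up, then left) are assumptions made for this example; the slides do not state a tie-breaking rule.

```python
def score(a, b):
    """Identity 0, transition -2, transversion -5 (scheme from slide 3)."""
    if a == b:
        return 0
    pair = {a, b}
    return -2 if pair <= {"A", "G"} or pair <= {"C", "T"} else -5


def fill_matrix(s1, s2, gap=-10):
    """Fill the score matrix with S1 on rows and S2 on columns."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        D[i][0] = i * gap
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    return D


def traceback(D, s1, s2, gap=-10):
    """Recover one optimal alignment as (aligned S2, aligned S1)."""
    i, j = len(s1), len(s2)
    top, bottom = [], []  # top row holds S2, bottom row holds S1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            top.append(s2[j - 1]); bottom.append(s1[i - 1])  # diagonal move
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append("-"); bottom.append(s1[i - 1])        # move up: gap in S2
            i -= 1
        else:
            top.append(s2[j - 1]); bottom.append("-")        # move left: gap in S1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    D = fill_matrix("TTGT", "CTAGG")
    for row in traceback(D, "TTGT", "CTAGG"):
        print(row)
```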
  22. Output
      Plot the optimal alignment:

          CTAGG
          *| |*
          TT-GT

      Score: -17
      Complexity in time: O(nm)
      Complexity in memory: O(nm)
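The three-line plot above can be produced with a small Python sketch. The function name `midline` is hypothetical; the symbols follow the slide's output, with `|` for a match, `*` for a mismatch and a space under a gap.

```python
def midline(row1, row2):
    """Build the marker line shown between two aligned rows."""
    marks = []
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            marks.append(" ")  # gap column
        elif a == b:
            marks.append("|")  # match
        else:
            marks.append("*")  # mismatch
    return "".join(marks)


if __name__ == "__main__":
    top, bottom = "CTAGG", "TT-GT"
    print(top)
    print(midline(top, bottom))
    print(bottom)
```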
  23. Acknowledgments
      Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers; Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and Fickett; and many others...
      Want to know more? Start reading! http://lectures.molgen.mpg.de/online_lectures.html
