Genomics | Sequence Alignment: Complete Coverage-I
S. Prasanth Kumar, Bioinformatician
Dept. of Bioinformatics, Applied Botany Centre (ABC), Gujarat University, Ahmedabad, India
Follow me on: www.facebook.com/Prasanth Sivakumar
Access my resources on SlideShare: prasanthperceptron
Contact me: [email_address]
Alignment scoring schemes
Alignment of ATCGGATCT and ACGGACT, scored with match = +2, mismatch = -1, indel = -2.
The alignment has 6 matches, 1 mismatch and 2 indels, so its score is 6 × 2 + 1 × (-1) + 2 × (-2) = 12 - 1 - 4 = 7.
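The arithmetic above can be checked with a short Python sketch that scores an alignment column by column. The function name `score_alignment` and the gapped strings are illustrative assumptions: the slide's own alignment figure is not reproduced here, so `A-CGGAC-T` is simply one arrangement of ACGGACT that has the stated 6 matches, 1 mismatch and 2 indels against ATCGGATCT.

```python
def score_alignment(top, bottom, match=2, mismatch=-1, indel=-2):
    """Score two equal-length gapped rows column by column ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += indel      # insertion or deletion
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # substitution


    return total


# Hypothetical alignment with 6 matches, 1 mismatch (T/C) and 2 indels.
top = "ATCGGATCT"
bottom = "A-CGGAC-T"
print(score_alignment(top, bottom))  # 7
```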
Optimal alignment of two sequences: Brute Force Method
Suppose two sequences X and Z are to be aligned, where |X| = m and |Z| = n. If gaps are allowed, each gapped sequence can be at most m + n characters long. Treating each of those m + n positions as holding either a character or a gap gives at most $2^{m+n}$ gapped versions of X and at most $2^{m+n}$ gapped versions of Z, so pairing them requires up to
$2^{m+n} \cdot 2^{m+n} = 2^{2(m+n)} = 4^{m+n}$
comparisons, a number that grows exponentially with the sequence lengths.
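A minimal sketch of the brute-force idea, assuming the +2/-1/-2 scheme from the scoring slide: `brute_force_best` (a hypothetical name) tries every alignment recursively without reusing any partial result, and `count_alignments` (also hypothetical) counts how many distinct alignments exist. The exact count is smaller than the $4^{m+n}$ bound above but still grows exponentially.

```python
def brute_force_best(x, z, match=2, mismatch=-1, indel=-2):
    """Best global alignment score, found by trying every alignment (no reuse)."""
    def best(i, j):
        # i, j: how many characters of x and z have already been placed
        if i == len(x) and j == len(z):
            return 0
        options = []
        if i < len(x) and j < len(z):   # align x[i] with z[j]
            pair = match if x[i] == z[j] else mismatch
            options.append(pair + best(i + 1, j + 1))
        if i < len(x):                   # x[i] against a gap
            options.append(indel + best(i + 1, j))
        if j < len(z):                   # z[j] against a gap
            options.append(indel + best(i, j + 1))
        return max(options)
    return best(0, 0)


def count_alignments(m, n):
    """Number of distinct alignments of a length-m and a length-n sequence."""
    if m == 0 or n == 0:
        return 1   # only one way: the remaining characters all face gaps
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))


print(count_alignments(3, 3), 4 ** (3 + 3))        # 63 4096
print(brute_force_best("ATCGGATCT", "ACGGACT"))    # 10
```

The second call returns 10 (seven matches and the two unavoidable indels), but only after well over 200,000 recursive calls, because the same sub-alignments are re-solved again and again.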
Optimal alignment of two sequences: Dynamic Programming
Dynamic programming (DP) aligns two sequences by beginning at the ends of the two sequences and attempting to align all possible pairs of characters (one from each sequence), using a scoring scheme for matches, mismatches and gaps. The path through the score matrix with the highest cumulative score defines the optimal alignment between the two sequences. DP solves optimization problems by dividing them into overlapping subproblems (here, alignments of shorter portions of the two sequences), solving each subproblem once and storing its result for reuse.
Optimal alignment of two sequences: Dynamic Programming Matrix
$s(a_i, b_j) = +5$ if $a_i = b_j$ (match score)
$s(a_i, b_j) = -3$ if $a_i \neq b_j$ (mismatch score)
$w = -4$ (gap penalty)
• Initialization
• Matrix fill (scoring)
• Traceback (alignment)
Global Alignment: Needleman-Wunsch Algorithm
Initialization step: each cell of the first column, $S_{i,0}$, is set to $w \cdot i$, and each cell of the first row, $S_{0,j}$, is set to $w \cdot j$.
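The initialization can be sketched in Python as follows. `nw_init` is a hypothetical helper; the matrix gets one extra row and one extra column (row 0 and column 0) for the empty prefix of each sequence, and w = -4 comes from the matrix slide.

```python
def nw_init(n_rows, n_cols, w=-4):
    """Needleman-Wunsch initialization: S[i][0] = w*i and S[0][j] = w*j."""
    S = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows + 1):
        S[i][0] = w * i   # first column: i leading gaps
    for j in range(n_cols + 1):
        S[0][j] = w * j   # first row: j leading gaps
    return S


S = nw_init(2, 3)
print(S[0])                  # [0, -4, -8, -12]
print([row[0] for row in S]) # [0, -4, -8]
```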
Global Alignment: Needleman-Wunsch Algorithm
Matrix fill step: each cell takes the maximum of three candidates: the diagonal neighbour plus the match/mismatch score, the left neighbour plus the gap penalty, and the upper neighbour plus the gap penalty.
G-G (match score = +5):
$S_{i,j} = \max[0 + 5,\ -4 + (-4),\ -4 + (-4)] = \max[5, -8, -8] = 5$
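The three candidates can be written as the usual Needleman-Wunsch recurrence, using the symbols $s$ and $w$ from the matrix slide and the initialization from the previous slide:

```latex
S_{i,0} = w \cdot i, \qquad S_{0,j} = w \cdot j

S_{i,j} = \max \begin{cases}
  S_{i-1,j-1} + s(a_i, b_j) & \text{(diagonal: match or mismatch)} \\
  S_{i,j-1} + w             & \text{(left: gap)} \\
  S_{i-1,j} + w             & \text{(up: gap)}
\end{cases}
```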
Global Alignment: Needleman-Wunsch Algorithm G-A   mismatch score = -3 Si,j = MAX [-4 +  -3 , 5 +  -4 , -8 +  -4 ]  = MAX [ -7 , 1 , -12 ] =  1
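Global Alignment: Needleman-Wunsch Algorithm
Traceback: start at the lower-right corner cell, which holds the global alignment score, and follow the arrows (each pointing to the neighbour the value came from) back to the upper-left corner. A diagonal step aligns two characters; a horizontal or vertical step places a gap.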
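Putting initialization, fill and traceback together gives a compact global aligner. This is a minimal sketch, not the slides' own implementation: the function name `needleman_wunsch`, the tie-breaking order (diagonal, then up, then left) and the example sequences are assumptions, because the slides' sequences appear only in a matrix figure that is not reproduced here. Instead of storing arrows, the traceback re-checks which neighbour could have produced each cell.

```python
def needleman_wunsch(a, b, match=5, mismatch=-3, w=-4):
    """Global alignment of a (rows) against b (columns).

    Returns (score, gapped_a, gapped_b).
    """
    n, m = len(a), len(b)
    # Initialization: first column is w*i, first row is w*j.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = w * i
    for j in range(1, m + 1):
        S[0][j] = w * j

    # Matrix fill: diagonal + s(a_i, b_j), left + w, up + w.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i][j - 1] + w, S[i - 1][j] + w)

    # Traceback from the lower-right corner to the upper-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + s:     # diagonal: a_i with b_j
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + w:   # up: a_i against a gap
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                                       # left: b_j against a gap
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GAT", "GT"))  # (6, 'GAT', 'G-T')
```

Here G-G scores +5, A against a gap -4 and T-T +5. When several paths tie, a different tie-breaking order can return a different alignment with the same score.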
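Global Alignment: Needleman-Wunsch Algorithm
Score of the resulting global alignment, summed column by column:
5 - 3 + 5 - 4 + 5 + 5 - 4 + 5 - 4 - 4 + 5 = 11
That is 6 matches, 1 mismatch and 4 gaps: 6 × 5 - 1 × 3 - 4 × 4 = 30 - 3 - 16 = 11.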
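Local Alignment: Smith-Waterman Algorithm
Initialization step: each cell of the first column, $S_{i,0}$, and each cell of the first row, $S_{0,j}$, is set to 0.
The fill rule is the same as in Needleman-Wunsch, except that 0 is added as a fourth candidate, so no cell can become negative. The initialization differs from the global algorithm, and the traceback needs more care.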
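The difference from the global rule is a single extra candidate. In this sketch, `sw_init` and `sw_cell` are hypothetical helpers using the +5/-3/-4 scheme from the matrix slide; a mismatch between zero-valued neighbours is floored at 0 instead of going negative.

```python
def sw_init(n_rows, n_cols):
    """Smith-Waterman initialization: every cell, including row 0 and column 0, is 0."""
    return [[0] * (n_cols + 1) for _ in range(n_rows + 1)]


def sw_cell(diag, left, up, a, b, match=5, mismatch=-3, w=-4):
    """Same three candidates as Needleman-Wunsch, plus 0 as a floor."""
    s = match if a == b else mismatch
    return max(0, diag + s, left + w, up + w)


H = sw_init(1, 1)
print(sw_cell(H[0][0], H[1][0], H[0][1], "G", "A"))  # 0  (without the floor: -3)
print(sw_cell(H[0][0], H[1][0], H[0][1], "G", "G"))  # 5
```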
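Local Alignment: Smith-Waterman Algorithm
Two cells hold the maximal value 14, so more than one local alignment produces the maximal alignment score. Which one to consider? A maximal cell in the last row means the alignment extends all the way to the final character of the sequence written down the rows.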
Local Alignment: Smith-Waterman Algorithm
Two traceback pathways, one starting from each maximal cell and each followed until a cell containing 0 is reached, give the two local alignments with a score of 14.
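A sketch of the local traceback, assuming the +5/-3/-4 scheme: it fills the matrix with the zero floor, finds every cell holding the maximal score and traces back from each one until it reaches a cell containing 0, so co-optimal alignments such as the two score-14 alignments are all reported. The name `smith_waterman_alignments` and the example sequences are illustrative assumptions; the slides' sequences are in a figure not reproduced here.

```python
def smith_waterman_alignments(a, b, match=5, mismatch=-3, w=-4):
    """Return the best local score and one alignment per maximal cell."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i][j - 1] + w, H[i - 1][j] + w)

    best = max(max(row) for row in H)
    if best == 0:
        return 0, []   # no pair of characters scores above zero
    results = []
    for i0 in range(1, n + 1):
        for j0 in range(1, m + 1):
            if H[i0][j0] != best:
                continue
            # Trace back from this maximal cell until a cell holding 0.
            i, j, top, bottom = i0, j0, [], []
            while H[i][j] > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                if H[i][j] == H[i - 1][j - 1] + s:   # diagonal
                    top.append(a[i - 1])
                    bottom.append(b[j - 1])
                    i, j = i - 1, j - 1
                elif H[i][j] == H[i - 1][j] + w:     # up: gap in b
                    top.append(a[i - 1])
                    bottom.append("-")
                    i -= 1
                else:                                # left: gap in a
                    top.append("-")
                    bottom.append(b[j - 1])
                    j -= 1
            results.append(("".join(reversed(top)), "".join(reversed(bottom))))
    return best, results


print(smith_waterman_alignments("GATCCGAT", "GAT"))
# (15, [('GAT', 'GAT'), ('GAT', 'GAT')])
```

In this example GAT occurs twice in GATCCGAT, so two cells hold the maximum 15 and each yields its own traceback, mirroring the two cells holding 14 on the slides.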
Local Alignment: Smith-Waterman Algorithm
5 matches, 1 mismatch and 2 gaps:
score = 5 × 5 - 1 × 3 - 2 × 4 = 25 - 3 - 8 = 14
What's Next?
Scoring matrices: PAM and BLOSUM
Assessing the significance of sequence alignments
Thank you for your attention!