About optimal sequence alignment
A short glimpse into bioinformatics
April 24, 2010
Pairwise sequence alignment
sequences S1 and S2 are homologous: they share a common ancestor;
the differences between them are due to only two kinds of events,
substitutions and insertion-deletions.
choose a scoring matrix (reward for match, penalty for
mismatch and gap);
compute the edit distance (number of matches,
mismatches and gaps) to go from one sequence to the other;
keep the alignment with the highest score.
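As a minimal sketch of these three steps, the Python below counts matches, mismatches and gaps in candidate alignments, scores each one, and keeps the best. The function names, the `-` gap symbol, the scoring values (match = 1, mismatch = -1, gap = -2) and the candidate alignments are all assumptions chosen for illustration; they are not taken from the slides.

```python
def count_events(aligned1, aligned2):
    """Count (matches, mismatches, gaps) in two aligned strings of equal length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":   # '-' marks an insertion-deletion
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def alignment_score(aligned1, aligned2, match=1, mismatch=-1, gap=-2):
    """Score an alignment with hypothetical reward and penalty values."""
    m, x, g = count_events(aligned1, aligned2)
    return m * match + x * mismatch + g * gap


# Three hand-written candidate alignments of TTGT and CTAGG.
candidates = [("TT-GT", "CTAGG"), ("TTGT-", "CTAGG"), ("TTG-T", "CTAGG")]
best = max(candidates, key=lambda pair: alignment_score(*pair))
print(best, alignment_score(*best))
```

Enumerating candidates by hand does not scale, since the number of possible alignments grows very quickly with sequence length; this is why dynamic programming is used instead.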
Aim: find the optimal global alignment of sequences S1 and S2
\[
D(i, j) = \max
\begin{cases}
D(i-1, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
D(i-1, j) + \mathrm{gap} \\
D(i, j-1) + \mathrm{gap}
\end{cases}
\tag{1}
\]
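The recurrence (1) can be sketched in Python as a matrix fill. The boundary values D(0, 0) = 0, D(i, 0) = i * gap and D(0, j) = j * gap are an assumption (the usual choice for global alignment); they are not stated on the slide. The names `fill_matrix` and `simple_score`, and the values used in the example call, are hypothetical.

```python
def fill_matrix(s1, s2, score, gap):
    """Return the (len(s1)+1) x (len(s2)+1) matrix D defined by recurrence (1).

    score(a, b) gives the substitution score of two characters;
    gap is the (negative) penalty for one insertion-deletion.
    """
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary conditions: a prefix aligned against nothing costs only gaps.
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # (mis)match
                D[i - 1][j] + gap,                              # gap in s2
                D[i][j - 1] + gap,                              # gap in s1
            )
    return D


def simple_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


D = fill_matrix("GAT", "GT", simple_score, gap=-2)
print(D[-1][-1])  # optimal global score, here 0 (GAT aligned with G-T)
```

Python strings are 0-indexed, so `s1[i - 1]` plays the role of S1[i] in the recurrence.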
Scoring scheme: identity = 0, transition = -2, transversion = -5
Sequences: S1 = TTGT, S2 = CTAGG
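The scoring scheme above can be written as a small Python function. A transition swaps two purines (A, G) or two pyrimidines (C, T); a transversion swaps a purine and a pyrimidine. The function and constant names are hypothetical, and the sketch assumes uppercase DNA letters only.

```python
# A transition stays within a chemical class; a transversion crosses classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, identity=0, transition=-2, transversion=-5):
    """Score one aligned pair of nucleotides under the scheme above."""
    if a == b:
        return identity
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


print(substitution_score("T", "T"))  # 0  (identity)
print(substitution_score("T", "C"))  # -2 (transition)
print(substitution_score("T", "G"))  # -5 (transversion)
```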
Plot the optimal alignment:
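One way to recover the optimal alignment is to fill the matrix and then trace back from the bottom-right cell to the top-left, at each step checking which of the three cases of the recurrence produced the current value. The sketch below does this for the example sequences. The gap penalty of -5 is an assumption (the scoring scheme above does not state one), as are the boundary conditions, the tie-breaking order (diagonal, then up, then left) and the names `global_align` and `slide_score`.

```python
def slide_score(a, b):
    """identity = 0, transition = -2, transversion = -5."""
    if a == b:
        return 0
    return -2 if {a, b} in ({"A", "G"}, {"C", "T"}) else -5


def global_align(s1, s2, score, gap):
    """Return (optimal score, aligned s1, aligned s2) for a global alignment."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # assumed boundary: leading gaps
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)

    # Traceback: walk from D[n][m] back to D[0][0].
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            out1.append(s1[i - 1])     # character of s1 against a gap
            out2.append("-")
            i -= 1
        else:
            out1.append("-")           # gap against a character of s2
            out2.append(s2[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(out1)), "".join(reversed(out2))


best_score, a1, a2 = global_align("TTGT", "CTAGG", slide_score, gap=-5)
print(best_score)
print(a1)
print(a2)
```

Under the assumed gap penalty of -5 this gives a score of -12 with TT-GT aligned against CTAGG; a different gap penalty may give a different alignment.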
Complexity in time: O(nm), where n and m are the lengths of S1 and S2
Complexity in memory: O(nm)
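The O(nm) memory cost comes from keeping the whole matrix for the traceback. If only the optimal score is needed, each row depends only on the previous one, so two rows are enough. The sketch below illustrates this with O(m) memory; the name `global_score`, the boundary conditions and the scoring values in the example call are assumptions.

```python
def global_score(s1, s2, score, gap):
    """Optimal global alignment score using two rows instead of a full matrix."""
    prev = [j * gap for j in range(len(s2) + 1)]      # row i - 1
    for i in range(1, len(s1) + 1):
        curr = [i * gap] + [0] * len(s2)              # row i
        for j in range(1, len(s2) + 1):
            curr[j] = max(prev[j - 1] + score(s1[i - 1], s2[j - 1]),
                          prev[j] + gap,
                          curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# Hypothetical scoring: +1 match, -1 mismatch, -2 gap.
print(global_score("GAT", "GT", lambda a, b: 1 if a == b else -1, gap=-2))
```

Time is still O(nm), and the alignment itself cannot be read back from two rows alone.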
Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers;
Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and
Fickett; and many others...
Want to know more? Start reading!