2. Introduction
Two sequences can be considered similar if any of the following holds:
1. One sequence is a substring of the other.
2. The number of changes needed to turn one into the other is small.
3. There is a third sequence whose symbols appear in both of the other sequences in the
same order, but not necessarily consecutively. The longer this third sequence, the more
similar the other two sequences are.
Subsequence:
Given a sequence X = <x1, x2, . . . , xm>, another sequence Z = <z1, z2, . . . , zk> is a
subsequence of X if there exists a strictly increasing sequence <i1, i2, . . . , ik> of indices
of X such that for all j = 1, 2, . . . , k, we have x_{i_j} = z_j (that is, element i_j of X
equals element j of Z).
Eg.: X = <A, B, C, B, D, A, B>
Subsequence: Z = <B, C, D, B>, Index Sequence: <2, 3, 5, 7>.
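The definition above can be checked mechanically. The following is a minimal sketch, assuming sequences are given as Python strings; the function name subsequence_indices is hypothetical and chosen only for illustration. It greedily matches each symbol of Z to its earliest remaining occurrence in X and reports the 1-based index sequence used in the text.

```python
def subsequence_indices(x, z):
    """Return 1-based indices i1 < i2 < ... < ik with x[i_j] == z_j, or None.

    Hypothetical helper for illustration; not part of the original slides.
    """
    indices = []
    pos = 0  # next 0-based position in x still available for matching
    for symbol in z:
        # Greedily take the earliest remaining occurrence of symbol in x.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # symbol cannot be matched, so z is not a subsequence
        indices.append(pos + 1)  # report 1-based indices, as in the text
        pos += 1
    return indices


print(subsequence_indices("ABCBDAB", "BCDB"))  # [2, 3, 5, 7]
print(subsequence_indices("ABCBDAB", "BDC"))   # None
```

Greedy matching is sufficient here: if any valid index sequence exists, taking the earliest possible match for each symbol also yields one.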
3. Introduction…
Common Subsequence:
Given two sequences X and Y, we say that a sequence Z is a common subsequence of X
and Y if Z is a subsequence of both X and Y.
Eg.: X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>
Common Subsequence: Z = <B, C, A>
Longest Common Subsequence (LCS) Problem:
Given two sequences X = <x1, x2, . . . , xm> and Y = <y1, y2, . . . , yn>, find a
maximum-length common subsequence of X and Y.
Eg.: S1 = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA
LCS: S3 = GTCGTCGGAAGCCGGCCGAA
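As a sanity check on the example above, the sketch below verifies that S3 is a subsequence of both S1 and S2. It assumes the sequences are plain Python strings; the helper names is_subsequence and is_common_subsequence are hypothetical. Note that it checks only that S3 is a common subsequence, not that it is a longest one.

```python
def is_subsequence(z, x):
    """True if z can be obtained from x by deleting zero or more symbols."""
    it = iter(x)
    # `symbol in it` advances the iterator past the first match, so
    # successive symbols of z must be found in increasing positions of x.
    return all(symbol in it for symbol in z)


def is_common_subsequence(z, x, y):
    """True if z is a subsequence of both x and y."""
    return is_subsequence(z, x) and is_subsequence(z, y)


S1 = "ACCGGTCGAGTGCGCGGAAGCCGGCCGAA"
S2 = "GTCGTTCGGAATGCCGTTGCTCTGTAAA"
S3 = "GTCGTCGGAAGCCGGCCGAA"

print(is_common_subsequence(S3, S1, S2))  # True
print(len(S3))                            # 20
```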
4. Introduction…
Dynamic Programming Approach to solve the LCS Problem:
1. Characterizing an LCS.
2. A Recursive Solution.
3. Computing the length of an LCS.
4. Constructing an LCS.
5. Characterizing an LCS
Theorem:
Let X = <x1, x2, . . . , xm> and Y = <y1, y2, . . . , yn> be sequences, and let Z = <z1, z2, . . . ,
zk> be any LCS of X and Y. Here Xi = <x1, x2, . . . , xi> denotes the i-th prefix of X, and
similarly for Yj and Zj.
1. If (xm = yn), then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If (xm ≠ yn), then (zk ≠ xm) implies that Z is an LCS of Xm-1 and Y.
3. If (xm ≠ yn), then (zk ≠ yn) implies that Z is an LCS of X and Yn-1.
Proof:
1. xm = yn
Assumption (for contradiction): zk ≠ xm (T1.1)
(T1.1) → xm = yn can be appended to Z to obtain a common subsequence of X and
Y of length k + 1 (T1.2)
(T1.2) contradicts the supposition that Z is a longest common subsequence of X
and Y. Hence, zk = xm = yn.
6. Characterizing an LCS…
The prefix Zk-1 is a common subsequence of Xm-1 and Yn-1 of length k – 1. It remains to
show that it is a longest one:
Assumption (for contradiction): Let W be a common subsequence of Xm-1 and Yn-1 with
length greater than k – 1. (T1.3)
(T1.3) → Appending xm = yn to W produces a common subsequence of X and Y whose
length is greater than k. (T1.4)
(T1.4) contradicts the supposition that Z is a longest common subsequence of X and Y.
Hence, Zk-1 is an LCS of Xm-1 and Yn-1.
2. If (xm ≠ yn), then (zk ≠ xm) implies that Z is an LCS of Xm-1 and Y
zk ≠ xm → Z is a common subsequence of Xm-1 and Y. (T2.1)
Assumption (for contradiction): Let W be a common subsequence of Xm-1 and Y with
length greater than k. (T2.2)
(T2.2) → W would also be a common subsequence of X and Y. (T2.3)
(T2.3) contradicts the supposition that Z is an LCS of X and Y.
Hence, no common subsequence of Xm-1 and Y is longer than Z, so by (T2.1) Z is an LCS
of Xm-1 and Y.
3. If (xm ≠ yn), then (zk ≠ yn) implies that Z is an LCS of X and Yn-1
The proof is symmetric to that of (2).
7. Characterizing an LCS…
The Theorem implies that an LCS of two sequences contains within it an LCS of
prefixes of the two sequences.
Thus, the LCS problem has an Optimal-Substructure property.
8. A Recursive Solution
To Find the LCS of X and Y:
1. If (xm = yn)
Find an LCS of Xm-1 and Yn-1.
Append xm = yn to this LCS.
2. If (xm ≠ yn)
Find an LCS of Xm-1 and Y
Find an LCS of X and Yn-1
Return the LCS that is longer.
Overlapping Subproblems:
• To find an LCS of X and Y, we may need to find LCSs of X and Yn-1 and of Xm-1 and Y.
• Each of these subproblems has the subproblem of finding an LCS of Xm-1 and Yn-1.
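The effect of overlapping subproblems can be made visible by counting calls. The following is a minimal sketch, assuming string inputs; the names lcs_length_naive and lcs_length_memo are hypothetical. Both follow the recursive scheme above but compute only the LCS length. The second caches results with functools.lru_cache, so each pair (i, j) is evaluated at most once.

```python
from functools import lru_cache


def lcs_length_naive(x, y):
    """Plain recursion on prefixes; returns (length, number of calls)."""
    calls = 0

    def rec(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0  # an empty prefix has an empty LCS
        if x[i - 1] == y[j - 1]:  # x_i == y_j in the 1-based notation
            return rec(i - 1, j - 1) + 1
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(x), len(y)), calls


def lcs_length_memo(x, y):
    """Same recursion, but each (i, j) subproblem is solved only once."""
    calls = 0

    @lru_cache(maxsize=None)
    def rec(i, j):
        nonlocal calls
        calls += 1  # counts cache misses only, i.e. distinct subproblems solved
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return rec(i - 1, j - 1) + 1
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(x), len(y)), calls


length_naive, calls_naive = lcs_length_naive("ABCBDAB", "BDCABA")
length_memo, calls_memo = lcs_length_memo("ABCBDAB", "BDCABA")
print(length_naive, length_memo)  # 4 4
print(calls_naive >= calls_memo)  # True
```

With memoization the number of evaluated subproblems is bounded by (m + 1)(n + 1), which is the observation that motivates filling a table instead of recursing.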
9. A Recursive Solution…
c [i, j]: Length of an LCS of the sequences Xi and Yj
Algorithm LCS-LENGTH (X, Y):
• c [0 . . m, 0 . . n]: Stores the c [i, j] values
• b [1 . . m, 1 . . n]: Points to the table entry corresponding to the optimal subproblem
solution chosen when computing c [i, j].
• c [m, n] : Contains the Length of an LCS of X and Y.
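Written out, the optimal-substructure theorem gives the following recurrence for c [i, j], the length of an LCS of the prefixes Xi and Yj. It is stated here in LaTeX for reference and uses the same symbols as the definitions above.

```latex
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(c[i, j-1],\; c[i-1, j]\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```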
10. Computing the Length of an LCS
LCS-LENGTH (X, Y)
    m = X.length
    n = Y.length
    Let b [1 . . m, 1 . . n] and c [0 . . m, 0 . . n] be new tables
    For (i = 1 to m)
        c [i, 0] = 0
    For (j = 0 to n)
        c [0, j] = 0
    For (i = 1 to m)
        For (j = 1 to n)
            if (xi == yj)
                c [i, j] = c [i - 1, j - 1] + 1
                b [i, j] = “↖”
            else if (c [i - 1, j] ≥ c [i, j - 1])
                c [i, j] = c [i - 1, j]
                b [i, j] = “↑”
            else
                c [i, j] = c [i, j - 1]
                b [i, j] = “←”
    Return c and b
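The pseudocode above translates almost line for line into Python. The following is a minimal sketch under a few assumptions: inputs are strings or lists, the function name lcs_length is hypothetical, and the b table is allocated with an unused row 0 and column 0 so that both tables can share the 1-based indices of the pseudocode.

```python
def lcs_length(x, y):
    """Bottom-up LCS-LENGTH: returns the tables c and b."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of the prefixes x[:i] and y[:j].
    # Row 0 and column 0 stay 0, which covers the base cases.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # b[i][j] records which subproblem c[i][j] was derived from.
    # Assumption: row 0 and column 0 of b are unused and left as None.
    b = [[None] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:  # x_i == y_j (Python is 0-based)
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b


c, b = lcs_length("ABCBDAB", "BDCABA")
print(c[7][6])  # 4
```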
11. Computing the Length of an LCS…
Running Time = Θ (mn)
(The algorithm fills the m × n entries of the table, and each entry takes Θ (1) time.)
Eg.:
X = <A, B, C, B, D, A, B>
Y = <B, D, C, A, B, A>
Result:
Length of the LCS: c [7, 6] = 4
LCS: <B, C, B, A>
12. Constructing an LCS
• The b table makes it possible to quickly construct
an LCS of X = <x1, x2, . . . , xm> and
Y = <y1, y2, . . . , yn>.
• Begin at b [m, n] and trace through
the table by following the arrows.
• A “↖” in entry b [i, j] means that xi = yj
is an element of the LCS.
• Running Time: O (m + n)
PRINT-LCS (b, X, i, j)
    If (i == 0 or j == 0)
        return
    If (b [i, j] == “↖”)
        PRINT-LCS (b, X, i - 1, j - 1)
        print xi
    else If (b [i, j] == “↑”)
        PRINT-LCS (b, X, i - 1, j)
    Else
        PRINT-LCS (b, X, i, j - 1)
Initial call: PRINT-LCS (b, X, X.length, Y.length)
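The trace-back can be sketched end to end as follows. This is an illustrative variant, not the slide's exact procedure: it assumes string inputs, uses the hypothetical name lcs, rebuilds the c and b tables inside the function so that the block is self-contained, and walks the arrows with a loop instead of the recursion used by PRINT-LCS. Symbols are collected in reverse order and flipped at the end.

```python
def lcs(x, y):
    """Return one LCS of the strings x and y by following the b arrows."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]  # row/column 0 unused

    # Fill the tables exactly as in LCS-LENGTH.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"

    # Trace back from b[m][n]; every step decreases i, j, or both,
    # so the loop runs at most m + n times.
    symbols = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == "↖":
            symbols.append(x[i - 1])  # x_i == y_j belongs to the LCS
            i -= 1
            j -= 1
        elif b[i][j] == "↑":
            i -= 1
        else:
            j -= 1
    symbols.reverse()  # symbols were collected from last to first
    return "".join(symbols)


print(lcs("ABCBDAB", "BDCABA"))  # BCBA
```

The loop avoids Python's recursion limit on long inputs. Which LCS is returned when several exist depends on the tie-breaking rule (here ≥ prefers “↑”, as in the pseudocode).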
13. References:
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein,
Introduction to Algorithms, Third Edition, The MIT Press, Cambridge,
Massachusetts; London, England.