3.
Longest Common
Subsequence???
Biological applications often need to compare the
DNA of two (or more) different organisms.
4.
Subsequence
A subsequence of a given sequence is just the given
sequence with zero or more elements left out.
Ex: "app", "le", "ple", and so on are
subsequences of "apple".
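The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_subsequence` is an assumption chosen for illustration.

```python
def is_subsequence(z, x):
    """Return True if z can be obtained from x by leaving out zero or more elements."""
    it = iter(x)
    # `ch in it` advances the iterator until it finds ch,
    # so the elements of z must appear in x in the same order.
    return all(ch in it for ch in z)


print(is_subsequence("app", "apple"))  # True
print(is_subsequence("ale", "apple"))  # True (elements need not be adjacent)
print(is_subsequence("pal", "apple"))  # False (order matters)
```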
5.
Common Subsequence
Two sequences:
X = (A, B, C, B, D, A, B)
Y = (B, D, C, A, B, A)
Sequence Z is a common subsequence of X and
Y if Z is a subsequence of both X and Y.
Z = (B, C, A), length 3
Z = (B, C, A, B), length 4
Z = (B, D, A, B), length 4
Z = ?, length 5 ???
Longest: the length-4 sequences. X and Y have no common subsequence of length 5.
6.
What is the Longest Common
Subsequence problem?
Given two sequences
X = (x1, x2, ..., xm)
Y = (y1, y2, ..., yn)
find a maximum-length common subsequence of X
and Y.
How to do it?
Dynamic Programming!!!
Brute Force!!!
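To see why brute force is unattractive, here is a minimal sketch of it (not from the original slides; `lcs_brute_force` and `is_subsequence` are hypothetical helper names). It tries the subsequences of X from longest to shortest and stops at the first one that is also a subsequence of Y. Since X has 2^m subsequences, this only works for very short inputs.

```python
from itertools import combinations


def is_subsequence(z, x):
    """Return True if z is a subsequence of x."""
    it = iter(x)
    return all(ch in it for ch in z)


def lcs_brute_force(x, y):
    """Try every subsequence of x, longest first; return the first common one."""
    for length in range(len(x), -1, -1):
        # Each index tuple picks `length` positions of x, in increasing order.
        for idx in combinations(range(len(x)), length):
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                return z
    return ""


print(len(lcs_brute_force("ABCBDAB", "BDCABA")))  # 4
```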
7.
Step 1: Characterize
optimality
Sequence X = (x1, x2, ..., xm)
Define the ith prefix of X, for i = 0, 1, ..., m,
as Xi = (x1, x2, ..., xi),
with X0 representing the empty sequence.
Ex: if X = (A, B, C, A, D, A, B) then
X4 = (A, B, C, A)
X0 = ( ), the empty sequence
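In code, a prefix is simply a slice. The sketch below is illustrative only (the helper name `prefix` is an assumption); note that the slides index elements from 1, while Python indexes from 0, so xi corresponds to `x[i - 1]`.

```python
def prefix(x, i):
    """Return X_i, the first i elements of x (X_0 is the empty sequence)."""
    return x[:i]


X = ("A", "B", "C", "A", "D", "A", "B")
print(prefix(X, 4))  # ('A', 'B', 'C', 'A')
print(prefix(X, 0))  # ()
```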
8.
Theorem (Optimal substructure of LCS)
We assume sequences
X = (x1, x2, ..., xm) and Y = (y1, y2, ..., yn),
and let Z = (z1, z2, ..., zk) be any LCS of X and Y.
1. If xm = yn, then zk = xm = yn and Zk-1 is an
LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an
LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an
LCS of X and Yn-1.
9.
Optimal substructure problem
An LCS of the original two sequences contains an LCS
of prefixes of the two sequences.
(If a problem has an optimal solution, then that solution
contains optimal solutions to its subproblems.)
10.
Step 2: A recursive solution
Case 1: Xi and Yj end with xi = yj
Xi = x1 x2 ... xi-1 xi
Yj = y1 y2 ... yj-1 yj (yj = xi)
Zk = z1 z2 ... zk-1 zk (zk = yj = xi)
Zk is Zk-1 followed by zk = xi = yj,
where Zk-1 is an LCS of Xi-1 and Yj-1.
LenLCS(i, j) = LenLCS(i-1, j-1) + 1
11.
Step 2: A recursive solution
Cases 2 and 3: Xi and Yj end with xi ≠ yj
If zk ≠ yj:
Xi = x1 x2 ... xi-1 xi
Yj = y1 y2 ... yj-1 yj
Zk = z1 z2 ... zk-1 zk (zk ≠ yj)
Zk is an LCS of Xi and Yj-1.
If zk ≠ xi:
Xi = x1 x2 ... xi-1 xi
Yj = y1 y2 ... yj-1 yj
Zk = z1 z2 ... zk-1 zk (zk ≠ xi)
Zk is an LCS of Xi-1 and Yj.
LenLCS(i, j) = max{LenLCS(i, j-1), LenLCS(i-1, j)}
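The cases translate almost directly into a recursive function. This is a minimal sketch under one added assumption: the base case LenLCS(i, j) = 0 when i = 0 or j = 0, since an empty prefix has only the empty subsequence. The name `len_lcs` is hypothetical. Without any caching this recursion can take exponential time, so it is meant only for small inputs.

```python
def len_lcs(x, y, i=None, j=None):
    """Length of an LCS of the prefixes X_i and Y_j (defaults: the full sequences)."""
    if i is None:
        i, j = len(x), len(y)
    if i == 0 or j == 0:
        # Assumed base case: an empty prefix has no non-empty subsequence.
        return 0
    if x[i - 1] == y[j - 1]:
        # Case 1: the last elements match (x_i is x[i - 1] in 0-based Python).
        return len_lcs(x, y, i - 1, j - 1) + 1
    # Cases 2 and 3: drop the last element of one sequence, keep the better result.
    return max(len_lcs(x, y, i, j - 1), len_lcs(x, y, i - 1, j))


print(len_lcs("ABCBDAB", "BDCABA"))  # 4
```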
12.
Step 2:A recursive solution
Let c[i, j] be the length of an LCS of Xi and Yj.
The cases above give the recursion
c[i, j] = 0                             if i = 0 or j = 0
c[i, j] = c[i-1, j-1] + 1               if i, j > 0 and xi = yj
c[i, j] = max(c[i, j-1], c[i-1, j])     if i, j > 0 and xi ≠ yj
Case 1
Reduces to the single subproblem of finding an LCS of
Xm-1 and Yn-1 and adding xm = yn to the end of Z.
Cases 2 and 3
Reduce to the two subproblems of finding an LCS of Xm-1 and Y
and an LCS of X and Yn-1, and selecting the longer of the two.
13.
Step 3: Compute the length of
the LCS
The LCS problem has only Θ(mn) distinct subproblems.
So?
Use dynamic programming!!!
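One way to see the effect of the small number of distinct subproblems is to cache each (i, j) result. The sketch below uses top-down memoization with `functools.lru_cache`, which is a different formulation from the bottom-up table in the slides; `len_lcs_memo` is a hypothetical name. It also reports how many distinct subproblems were actually solved, which can never exceed (m+1)(n+1).

```python
from functools import lru_cache


def len_lcs_memo(x, y):
    """Return (LCS length, number of distinct (i, j) subproblems solved)."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return solve(i - 1, j - 1) + 1
        return max(solve(i, j - 1), solve(i - 1, j))

    length = solve(len(x), len(y))
    # currsize counts the cached results, i.e. the distinct subproblems seen.
    return length, solve.cache_info().currsize


length, solved = len_lcs_memo("ABCBDAB", "BDCABA")
print(length)  # 4
print(solved <= (7 + 1) * (6 + 1))  # True
```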
14.
Step 3: Compute the length of the LCS
Procedure 1
LCS-Length takes two sequences
X = (x1, x2, ..., xm) and Y = (y1, y2, ..., yn) as input.
Procedure 2
It stores the c[i, j] values in a table c[0..m, 0..n] and
computes the entries in row-major order.
Procedure 3
It maintains a table b[1..m, 1..n] to help construct an optimal solution.
b[i, j] points to the table entry corresponding to the
optimal subproblem solution chosen when computing c[i, j].
Procedure 4
It returns the b and c tables; c[m, n] contains the length
of an LCS of X and Y.
15.
LCS-Length(X, Y)
m = X.length
n = Y.length
let b[1..m, 1..n] and c[0..m, 0..n] be new tables
for i = 1 to m do
    c[i, 0] = 0
for j = 0 to n do          // j starts at 0 so that c[0, 0] is initialized
    c[0, j] = 0
for i = 1 to m do
    for j = 1 to n do
        if xi == yj
            c[i, j] = c[i-1, j-1] + 1
            b[i, j] = "↖"
        else if c[i-1, j] ≥ c[i, j-1]
            c[i, j] = c[i-1, j]
            b[i, j] = "↑"
        else
            c[i, j] = c[i, j-1]
            b[i, j] = "←"
return c and b
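A runnable counterpart of LCS-Length might look like the sketch below. It is an illustration rather than the slides' own code: the function name `lcs_length` is an assumption, and the strings "diag", "up", and "left" stand in for the three directions stored in b. Row 0 and column 0 of both tables are allocated but only serve as the boundary.

```python
def lcs_length(x, y):
    """Fill the tables c (LCS lengths) and b (direction taken) bottom-up."""
    m, n = len(x), len(y)
    # Initializing every entry to 0 also covers the boundary row and column.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # row-major order
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b


c, b = lcs_length("ABCBDAB", "BDCABA")
for row in c:
    print(row)
print(c[-1][-1])  # 4
```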
16.
The table produced by LCS-Length on the sequences
X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A).
The running time of the procedure is O(mn), since each table
entry takes O(1) time to compute.
17.
Step 4: Construct an optimal LCS
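PRINT-LCS(b, X, i, j)
if i == 0 or j == 0
    return
if b[i, j] == "↖"                  // diagonal: xi is part of the LCS
    PRINT-LCS(b, X, i-1, j-1)
    print xi
else if b[i, j] == "↑"             // up: skip xi
    PRINT-LCS(b, X, i-1, j)
else PRINT-LCS(b, X, i, j-1)       // left: skip yj
Initial call: PRINT-LCS(b, X, X.length, Y.length)
For X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A), this procedure prints BCBA.
The procedure takes O(m+n) time, since each recursive call decrements at least one of i and j.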
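The traceback can be sketched in Python as follows. This is an illustration under stated assumptions: `build_b` is a hypothetical helper that fills b the same way LCS-Length does (with "diag", "up", and "left" standing for the three directions), and `lcs_from_b` mirrors PRINT-LCS but returns the LCS as a string instead of printing it.

```python
def build_b(x, y):
    """Fill c and b as LCS-Length does; only b is needed for the traceback."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"
    return b


def lcs_from_b(b, x, i, j):
    """Follow the directions in b from (i, j) back to the table boundary."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == "diag":
        # x_i belongs to the LCS; it is appended after the recursive result.
        return lcs_from_b(b, x, i - 1, j - 1) + x[i - 1]
    if b[i][j] == "up":
        return lcs_from_b(b, x, i - 1, j)
    return lcs_from_b(b, x, i, j - 1)


X, Y = "ABCBDAB", "BDCABA"
print(lcs_from_b(build_b(X, Y), X, len(X), len(Y)))  # BCBA
```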
18.
Example
X = <A, B, C, B, A>
Y = <B, D, C, A, B>
We will fill in the table in row-major order, starting in
the upper left corner, using the recurrence for c[i, j] from Step 2.
20.
Answer
Thus the optimal LCS length is c[m,n] = 3.
Tracing back from c[5,5], we get the optimal LCS Z = <B, C, B>.
Alternatively, starting at c[5,4],
we would produce Z = <B, C, A>.
*Note that the LCS is not unique, but the optimal length of
the LCS is.
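To make the non-uniqueness concrete, the sketch below collects every distinct LCS instead of following a single path. It goes beyond what the slides show: the name `all_lcs` is hypothetical, and on a tie between the "up" and "left" entries it explores both directions. The number of distinct LCSs can grow quickly, so this is meant for small examples only.

```python
from functools import lru_cache


def all_lcs(x, y):
    """Return the set of all distinct longest common subsequences of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    @lru_cache(maxsize=None)
    def back(i, j):
        if i == 0 or j == 0:
            return frozenset({""})
        if x[i - 1] == y[j - 1]:
            return frozenset(s + x[i - 1] for s in back(i - 1, j - 1))
        found = set()
        if c[i - 1][j] >= c[i][j - 1]:
            found |= back(i - 1, j)
        if c[i][j - 1] >= c[i - 1][j]:
            found |= back(i, j - 1)   # on a tie, both directions are followed
        return frozenset(found)

    return set(back(m, n))


print(sorted(all_lcs("ABCBA", "BDCAB")))  # ['BCA', 'BCB']
```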