Dynamic Programing_LCS.ppt

Department of Computer Science
DCS
COMSATS Institute of
Information Technology
Design and Analysis of Algorithms
Tanveer Ahmed

Dynamic Programming for Solving
Optimization Problems
(Longest Common Subsequence)
Lecture No. 27

• In biological applications, we often want to compare the DNA of two (or more) different organisms.
• A part of DNA consists of a string of molecules called bases, where the possible bases are adenine, guanine, cytosine, and thymine.
• Representing each base by its initial letter, a part of DNA can be expressed as a string over the finite set {A, C, G, T}.
An Introduction

• For example, the DNA of one organism may be
S1 = CCGGTCGAGTGCGCGGAAGCCGGCCGAA,
while the DNA of another organism may be
S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA.
• One goal of comparing two parts of DNA is to determine how “similar” the two parts are, as a measure of how closely related the two organisms are.
• Similarity is an ambiguous term and can be defined in many different ways.
• Here we give some ways of defining it with reference to this problem.
An Introduction

• For example, we can say that two DNA parts are similar if one is a substring of the other. In our case, neither S1 nor S2 is a substring of the other. This will be discussed under string matching.
• Alternatively, we could say that two parts are similar if the number of changes needed to turn one into the other is small.
• Another way to measure similarity is to find a third part S3 whose bases appear in both S1 and S2.
• The bases of S3 must appear in the same order in S1 and S2, but not necessarily consecutively.
• The longer the S3 we can find, the more similar S1 and S2 are.
• In the example above, the longest such S3 is GTCGTCGGAAGCCGGCCGAA.
An Introduction: Similarity

• In mathematics, a subsequence of a sequence is a new sequence formed from the original one by deleting some elements without disturbing the relative positions of the remaining elements.
Examples:
• < B,C,D,B > is a subsequence of
< A,C,B,D,E,G,C,E,D,B,G >,
with corresponding index sequence <3,7,9,10>.
• < D,E,E,B > is also a subsequence of the same sequence
< A,C,B,D,E,G,C,E,D,B,G >,
with corresponding index sequence <4,5,8,10>.
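The definition can be checked mechanically. Below is a minimal Python sketch. The function name subsequence_indices is a hypothetical choice. The function scans the longer sequence once from left to right and greedily matches each element of the candidate at its earliest possible position. It returns the 1-based index sequence, or None if the candidate is not a subsequence.

```python
from collections.abc import Sequence
from typing import Optional


def subsequence_indices(z: Sequence, x: Sequence) -> Optional[list[int]]:
    """Return the 1-based positions in x at which the elements of z occur
    in order, or None if z is not a subsequence of x (greedy scan)."""
    indices = []
    pos = 0  # next position of x to examine (0-based)
    for element in z:
        # advance through x until the current element of z is found
        while pos < len(x) and x[pos] != element:
            pos += 1
        if pos == len(x):
            return None  # ran out of x before matching all of z
        indices.append(pos + 1)  # report a 1-based index, as in the slides
        pos += 1
    return indices


X = "ACBDEGCEDBG"
print(subsequence_indices("BCDB", X))  # [3, 7, 9, 10]
print(subsequence_indices("DEEB", X))  # [4, 5, 8, 10]
```

Because the matching is greedy, the reported indices are the earliest possible ones. A subsequence can also have other valid index sequences.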
What is a Subsequence?

• The sequence Z = (B, C, A) is a subsequence of X = (A, B, C, B, D, A, B).
• The sequence Z = (B, C, A) is also a subsequence of Y = (B, D, C, A, B, A).
• Hence Z is a common subsequence of X and Y.
• However, Z is not a longest common subsequence, because the sequence Z’ = (B, D, A, B) is a longer common subsequence of
X = (A, B, C, B, D, A, B) and
Y = (B, D, C, A, B, A).
Longest Common Subsequence

Statement:
• In the longest-common-subsequence (LCS) problem, we are given two sequences
X = <x1, x2, . . . , xm> and
Y = <y1, y2, . . . , yn>,
and our objective is to find a maximum-length common subsequence of X and Y.
Note:
• The LCS problem can also be solved by a brute-force approach, but dynamic programming solves it far more efficiently.
Problem

• First, we enumerate all the subsequences of X = <x1, x2, . . . , xm>.
• There are 2^m such subsequences, since each of the m elements is either kept or deleted.
• Then we check whether each subsequence of X is also a subsequence of Y.
• In this way, we can find all the common subsequences of X and Y and keep a longest one.
• This approach requires exponential time, making it impractical for long sequences.
Note:
• This problem has the optimal-substructure property, and hence it can be solved using dynamic programming.
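The brute-force approach can be sketched in Python as follows. The sketch is illustrative only, and the function names are hypothetical. It tries index subsets of X from the largest size downward, so the first candidate that is also a subsequence of Y is a longest common subsequence. In the worst case it examines all 2^m subsequences of X, and each check scans Y once.

```python
from itertools import combinations


def is_subsequence(z: str, y: str) -> bool:
    """True if z occurs in y in order (not necessarily consecutively)."""
    it = iter(y)
    # each 'in' test consumes the iterator up to and including the match
    return all(element in it for element in z)


def brute_force_lcs(x: str, y: str) -> str:
    """Exhaustively search the subsequences of x, longest first."""
    for size in range(len(x), -1, -1):
        for positions in combinations(range(len(x)), size):
            candidate = "".join(x[p] for p in positions)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # not reached: the empty sequence is always common


print(brute_force_lcs("ABCBDAB", "BDCABA"))  # BCBA
```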
Brute Force Approach

Dynamic Programming Solution

• As we shall see, the natural classes of subproblems correspond to pairs of “prefixes” of the two input sequences.
• To be precise, given a sequence X = <x1, x2, ..., xm>, we define the ith prefix of X, for i = 0, 1, ..., m, as Xi = <x1, x2, ..., xi>.
Examples:
If X = <A, B, C, B, D, A, B>, then
• X4 = <A, B, C, B>, and
• X0 is the empty sequence < >.
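When the 1-based notation is translated into 0-based code, the prefix Xi is simply the slice of the first i elements. The element xi itself sits at position i − 1. Here is a tiny Python illustration; the helper name prefix is hypothetical.

```python
def prefix(x, i: int):
    """Return the ith prefix X_i = <x1, ..., xi> of x, for 0 <= i <= len(x)."""
    if not 0 <= i <= len(x):
        raise ValueError("prefix length must be between 0 and len(x)")
    return x[:i]  # the first i elements; x_i itself is x[i - 1]


X = ["A", "B", "C", "B", "D", "A", "B"]
print(prefix(X, 4))  # ['A', 'B', 'C', 'B']
print(prefix(X, 0))  # []
```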
Towards Optimal Substructure of LCS: Prefixes

• Let X = (x1, x2, . . ., xm) and Y = (y1, y2, . . ., yn) be sequences, and let Z = (z1, z2, . . ., zk) be any longest common subsequence of X and Y.
• Let Xi = (x1, x2, …, xi), Yj = (y1, y2, …, yj) and Zl = (z1, z2, …, zl) denote prefixes of X, Y and Z, respectively. Then:
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1.
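The theorem translates directly into a recursive computation of the LCS length. Below is a minimal Python sketch; it assumes the inputs are small enough for Python's recursion limit, and the name lcs_length_memo is hypothetical. Case 1 gives the diagonal step. Cases 2 and 3 together say that when xi ≠ yj, an LCS of Xi and Yj is an LCS of either Xi-1 and Yj or of Xi and Yj-1, so the sketch takes the longer of the two. Caching with functools.lru_cache means each pair (i, j) is solved only once. Without the cache, the same subproblems would be recomputed exponentially many times.

```python
from functools import lru_cache


def lcs_length_memo(x: str, y: str) -> int:
    """Length of an LCS of x and y, computed top-down from the theorem."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        # c(i, j) = length of an LCS of the prefixes X_i and Y_j
        if i == 0 or j == 0:
            return 0  # an empty prefix has only the empty subsequence
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1  # case 1: x_i = y_j ends the LCS
        return max(c(i - 1, j), c(i, j - 1))  # cases 2 and 3

    return c(len(x), len(y))


print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```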
Theorem: Optimal Substructure of an LCS

Case 1
• Suppose, to the contrary, that xm = yn but zk ≠ xm.
• Then we could append xm = yn to Z to obtain a common subsequence of X and Y of length k + 1, contradicting the supposition that Z is an LCS of X and Y.
• Thus, we must have zk = xm = yn.
• Now, the prefix Zk-1 is a length-(k - 1) common subsequence of Xm-1 and Yn-1.
We now show that it is an LCS.
• Suppose there is a common subsequence W of Xm-1 and Yn-1 with length greater than k - 1.
• Then appending xm = yn to W gives a common subsequence of X and Y whose length is greater than k, a contradiction.
Proof of Theorem

Case 2
• If zk ≠ xm, then Z is a common subsequence of Xm-1 and Y.
• If there were a common subsequence W of Xm-1 and Y with length greater than k, then W would also be a common subsequence of Xm = X and Y, contradicting the assumption that Z is an LCS of X and Y.
Case 3
• The proof is symmetric to that of Case 2.


Problem
If X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A> are two sequences, compute a maximum-length common subsequence of X and Y.
Solution:
• Let c(i, j) be the length of an LCS of Xi and Yj; we then have to compute c(7, 6).
• The recursive formula for computing the length of an LCS is:

$$
c(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c(i-1, j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\big(c(i, j-1),\ c(i-1, j)\big) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
$$
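Evaluating the recurrence bottom-up, row by row, fills the (m + 1) × (n + 1) table of c values. Because c(i-1, j-1), c(i-1, j) and c(i, j-1) are all computed before c(i, j), no recursion is needed. Below is a minimal Python sketch; the name lcs_table is hypothetical, and xi is stored at x[i-1].

```python
def lcs_table(x: str, y: str) -> list[list[int]]:
    """Return the (m+1) x (n+1) table c, where c[i][j] is the LCS length of X_i, Y_j."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


c = lcs_table("ABCBDAB", "BDCABA")
for row in c:
    print(row)
print(c[7][6])  # 4
```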
Example

If X = <A, B, C, B, D, A, B>, Y = <B, D, C, A, B, A>
• c(1, 1) = max (c(0, 1), c(1, 0)) = max (0, 0) = 0; b[1, 1] = ←
• c(1, 2) = max (c(0, 2), c(1, 1)) = max (0, 0) = 0; b[1, 2] = ←
• c(1, 3) = max (c(0, 3), c(1, 2)) = max (0, 0) = 0; b[1, 3] = ←
• c(1, 4) = c(0, 3) + 1 = 0 + 1 = 1; b[1, 4] = ↖
• c(1, 5) = max (c(0, 5), c(1, 4)) = max (0, 1) = 1; b[1, 5] = ←
• c(1, 6) = c(0, 5) + 1 = 0 + 1 = 1; b[1, 6] = ↖
Example

• c(2, 1) = c(1, 0) + 1 = 0 + 1 = 1; b[2, 1] = ↖
• c(2, 2) = max (c(1, 2), c(2, 1)) = max (0, 1) = 1; b[2, 2] = ←
• c(2, 3) = max (c(1, 3), c(2, 2)) = max (0, 1) = 1; b[2, 3] = ←
• c(2, 4) = max (c(1, 4), c(2, 3)) = max (1, 1) = 1; b[2, 4] = ←
• c(2, 5) = c(1, 4) + 1 = 1 + 1 = 2; b[2, 5] = ↖
• c(2, 6) = max (c(1, 6), c(2, 5)) = max (1, 2) = 2; b[2, 6] = ←
Example

• c(3, 1) = max (c(2, 1), c(3, 0)) = max (1, 0) = 1; b[3, 1] = ↑
• c(3, 2) = max (c(2, 2), c(3, 1)) = max (1, 1) = 1; b[3, 2] = ←
• c(3, 3) = c(2, 2) + 1 = 1 + 1 = 2; b[3, 3] = ↖
• c(3, 4) = max (c(2, 4), c(3, 3)) = max (1, 2) = 2; b[3, 4] = ←
• c(3, 5) = max (c(2, 5), c(3, 4)) = max (2, 2) = 2; b[3, 5] = ←
• c(3, 6) = max (c(2, 6), c(3, 5)) = max (2, 2) = 2; b[3, 6] = ←
Example

• c(4, 1) = c(3, 0) + 1 = 0 + 1 = 1; b[4, 1] = ↖
• c(4, 2) = max (c(3, 2), c(4, 1)) = max (1, 1) = 1; b[4, 2] = ←
• c(4, 3) = max (c(3, 3), c(4, 2)) = max (2, 1) = 2; b[4, 3] = ↑
• c(4, 4) = max (c(3, 4), c(4, 3)) = max (2, 2) = 2; b[4, 4] = ←
• c(4, 5) = c(3, 4) + 1 = 2 + 1 = 3; b[4, 5] = ↖
• c(4, 6) = max (c(3, 6), c(4, 5)) = max (2, 3) = 3; b[4, 6] = ←
Example

• c(5, 1) = max (c(4, 1), c(5, 0)) = max (1, 0) = 1; b[5, 1] = ↑
• c(5, 2) = c(4, 1) + 1 = 1 + 1 = 2; b[5, 2] = ↖
• c(5, 3) = max (c(4, 3), c(5, 2)) = max (2, 2) = 2; b[5, 3] = ←
• c(5, 4) = max (c(4, 4), c(5, 3)) = max (2, 2) = 2; b[5, 4] = ←
• c(5, 5) = max (c(4, 5), c(5, 4)) = max (3, 2) = 3; b[5, 5] = ↑
• c(5, 6) = max (c(4, 6), c(5, 5)) = max (3, 3) = 3; b[5, 6] = ←
Example

• c(6, 1) = max (c(5, 1), c(6, 0)) = max (1, 0) = 1; b[6, 1] = ↑
• c(6, 2) = max (c(5, 2), c(6, 1)) = max (2, 1) = 2; b[6, 2] = ↑
• c(6, 3) = max (c(5, 3), c(6, 2)) = max (2, 2) = 2; b[6, 3] = ←
• c(6, 4) = c(5, 3) + 1 = 2 + 1 = 3; b[6, 4] = ↖
• c(6, 5) = max (c(5, 5), c(6, 4)) = max (3, 3) = 3; b[6, 5] = ←
• c(6, 6) = c(5, 5) + 1 = 3 + 1 = 4; b[6, 6] = ↖
Example

• c(7, 1) = c(6, 0) + 1 = 0 + 1 = 1; b[7, 1] = ↖
• c(7, 2) = max (c(6, 2), c(7, 1)) = max (2, 1) = 2; b[7, 2] = ↑
• c(7, 3) = max (c(6, 3), c(7, 2)) = max (2, 2) = 2; b[7, 3] = ←
• c(7, 4) = max (c(6, 4), c(7, 3)) = max (3, 2) = 3; b[7, 4] = ↑
• c(7, 5) = c(6, 4) + 1 = 3 + 1 = 4; b[7, 5] = ↖
• c(7, 6) = max (c(6, 6), c(7, 5)) = max (4, 4) = 4; b[7, 6] = ←
Example

Each cell shows c(i, j) followed by the arrow b[i, j].
     j         0     1     2     3     4     5     6
 i   xi     yj       B     D     C     A     B     A
 0             0     0     0     0     0     0     0
 1   A         0   0 ←   0 ←   0 ←   1 ↖   1 ←   1 ↖
 2   B         0   1 ↖   1 ←   1 ←   1 ←   2 ↖   2 ←
 3   C         0   1 ↑   1 ←   2 ↖   2 ←   2 ←   2 ←
 4   B         0   1 ↖   1 ←   2 ↑   2 ←   3 ↖   3 ←
 5   D         0   1 ↑   2 ↖   2 ←   2 ←   3 ↑   3 ←
 6   A         0   1 ↑   2 ↑   2 ←   3 ↖   3 ←   4 ↖
 7   B         0   1 ↖   2 ↑   2 ←   3 ↑   4 ↖   4 ←
Results

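• Table size: O(n·m).
• Every entry takes O(1) time to compute.
• The algorithm therefore takes O(n·m) time and space.
• If only the length of an LCS is needed, the space can be reduced to 2 · min(m, n) + O(1) entries, because each row of c depends only on the previous row.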
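Below is a minimal Python sketch of the two-row idea. It assumes only the length is wanted: the b table, and hence PrintLCS, is not available this way. The function name is hypothetical. The sketch makes y the shorter sequence, so each row has min(m, n) + 1 entries, and it reuses two row buffers, swapping them after each row.

```python
def lcs_length_two_rows(x: str, y: str) -> int:
    """LCS length using two rows of length min(m, n) + 1."""
    if len(y) > len(x):
        x, y = y, x  # LCS length is symmetric; keep the rows short
    prev = [0] * (len(y) + 1)  # row i-1 of c
    curr = [0] * (len(y) + 1)  # row i of c; curr[0] = c(i, 0) stays 0
    for xi in x:
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, prev  # reuse the old row as scratch space
    return prev[-1]


print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```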
Computable Tables

c[i, j] = c[i-1, j-1] + 1 if i, j > 0 and xi = yj;
c[i, j] = max(c[i-1, j], c[i, j-1]) if i, j > 0 and xi ≠ yj;
c[i, j] = 0 if (i = 0) or (j = 0).
function LCS(X, Y)
    m ← length[X]
    n ← length[Y]
    for i ← 1 to m
        do c[i, 0] ← 0
    for j ← 0 to n
        do c[0, j] ← 0
Longest Common Subsequence Algorithm

    for i ← 1 to m
        do for j ← 1 to n
            do if xi == yj
                then c[i, j] ← c[i-1, j-1] + 1
                     b[i, j] ← “↖”
                else if c[i-1, j] > c[i, j-1]
                    then c[i, j] ← c[i-1, j]
                         b[i, j] ← “↑”
                    else c[i, j] ← c[i, j-1]
                         b[i, j] ← “←”
    return c and b
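The pseudocode can be translated into Python roughly as follows. This is a sketch, not a definitive implementation. Indices into the Python strings are 0-based, so xi is x[i-1], and the arrows are stored as one-character strings. The comparison c[i-1][j] > c[i][j-1] is strict, so ties produce "←"; that is the convention the arrows in the worked example follow.

```python
def lcs(x: str, y: str):
    """Fill the length table c and the direction table b, as in function LCS."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # c[i][0] = c[0][j] = 0
    b = [[""] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 unused
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:  # ties go left, matching the worked example
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b


c, b = lcs("ABCBDAB", "BDCABA")
for row in b[1:]:
    print(" ".join(row[1:]))
```

For the example sequences, the printed rows reproduce the arrows of the worked example, and c[7][6] is 4.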
Longest Common Subsequence Algorithm

procedure PrintLCS(b, X, i, j)
    if (i == 0) or (j == 0)
        then return
    if b[i, j] == “↖”
        then PrintLCS(b, X, i-1, j-1)
             print xi
    else if b[i, j] == “↑”
        then PrintLCS(b, X, i-1, j)
    else PrintLCS(b, X, i, j-1)
Initial call: PrintLCS(b, X, m, n)
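PrintLCS can likewise be sketched in Python. To keep the block self-contained, it rebuilds the b table with a strict tie rule (↑ only when c[i-1][j] > c[i][j-1]). Instead of printing, it returns the characters as a string. Both function names are hypothetical. For the example sequences, the traceback gives BDAB, the sequence Z’ from the introduction. Breaking ties upward instead (↑ when c[i-1][j] ≥ c[i][j-1]) yields BCBA, which is also an LCS. The b table thus fixes one LCS among possibly several.

```python
def direction_table(x: str, y: str) -> list[list[str]]:
    """Build the b table of function LCS (↖ match, ↑ up, ← left)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "↖"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "↑"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "←"
    return b


def print_lcs(b: list[list[str]], x: str, i: int, j: int) -> str:
    """Recursive PrintLCS that returns an LCS of X_i and Y_j as a string."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == "↖":
        return print_lcs(b, x, i - 1, j - 1) + x[i - 1]  # x_i belongs to the LCS
    if b[i][j] == "↑":
        return print_lcs(b, x, i - 1, j)
    return print_lcs(b, x, i, j - 1)


X, Y = "ABCBDAB", "BDCABA"
print(print_lcs(direction_table(X, Y), X, len(X), len(Y)))  # BDAB
```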
Construction of Longest Common Subsequence

• The shortest common supersequence problem is closely related to the longest common subsequence problem.
Shortest common supersequence
• Given two sequences
X = < x1,...,xm > and
Y = < y1,...,yn >,
a sequence U = < u1,...,uk > is a common supersequence of X and Y if U is a supersequence of both X and Y.
• The shortest common supersequence (SCS) is a common supersequence of minimal length.
Relationship with shortest common supersequence

Problem Statement
• Given two sequences X and Y, the task is to find a shortest possible common supersequence.
• A shortest common supersequence is not necessarily unique.
• For two input sequences, an SCS is easy to construct from an LCS, and its length is m + n − |LCS|.
Example:
• X[1..m] = abcbdab
Y[1..n] = bdcaba
LCS = Z[1..r] = bcba
• Inserting the non-LCS symbols while preserving their order, we get
SCS = U[1..t] = abdcabdab.
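For two sequences, an SCS can be read off the LCS length table. Trace back from (m, n). When xi = yj, emit the shared symbol once. Otherwise, emit the symbol of the sequence we step back in. When one index reaches 0, emit whatever prefix of the other sequence remains. Below is a minimal Python sketch; the function name is hypothetical. The result has length m + n − c(m, n), which is 7 + 6 − 4 = 9 here. With the tie rule used below (step back in X when c[i-1][j] ≥ c[i][j-1]), it reproduces abdcabdab for this example. Other tie rules can return a different, equally short, supersequence.

```python
def shortest_common_supersequence(x: str, y: str) -> str:
    """Build one SCS of x and y by tracing back through the LCS length table."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    out = []  # symbols of the supersequence, collected in reverse order
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # LCS symbol: emitted once for both sequences
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            out.append(x[i - 1])  # non-LCS symbol of x
            i -= 1
        else:
            out.append(y[j - 1])  # non-LCS symbol of y
            j -= 1
    out.extend(reversed(x[:i]))  # leftover prefix of x (empty if i == 0)
    out.extend(reversed(y[:j]))  # leftover prefix of y (empty if j == 0)
    return "".join(reversed(out))


print(shortest_common_supersequence("abcbdab", "bdcaba"))  # abdcabdab
```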
Relationship with shortest common supersequence
