Department of Computer Science
DCS
COMSATS Institute of
Information Technology
Design and Analysis of Algorithms
Tanveer Ahmed
Dynamic Programming for Solving
Optimization Problems
(Longest Common Subsequence)
Lecture No. 27
• In biological applications, we often want to compare the DNA of two (or more) different organisms.
• A part of DNA consists of a string of molecules called bases, where the possible bases are
  - adenine,
  - guanine,
  - cytosine, and
  - thymine.
• We represent each base by its initial letter.
• A part of DNA can then be expressed as a string over the finite set {A, C, G, T}.
An Introduction
• For example, the DNA of one organism may be
  S1 = CCGGTCGAGTGCGCGGAAGCCGGCCGAA,
• while the DNA of another organism may be
  S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA.
• One goal of comparing two parts of DNA is to determine how “similar” they are, as a measure of how closely related the two organisms are.
• Similarity is an ambiguous term and can be defined in many different ways.
• Here we give some ways of defining it with reference to this problem.
An Introduction
• For example, we can say that two DNA parts are similar if one is a substring of the other. In our case, neither S1 nor S2 is a substring of the other. This will be discussed in string matching.
• Alternatively, we could say two parts are similar if the number of changes needed to turn one into the other is small.
• Another way to measure similarity is to find a third part S3 whose bases all appear in both S1 and S2.
• The bases must appear in the same order, but not necessarily consecutively.
• The longer the S3 we can find, the more similar S1 and S2 are.
• For S1 and S2 above, S3 is GTCGTCGGAAGCCGGCCGAA.
An Introduction: Similarity
• In mathematics, a subsequence of a sequence is a new sequence formed from the original one by deleting some elements without disturbing the relative positions of the remaining elements.
Examples:
• <B, C, D, B> is a subsequence of
  <A, C, B, D, E, G, C, E, D, B, G>,
  with corresponding index sequence <3, 7, 9, 10>.
• <D, E, E, B> is also a subsequence of the same
  <A, C, B, D, E, G, C, E, D, B, G>,
  with corresponding index sequence <4, 5, 8, 10>.
What is a Subsequence?
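The definition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `index_sequence` is hypothetical. It scans the sequence from left to right, matching each element of the candidate at the earliest possible position, and returns the 1-based index sequence, or `None` if the candidate is not a subsequence.

```python
from typing import Optional, Sequence


def index_sequence(sub: Sequence, seq: Sequence) -> Optional[list]:
    """Return 1-based indices of seq that spell out sub in order, or None.

    Greedy matching: every element of sub is matched at the earliest
    position after the previous match.
    """
    indices = []
    pos = 0  # next 0-based position of seq to examine
    for item in sub:
        while pos < len(seq) and seq[pos] != item:
            pos += 1
        if pos == len(seq):
            return None  # seq is exhausted before item was matched
        indices.append(pos + 1)  # report 1-based positions, as on the slide
        pos += 1
    return indices


seq = "ACBDEGCEDBG"
print(index_sequence("BCDB", seq))  # [3, 7, 9, 10]
print(index_sequence("DEEB", seq))  # [4, 5, 8, 10]
print(index_sequence("BGA", seq))   # None
```

A subsequence may have several valid index sequences; the greedy scan reports the earliest one, which here agrees with the indices given in the examples.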
• The sequence Z = (B, C, A) is a subsequence of
  X = (A, B, C, B, D, A, B).
• The sequence Z = (B, C, A) is also a subsequence of
  Y = (B, D, C, A, B, A).
• Hence Z is a common subsequence of X and Y.
• But Z is not a longest common subsequence.
• This is because the sequence Z’ = (B, D, A, B), of length 4, is a longer common subsequence of
  X = (A, B, C, B, D, A, B) and
  Y = (B, D, C, A, B, A).
Longest Common Subsequence
Statement:
• In the longest-common-subsequence (LCS) problem, we are given two sequences
  X = <x1, x2, ..., xm> and
  Y = <y1, y2, ..., yn>,
• and our objective is to find a maximum-length common subsequence of X and Y.
Note:
• The LCS problem can be solved by brute force, but dynamic programming solves it far more efficiently.
Problem
• First we enumerate all the subsequences of X = <x1, x2, ..., xm>.
• There are 2^m such subsequences.
• Then we check whether each subsequence of X is also a subsequence of Y.
• In this way, we find all the common subsequences of X and Y and keep a longest one.
• This approach requires exponential time, making it impractical for long sequences.
Note:
• The problem has the optimal-substructure property and hence can be solved by dynamic programming.
Brute Force Approach
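The brute-force idea can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture; the names `is_subsequence` and `lcs_brute_force` are hypothetical. As a small shortcut it tries candidates from longest to shortest and stops at the first hit, but in the worst case it still examines all 2^m subsequences of X.

```python
from itertools import combinations


def is_subsequence(sub, seq):
    """True if sub can be obtained from seq by deleting elements."""
    it = iter(seq)
    # "item in it" advances the iterator until item is found
    return all(item in it for item in sub)


def lcs_brute_force(x, y):
    """Return a longest subsequence of x that is also a subsequence of y."""
    for length in range(len(x), -1, -1):
        # every choice of `length` positions of x is one subsequence of x
        for positions in combinations(range(len(x)), length):
            candidate = [x[p] for p in positions]
            if is_subsequence(candidate, y):
                return candidate
    return []


result = lcs_brute_force("ABCBDAB", "BDCABA")
print(len(result))  # 4
```

Each subsequence check costs O(n), so the total running time is O(n · 2^m), which is why this is only usable for very short inputs.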
Dynamic Programming Solution
• As we shall see, the natural classes of subproblems correspond to pairs of “prefixes” of the two input sequences.
• To be precise, given a sequence X = <x1, x2, ..., xm>, we define the i-th prefix of X, for i = 0, 1, ..., m, as Xi = <x1, x2, ..., xi>.
Examples:
If X = <A, B, C, B, D, A, B> then
• X4 = <A, B, C, B> and
• X0 is the empty sequence < >.
Towards Optimal Substructure of LCS: Prefixes
• Let X = (x1, x2, ..., xm) and Y = (y1, y2, ..., yn) be sequences, and let Z = (z1, z2, ..., zk) be any longest common subsequence of X and Y.
• Let Xi = (x1, x2, ..., xi), Yj = (y1, y2, ..., yj) and Zl = (z1, z2, ..., zl) denote prefixes of X, Y and Z, respectively.
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1.
Theorem: Optimal Substructure of an LCS
Case 1
• Suppose, to the contrary, that xm = yn but zk ≠ xm.
• Then we could append xm = yn to Z to obtain a common subsequence of X and Y of length k + 1, contradicting the supposition that Z is an LCS of X and Y.
• Thus, we must have zk = xm = yn.
• Now, the prefix Zk-1 is a length-(k - 1) common subsequence of Xm-1 and Yn-1. We wish to show that it is an LCS.
• Suppose, for contradiction, that there is a common subsequence W of Xm-1 and Yn-1 with length greater than k - 1.
• Appending xm = yn to W gives a common subsequence of X and Y whose length is greater than k, a contradiction.
Proof of Theorem
Case 2
• If zk ≠ xm, then Z is a common subsequence of Xm-1 and Y.
• If there were a common subsequence W of Xm-1 and Y with length greater than k, then W would also be a common subsequence of Xm and Y, contradicting the assumption that Z is an LCS of X and Y.
Case 3
• The proof is symmetric to Case 2.
Theorem: Optimal Substructure of an LCS
• The theorem gives the following recurrence for c(i, j), the length of an LCS of the prefixes Xi and Yj:

\[
c(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c(i-1, j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(c(i-1, j),\; c(i, j-1)\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
\]

Theorem: Optimal Substructure of an LCS
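The recurrence for c(i, j) translates almost literally into a memoized recursive function. The sketch below is an illustration in Python rather than the lecture's own algorithm; the name `lcs_length` is hypothetical, and `functools.lru_cache` is used so that each pair (i, j) is evaluated only once.

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of an LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:  # x_i = y_j (Python strings are 0-indexed)
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(x), len(y))


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same subproblems would be recomputed exponentially often; with it there are at most (m + 1)(n + 1) distinct calls. The recursion depth grows with m + n, so this form suits short inputs only.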
Problem
If X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A> are two sequences, compute a maximum-length common subsequence of X and Y.
Solution:
• Let c(i, j) = length of an LCS of Xi and Yj; we have to compute c(7, 6).
• The recursive formula for the length of an LCS is given below.

\[
c(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c(i-1, j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(c(i-1, j),\; c(i, j-1)\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
\]

Example
If X = <A, B, C, B, D, A, B>, Y = <B, D, C, A, B, A>
• When xi ≠ yj and c(i-1, j) = c(i, j-1), this example records b[i, j] = ←.
i = 1 (x1 = A):
• c(1, 1) = max(c(0, 1), c(1, 0)) = max(0, 0) = 0; b[1, 1] = ←
• c(1, 2) = max(c(0, 2), c(1, 1)) = max(0, 0) = 0; b[1, 2] = ←
• c(1, 3) = max(c(0, 3), c(1, 2)) = max(0, 0) = 0; b[1, 3] = ←
• c(1, 4) = c(0, 3) + 1 = 0 + 1 = 1; b[1, 4] = ↖
• c(1, 5) = max(c(0, 5), c(1, 4)) = max(0, 1) = 1; b[1, 5] = ←
• c(1, 6) = c(0, 5) + 1 = 0 + 1 = 1; b[1, 6] = ↖
i = 2 (x2 = B):
• c(2, 1) = c(1, 0) + 1 = 0 + 1 = 1; b[2, 1] = ↖
• c(2, 2) = max(c(1, 2), c(2, 1)) = max(0, 1) = 1; b[2, 2] = ←
• c(2, 3) = max(c(1, 3), c(2, 2)) = max(0, 1) = 1; b[2, 3] = ←
• c(2, 4) = max(c(1, 4), c(2, 3)) = max(1, 1) = 1; b[2, 4] = ←
• c(2, 5) = c(1, 4) + 1 = 1 + 1 = 2; b[2, 5] = ↖
• c(2, 6) = max(c(1, 6), c(2, 5)) = max(1, 2) = 2; b[2, 6] = ←
i = 3 (x3 = C):
• c(3, 1) = max(c(2, 1), c(3, 0)) = max(1, 0) = 1; b[3, 1] = ↑
• c(3, 2) = max(c(2, 2), c(3, 1)) = max(1, 1) = 1; b[3, 2] = ←
• c(3, 3) = c(2, 2) + 1 = 1 + 1 = 2; b[3, 3] = ↖
• c(3, 4) = max(c(2, 4), c(3, 3)) = max(1, 2) = 2; b[3, 4] = ←
• c(3, 5) = max(c(2, 5), c(3, 4)) = max(2, 2) = 2; b[3, 5] = ←
• c(3, 6) = max(c(2, 6), c(3, 5)) = max(2, 2) = 2; b[3, 6] = ←
i = 4 (x4 = B):
• c(4, 1) = c(3, 0) + 1 = 0 + 1 = 1; b[4, 1] = ↖
• c(4, 2) = max(c(3, 2), c(4, 1)) = max(1, 1) = 1; b[4, 2] = ←
• c(4, 3) = max(c(3, 3), c(4, 2)) = max(2, 1) = 2; b[4, 3] = ↑
• c(4, 4) = max(c(3, 4), c(4, 3)) = max(2, 2) = 2; b[4, 4] = ←
• c(4, 5) = c(3, 4) + 1 = 2 + 1 = 3; b[4, 5] = ↖
• c(4, 6) = max(c(3, 6), c(4, 5)) = max(2, 3) = 3; b[4, 6] = ←
i = 5 (x5 = D):
• c(5, 1) = max(c(4, 1), c(5, 0)) = max(1, 0) = 1; b[5, 1] = ↑
• c(5, 2) = c(4, 1) + 1 = 1 + 1 = 2; b[5, 2] = ↖
• c(5, 3) = max(c(4, 3), c(5, 2)) = max(2, 2) = 2; b[5, 3] = ←
• c(5, 4) = max(c(4, 4), c(5, 3)) = max(2, 2) = 2; b[5, 4] = ←
• c(5, 5) = max(c(4, 5), c(5, 4)) = max(3, 2) = 3; b[5, 5] = ↑
• c(5, 6) = max(c(4, 6), c(5, 5)) = max(3, 3) = 3; b[5, 6] = ←
i = 6 (x6 = A):
• c(6, 1) = max(c(5, 1), c(6, 0)) = max(1, 0) = 1; b[6, 1] = ↑
• c(6, 2) = max(c(5, 2), c(6, 1)) = max(2, 1) = 2; b[6, 2] = ↑
• c(6, 3) = max(c(5, 3), c(6, 2)) = max(2, 2) = 2; b[6, 3] = ←
• c(6, 4) = c(5, 3) + 1 = 2 + 1 = 3; b[6, 4] = ↖
• c(6, 5) = max(c(5, 5), c(6, 4)) = max(3, 3) = 3; b[6, 5] = ←
• c(6, 6) = c(5, 5) + 1 = 3 + 1 = 4; b[6, 6] = ↖
i = 7 (x7 = B):
• c(7, 1) = c(6, 0) + 1 = 0 + 1 = 1; b[7, 1] = ↖
• c(7, 2) = max(c(6, 2), c(7, 1)) = max(2, 1) = 2; b[7, 2] = ↑
• c(7, 3) = max(c(6, 3), c(7, 2)) = max(2, 2) = 2; b[7, 3] = ←
• c(7, 4) = max(c(6, 4), c(7, 3)) = max(3, 2) = 3; b[7, 4] = ↑
• c(7, 5) = c(6, 4) + 1 = 3 + 1 = 4; b[7, 5] = ↖
• c(7, 6) = max(c(6, 6), c(7, 5)) = max(4, 4) = 4; b[7, 6] = ←
Example
Table c[i, j] (length of an LCS of Xi and Yj):

         j    0   1   2   3   4   5   6
 i  xi   yj       B   D   C   A   B   A
 0            0   0   0   0   0   0   0
 1  A         0   0   0   0   1   1   1
 2  B         0   1   1   1   1   2   2
 3  C         0   1   1   2   2   2   2
 4  B         0   1   1   2   2   3   3
 5  D         0   1   2   2   2   3   3
 6  A         0   1   2   2   3   3   4
 7  B         0   1   2   2   3   4   4

Table b[i, j] (direction from which c[i, j] was obtained):

         j    0   1   2   3   4   5   6
 i  xi   yj       B   D   C   A   B   A
 0            •   •   •   •   •   •   •
 1  A         •   ←   ←   ←   ↖   ←   ↖
 2  B         •   ↖   ←   ←   ←   ↖   ←
 3  C         •   ↑   ←   ↖   ←   ←   ←
 4  B         •   ↖   ←   ↑   ←   ↖   ←
 5  D         •   ↑   ↖   ←   ←   ↑   ←
 6  A         •   ↑   ↑   ←   ↖   ←   ↖
 7  B         •   ↖   ↑   ←   ↑   ↖   ←

Computable Tables
• Table size: O(mn).
• Every entry takes O(1) time to compute.
• The algorithm takes O(mn) time and space.
• If only the length of an LCS is needed, the space can be reduced to 2 · min(m, n) + O(1) entries, because row i of the table depends only on row i - 1.
Computable Tables
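The space reduction can be illustrated with a short Python sketch (an assumption-labelled example, not code from the slides; the name `lcs_length_two_rows` is hypothetical). It keeps only the previous and the current row of c, and it swaps the inputs so that a row has min(m, n) + 1 entries.

```python
def lcs_length_two_rows(x, y):
    """LCS length using two rows of the table instead of all m + 1."""
    if len(y) > len(x):
        x, y = y, x  # LCS length is symmetric; make y the shorter sequence
    prev = [0] * (len(y) + 1)  # row i - 1, initially row 0 (all zeros)
    for xi in x:
        curr = [0]  # c[i, 0] = 0
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr  # the finished row becomes row i - 1 for the next pass
    return prev[len(y)]


print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```

This version returns only the length. Because the b table is no longer stored, the subsequence itself cannot be read back from it.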
c[i, j] = c[i-1, j-1] + 1 if xi = yj;
c[i, j] = max(c[i-1, j], c[i, j-1]) if xi ≠ yj;
c[i, j] = 0 if (i = 0) or (j = 0).
function LCS(X, Y)
  m ← length[X]
  n ← length[Y]
  for i ← 1 to m
    do c[i, 0] ← 0
  for j ← 0 to n
    do c[0, j] ← 0
  for i ← 1 to m
    do for j ← 1 to n
      do if xi = yj
        then c[i, j] ← c[i-1, j-1] + 1
             b[i, j] ← “↖”
        else if c[i-1, j] > c[i, j-1]
          then c[i, j] ← c[i-1, j]
               b[i, j] ← “↑”
          else c[i, j] ← c[i, j-1]      (ties go left, as in the example tables)
               b[i, j] ← “←”
  return c and b
Longest Common Subsequence Algorithm
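A possible Python rendering of the table-filling algorithm is sketched below. It is an illustration, not the lecturer's code: the name `lcs_tables` is hypothetical, and the tie rule (choose ← when c[i-1, j] = c[i, j-1]) is an assumption made so that the output reproduces the arrows shown in the example tables.

```python
def lcs_tables(x, y):
    """Fill the length table c and the direction table b bottom-up."""
    m, n = len(x), len(y)
    # row 0 and column 0 stay at 0 / "•" (empty prefix)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [["•"] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:  # includes ties (assumed rule, see lead-in)
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b


c, b = lcs_tables("ABCBDAB", "BDCABA")
for row in c:
    print(*row)
print(c[7][6])  # 4
```

Breaking ties toward ↑ instead would be equally valid; it changes which LCS is later reconstructed, but not its length.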
c[i, j] = c[i-1, j-1] + 1 if xi = yj;
c[i, j] = max(c[i-1, j], c[i, j-1]) if xi ≠ yj;
c[i, j] = 0 if (i = 0) or (j = 0).
procedure PrintLCS(b, X, i, j)
  if (i = 0) or (j = 0)
    then return
  if b[i, j] = “↖”
    then PrintLCS(b, X, i-1, j-1)
         print xi
    else if b[i, j] = “↑”
      then PrintLCS(b, X, i-1, j)
      else PrintLCS(b, X, i, j-1)
Construction of Longest Common Subsequence
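The reconstruction step can be sketched in Python as follows. This is a self-contained illustration under stated assumptions, not the original implementation: the function name `lcs` is hypothetical, the tables are rebuilt inside the function so that the example runs on its own, and ties are broken toward ← as in the example tables.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y as a string."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "↖"
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "↑"
            else:  # ties go left (assumption matching the example tables)
                c[i][j], b[i][j] = c[i][j - 1], "←"

    out = []

    def walk(i, j):
        """Follow the arrows from (i, j) back to the table border."""
        if i == 0 or j == 0:
            return
        if b[i][j] == "↖":
            walk(i - 1, j - 1)
            out.append(x[i - 1])  # x_i belongs to the LCS
        elif b[i][j] == "↑":
            walk(i - 1, j)
        else:
            walk(i, j - 1)

    walk(m, n)
    return "".join(out)


print(lcs("ABCBDAB", "BDCABA"))  # BDAB
```

The walk visits at most m + n cells, so reconstruction takes O(m + n) time once the tables exist. With a different tie rule the same procedure can return another LCS of the same length, such as BCBA.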
• The shortest common supersequence problem is closely related to the longest common subsequence problem.
Shortest common supersequence
• Given two sequences:
  X = <x1, ..., xm> and
  Y = <y1, ..., yn>
• A sequence U = <u1, ..., uk> is a common supersequence of X and Y if U is a supersequence of both X and Y.
• The shortest common supersequence (SCS) is a common supersequence of minimal length.
Relationship with shortest common supersequence
Problem Statement
• Two sequences X and Y are given, and the task is to find a shortest possible common supersequence.
• A shortest common supersequence is not necessarily unique.
• For two input sequences, an SCS is easy to construct from an LCS.
Example:
• X[1..m] = abcbdab
  Y[1..n] = bdcaba
  LCS = Z[1..r] = bcba
• Inserting the non-LCS symbols while preserving their order, we get
  SCS = U[1..t] = abdcabdab.
Relationship with shortest common supersequence
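One way to build an SCS from the LCS table is sketched below in Python. It is an illustrative sketch, not taken from the slides; the names `is_subsequence` and `scs` are hypothetical. The idea is to walk the c table backwards: a matching pair contributes one shared symbol, and every other step contributes the single symbol that is being skipped. The result has length t = m + n - r, where r is the LCS length.

```python
def is_subsequence(sub, seq):
    """True if sub can be obtained from seq by deleting elements."""
    it = iter(seq)
    return all(item in it for item in sub)


def scs(x, y):
    """Return one shortest common supersequence of the strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    out = []  # collected back to front
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # shared LCS symbol, emitted once
            i, j = i - 1, j - 1
        elif c[i - 1][j] > c[i][j - 1]:
            out.append(x[i - 1])  # symbol only needed for x
            i -= 1
        else:
            out.append(y[j - 1])  # symbol only needed for y
            j -= 1
    out.extend(reversed(x[:i]))  # leftover prefix of x, if any
    out.extend(reversed(y[:j]))  # leftover prefix of y, if any
    return "".join(reversed(out))


u = scs("abcbdab", "bdcaba")
print(len(u))  # 9
print(is_subsequence("abcbdab", u), is_subsequence("bdcaba", u))  # True True
```

The string returned here need not equal abdcabdab from the example, since shortest common supersequences are not unique; only the length, 7 + 6 - 4 = 9, is guaranteed to agree.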
