LONGEST COMMON SUBSEQUENCE
Presented by
1.Himanshu Sharma(RA2211031010146)
2.Pratyush Golwala(RA2211031010155)
3.Roneek Patel(RA2211031010154)
Problem
Statement
To design an algorithm to determine the longest
subsequence that is common to two given sequences.
Given two sequences, sequence A and sequence B, the
task is to identify the longest sequence of elements that
appear in the same relative order in both A and B. The
elements of the sequences may be characters, numbers,
or any comparable entities. The algorithm should
efficiently compute the length of the longest common
subsequence and optionally provide the actual
subsequence itself. The output of the algorithm should
enable users to understand the degree of similarity and
the shared elements between the two sequences. Ensure
that the algorithm is scalable and can handle sequences
of varying lengths and contents.
Introduction
The problem of finding the longest
subsequence that is common to two given
sequences, known as the Longest Common
Subsequence (LCS), is a fundamental challenge
in computer science with wide-ranging
applications.
01.
02.
03.
Versatility Across Domains: LCS (Longest Common
Subsequence) algorithms are versatile tools with
applications in diverse fields, including
bioinformatics, text comparison, plagiarism
detection, and version control systems. Their ability
to identify the maximum-length sequence of
elements in the same order within two sequences,
even if not consecutive, makes them valuable for
solving various problems across different domains.
Bioinformatics Applications: In bioinformatics, LCS
algorithms play a crucial role in the analysis of biological
sequences, such as DNA, RNA, and protein sequences.
They are extensively used for tasks like sequence
alignment, genome assembly, and evolutionary analysis.
By identifying common subsequences and differences
between genetic sequences, LCS algorithms contribute to
understanding genetic structures and functions, aiding
researchers in deciphering the complexities of biological
data.
Contributions to Knowledge and Innovation: The use of LCS
algorithms in different domains contributes to
advancements in knowledge and innovation. In
bioinformatics, these algorithms help uncover relationships
between biological sequences, leading to insights into
genetic evolution and function. Similarly, in text
comparison, plagiarism detection, and version control
systems, LCS algorithms enhance the efficiency of content
analysis and version tracking, fostering innovation in
information management and software development
processes.
Objective:
User-Friendly Interface:
• Develop an intuitive and user-friendly interface for inputting
sequences, configuring algorithm parameters, and
visualizing the results.
• Ensure accessibility for users with varying levels of technical
expertise, facilitating seamless integration into research
workflows, educational environments, or practical
applications.
Algorithm Development:
• Design and implement an efficient LCS algorithm capable of
identifying the longest common subsequence between two
given sequences, considering the non-consecutive nature of
elements.
• Optimize the algorithm for performance, ensuring
scalability to handle large datasets, and evaluate its time
and space complexity.
Cross-Domain Application:
• Extend the LCS algorithm's utility beyond bioinformatics by
incorporating features for text comparison, plagiarism
detection, or version control systems.
• Showcase the adaptability of the algorithm in different
contexts, demonstrating its effectiveness in addressing
problems related to content analysis, intellectual property
protection, and software version tracking.
Performance Evaluation:
• Conduct rigorous testing and performance evaluation of
the LCS algorithm in diverse scenarios, considering
varying sequence lengths, types of data, and real-world
use cases.
• Provide comprehensive documentation and benchmarks
to showcase the algorithm's robustness, efficiency, and
versatility across different application domains.
Comparison Of different
Algorithms:
●The brute force algorithm is the most basic method for
solving the LCS problem.
●It involves generating all possible subsequences of the
two input sequences and checking if they are common to
both sequences.
●The brute force algorithm has a time complexity of
O(2^n) and a space complexity of O(n), where n is the
length of the longer sequence.
●Although the brute force algorithm is simple to
implement, it is not efficient for large input sizes.
1.Brute Force
Algorithm
2.Divide and Conquer:
●The divide and conquer algorithm for LCS involves
dividing the input sequences into smaller subproblems,
solving these subproblems recursively, and then
combining the solutions to obtain the LCS of the original
sequences.
●The divide and conquer algorithm has a time
complexity of O(n^2logn) and a space complexity of
O(n^2).
●Although the divide and conquer algorithm is not as
efficient as the dynamic programming algorithm, it is
useful for solving other problems that involve finding
common subsequences
3.Dynamic Programming
●The dynamic programming algorithm for LCS is a more
efficient method for solving the problem.
● It involves breaking down the input sequences into
smaller subproblems and storing the solutions to these
subproblems in a table.
● The dynamic programming algorithm has a time
complexity of O(mn), where m and n are the lengths of
the input sequences, and a space complexity of O(mn).
● The dynamic programming algorithm is widely used
for solving the LCS problem due to its efficiency.
Pseudocode
Dynamic Programming:
def Ics(string1, string2):
if len (string1) == 0 or len(string2) == 0:
return ""
elif string1[-1] == string2[-1]:
return Ics(string1[:-1], string2[:-1]) + string1[-1]
else:
Ics1 = Ics(string1[:-1], string2)
Ics2 = Ics(string1, string2[:-1])
if len (Ics1) > len(Ics2):
return Ics1
else: return Ics2
Divide and Conquer:
def Ics(string1, string2):
m = len(string1)
n = len(string2)
# initialize table with zeros
table = [[0]*(n+1) for i in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if string1[i-1] == string2[j-1]:
table[i][j] = table[i-1][j-1]+1
else:
table[i][j] = max(table[i-1][j], table[i][j-1])
#backtrack to find LCS
Ics = ""
while i > 0 and j > 0:
i = m
j = n
if string1 [i-1] == string2[j-1]:
Ics = string1 [i-1] + Ics i-=1
elif table[i-1][j] > table[i][j-1]: i-1
else: j-=1
j-=1
return Ics
Brute Force:
def Ics(string1, string2):
if len(string1) == 0 or len(string2) == 0:
return ""
elif string1[-1] == string2[-1]:
return Ics(string1[:-1], string2[:-1]) +
string1[-1]
else:
Source Code:
Brute Force Algorithm:
Source Code:
Dynamic Programming:
Source Code:
Divide and Conquer:
Time Complexity
There are two choices for each character in a string: it
can either be included in the subsequence or not. This
means there are 2^m possible subsequences of X and
2^n possible subsequences of Y. So in this recursive
approach, we are comparing each subsequence of X with
each subsequence of Y. So the overall time complexity is
O(2^m * 2^n) = O(2^(m + n)), which is inefficient for large
values of m and n.
The space complexity of this solution depends on the size
of the call stack, which is equal to the height of the
recursion tree. In this case, input parameters (m and n)
are at most decreasing by 1 on each recursive call and
terminate when either m or n becomes 0. The height of
the recursion tree in the worst case will be O(max(m, n)
and space complexity = O(max(m, n)).
Space Complexity
Brute Force
Algorithm
Time Complexity
In the worst case, we will be solving each subproblem
only once, and there are (m + 1) x (n + 1) different
subproblems. So, the time complexity is O(mn).
We are using a 2D table of size (m + 1) x (n + 1) to store
the solutions of the subproblems. So the space
complexity is O(mn).
Space Complexity
Dynamic
Programming
Conclusion:
1.We discussed three different algorithms for
solving the LCS problem: brute force, dynamic
programming, and divide and conquer.
2.We compared the time and space complexity of
these algorithms and discussed their pros and cons.
3. We also highlighted the numerous applications of
LCS in computer science and industry.
4.LCS is a fundamental problem in computer
science, and efficient algorithms for solving it are
crucial for many applications.

design and analysis of algorithm (Longest common subsequence)

  • 1.
    LONGEST COMMON SUBSEQUENCE Presentedby 1.Himanshu Sharma(RA2211031010146) 2.Pratyush Golwala(RA2211031010155) 3.Roneek Patel(RA2211031010154)
  • 2.
    Problem Statement To design analgorithm to determine the longest subsequence that is common to two given sequences. Given two sequences, sequence A and sequence B, the task is to identify the longest sequence of elements that appear in the same relative order in both A and B. The elements of the sequences may be characters, numbers, or any comparable entities. The algorithm should efficiently compute the length of the longest common subsequence and optionally provide the actual subsequence itself. The output of the algorithm should enable users to understand the degree of similarity and the shared elements between the two sequences. Ensure that the algorithm is scalable and can handle sequences of varying lengths and contents.
  • 3.
    Introduction The problem offinding the longest subsequence that is common to two given sequences, known as the Longest Common Subsequence (LCS), is a fundamental challenge in computer science with wide-ranging applications. 01. 02. 03. Versatility Across Domains: LCS (Longest Common Subsequence) algorithms are versatile tools with applications in diverse fields, including bioinformatics, text comparison, plagiarism detection, and version control systems. Their ability to identify the maximum-length sequence of elements in the same order within two sequences, even if not consecutive, makes them valuable for solving various problems across different domains. Bioinformatics Applications: In bioinformatics, LCS algorithms play a crucial role in the analysis of biological sequences, such as DNA, RNA, and protein sequences. They are extensively used for tasks like sequence alignment, genome assembly, and evolutionary analysis. By identifying common subsequences and differences between genetic sequences, LCS algorithms contribute to understanding genetic structures and functions, aiding researchers in deciphering the complexities of biological data. Contributions to Knowledge and Innovation: The use of LCS algorithms in different domains contributes to advancements in knowledge and innovation. In bioinformatics, these algorithms help uncover relationships between biological sequences, leading to insights into genetic evolution and function. Similarly, in text comparison, plagiarism detection, and version control systems, LCS algorithms enhance the efficiency of content analysis and version tracking, fostering innovation in information management and software development processes.
  • 4.
    Objective: User-Friendly Interface: • Developan intuitive and user-friendly interface for inputting sequences, configuring algorithm parameters, and visualizing the results. • Ensure accessibility for users with varying levels of technical expertise, facilitating seamless integration into research workflows, educational environments, or practical applications. Algorithm Development: • Design and implement an efficient LCS algorithm capable of identifying the longest common subsequence between two given sequences, considering the non-consecutive nature of elements. • Optimize the algorithm for performance, ensuring scalability to handle large datasets, and evaluate its time and space complexity. Cross-Domain Application: • Extend the LCS algorithm's utility beyond bioinformatics by incorporating features for text comparison, plagiarism detection, or version control systems. • Showcase the adaptability of the algorithm in different contexts, demonstrating its effectiveness in addressing problems related to content analysis, intellectual property protection, and software version tracking. Performance Evaluation: • Conduct rigorous testing and performance evaluation of the LCS algorithm in diverse scenarios, considering varying sequence lengths, types of data, and real-world use cases. • Provide comprehensive documentation and benchmarks to showcase the algorithm's robustness, efficiency, and versatility across different application domains.
  • 5.
    Comparison Of different Algorithms: ●Thebrute force algorithm is the most basic method for solving the LCS problem. ●It involves generating all possible subsequences of the two input sequences and checking if they are common to both sequences. ●The brute force algorithm has a time complexity of O(2^n) and a space complexity of O(n), where n is the length of the longer sequence. ●Although the brute force algorithm is simple to implement, it is not efficient for large input sizes. 1.Brute Force Algorithm
  • 6.
    2.Divide and Conquer: ●Thedivide and conquer algorithm for LCS involves dividing the input sequences into smaller subproblems, solving these subproblems recursively, and then combining the solutions to obtain the LCS of the original sequences. ●The divide and conquer algorithm has a time complexity of O(n^2logn) and a space complexity of O(n^2). ●Although the divide and conquer algorithm is not as efficient as the dynamic programming algorithm, it is useful for solving other problems that involve finding common subsequences
  • 7.
    3.Dynamic Programming ●The dynamicprogramming algorithm for LCS is a more efficient method for solving the problem. ● It involves breaking down the input sequences into smaller subproblems and storing the solutions to these subproblems in a table. ● The dynamic programming algorithm has a time complexity of O(mn), where m and n are the lengths of the input sequences, and a space complexity of O(mn). ● The dynamic programming algorithm is widely used for solving the LCS problem due to its efficiency.
  • 8.
    Pseudocode Dynamic Programming: def Ics(string1,string2): if len (string1) == 0 or len(string2) == 0: return "" elif string1[-1] == string2[-1]: return Ics(string1[:-1], string2[:-1]) + string1[-1] else: Ics1 = Ics(string1[:-1], string2) Ics2 = Ics(string1, string2[:-1]) if len (Ics1) > len(Ics2): return Ics1 else: return Ics2 Divide and Conquer: def Ics(string1, string2): m = len(string1) n = len(string2) # initialize table with zeros table = [[0]*(n+1) for i in range(m+1)] for i in range(1, m+1): for j in range(1, n+1): if string1[i-1] == string2[j-1]: table[i][j] = table[i-1][j-1]+1 else: table[i][j] = max(table[i-1][j], table[i][j-1]) #backtrack to find LCS Ics = "" while i > 0 and j > 0: i = m j = n if string1 [i-1] == string2[j-1]: Ics = string1 [i-1] + Ics i-=1 elif table[i-1][j] > table[i][j-1]: i-1 else: j-=1 j-=1 return Ics Brute Force: def Ics(string1, string2): if len(string1) == 0 or len(string2) == 0: return "" elif string1[-1] == string2[-1]: return Ics(string1[:-1], string2[:-1]) + string1[-1] else:
  • 9.
  • 10.
  • 11.
  • 12.
    Time Complexity There aretwo choices for each character in a string: it can either be included in the subsequence or not. This means there are 2^m possible subsequences of X and 2^n possible subsequences of Y. So in this recursive approach, we are comparing each subsequence of X with each subsequence of Y. So the overall time complexity is O(2^m * 2^n) = O(2^(m + n)), which is inefficient for large values of m and n. The space complexity of this solution depends on the size of the call stack, which is equal to the height of the recursion tree. In this case, input parameters (m and n) are at most decreasing by 1 on each recursive call and terminate when either m or n becomes 0. The height of the recursion tree in the worst case will be O(max(m, n) and space complexity = O(max(m, n)). Space Complexity Brute Force Algorithm
  • 13.
    Time Complexity In theworst case, we will be solving each subproblem only once, and there are (m + 1) x (n + 1) different subproblems. So, the time complexity is O(mn). We are using a 2D table of size (m + 1) x (n + 1) to store the solutions of the subproblems. So the space complexity is O(mn). Space Complexity Dynamic Programming
  • 14.
    Conclusion: 1.We discussed threedifferent algorithms for solving the LCS problem: brute force, dynamic programming, and divide and conquer. 2.We compared the time and space complexity of these algorithms and discussed their pros and cons. 3. We also highlighted the numerous applications of LCS in computer science and industry. 4.LCS is a fundamental problem in computer science, and efficient algorithms for solving it are crucial for many applications.