INDIAN INSTITUTE OF TECHNOLOGY (BHU),
VARANASI
CS-4301: SEMINAR / GROUP DISCUSSION
By:
HARSHIT AGARWAL
11100EN006
Suffix Tree & Suffix Array
Construction
Pattern Searching
Space & Time Complexity
Applications
(In a Nutshell …)
Motivation
• Need a fast text-searching algorithm with low space cost.
• DNA and protein sequences are too large to search with traditional algorithms:
• Rabin-Karp
• Naive string matching
• Some improved algorithms perform efficiently:
• KMP and BM (Boyer-Moore) algorithms for string matching.
SUFFIX TREE
Definition
A suffix tree (also called a PAT tree) is a compressed trie
containing all the suffixes of the given text S of length n.
• Properties:
• Each tree edge is labeled by a substring of S.
• Each internal node except the root has at least 2 children.
• No two edges branching out from the same internal node can start
with the same character.
• Each suffix S(i) has its corresponding labeled path from the root to a leaf, for
1 ≤ i ≤ n.
• There are n leaves.
What is a compressed trie?
Trie
• A trie represents a set of strings, e.g.:
{ aeef , ad , bbfe , bbfg , c }
Compressed Trie
• Compress unary nodes (chains of nodes with a single child) into one edge, and label edges by strings.
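As a rough illustration of the two slides above, the sketch below builds a plain trie for the example set and then merges unary chains into string-labeled edges. The names `TrieNode`, `build_trie` and `compress` are hypothetical, and the nested-dict output format is only one possible way to show the result.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a single character to a child TrieNode
        self.is_end = False   # True if one of the stored strings ends here


def build_trie(words):
    """Insert every word character by character, sharing common prefixes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def compress(node):
    """Return {edge_label: subtree} with chains of unary nodes merged into one edge."""
    result = {}
    for ch, child in node.children.items():
        label = ch
        # Keep walking while the node has exactly one child and no word ends at it.
        while len(child.children) == 1 and not child.is_end:
            (next_ch, next_child), = child.children.items()
            label += next_ch
            child = next_child
        result[label] = compress(child)
    return result


words = ["aeef", "ad", "bbfe", "bbfg", "c"]
compressed = compress(build_trie(words))
print(compressed)
```

For this set the root keeps three edges, labeled `a`, `bbf` and `c`; the chain b, b, f collapses into the single edge `bbf`.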
Construction of Suffix Tree
Uniqueness:
(1) Here we preprocess the text string instead of the pattern.
(2) Each suffix is padded with a terminal symbol not seen
in the string (usually denoted by $). This ensures that no suffix
is a prefix of another.
Naïve Method
ALGORITHM:
1) Put suffix S[1..m] into the tree.
2) Then put S[i..m] into the tree for 2 ≤ i ≤ m.
Naïve method: O(m²) time (m = text size)
while suffixes remain:
add the next (shorter) suffix to the tree
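The loop above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the `Node` class and `build_suffix_tree` are hypothetical names, edge labels are stored as plain substrings (which costs extra memory), and `$` is assumed not to occur in the text.

```python
class Node:
    def __init__(self):
        self.edges = {}   # first character of label -> (label, child Node)


def build_suffix_tree(text):
    """Naive O(m^2) construction: insert each suffix of text + '$' from the root."""
    s = text + "$"
    root = Node()
    for i in range(len(s)):
        node, suffix = root, s[i:]
        while True:
            first = suffix[0]
            if first not in node.edges:
                node.edges[first] = (suffix, Node())   # add a new leaf edge
                break
            label, child = node.edges[first]
            k = 0                                      # length of the shared prefix
            while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
                k += 1
            if k == len(label):                        # whole label matched: descend
                node, suffix = child, suffix[k:]
                continue
            mid = Node()                               # mismatch inside the label: split
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[suffix[k]] = (suffix[k:], Node())
            node.edges[first] = (label[:k], mid)
            break
    return root


def count_leaves(node):
    if not node.edges:
        return 1
    return sum(count_leaves(child) for _, child in node.edges.values())


tree = build_suffix_tree("xtpxtd")
print(sorted(label for label, _ in tree.edges.values()))
print(count_leaves(tree))   # one leaf per suffix of "xtpxtd$"
```

Because `$` is unique, no suffix ends in the middle of an edge or at an internal node, which is why the loop never runs out of characters before adding a leaf.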
O(m²): too expensive!
Can we improve it further?
Ukkonen’s Algorithm
• Build the suffix tree incrementally, from left to right.
• Build the tree in m phases, one for each character.
• At the end of phase i, we have the tree T'_i, which is the tree
representing the prefix S[1..i].
• Build "implicit" suffix trees (no end-of-string marker).
• Extend the implicit suffix tree from the previous step.
• Convert the implicit suffix tree to an "explicit" one in the final step.
• Three extension rules are involved:
• ADD a new edge if the character is new.
• SPLIT the existing edge if the character is already present.
• DO NOTHING if the current suffix is already present.
STEP 1: x t p x t d (ADD)
STEP 2: x t p x t d (ADD)
STEP 3: x t p x t d (ADD)
STEP 4: x t p x t d (SPLIT)
STEP 5: x t p x t d (SPLIT)
STEP 6: x t p x t d (ADD)
Implicit Tree
STEP 7: x t p x t d (Finalize)
Explicit Tree
Shortcuts: improving the complexity
• Suffix Links
• Skip and Count Trick
• Edge Label Compressions
• A Stopper
Pattern Searching
Idea:
Every pattern that occurs in the text (that is, every substring
of the text) must be a prefix of some suffix of the text.
Algorithm:
• Starting from the first character of the pattern and the root of the suffix
tree, do the following for every character.
• For the current character of the pattern, if there is an edge from the
current node of the suffix tree, follow the edge.
• If there is no edge, print "pattern doesn't exist in text" and return.
• If all characters of the pattern have been processed, i.e., there is a
path from the root for the characters of the given pattern, then print
"Pattern found".
Complexity: O(n) (n = size of pattern) rather than O(m) (m = size of text)
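The walk described above can be sketched on an uncompressed suffix trie, which is used here only to keep the code short: each edge carries one character, so the per-character step is a single dictionary lookup. A real suffix tree would compare against edge labels instead. `build_suffix_trie` and `contains` are hypothetical names.

```python
def build_suffix_trie(text):
    """Nested-dict trie holding every suffix of text + '$' (quadratic size)."""
    s = text + "$"
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """Follow one edge per pattern character; the cost depends only on len(pattern)."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False      # no edge: the pattern does not occur in the text
        node = node[ch]
    return True               # every character was matched along a path from the root


root = build_suffix_trie("xtpxtd")
print(contains(root, "pxt"))   # True
print(contains(root, "xtx"))   # False
```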
SUFFIX ARRAY
Definition
A suffix array is a sorted array of all the suffixes of a given
string. It contains integers that represent the starting indexes of
all the suffixes of the string, after the suffixes are sorted.
• Properties: Let S be a string of length n and let S[i..j] denote the
substring of S ranging from i to j.
• The suffix array A is defined to be an array of integers providing the
starting positions of the suffixes of S in lexicographical order.
• An entry A[i] contains the starting position of the i-th smallest
suffix in S.
• For all 1 < i ≤ n: S[A[i-1]..n] < S[A[i]..n]
Example Suffix Array
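As a small illustration of the definition, the sketch below prints a suffix array using 1-based positions as in the properties above. The string `banana$` is chosen here for illustration and is not taken from the slides; `suffix_array` is a hypothetical helper that simply sorts the suffixes.

```python
def suffix_array(s):
    """1-based starting positions of the suffixes of s, in lexicographic order."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


s = "banana$"
A = suffix_array(s)
for rank, start in enumerate(A, 1):
    print(rank, start, s[start - 1:])

# The defining property: consecutive entries point to strictly increasing suffixes.
print(all(s[A[i - 1] - 1:] < s[A[i] - 1:] for i in range(1, len(A))))
```

In ASCII the character `$` sorts before the lowercase letters, so the suffix `$` comes first and the array for this string is 7, 6, 4, 2, 1, 5, 3.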
Construction of Suffix Array
• The text ends with the special sentinel letter $, which is unique
and lexicographically smaller than any other character.
• Easy O(n² log n) algorithm:
- Sort the n suffixes, which takes O(n log n) comparisons.
- Each comparison takes O(n).
• There are O(n log n) and O(n) algorithms for constructing suffix
arrays that use very little space.
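The slides do not spell out a faster construction at this point. One commonly taught middle ground is prefix doubling, sketched below under the assumption that a comparison sort is used in each round, which gives O(n log² n) rather than the O(n log n) or O(n) bounds mentioned above. It is a different technique from the skew algorithm, and `suffix_array_doubling` is a hypothetical name; positions are 0-based.

```python
def suffix_array_doubling(s):
    """0-based suffix array by prefix doubling: sort by the first k, 2k, 4k, ... characters."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]     # initial rank: the first character only
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # Rank of the first k characters, then of the next k (-1 if past the end).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            # Equal keys share a rank; a different key gets the next rank.
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: the order is final
            return sa
        k *= 2


print(suffix_array_doubling("banana$"))
```

Each round doubles the number of characters the ranks account for, so at most about log₂ n rounds are needed before every suffix has a distinct rank.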
Can we do it in O(n)?
Skew Algorithm: Divide & Conquer
SKEW ALGORITHM
Pattern Searching
• If pattern P occurs in text T, then all its occurrences are
consecutive in the suffix array.
• Do a binary search on the suffix array.
• Complexity: takes O(n log m) time,
where m is the length of the text
and n is the length of the pattern.
• It can be improved to O(n + log m) time using LCP information.
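The plain binary search (without the LCP speed-up) might look like the sketch below. It uses 0-based positions and a naively sorted suffix array; `build_suffix_array` and `find_occurrences` are hypothetical names. Two binary searches locate the block of consecutive suffixes that start with the pattern.

```python
def build_suffix_array(text):
    """Naive construction, enough for a demonstration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted 0-based start positions of pattern in text."""
    n = len(pattern)
    # Lower bound: first suffix whose first n characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose first n characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))   # [1, 3]
print(find_occurrences(text, sa, "xyz"))   # []
```

Each of the O(log m) steps compares at most n characters, which is where the O(n log m) bound comes from.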
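Accelerate the Search:
• Let L and R be the suffixes at the left and right ends of the current binary-search interval, and M the suffix at its midpoint.
• Maintain l = LCP(P, L).
• Maintain r = LCP(P, R).
• If l = r, then start comparing M to P at position l + 1.
• If l > r, then suppose we know LCP(L, M):
• If LCP(L, M) < l, we go left.
• If LCP(L, M) > l, we go right.
• If LCP(L, M) = l, we start comparing at position l + 1.
• The case l < r is symmetric, using LCP(M, R) and r.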
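A simplified version of this idea can be sketched without any precomputed LCP(L, M) values: every suffix between L and R shares at least min(l, r) characters with P, so the comparison with M can start there. This generalizes the "l = r" case only; it does not reach the O(n + log m) bound, which needs the stored LCP values. The function names are hypothetical, positions are 0-based, and L and R are taken to be the suffixes just outside the current interval.

```python
def suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])


def first_match_rank(text, sa, pattern):
    """Rank of the first suffix that is >= pattern, skipping min(l, r) known characters."""
    n = len(pattern)
    lo, hi = 0, len(sa)
    l = r = 0   # characters of pattern matched by the suffixes just outside [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        start = sa[mid]
        k = min(l, r)                       # these characters are already known to match
        while k < n and start + k < len(text) and text[start + k] == pattern[k]:
            k += 1
        if k < n and (start + k == len(text) or text[start + k] < pattern[k]):
            lo, l = mid + 1, k              # suffix M < pattern: go right
        else:
            hi, r = mid, k                  # suffix M >= pattern: go left
    return lo


def occurs(text, sa, pattern):
    rank = first_match_rank(text, sa, pattern)
    return rank < len(sa) and text.startswith(pattern, sa[rank])


text = "mississippi"
sa = suffix_array(text)
print(occurs(text, sa, "ssip"))   # True
print(occurs(text, sa, "ssp"))    # False
```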
Suffix Array Vs Suffix Tree
• Suffix arrays are closely related to suffix trees:
• A suffix array can be constructed from a suffix tree by doing a DFS
(Depth First Search) traversal of the suffix tree.
• A suffix tree can be constructed in linear time by using a
combination of the suffix array and the LCP (Longest Common Prefix) array.
• Slight space vs. time trade-off: suffix arrays are more space
efficient, because we just store the original string + a list of integers,
but they are a bit slower to query.
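The first relation can be checked on a small case. The sketch below uses an uncompressed suffix trie in place of a suffix tree (an assumption made to keep it short) and visits children in sorted character order, so the leaves come out in lexicographic order of their suffixes. `suffix_array_via_dfs` is a hypothetical name, positions are 0-based, and the `$` suffix is included.

```python
def suffix_array_via_dfs(text):
    """Build a suffix trie of text + '$', then read leaf positions in DFS order."""
    s = text + "$"
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
        node["start"] = i   # leaf marker; the multi-character key cannot clash with an edge

    order = []

    def dfs(node):
        if "start" in node:           # a leaf: record where its suffix begins
            order.append(node["start"])
            return
        for ch in sorted(node):       # children in lexicographic order
            dfs(node[ch])

    dfs(root)
    return order


print(suffix_array_via_dfs("banana"))   # [6, 5, 3, 1, 0, 4, 2]
```

The recursion depth grows with the text length here, so this form is only suitable for short strings.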
Applications
• Finding the longest repeated substring.
• Finding the longest common substring.
• Finding the longest palindrome in a string.
• Others :
• Data Compression & Clustering Algorithms.
• Bioinformatics.
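As one example of these applications, the longest repeated substring is the longest common prefix of two suffixes that are adjacent in the suffix array. The sketch below assumes a naively sorted suffix array and computes the LCP array with the linear-time method usually attributed to Kasai et al.; all function names are hypothetical and positions are 0-based.

```python
def suffix_array(s):
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[r] = length of the common prefix of the suffixes ranked r-1 and r (lcp[0] = 0)."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):                    # process suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]           # the suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                    # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def longest_repeated_substring(s):
    if not s:
        return ""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    r = max(range(len(s)), key=lambda k: lcp[k])   # rank with the largest LCP value
    return s[sa[r]:sa[r] + lcp[r]]


print(longest_repeated_substring("banana"))   # ana
print(longest_repeated_substring("xtpxtd"))   # xt
```

The two occurrences may overlap, as with `ana` in `banana`.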
Complexity Comparison:
THANK YOU !!!!
