String algorithmsast     • Knuth-Morris-Pratt algorithmime:       - find pattern P in string T           - preprocess P (c...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”...
Trie data structure• Cannot represent strings T and TT• Solution: append special symbol $  to the end of each word        ...
Trie data structure• Cannot represent strings T and TT• Solution: append special symbol $  to the end of each word        ...
Trie data structure• Application: storing dictionaries• Space: O(m1+...+mk), mi=length of string Ti• Looking up word of le...
Tries for storing suffixes • This lecture: use tries for storing the set of suffixes of string T                        c ...
Reducing space requirements • Trick 1: Delete non-branching nodes, merge labels (compact trie)                            ...
Reducing space requirements•   # leafs: n+1•   # internal nodes: at most n•                                               ...
Suffix trees• Called the suffix tree for string T [Weiner’73]• D. Knuth: “algorithm of the year 1973”• Can be constructed ...
Suffix trees for string matchingDoes text T contain pattern P?• Construct suffix tree for T in O(|T|) time• Query takes O(...
Suffix trees for string matching• Scenario: text T is fixed, patterns P1,...,Pk come one at a time   - T is very long• Tim...
Getting extra informationTask: Get all occurrences of P in T (give a list of start positions)• Can be done in O(|P|+c) tim...
Getting extra informationTask: Get # of occurrences of P in T• Can be done in O(|P|) time  - With O(|T|) preprocessing    ...
Generalized suffix tree• Input: a set of strings {T1,...,Tk}• Generalized suffix tree: tree containing all suffixes of all...
Construction of generalized suffix tree for {X, Y} • Construct suffix tree for X $1Y $2 • Edges leading to red leaves are ...
Construction of generalized suffix tree for {X, Y}• Construct suffix tree for X $1Y $2• Edges leading to red leaves are la...
Construction of generalized suffix tree {T 1,...,Tk}• Can use the same trick  - construct suffix tree for T1 $1 ... Tk $k•...
Computing common longest subsequence of {T1,T2}•     Mark each internal node with red (blue)              if it has at lea...
Computing common longest subsequence of {T1,...,TK}Task: For each r = 2,...,K compute the length of the longest sequenceco...
Computing common longest subsequence of {T1,...,TK}Task: For each r = 2,...,K compute the length of the longest sequenceco...
Suffix array for text T[1..n]                                     SA[0..n]mississippi   0        ε               11       ...
Pattern search from a suffix arrayTask: Find pattern P = iss in text T = mississippi                         m            ...
Construction of suffix arraysε             11    • SA[0..n] can be constructed in linear timei             10ippi         ...
Summary• Suffix trees & suffix arrays: two powerful data structures for strings• Space requirements:   - suffix trees: lin...
Further information on string algorithms• Dan Gusfield, “Algorithms on Strings, Trees and Sequences:  Computer Science and...
Upcoming SlideShare
Loading in...5
×

Advances in discrete energy minimisation for computer vision

680

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
680
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Advances in discrete energy minimisation for computer vision

  1. 1. String algorithmsast • Knuth-Morris-Pratt algorithmime: - find pattern P in string T - preprocess P (construct Fail array) - search time: O(n+m) n=|T|, m=|P|oday: • Data structures for strings - trie - suffix tree & generalized suffix tree - suffix array • Many applications - e.g. computing common longest substring
  2. 2. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters a e i• Leafs correspond to input strings d l hi had d p { had, held, help, hi } held help
  3. 3. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree had held• Edges labeled with letters help• Leafs correspond to input strings hi { had, held, help, hi }
  4. 4. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters had• Leafs correspond to input strings held help hi { had, held, help, hi }
  5. 5. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters a e i• Leafs correspond to input strings had held hi help { had, held, help, hi }
  6. 6. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters a e i• Leafs correspond to input strings d held hi help had { had, held, help, hi }
  7. 7. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters a e i• Leafs correspond to input strings d l hi had held { had, held, help, hi } help
  8. 8. Trie data structure• Data structure for storing a set of strings• Name comes from “retrieval”, usually pronounced as “try”• Rooted tree h• Edges labeled with letters a e i• Leafs correspond to input strings d l hi had d p { had, held, help, hi } held help
  9. 9. Trie data structure• Cannot represent strings T and TT• Solution: append special symbol $ to the end of each word h a e i d l hi had d p { had, held, help, hi } held help
  10. 10. Trie data structure• Cannot represent strings T and TT• Solution: append special symbol $ to the end of each word h $ a e i h$ $ d l hi$ $ d p { had$, held$, help$, hi$, h$ } had$ $ $ held$ help$
  11. 11. Trie data structure• Application: storing dictionaries• Space: O(m1+...+mk), mi=length of string Ti• Looking up word of length n: O(n) - assuming alphabet has constant size h $ a• Alternative to binary trees e i - pros and cons (see wikipedia) h$ $ d l had hi$ h held d p $ help hi had$ $ $ - all nodes store items, not just leaves - unlabeled edges held$ help$
  12. 12. Tries for storing suffixes • This lecture: use tries for storing the set of suffixes of string T c a o $ a c o $ $ cacao$ acao$ c o a o$ $ cao$ a o ao$ ao$ $ o$ cao$ o $ $ $ acao$cacao$
  13. 13. Reducing space requirements • Trick 1: Delete non-branching nodes, merge labels (compact trie) a $ ca o$ $ cacao$ o$ acao$ o$ o$ cao$ cao$ ao$ ao$ cao$ o$ cao$ $ acao$cacao$
  14. 14. Reducing space requirements• # leafs: n+1• # internal nodes: at most n• => O(n) storage # edges: at most 2n edges• Trick 2: Store edge labels via two pointers into the string (begin & end) cacao$ ca a o$ $ acao$ cao$ o$ cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$ cacao$ $
  15. 15. Suffix trees• Called the suffix tree for string T [Weiner’73]• D. Knuth: “algorithm of the year 1973”• Can be constructed in linear time - e.g. [McCreight’76], [Ukkonen’95] - algorithms are complicated [will not be covered]• Used for all sorts of string problems cacao$ ca a o$ $ acao$ cao$ o$ cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$ cacao$ $
  16. 16. Suffix trees for string matchingDoes text T contain pattern P?• Construct suffix tree for T in O(|T|) time• Query takes O(|P|) time!• same as Knuth-Morris-Pratt, but... cacao$ ca a o$ $ acao$ cao$ o$cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$cacao$ $
  17. 17. Suffix trees for string matching• Scenario: text T is fixed, patterns P1,...,Pk come one at a time - T is very long• Time: O(|T| + |P1| + |P2| + ... + |Pk|)• Knuth-Morris-Pratt can be generalized to multiple patterns (Aho-Corasick alg.). Same complexity as above, but requires knowledge of P1,...,Pk in advance cacao$ ca a o$ $ acao$ cao$ o$cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$cacao$ $
  18. 18. Getting extra informationTask: Get all occurrences of P in T (give a list of start positions)• Can be done in O(|P|+c) time where c is # of occurences - Locate the node corresponding to P - Traverse all leaves of the subtree rooted at this node cacao$ ca a o$ $ acao$ cao$ o$cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$cacao$ $
  19. 19. Getting extra informationTask: Get # of occurrences of P in T• Can be done in O(|P|) time - With O(|T|) preprocessing cacao$ ca a o$ $ acao$ cao$ o$cao$ o$ cao$ o$ $ ao$ o$ cao$ acao$ ao$cacao$ $
  20. 20. Generalized suffix tree• Input: a set of strings {T1,...,Tk}• Generalized suffix tree: tree containing all suffixes of all strings• Tree for { abba , bac } : abba$ 1 bba$1 a $2 b c$2 $1 ba$1 a$1 c$2 $1 $2 bba$1 $1 a $1 c$2 ba$1abba$1 ac$2 a$1 bba$1 bac$2 $1 c$2 ac$2 ba$1 bac$2 c$2
  21. 21. Construction of generalized suffix tree for {X, Y} • Construct suffix tree for X $1Y $2 • Edges leading to red leaves are labeled as “...$1Y $2”; delete “Y $2” abba$1bac$2 bba$1bac$2 $1bac$2 ba$1bac$2 a$1bac$2...$1bac$2 $1bac$2 $1bac$2 $1bac$2 ...$1bac$1abba$1bac$2 a$1bac$2 bba$1bac$2 bac$2 ac$2 $1bac$2 c$2 ba$1bac$2 $2
  22. 22. Construction of generalized suffix tree for {X, Y}• Construct suffix tree for X $1Y $2• Edges leading to red leaves are labeled as “...$1Y $2”; delete “Y $2” abba$1bac$2 bba$1bac$2 $1 ba$1bac$2 a$1bac$2 ...$1 $1 $1 $1bac$2 ...$1abba$1 a$1 bba$1 bac$2 ac$2 $1 c$2 ba$1 $2
  23. 23. Construction of generalized suffix tree {T 1,...,Tk}• Can use the same trick - construct suffix tree for T1 $1 ... Tk $k• Linear runtime: O(|T1|+...+|Tk|)• In practice, modify the algorithm (e.g. Ukkonen’s alg.) to get a slightly faster performance
  24. 24. Computing common longest subsequence of {T1,T2}• Mark each internal node with red (blue) if it has at least one red (blue) child• Nodes with both colors contain common subsequences abba$1 bba$1 a $2 b c$2 $1 ba$1 a$1 c$2 $1 $2 bba$1 $1 a $1 c$2 ba$1abba$1 ac$2 a$1 bba$1 bac$2 $1 c$2 ac$2 ba$1 bac$2 c$2
  25. 25. Computing common longest subsequence of {T1,...,TK}Task: For each r = 2,...,K compute the length of the longest sequencecontained in at least r strings1. Construct generalized suffix tree for {T1,...,TK} ... ...
  26. 26. Computing common longest subsequence of {T1,...,TK}Task: For each r = 2,...,K compute the length of the longest sequencecontained in at least r strings1. Construct generalized suffix tree for {T1,...,TK}2. Mark each node v with color k=1,...,K if it has a leaf with that color3. Compute c(v) = # of colors at v - Naive computation: O(nK), n=|T1|+...+|TK|. Can also be done in O(n)String corresponding to v is contained in exactly c(v) input strings ... ...
  27. 27. Suffix array for text T[1..n] SA[0..n]mississippi 0 ε 11 • SA[i] = id of the i’thississippi 1 i 10 smallest suffixssissippi 2 ippi 7sissippi 3 issippi 4issippi 4 ississippi 1 • Number k correspondsssippi 5 mississippi 0 to suffix T[k+1..n]sippi 6 pi 9ippi 7 ppi 8ppi 8 sippi 6pi 9 sissippi 3 • Lexicographic order:i 10 ssippi 5 - X < Xα...ε 11 ssissippi 2 - Xα... < Xβ... if α<β Suffix array: stores suffixes of string T in a lexicographic order
  28. 28. Pattern search from a suffix arrayTask: Find pattern P = iss in text T = mississippi m nε 11i 10 • All occurrences appear contiguously in the arrayippi 7issippi 4 • Computing start & end indexes: binary searchississippi 1mississippi 0 • Worst-case: O(m log n)pi 9ppi 8sippi 6 • In practice, often O(m + log n)sissippi 3ssippi 5 • Can be improved to O(m + log n) worst-casessissippi 2 - need to store extra 2n numbers
  29. 29. Construction of suffix arraysε 11 • SA[0..n] can be constructed in linear timei 10ippi 7 - from a suffix treeissippi 4 - or directlyississippi 1mississippi 0 [Kim,Sim,Park,Park’03]pi 9 [Kärkkäinen&Sanders’03]ppi 8 [Ko & Aluru’03]sippi 6sissippi 3ssippi 5ssissippi 2
  30. 30. Summary• Suffix trees & suffix arrays: two powerful data structures for strings• Space requirements: - suffix trees: linear but with a large constant, e.g. 20 |T| - suffix arrays: more efficient, e.g. 5 |T|• Suffix arrays are more suited to large alphabets• Practitioners like suffix arrays (simplicity, space efficiency)• Theoreticians like suffix trees (explicit structure) ε 11 i 10 ippi 7 issippi 4 ississippi 1 mississippi 0 pi 9 ppi 8 sippi 6 sissippi 3 ssippi 5 ssissippi 2
  31. 31. Further information on string algorithms• Dan Gusfield, “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology”, 1997• Slides mentioning more recent results: http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×