
# Advances in discrete energy minimisation for computer vision


Published in: Education


1. String algorithms
    - Last time: Knuth-Morris-Pratt algorithm
        - find pattern P in string T
        - preprocess P (construct Fail array)
        - search time: O(n+m), n=|T|, m=|P|
    - Today: data structures for strings
        - trie
        - suffix tree & generalized suffix tree
        - suffix array
    - Many applications
        - e.g. computing the longest common substring
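
As a reminder of the Knuth-Morris-Pratt recap above, here is a minimal Python sketch. The function names `build_fail` and `kmp_search` are illustrative choices, not names from the slides, and positions are 0-based.

```python
def build_fail(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the "Fail array")."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]          # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the 0-based start positions of pattern in text, in O(n+m)."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = build_fail(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):        # full match ending at position i
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits
```

The text pointer never moves backwards, which is what gives the O(n+m) bound quoted on the slide.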
2. Trie data structure
    - Data structure for storing a set of strings
    - Name comes from “retrieval”, usually pronounced as “try”
    - Rooted tree
    - Edges labeled with letters
    - Leaves correspond to input strings
    - Example (figure): trie for { had, held, help, hi }. The root has a single edge h; below it, a-d leads to had, e-l branches into d (held) and p (help), and i leads to hi
9. Trie data structure
    - Cannot represent both a string T and a longer string that has T as a prefix (e.g. h and hi): T would end at an internal node, not at a leaf
    - Solution: append special symbol \$ to the end of each word
    - Example (figure): trie for { had\$, held\$, help\$, hi\$, h\$ }
11. Trie data structure
    - Application: storing dictionaries
    - Space: O(m1+...+mk), mi = length of string Ti
    - Looking up a word of length n: O(n)
        - assuming the alphabet has constant size
    - Alternative to binary trees
        - pros and cons (see Wikipedia)
        - in a binary tree, all nodes store items, not just leaves, and edges are unlabeled
    - Figures: trie for { had\$, held\$, help\$, hi\$, h\$ } next to a binary tree storing h, had, held, help, hi
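
The trie slides above can be sketched in a few lines of Python. This is a minimal illustration that represents each node as a plain `dict` from letter to child; the helper names `trie_insert`, `trie_contains` and `build_trie` are assumptions made for this example, and words are assumed not to contain the `$` marker themselves.

```python
END = "$"  # end-of-word marker, as on the slides


def trie_insert(root, word):
    """Insert word (plus the end marker) into the trie rooted at root."""
    node = root
    for ch in word + END:
        node = node.setdefault(ch, {})   # follow the edge, creating it if needed


def trie_contains(root, word):
    """Look up word in O(len(word)) dictionary steps."""
    node = root
    for ch in word + END:
        if ch not in node:
            return False
        node = node[ch]
    return True


def build_trie(words):
    root = {}
    for w in words:
        trie_insert(root, w)
    return root


# The set from the slides, including the prefix word "h".
trie = build_trie(["had", "held", "help", "hi", "h"])
```

Because every stored word ends with `$`, the word `h` and its extensions `had` and `hi` coexist, while a mere prefix such as `hel` is correctly reported as absent.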
12. Tries for storing suffixes
    - This lecture: use tries for storing the set of suffixes of string T
    - Example (figure): trie of the suffixes of cacao\$, i.e. cacao\$, acao\$, cao\$, ao\$, o\$, \$
13. Reducing space requirements
    - Trick 1: delete non-branching nodes and merge labels (compact trie)
    - Example (figure): compact trie of the suffixes of cacao\$
14. Reducing space requirements
    - # leaves: n+1
    - # internal nodes: at most n
    - # edges: at most 2n
    - Trick 2: store edge labels via two pointers into the string (begin & end)
    - => O(n) storage
    - Example (figure): compact trie of the suffixes of cacao\$
15. Suffix trees
    - The compact trie of all suffixes is called the suffix tree for string T [Weiner’73]
    - D. Knuth: “algorithm of the year 1973”
    - Can be constructed in linear time
        - e.g. [McCreight’76], [Ukkonen’95]
        - algorithms are complicated [will not be covered]
    - Used for all sorts of string problems
16. Suffix trees for string matching
    - Does text T contain pattern P?
    - Construct suffix tree for T in O(|T|) time
    - Query takes O(|P|) time!
    - Same as Knuth-Morris-Pratt, but...
17. Suffix trees for string matching
    - Scenario: text T is fixed, patterns P1,...,Pk come one at a time
        - T is very long
    - Time: O(|T| + |P1| + |P2| + ... + |Pk|)
    - Knuth-Morris-Pratt can be generalized to multiple patterns (Aho-Corasick algorithm). Same complexity as above, but requires knowledge of P1,...,Pk in advance
18. Getting extra information
    - Task: get all occurrences of P in T (give a list of start positions)
    - Can be done in O(|P|+c) time, where c is the # of occurrences
        - Locate the node corresponding to P
        - Traverse all leaves of the subtree rooted at this node
19. Getting extra information
    - Task: get # of occurrences of P in T
    - Can be done in O(|P|) time
        - With O(|T|) preprocessing (store in each node the number of leaves in its subtree)
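
The three queries above (does P occur, where, and how often) can be illustrated with a deliberately naive sketch. It builds an uncompacted suffix trie in O(|T|²) time and space, not the linear-time suffix tree from the slides, so only the query logic carries over; in particular, the O(|P|+c) bound for listing occurrences needs the compact tree. The class `Node` and all function names are assumptions made for this example, and the text and patterns are assumed not to contain `$`.

```python
class Node:
    def __init__(self):
        self.children = {}   # letter -> Node
        self.start = None    # on leaves: start position of the suffix
        self.count = 0       # number of leaves in this node's subtree


def build_suffix_trie(text):
    """Naive quadratic construction; every suffix of text+'$' ends at a leaf."""
    text += "$"
    root = Node()
    for start in range(len(text)):
        node = root
        node.count += 1
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.count += 1          # one more suffix passes through this node
        node.start = start
    return root


def locate(root, pattern):
    """Walk down from the root along pattern; None if it falls off the trie."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def count_occurrences(root, pattern):
    node = locate(root, pattern)
    return node.count if node is not None else 0


def occurrences(root, pattern):
    """Collect the start positions stored at the leaves below pattern's node."""
    node = locate(root, pattern)
    if node is None:
        return []
    hits, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.start is not None:
            hits.append(cur.start)
        stack.extend(cur.children.values())
    return sorted(hits)
```

Storing `count` during construction plays the role of the O(|T|) preprocessing: a counting query then only needs the O(|P|) walk performed by `locate`.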
20. Generalized suffix tree
    - Input: a set of strings {T1,...,Tk}
    - Generalized suffix tree: tree containing all suffixes of all strings
    - Example (figure): tree for { abba, bac }, containing the suffixes abba\$1, bba\$1, ba\$1, a\$1, \$1 and bac\$2, ac\$2, c\$2, \$2
21. Construction of generalized suffix tree for {X, Y}
    - Construct suffix tree for X\$1Y\$2
    - Edges leading to red leaves (the leaves of suffixes that start in X) are labeled as “...\$1Y\$2”; delete “Y\$2”
    - Example (figure): for X = abba and Y = bac, build the suffix tree of abba\$1bac\$2, then cut the labels abba\$1bac\$2, bba\$1bac\$2, ba\$1bac\$2, a\$1bac\$2, \$1bac\$2 down to abba\$1, bba\$1, ba\$1, a\$1, \$1
23. Construction of generalized suffix tree for {T1,...,Tk}
    - Can use the same trick: construct suffix tree for T1\$1 ... Tk\$k
    - Linear runtime: O(|T1|+...+|Tk|)
    - In practice, modify the algorithm (e.g. Ukkonen’s algorithm) to get slightly faster performance
24. Computing the longest common substring of {T1,T2}
    - Color the leaves of T1 red and the leaves of T2 blue; mark each internal node red (blue) if it has at least one red (blue) child
    - Nodes with both colors correspond to common substrings; the one whose string is longest gives the longest common substring
    - Example (figure): generalized suffix tree for { abba, bac }
25. Computing the longest common substring of {T1,...,TK}
    - Task: for each r = 2,...,K, compute the length of the longest string contained in at least r of the input strings
    - Steps:
        1. Construct generalized suffix tree for {T1,...,TK}
        2. Mark each node v with color k=1,...,K if it has a leaf with that color
        3. Compute c(v) = # of colors at v
    - Naive computation: O(nK), n=|T1|+...+|TK|. Can also be done in O(n)
    - String corresponding to v is contained in exactly c(v) input strings
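
The coloring idea can be tried out with a naive sketch. It uses an uncompacted generalized suffix trie built in quadratic time instead of a linear-time generalized suffix tree, and it sets the colors (as a bitmask) while inserting each suffix rather than propagating them up from the leaves. The function name `longest_common_by_r` and the dictionary-based node layout are assumptions made for this example.

```python
def longest_common_by_r(strings):
    """Return {r: length of the longest substring contained in at least r
    of the input strings} for r = 2..K (naive, quadratic construction)."""
    K = len(strings)
    root = {"colors": 0, "children": {}}
    for k, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            node["colors"] |= 1 << k
            for ch in s[start:]:
                node = node["children"].setdefault(
                    ch, {"colors": 0, "children": {}})
                node["colors"] |= 1 << k    # string k passes through this node

    # best[c] = largest depth of a node carrying exactly c colors
    best = [0] * (K + 1)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        c = bin(node["colors"]).count("1")  # c(v) = number of colors at v
        best[c] = max(best[c], depth)
        for child in node["children"].values():
            stack.append((child, depth + 1))

    # "at least r" is a running maximum over c = K, K-1, ..., r
    result, running = {}, 0
    for r in range(K, 1, -1):
        running = max(running, best[r])
        result[r] = running
    return result
```

For the slide's example { abba, bac } the only value is r = 2, and the deepest two-colored node spells `ba`, so the reported length is 2.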
27. Suffix array for text T[1..n]
    - Suffix array SA[0..n]: stores the suffixes of string T in lexicographic order
    - SA[i] = id of the i’th smallest suffix
    - Number k corresponds to suffix T[k+1..n]
    - Lexicographic order:
        - X < Xα...
        - Xα... < Xβ... if α<β
    - Example for T = mississippi (ε is the empty suffix):

    | i | SA[i] | suffix |
    |---|-------|--------|
    | 0 | 11 | ε |
    | 1 | 10 | i |
    | 2 | 7 | ippi |
    | 3 | 4 | issippi |
    | 4 | 1 | ississippi |
    | 5 | 0 | mississippi |
    | 6 | 9 | pi |
    | 7 | 8 | ppi |
    | 8 | 6 | sippi |
    | 9 | 3 | sissippi |
    | 10 | 5 | ssippi |
    | 11 | 2 | ssissippi |
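
The definition can be checked with a short Python sketch. It sorts the suffixes directly, which is far from the linear-time constructions but follows the definition word for word; the function name `build_suffix_array` is an assumption made for this example, and Python's 0-based slicing means id k stands for the suffix `text[k:]`.

```python
def build_suffix_array(text):
    """Return SA[0..n]: suffix ids sorted by the suffix they denote.
    Naive construction by direct sorting, for illustration only."""
    n = len(text)
    # id n is the empty suffix, which sorts first
    return sorted(range(n + 1), key=lambda k: text[k:])


sa = build_suffix_array("mississippi")
print(sa)  # [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

The printed array matches the ids listed for mississippi on the slide.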
28. Pattern search from a suffix array
    - Task: find pattern P = iss (length m) in text T = mississippi (length n)
    - All occurrences appear contiguously in the array
    - Computing start & end indexes: binary search
    - Worst-case: O(m log n)
    - In practice, often O(m + log n)
    - Can be improved to O(m + log n) worst-case
        - need to store extra 2n numbers
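
The binary search can be sketched as follows. This is the plain O(m log n) variant, without the extra 2n numbers needed for the improved bound; the suffix array is built by naive sorting, and the names `build_suffix_array` and `find_occurrences` are assumptions made for this example.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix ids sorted by the suffix they denote."""
    return sorted(range(len(text) + 1), key=lambda k: text[k:])


def find_occurrences(text, sa, pattern):
    """Return the sorted 0-based start positions of pattern in text."""
    m = len(pattern)

    def prefix(i):
        # first m letters of the i'th smallest suffix
        return text[sa[i]:sa[i] + m]

    # start index: first suffix whose m-letter prefix is >= pattern
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # end index: first suffix whose m-letter prefix is > pattern
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])   # the contiguous block of matches


text = "mississippi"
sa = build_suffix_array(text)
```

Each comparison inspects at most m letters and each search takes O(log n) steps, which is where the O(m log n) worst case comes from.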
29. Construction of suffix arrays
    - SA[0..n] can be constructed in linear time
        - from a suffix tree
        - or directly: [Kim, Sim, Park, Park’03], [Kärkkäinen & Sanders’03], [Ko & Aluru’03]
30. Summary
    - Suffix trees & suffix arrays: two powerful data structures for strings
    - Space requirements:
        - suffix trees: linear but with a large constant, e.g. 20 |T|
        - suffix arrays: more efficient, e.g. 5 |T|
    - Suffix arrays are more suited to large alphabets
    - Practitioners like suffix arrays (simplicity, space efficiency)
    - Theoreticians like suffix trees (explicit structure)
31. Further information on string algorithms
    - Dan Gusfield, “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology”, 1997
    - Slides mentioning more recent results: http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt