SlideShare a Scribd company logo
.
                        Algorithm of Suffix Tree
                           by farseerfc@gmail.com
.

                                   Jiachen Yang1
                           1 Research   Student in Osaka University


                               November 12, 2011




                                                                .     .   .        .      .       .

    jc-yang (Osaka-U)                       SuffixTree                         November 12, 2011       1 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .        .      .       .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011       2 / 25
What can we do with suffix tree?
.
                                           Find properties of the strings
Linear algorithms for exact string           . Find the longest common substrings of the
                                             1
matching
                                               string Si and Sj in θ (ni + nj ) time .
    like KMP                                 . Find all maximal pairs, maximal repeats or
                                             2

Search for strings                             supermaximal repeats in θ (n + z) time, if there
  . Check if a string P of length m is
  1                                            are z such repeats .
                                             . Find the Lempel-Ziv decomposition in θ (n)
                                             3
    a substring in O(m) time.
  . Find all z occurrences of the
  2                                            time .
                                             . Find the longest repeated substrings in θ (n)
    patterns P1 , · · · , Pq of total        4


    length m as substrings in                  time.
                                             . Find the most frequently occurring substrings
                                             5
    O(m + z) time.
  . Search for a regular expression P
  3                                            of a minimum length in θ (n) time.
                                             . Find the shortest strings from ∑ that do not
                                             6
    in time expected sublinear in n .
  . Find for each suffix of a pattern
  4                                            occur in D, in O(n + z) time, if there are z such
    P, the length of the longest               strings.
                                             . Find the shortest substrings occurring only
                                             7
    match between a prefix of
    P[i . . . m] and a substring in D in       once in θ (n) time.
                                             . Find, for each i, the shortest substrings of Si
    θ (m) time .                             8

  . ...
  5                                            not occurring elsewhere in D in θ (n) time.
                                             . ...
                                             9


                                                               .     .     .        .      .       .

      jc-yang (Osaka-U)                     SuffixTree                          November 12, 2011       3 / 25
Trie, Radix Tree, Suffix Trie & Suffix Tree

         trie1                    is a dictionary tree.
                   (AKA prefix tree)

                             stores a set of words.
                             each node represents a character except that root is empty string.
                             words with common prefix share same parent nodes.
                             minimal deterministic finite automaton that accepts all words.
  radix tree                            is a trie with compressed chain of nodes.
                   (AKA patricia trie or radix trie)

                             Each internal node has at least 2 children.
  suffix trie is a trie which stores all suffix of a given string.
  suffix tree is a suffix radix tree.
                   that enables linear time construction and fast
                   algorithms of other problems on a string.
   1 pronounced as in word retrieval by its inventor, /tri:/ “tree”, but
pronounced /traI/ “try” by other authors                           .   .    .        .      .       .

      jc-yang (Osaka-U)                                SuffixTree                November 12, 2011       4 / 25
Trie & Radix Tree

.
Trie of “A to tea ted ten i
in inn”
.
                                 .
                                 Radix tree example
                                 .




                                 .

.
                                          .   .   .        .      .       .

    jc-yang (Osaka-U)         SuffixTree               November 12, 2011       5 / 25
Suffix Trie & Suffix Tree of “banana”
                   ^




     b        a          n       $
                                                                                0:6


 a       n         $         a                                  banana$ a             na         $


                                                      1:0             8:6                 6:6            10:6
 n       a                   n       $

                                                                na     $                   na$       $
 a       n         $         a
                                                      4:6             9:6                 3:2            7:6

 n       a                   $
                                                      na$   $


 a       $
                                                2:1         5:6



 $


                                                                  .         .         .          .         .    .

     jc-yang (Osaka-U)                   SuffixTree                                         November 12, 2011        6 / 25
Suffix Tree of “mississippi”
                                                    mississippi$

                                                            0:11


                            (1,2)                     (0,...) (8,9)            (11,...)      (2,3)
                              i                     mississi... p                $...          s


                  12:8                        1:0          15:10                  18:11              4:4


            (8,...) (2,5)          (11,...)                   (10,...)         (9,...)
            ppi$... ssi              $...                       i$...          pi$...

                                                                                             (3,5)           (4,5)
  13:8           6:8                 17:11                 16:10                  14:8
                                                                                               si              i

                (8,...)       (5,...)
                ppi$...     ssippi$...


          7:8                2:1                                         8:8                                 10:8


                                                               (8,...) (5,...)                           (8,...)   (5,...)
                                                               ppi$... ssippi$...                        ppi$... ssippi$...


                                                     9:8                 3:2              11:8                  5:4



                                                                                                     .           .            .        .      .       .

     jc-yang (Osaka-U)                                             SuffixTree                                                      November 12, 2011       7 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .        .      .       .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011       8 / 25
History of Suffix Tree Algorithms
.
                                                           M. Farach. Optimal suffix tree
    First linear algorithm was introduced by Weiner            construction with large
                                                               alphabets. In focs, page
    1973 as position tree. Awarded by Donald Knuth             137. Published by the IEEE
                                                               Computer Society, 1997.
    as “Algorithm of the year 1973”.                       E. M. McCreight. A
                                                               space-economical suffix
    Greatly simplified by McCreight 1976.                       tree construction algorithm.
                                                               J. ACM, 23:262–272, April
Above two algorithms are processing string backward.           1976. ISSN 0004-5411.
                                                               doi: http://doi.acm.org/10.
                                                               1145/321941.321946. URL
    First online construction by Ukkonen 1995, which           http:
                                                               //doi.acm.org/10.
    is easier to understand.                                   1145/321941.321946.
                                                           E. Ukkonen. On-line
                                                               construction of suffix trees.
Above algorithms assume size of alphabet as fixed               Algorithmica, 14:249–260,
                                                               1995. ISSN 0178-4617.
constant .                                                     URL
                                                               http://dx.doi.org/10.
    Limitation was break by Farach 1997, optimal for           1007/BF01206331.
                                                               10.1007/BF01206331.
    all alphabets.                                         P. Weiner. Linear pattern
                                                               matching algorithms. In
                                                               Switching and Automata
Further study are continued to scale to scenarios when         Theory, 1973. SWAT ’08.
                                                               IEEE Conference Record of
the whole suffix tree or even input string cannot fit into       14th Annual Symposium on,
                                                               pages 1 –11, oct. 1973. doi:
memory.                                           .   .    .
                                                               10.1109/SWAT.1973.13.
                                                                    .        .        .

     jc-yang (Osaka-U)             SuffixTree                   November 12, 2011          9 / 25
Backward Construction of Suffix Tree
   1                       2                        3                                4


  banana$            banana$ anana$      banana$ anana$       nana$        banana$ ana nana$



                                                                                    na$ $




                 7                              6                               5


         banana$ a na          $      banana$       a    na           banana$ ana                na



              na $         na$ $         na $           na$ $             na$ $                na$ $


        na$ $                         na$ $


                                                                      .     .            .         .       .     .

       jc-yang (Osaka-U)                        SuffixTree                                    November 12, 2011   10 / 25
Construct Suffix Tree by Sorting Suffix
Suffix :                  Sorted suffix :          Tree of sorted suffix :
mississippi              i                       | -i - >| -
ississippi               ippi                    |       | - ppi
ssissippi                issippi                 |       | - ssi - >| - ppi
sissippi                 ississippi              |                   | - ssippi
issippi                  mississippi             | - mississippi
ssippi                   pi                      | -p - >| - i
sippi                    ppi                     |       | - pi
ippi                     sippi                   | -s - >| -i - - >| - ppi
ppi                      sissippi                        |         | - ssippi
pi                       ssippi                          | - si - >| - ppi
i                        ssissippi                                 | - ssippi
Time complexity will be O(N 2 log N) .
Space complexity will be O(N 2 ) .
                                                        .   .    .         .       .     .

     jc-yang (Osaka-U)               SuffixTree                       November 12, 2011   11 / 25
Naïve Algorithm

S UFFIX T REE(string)
1 for i ← 1 to length(string)
2       do U PDATE(treei )    £ Phrase i

U PDATE(treei )
1 for j ← 1 to i
2       do node ← treei .F IND(suffix[j to i − 1])
3          E XTEND(node, string[i])       £ Extension j

Time complexity will be O(N 3 ) .
Space complexity will be O(N 2 ) .
The challenge is to make sure treei is updated to treei+1 efficiently.
                                                .   .    .         .       .     .

     jc-yang (Osaka-U)           SuffixTree                   November 12, 2011   12 / 25
Suffix Extend Cases of Naïve Algorithm
.
     Case I If path Sj...i ends at leaf, append a char Si+1 to end of edge into leaf.
         Sj...i = . . . na                                       Sj...i+1 = . . . nan

                    na                                                  nan

    Case II If path Sj...i ends in the middle of an edge , and next char Si+1 is not equal to
              the next char in the edge, split that edge, create a internal node, add a new
              edge to a new leaf.
         Sj...i = . . . na                                        Sj...i+1 = . . . nay
                                                                                         n
                    nan                                                 na
                                                                                         y
    Case III If path Sj...i ends in the middle of an edge , and next char Si+1 is equal to the
               next char in the edge, do nothing, extenstion has done.
          Sj...i = . . . na                                      Sj...i+1 = . . . nan
                    nan                                                 nan

                                                              .     .     .         .        .    .

       jc-yang (Osaka-U)                   SuffixTree                          November 12, 2011   13 / 25
Naïve Online Construction of Suffix Tree
   1                   2            3                                  4


       b           ba      a     ban an n                       bana ana na



               7                     6                             5


       banana$ a na        $   banana anana     nana           banan anan nan



           na $        na$ $



       na$ $


                                                       .   .       .         .       .     .

   jc-yang (Osaka-U)                SuffixTree                          November 12, 2011   14 / 25
Properties of Suffix Tree
.
    . Each update will add exactly 1
    1

      leaf node .
                                                                                          0:6
              nr_leaf = N
    . Suffix tree is full tree.
    2
                                                                           banana$ a              na          $
              Each internal node has at least 2
              children.                                      1:0                    8:6             6:6               10:6
              nr_internal < N
              nr_node < 2N                                                 na        $                  na$       $
    . Worst case Fabonacci word
    3
                                                             4:6                    9:6             3:2               7:6
              abaababaabaab
    . Suffix is either explicit or implicit.
    4
                                                             na$       $

              Explicit when it ends at a node.         2:1             5:6
              Implicit when it ends in the
              middle of an edge.

                                                                   .            .         .         .         .        .

        jc-yang (Osaka-U)                  SuffixTree                                          November 12, 2011         15 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .         .       .     .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011   16 / 25
Optimization of Naïve Algorithm
.
     . Substrings can be represented as (start, end) pair of their index in
     1

       orignal string.
                    Reduce space complexity to O(N) if size of alphabet is fixed constant.
     . Once a leaf, Always a leaf
     2

                    Represent edge that links to a leaf as (start, · · · ).
                    Extend leaf nodes for free. We do not need Extend Case I.
                               0:6
                                                                                                                   0:6


                    banana$ a        na         $                                                    (1,2)        (0,...)  (6,...)           (2,4)
                                                                                                       a        banana$... $...               na

          1:0            8:6          6:6               10:6                      8:6                 1:0                   10:6                    6:6


                                                                                 (6,...)   (2,4)                                                   (6,...)   (4,...)
                    na    $               na$       $                             $...      na                                                      $...     na$...


          4:6            9:6          3:2               7:6                9:6                 4:6                                     7:6                    3:2


                                                                                           (6,...)    (4,...)
          na$   $                                                                           $...      na$...


                                                                                   5:6                  2:1
    2:1         5:6                                                                        .            .            .             .           .              .

           jc-yang (Osaka-U)                                   SuffixTree                                                 November 12, 2011                    17 / 25
Active Point

                                     During a phrase, if we meet Extend Case III, that is if we found
                                     S[i + 1] already exists in suffix[j . . . i] then S[i + 1] will exists in
                                     ∀suffix[k . . . i], k ∈ j . . . i.
                                     Thus Case III is a sign that means update of this phrase is finished.
                                     During phrase i if we stopped at suffix[k . . . i] by Case III, then in
                                     next phrase we can start from suffix[k . . . i + 1] because all suffix
                                     start with 1 . . . k − 1 will end at Case I.
                                     We called this point(current internal node, current position k in
                                     string) as Active Point.
a                    ab                       aba                      abaa                   abaab                    abaaba                                  abaabab                                              abaababa                                              abaababaa                                           abaababaab                                          abaababaaba                                                         abaababaabaa                                                                       abaababaaba ab


0:0               0:1                       0:2                      0:3                      0:4                      0:5                                        0:6                                                   0:7                                                 0:8                                                0:9                                                 0:10                                                                             0:11                                                                          0:12


 (0,...)         (0,...)   (1,...)         (0,...)   (1,...)         (0,1)    (1,...)     (0,1)        (1,...)     (0,1)         (1,...)                       (0,1)      (1,3)                                     (0,1)      (1,3)                                       (0,1)    (1,3)                                     (0,1)     (1,3)                                     (0,1)     (1,3)                                                                   (0,1)       (1,3)                                                            (0,1)         (1,3)
  a...            ab...     b...           aba...     ba...            a      baa...        a         baab...        a          baaba...                         a         ba                                         a         ba                                           a       ba                                         a        ba                                         a        ba                                                                       a          ba                                                                a            ba


1:0        1:0              2:1      1:0              2:1      3:3              2:1     3:3             2:1      3:3             2:1                    3:3               7:6                                 3:3              7:6                                  3:3             7:6                                 3:3             7:6                                 3:3             7:6                                                              3:3                  7:6                                                   3:3                          7:6


                                                                (3,...)       (1,...)    (3,...)       (1,...)    (3,...)     (1,...)             (3,...) (1,3)              (3,...)   (6,...)          (3,...) (1,3)             (3,...)   (6,...)           (3,...) (1,3)             (3,...)   (6,...)       (3,...)  (1,3)            (3,...)     (6,...)       (3,...)  (1,3)             (3,...)     (6,...)                                    (3,6) (1,3)                  (3,6)       (6,...)                             (3,6) (1,3)                        (3,6)         (6,...)
                                                                 a...         baa...      ab...       baab...     aba...     baaba...            abab... ba                 abab...     b...           ababa... ba               ababa...    ba...          ababaa... ba              ababaa...   baa...      ababaab... ba             ababaab...   baab...     ababaaba... ba             ababaaba...   baaba...                                     aba   ba                     aba       baabaa...                             aba ba                             aba        baabaab...


                                                               4:3              1:0     4:3             1:0      4:3             1:0       4:3          5:6               2:1          8:6       4:3          5:6              2:1          8:6       4:3           5:6              2:1          8:6       4:3          5:6                2:1       8:6       4:3         5:6                 2:1       8:6                        13:11                   5:6                11:11           8:6                   13:11             5:6                         11:11           8:6


                                                                                                                                                     (3,...)    (6,...)                                    (3,...) (6,...)                                       (3,...) (6,...)                                     (3,...)   (6,...)                                    (3,...)   (6,...)                                      (11,...) (6,...)         (3,6)      (6,...)       (11,...)     (6,...)            (11,...) (6,...)  (3,6)               (6,...)       (11,...)      (6,...)
                                                                                                                                                    abab...      b...                                     ababa... ba...                                       ababaa... baa...                                    ababaab... baab...                                  ababaaba... baaba...                                        a... baabaa...          aba      baabaa...        a...      baabaa...            ab... baabaab... aba               baabaab...       ab...      baabaab...


                                                                                                                                                 1:0              6:6                                  1:0              6:6                                   1:0             6:6                                 1:0                 6:6                             1:0                 6:6                            14:11        4:3            9:11            6:6           12:11           2:1     14:11         4:3           9:11              6:6           12:11           2:1


                                                                                                                                                                                                                                                                                                                                                                                                                                                (11,...)      (6,...)                                                              (11,...)     (6,...)
                                                                                                                                                                                                                                                                                                                                                                                                                                                  a...       baabaa...                                                              ab...     baabaab...


                                                                                                                                                                                                                                                                                                                                                                                                                                             10:11            1:0                                                              10:11             1:0




                                                                                                                                                                                                                                                                                                                                                                            .                                        .                                .                                        .                               .                                            .

                                     jc-yang (Osaka-U)                                                                                                                                                                                                       SuffixTree                                                                                                                                                                                      November 12, 2011                                                                                                 18 / 25
Ukk’s Update using Active Point

U PDATE(treei )
 1 current_suffix ← active_point
 2 next_char ← string[i]
 3 while True
 4      do if there exists edge start with next_char
 5              then break       £ Case III
 6              else
 7                   split current edge if implicit
 8                   create new leaf with new edge labelled next_char
 9          if current_suffix is empty
10              then break
11              else current_suffix ← next shorter suffix
12 active_point ← current_suffix
                                              .   .   .         .       .     .

     jc-yang (Osaka-U)          SuffixTree                 November 12, 2011   19 / 25
Suffix Link to find next shorter suffix
            Suffix link
                             Internal node of suffix X α has a link to node α .
                             If α is empty, suffix link points to root.
            How to create suffix link
                             Link together every internal nodes that are created by splitting in
                             same phrase.
                            banana$                                                     banana$

                             0:6                                                        0:6


            (1,2) (0,...)  (6,...)               (2,4)                  (0,...)  (1,2)             (6,...)     (2,4)
              a banana$... $...                   na                  banana$...   a                $...        na


 8:6                 1:0               10:6         6:6         1:0              8:6               10:6            6:6


  (6,...)         (2,4)                       (6,...) (4,...)            (6,...) (2,4)                       (6,...) (4,...)
   $...            na                          $... na$...                $... na                             $... na$...


 9:6                 4:6               7:6          3:2         9:6              4:6               7:6             3:2


                  (6,...)    (4,...)                                          (6,...)    (4,...)
                   $...      na$...                                            $...      na$...


            5:6               2:1                                       5:6               2:1
                                                                                                                               .   .   .         .       .     .

             jc-yang (Osaka-U)                                                                SuffixTree                                    November 12, 2011   20 / 25
Fast jump using Suffix Link


   Assume we are at Suffix X αβ , whose parent internal node
   represent X α .
      1. Go back to parent internal node,
      2. Jumping follow the node’s suffix link to the node represent Suffix
         α
      3. Go down to Suffix αβ .

   Even jump down in step 3 because we already know length of β .
   (Skip/Count trick)
   Combining all these tricks we can Extend a character in O(1)
   time.


                                                .    .   .         .       .     .

   jc-yang (Osaka-U)             SuffixTree                   November 12, 2011   21 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .         .       .     .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011   22 / 25
Experiment – mississippi
.

                        m

                        0:0


                         (0,...)
                          m...


                        1:0




                                               .   .   .         .       .     .

    jc-yang (Osaka-U)              SuffixTree               November 12, 2011   23 / 25
Experiment – mississippi
.
                             mi

                         0:1


                        (1,...)   (0,...)
                          i...     mi...


               2:1                    1:0




                                               .   .   .         .       .     .

    jc-yang (Osaka-U)              SuffixTree               November 12, 2011   23 / 25
Experiment – mississippi
.
                             mis

                              0:2


                        (1,...) (2,...)        (0,...)
                         is... s...            mis...


        2:1                   3:2                     1:0




                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                     SuffixTree                     November 12, 2011   23 / 25
Experiment – mississippi
.
                            miss

                              0:3


                        (1,...) (2,...)        (0,...)
                        iss... ss...           miss...


        2:1                   3:2                     1:0




                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                     SuffixTree                     November 12, 2011   23 / 25
Experiment – mississippi
.
                             miss i

                              0:4


                        (1,...) (0,...)        (2,3)
                        issi... missi...         s


         2:1                  1:0                      4:4


                                               (4,...) (3,...)
                                                 i...   si...


                             5:4                       3:2




                                                                 .   .   .         .       .     .

    jc-yang (Osaka-U)                      SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                              miss is

                              0:5


                         (1,...) (0,...)          (2,3)
                        issis... missis...          s


         2:1                   1:0                        4:4


                                                 (4,...) (3,...)
                                                  is... sis...


                              5:4                         3:2




                                                                   .   .   .         .       .     .

    jc-yang (Osaka-U)                        SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                               miss iss

                             0:6


                         (1,...)   (0,...)          (2,3)
                        ississ... mississ...          s


         2:1                       1:0                      4:4


                                                  (4,...) (3,...)
                                                  iss... siss...


                                5:4                        3:2




                                                                    .   .   .         .       .     .

    jc-yang (Osaka-U)                          SuffixTree                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                mississi

                              0:7


                          (1,...)    (0,...)           (2,3)
                        ississi... mississi...           s


         2:1                        1:0                       4:4


                                                    (4,...) (3,...)
                                                    issi... sissi...


                                 5:4                         3:2




                                                                       .   .   .         .       .     .

    jc-yang (Osaka-U)                            SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                                               mississip

                                       0:8


                        (8,...) (1,2)                   (0,...)             (2,3)
                         p...     i                   mississi...             s


    14:8                     12:8                           1:0                      4:4


                   (8,...)     (2,5)
                    p...        ssi

                                                                            (3,5)             (4,5)
    13:8                     6:8
                                                                              si                i

                   (8,...)     (5,...)
                    p...       ssip...


     7:8                     2:1                            8:8                               10:8


                                          (8,...)     (5,...)                       (8,...)      (5,...)
                                           p...       ssip...                        p...        ssip...


           9:8                          3:2                         11:8                       5:4




                                                                                                           .   .   .         .       .     .

    jc-yang (Osaka-U)                                                  SuffixTree                                       November 12, 2011   23 / 25
Experiment – mississippi
.
                                              mississip p

                                       0:9


                        (8,...) (1,2)                   (0,...)             (2,3)
                        pp...     i                   mississi...             s


    14:8                     12:8                           1:0                      4:4


                   (8,...)     (2,5)
                   pp...        ssi

                                                                            (3,5)             (4,5)
    13:8                     6:8
                                                                              si                i

                   (8,...)      (5,...)
                   pp...       ssipp...


     7:8                     2:1                            8:8                               10:8


                                          (8,...)      (5,...)                      (8,...)       (5,...)
                                          pp...       ssipp...                      pp...        ssipp...


           9:8                          3:2                         11:8                       5:4




                                                                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                                                  SuffixTree                                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                                       mississippi

                                                        0:10


                                              (1,2)       (0,...)         (8,9)             (2,3)
                                                i       mississi...         p                 s


                             12:8              1:0                       15:10                    4:4


               (2,5)                (8,...)                           (9,...) (10,...)
                ssi                 ppi...                             pi...    i...

                                                                                         (3,5)          (4,5)
    6:8                            13:8               14:8                   16:10
                                                                                           si             i

     (8,...)            (5,...)
     ppi...            ssippi...


    7:8                     2:1                                        8:8                                 10:8


                                                                (8,...) (5,...)                         (8,...)    (5,...)
                                                                ppi... ssippi...                        ppi...    ssippi...


                                              9:8                      3:2                 11:8                   5:4




                                                                                                                        .     .   .         .       .     .

      jc-yang (Osaka-U)                                                              SuffixTree                                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                                        mississippi$

                                                                0:11


                                (1,2)                     (0,...) (8,9)            (11,...)      (2,3)
                                  i                     mississi... p                $...          s


                      12:8                        1:0          15:10                  18:11              4:4


                (8,...) (2,5)          (11,...)                   (10,...)         (9,...)
                ppi$... ssi              $...                       i$...          pi$...

                                                                                                 (3,5)         (4,5)
    13:8             6:8                 17:11                 16:10                  14:8
                                                                                                   si            i

                    (8,...)       (5,...)
                    ppi$...     ssippi$...


              7:8                2:1                                         8:8                               10:8


                                                                   (8,...) (5,...)                       (8,...)   (5,...)
                                                                   ppi$... ssippi$...                    ppi$... ssippi$...


                                                         9:8                 3:2              11:8                5:4




                                                                                                                        .     .   .         .       .     .

           jc-yang (Osaka-U)                                                         SuffixTree                                        November 12, 2011   23 / 25
Time Complexity Analysis
$
i
p
p
i
s
s
i
s
s
i
m
 .   m        i          s   s   i   s     s         i   p       p       i          $
Time complexity is 2N = O(N) .
                                                             .       .       .          .      .     .

     jc-yang (Osaka-U)                   SuffixTree                               November 12, 2011   24 / 25
Experiment – English text
.

                           16
                                                                "data" using 1:3
                           14
Construction Time (sec.)




                           12

                           10

                           8

                           6

                           4

                           2

                           0
                                0      20000 40000   60000 80000 100000 120000 140000
                                                     File size (byte)
                                                                             .     .   .         .       .     .

                           jc-yang (Osaka-U)               SuffixTree                       November 12, 2011   25 / 25

More Related Content

What's hot

プログラミングコンテストでの動的計画法
プログラミングコンテストでの動的計画法プログラミングコンテストでの動的計画法
プログラミングコンテストでの動的計画法Takuya Akiba
 
PHP の GC の話
PHP の GC の話PHP の GC の話
PHP の GC の話
y-uti
 
関数型プログラミング入門 with OCaml
関数型プログラミング入門 with OCaml関数型プログラミング入門 with OCaml
関数型プログラミング入門 with OCaml
Haruka Oikawa
 
D3.jsで日本地図を描いてみた
D3.jsで日本地図を描いてみたD3.jsで日本地図を描いてみた
D3.jsで日本地図を描いてみた
mapquestIwasaki
 
すごい配列楽しく学ぼう
すごい配列楽しく学ぼうすごい配列楽しく学ぼう
すごい配列楽しく学ぼう
xenophobia__
 
Stack pivot
Stack pivotStack pivot
Stack pivot
sounakano
 
ELFの動的リンク
ELFの動的リンクELFの動的リンク
ELFの動的リンク
7shi
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
Motohiro KOSAKI
 
実践QBVH
実践QBVH実践QBVH
実践QBVH
Shuichi Hayashi
 
20210515 of4 wi&amp;paraview 5.9.0_motorbike
20210515 of4 wi&amp;paraview 5.9.0_motorbike20210515 of4 wi&amp;paraview 5.9.0_motorbike
20210515 of4 wi&amp;paraview 5.9.0_motorbike
YohichiShiina
 
中3女子でもわかる constexpr
中3女子でもわかる constexpr中3女子でもわかる constexpr
中3女子でもわかる constexpr
Genya Murakami
 
lispmeetup#63 Common Lispでゼロから作るDeep Learning
lispmeetup#63 Common Lispでゼロから作るDeep Learninglispmeetup#63 Common Lispでゼロから作るDeep Learning
lispmeetup#63 Common Lispでゼロから作るDeep Learning
Satoshi imai
 
最大流 (max flow)
最大流 (max flow)最大流 (max flow)
ネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分けネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分け
モノビット エンジン
 
120901fp key
120901fp key120901fp key
120901fp key
ksknac
 
色々なダイクストラ高速化
色々なダイクストラ高速化色々なダイクストラ高速化
色々なダイクストラ高速化
yosupo
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, Capabilities
Martin Děcký
 
katagaitai CTF勉強会 #3 crypto
katagaitai CTF勉強会 #3 cryptokatagaitai CTF勉強会 #3 crypto
katagaitai CTF勉強会 #3 crypto
trmr
 
F#入門 ~関数プログラミングとは何か~
F#入門 ~関数プログラミングとは何か~F#入門 ~関数プログラミングとは何か~
F#入門 ~関数プログラミングとは何か~
Nobuhisa Koizumi
 

What's hot (20)

プログラミングコンテストでの動的計画法
プログラミングコンテストでの動的計画法プログラミングコンテストでの動的計画法
プログラミングコンテストでの動的計画法
 
PHP の GC の話
PHP の GC の話PHP の GC の話
PHP の GC の話
 
関数型プログラミング入門 with OCaml
関数型プログラミング入門 with OCaml関数型プログラミング入門 with OCaml
関数型プログラミング入門 with OCaml
 
D3.jsで日本地図を描いてみた
D3.jsで日本地図を描いてみたD3.jsで日本地図を描いてみた
D3.jsで日本地図を描いてみた
 
すごい配列楽しく学ぼう
すごい配列楽しく学ぼうすごい配列楽しく学ぼう
すごい配列楽しく学ぼう
 
C#の書き方
C#の書き方C#の書き方
C#の書き方
 
Stack pivot
Stack pivotStack pivot
Stack pivot
 
ELFの動的リンク
ELFの動的リンクELFの動的リンク
ELFの動的リンク
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
 
実践QBVH
実践QBVH実践QBVH
実践QBVH
 
20210515 of4 wi&amp;paraview 5.9.0_motorbike
20210515 of4 wi&amp;paraview 5.9.0_motorbike20210515 of4 wi&amp;paraview 5.9.0_motorbike
20210515 of4 wi&amp;paraview 5.9.0_motorbike
 
中3女子でもわかる constexpr
中3女子でもわかる constexpr中3女子でもわかる constexpr
中3女子でもわかる constexpr
 
lispmeetup#63 Common Lispでゼロから作るDeep Learning
lispmeetup#63 Common Lispでゼロから作るDeep Learninglispmeetup#63 Common Lispでゼロから作るDeep Learning
lispmeetup#63 Common Lispでゼロから作るDeep Learning
 
最大流 (max flow)
最大流 (max flow)最大流 (max flow)
最大流 (max flow)
 
ネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分けネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分け
 
120901fp key
120901fp key120901fp key
120901fp key
 
色々なダイクストラ高速化
色々なダイクストラ高速化色々なダイクストラ高速化
色々なダイクストラ高速化
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, Capabilities
 
katagaitai CTF勉強会 #3 crypto
katagaitai CTF勉強会 #3 cryptokatagaitai CTF勉強会 #3 crypto
katagaitai CTF勉強会 #3 crypto
 
F#入門 ~関数プログラミングとは何か~
F#入門 ~関数プログラミングとは何か~F#入門 ~関数プログラミングとは何か~
F#入門 ~関数プログラミングとは何か~
 

Viewers also liked

Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
zukun
 
Suffix arrays
Suffix arraysSuffix arrays
Lowest Common Ancestor
Lowest Common AncestorLowest Common Ancestor
Lowest Common Ancestor
Benjamin Sach
 
Pattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatchesPattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatches
Benjamin Sach
 
Ch09 combinatorialpatternmatching
Ch09 combinatorialpatternmatchingCh09 combinatorialpatternmatching
Ch09 combinatorialpatternmatching
BioinformaticsInstitute
 
Asterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problemAsterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problem
Amrith Krishna
 
Fast algorithms for large scale genome alignment and comparison
Fast algorithms for large scale genome alignment and comparisonFast algorithms for large scale genome alignment and comparison
Fast algorithms for large scale genome alignment and comparison
Davide Eynard
 

Viewers also liked (7)

Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Suffix arrays
Suffix arraysSuffix arrays
Suffix arrays
 
Lowest Common Ancestor
Lowest Common AncestorLowest Common Ancestor
Lowest Common Ancestor
 
Pattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatchesPattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatches
 
Ch09 combinatorialpatternmatching
Ch09 combinatorialpatternmatchingCh09 combinatorialpatternmatching
Ch09 combinatorialpatternmatching
 
Asterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problemAsterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problem
 
Fast algorithms for large scale genome alignment and comparison
Fast algorithms for large scale genome alignment and comparisonFast algorithms for large scale genome alignment and comparison
Fast algorithms for large scale genome alignment and comparison
 

More from Jiachen Yang

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性
Jiachen Yang
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Jiachen Yang
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テストJiachen Yang
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsJiachen Yang
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object Ownership
Jiachen Yang
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究
Jiachen Yang
 
Output fica.beamer.43
Output fica.beamer.43Output fica.beamer.43
Output fica.beamer.43
Jiachen Yang
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
Jiachen Yang
 

More from Jiachen Yang (8)

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テスト
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly Reports
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object Ownership
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究
 
Output fica.beamer.43
Output fica.beamer.43Output fica.beamer.43
Output fica.beamer.43
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
 

Recently uploaded

一比一原版布兰登大学毕业证(BU毕业证书)如何办理
一比一原版布兰登大学毕业证(BU毕业证书)如何办理一比一原版布兰登大学毕业证(BU毕业证书)如何办理
一比一原版布兰登大学毕业证(BU毕业证书)如何办理
wkip62b
 
Dilatometer for measurement of materials
Dilatometer for measurement of materialsDilatometer for measurement of materials
Dilatometer for measurement of materials
ankitsinglaisro
 
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
340qn0m1
 
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
k4krdgxx
 
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
881evgn0
 
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
stgq9v39
 
一比一原版英国伦敦大学毕业证(London学位证)如何办理
一比一原版英国伦敦大学毕业证(London学位证)如何办理一比一原版英国伦敦大学毕业证(London学位证)如何办理
一比一原版英国伦敦大学毕业证(London学位证)如何办理
k4krdgxx
 
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
67n7f53
 
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
k4krdgxx
 
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
qbydc
 
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
lyurzi7r
 
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
wkip62b
 
一比一原版肯特大学毕业证UKC成绩单一模一样
一比一原版肯特大学毕业证UKC成绩单一模一样一比一原版肯特大学毕业证UKC成绩单一模一样
一比一原版肯特大学毕业证UKC成绩单一模一样
tobbk6s8
 
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
zv943dhb
 
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
ynrtjotp
 
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
jafiradnan336
 
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
67n7f53
 
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
bel9p89b
 
Intel-Centrino-Mobile-Technology-guidelines
Intel-Centrino-Mobile-Technology-guidelinesIntel-Centrino-Mobile-Technology-guidelines
Intel-Centrino-Mobile-Technology-guidelines
EricHo305923
 
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
t34zod9l
 

Recently uploaded (20)

一比一原版布兰登大学毕业证(BU毕业证书)如何办理
一比一原版布兰登大学毕业证(BU毕业证书)如何办理一比一原版布兰登大学毕业证(BU毕业证书)如何办理
一比一原版布兰登大学毕业证(BU毕业证书)如何办理
 
Dilatometer for measurement of materials
Dilatometer for measurement of materialsDilatometer for measurement of materials
Dilatometer for measurement of materials
 
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
一比一原版(LSE毕业证书)伦敦政治经济学院毕业证如何办理
 
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
一比一原版英国伦敦政治经济学院毕业证(LSE学位证)如何办理
 
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
一比一原版美国哥伦比亚大学毕业证Columbia成绩单一模一样
 
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
一比一原版(CSUEB毕业证)美国加州州立大学东湾分校毕业证如何办理
 
一比一原版英国伦敦大学毕业证(London学位证)如何办理
一比一原版英国伦敦大学毕业证(London学位证)如何办理一比一原版英国伦敦大学毕业证(London学位证)如何办理
一比一原版英国伦敦大学毕业证(London学位证)如何办理
 
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
一比一原版(CSU毕业证书)查尔斯特大学毕业证如何办理
 
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
一比一原版(Deakin毕业证书)澳洲迪肯大学毕业证文凭如何办理
 
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
一比一原版(ucb毕业证书)英国伯明翰大学学院毕业证如何办理
 
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
一比一原版美国俄勒冈大学毕业证(UO,UofO学位证)如何办理
 
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
一比一原版(UC毕业证书)堪培拉大学毕业证如何办理
 
一比一原版肯特大学毕业证UKC成绩单一模一样
一比一原版肯特大学毕业证UKC成绩单一模一样一比一原版肯特大学毕业证UKC成绩单一模一样
一比一原版肯特大学毕业证UKC成绩单一模一样
 
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
一比一原版(UoB毕业证)英国伯明翰大学毕业证如何办理
 
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
一比一原版(爱大毕业证)美国爱荷华大学毕业证如何办理
 
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
欧洲杯买球-欧洲杯买球买球网好的网站-欧洲杯买球哪里有正规的买球网站|【​网址​🎉ac123.net🎉​】
 
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
一比一原版(OU毕业证)美国俄克拉荷马大学毕业证如何办理
 
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
一比一原版(UofT毕业证)加拿大多伦多大学毕业证如何办理
 
Intel-Centrino-Mobile-Technology-guidelines
Intel-Centrino-Mobile-Technology-guidelinesIntel-Centrino-Mobile-Technology-guidelines
Intel-Centrino-Mobile-Technology-guidelines
 
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
一比一原版(UWS毕业证)澳洲西悉尼大学毕业证如何办理
 

Ukk's Algorithm of Suffix Tree

  • 1. . Algorithm of Suffix Tree by farseerfc@gmail.com . Jiachen Yang1 1 Research Student in Osaka University November 12, 2011 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 1 / 25
  • 2. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 2 / 25
  • 3. What can we do with suffix tree? . Find properties of the strings Linear algorithms for exact string . Find the longest common substrings of the 1 matching string Si and Sj in θ (ni + nj ) time . like KMP . Find all maximal pairs, maximal repeats or 2 Search for strings supermaximal repeats in θ (n + z) time, if there . Check if a string P of length m is 1 are z such repeats . . Find the Lempel-Ziv decomposition in θ (n) 3 a substring in O(m) time. . Find all z occurrences of the 2 time . . Find the longest repeated substrings in θ (n) patterns P1 , · · · , Pq of total 4 length m as substrings in time. . Find the most frequently occurring substrings 5 O(m + z) time. . Search for a regular expression P 3 of a minimum length in θ (n) time. . Find the shortest strings from ∑ that do not 6 in time expected sublinear in n . . Find for each suffix of a pattern 4 occur in D, in O(n + z) time, if there are z such P, the length of the longest strings. . Find the shortest substrings occurring only 7 match between a prefix of P[i . . . m] and a substring in D in once in θ (n) time. . Find, for each i, the shortest substrings of Si θ (m) time . 8 . ... 5 not occurring elsewhere in D in θ (n) time. . ... 9 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 3 / 25
  • 4. Trie, Radix Tree, Suffix Trie & Suffix Tree trie1 is a dictionary tree. (AKA prefix tree) stores a set of words. each node represents a character except that root is empty string. words with common prefix share same parent nodes. minimal deterministic finite automaton that accepts all words. radix tree is a trie with compressed chain of nodes. (AKA patricia trie or radix trie) Each internal node has at least 2 children. suffix trie is a trie which stores all suffix of a given string. suffix tree is a suffix radix tree. that enables linear time construction and fast algorithms of other problems on a string. 1 pronounced as in word retrieval by its inventor, /tri:/ “tree”, but pronounced /traI/ “try” by other authors . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 4 / 25
  • 5. Trie & Radix Tree . Trie of “A to tea ted ten i in inn” . . Radix tree example . . . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 5 / 25
  • 6. Suffix Trie & Suffix Tree of “banana” ^ b a n $ 0:6 a n $ a banana$ a na $ 1:0 8:6 6:6 10:6 n a n $ na $ na$ $ a n $ a 4:6 9:6 3:2 7:6 n a $ na$ $ a $ 2:1 5:6 $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 6 / 25
  • 7. Suffix Tree of “mississippi” mississippi$ 0:11 (1,2) (0,...) (8,9) (11,...) (2,3) i mississi... p $... s 12:8 1:0 15:10 18:11 4:4 (8,...) (2,5) (11,...) (10,...) (9,...) ppi$... ssi $... i$... pi$... (3,5) (4,5) 13:8 6:8 17:11 16:10 14:8 si i (8,...) (5,...) ppi$... ssippi$... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi$... ssippi$... ppi$... ssippi$... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 7 / 25
  • 8. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 8 / 25
  • 9. History of Suffix Tree Algorithms . M. Farach. Optimal suffix tree First linear algorithm was introduced by Weiner construction with large alphabets. In focs, page 1973 as position tree. Awarded by Donald Knuth 137. Published by the IEEE Computer Society, 1997. as “Algorithm of the year 1973”. E. M. McCreight. A space-economical suffix Greatly simplified by McCreight 1976. tree construction algorithm. J. ACM, 23:262–272, April Above two algorithms are processing string backward. 1976. ISSN 0004-5411. doi: http://doi.acm.org/10. 1145/321941.321946. URL First online construction by Ukkonen 1995, which http: //doi.acm.org/10. is easier to understand. 1145/321941.321946. E. Ukkonen. On-line construction of suffix trees. Above algorithms assume size of alphabet as fixed Algorithmica, 14:249–260, 1995. ISSN 0178-4617. constant . URL http://dx.doi.org/10. Limitation was break by Farach 1997, optimal for 1007/BF01206331. 10.1007/BF01206331. all alphabets. P. Weiner. Linear pattern matching algorithms. In Switching and Automata Further study are continued to scale to scenarios when Theory, 1973. SWAT ’08. IEEE Conference Record of the whole suffix tree or even input string cannot fit into 14th Annual Symposium on, pages 1 –11, oct. 1973. doi: memory. . . . 10.1109/SWAT.1973.13. . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 9 / 25
  • 10. Backward Construction of Suffix Tree 1 2 3 4 banana$ banana$ anana$ banana$ anana$ nana$ banana$ ana nana$ na$ $ 7 6 5 banana$ a na $ banana$ a na banana$ ana na na $ na$ $ na $ na$ $ na$ $ na$ $ na$ $ na$ $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 10 / 25
  • 11. Construct Suffix Tree by Sorting Suffix Suffix : Sorted suffix : Tree of sorted suffix : mississippi i | -i - >| - ississippi ippi | | - ppi ssissippi issippi | | - ssi - >| - ppi sissippi ississippi | | - ssippi issippi mississippi | - mississippi ssippi pi | -p - >| - i sippi ppi | | - pi ippi sippi | -s - >| -i - - >| - ppi ppi sissippi | | - ssippi pi ssippi | - si - >| - ppi i ssissippi | - ssippi Time complexity will be O(N 2 log N) . Space complexity will be O(N 2 ) . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 11 / 25
  • 12. Naïve Algorithm S UFFIX T REE(string) 1 for i ← 1 to length(string) 2 do U PDATE(treei ) £ Phrase i U PDATE(treei ) 1 for j ← 1 to i 2 do node ← treei .F IND(suffix[j to i − 1]) 3 E XTEND(node, string[i]) £ Extension j Time complexity will be O(N 3 ) . Space complexity will be O(N 2 ) . The challenge is to make sure treei is updated to treei+1 efficiently. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 12 / 25
  • 13. Suffix Extend Cases of Naïve Algorithm . Case I If path Sj...i ends at leaf, append a char Si+1 to end of edge into leaf. Sj...i = . . . na Sj...i+1 = . . . nan na nan Case II If path Sj...i ends in the middle of an edge , and next char Si+1 is not equal to the next char in the edge, split that edge, create a internal node, add a new edge to a new leaf. Sj...i = . . . na Sj...i+1 = . . . nay n nan na y Case III If path Sj...i ends in the middle of an edge , and next char Si+1 is equal to the next char in the edge, do nothing, extenstion has done. Sj...i = . . . na Sj...i+1 = . . . nan nan nan . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 13 / 25
  • 14. Naïve Online Construction of Suffix Tree 1 2 3 4 b ba a ban an n bana ana na 7 6 5 banana$ a na $ banana anana nana banan anan nan na $ na$ $ na$ $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 14 / 25
  • 15. Properties of Suffix Tree . . Each update will add exactly 1 1 leaf node . 0:6 nr_leaf = N . Suffix tree is full tree. 2 banana$ a na $ Each internal node has at least 2 children. 1:0 8:6 6:6 10:6 nr_internal < N nr_node < 2N na $ na$ $ . Worst case Fabonacci word 3 4:6 9:6 3:2 7:6 abaababaabaab . Suffix is either explicit or implicit. 4 na$ $ Explicit when it ends at a node. 2:1 5:6 Implicit when it ends in the middle of an edge. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 15 / 25
  • 16. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 16 / 25
  • 17. Optimization of Naïve Algorithm . . Substrings can be represented as (start, end) pair of their index in 1 orignal string. Reduce space complexity to O(N) if size of alphabet is fixed constant. . Once a leaf, Always a leaf 2 Represent edge that links to a leaf as (start, · · · ). Extend leaf nodes for free. We do not need Extend Case I. 0:6 0:6 banana$ a na $ (1,2) (0,...) (6,...) (2,4) a banana$... $... na 1:0 8:6 6:6 10:6 8:6 1:0 10:6 6:6 (6,...) (2,4) (6,...) (4,...) na $ na$ $ $... na $... na$... 4:6 9:6 3:2 7:6 9:6 4:6 7:6 3:2 (6,...) (4,...) na$ $ $... na$... 5:6 2:1 2:1 5:6 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 17 / 25
  • 18. Active Point During a phrase, if we meet Extend Case III, that is if we found S[i + 1] already exists in suffix[j . . . i] then S[i + 1] will exists in ∀suffix[k . . . i], k ∈ j . . . i. Thus Case III is a sign that means update of this phrase is finished. During phrase i if we stopped at suffix[k . . . i] by Case III, then in next phrase we can start from suffix[k . . . i + 1] because all suffix start with 1 . . . k − 1 will end at Case I. We called this point(current internal node, current position k in string) as Active Point. a ab aba abaa abaab abaaba abaabab abaababa abaababaa abaababaab abaababaaba abaababaabaa abaababaaba ab 0:0 0:1 0:2 0:3 0:4 0:5 0:6 0:7 0:8 0:9 0:10 0:11 0:12 (0,...) (0,...) (1,...) (0,...) (1,...) (0,1) (1,...) (0,1) (1,...) (0,1) (1,...) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) a... ab... b... aba... ba... a baa... a baab... a baaba... a ba a ba a ba a ba a ba a ba a ba 1:0 1:0 2:1 1:0 2:1 3:3 2:1 3:3 2:1 3:3 2:1 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 (3,...) (1,...) (3,...) (1,...) (3,...) (1,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,6) (1,3) (3,6) (6,...) (3,6) (1,3) (3,6) (6,...) a... baa... ab... baab... aba... baaba... abab... ba abab... b... ababa... ba ababa... ba... ababaa... ba ababaa... baa... ababaab... ba ababaab... baab... ababaaba... ba ababaaba... baaba... aba ba aba baabaa... aba ba aba baabaab... 4:3 1:0 4:3 1:0 4:3 1:0 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 13:11 5:6 11:11 8:6 13:11 5:6 11:11 8:6 (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (11,...) (6,...) (3,6) (6,...) (11,...) (6,...) (11,...) (6,...) (3,6) (6,...) (11,...) (6,...) abab... b... ababa... ba... ababaa... baa... ababaab... baab... ababaaba... baaba... a... baabaa... aba baabaa... a... baabaa... ab... baabaab... aba baabaab... ab... baabaab... 1:0 6:6 1:0 6:6 1:0 6:6 1:0 6:6 1:0 6:6 14:11 4:3 9:11 6:6 12:11 2:1 14:11 4:3 9:11 6:6 12:11 2:1 (11,...) (6,...) (11,...) (6,...) a... baabaa... ab... baabaab... 10:11 1:0 10:11 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 18 / 25
  • 19. Ukk’s Update using Active Point U PDATE(treei ) 1 current_suffix ← active_point 2 next_char ← string[i] 3 while True 4 do if there exists edge start with next_char 5 then break £ Case III 6 else 7 split current edge if implicit 8 create new leaf with new edge labelled next_char 9 if current_suffix is empty 10 then break 11 else current_suffix ← next shorter suffix 12 active_point ← current_suffix . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 19 / 25
  • 20. Suffix Link to find next shorter suffix Suffix link Internal node of suffix X α has a link to node α . If α is empty, suffix link points to root. How to create suffix link Link together every internal nodes that are created by splitting in same phrase. banana$ banana$ 0:6 0:6 (1,2) (0,...) (6,...) (2,4) (0,...) (1,2) (6,...) (2,4) a banana$... $... na banana$... a $... na 8:6 1:0 10:6 6:6 1:0 8:6 10:6 6:6 (6,...) (2,4) (6,...) (4,...) (6,...) (2,4) (6,...) (4,...) $... na $... na$... $... na $... na$... 9:6 4:6 7:6 3:2 9:6 4:6 7:6 3:2 (6,...) (4,...) (6,...) (4,...) $... na$... $... na$... 5:6 2:1 5:6 2:1 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 20 / 25
  • 21. Fast jump using Suffix Link Assume we are at Suffix X αβ , whose parent internal node represent X α . 1. Go back to parent internal node, 2. Jumping follow the node’s suffix link to the node represent Suffix α 3. Go down to Suffix αβ . Even jump down in step 3 because we already know length of β . (Skip/Count trick) Combining all these tricks we can Extend a character in O(1) time. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 21 / 25
  • 22. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 22 / 25
  • 23. Experiment – mississippi . m 0:0 (0,...) m... 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 24. Experiment – mississippi . mi 0:1 (1,...) (0,...) i... mi... 2:1 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 25. Experiment – mississippi . mis 0:2 (1,...) (2,...) (0,...) is... s... mis... 2:1 3:2 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 26. Experiment – mississippi . miss 0:3 (1,...) (2,...) (0,...) iss... ss... miss... 2:1 3:2 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 27. Experiment – mississippi . miss i 0:4 (1,...) (0,...) (2,3) issi... missi... s 2:1 1:0 4:4 (4,...) (3,...) i... si... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 28. Experiment – mississippi . miss is 0:5 (1,...) (0,...) (2,3) issis... missis... s 2:1 1:0 4:4 (4,...) (3,...) is... sis... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 29. Experiment – mississippi . miss iss 0:6 (1,...) (0,...) (2,3) ississ... mississ... s 2:1 1:0 4:4 (4,...) (3,...) iss... siss... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 30. Experiment – mississippi . mississi 0:7 (1,...) (0,...) (2,3) ississi... mississi... s 2:1 1:0 4:4 (4,...) (3,...) issi... sissi... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 31. Experiment – mississippi . mississip 0:8 (8,...) (1,2) (0,...) (2,3) p... i mississi... s 14:8 12:8 1:0 4:4 (8,...) (2,5) p... ssi (3,5) (4,5) 13:8 6:8 si i (8,...) (5,...) p... ssip... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) p... ssip... p... ssip... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 32. Experiment – mississippi . mississip p 0:9 (8,...) (1,2) (0,...) (2,3) pp... i mississi... s 14:8 12:8 1:0 4:4 (8,...) (2,5) pp... ssi (3,5) (4,5) 13:8 6:8 si i (8,...) (5,...) pp... ssipp... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) pp... ssipp... pp... ssipp... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 33. Experiment – mississippi . mississippi 0:10 (1,2) (0,...) (8,9) (2,3) i mississi... p s 12:8 1:0 15:10 4:4 (2,5) (8,...) (9,...) (10,...) ssi ppi... pi... i... (3,5) (4,5) 6:8 13:8 14:8 16:10 si i (8,...) (5,...) ppi... ssippi... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi... ssippi... ppi... ssippi... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 34. Experiment – mississippi . mississippi$ 0:11 (1,2) (0,...) (8,9) (11,...) (2,3) i mississi... p $... s 12:8 1:0 15:10 18:11 4:4 (8,...) (2,5) (11,...) (10,...) (9,...) ppi$... ssi $... i$... pi$... (3,5) (4,5) 13:8 6:8 17:11 16:10 14:8 si i (8,...) (5,...) ppi$... ssippi$... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi$... ssippi$... ppi$... ssippi$... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 35. Time Complexity Analysis $ i p p i s s i s s i m . m i s s i s s i p p i $ Time complexity is 2N = O(N) . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 24 / 25
  • 36. Experiment – English text . 16 "data" using 1:3 14 Construction Time (sec.) 12 10 8 6 4 2 0 0 20000 40000 60000 80000 100000 120000 140000 File size (byte) . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 25 / 25