SlideShare a Scribd company logo
1 of 29
Download to read offline
Two Combinatorial Criteria for BWT Images

                       Konstantin M. Likhomanov                   Arseny M. Shur

                               Ural State University, Yekaterinburg, Russia




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images         CSR2011   1/9
Burrows-Wheeler Transform (BWT)
Introduced by M. Burrows, D. J. Wheeler, 1994




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   2/9
Burrows-Wheeler Transform (BWT)
Introduced by M. Burrows, D. J. Wheeler, 1994
An illustration:




                                     B A N A N A




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   2/9
Burrows-Wheeler Transform (BWT)
Introduced by M. Burrows, D. J. Wheeler, 1994
An illustration:

                                     A     B      A       N      A       N
                                     A     N      A       B      A       N
                                     A     N      A       N      A       B
                                     B     A      N       A      N       A
                                     N     A      B       A      N       A
                                     N     A      N       A      B       A




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   2/9
Burrows-Wheeler Transform (BWT)
Introduced by M. Burrows, D. J. Wheeler, 1994
An illustration:

                                     A     B      A       N      A       N
                                     A     N      A       B      A       N
                                     A     N      A       N      A       B
                                     B     A      N       A      N       A
                                     N     A      B       A      N       A
                                     N     A      N       A      B       A
                             bwt (BANANA) = NNBAAA
BWT always performs a permutation of the original word.


K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   2/9
Burrows-Wheeler Transform (BWT)
Introduced by M. Burrows, D. J. Wheeler, 1994
An illustration:

                                     A     B      A       N      A       N
                                     A     N      A       B      A       N
                                     A     N      A       N      A       B
                                     B     A      N       A      N       A
                                     N     A      B       A      N       A
                                     N     A      N       A      B       A
                             bwt (BANANA) = NNBAAA
BWT always performs a permutation of the original word.
It is not a bijection because it doesn’t distinguish between cyclic shifts.
K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   2/9
BWT and context dependences


                                     inimal      length ......m
                                     inimal      letter ......m
                                     inimal      letter ......m
                                     inimal      letter ......m
                                     inimal      letter ......m
                                     inimum      letters......m




K. M. Likhomanov, A. M. Shur (USU)    Combinatorial Criteria for BWT Images   CSR2011   3/9
BWT and context dependences

                                                              
                         inimal
                                               length ......m 
                                                               
                         inimal
                                               letter ......m 
                                                               
                        
                                                              
                                                               
                          inimal                letter ......m
                                                              
         Right contexts                                          of these letters
                         inimal
                                               letter ......m 
                                                               
                        
                         inimal                               
                        
                                               letter ......m 
                                                               
                                                               
                                                              
                          inimum                letters......m




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   3/9
BWT and context dependences

                                                              
                         inimal
                                               length ......m 
                                                               
                         inimal
                                               letter ......m 
                                                               
                        
                                                              
                                                               
                          inimal                letter ......m
                                                              
         Right contexts                                          of these letters
                         inimal
                                               letter ......m 
                                                               
                        
                         inimal                               
                        
                                               letter ......m 
                                                               
                                                               
                                                              
                          inimum                letters......m

Usage of BWT:
     reversibility
 + clustering effect
 + linear-time algorithms for both bwt and bwt−1
 = good preprocessing for lossless data compressors (bzip2, 7-Zip,. . . )



K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   3/9
Combinatorial properties of BWT


       Earlier results: words with “simple” BWT images
               Description of the words having BWT images of the form bi aj
               (S. Mantaci, A. Restivo, S. Sciortino, 2003)
               Description of the words having BWT images of the form ci bj ak
               (J. Simpson, S. J. Puglisi, 2008)
               Partial description of the words having BWT images of the form
                    in−1       i
               ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009)
                n




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   4/9
Combinatorial properties of BWT


       Earlier results: words with “simple” BWT images
               Description of the words having BWT images of the form bi aj
               (S. Mantaci, A. Restivo, S. Sciortino, 2003)
               Description of the words having BWT images of the form ci bj ak
               (J. Simpson, S. J. Puglisi, 2008)
               Partial description of the words having BWT images of the form
                    in−1       i
               ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009)
                n
       Two general decision problems (over any fixed ordered alphabet):
               Given a word w, is w a BWT image?
               Given a word w = c1 c2 . . . cl , is there a BWT image of the form
                p
               c11 . . . cpl for some positive p1 , . . . , pl ?
                          l




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   4/9
Combinatorial properties of BWT


       Earlier results: words with “simple” BWT images
               Description of the words having BWT images of the form bi aj
               (S. Mantaci, A. Restivo, S. Sciortino, 2003)
               Description of the words having BWT images of the form ci bj ak
               (J. Simpson, S. J. Puglisi, 2008)
               Partial description of the words having BWT images of the form
                    in−1       i
               ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009)
                n
       Two general decision problems (over any fixed ordered alphabet):
               Given a word w, is w a BWT image?
               Given a word w = c1 c2 . . . cl , is there a BWT image of the form
                p
               c11 . . . cpl for some positive p1 , . . . , pl ?
                          l

We present efficient solutions for both problems



K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   4/9
Checking whether a word is a BWT image
The stable sorting of a word w = b1 . . . bk is the permutation σ of the set
{1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically
sorted and the relative order of equal letters in σ(w) is the same as in w.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   5/9
Checking whether a word is a BWT image
The stable sorting of a word w = b1 . . . bk is the permutation σ of the set
{1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically
sorted and the relative order of equal letters in σ(w) is the same as in w.
Theorem 1
A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and
              1       k
only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ).




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   5/9
Checking whether a word is a BWT image
The stable sorting of a word w = b1 . . . bk is the permutation σ of the set
{1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically
sorted and the relative order of equal letters in σ(w) is the same as in w.
Theorem 1
A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and
              1       k
only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ).


     N        N       B       A      A        A              B        A      N   A   N     A

     A        A       A       B      N        N              A        A      A   B   N     N
                  (1 4 3 6 2 5)                                         (1 2 4) (3 5 6)
                                                                              ×

K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images           CSR2011   5/9
Checking whether a word is a BWT image
The stable sorting of a word w = b1 . . . bk is the permutation σ of the set
{1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically
sorted and the relative order of equal letters in σ(w) is the same as in w.
Theorem 1
A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and
              1       k
only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ).

       A linear-time algorithm checks whether w is a BWT image, and
       returns “for free” any given preimage of w;
       so, this algorithm can be used in BWT-based compression schemes to
       check the correctness of decompression when calculating the inverse
       BWT




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   5/9
“Pumping” words to BWT images


A word w has a global ascent if w = uv and each letter of u is less than or
equal to each letter of v.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   6/9
“Pumping” words to BWT images


A word w has a global ascent if w = uv and each letter of u is less than or
equal to each letter of v.
Theorem 2
                                               p         p
Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if
and only if w has no global ascents.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   6/9
“Pumping” words to BWT images


A word w has a global ascent if w = uv and each letter of u is less than or
equal to each letter of v.
Theorem 2
                                               p         p
Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if
and only if w has no global ascents.

BANANA has no global ascents
One can check that BANNANA = bwt(AANANNB)




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   6/9
“Pumping” words to BWT images


A word w has a global ascent if w = uv and each letter of u is less than or
equal to each letter of v.
Theorem 2
                                               p         p
Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if
and only if w has no global ascents.

BANANA has no global ascents
One can check that BANNANA = bwt(AANANNB)
In general, constructing such an image is not easy. We present a
quadratic-time algorithm.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   6/9
Algorithm for constructing a BWT image


Lemma
If a nonempty word has no global ascents, then we can delete one of its
letters so that the new word will have no global ascents.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   7/9
Algorithm for constructing a BWT image


Lemma
If a nonempty word has no global ascents, then we can delete one of its
letters so that the new word will have no global ascents.

Stable sorting moves blocks of letters as a whole.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   7/9
Algorithm for constructing a BWT image


Lemma
If a nonempty word has no global ascents, then we can delete one of its
letters so that the new word will have no global ascents.

Stable sorting moves blocks of letters as a whole.
Lemma
If we change the length of some block of letters by its shift under a stable
sorting, this doesn’t change neither the number of orbits of the sorting nor
the GCD of the lengths of blocks.




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   7/9
Algorithm for constructing a BWT image


Lemma
If a nonempty word has no global ascents, then we can delete one of its
letters so that the new word will have no global ascents.

Stable sorting moves blocks of letters as a whole.
Lemma
If we change the length of some block of letters by its shift under a stable
sorting, this doesn’t change neither the number of orbits of the sorting nor
the GCD of the lengths of blocks.

The idea of the algorithm: remove letters until we obtain a BWT image,
then return them as zero-length blocks and resize those blocks by their
shifts


K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   7/9
An example
Let us construct a BWT image of the form D? A? C? B? A?




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   8/9
An example
Let us construct a BWT image of the form D? A? C? B? A?
     Deleting letters, we get DACBA → DABA → DAA (a BWT image)




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   8/9
An example
Let us construct a BWT image of the form D? A? C? B? A?
     Deleting letters, we get DACBA → DABA → DAA (a BWT image)

                                         D       A B0 A

                                         A       A B0 D
              The block B0 has zero shift and cannot be pumped
    Auxiliary step: pump an existing block (the last A, having the shift 1)

                                     D       A B0 A                    A

                                     A       A       A B0 D
  Now the block B0 has shift 1 and can be pumped to get DABAA
K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   8/9
An example
Let us construct a BWT image of the form D? A? C? B? A?
     Deleting letters, we get DACBA → DABA → DAA (a BWT image)

                                     D       A C0 B                    A     A

                                     A       A       A        B C0 D
   The block C0 has shift 2 and can be pumped to get DACCBAA

One can check that DACCBAA = bwt(AACBCAD).




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images       CSR2011   8/9
Open problems




       Improve the complexity of the above algorithm
       Construct the shortest possible BWT image with given order of letters




K. M. Likhomanov, A. M. Shur (USU)   Combinatorial Criteria for BWT Images   CSR2011   9/9

More Related Content

More from CSR2011

Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
CSR2011
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
CSR2011
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudan
CSR2011
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avron
CSR2011
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilka
CSR2011
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
CSR2011
 
Csr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskinCsr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskin
CSR2011
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remila
CSR2011
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blin
CSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
CSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
CSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
CSR2011
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
CSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
CSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
CSR2011
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminski
CSR2011
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
CSR2011
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
CSR2011
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
CSR2011
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
CSR2011
 

More from CSR2011 (20)

Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
 
Csr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoffCsr2011 june18 15_15_bomhoff
Csr2011 june18 15_15_bomhoff
 
Csr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudanCsr2011 june18 14_00_sudan
Csr2011 june18 14_00_sudan
 
Csr2011 june18 15_45_avron
Csr2011 june18 15_45_avronCsr2011 june18 15_45_avron
Csr2011 june18 15_45_avron
 
Csr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilkaCsr2011 june18 09_30_shpilka
Csr2011 june18 09_30_shpilka
 
Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
 
Csr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskinCsr2011 june18 11_00_tiskin
Csr2011 june18 11_00_tiskin
 
Csr2011 june18 11_30_remila
Csr2011 june18 11_30_remilaCsr2011 june18 11_30_remila
Csr2011 june18 11_30_remila
 
Csr2011 june17 16_30_blin
Csr2011 june17 16_30_blinCsr2011 june17 16_30_blin
Csr2011 june17 16_30_blin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyiCsr2011 june17 11_30_vyalyi
Csr2011 june17 11_30_vyalyi
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminski
 
Csr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatovCsr2011 june17 14_00_bulatov
Csr2011 june17 14_00_bulatov
 
Csr2011 june17 12_00_morin
Csr2011 june17 12_00_morinCsr2011 june17 12_00_morin
Csr2011 june17 12_00_morin
 
Csr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonatiCsr2011 june17 11_00_lonati
Csr2011 june17 11_00_lonati
 
Csr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhaninCsr2011 june17 09_30_yekhanin
Csr2011 june17 09_30_yekhanin
 

Csr2011 june17 17_00_likhomanov

  • 1. Two Combinatorial Criteria for BWT Images Konstantin M. Likhomanov Arseny M. Shur Ural State University, Yekaterinburg, Russia K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 1/9
  • 2. Burrows-Wheeler Transform (BWT) Introduced by M. Burrows, D. J. Wheeler, 1994 K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 2/9
  • 3. Burrows-Wheeler Transform (BWT) Introduced by M. Burrows, D. J. Wheeler, 1994 An illustration: B A N A N A K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 2/9
  • 4. Burrows-Wheeler Transform (BWT) Introduced by M. Burrows, D. J. Wheeler, 1994 An illustration: A B A N A N A N A B A N A N A N A B B A N A N A N A B A N A N A N A B A K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 2/9
  • 5. Burrows-Wheeler Transform (BWT) Introduced by M. Burrows, D. J. Wheeler, 1994 An illustration: A B A N A N A N A B A N A N A N A B B A N A N A N A B A N A N A N A B A bwt (BANANA) = NNBAAA BWT always performs a permutation of the original word. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 2/9
  • 6. Burrows-Wheeler Transform (BWT) Introduced by M. Burrows, D. J. Wheeler, 1994 An illustration: A B A N A N A N A B A N A N A N A B B A N A N A N A B A N A N A N A B A bwt (BANANA) = NNBAAA BWT always performs a permutation of the original word. It is not a bijection because it doesn’t distinguish between cyclic shifts. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 2/9
  • 7. BWT and context dependences inimal length ......m inimal letter ......m inimal letter ......m inimal letter ......m inimal letter ......m inimum letters......m K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 3/9
  • 8. BWT and context dependences    inimal  length ......m    inimal  letter ......m       inimal letter ......m   Right contexts of these letters  inimal  letter ......m     inimal    letter ......m      inimum letters......m K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 3/9
  • 9. BWT and context dependences    inimal  length ......m    inimal  letter ......m       inimal letter ......m   Right contexts of these letters  inimal  letter ......m     inimal    letter ......m      inimum letters......m Usage of BWT: reversibility + clustering effect + linear-time algorithms for both bwt and bwt−1 = good preprocessing for lossless data compressors (bzip2, 7-Zip,. . . ) K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 3/9
  • 10. Combinatorial properties of BWT Earlier results: words with “simple” BWT images Description of the words having BWT images of the form bi aj (S. Mantaci, A. Restivo, S. Sciortino, 2003) Description of the words having BWT images of the form ci bj ak (J. Simpson, S. J. Puglisi, 2008) Partial description of the words having BWT images of the form in−1 i ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009) n K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 4/9
  • 11. Combinatorial properties of BWT Earlier results: words with “simple” BWT images Description of the words having BWT images of the form bi aj (S. Mantaci, A. Restivo, S. Sciortino, 2003) Description of the words having BWT images of the form ci bj ak (J. Simpson, S. J. Puglisi, 2008) Partial description of the words having BWT images of the form in−1 i ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009) n Two general decision problems (over any fixed ordered alphabet): Given a word w, is w a BWT image? Given a word w = c1 c2 . . . cl , is there a BWT image of the form p c11 . . . cpl for some positive p1 , . . . , pl ? l K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 4/9
  • 12. Combinatorial properties of BWT Earlier results: words with “simple” BWT images Description of the words having BWT images of the form bi aj (S. Mantaci, A. Restivo, S. Sciortino, 2003) Description of the words having BWT images of the form ci bj ak (J. Simpson, S. J. Puglisi, 2008) Partial description of the words having BWT images of the form in−1 i ain an−1 . . . a11 (A. Restivo, G. Rosone, 2009) n Two general decision problems (over any fixed ordered alphabet): Given a word w, is w a BWT image? Given a word w = c1 c2 . . . cl , is there a BWT image of the form p c11 . . . cpl for some positive p1 , . . . , pl ? l We present efficient solutions for both problems K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 4/9
  • 13. Checking whether a word is a BWT image The stable sorting of a word w = b1 . . . bk is the permutation σ of the set {1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically sorted and the relative order of equal letters in σ(w) is the same as in w. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 5/9
  • 14. Checking whether a word is a BWT image The stable sorting of a word w = b1 . . . bk is the permutation σ of the set {1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically sorted and the relative order of equal letters in σ(w) is the same as in w. Theorem 1 A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and 1 k only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ). K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 5/9
  • 15. Checking whether a word is a BWT image The stable sorting of a word w = b1 . . . bk is the permutation σ of the set {1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically sorted and the relative order of equal letters in σ(w) is the same as in w. Theorem 1 A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and 1 k only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ). N N B A A A B A N A N A A A A B N N A A A B N N (1 4 3 6 2 5) (1 2 4) (3 5 6) × K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 5/9
  • 16. Checking whether a word is a BWT image The stable sorting of a word w = b1 . . . bk is the permutation σ of the set {1, . . . , k} such that the string σ(w) = bσ(1) . . . bσ(k) is lexicographically sorted and the relative order of equal letters in σ(w) is the same as in w. Theorem 1 A word w = cp1 . . . cpk , where pi > 0 and ci = ci+1 , is a BWT image if and 1 k only if the number of orbits of its stable sorting is equal to gcd(p1 , . . . , pk ). A linear-time algorithm checks whether w is a BWT image, and returns “for free” any given preimage of w; so, this algorithm can be used in BWT-based compression schemes to check the correctness of decompression when calculating the inverse BWT K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 5/9
  • 17. “Pumping” words to BWT images A word w has a global ascent if w = uv and each letter of u is less than or equal to each letter of v. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 6/9
  • 18. “Pumping” words to BWT images A word w has a global ascent if w = uv and each letter of u is less than or equal to each letter of v. Theorem 2 p p Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if and only if w has no global ascents. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 6/9
  • 19. “Pumping” words to BWT images A word w has a global ascent if w = uv and each letter of u is less than or equal to each letter of v. Theorem 2 p p Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if and only if w has no global ascents. BANANA has no global ascents One can check that BANNANA = bwt(AANANNB) K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 6/9
  • 20. “Pumping” words to BWT images A word w has a global ascent if w = uv and each letter of u is less than or equal to each letter of v. Theorem 2 p p Let w = c1 . . . cl . A BWT image of the form c11 . . . cl l , pi > 0, exists if and only if w has no global ascents. BANANA has no global ascents One can check that BANNANA = bwt(AANANNB) In general, constructing such an image is not easy. We present a quadratic-time algorithm. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 6/9
  • 21. Algorithm for constructing a BWT image Lemma If a nonempty word has no global ascents, then we can delete one of its letters so that the new word will have no global ascents. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 7/9
  • 22. Algorithm for constructing a BWT image Lemma If a nonempty word has no global ascents, then we can delete one of its letters so that the new word will have no global ascents. Stable sorting moves blocks of letters as a whole. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 7/9
  • 23. Algorithm for constructing a BWT image Lemma If a nonempty word has no global ascents, then we can delete one of its letters so that the new word will have no global ascents. Stable sorting moves blocks of letters as a whole. Lemma If we change the length of some block of letters by its shift under a stable sorting, this doesn’t change neither the number of orbits of the sorting nor the GCD of the lengths of blocks. K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 7/9
  • 24. Algorithm for constructing a BWT image Lemma If a nonempty word has no global ascents, then we can delete one of its letters so that the new word will have no global ascents. Stable sorting moves blocks of letters as a whole. Lemma If we change the length of some block of letters by its shift under a stable sorting, this doesn’t change neither the number of orbits of the sorting nor the GCD of the lengths of blocks. The idea of the algorithm: remove letters until we obtain a BWT image, then return them as zero-length blocks and resize those blocks by their shifts K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 7/9
  • 25. An example Let us construct a BWT image of the form D? A? C? B? A? K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 8/9
  • 26. An example Let us construct a BWT image of the form D? A? C? B? A? Deleting letters, we get DACBA → DABA → DAA (a BWT image) K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 8/9
  • 27. An example Let us construct a BWT image of the form D? A? C? B? A? Deleting letters, we get DACBA → DABA → DAA (a BWT image) D A B0 A A A B0 D The block B0 has zero shift and cannot be pumped Auxiliary step: pump an existing block (the last A, having the shift 1) D A B0 A A A A A B0 D Now the block B0 has shift 1 and can be pumped to get DABAA K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 8/9
  • 28. An example Let us construct a BWT image of the form D? A? C? B? A? Deleting letters, we get DACBA → DABA → DAA (a BWT image) D A C0 B A A A A A B C0 D The block C0 has shift 2 and can be pumped to get DACCBAA One can check that DACCBAA = bwt(AACBCAD). K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 8/9
  • 29. Open problems Improve the complexity of the above algorithm Construct the shortest possible BWT image with given order of letters K. M. Likhomanov, A. M. Shur (USU) Combinatorial Criteria for BWT Images CSR2011 9/9