x86/x64 Optimization Study Meeting #4 (x86/x64最適化勉強会#4)
An x86-optimized rank/select
dictionary for bit sequences
                             2012/6/16
                     Takeshi Yamamuro




What’s Succinct Data Structure?




SDS: Succinct Data Structure
        • Recently Getting Popular in Some Areas
              – Research & engineering

        • Not a Data Structure, But a Data Representation
              – A compressed representation of other data structures
              – e.g., alphabets, trees, and graphs

        • Transparent Operations w/o Explicit Unpacking
              – e.g., succinct LZ77 compression*1




*1 Kreft, S. and Navarro, G.: LZ77-Like Compression with Fast Random Access, in Proceedings of DCC, 2010
More Details
• SDS = Succinct Data + Succinct Index

• Succinct Data
  – Compact representation of the target data
  – Close to the information-theoretic lower bound
               e.g., for N possible patterns, the lower bound is log N bits


• Succinct Index
  – O(1) operations on the target data
  – o(N) space costs: asymptotically negligible
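
The lower bound mentioned above can be made concrete with a small calculation. This is only an illustrative sketch: the helper name `lower_bound_bits` and the choice of example (bit sequences of length 9 with five 1s) are assumptions, not taken from the slides.

```python
from math import comb

def lower_bound_bits(num_patterns):
    """Minimum whole bits needed to distinguish num_patterns objects: ceil(log2)."""
    return (num_patterns - 1).bit_length()

# Bit sequences of length 9 containing exactly five 1s: C(9, 5) = 126 patterns.
patterns = comb(9, 5)
print(patterns, lower_bound_bits(patterns))   # 126 patterns need 7 bits, versus 9 bits stored verbatim
```

A succinct representation aims to store the data in roughly that many bits, plus an index of o(N) extra bits.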




More Details

   If you need more information, ...




                  cited from: http://goo.gl/rkQ5z
A rank/select dictionary for SDS




Rank/Select Operations
• SDS Is Composed of Rank/Select Operations
  – Many calls to rank/select inside

• Rank/Select for Succinct Bit Sequences: B[i]
  – rank_x(n, B): the number of occurrences of x in B[0...n]
  – select_x(n, B): the position of the n-th x in B[]



        i   0    1     2    3   4    5   6     7   8
     B[i]   1    0     1    1   0    0   1     1   0
                     rank1(5, B)=3   select1(4, B)=6
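
As a reference point, the two operations can be written naively in O(N) time. This sketch assumes that rank counts the 1s among the first n bits (the inclusive reading B[0..5] gives the same answer, 3, for the example above) and that select counts occurrences from 1 and returns a 0-based position; both conventions are inferred from the example, not stated on the slide.

```python
def rank1(n, bits):
    """Number of 1s among the first n bits, bits[0..n-1]."""
    return sum(bits[:n])

def select1(k, bits):
    """0-based position of the k-th 1 (k counts from 1); -1 if there is none."""
    seen = 0
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == k:
            return pos
    return -1

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(rank1(5, B), select1(4, B))   # 3 6
```

The rest of the deck is about answering the same queries in O(1) time with only o(N) extra space.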


Rank/Select Operations
• Available Rank/Select Implementations
  – ux-trie: http://code.google.com/p/ux-trie/
  – rx: http://code.google.com/p/mozc/
  – marisa-trie: http://code.google.com/p/marisa-trie/


• Today's Contribution
  – x86-optimized rank/select
  – https://github.com/maropu/dbitv




Performance Results
        • Performance Benchmark Setup*1
              – Generate a random sequence of bits: 50% density
              – Random rank/select queries over the bits
              – CPU: Intel Core-i5 U470@1.33GHz

        • Latency Observed
              – Median latency over 11 trials
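
A measurement loop in the spirit of this setup might look like the sketch below. It is not the harness used for the slides: the function name, the query count, and the reading "average per query within a trial, median across trials" are all assumptions, and Python interpreter overhead makes the absolute numbers incomparable with the plots.

```python
import random
import statistics
import time

def median_latency_ns(query, num_bits, num_queries=1000, trials=11, seed=0):
    """Median over `trials` of the average per-query latency in nanoseconds."""
    rng = random.Random(seed)
    bits = [rng.randrange(2) for _ in range(num_bits)]          # about 50% density
    positions = [rng.randrange(num_bits) for _ in range(num_queries)]
    per_query = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        for p in positions:
            query(p, bits)
        per_query.append((time.perf_counter_ns() - start) / num_queries)
    return statistics.median(per_query)

# Example: time a naive rank that sums the first p bits.
print(median_latency_ns(lambda p, bits: sum(bits[:p]), num_bits=1 << 12))
```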




*1 Reference: http://d.hatena.ne.jp/s-yata/20111216/1324032373
Performance Results: Rank

[Figure: averaged rank latency (ns, log scale from 1.E+00 to 1.E+03) vs. bit length, for ux, rx, marisa, and opt]
Performance Results: Select

[Figure: averaged select latency (ns, log scale from 1.E+00 to 1.E+04) vs. bit length, for ux, rx, marisa, and opt]
Implementation Details




Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]   (N bits)
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)

• Split B[] into fixed-length blocks of log^2 N bits
• Cumulative counts pre-computed in L[]

With a = \log^2 N and p = \lfloor x/a \rfloor:

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{pa} B[i]}_{L[p]:\ O(1)}
  + \underbrace{\sum_{i=pa+1}^{x} B[i]}_{O(\log^2 N)}
\]
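
The one-level scheme can be sketched as follows. The function names are illustrative, positions are 0-based (so the code counts the 1s in bits[0..x-1]), and the block length is passed in as a parameter rather than fixed to log^2 N, purely to keep the example small.

```python
def build_large(bits, block):
    """L[j] = number of 1s in bits[0 : j*block]."""
    L = [0]
    for start in range(0, len(bits), block):
        L.append(L[-1] + sum(bits[start:start + block]))
    return L

def rank1(x, bits, L, block):
    """1s among the first x bits: one table lookup plus a scan of fewer than `block` bits."""
    j = x // block
    return L[j] + sum(bits[j * block:x])

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
L = build_large(B, block=4)
print(L, rank1(5, B, L, block=4))   # [0, 3, 5, 5] 3
```

The scan in the second term is what the next refinement removes.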
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)

• L[]: o(N) space costs

\[
\frac{N}{\log^2 N} \cdot \log N = O\!\left(\frac{N}{\log N}\right) = o(N)
\]
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)
 S[] = [ s1 | s2 | ... ]          (one entry per 1/2 log N bits)

• Split into fixed-length blocks of 1/2 log N bits again
• Cumulative counts pre-computed in S[]

With a = \log^2 N, b = \frac{1}{2}\log N, p = \lfloor x/a \rfloor, and q = \lfloor x/b \rfloor:

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{pa} B[i]}_{L[p]:\ O(1)}
  + \underbrace{\sum_{i=pa+1}^{qb} B[i]}_{S[q]:\ O(1)}
  + \underbrace{\sum_{i=qb+1}^{x} B[i]}_{O(\log N)}
\]
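
A two-level version of the index might be sketched like this. It assumes that each S[] entry is stored relative to the start of its enclosing large block (which is what keeps S[] entries small), that the large block length is a multiple of the small one, and that positions are 0-based; the names and the tiny block sizes are illustrative only.

```python
def build_index(bits, large, small):
    """L[j]: 1s before large block j.
    S[k]: 1s between the start of the enclosing large block and small block k."""
    assert large % small == 0
    L, S, total = [], [], 0
    # The +1 adds a sentinel entry so that x == len(bits) is a valid query.
    for start in range(0, len(bits) + 1, small):
        if start % large == 0:
            L.append(total)
        S.append(total - L[-1])
        total += sum(bits[start:start + small])
    return L, S

def rank1(x, bits, L, S, large, small):
    """1s among the first x bits: two lookups plus a scan of fewer than `small` bits."""
    k = x // small
    return L[x // large] + S[k] + sum(bits[k * small:x])

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
L, S = build_index(B, large=4, small=2)
print(L, S, rank1(5, B, L, S, 4, 2))   # [0, 3, 5] [0, 1, 0, 0, 0] 3
```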
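Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)
 S[] = [ s1 | s2 | ... ]          (one entry per 1/2 log N bits)

• S[]: o(N) space costs

\[
\frac{N}{\frac{1}{2}\log N} \cdot \log(\log^2 N) = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)
\]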
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)
 S[] = [ s1 | s2 | ... ]          (one entry per 1/2 log N bits)

• O(1) Popcount/Table Lookup for the Last Term

With a = \log^2 N, b = \frac{1}{2}\log N, p = \lfloor x/a \rfloor, and q = \lfloor x/b \rfloor:

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{pa} B[i]}_{L[p]:\ O(1)}
  + \underbrace{\sum_{i=pa+1}^{qb} B[i]}_{S[q]:\ O(1)}
  + \underbrace{\sum_{i=qb+1}^{x} B[i]}_{O(\log N)\ \to\ O(1)}
\]
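
The table-lookup idea for the last term can be illustrated with a byte-wise popcount table. This is a sketch under assumptions: the 8-bit table width and the function names are chosen for illustration, and on x86 a hardware popcount instruction would typically take the place of the table.

```python
# Pre-computed popcounts for every 8-bit pattern (256 entries).
POPCOUNT8 = [bin(v).count("1") for v in range(256)]

def popcount32(word):
    """Count 1s in a 32-bit word with four table lookups and no per-bit loop."""
    return (POPCOUNT8[word & 0xFF] + POPCOUNT8[(word >> 8) & 0xFF]
            + POPCOUNT8[(word >> 16) & 0xFF] + POPCOUNT8[(word >> 24) & 0xFF])

def rank_in_word(word, r):
    """1s among the lowest r bits of a 32-bit word (0 <= r <= 32)."""
    return popcount32(word & ((1 << r) - 1))

print(popcount32(0b1011), rank_in_word(0b1011, 2))   # 3 2
```

In the theoretical construction the table is indexed by every possible small block of 1/2 log N bits, which gives about sqrt(N) entries and keeps the table itself within o(N) space.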
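Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

 B[] = [ A sequence of bits ]
 L[] = [ l1 | l2 | ... ]          (one entry per log^2 N bits)
 S[] = [ s1 | s2 | ... ]          (one entry per 1/2 log N bits)

• As a result, o(N) Space Costs

\[
\underbrace{\frac{N}{\log N}}_{L[]\ \text{size}}
  + \underbrace{\frac{4N\log\log N}{\log N}}_{S[]\ \text{size}}
  = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)
\]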
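
Plugging concrete sizes into the two terms shows how slowly the overhead vanishes. The sketch below simply evaluates the formulas above with base-2 logarithms; the function name and the sample values of N are illustrative assumptions.

```python
from math import log2

def index_overhead_bits(n):
    """Sizes of L[] and S[] in bits, using the formulas on the slide."""
    lg = log2(n)
    l_bits = n / lg                    # (N / log^2 N) entries * log N bits
    s_bits = 4 * n * log2(lg) / lg     # (N / (1/2 log N)) entries * log(log^2 N) bits
    return l_bits, s_bits

for n in (2 ** 20, 2 ** 30, 2 ** 40):
    l_bits, s_bits = index_overhead_bits(n)
    print(n, round((l_bits + s_bits) / n, 3))   # index bits per data bit
```

The ratio shrinks as N grows, but only slowly, so practical implementations tend to pick fixed block sizes instead of the asymptotic ones.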
Implementation: Practice
• Low Computation Costs & High Cache Penalties
   – 3 cache/TLB misses per rank (one each in B[], L[], and S[])




                         ex. rank1(402=256*1+32*4+18, B)
                256 bits

  B[]: 01..000000....101......0 0110....001...............0 0000100 ...
        32 bits                    Miss!        Popcount the remaining bits

 L[]:            18      Miss!          21                                 …
 S[]: 1 3 4 6 7 9 10 13 2 5 7 9 12 13 18 19 1 3 7 …
                           Miss!
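
The decomposition in the example is plain integer arithmetic. The sketch below uses the block sizes from the slide (256-bit large blocks, 32-bit small blocks); the function name `locate` is an assumption for illustration.

```python
LARGE, SMALL = 256, 32   # block sizes in bits, as in the slide's example

def locate(x):
    """Split a bit position into (large block, small block within it, bit offset)."""
    large_idx, rest = divmod(x, LARGE)
    small_idx, offset = divmod(rest, SMALL)
    return large_idx, small_idx, offset

print(locate(402))   # (1, 4, 18)
```

Each of the three components indexes a different array (L[], S[], and B[]), which is why a single rank can touch three unrelated cache lines.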
Implementation: Practice
• Packing the required data into a single cacheline




                                 56B Chunk
         4B                 1B                     32B


   ・・・        12B padding
                                         0110....001..........0 padding


                                 64B Cache line
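
One way to picture such packing is the sketch below. The layout is hypothetical and is not claimed to be the dbitv format: it assumes a 4-byte cumulative count, eight 1-byte per-word prefix counts, 32 bytes of payload bits, and padding up to 64 bytes, with bits numbered from the least significant bit of each byte.

```python
import struct

# Hypothetical chunk layout (an assumption, not the actual dbitv format):
#   4 B  number of 1s before this chunk
#   8 B  eight 1-byte prefix counts, one per 32-bit word of the payload
#  32 B  256 payload bits
#  20 B  padding up to one 64-byte cache line
CHUNK = struct.Struct("<I8B32s20x")

def pack_chunk(base_count, payload):
    """Pack 32 bytes of bits plus their rank metadata into 64 bytes."""
    assert len(payload) == 32
    prefix, running = [], 0
    for w in range(8):
        prefix.append(running)   # 1s before 32-bit word w inside this chunk
        running += sum(bin(b).count("1") for b in payload[4 * w:4 * w + 4])
    return CHUNK.pack(base_count, *prefix, payload)

def rank_in_chunk(chunk, offset):
    """1s before bit `offset` (0 <= offset < 256), reading only this chunk."""
    fields = CHUNK.unpack(chunk)
    base, prefix, payload = fields[0], fields[1:9], fields[9]
    w, r = divmod(offset, 32)
    word = int.from_bytes(payload[4 * w:4 * w + 4], "little")
    return base + prefix[w] + bin(word & ((1 << r) - 1)).count("1")

chunk = pack_chunk(100, bytes([0xFF] * 32))
print(len(chunk), rank_in_chunk(chunk, 50))   # 64 150
```

With everything a rank needs inside one chunk, a query touches one cache line instead of three.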




Implementation: Practice
• BTW, where is select?
  – Omitted due to my time limit
  – Please see the code ...


• Two Ways to Implement
  – O(logN) complexity
     • ux-trie, rx, and marisa-trie
     • Binary searches with rank
     • Suffers many cache/TLB misses


  – O(1) complexity
     • My implementation to minimize these penalties
     • 1 rank, 1 SIMD comparison, and an O(1) bsf
     • Only 2 cache/TLB misses
                      Not implemented yet ...
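
The O(logN) approach, select by binary search over rank, can be sketched as below. This is a generic illustration, not code from ux-trie, rx, or marisa-trie; the function signature is an assumption, and `rank1(x)` is taken to return the number of 1s among the first x bits.

```python
def select1(k, num_bits, rank1):
    """0-based position of the k-th 1 (k >= 1), or -1 if there are fewer than k ones."""
    if k < 1 or rank1(num_bits) < k:
        return -1
    lo, hi = 0, num_bits            # invariant: rank1(lo) < k <= rank1(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rank1(mid) < k:
            lo = mid
        else:
            hi = mid
    return hi - 1                   # the bit at position hi-1 is the k-th 1

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
naive_rank = lambda x: sum(B[:x])   # stand-in for an O(1) rank structure
print(select1(4, len(B), naive_rank))   # 6
```

Every probe of the search is a separate rank query at a distant position, which is where the many cache/TLB misses come from.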


More Related Content

What's hot

Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
DB Tsai
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDA
nilsvanvelzen
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
Ed Dodds
 
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
Yusuf Bhujwalla
 
Binary decision diagrams
Binary decision diagramsBinary decision diagrams
Binary decision diagrams
haroonrashidlone
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
David Smiley
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015
David Smiley
 
STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...
Luuk Brederode
 
The status of the GeoServer WPS
The status of the GeoServer WPSThe status of the GeoServer WPS
The status of the GeoServer WPS
GeoSolutions
 
Reduced ordered binary decision diagram
Reduced ordered binary decision diagramReduced ordered binary decision diagram
Reduced ordered binary decision diagram
Team-VLSI-ITMU
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
Kazuki Fujikawa
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexityashishtinku
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAdelina Ahadova
 

What's hot (17)

Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDA
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
 
4241
42414241
4241
 
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
 
Binary decision diagrams
Binary decision diagramsBinary decision diagrams
Binary decision diagrams
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
An32272275
An32272275An32272275
An32272275
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015
 
STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...
 
The status of the GeoServer WPS
The status of the GeoServer WPSThe status of the GeoServer WPS
The status of the GeoServer WPS
 
Reduced ordered binary decision diagram
Reduced ordered binary decision diagramReduced ordered binary decision diagram
Reduced ordered binary decision diagram
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexity
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main Concepts
 
MSc Presentation
MSc PresentationMSc Presentation
MSc Presentation
 

Viewers also liked

Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介MITSUNARI Shigeo
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNT
takesako
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
Ryoma Sin'ya
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算
Norishige Fukushima
 
X86opti01 nothingcosmos
X86opti01 nothingcosmosX86opti01 nothingcosmos
X86opti01 nothingcosmos
nothingcosmos
 
明日使えないすごいビット演算
明日使えないすごいビット演算明日使えないすごいビット演算
明日使えないすごいビット演算
京大 マイコンクラブ
 

Viewers also liked (6)

Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNT
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算
 
X86opti01 nothingcosmos
X86opti01 nothingcosmosX86opti01 nothingcosmos
X86opti01 nothingcosmos
 
明日使えないすごいビット演算
明日使えないすごいビット演算明日使えないすごいビット演算
明日使えないすごいビット演算
 

Similar to A x86-optimized rank&select dictionary for bit sequences

Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
Yu Liu
 
Threshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsThreshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsAleksandr Yampolskiy
 
Slide11 icc2015
Slide11 icc2015Slide11 icc2015
Slide11 icc2015
T. E. BOGALE
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
Sean Moran
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
pallavidhade2
 
Mmclass5
Mmclass5Mmclass5
Mmclass5
Hassan Dar
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorPTIHPA
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
Leonid Zhukov
 
Basic data structures part I
Basic data structures part IBasic data structures part I
Basic data structures part I
Daniel Gomez-Prado
 
Ch01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonCh01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluiton
shin
 
Generic parallelization strategies for data assimilation
Generic parallelization strategies for data assimilationGeneric parallelization strategies for data assimilation
Generic parallelization strategies for data assimilation
nilsvanvelzen
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
Alex Pruden
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Sean Moran
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marks
vvcetit
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
Rakuten Group, Inc.
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler Design
Kuppusamy P
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Matthew Lease
 
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
maomao125
 
Selective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationSelective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarization
Kodaira Tomonori
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Khaja Dileef
 

Similar to A x86-optimized rank&select dictionary for bit sequences (20)

Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
 
Threshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsThreshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random Permutations
 
Slide11 icc2015
Slide11 icc2015Slide11 icc2015
Slide11 icc2015
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
Mmclass5
Mmclass5Mmclass5
Mmclass5
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
 
Basic data structures part I
Basic data structures part IBasic data structures part I
Basic data structures part I
 
Ch01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonCh01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluiton
 
Generic parallelization strategies for data assimilation
Generic parallelization strategies for data assimilationGeneric parallelization strategies for data assimilation
Generic parallelization strategies for data assimilation
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marks
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler Design
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
 
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
 
Selective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationSelective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarization
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2
 

More from Takeshi Yamamuro

LT: Spark 3.1 Feature Expectation
LT: Spark 3.1 Feature ExpectationLT: Spark 3.1 Feature Expectation
LT: Spark 3.1 Feature Expectation
Takeshi Yamamuro
 
Apache Spark + Arrow
Apache Spark + ArrowApache Spark + Arrow
Apache Spark + Arrow
Takeshi Yamamuro
 
Quick Overview of Upcoming Spark 3.0 + α
Quick Overview of Upcoming Spark 3.0 + αQuick Overview of Upcoming Spark 3.0 + α
Quick Overview of Upcoming Spark 3.0 + α
Takeshi Yamamuro
 
MLflowによる機械学習モデルのライフサイクルの管理
MLflowによる機械学習モデルのライフサイクルの管理MLflowによる機械学習モデルのライフサイクルの管理
MLflowによる機械学習モデルのライフサイクルの管理
Takeshi Yamamuro
 
Taming Distributed/Parallel Query Execution Engine of Apache Spark
Taming Distributed/Parallel Query Execution Engine of Apache SparkTaming Distributed/Parallel Query Execution Engine of Apache Spark
Taming Distributed/Parallel Query Execution Engine of Apache Spark
Takeshi Yamamuro
 
LLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeLLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecode
Takeshi Yamamuro
 
20180417 hivemall meetup#4
20180417 hivemall meetup#420180417 hivemall meetup#4
20180417 hivemall meetup#4
Takeshi Yamamuro
 
An Experimental Study of Bitmap Compression vs. Inverted List Compression
An Experimental Study of Bitmap Compression vs. Inverted List CompressionAn Experimental Study of Bitmap Compression vs. Inverted List Compression
An Experimental Study of Bitmap Compression vs. Inverted List Compression
Takeshi Yamamuro
 
Sparkのクエリ処理系と周辺の話題
Sparkのクエリ処理系と周辺の話題Sparkのクエリ処理系と周辺の話題
Sparkのクエリ処理系と周辺の話題
Takeshi Yamamuro
 
20160908 hivemall meetup
20160908 hivemall meetup20160908 hivemall meetup
20160908 hivemall meetup
Takeshi Yamamuro
 
20150513 legobease
20150513 legobease20150513 legobease
20150513 legobease
Takeshi Yamamuro
 
20150516 icde2015 r19-4
20150516 icde2015 r19-420150516 icde2015 r19-4
20150516 icde2015 r19-4
Takeshi Yamamuro
 
VLDB2013 R1 Emerging Hardware
VLDB2013 R1 Emerging HardwareVLDB2013 R1 Emerging Hardware
VLDB2013 R1 Emerging HardwareTakeshi Yamamuro
 
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4浮動小数点(IEEE754)を圧縮したい@dsirnlp#4
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4Takeshi Yamamuro
 
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)Takeshi Yamamuro
 
Introduction to Modern Analytical DB
Introduction to Modern Analytical DBIntroduction to Modern Analytical DB
Introduction to Modern Analytical DBTakeshi Yamamuro
 
SIGMOD’12勉強会 -Session 7-
SIGMOD’12勉強会 -Session 7-SIGMOD’12勉強会 -Session 7-
SIGMOD’12勉強会 -Session 7-Takeshi Yamamuro
 
VLDB’11勉強会 -Session 9-
VLDB’11勉強会 -Session 9-VLDB’11勉強会 -Session 9-
VLDB’11勉強会 -Session 9-Takeshi Yamamuro
 
研究動向から考えるx86/x64最適化手法
研究動向から考えるx86/x64最適化手法研究動向から考えるx86/x64最適化手法
研究動向から考えるx86/x64最適化手法Takeshi Yamamuro
 

A x86-optimized rank&select dictionary for bit sequences

  • 1. x86/x64最適化勉強会#4 (x86/x64 Optimization Study Group #4): A x86-optimized rank/select dictionary for bit sequences, 2012/6/16, Takeshi Yamamuro
  • 2. What Is a Succinct Data Structure?
  • 3. SDS: Succinct Data Structure
      • Recently Getting Popular in Some Areas
        – Research & Engineering
      • Not a Data Structure, But a Data Representation
        – A compression method for other data structures
        – e.g., alphabets, trees, and graphs
      • Transparent Operations w/o Explicit Unpacking
        – e.g., succinct LZ77 compression*1
      *1 Kreft, S. and Navarro, G.: LZ77-Like Compression with Fast Random Access, In Proceedings of DCC, 2010
  • 4. More Details
      • SDS = Succinct Data + Succinct Index
      • Succinct Data
        – Compact representation of the target data
        – Close to the information-theoretic lower bound, e.g., with N possible patterns, the lower bound is log N bits
      • Succinct Index
        – O(1) operations on the target data
        – o(N) space cost: asymptotically negligible
  • 5. More Details: If you need more information, ... (cited from: http://goo.gl/rkQ5z)
  • 7. Rank/Select Operations
      • SDS Is Composed of Rank/Select Operations
        – Many calls to rank/select inside
      • Rank/Select for Succinct Bit Sequences: B[i]
        – rankx(n, B): the number of occurrences of x in B[0...n]
        – selectx(n, B): the position of the n-th x in B[]

           i   0  1  2  3  4  5  6  7  8
        B[i]   1  0  1  1  0  0  1  1  0

        rank1(5, B)=3   select1(4, B)=6
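
The two definitions on this slide can be checked with a deliberately naive O(N) sketch. The function names and the list-of-ints representation are illustrative assumptions, not taken from the talk's code; rank is taken as inclusive of position n, and select counts occurrences from 1 while returning a 0-based position, which is the reading that reproduces the slide's example.

```python
def rank(x, n, bits):
    """Count occurrences of bit value x in bits[0..n], inclusive."""
    return sum(1 for b in bits[: n + 1] if b == x)


def select(x, n, bits):
    """Return the 0-based position of the n-th (1-based) occurrence of x.

    Returns -1 when bits contains fewer than n occurrences of x.
    """
    seen = 0
    for pos, b in enumerate(bits):
        if b == x:
            seen += 1
            if seen == n:
                return pos
    return -1


# The bit sequence from the slide.
B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(rank(1, 5, B))    # 3
print(select(1, 4, B))  # 6
```

Both functions scan the sequence, so they only serve as a reference against which faster indexes can be validated.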
  • 8. Rank/Select Operations
      • Available Rank/Select Implementations
        – ux-trie: http://code.google.com/p/ux-trie/
        – rx: http://code.google.com/p/mozc/
        – marisa-trie: http://code.google.com/p/marisa-trie/
      • Today's Contributions
        – x86-optimized rank/select
        – https://github.com/maropu/dbitv
  • 9. Performance Results
      • Performance Benchmark Setup*1
        – Generate a random sequence of bits: 50% density
        – Random rank/select queries over the bits
        – CPU: Intel Core-i5 U470@1.33GHz
      • Latency Observed
        – Median latency over 11 trials
      *1 Reference: http://d.hatena.ne.jp/s-yata/20111216/1324032373
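
A measurement loop in the spirit of this setup might look like the sketch below. It is not the benchmark used for the talk: the helper names, the seeds, the sequence length, and the naive rank being timed are all assumptions made for illustration. Only the 50% density, the random queries, and the median over 11 trials come from the slide.

```python
import random
import statistics
import time


def make_bits(n, density=0.5, seed=0):
    """Generate n random bits where each bit is 1 with the given probability."""
    rng = random.Random(seed)
    return [1 if rng.random() < density else 0 for _ in range(n)]


def median_latency_ns(query, positions, trials=11):
    """Run all queries `trials` times; return the median per-query latency in ns."""
    per_trial = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        for p in positions:
            query(p)
        elapsed = time.perf_counter_ns() - start
        per_trial.append(elapsed / len(positions))
    return statistics.median(per_trial)


bits = make_bits(1 << 12)
rng = random.Random(1)
positions = [rng.randrange(len(bits)) for _ in range(200)]


def naive_rank(n):
    """Reference rank: count 1s in bits[0..n], inclusive."""
    return sum(bits[: n + 1])


latency = median_latency_ns(naive_rank, positions)
print(f"median rank latency: {latency:.1f} ns/query")
```

Taking the median rather than the mean keeps a single slow trial (for example, one disturbed by another process) from skewing the reported latency.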
  • 10. Performance Results: Rank [Figure: averaged rank latency (ns), log scale from 1.E+00 to 1.E+03, versus bit length, for ux, rx, marisa, and opt]
  • 11. Performance Results: Select [Figure: averaged select latency (ns), log scale from 1.E+00 to 1.E+04, versus bit length, for ux, rx, marisa, and opt]
  • 13. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of N bits
  • 14. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      • Split into fixed-length blocks of log^2 N bits
      • Total Counts Pre-computed in L[]
      $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{b_L \lfloor x / b_L \rfloor} B[i]}_{L[\lfloor x / b_L \rfloor]} + \sum_{i = b_L \lfloor x / b_L \rfloor + 1}^{x} B[i], \qquad b_L = \log^2 N$$
  • 15. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      • Split into fixed-length blocks of log^2 N bits
      • Total Counts Pre-computed in L[]
      $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{b_L \lfloor x / b_L \rfloor} B[i]}_{L[\lfloor x / b_L \rfloor]:\ O(1)} + \underbrace{\sum_{i = b_L \lfloor x / b_L \rfloor + 1}^{x} B[i]}_{O(\log^2 N)}, \qquad b_L = \log^2 N$$
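
The single-level split can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: positions are 0-based with an inclusive rank, the logarithm is taken base 2 and rounded down, and the names `build_large` and `rank1` are made up for the example. The first term is one lookup in L[]; the second is the leftover scan that the slide marks as O(log^2 N).

```python
import math
import random


def build_large(bits, block_len):
    """L[k] = number of 1s in bits[0 : k * block_len]."""
    L = [0]
    for start in range(0, len(bits), block_len):
        L.append(L[-1] + sum(bits[start:start + block_len]))
    return L


def rank1(n, bits, L, block_len):
    """Number of 1s in bits[0..n], inclusive."""
    full_blocks = (n + 1) // block_len  # blocks that lie entirely inside [0..n]
    # One table lookup, then a scan of fewer than block_len leftover bits.
    return L[full_blocks] + sum(bits[full_blocks * block_len : n + 1])


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
block_len = int(math.log2(len(bits))) ** 2  # floor(log2(1000))^2 = 9^2 = 81
L = build_large(bits, block_len)
print(rank1(499, bits, L, block_len) == sum(bits[:500]))  # True
```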
  • 16. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      • L[]: o(N) space costs
      $$\frac{N}{\log^2 N} \cdot \log N = O\!\left(\frac{N}{\log N}\right) = o(N)$$
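  • 17. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      S[] = s1, s2, ... (one entry per sub-block of (1/2) log N bits)
      • Split again into fixed-length blocks of (1/2) log N bits
      • Total Counts Pre-computed in S[]
      $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{b_L \lfloor x / b_L \rfloor} B[i]}_{L[\lfloor x / b_L \rfloor]} + \underbrace{\sum_{i = b_L \lfloor x / b_L \rfloor + 1}^{b_S \lfloor x / b_S \rfloor} B[i]}_{S[\lfloor x / b_S \rfloor]} + \sum_{i = b_S \lfloor x / b_S \rfloor + 1}^{x} B[i], \qquad b_L = \log^2 N,\ b_S = \tfrac{1}{2} \log N$$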
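  • 18. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      S[] = s1, s2, ... (one entry per sub-block of (1/2) log N bits)
      • Split again into fixed-length blocks of (1/2) log N bits
      • Total Counts Pre-computed in S[]
      $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{b_L \lfloor x / b_L \rfloor} B[i]}_{L[\lfloor x / b_L \rfloor]:\ O(1)} + \underbrace{\sum_{i = b_L \lfloor x / b_L \rfloor + 1}^{b_S \lfloor x / b_S \rfloor} B[i]}_{S[\lfloor x / b_S \rfloor]:\ O(1)} + \underbrace{\sum_{i = b_S \lfloor x / b_S \rfloor + 1}^{x} B[i]}_{O(\log N)}, \qquad b_L = \log^2 N,\ b_S = \tfrac{1}{2} \log N$$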
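  • 19. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      S[] = s1, s2, ... (one entry per sub-block of (1/2) log N bits)
      • S[]: o(N) space costs
      $$\frac{N}{\frac{1}{2} \log N} \cdot \log(\log^2 N) = O\!\left(N \cdot \frac{\log \log N}{\log N}\right) = o(N)$$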
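  • 20. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      S[] = s1, s2, ... (one entry per sub-block of (1/2) log N bits)
      • O(1) Popcount/Table-Lookup in the Last Term
      $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{b_L \lfloor x / b_L \rfloor} B[i]}_{L[\lfloor x / b_L \rfloor]:\ O(1)} + \underbrace{\sum_{i = b_L \lfloor x / b_L \rfloor + 1}^{b_S \lfloor x / b_S \rfloor} B[i]}_{S[\lfloor x / b_S \rfloor]:\ O(1)} + \underbrace{\sum_{i = b_S \lfloor x / b_S \rfloor + 1}^{x} B[i]}_{O(\log N) \to O(1)}, \qquad b_L = \log^2 N,\ b_S = \tfrac{1}{2} \log N$$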
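
Putting the three terms together, a two-level rank with a table lookup for the last term could be sketched as below. The parameters are assumptions chosen for readability (8-bit small blocks, 8 small blocks per large block) rather than the slide's (1/2) log N and log^2 N, positions are 0-based with an inclusive rank, and the names `build` and `rank1` are hypothetical. The lookup table holds the popcount of every possible small-block pattern, which is what makes the last term O(1).

```python
import random


def build(bits, small_len=8, smalls_per_large=8):
    """Build small-block patterns, L[], S[], and a popcount lookup table."""
    large_len = small_len * smalls_per_large
    # table[p] = number of 1s in the small_len-bit pattern p.
    table = [bin(p).count("1") for p in range(1 << small_len)]
    blocks, L, S = [], [], []
    total = 0      # 1s seen before the current small block
    in_large = 0   # 1s seen since the start of the current large block
    for start in range(0, len(bits), small_len):
        if start % large_len == 0:
            L.append(total)
            in_large = 0
        S.append(in_large)
        pattern = 0
        for t, b in enumerate(bits[start:start + small_len]):
            pattern |= b << t  # bit t of the pattern is bits[start + t]
        blocks.append(pattern)
        in_large += table[pattern]
        total += table[pattern]
    return blocks, L, S, table


def rank1(n, index, small_len=8, smalls_per_large=8):
    """Number of 1s in bits[0..n], inclusive, using three O(1) lookups."""
    blocks, L, S, table = index
    j, r = divmod(n, small_len)   # small block index, offset inside it
    mask = (1 << (r + 1)) - 1     # keep pattern bits 0..r
    return L[j // smalls_per_large] + S[j] + table[blocks[j] & mask]


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
index = build(bits)
print(rank1(499, index) == sum(bits[:500]))  # True
```

On x86 the table lookup would typically be replaced by a hardware popcount on the masked word, which is the other option the slide names.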
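  • 21. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
      B[] = a sequence of bits, divided into blocks of log^2 N bits
      L[] = l1, l2, ...
      S[] = s1, s2, ... (one entry per sub-block of (1/2) log N bits)
      • As a result, o(N) Space Costs
      $$\underbrace{\frac{N}{\log N}}_{L[]\ \text{size}} + \underbrace{\frac{4 N \log \log N}{\log N}}_{S[]\ \text{size}} = O\!\left(N \cdot \frac{\log \log N}{\log N}\right) = o(N)$$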
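
To get a feel for what the o(N) bound means at realistic sizes, the two size terms can be evaluated numerically. The sketch below assumes base-2 logarithms (the slide does not state the base) and uses a hypothetical helper name; it reports the index size as a fraction of the N bits being indexed.

```python
import math


def index_overhead(n_bits):
    """Return (L[] size + S[] size) / N for the textbook two-level index."""
    log_n = math.log2(n_bits)
    l_size = n_bits / log_n                           # N / log N
    s_size = 4 * n_bits * math.log2(log_n) / log_n    # 4 N log log N / log N
    return (l_size + s_size) / n_bits


for exp in (16, 32, 64):
    print(exp, index_overhead(2 ** exp))
# 16 1.0625
# 32 0.65625
# 64 0.390625
```

Under these assumptions the ratio does shrink as N grows, but slowly: at N = 2^32 the index is still about two thirds the size of the data, which is one reason practical implementations pick fixed block sizes instead of the asymptotic ones.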
  • 22. Implementation: 4 Russian Methods
      • Rule: O(1) operation costs with o(N) space
  • 23. Implementation: Practice
      • Low Computation Costs & High Cache Penalties
        – 3 cache/TLB misses per rank
      ex. rank1(402=256*1+32*4+18, B)
      B[]: 01..000000....101......0 | 0110....001...............0 | 0000100 ...   (256-bit large blocks, 32-bit small blocks; popcount the remaining bits)
      L[]: 18 21 …
      S[]: 1 3 4 6 7 9 10 13 | 2 5 7 9 12 13 18 19 | 1 3 7 …
  • 24. Implementation: Practice
      • Low Computation Costs & High Cache Penalties
        – 3 cache/TLB misses per rank
      ex. rank1(402=256*1+32*4+18, B)
      B[]: 01..000000....101......0 | 0110....001...............0 | 0000100 ...   (256-bit large blocks, 32-bit small blocks; popcount the remaining bits) Miss!
      L[]: 18 21 …   Miss!
      S[]: 1 3 4 6 7 9 10 13 | 2 5 7 9 12 13 18 19 | 1 3 7 …   Miss!
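
The decomposition in the example, and the three separate array reads that cause the three misses, can be sketched like this. The array layout, the LSB-first bit order inside each 32-bit word, and the function names are assumptions for illustration. Note that here rank1(n) counts the 1s among the first n bits (positions 0..n-1), which is the reading under which exactly 18 leftover bits are popcounted for n = 402.

```python
import random

LARGE = 256  # bits per large block
SMALL = 32   # bits per small block (one word)


def locate(n):
    """Split n into (large block, small block within it, leftover bit count)."""
    large, rest = divmod(n, LARGE)
    small, offset = divmod(rest, SMALL)
    return large, small, offset


def build(words):
    """L[k] = 1s before large block k; S[j] = 1s in its large block before word j."""
    L, S = [], []
    total = in_large = 0
    for j, w in enumerate(words):
        if j % (LARGE // SMALL) == 0:
            L.append(total)
            in_large = 0
        S.append(in_large)
        count = bin(w).count("1")
        total += count
        in_large += count
    return L, S


def rank1(n, words, L, S):
    """Number of 1s among the first n bits; touches L[], S[], and words[] once each."""
    large, small, offset = locate(n)
    j = large * (LARGE // SMALL) + small
    mask = (1 << offset) - 1  # the `offset` lowest bits of the word
    return L[large] + S[j] + bin(words[j] & mask).count("1")


print(locate(402))  # (1, 4, 18)

rng = random.Random(0)
words = [rng.getrandbits(32) for _ in range(16)]  # 512 bits
L, S = build(words)
bits = [(w >> t) & 1 for w in words for t in range(32)]
print(rank1(402, words, L, S) == sum(bits[:402]))  # True
```

Because L[], S[], and the bit words live in three separate arrays, each of the three reads in `rank1` can land on a different cache line.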
  • 25. Implementation: Practice
      • Packing the required data into a single cache line
      [Figure: a 56B chunk with fields labeled 4B, 1B ・・・, 32B (the bits 0110....001..........0), and 12B padding, shown with padding against a 64B cache line]
  • 26. Implementation: Practice
      • Packing the required data into a single cache line
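
One way to picture the interleaving is a fixed 64-byte record per 256-bit block, so that a single record read supplies everything a rank needs. The field layout below is an assumption for illustration (a 4-byte cumulative count, eight 1-byte small-block counts, 32 bytes of bits, and the rest as padding); it is not claimed to match the chunk layout in the talk or in dbitv. As in the 402 example, rank1(n) counts the 1s among the first n bits.

```python
import random
import struct

# Assumed record: uint32 count | 8 x uint8 counts | 32 bytes of bits | 20 pad bytes.
RECORD = struct.Struct("<I8B32s20x")  # 4 + 8 + 32 + 20 = 64 bytes


def pack_index(bits):
    """Pack bits (length a multiple of 256) into one 64-byte record per block."""
    out = bytearray()
    total = 0
    for start in range(0, len(bits), 256):
        chunk = bits[start:start + 256]
        raw = bytearray(32)
        smalls = []
        in_large = 0
        for s in range(8):
            smalls.append(in_large)  # 1s in this block before small block s
            for t in range(32):
                if chunk[s * 32 + t]:
                    raw[s * 4 + t // 8] |= 1 << (t % 8)
                    in_large += 1
        out += RECORD.pack(total, *smalls, bytes(raw))
        total += in_large
    return bytes(out)


def rank1(n, packed):
    """Number of 1s among the first n bits, read from a single record."""
    large, rest = divmod(n, 256)
    small, offset = divmod(rest, 32)
    fields = RECORD.unpack_from(packed, large * RECORD.size)
    total, smalls, raw = fields[0], fields[1:9], fields[9]
    word = int.from_bytes(raw[small * 4 : small * 4 + 4], "little")
    return total + smalls[small] + bin(word & ((1 << offset) - 1)).count("1")


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(512)]
packed = pack_index(bits)
print(RECORD.size, len(packed))  # 64 128
print(rank1(402, packed) == sum(bits[:402]))  # True
```

The small-block counts fit in one byte each because at most 7 * 32 = 224 ones can precede the last small block of a 256-bit block.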
  • 27. Implementation: Practice
      • By the way, what about select?
        – Omitted due to time constraints
        – Please see the code ...
      • Two Implementation Approaches
        – O(log N) complexity
          • ux-trie, rx, and marisa-trie
          • Binary search using rank
          • Suffers many cache/TLB misses
        – O(1) complexity
          • My implementation, which minimizes these penalties
          • 1 rank, 1 SIMD comparison, and an O(1) bsf
          • Only 2 cache/TLB misses
  • 28. Implementation: Practice
      • By the way, what about select?
        – Omitted due to time constraints
        – Please see the code ...
      • Two Implementation Approaches
        – O(log N) complexity
          • ux-trie, rx, and marisa-trie
          • Binary search using rank
          • Suffers many cache/TLB misses
        – O(1) complexity
          • My implementation, which minimizes these penalties
          • 1 rank, 1 SIMD comparison, and an O(1) bsf
          • Only 2 cache/TLB misses
      Not implemented yet ...
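
The O(log N) approach listed above, select as a binary search over rank, can be sketched as follows. This illustrates the general technique only and is not code from ux-trie, rx, marisa-trie, or dbitv; the function name and the prefix-sum rank used to drive it are assumptions. Rank is inclusive here, so select1(k) is the smallest position whose rank reaches k.

```python
from itertools import accumulate


def select1(k, rank1, n_bits):
    """Return the 0-based position of the k-th 1 (k counted from 1), or -1.

    rank1(pos) must return the number of 1s in positions 0..pos, inclusive.
    """
    if k <= 0 or n_bits == 0 or rank1(n_bits - 1) < k:
        return -1
    lo, hi = 0, n_bits - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank1(mid) >= k:
            hi = mid        # the k-th 1 is at mid or to its left
        else:
            lo = mid + 1    # the k-th 1 is strictly to the right
    return lo


B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
prefix = list(accumulate(B))  # prefix[i] = number of 1s in B[0..i]


def rank_inclusive(pos):
    return prefix[pos]


print(select1(4, rank_inclusive, len(B)))  # 6
```

Each probe of the search is a rank at a different, far-apart position, which is where the many cache/TLB misses of this approach come from.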