Index types: Inverted index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009
Index types: Inverted index
id   make     year
0    toyota   1996   toyota -> 0, 2, 4, 6, 7
1    mazda    1996   mazda -> 1, 5
2    toyota   1996   ford -> 3, 8
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009
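A minimal Python sketch of how the inverted index for 'make' shown above could be built. The helper name `build_inverted_index` is hypothetical and chosen for illustration; this is not the actual Pinot or Sensei code.

```python
from collections import defaultdict

# The 'make' column from the table above; the list position is the document id.
makes = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]

def build_inverted_index(column):
    """Map each distinct value to the ascending list of document ids that hold it."""
    index = defaultdict(list)
    for doc_id, value in enumerate(column):
        index[value].append(doc_id)
    return dict(index)

inverted = build_inverted_index(makes)
print(inverted["toyota"])  # [0, 2, 4, 6, 7]
```

A query such as make = 'ford' then becomes a single dictionary lookup instead of a scan over every document.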
Inverted index is cheap if the column is sorted
id   make     year
0    toyota   1996   “1996” -> 0-2
1    mazda    1996   “2002” -> 3-6
2    toyota   1996   “2009” -> 7-8
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009

2 integers per unique value
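When the column is sorted, each value occupies one contiguous run of document ids, so the posting list collapses to a start and an end. A small Python sketch of that idea follows; `build_range_index` is a hypothetical name, and the function assumes its input is already sorted rather than checking it.

```python
# The 'year' column from the table above; it is already sorted by document id.
years = [1996, 1996, 1996, 2002, 2002, 2002, 2002, 2009, 2009]

def build_range_index(sorted_column):
    """Map each distinct value to an inclusive (first_id, last_id) pair.

    Assumes equal values are adjacent, which holds for a sorted column.
    """
    ranges = {}
    for doc_id, value in enumerate(sorted_column):
        first, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (first, doc_id)
    return ranges

year_ranges = build_range_index(years)
print(year_ranges[2002])  # (3, 6)
```

Storage is two integers per distinct value, regardless of how many documents share that value.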
Index types: Forward index
id   make     year   Sorted values array:
0    toyota   1996   Value    Index
1    mazda    1996   ford     0
2    toyota   1996   mazda    1
3    ford     2002   toyota   2
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009
Index types: Forward index
id   make     year   Sorted values array:   Forward index for ‘make’ column:
0    toyota   1996   Value    Index         id   value id
1    mazda    1996   ford     0             0    2
2    toyota   1996   mazda    1             1    1
3    ford     2002   toyota   2             2    2
4    toyota   2002                          3    0
5    mazda    2002                          4    2
6    toyota   2002                          5    1
7    toyota   2009                          6    2
8    ford     2009                          7    2
                                            8    0
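The sorted values array and the forward index above can be reproduced with a few lines of Python. This is an illustrative sketch only; the names `dictionary`, `forward_index`, and `lookup` are assumptions and do not come from the Pinot code base.

```python
from bisect import bisect_left

# The 'make' column from the table above; the list position is the document id.
makes = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]

# Sorted values array: each distinct value appears once, in sorted order.
dictionary = sorted(set(makes))  # ['ford', 'mazda', 'toyota']

# Forward index: for every document, the position of its value in the dictionary.
# Every value is present in the dictionary, so bisect_left returns its exact index.
forward_index = [bisect_left(dictionary, value) for value in makes]

def lookup(doc_id):
    """Recover the original value of a document from its value id."""
    return dictionary[forward_index[doc_id]]

print(forward_index)  # [2, 1, 2, 0, 2, 1, 2, 2, 0]
print(lookup(3))      # ford
```

Because the dictionary is sorted, value ids preserve the ordering of the original values, so range predicates can be evaluated on the ids directly.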
How to compress the forward index
Fixed bit size encoding
• 1000 unique field values would require 10 bits per document
• In general we need x bits per document, where
  x = ceil(log2(valueArray.length))
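A short Python sketch of fixed bit size encoding for the forward index from the earlier slide. A single Python integer stands in for the packed bit buffer, which is a simplification for illustration; `bits_per_value`, `pack`, and `unpack` are hypothetical names.

```python
def bits_per_value(cardinality):
    """Bits needed to address `cardinality` distinct values: ceil(log2(cardinality)).

    Computed with integer arithmetic to avoid floating-point rounding;
    at least 1 bit is used even for a single-valued column.
    """
    return max(1, (cardinality - 1).bit_length())

def pack(value_ids, width):
    """Pack value ids into one integer, `width` bits per document."""
    packed = 0
    for doc_id, value_id in enumerate(value_ids):
        packed |= value_id << (doc_id * width)
    return packed

def unpack(packed, doc_id, width):
    """Read back the value id stored for `doc_id`."""
    return (packed >> (doc_id * width)) & ((1 << width) - 1)

forward_index = [2, 1, 2, 0, 2, 1, 2, 2, 0]  # value ids for the 'make' column
width = bits_per_value(3)                    # 3 distinct makes -> 2 bits
packed = pack(forward_index, width)

print(bits_per_value(1000))       # 10
print(unpack(packed, 5, width))   # 1
```

With 2 bits per document, the nine value ids for 'make' occupy 18 bits in total.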
Ways to save memory
• Use dictionary compression
• Avoid storing inverted index if the column isn’t
  sorted
• Use fixed bit size encoding for Forward Index
How much do we actually save in a real-world use case?

  Column         Type     Column            Type
  advertiserId   int      memberId          int
  creativeId     int      industry          int
  campaignId     int      region            int
  campaignType   String   seniority         String
  age            char     titles            Int[]
  company        int      requestType       String
  education      int      time              int
  function       String   impressionCount   int
  gender         char
Space requirements per document
Sensei       Other OLAP datastore   Pinot Sensei
>100 bytes   ~100 bytes             16 bytes

The other OLAP datastore and regular Sensei do not compress their indexes. We can fit 7 times more documents in RAM than the other OLAP datastore.
