1. Index types: Inverted index
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
2. Index types: Inverted index
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
Inverted index for 'make' (value -> document ids):
toyota -> 0, 2, 4, 6, 7
mazda -> 1, 5
ford -> 3, 8
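A minimal sketch of how the inverted index for 'make' could be built from the table above. The names `rows` and `build_inverted_index` are illustrative assumptions, not taken from the slides or from any particular datastore.

```python
from collections import defaultdict

# Rows from the example table: (id, make, year)
rows = [
    (0, "toyota", 1996), (1, "mazda", 1996), (2, "toyota", 1996),
    (3, "ford", 2002), (4, "toyota", 2002), (5, "mazda", 2002),
    (6, "toyota", 2002), (7, "toyota", 2009), (8, "ford", 2009),
]

def build_inverted_index(rows, column):
    """Map each distinct value in `column` to the list of document ids holding it."""
    index = defaultdict(list)
    for row in rows:
        doc_id = row[0]
        # Rows are visited in id order, so each posting list stays sorted.
        index[row[column]].append(doc_id)
    return dict(index)

make_index = build_inverted_index(rows, column=1)
print(make_index["toyota"])  # [0, 2, 4, 6, 7]
```

A lookup such as `make_index["ford"]` returns the matching document ids directly, without scanning the column.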
3. Inverted index is cheap if the column is sorted
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
Inverted index for 'year' (value -> range of document ids):
"1996" -> 0-2
"2002" -> 3-6
"2009" -> 7-8
Only 2 integers are stored per unique value.
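To illustrate why a sorted column makes the inverted index cheap, here is a rough sketch that stores only a (first id, last id) pair per unique value. The function names are hypothetical and the input is assumed to be already sorted, as the 'year' column is in the table above.

```python
def build_range_index(sorted_values):
    """For a sorted column, map each distinct value to an inclusive (first_id, last_id) pair."""
    ranges = {}
    for doc_id, value in enumerate(sorted_values):
        if value in ranges:
            first, _ = ranges[value]
            ranges[value] = (first, doc_id)   # extend the current run
        else:
            ranges[value] = (doc_id, doc_id)  # start a new run
    return ranges

def matching_ids(ranges, value):
    """Expand the stored pair back into the matching document ids."""
    if value not in ranges:
        return []
    first, last = ranges[value]
    return list(range(first, last + 1))

years = [1996, 1996, 1996, 2002, 2002, 2002, 2002, 2009, 2009]
year_ranges = build_range_index(years)
print(year_ranges[2002])  # (3, 6)
```

The index size depends only on the number of unique values, not on the number of documents.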
4. Index types: Forward index
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
5. Index types: Forward index
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
Sorted values array:
Value Index
ford 0
mazda 1
toyota 2
6. Index types: Forward index
id make year
0 toyota 1996
1 mazda 1996
2 toyota 1996
3 ford 2002
4 toyota 2002
5 mazda 2002
6 toyota 2002
7 toyota 2009
8 ford 2009
Sorted values array:
Value Index
ford 0
mazda 1
toyota 2
Forward index for 'make' column:
id  value id
0   2
1   1
2   2
3   0
4   2
5   1
6   2
7   2
8   0
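The sorted values array and the forward index for 'make' can be sketched as follows. This is an illustrative reconstruction; `build_forward_index` and `scan_equals` are assumed names, not part of any real API.

```python
from bisect import bisect_left

# The 'make' column in document id order.
makes = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]

def build_forward_index(values):
    """Return (sorted values array, forward index of value ids per document)."""
    dictionary = sorted(set(values))
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    forward = [value_to_id[value] for value in values]
    return dictionary, forward

def scan_equals(dictionary, forward, value):
    """Binary-search the sorted values array, then scan the forward index for that id."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value not present in the column
    return [doc_id for doc_id, value_id in enumerate(forward) if value_id == i]

dictionary, forward = build_forward_index(makes)
print(dictionary)  # ['ford', 'mazda', 'toyota']
print(forward)     # [2, 1, 2, 0, 2, 1, 2, 2, 0]
```

Reading a document's value is a two-step lookup, `dictionary[forward[doc_id]]`, and each string is stored only once.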
7. How to compress the forward index
Fixed bit size encoding
• 1000 unique field values would require 10 bits per document
• In general we need X bits per document, where
X = ceil(log2(valueArray.length))
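A small sketch of fixed bit size encoding, assuming the bit count is rounded up to a whole number of bits. Packing into a single Python integer is a simplification chosen for brevity; a real implementation would more likely pack into a byte or long array. All function names here are hypothetical.

```python
def bits_per_value(cardinality):
    """Bits needed to address `cardinality` distinct values: ceil(log2(cardinality)), at least 1."""
    return max(1, (cardinality - 1).bit_length())

def pack(value_ids, width):
    """Pack value ids into one integer, `width` bits per document."""
    packed = 0
    for position, value_id in enumerate(value_ids):
        packed |= value_id << (position * width)
    return packed

def unpack_one(packed, width, position):
    """Read back the value id stored for the document at `position`."""
    mask = (1 << width) - 1
    return (packed >> (position * width)) & mask

forward = [2, 1, 2, 0, 2, 1, 2, 2, 0]  # forward index for 'make'
width = bits_per_value(3)              # 3 unique makes -> 2 bits
packed = pack(forward, width)

print(bits_per_value(1000))            # 10
print(unpack_one(packed, width, 3))    # 0
```

With 2 bits per document, the nine value ids fit in 18 bits instead of nine full-width integers.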
8. Ways to save memory
• Use dictionary compression
• Avoid storing inverted index if the column isn’t sorted
• Use fixed bit size encoding for Forward Index
9. How much do we actually save in the real world use case?
Column        Type    Column           Type
advertiserId  int     memberId         int
creativeId    int     industry         int
campaignId    int     region           int
campaignType  String  seniority        String
age           char    titles           int[]
company       int     requestType      String
education     int     time             int
function      String  impressionCount  int
gender        char
10. Space requirements per document
Datastore             Space per document
Sensei                >100 bytes
Other OLAP datastore  ~100 bytes
Pinot Sensei          16 bytes
Other OLAP datastore and regular Sensei do not compress indexes. We can fit about 7 times more documents in RAM than Other OLAP datastore.