Index types

Index types: Inverted index
id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009

Index types: Inverted index
id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009

Inverted index for ‘make’ column:
Toyota -> 0, 2, 4, 6, 7
Mazda -> 1, 5
Ford -> 3, 8
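
As a rough illustration of the slide above, an inverted index can be built by walking the column once and appending each document id to the list for its value. This is a minimal sketch, not how Pinot or Sensei implement it; the function name `build_inverted_index` and the lowercase keys are assumptions made for the example.

```python
from collections import defaultdict

# The 'make' column from the slide; the list position is the document id.
makes = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]


def build_inverted_index(column):
    """Map each distinct value to the ascending list of doc ids holding it.

    Hypothetical helper for illustration only.
    """
    index = defaultdict(list)
    for doc_id, value in enumerate(column):
        index[value].append(doc_id)
    return dict(index)


inverted = build_inverted_index(makes)
print(inverted["toyota"])  # [0, 2, 4, 6, 7]
print(inverted["mazda"])   # [1, 5]
print(inverted["ford"])    # [3, 8]
```

A filter such as make = 'ford' then reads one list instead of scanning every document.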

Inverted index is cheap if the column is sorted
id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009

Inverted index for ‘year’ column:
“1996” -> 0-2
“2002” -> 3-6
“2009” -> 7-8

Only 2 integers (the first and last document id) are stored per unique value.
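
The saving on a sorted column can be sketched as follows: because equal values sit next to each other, each posting list collapses to an inclusive (first id, last id) pair. The code below is an illustrative sketch that assumes the input is already sorted; `build_range_index` is a hypothetical name, not an API from the talk.

```python
# The 'year' column from the slide, already sorted; position = document id.
years = [1996, 1996, 1996, 2002, 2002, 2002, 2002, 2009, 2009]


def build_range_index(sorted_column):
    """Map each distinct value to an inclusive (first_doc_id, last_doc_id) pair.

    Assumes equal values are contiguous, which holds for a sorted column.
    """
    ranges = {}
    for doc_id, value in enumerate(sorted_column):
        # Keep the first id seen for this value and extend the end to doc_id.
        first, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (first, doc_id)
    return ranges


year_ranges = build_range_index(years)
print(year_ranges)  # {1996: (0, 2), 2002: (3, 6), 2009: (7, 8)}
```

Whatever the number of documents, the index holds two integers per distinct year.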

Index types: Forward index
id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009

Sorted values array:
Value   Index
ford    0
mazda   1
toyota  2

id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009

Sorted values array:
Value   Index
ford    0
mazda   1
toyota  2

Forward index for ‘make’ column:
id  value id
0   2
1   1
2   2
3   0
4   2
5   1
6   2
7   2
8   0

id  make    year
0   toyota  1996
1   mazda   1996
2   toyota  1996
3   ford    2002
4   toyota  2002
5   mazda   2002
6   toyota  2002
7   toyota  2009
8   ford    2009
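
The two structures on this slide can be sketched in a few lines: the sorted values array acts as a dictionary, and the forward index stores, for each document id, the position of its value in that dictionary. This is a minimal sketch for illustration; the names `dictionary`, `value_to_id`, `forward_index`, and `make_of` are assumptions, not identifiers from Pinot or Sensei.

```python
# The 'make' column from the slide; the list position is the document id.
makes = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]

# Sorted values array (the dictionary): position in the list is the value id.
dictionary = sorted(set(makes))

# Reverse lookup used only while building the forward index.
value_to_id = {value: value_id for value_id, value in enumerate(dictionary)}

# Forward index: document id -> value id.
forward_index = [value_to_id[value] for value in makes]


def make_of(doc_id):
    """Resolve a document's original value through the dictionary."""
    return dictionary[forward_index[doc_id]]


print(dictionary)     # ['ford', 'mazda', 'toyota']
print(forward_index)  # [2, 1, 2, 0, 2, 1, 2, 2, 0]
print(make_of(3))     # ford
```

Each string is stored once in the dictionary, and the per-document cost drops to a small integer.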

How to compress the forward index
Fixed bit size encoding
• 1000 unique field values would require 10 bits per document
• In general we need x bits per document, where
x = ceil(log2(valueArray.length))
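
To make the bit count concrete, the sketch below computes the width for a given number of unique values and packs value ids at that width. It is an illustration under stated assumptions: a Python integer stands in for the bit buffer a real implementation would use, and `bits_per_value`, `pack`, and `unpack` are hypothetical helper names.

```python
def bits_per_value(cardinality):
    """Bits needed for ids 0 .. cardinality - 1, i.e. ceil(log2(cardinality)).

    A column with a single unique value is still given 1 bit here (assumption).
    """
    return max(1, (cardinality - 1).bit_length())


def pack(value_ids, width):
    """Pack value ids into one integer, `width` bits per document."""
    packed = 0
    for position, value_id in enumerate(value_ids):
        packed |= value_id << (position * width)
    return packed


def unpack(packed, width, position):
    """Read back the value id stored for the document at `position`."""
    return (packed >> (position * width)) & ((1 << width) - 1)


# Forward index for 'make' from the earlier slide: 3 unique values.
forward_index = [2, 1, 2, 0, 2, 1, 2, 2, 0]
width = bits_per_value(3)
packed = pack(forward_index, width)

print(bits_per_value(1000))       # 10
print(width)                      # 2
print(unpack(packed, width, 3))   # 0
print(len(forward_index) * width) # 18
```

The nine documents need 18 bits in total, compared with 9 x 32 = 288 bits if each value id were stored as a 32-bit int.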

Ways to save memory
• Use dictionary compression
• Avoid storing inverted index if the column isn’t sorted
• Use fixed bit size encoding for Forward Index

How much do we actually save in a real-world use case?

Column        Type      Column           Type
advertiserId  int       memberId         int
creativeId    int       industry         int
campaignId    int       region           int
campaignType  String    seniority        String
age           char      titles           int[]
company       int       requestType      String
education     int       time             int
function      String    impressionCount  int
gender        char

Space requirements per document
Sensei       Other OLAP datastore   Pinot Sensei
>100 bytes   ~100 bytes             16 bytes

Other OLAP datastore and regular Sensei do not compress indexes. We can fit 7 times more documents in RAM than Other OLAP datastore.
