Index types


  1. 1. Index types: Inverted index

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009
  2. 2. Index types: Inverted index

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009

     Inverted index for 'make' (value -> document ids):
     Toyota -> 0, 2, 4, 6, 7
     Mazda  -> 1, 5
     Ford   -> 3, 8
  3. 3. Inverted index is cheap if the column is sorted

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009

     Inverted index for 'year' (value -> document id range):
     "1996" -> 0-2
     "2002" -> 3-6
     "2009" -> 7-8

     Only 2 integers are stored per unique value.
  4. 4. Index types: Forward index

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009
  5. 5. Index types: Forward index

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009

     Sorted values array:
     Value   Index
     ford    0
     mazda   1
     toyota  2
  6. 6. Index types: Forward index

     id  make    year
     0   toyota  1996
     1   mazda   1996
     2   toyota  1996
     3   ford    2002
     4   toyota  2002
     5   mazda   2002
     6   toyota  2002
     7   toyota  2009
     8   ford    2009

     Sorted values array:
     Value   Index
     ford    0
     mazda   1
     toyota  2

     Forward index for 'make' column:
     id  value id
     0   2
     1   1
     2   2
     3   0
     4   2
     5   1
     6   2
     7   2
     8   0
  7. 7. How to compress the forward index: Fixed bit size encoding
     • 1000 unique field values would require 10 bits per document
     • In general we need x bits per document, where x = ceil(log2(valueArray.length))
  8. 8. Ways to save memory
     • Use dictionary compression
     • Avoid storing the inverted index if the column isn't sorted
     • Use fixed bit size encoding for the forward index
  9. 9. How much do we actually save in a real-world use case?

     Column        Type      Column           Type
     advertiserId  int       memberId         int
     creativeId    int       industry         int
     campaignId    int       region           int
     campaignType  String    seniority        String
     age           char      titles           Int[]
     company       int       requestType      String
     education     int       time             int
     function      String    impressionCount  int
     gender        char
  10. 10. Space requirements per document

      Sensei      Other OLAP datastore  Pinot Sensei
      >100 bytes  ~100 bytes            16 bytes

      The other OLAP datastore and regular Sensei do not compress indexes. We can fit 7 times more documents in RAM than the other OLAP datastore.
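
The index structures shown on slides 1 to 6 can be sketched in a few lines of Python. This is only an illustration built on the sample car table from the slides, not the Sensei or Pinot implementation; the function names (`build_inverted_index`, `build_range_index`, `build_forward_index`) are made up for this example.

```python
from collections import defaultdict

# Sample rows copied from the slides: (make, year).
# The position in the list plays the role of the document id.
ROWS = [
    ("toyota", 1996), ("mazda", 1996), ("toyota", 1996),
    ("ford", 2002), ("toyota", 2002), ("mazda", 2002),
    ("toyota", 2002), ("toyota", 2009), ("ford", 2009),
]


def build_inverted_index(values):
    """Map each distinct value to the ascending list of document ids holding it."""
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        index[value].append(doc_id)
    return dict(index)


def build_range_index(sorted_values):
    """For a column that is already sorted, keep only (first_id, last_id) per value."""
    ranges = {}
    for doc_id, value in enumerate(sorted_values):
        first, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (first, doc_id)
    return ranges


def build_forward_index(values):
    """Return (sorted array of distinct values, per-document value ids)."""
    dictionary = sorted(set(values))
    value_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_id[v] for v in values]


makes = [make for make, _ in ROWS]
years = [year for _, year in ROWS]

inverted_make = build_inverted_index(makes)          # unsorted column: full id lists
year_ranges = build_range_index(years)               # sorted column: 2 integers per value
dictionary, forward_make = build_forward_index(makes)
```

With the sample data, `inverted_make` holds the per-make document id lists from slide 2, `year_ranges` holds the two-integer ranges from slide 3, and `dictionary` with `forward_make` reproduce the sorted values array and value ids from slide 6.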

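Fixed bit size encoding of the forward index (slide 7) can be sketched as follows. This is a minimal sketch under stated assumptions: the value ids are packed into one arbitrary-precision Python integer for simplicity, whereas a real engine would more likely use a fixed-size word array, and the helper names `bits_per_value`, `pack` and `unpack` are hypothetical.

```python
def bits_per_value(cardinality):
    """Smallest fixed width (in bits) that can hold value ids 0..cardinality-1.

    Equivalent to ceil(log2(cardinality)), computed with integers to avoid
    floating-point rounding; at least 1 bit is always used.
    """
    if cardinality < 1:
        raise ValueError("cardinality must be positive")
    return max(1, (cardinality - 1).bit_length())


def pack(value_ids, width):
    """Pack value ids into a single integer, `width` bits per document."""
    packed = 0
    for position, value_id in enumerate(value_ids):
        if value_id < 0 or value_id >> width:
            raise ValueError(f"value id {value_id} does not fit in {width} bits")
        packed |= value_id << (position * width)
    return packed


def unpack(packed, width, count):
    """Recover `count` value ids from an integer produced by pack()."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Value ids of the 'make' column from the slides (ford=0, mazda=1, toyota=2).
make_ids = [2, 1, 2, 0, 2, 1, 2, 2, 0]
width = bits_per_value(3)            # 3 distinct makes -> 2 bits per document
packed = pack(make_ids, width)       # 9 documents fit in at most 18 bits
restored = unpack(packed, width, len(make_ids))
```

The same helper gives `bits_per_value(1000) == 10`, matching the 10 bits per document quoted on the slide for 1000 unique field values.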