Index types: Inverted index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009
Index types: Inverted index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009

Inverted index for ‘make’ column:
toyota -> 0, 2, 4, 6, 7
mazda  -> 1, 5
ford   -> 3, 8
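The mapping on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the datastore's actual implementation; the names `ROWS` and `build_inverted_index` are made up for illustration, and the rows are copied from the table above.

```python
from collections import defaultdict

# Rows copied from the slide: (id, make, year).
ROWS = [
    (0, "toyota", 1996),
    (1, "mazda", 1996),
    (2, "toyota", 1996),
    (3, "ford", 2002),
    (4, "toyota", 2002),
    (5, "mazda", 2002),
    (6, "toyota", 2002),
    (7, "toyota", 2009),
    (8, "ford", 2009),
]


def build_inverted_index(rows, column):
    """Map each distinct value of `column` to the ids of the rows holding it.

    Ids are appended in row order, so each list is ascending as long as
    the rows themselves are ordered by id.
    """
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row[0])
    return dict(index)


make_index = build_inverted_index(ROWS, column=1)
print(make_index["toyota"])
```

A filter such as make = 'ford' then becomes a single dictionary lookup instead of a scan over every row.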
Inverted index is cheap if the column is sorted
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009

Inverted index for ‘year’ column:
“1996” -> 0-2
“2002” -> 3-6
“2009” -> 7-8

Only 2 integers (first and last id) are needed for each unique value
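When the column is sorted, all ids for a given value are contiguous, so the posting list collapses to a (first id, last id) pair. The sketch below illustrates this under the assumption that document ids are simply positions in the column; `build_range_index` and `ids_for` are hypothetical helper names.

```python
def build_range_index(sorted_column):
    """For a column already sorted by value, map value -> (first_id, last_id)."""
    ranges = {}
    for doc_id, value in enumerate(sorted_column):
        # Keep the first id seen for this value, extend the last id.
        first, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (first, doc_id)
    return ranges


def ids_for(ranges, value):
    """Expand a stored (first, last) pair back into the matching ids."""
    first, last = ranges[value]
    return range(first, last + 1)


# The 'year' column from the slide, in id order.
YEARS = [1996, 1996, 1996, 2002, 2002, 2002, 2002, 2009, 2009]
year_ranges = build_range_index(YEARS)
print(year_ranges)
```

The index size depends only on the number of unique values, not on the number of documents.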
Index types: Forward index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009
Index types: Forward index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009

Sorted values array:
Value    Index
ford     0
mazda    1
toyota   2
Index types: Forward index
id   make     year
0    toyota   1996
1    mazda    1996
2    toyota   1996
3    ford     2002
4    toyota   2002
5    mazda    2002
6    toyota   2002
7    toyota   2009
8    ford     2009

Sorted values array:
Value    Index
ford     0
mazda    1
toyota   2

Forward index for ‘make’ column:
id   value id
0    2
1    1
2    2
3    0
4    2
5    1
6    2
7    2
8    0
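The two structures on this slide, the sorted values array and the per-document value ids, can be built as follows. This is an illustrative sketch only; `build_forward_index`, `lookup`, and `value_id_of` are hypothetical names, and the binary search relies on Python's standard `bisect` module.

```python
from bisect import bisect_left

# The 'make' column from the slide, in id order.
MAKES = ["toyota", "mazda", "toyota", "ford", "toyota",
         "mazda", "toyota", "toyota", "ford"]


def build_forward_index(column):
    """Return (sorted unique values, list of value ids indexed by doc id)."""
    values = sorted(set(column))
    value_ids = {value: i for i, value in enumerate(values)}
    forward = [value_ids[value] for value in column]
    return values, forward


def lookup(values, forward, doc_id):
    """Resolve a document's value through the forward index."""
    return values[forward[doc_id]]


def value_id_of(values, value):
    """Binary-search the sorted values array; return -1 if the value is absent."""
    i = bisect_left(values, value)
    return i if i < len(values) and values[i] == value else -1


values, forward = build_forward_index(MAKES)
print(values)
print(forward)
```

Because the values array is sorted, value ids preserve value order, so a range predicate on values can be translated into a range predicate on small integers.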
How to compress the forward index
Fixed bit size encoding
• 1000 unique field values would require 10 bits per document
• In general we need x bits per document, where
  x = ceil(log2(valueArray.length))
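A rough sketch of fixed bit size encoding follows. It assumes value ids are packed back to back into one arbitrary-precision Python integer, which is purely an illustrative choice and not necessarily how the datastore lays out its buffers; `bits_per_value`, `pack`, and `unpack_one` are hypothetical names. The bit width is computed with integer arithmetic, which equals the log2 of the number of unique values rounded up to a whole number of bits.

```python
def bits_per_value(num_unique):
    """Smallest width (at least 1 bit) that can hold ids 0..num_unique-1."""
    return max(1, (num_unique - 1).bit_length())


def pack(value_ids, width):
    """Pack value ids into one int, `width` bits each, doc 0 in the lowest bits."""
    packed = 0
    for doc_id, value_id in enumerate(value_ids):
        packed |= value_id << (doc_id * width)
    return packed


def unpack_one(packed, doc_id, width):
    """Read back the value id of a single document."""
    mask = (1 << width) - 1
    return (packed >> (doc_id * width)) & mask


# Forward index for the 'make' column: 3 unique values -> 2 bits per document.
FORWARD = [2, 1, 2, 0, 2, 1, 2, 2, 0]
WIDTH = bits_per_value(3)
PACKED = pack(FORWARD, WIDTH)
print(WIDTH, [unpack_one(PACKED, d, WIDTH) for d in range(len(FORWARD))])
```

With 9 documents at 2 bits each, the whole column fits in 18 bits, compared with 9 full-width integers for an unpacked array.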
Ways to save memory
• Use dictionary compression
• Avoid storing inverted index if the column isn’t sorted
• Use fixed bit size encoding for Forward Index
How much do we actually save in a real-world use case?

Column         Type     Column            Type
advertiserId   int      memberId          int
creativeId     int      industry          int
campaignId     int      region            int
campaignType   String   seniority         String
age            char     titles            int[]
company        int      requestType       String
education      int      time              int
function       String   impressionCount   int
gender         char
Space requirements per document
Sensei               Other OLAP datastore   Pinot Sensei
>100 bytes           ~100 bytes             16 bytes

The other OLAP datastore and regular Sensei do not compress indexes. We can fit 7 times more documents in RAM than the other OLAP datastore.
