SlideShare a Scribd company logo
1 of 100
Download to read offline
Tweets per day
Queries per day
Indexing latency
Avg. query response time
Earlybird - Realtime Search @twitter
Michael Busch
@michibusch
michael@twitter.com
buschmi@apache.org
Earlybird - Realtime Search @twitter

 Agenda

 ‣ Introduction

 - Search Architecture

 - Inverted Index 101

 - Memory Model & Concurrency

 - Top Tweets
Introduction
Introduction




• Twitter acquired Summize in 2008


• 1st gen search engine based on MySQL
Introduction




• Next gen search engine based on Lucene


• Improves scalability and performance by orders or magnitude


• Open Source
Realtime Search @twitter

 Agenda

 - Introduction

 ‣ Search Architecture

 - Inverted Index 101

 - Memory Model & Concurrency

 - Top Tweets
Search Architecture
Search Architecture


  Tweets
           Ingester




• Ingester pre-processes Tweets for search


• Geo-coding, URL expansion, tokenization, etc.
Search Architecture


  Tweets              Thrift
           Ingester            MySQL
                               Master          MySQL
                                               Slaves



• Tweets are serialized to MySQL in Thrift format
Earlybird


  Tweets              Thrift
           Ingester            MySQL
                               Master          MySQL    Earlybird
                                               Slaves    Index



• Earlybird reads from MySQL slaves


• Builds an in-memory inverted index in real time
Blender

                      Thrift
                                             Thrift
                               Blender
       Earlybird
        Index



• Blender is our Thrift service aggregator


• Queries multiple Earlybirds, merges results
Realtime Search @twitter

 Agenda

 - Introduction

 - Search Architecture

 ‣ Inverted Index 101

 - Memory Model & Concurrency

 - Top Tweets
Inverted Index 101
Inverted Index 101

 1   The old night keeper keeps the keep in the town

 2   In the big old house in the big old gown.

 3   The house in the town had the big old keep

 4   Where the old night keeper never did sleep.

 5   The night keeper keeps the keep in the night

 6   And keeps in the dark and sleeps in the light.


Table with 6 documents



Example from:
Justin Zobel , Alistair Moffat,
Inverted files for text search engines,
ACM Computing Surveys (CSUR)
v.38 n.2, p.6-es, 2006
Inverted Index 101

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>
                                                        keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
Inverted Index 101                                                          Query: keeper

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>
                                                        keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
Inverted Index 101                                                         Query: keeper

1   The old night keeper keeps the keep in the town      term    freq
2   In the big old house in the big old gown.             and      1    <6>
                                                          big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark      1    <6>
4   Where the old night keeper never did sleep.
                                                          did      1    <4>
5   The night keeper keeps the keep in the night        gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had      1    <3>
                                                        house      2    <2> <3>
Table with 6 documents                                     in      5    <1> <2> <3> <5> <6>
                                                        keep       3    <1> <3> <5>
                                                       keeper      3    <1> <4> <5>
                                                        keeps      3    <1> <5> <6>
                                                         light     1    <6>
                                                        never      1    <4>
                                                        night      3    <1> <4> <5>
                                                          old      4    <1> <2> <3> <4>
                                                        sleep      1    <4>
                                                       sleeps      1    <6>
                                                          the      6    <1> <2> <3> <4> <5> <6>
                                                        town       2    <1> <3>
                                                        where      1    <4>
                                                      Dictionary and posting lists
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:    5 10 8985     2    90998    90
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:     5 10 8985        2    90998    90




VInt compression:   00000101




                               Values 0 <= delta <= 127 need
                                         one byte
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:     5 10 8985     2       90998   90




VInt compression:     11000110 00011001




                            Values 128 <= delta <= 16384
                                   need two bytes
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:     5 10 8985      2      90998    90




VInt compression:     11000110 00011001




                      First bit indicates whether next
                      byte belongs to the same value
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:     5 10 8985     2       90998   90




VInt compression:     11000110 00011001




• Variable number of bytes - a VInt-encoded posting can not be written as a
  primitive Java type; therefore it can not be written atomically
Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Delta encoding:     5 10 8985     2   90998    90



                          Read direction


• Each posting depends on previous one; decoding only possible in old-to-new
  direction


• With recency ranking (new-to-old) no early termination is possible
Posting list encoding

• By default Lucene uses a combination of delta encoding and VInt
  compression


• VInts are expensive to decode


• Problem 1: How to traverse posting lists backwards?


• Problem 2: How to write a posting atomically?
Posting list encoding in Earlybird

int (32 bits)




                      docID                      textPosition
                     24 bits                        8 bits
                    max. 16.7M                     max. 255




• Tweet text can only have 140 chars


• Decoding speed significantly improved compared to delta and VInt decoding
  (early experiments suggest 5x improvement compared to vanilla Lucene with
  FSDirectory)
Posting list encoding in Earlybird

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Earlybird encoding:


         5            15     9000       9002     100000   100090



                             Read direction
Early query termination

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090



Earlybird encoding:


         5            15       9000         9002           100000   100090



                                                                Read direction
                       E.g. 3 result are requested: Here
                       we can terminate after reading 3
                                     postings
Posting list encoding - Summary

• ints can be written atomically in Java


• Backwards traversal easy on absolute docIDs (not deltas)


• Every posting is a possible entry point for a searcher


• Skipping can be done without additional data structures as binary search,
  even though there are better approaches which should be explored


• On tweet indexes we need about 30% more storage for docIDs compared to
  delta+Vints; compensated by compression of complete segments


• Max. segment size: 2^24 = 16.7M tweets
Realtime Search @twitter

 Agenda

 - Introduction

 - Search Architecture

 - Inverted Index 101

 ‣ Memory Model & Concurrency

 - Top Tweets
Memory Model &
 Concurrency
Inverted index components

         Posting list storage



                                ?
  Dictionary
Inverted index components

         Posting list storage



                                ?
  Dictionary
Inverted Index

1   The old night keeper keeps the keep in the town      term       freq
2   In the big old house in the big old gown.             and         1 <6>
                                                           big        2 <2> <3>
3   The house in the town had the big old keep
                                                         dark        Per <6> we store different
                                                                      1 term
4   Where the old night keeper never did sleep.
                                                           did    kinds<4>metadata: text pointer,
                                                                      1 of
5   The night keeper keeps the keep in the night         gown         1 <2>
                                                                  frequency, postings pointer, etc.
6   And keeps in the dark and sleeps in the light.        had         1 <3>
                                                        house         2 <2> <3>
Table with 6 documents                                      in        5 <1> <2> <3> <5> <6>
                                                         keep         3 <1> <3> <5>
                                                        keeper        3 <1> <4> <5>
                                                        keeps         3 <1> <5> <6>
                                                          light       1 <6>
                                                         never        1 <4>
                                                         night        3 <1> <4> <5>
                                                           old        4 <1> <2> <3> <4>
                                                         sleep        1 <4>
                                                        sleeps        1 <6>
                                                           the        6 <1> <2> <3> <4> <5> <6>
                                                         town         2 <1> <3>
                                                        where         1 <4>
                                                      Dictionary and posting lists
Term dictionary
termID
int[]      textPointer;       postingsPointer;   frequency;
              int[]                int[]           int[]
            0
            1
            2
            3
            4
            5
            6




             term text pool
Term dictionary
termID
int[]      textPointer;   postingsPointer;   frequency;
              int[]            int[]           int[]
            0      t0            p0             f0
            1
            2
            3
            4
            5
            6
  0



             t0
                c a t
Term dictionary
termID
int[]      textPointer;       postingsPointer;   frequency;
              int[]                int[]           int[]
            0      t0                p0             f0
  1
            1      t1                p1             f1
            2
            3
            4
            5
            6
  0



             t0         t1
                c a t f o o
Term dictionary
termID
int[]      textPointer;           postingsPointer;   frequency;
              int[]                    int[]           int[]
            0      t0                    p0             f0
  1
            1      t1                    p1             f1
            2      t2                    p2             f2
            3
            4
            5
            6
  0


  2
             t0         t1   t2
                c a t f o o b a r
Term dictionary

• Number of objects << number of terms


• O(1) lookups


• Easy to store more term metadata by adding additional parallel arrays
Inverted index components

         Posting list storage



                                    ?
  Dictionary      Parallel arrays



                                        pointer to the most recently
                                        indexed posting for a term
Inverted index components

         Posting list storage



                                    ?
  Dictionary      Parallel arrays



                                        pointer to the most recently
                                        indexed posting for a term
Posting lists storage - Objectives

• Store many single-linked lists of different lengths space-efficiently


• The number of java objects should be independent of the number of lists or
  number of items in the lists


• Every item should be a possible entry point into the lists for iterators, i.e.
  items should not be dependent on other items (e.g. no delta encoding)


• Append and read possible by multiple threads in a lock-free fashion (single
  append thread, multiple reader threads)


• Traversal in backwards order
Memory management

4 int[]
 pools




                  = 32K int[]
Memory management

4 int[]
 pools



                                 Each pool can
                                   be grown
                  = 32K int[]   individually by
                                  adding 32K
                                     blocks
Memory management

4 int[]
 pools




  • For simplicity we can forget about the blocks for now and think of the pools
    as continuous, unbounded int[] arrays


  • Small total number of Java objects (each 32K block is one object)
Memory management
 slice size
     211
     27
     24
     21


• Slices can be allocated in each pool


• Each pool has a different, but fixed slice size
Adding and appending to a list
 slice size
     211
    27                           available
    24                           allocated
    21                           current list
Adding and appending to a list
 slice size
     211
    27                                 available
    24                                 allocated
    21                                 current list



                Store first two
              postings in this slice
Adding and appending to a list
 slice size
     211
     27                                                        available
     24                                                        allocated
     21                                                        current list



When first slice is full, allocate another one in second pool
Adding and appending to a list
 slice size
     211
    27                                                  available
    24                                                  allocated
    21                                                  current list



         Allocate a slice on each level as list grows
Adding and appending to a list
 slice size
     211
    27                                                           available
    24                                                           allocated
    21                                                           current list




                   On upper most level one list can own multiple slices
Posting list format

int (32 bits)




                      docID                      textPosition
                     24 bits                        8 bits
                    max. 16.7M                     max. 255




• Tweet text can only have 140 chars


• Decoding speed significantly improved compared to delta and VInt decoding
  (early experiments suggest 5x improvement compared to vanilla Lucene with
  FSDirectory)
Addressing items

• Use 32 bit (int) pointers to address any item in any list unambiguously:



     int (32 bits)




                 poolIndex         sliceIndex           offset in slice
                  2 bits           19-29 bits              1-11 bits
                    0-3          depends on pool        depends on pool



• Nice symmetry: Postings and address pointers both fit into a 32 bit int
Linking the slices
 slice size
     211
    27               available
    24               allocated
    21               current list
Linking the slices
 slice size
     211
    27                                                                        available
    24                                                                        allocated
    21                                                                        current list


   Dictionary   Parallel arrays




                                  pointer to the last posting indexed for a term
Concurrency - Definitions

• Pessimistic locking


     • A thread holds an exclusive lock on a resource, while an action is
       performed [mutual exclusion]


     • Usually used when conflicts are expected to be likely


• Optimistic locking


     • Operations are tried to be performed atomically without holding a lock;
       conflicts can be detected; retry logic is often used in case of conflicts


     • Usually used when conflicts are expected to be the exception
Concurrency - Definitions

• Non-blocking algorithm


 Ensures, that threads competing for shared resources do not have their
 execution indefinitely postponed by mutual exclusion.


• Lock-free algorithm


 A non-blocking algorithm is lock-free if there is guaranteed system-wide
 progress.


• Wait-free algorithm


 A non-blocking algorithm is wait-free, if there is guaranteed per-thread
 progress.

                                                                   * Source: Wikipedia
Concurrency

• Having a single writer thread simplifies our problem: no locks have to be used
  to protect data structures from corruption (only one thread modifies data)


• But: we have to make sure that all readers always see a consistent state of
  all data structures -> this is much harder than it sounds!


• In Java, it is not guaranteed that one thread will see changes that another
  thread makes in program execution order, unless the same memory barrier is
  crossed by both threads -> safe publication


• Safe publication can be achieved in different, subtle ways. Read the great
  book “Java concurrency in practice” by Brian Goetz for more information!
Java Memory Model

• Program order rule


 Each action in a thread happens-before every action in that thread that comes
 later in the program order.


• Volatile variable rule


 A write to a volatile field happens-before every subsequent read of that same
 field.


• Transitivity


 If A happens-before B, and B happens-before C, then A happens-before C.


                                               * Source: Brian Goetz: Java Concurrency in Practice
Concurrency

                            Thread 1   Thread 2


Cache


RAM              0   time



        int x;
Concurrency
                     Thread A writes x=5 to cache
                                                    Thread 1   Thread 2


Cache                 5                               x = 5;




RAM              0                         time



        int x;
Concurrency

                                Thread 1             Thread 2


Cache                5            x = 5;


                                                     while(x != 5);
RAM              0       time



        int x;


                                 This condition will likely
                                   never become false!
Concurrency

                            Thread 1   Thread 2


Cache


RAM              0   time



        int x;
Concurrency
                     Thread A writes b=1 to RAM,
                         because b is volatile
                                                   Thread 1   Thread 2


Cache                  5                             x = 5;

                                                     b = 1;

RAM              0     1                   time



        int x;

                 volatile int b;
Concurrency

                                            Thread 1     Thread 2


Cache                  5                       x = 5;

                                              b = 1;

RAM              0     1           time
                                                        int dummy = b;
                                                        while(x != 5);

        int x;
                                   Read volatile b
                 volatile int b;
Concurrency

                                          Thread 1              Thread 2


Cache                  5                    x = 5;

                                            b = 1;

RAM              0     1           time
                                                               int dummy = b;
                                                               while(x != 5);

        int x;

                 volatile int b;          happens-before



• Program order rule: Each action in a thread happens-before every action in
  that thread that comes later in the program order.
Concurrency

                                            Thread 1               Thread 2


Cache                  5                      x = 5;

                                              b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;

                 volatile int b;             happens-before



• Volatile variable rule: A write to a volatile field happens-before every
  subsequent read of that same field.
Concurrency

                                          Thread 1              Thread 2


Cache                  5                    x = 5;

                                            b = 1;

RAM              0     1           time
                                                               int dummy = b;
                                                               while(x != 5);

        int x;

                 volatile int b;           happens-before



• Transitivity: If A happens-before B, and B happens-before C, then A
  happens-before C.
Concurrency

                                            Thread 1               Thread 2


Cache                  5                       x = 5;

                                              b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;

                 volatile int b;
                                                This condition will be
                                                   false, i.e. x==5

• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.
Concurrency

                                             Thread 1              Thread 2


Cache                  5                       x = 5;

                                               b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;                              Memory barrier
                 volatile int b;



• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.
Demo
Concurrency

                                             Thread 1              Thread 2


Cache                  5                       x = 5;

                                               b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;                              Memory barrier
                 volatile int b;



• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.
Concurrency

                   IndexWriter       IndexReader


                   write 100 docs

                   maxDoc = 100

            time
                   write more docs   in IR.open(): read maxDoc
                                     search upto maxDoc

   maxDoc is volatile
Concurrency

                       IndexWriter          IndexReader


                       write 100 docs

                       maxDoc = 100

                time
                       write more docs      in IR.open(): read maxDoc
                                            search upto maxDoc

       maxDoc is volatile

                                           happens-before



• Only maxDoc is volatile. All other fields that IW writes to and IR reads from
  don’t need to be!
Wait-free

• Not a single exclusive lock


• Writer thread can always make progress


• Optimistic locking (retry-logic) in a few places for searcher thread


• Retry logic very simple and guaranteed to always make progress
Realtime Search @twitter

 Agenda

 - Introduction

 - Search Architecture

 - Inverted Index 101

 - Memory Model & Concurrency

 ‣ Top Tweets
Top Tweets
Signals

• Query-dependent


   • E.g. Lucene text score, language


• Query-independent


   • Static signals (e.g. text quality)


   • Dynamic signals (e.g. retweets)


   • Timeliness
Signals




                Task: Find best tweet in billions
                      of tweets efficiently




          Many Earlybird segments (8M documents each)
Signals

• For top tweets we can’t early terminate as efficiently


• Scoring and ranking billions of tweets is impractical




                   Many Earlybird segments (8M documents each)
Query cache




       Idea: Mark all tweets with high
    query-independent scores and only
     visit those for top tweets queries


                                                             Query results cache
                                                             High static score doc




               Many Earlybird segments (8M documents each)
Query cache

• A background thread periodically wakes up, executes queries, and stores the
  results in the per-segment cache
                                              This clause will be executed as a
• Rewriting queries                          Lucene ConstantScoreQuery that
                                              wraps a BitSet or SortedVIntList

     • User query: q = ‘lucene’


     • Rewritten: q’ = ‘lucene AND cached_filter:toptweets’


• Efficient skipping over tweets with low query-independent scores




                  Many Earlybird segments (8M documents each)
Query cache

• A background thread periodically wakes up, executes queries, and stores the
  results in the per-segment cache


• Configurable per cached query:


 • Result set type: BitSet, SortedVIntList


 • Execution schedule


 • Filter mode: cache-only or hybrid




                  Many Earlybird segments (8M documents each)
Query cache

 • Result set type: BitSet, SortedVIntList


     • BitSet for results with many hits


     • SortedVIntList for very sparse results




                  Many Earlybird segments (8M documents each)
Query cache

 • Execution schedule


    • Per segment: Sleep time between refreshing the cached results




                Many Earlybird segments (8M documents each)
Query cache

 • Filter mode: cache-only or hybrid




                Many Earlybird segments (8M documents each)
Query cache

 • Filter mode: cache-only or hybrid




                                       Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                         Partially filled segment
                                       (IndexWriter is currently
                                               appending)




                                               Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                       Query cache can only be
                                       computed until current
                                             maxDocID




                                              Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                       Some time later: More docs in
                                       segment, but query cache was
                                             not updated yet




                                                 Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                         Cache-only mode: Ignore
                                       documents added to segment
                        Search range     since cache was updated




                                                        Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                          Hybrid mode: Fall back to
                                       original query and execute it on
                        Search range           latest documents




                                                           Read direction
Query cache

 • Filter mode: cache-only or hybrid
                                       Use query cache up to the
                                          cache’s maxDocID
                        Search range




                                                       Read direction
Query result cache




query cache config yml file




                   Many Earlybird segments (8M documents each)
Questions?

More Related Content

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 

Recently uploaded (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 

Realtime Search at Twitter - Michael Busch

  • 1.
  • 6. Earlybird - Realtime Search @twitter Michael Busch @michibusch michael@twitter.com buschmi@apache.org
  • 7. Earlybird - Realtime Search @twitter Agenda ‣ Introduction - Search Architecture - Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 9. Introduction • Twitter acquired Summize in 2008 • 1st gen search engine based on MySQL
  • 10. Introduction • Next gen search engine based on Lucene • Improves scalability and performance by orders or magnitude • Open Source
  • 11. Realtime Search @twitter Agenda - Introduction ‣ Search Architecture - Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 13. Search Architecture Tweets Ingester • Ingester pre-processes Tweets for search • Geo-coding, URL expansion, tokenization, etc.
  • 14. Search Architecture Tweets Thrift Ingester MySQL Master MySQL Slaves • Tweets are serialized to MySQL in Thrift format
  • 15. Earlybird Tweets Thrift Ingester MySQL Master MySQL Earlybird Slaves Index • Earlybird reads from MySQL slaves • Builds an in-memory inverted index in real time
  • 16. Blender Thrift Thrift Blender Earlybird Index • Blender is our Thrift service aggregator • Queries multiple Earlybirds, merges results
  • 17. Realtime Search @twitter Agenda - Introduction - Search Architecture ‣ Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 19. Inverted Index 101 1 The old night keeper keeps the keep in the town 2 In the big old house in the big old gown. 3 The house in the town had the big old keep 4 Where the old night keeper never did sleep. 5 The night keeper keeps the keep in the night 6 And keeps in the dark and sleeps in the light. Table with 6 documents Example from: Justin Zobel , Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006
  • 20. Inverted Index 101 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists
  • 21. Inverted Index 101 Query: keeper 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists
  • 22. Inverted Index 101 Query: keeper 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists
  • 23. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
  • 24. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90
  • 25. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 VInt compression: 00000101 Values 0 <= delta <= 127 need one byte
  • 26. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 VInt compression: 11000110 00011001 Values 128 <= delta <= 16384 need two bytes
  • 27. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 VInt compression: 11000110 00011001 First bit indicates whether next byte belongs to the same value
  • 28. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 VInt compression: 11000110 00011001 • Variable number of bytes - a VInt-encoded posting can not be written as a primitive Java type; therefore it can not be written atomically
  • 29. Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 Read direction • Each posting depends on previous one; decoding only possible in old-to-new direction • With recency ranking (new-to-old) no early termination is possible
  • 30. Posting list encoding • By default Lucene uses a combination of delta encoding and VInt compression • VInts are expensive to decode • Problem 1: How to traverse posting lists backwards? • Problem 2: How to write a posting atomically?
  • 31. Posting list encoding in Earlybird int (32 bits) docID textPosition 24 bits 8 bits max. 16.7M max. 255 • Tweet text can only have 140 chars • Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
  • 32. Posting list encoding in Earlybird Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Earlybird encoding: 5 15 9000 9002 100000 100090 Read direction
  • 33. Early query termination Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Earlybird encoding: 5 15 9000 9002 100000 100090 Read direction E.g. 3 result are requested: Here we can terminate after reading 3 postings
  • 34. Posting list encoding - Summary • ints can be written atomically in Java • Backwards traversal easy on absolute docIDs (not deltas) • Every posting is a possible entry point for a searcher • Skipping can be done without additional data structures as binary search, even though there are better approaches which should be explored • On tweet indexes we need about 30% more storage for docIDs compared to delta+Vints; compensated by compression of complete segments • Max. segment size: 2^24 = 16.7M tweets
  • 35. Realtime Search @twitter Agenda - Introduction - Search Architecture - Inverted Index 101 ‣ Memory Model & Concurrency - Top Tweets
  • 36. Memory Model & Concurrency
  • 37. Inverted index components Posting list storage ? Dictionary
  • 38. Inverted index components Posting list storage ? Dictionary
  • 39. Inverted Index 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark Per <6> we store different 1 term 4 Where the old night keeper never did sleep. did kinds<4>metadata: text pointer, 1 of 5 The night keeper keeps the keep in the night gown 1 <2> frequency, postings pointer, etc. 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists
  • 40. Term dictionary termID int[] textPointer; postingsPointer; frequency; int[] int[] int[] 0 1 2 3 4 5 6 term text pool
  • 41. Term dictionary termID int[] textPointer; postingsPointer; frequency; int[] int[] int[] 0 t0 p0 f0 1 2 3 4 5 6 0 t0 c a t
  • 42. Term dictionary termID int[] textPointer; postingsPointer; frequency; int[] int[] int[] 0 t0 p0 f0 1 1 t1 p1 f1 2 3 4 5 6 0 t0 t1 c a t f o o
  • 43. Term dictionary termID int[] textPointer; postingsPointer; frequency; int[] int[] int[] 0 t0 p0 f0 1 1 t1 p1 f1 2 t2 p2 f2 3 4 5 6 0 2 t0 t1 t2 c a t f o o b a r
  • 44. Term dictionary • Number of objects << number of terms • O(1) lookups • Easy to store more term metadata by adding additional parallel arrays
  • 45. Inverted index components Posting list storage ? Dictionary Parallel arrays pointer to the most recently indexed posting for a term
  • 46. Inverted index components Posting list storage ? Dictionary Parallel arrays pointer to the most recently indexed posting for a term
  • 47. Posting lists storage - Objectives • Store many single-linked lists of different lengths space-efficiently • The number of java objects should be independent of the number of lists or number of items in the lists • Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding) • Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads) • Traversal in backwards order
  • 48. Memory management 4 int[] pools = 32K int[]
  • 49. Memory management 4 int[] pools Each pool can be grown = 32K int[] individually by adding 32K blocks
  • 50. Memory management 4 int[] pools • For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays • Small total number of Java objects (each 32K block is one object)
  • 51. Memory management slice size 211 27 24 21 • Slices can be allocated in each pool • Each pool has a different, but fixed slice size
  • 52. Adding and appending to a list slice size 211 27 available 24 allocated 21 current list
  • 53. Adding and appending to a list slice size 211 27 available 24 allocated 21 current list Store first two postings in this slice
  • 54. Adding and appending to a list slice size 211 27 available 24 allocated 21 current list When first slice is full, allocate another one in second pool
  • 55. Adding and appending to a list slice size 211 27 available 24 allocated 21 current list Allocate a slice on each level as list grows
  • 56. Adding and appending to a list slice size 211 27 available 24 allocated 21 current list On upper most level one list can own multiple slices
  • 57. Posting list format int (32 bits) docID textPosition 24 bits 8 bits max. 16.7M max. 255 • Tweet text can only have 140 chars • Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
  • 58. Addressing items • Use 32 bit (int) pointers to address any item in any list unambiguously: int (32 bits) poolIndex sliceIndex offset in slice 2 bits 19-29 bits 1-11 bits 0-3 depends on pool depends on pool • Nice symmetry: Postings and address pointers both fit into a 32 bit int
  • 59. Linking the slices slice size 211 27 available 24 allocated 21 current list
  • 60. Linking the slices slice size 211 27 available 24 allocated 21 current list Dictionary Parallel arrays pointer to the last posting indexed for a term
  • 61. Concurrency - Definitions • Pessimistic locking • A thread holds an exclusive lock on a resource, while an action is performed [mutual exclusion] • Usually used when conflicts are expected to be likely • Optimistic locking • Operations are tried to be performed atomically without holding a lock; conflicts can be detected; retry logic is often used in case of conflicts • Usually used when conflicts are expected to be the exception
  • 62. Concurrency - Definitions • Non-blocking algorithm Ensures, that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion. • Lock-free algorithm A non-blocking algorithm is lock-free if there is guaranteed system-wide progress. • Wait-free algorithm A non-blocking algorithm is wait-free, if there is guaranteed per-thread progress. * Source: Wikipedia
  • 63. Concurrency • Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data) • But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds! • In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication • Safe publication can be achieved in different, subtle ways. Read the great book “Java concurrency in practice” by Brian Goetz for more information!
  • 64. Java Memory Model • Program order rule Each action in a thread happens-before every action in that thread that comes later in the program order. • Volatile variable rule A write to a volatile field happens-before every subsequent read of that same field. • Transitivity If A happens-before B, and B happens-before C, then A happens-before C. * Source: Brian Goetz: Java Concurrency in Practice
  • 65. Concurrency Thread 1 Thread 2 Cache RAM 0 time int x;
  • 66. Concurrency Thread A writes x=5 to cache Thread 1 Thread 2 Cache 5 x = 5; RAM 0 time int x;
  • 67. Concurrency Thread 1 Thread 2 Cache 5 x = 5; while(x != 5); RAM 0 time int x; This condition will likely never become false!
  • 68. Concurrency Thread 1 Thread 2 Cache RAM 0 time int x;
  • 69. Concurrency Thread A writes b=1 to RAM, because b is volatile Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int x; volatile int b;
  • 70. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; Read volatile b volatile int b;
  • 71. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; volatile int b; happens-before • Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
  • 72. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; volatile int b; happens-before • Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
  • 73. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; volatile int b; happens-before • Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
  • 74. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; volatile int b; This condition will be false, i.e. x==5 • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 75. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; Memory barrier volatile int b; • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 76. Demo
  • 77. Concurrency Thread 1 Thread 2 Cache 5 x = 5; b = 1; RAM 0 1 time int dummy = b; while(x != 5); int x; Memory barrier volatile int b; • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 78. Concurrency IndexWriter IndexReader write 100 docs maxDoc = 100 time write more docs in IR.open(): read maxDoc search upto maxDoc maxDoc is volatile
  • 79. Concurrency IndexWriter IndexReader write 100 docs maxDoc = 100 time write more docs in IR.open(): read maxDoc search upto maxDoc maxDoc is volatile happens-before • Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
  • 80. Wait-free • Not a single exclusive lock • Writer thread can always make progress • Optimistic locking (retry-logic) in a few places for searcher thread • Retry logic very simple and guaranteed to always make progress
  • 81. Realtime Search @twitter Agenda - Introduction - Search Architecture - Inverted Index 101 - Memory Model & Concurrency ‣ Top Tweets
  • 83. Signals • Query-dependent • E.g. Lucene text score, language • Query-independent • Static signals (e.g. text quality) • Dynamic signals (e.g. retweets) • Timeliness
  • 84. Signals Task: Find best tweet in billions of tweets efficiently Many Earlybird segments (8M documents each)
  • 85. Signals • For top tweets we can’t early terminate as efficiently • Scoring and ranking billions of tweets is impractical Many Earlybird segments (8M documents each)
  • 86. Query cache Idea: Mark all tweets with high query-independent scores and only visit those for top tweets queries Query results cache High static score doc Many Earlybird segments (8M documents each)
  • 87. Query cache • A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache This clause will be executed as a • Rewriting queries Lucene ConstantScoreQuery that wraps a BitSet or SortedVIntList • User query: q = ‘lucene’ • Rewritten: q’ = ‘lucene AND cached_filter:toptweets’ • Efficient skipping over tweets with low query-independent scores Many Earlybird segments (8M documents each)
  • 88. Query cache • A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache • Configurable per cached query: • Result set type: BitSet, SortedVIntList • Execution schedule • Filter mode: cache-only or hybrid Many Earlybird segments (8M documents each)
  • 89. Query cache • Result set type: BitSet, SortedVIntList • BitSet for results with many hits • SortedVIntList for very sparse results Many Earlybird segments (8M documents each)
  • 90. Query cache • Execution schedule • Per segment: Sleep time between refreshing the cached results Many Earlybird segments (8M documents each)
  • 91. Query cache • Filter mode: cache-only or hybrid Many Earlybird segments (8M documents each)
  • 92. Query cache • Filter mode: cache-only or hybrid Read direction
  • 93. Query cache • Filter mode: cache-only or hybrid Partially filled segment (IndexWriter is currently appending) Read direction
  • 94. Query cache • Filter mode: cache-only or hybrid Query cache can only be computed until current maxDocID Read direction
  • 95. Query cache • Filter mode: cache-only or hybrid Some time later: More docs in segment, but query cache was not updated yet Read direction
  • 96. Query cache • Filter mode: cache-only or hybrid Cache-only mode: Ignore documents added to segment Search range since cache was updated Read direction
  • 97. Query cache • Filter mode: cache-only or hybrid Hybrid mode: Fall back to original query and execute it on Search range latest documents Read direction
  • 98. Query cache • Filter mode: cache-only or hybrid Use query cache up to the cache’s maxDocID Search range Read direction
  • 99. Query result cache query cache config yml file Many Earlybird segments (8M documents each)