Search at Twitter
Search at Twitter: Presentation Transcript

  • Search @twitter Michael Busch @michibusch michael@twitter.com buschmi@apache.org 1
  • Search @twitter Agenda ‣ Introduction - Search Architecture - Inverted Index 101 - Realtime Posting Lists 2
  • Introduction 3
  • Introduction Twitter has more than 230 million monthly active users. 4
  • Introduction 500 million tweets are sent per day. 5
  • Introduction More than 300 billion tweets have been sent since company founding in 2006. 6
  • Introduction Tweets-per-second world record: 33,388 TPS. 7
  • Introduction More than 2 billion search queries per day. 8
  • Introduction (timeline)
    2008: Twitter acquires Summize (MySQL-based RT search engine)
    2010: Modified Lucene (Earlybird) ships and replaces MySQL indexes
    2011: New Earlybird features: image/video search; index compression; efficient relevance search in time-sorted index
    2012-2014: Tweet archive search on SSD with vanilla Lucene; new RT posting list format that supports arbitrary document lengths, but keeps performance optimizations for tweets 9-13
  • Realtime Search @twitter Agenda - Introduction ‣ Search Architecture - Inverted Index 101 - Realtime Posting Lists 14
  • Search Architecture 15
  • Search Architecture [Diagram. Realtime path: RT stream -> raw tweets -> Analyzer/Partitioner -> analyzed tweets -> RT index (Earlybird). Archive path: Tweet archive (HDFS) -> raw tweets -> Mapreduce Analyzer -> analyzed tweets -> Archive index. Blender receives search requests and queries the indexes. Arrow legend: writes, searches.] 16
  • Search Architecture Analyzer/ Partitioner • Pre-processes Tweets for indexing • Analyzing (tokenization/normalization) of text • Geo-coding, URL expansion, etc. • Hash partitioning 17
  • Search Architecture RT index RT index (Earlybird) • Modified Lucene index implementation optimized for realtime search • IndexWriter buffer is searchable (no need to flush to allow searching) • In-memory • Hash-partitioned, static layout 19
  • Cluster layout [Figure: a grid of Earlybird servers] • Each index is served by several Earlybird replicas • n hash partitions (docId % n) • Each partition is further split into timeslices • One writable timeslice; the remaining timeslices are complete 20-23
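
The "docId % n" rule on the cluster layout slides could be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the talk does not show how documents are actually routed to Earlybird partitions.

```java
/** Hypothetical helper that maps a document ID to one of n hash partitions. */
final class PartitionRouter {
    private final int numPartitions;

    PartitionRouter(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Returns docId % n; floorMod keeps the result non-negative for any input. */
    int partitionFor(long docId) {
        return (int) Math.floorMod(docId, (long) numPartitions);
    }
}
```

With 4 partitions, document 10 would land in partition 2 under this rule.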
  • Search Architecture Mapreduce Analyzer • Daily jobs that process raw tweets • Analyzes text • Aggregates metadata and signals 26
  • Search Architecture Archive index • Standard Lucene (4.4) indexes • Reverse time-sorted (new to old) • Cluster layout similar to realtime search cluster 28
  • Search Architecture Archive index • Two tiers: In-memory and on SSD • In-memory index: Contains small number of best tweets of all time • SSD index: Much bigger index with more tweets, less max. QPS, limited by SSD IOPS. Only needs to be queried if in-memory index did not yield enough results 29-31
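
The two-tier lookup described for the archive index (query the SSD tier only if the in-memory tier did not yield enough results) might look roughly like this. The `Tier` interface, the class name and the de-duplication step are assumptions made for this sketch; the talk does not show the real interfaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-tier search: small in-memory tier first, SSD tier as fallback. */
final class TieredArchiveSearch {
    /** A tier returns up to maxResults hits for a query (assumed interface). */
    interface Tier {
        List<String> search(String query, int maxResults);
    }

    static List<String> search(Tier inMemory, Tier ssd, String query, int wanted) {
        List<String> hits = new ArrayList<>(inMemory.search(query, wanted));
        if (hits.size() < wanted) {
            // Only pay for SSD IOPS when the in-memory tier came up short.
            for (String hit : ssd.search(query, wanted - hits.size())) {
                if (!hits.contains(hit)) {
                    hits.add(hit);
                }
            }
        }
        return hits;
    }
}
```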
  • Search Architecture Blender • Blender is our Thrift service aggregator • Queries multiple Earlybirds, merges results 33
  • Search Architecture [Diagram: the same architecture with an update path added. Components: Tweets, Analyzer/Partitioner, RT index (Earlybird), queue, HDFS, Mapreduce Analyzer, Archive index, Blender, Search requests. Additional input: Updates, i.e. deletes and engagement (e.g. retweets/favs). Arrow legend: writes, searches.] 35
  • Realtime Search @twitter Agenda - Introduction - Search Architecture ‣ Inverted Index 101 - Realtime Posting Lists 36
  • Inverted Index 101 37
  • Inverted Index 101 Table with 6 documents:
    1 The old night keeper keeps the keep in the town
    2 In the big old house in the big old gown.
    3 The house in the town had the big old keep
    4 Where the old night keeper never did sleep.
    5 The night keeper keeps the keep in the night
    6 And keeps in the dark and sleeps in the light.
    Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006 38
  • Inverted Index 101 Dictionary and posting lists for the 6 documents:
    term    freq  postings
    and     1     <6>
    big     2     <2> <3>
    dark    1     <6>
    did     1     <4>
    gown    1     <2>
    had     1     <3>
    house   2     <2> <3>
    in      5     <1> <2> <3> <5> <6>
    keep    3     <1> <3> <5>
    keeper  3     <1> <4> <5>
    keeps   3     <1> <5> <6>
    light   1     <6>
    never   1     <4>
    night   3     <1> <4> <5>
    old     4     <1> <2> <3> <4>
    sleep   1     <4>
    sleeps  1     <6>
    the     6     <1> <2> <3> <4> <5> <6>
    town    2     <1> <3>
    where   1     <4>
    39
  • Inverted Index 101 Query: keeper. The dictionary entry for keeper (freq 3) leads to the posting list <1> <4> <5>, so documents 1, 4 and 5 match. 40-41
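
A minimal sketch of how the dictionary and posting lists for the six example documents could be built and then queried for "keeper". The class name and the tokenization rule (lowercase, split on non-letters) are assumptions for illustration; this is not Lucene's analyzer or index format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> ascending list of 1-based document numbers. */
final class InvertedIndexSketch {
    static Map<String, List<Integer>> build(String[] docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.length; i++) {
            int docId = i + 1;
            // Assumed analysis step: lowercase, then split on anything that is not a letter.
            for (String term : docs[i].toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    /** Looks up one term; unknown terms yield an empty posting list. */
    static List<Integer> query(Map<String, List<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

Here the length of each posting list corresponds to the "freq" column of the slide, that is, the number of documents containing the term.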
  • Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5, 10, 8985, 2, 90998, 90 42-43
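
The delta step can be reproduced with a few lines of code. This is a sketch with a hypothetical class name, not Lucene's implementation; it only shows that each stored value is the gap to the previous doc ID, which is why decoding has to start at the oldest posting.

```java
/** Delta encoding of an ascending doc ID list (illustrative only). */
final class DeltaCodec {
    static int[] encode(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - previous; // gap to the previous doc ID
            previous = sortedDocIds[i];
        }
        return deltas;
    }

    static int[] decode(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i]; // each value depends on all earlier ones
            docIds[i] = running;
        }
        return docIds;
    }
}
```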
  • Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5, 10, 8985, 2, 90998, 90 VInt compression: delta 5 is encoded as 00000101; delta 8985 is encoded as 11000110 00011001 • Values 0 <= delta <= 127 need one byte • Values 128 <= delta <= 16383 need two bytes • First bit indicates whether next byte belongs to the same value • Variable number of bytes: a VInt-encoded posting cannot be written as a primitive Java type; therefore it cannot be written atomically 44-47
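
A sketch of variable-length integer coding with 7 payload bits per byte and a continuation flag in the high bit. The class name is hypothetical. Note one deliberate difference from the slide: this sketch writes the low-order 7-bit group first, which is the layout of Lucene's VInt format, whereas the slide draws the high-order group first. Two bytes carry 14 payload bits, so they cover values up to 16383.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative VInt codec: 7 data bits per byte, high bit set means "more bytes follow". */
final class VIntSketch {
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
        }
        return value;
    }
}
```

Because the encoded length varies from one to five bytes, a posting cannot be published with a single primitive write, which is the atomicity problem the slide points out.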
  • Posting list encoding Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Delta encoding: 5 10 8985 2 90998 90 Read direction • Each posting depends on previous one; decoding only possible in old-to-new direction • With recency ranking (new-to-old) no early termination is possible 48
  • Posting list encoding • By default Lucene uses a combination of delta encoding and VInt compression • VInts are expensive to decode • Problem 1: How to traverse posting lists backwards? • Problem 2: How to write a posting atomically? 49
  • Realtime Search @twitter Agenda - Introduction - Search Architecture - Inverted Index 101 ‣ Realtime Posting Lists 50
  • Realtime Posting Lists 51
  • Posting list encoding in Earlybird v1 Each posting is one int (32 bits): docID, 24 bits (max. 16.7M) | textPosition, 8 bits (max. 255) • Tweet text can only have 140 chars 52
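
The v1 layout can be illustrated by packing both fields into one int. The slide gives only the field widths, so placing the docID in the upper 24 bits is an assumption of this sketch, and the class and method names are hypothetical.

```java
/** Illustrative packing of a v1 posting: 24-bit docID plus 8-bit text position. */
final class PostingV1 {
    static final int MAX_DOC_ID = (1 << 24) - 1;  // 16,777,215
    static final int MAX_POSITION = (1 << 8) - 1; // 255

    static int pack(int docId, int position) {
        if (docId < 0 || docId > MAX_DOC_ID) {
            throw new IllegalArgumentException("docId does not fit into 24 bits");
        }
        if (position < 0 || position > MAX_POSITION) {
            throw new IllegalArgumentException("position does not fit into 8 bits");
        }
        // Assumed bit order: docID in the high 24 bits, position in the low 8 bits.
        return (docId << 8) | position;
    }

    static int docId(int posting) {
        return posting >>> 8; // unsigned shift, the top bit may be set
    }

    static int position(int posting) {
        return posting & 0xFF;
    }
}
```

Since a posting is a single int, it can be stored with one array write.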
  • Posting list encoding in Earlybird v1 Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Earlybird encoding: 5, 15, 9000, 9002, 100000, 100090 (absolute doc IDs, no deltas) Read direction: new to old 53
  • Early query termination Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090 Earlybird encoding: 5, 15, 9000, 9002, 100000, 100090 Read direction: new to old. E.g. if 3 results are requested, we can terminate after reading 3 postings 54
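
Early termination over absolute doc IDs can be sketched as a backwards scan that stops once enough results are collected. The class name is hypothetical and the posting list is simplified to a plain array of doc IDs in indexing order.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative newest-first traversal with early termination. */
final class BackwardReader {
    /** Returns up to maxResults doc IDs, newest first, reading no more postings than needed. */
    static List<Integer> newest(int[] docIdsOldToNew, int maxResults) {
        List<Integer> results = new ArrayList<>();
        for (int i = docIdsOldToNew.length - 1; i >= 0 && results.size() < maxResults; i--) {
            results.add(docIdsOldToNew[i]); // absolute value, no earlier posting required
        }
        return results;
    }
}
```

This only works because every posting stores an absolute doc ID; with delta encoding the scan would have to start at the oldest posting.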
  • Inverted index components • Dictionary: parallel arrays, including a pointer to the most recently indexed posting for a term • Posting list storage: ? 55-56
  • Posting lists storage - Objectives • Store many single-linked lists of different lengths space-efficiently • The number of Java objects should be independent of the number of lists or number of items in the lists • Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding) • Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads) • Traversal in backwards order 57
  • Memory management 4 int[] pools, each built from 32K int[] blocks • Each pool can be grown individually by adding 32K blocks 58-59
  • Memory management 4 int[] pools • For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays • Small total number of Java objects (each 32K block is one object) 60
  • Memory management [Figure: four pools with slice sizes 2^1, 2^4, 2^7 and 2^11] • Slices can be allocated in each pool • Each pool has a different, but fixed slice size 61
  • Adding and appending to a list [Figure: four pools with slice sizes 2^1, 2^4, 2^7 and 2^11; slices are marked as available, allocated, or belonging to the current list] • Store first two postings in a slice of the first pool • When first slice is full, allocate another one in second pool • Allocate a slice on each level as list grows • On uppermost level one list can own multiple slices 62-66
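
The growth pattern can be sketched by counting how many slices a list of a given length would own in each pool. This is a simplification with a hypothetical class name: it assumes every slot in a slice holds a posting and ignores any slots that a real implementation would reserve for links between slices.

```java
/** Illustrative slice accounting for one posting list across the four pools. */
final class SliceGrowth {
    // Slice sizes per pool as given on the slide: 2^1, 2^4, 2^7, 2^11.
    static final int[] SLICE_SIZES = {1 << 1, 1 << 4, 1 << 7, 1 << 11};

    /** Returns the number of slices owned in each pool by a list of n postings. */
    static int[] slicesPerPool(int n) {
        int[] slices = new int[SLICE_SIZES.length];
        int remaining = n;
        int level = 0;
        while (remaining > 0) {
            slices[level]++;                 // allocate one slice on the current level
            remaining -= SLICE_SIZES[level]; // it absorbs up to SLICE_SIZES[level] postings
            if (level < SLICE_SIZES.length - 1) {
                level++;                     // next slice comes from the next pool
            }
            // On the last level the list keeps allocating slices from the same pool.
        }
        return slices;
    }
}
```

Under these assumptions a list with 3 postings owns one slice in each of the first two pools, and only lists longer than 2 + 16 + 128 + 2048 postings own more than one slice in the last pool.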
  • Posting list format v1 Each posting is one int (32 bits): docID, 24 bits (max. 16.7M) | textPosition, 8 bits (max. 255) • Tweet text can only have 140 chars 67
  • Addressing items • Use 32 bit (int) pointers to address any item in any list unambiguously: int (32 bits) = poolIndex, 2 bits (0-3) | sliceIndex, 19-29 bits (depends on pool) | offset in slice, 1-11 bits (depends on pool) • Nice symmetry: Postings and address pointers both fit into a 32 bit int 68
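
The pointer layout can be sketched as follows. The offset widths follow from the slice sizes 2^1, 2^4, 2^7 and 2^11, which leaves 29 down to 19 bits for the slice index. The order of the fields inside the int and all names are assumptions of this sketch.

```java
/** Illustrative 32-bit pointer: 2-bit pool index, slice index, offset within the slice. */
final class SlicePointer {
    // Offset width per pool, derived from slice sizes 2^1, 2^4, 2^7, 2^11.
    static final int[] OFFSET_BITS = {1, 4, 7, 11};

    static int encode(int pool, int slice, int offset) {
        if (pool < 0 || pool > 3) {
            throw new IllegalArgumentException("pool must be 0-3");
        }
        int offsetBits = OFFSET_BITS[pool];
        int sliceBits = 30 - offsetBits; // 29, 26, 23 or 19 bits
        if (slice < 0 || slice >= (1 << sliceBits)) {
            throw new IllegalArgumentException("slice index out of range for pool");
        }
        if (offset < 0 || offset >= (1 << offsetBits)) {
            throw new IllegalArgumentException("offset out of range for pool");
        }
        // Assumed field order: pool | slice | offset, from high bits to low bits.
        return (pool << 30) | (slice << offsetBits) | offset;
    }

    static int pool(int pointer) {
        return pointer >>> 30;
    }

    static int slice(int pointer) {
        return (pointer & 0x3FFFFFFF) >>> OFFSET_BITS[pool(pointer)];
    }

    static int offset(int pointer) {
        return pointer & ((1 << OFFSET_BITS[pool(pointer)]) - 1);
    }
}
```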
  • Linking the slices [Figure: the slices of the current list across the four pools (slice sizes 2^1, 2^4, 2^7, 2^11), linked to each other] Dictionary (parallel arrays): pointer to the last posting indexed for a term 69-70
  • Posting list encoding - Summary • ints can be written atomically in Java • Backwards traversal easy on absolute docIDs (not deltas) • Every posting is a possible entry point for a searcher • Skipping can be done without additional data structures as binary search, though there are better approaches (skip lists) • Repeating docIDs if a term occurs multiple times in the same document only works for small docs • Max. segment size: 2^24 = 16.7M tweets 71
  • New posting list encoding • Objectives: • 32 bit positions and variable-length payloads • Store term frequency (TF) instead of repeating docIDs • Keep: • Concurrency model • Space-efficiency for short documents • Performance 72
  • New posting list encoding Two kinds of entries: "DocID, termFreq" (fixed length for each posting) and "Position, Payload" (variable length) 73-76
  • New posting list encoding [Figure: a stream of "DocID, termFreq" entries, fixed length for each posting (32 bits), next to a stream of "Position, Payload" entries] • Store TF instead of repeating the same DocID • Store DocID/TF pairs separately from position/payloads • Find a way to synchronously decode the two streams without storing a pointer for each posting (expensive) 77-79
  • New posting list encoding • Idea: Use an embedded skip list as periodical “synchronization points” • Keeps memory overhead for pointers low and improves search performance 80
  • New posting list encoding [Figure: four pools with slice sizes 2^1, 2^4, 2^7 and 2^11; slices are marked as available, allocated, or belonging to the current list] 81
  • New posting list encoding Slice header • Header contains: • Back-pointer to previous slice (as before) • Skip list • Slice id 82
  • New posting list encoding Previous format: int (32 bits) = docID, 24 bits (max. 16.7M) | textPosition, 8 bits (max. 255) • Observation: Most tweets don’t need all 8 bits for text position • Idea: Use the position “inlining” approach for short documents, but support Lucene’s 32-bit positions and variable length payloads 83
  • New posting list encoding int (32 bits) = docID, 24 bits (max. 16.7M) | textPosition or termFreq, 7 bits (max. 127) | flag, 1 bit (0=textPosition, 1=termFreq) As a storage optimization, the text position is stored with the docID if: • termFreq == 1 (term occurs only once in the doc) AND • textPosition <= 127 AND • Posting has no payload AND • Posting is not at a skip point of the docID posting list (see later). 84
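
The inlining rule can be sketched as a packing function that chooses between the two meanings of the 7-bit field. The bit order (docID high, flag in the lowest bit) and all names are assumptions of this sketch. The slide does not say how term frequencies above 127 are handled, so the sketch simply rejects them.

```java
/** Illustrative packing of the new docID posting with an inlined position or a term frequency. */
final class PostingV2 {
    static final int MAX_DOC_ID = (1 << 24) - 1;
    static final int MAX_SMALL_VALUE = 127; // 7 bits

    /** The four conditions listed on the slide for storing the position inline. */
    static boolean canInlinePosition(int termFreq, int position, boolean hasPayload, boolean atSkipPoint) {
        return termFreq == 1
                && position >= 0 && position <= MAX_SMALL_VALUE
                && !hasPayload
                && !atSkipPoint;
    }

    static int pack(int docId, int termFreq, int position, boolean hasPayload, boolean atSkipPoint) {
        if (docId < 0 || docId > MAX_DOC_ID) {
            throw new IllegalArgumentException("docId does not fit into 24 bits");
        }
        if (canInlinePosition(termFreq, position, hasPayload, atSkipPoint)) {
            return (docId << 8) | (position << 1); // flag bit 0: field holds textPosition
        }
        if (termFreq < 1 || termFreq > MAX_SMALL_VALUE) {
            // Not covered by the slide; rejected here rather than guessed.
            throw new IllegalArgumentException("termFreq outside 1..127 is not handled in this sketch");
        }
        return (docId << 8) | (termFreq << 1) | 1; // flag bit 1: field holds termFreq
    }

    static int docId(int posting) {
        return posting >>> 8;
    }

    static int value(int posting) {
        return (posting >>> 1) & 0x7F;
    }

    static boolean holdsTermFreq(int posting) {
        return (posting & 1) == 1;
    }
}
```

When the flag says termFreq, the positions and payloads would live in the separate position/payload stream described on the earlier slides.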
  • New posting list encoding - Summary • Support for 32 bit positions and arbitrary length payloads stored in separate data structure • Performance and space consumption very similar compared to previous encoding for tweet search • Skip lists used for speed and synchronization points • For short documents positions can still be inlined 85
  • Questions? Michael Busch @michibusch michael@twitter.com buschmi@apache.org Previous talk: http://vimeo.com/31195040 86