TWITTER SEARCH
By: Ramez Al-Fayez
TWITTER
User-generated content
- Messages of up to 140 characters, called Tweets
- Informal, free-form language
- Diverse topics
- Images, videos and links
- Spam
Very high volume
- Information overload
“When you've got 5 minutes to fill, Twitter is a great way to fill 35 minutes.” (@mattcutts)
TWITTER STATS
§ 2 billion queries per day
§ 230 million Tweets per day
§ < 10 s indexing latency
§ 50 ms average query response time
§ 1 billion registered users
§ 143,199 Tweets per second (peak)
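To put these figures in proportion, the daily totals can be converted into average per-second rates. The Java sketch below does that arithmetic; the class and method names are illustrative, and it assumes traffic is spread evenly over 24 hours, which real traffic is not (hence the large gap to the peak figure).

```java
/** Converts daily totals into average per-second rates (assumes uniform traffic over the day). */
final class TwitterRates {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average events per second for a given daily total. */
    static double perSecond(long perDay) {
        return (double) perDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double tweetsPerSec = perSecond(230_000_000L);    // ~2,662 Tweets/s on average
        double queriesPerSec = perSecond(2_000_000_000L); // ~23,148 queries/s on average
        double peakRatio = 143_199 / tweetsPerSec;        // peak vs. average Tweet rate, ~54x
        System.out.printf("Tweets/s: %.0f, queries/s: %.0f, peak/avg: %.1f%n",
                tweetsPerSec, queriesPerSec, peakRatio);
    }
}
```

On these numbers the average Tweet rate is roughly 2,662 per second, so the 143,199 figure is about 54 times the daily average.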
WHAT TO SEARCH ON TWITTER?
§ Tweets
§ Images (Tweets that contain images)
§ Users
§ News (Tweets that contain links)
SEARCHING FOR “IPAD” ON TWITTER
More than 50 Tweets mentioning “iPad” were posted within one minute.
CUSTOMIZED IR FOR TWITTER
Features of Twitter’s IR system:
§ Modularity
§ Scalability
§ Cost effectiveness
§ Simple interface
§ Incremental development
CUSTOMIZED IR FOR TWITTER
The system consists of four main parts:
§ A batched data aggregation and preprocessing pipeline
§ An inverted index builder
§ Earlybird shards
§ Earlybird roots
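CRAWLING TWITTER
Hosebird Client (hbc) API:

```java
import com.google.common.collect.Lists;
import com.twitter.hbc.ClientBuilder;
import com.twitter.hbc.core.Client;
import com.twitter.hbc.core.Constants;
import com.twitter.hbc.core.HttpHosts;
import com.twitter.hbc.core.endpoint.StatusesFilterEndpoint;
import com.twitter.hbc.core.processor.StringDelimitedProcessor;
import com.twitter.hbc.httpclient.auth.Authentication;
import com.twitter.hbc.httpclient.auth.OAuth1;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HosebirdCrawler {
    public static void main(String[] args) throws InterruptedException {
        // Queue that receives raw JSON messages from the stream
        BlockingQueue<String> msgQueue = new LinkedBlockingQueue<>(100000);

        // Configure the endpoint first. Optional: set up some followings and track terms
        StatusesFilterEndpoint endpoint = new StatusesFilterEndpoint();
        List<Long> followings = Lists.newArrayList(1234L, 566788L);
        List<String> terms = Lists.newArrayList("twitter", "api");
        endpoint.followings(followings);
        endpoint.trackTerms(terms);

        // Placeholder OAuth credentials; load real ones from configuration
        Authentication auth = new OAuth1("consumerKey", "consumerSecret", "token", "secret");

        // Build the client only after the endpoint is configured, then connect
        Client hosebirdClient = new ClientBuilder()
                .hosts(new HttpHosts(Constants.STREAM_HOST))
                .authentication(auth)
                .endpoint(endpoint)
                .processor(new StringDelimitedProcessor(msgQueue))
                .build();
        hosebirdClient.connect();

        // Read a few Tweets, then close the connection
        for (int i = 0; i < 10 && !hosebirdClient.isDone(); i++) {
            System.out.println(msgQueue.take());
        }
        hosebirdClient.stop();
    }
}
```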
INDEXING TWITTER
On November 18, 2014, Twitter Inc. announced that Twitter now indexes every public Tweet since 2006.
§ Temporal sharding: the Tweet corpus was first divided into multiple time tiers.
§ Hash partitioning: within each time tier, data was divided into partitions based on a hash function.
§ Earlybird: within each hash partition, data was further divided into chunks called segments. Segments were grouped together based on how many could fit on each Earlybird machine.
§ Replicas: each Earlybird machine is replicated to increase serving capacity and resilience.
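The first two levels of this layout amount to a function from a Tweet to a (time tier, hash partition) pair. The Java sketch below (Java 16+ for the record type) is illustrative only: the tier boundaries, the partition count and the use of `Long.hashCode` on the Tweet ID as the hash function are assumptions, not Twitter's actual configuration.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical two-dimensional shard assignment: time tier first, then hash partition. */
final class ShardAssigner {
    /** A shard is identified by its time tier and its hash partition within that tier. */
    record Shard(int tier, int partition) {}

    private final List<Instant> tierStarts; // ascending; tier i covers [start_i, start_{i+1})
    private final int partitionsPerTier;

    ShardAssigner(List<Instant> tierStarts, int partitionsPerTier) {
        this.tierStarts = List.copyOf(tierStarts);
        this.partitionsPerTier = partitionsPerTier;
    }

    Shard assign(long tweetId, Instant createdAt) {
        // Temporal sharding: the last tier whose start is not after the Tweet's timestamp
        // (Tweets older than every start fall into tier 0).
        int tier = 0;
        for (int i = 0; i < tierStarts.size(); i++) {
            if (!createdAt.isBefore(tierStarts.get(i))) {
                tier = i;
            }
        }
        // Hash partitioning within the tier; floorMod keeps the index non-negative.
        int partition = Math.floorMod(Long.hashCode(tweetId), partitionsPerTier);
        return new Shard(tier, partition);
    }
}
```

Splitting each shard into segments, packing segments onto Earlybird machines and replicating those machines happen after this step and are not modelled here.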
DATA AGGREGATION
§ Engagement aggregator: counts the number of engagements for each Tweet in a given day. These engagement counts are later used as an input for scoring each Tweet.
§ Aggregation: joins multiple data sources together based on Tweet ID.
§ Ingestion: performs different types of preprocessing, such as language identification, tokenization, text feature extraction and URL resolution.
§ Scorer: computes a score based on the features extracted during ingestion. For the smaller historical indices, this score determined which Tweets were selected into the index.
§ Partitioner: divides the data into smaller chunks using Twitter's hashing algorithm. The final output is stored in HDFS.
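A toy, in-memory version of three of these stages (counting engagements per Tweet per day, joining the counts with Tweet text by Tweet ID, and scoring) might look like the Java sketch below (Java 16+). The record types, the `log1p` scoring rule and the use of plain maps are hypothetical stand-ins for the batch jobs and HDFS storage the slide describes; ingestion and partitioning are left out.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the engagement aggregator, join and scorer stages. */
final class DailyAggregation {
    record Engagement(long tweetId, LocalDate day) {} // one like/retweet/reply event
    record ScoredTweet(long tweetId, String text, long engagements, double score) {}

    /** Engagement aggregator: counts engagements per Tweet for one given day. */
    static Map<Long, Long> countEngagements(List<Engagement> events, LocalDate day) {
        Map<Long, Long> counts = new HashMap<>();
        for (Engagement e : events) {
            if (e.day().equals(day)) {
                counts.merge(e.tweetId(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Aggregation + scorer: joins Tweet text with engagement counts by Tweet ID, then scores. */
    static List<ScoredTweet> joinAndScore(Map<Long, String> textById, Map<Long, Long> counts) {
        return textById.entrySet().stream()
                .map(entry -> {
                    long n = counts.getOrDefault(entry.getKey(), 0L);
                    // Assumed scoring rule: diminishing returns on raw engagement counts.
                    return new ScoredTweet(entry.getKey(), entry.getValue(), n, Math.log1p(n));
                })
                .toList();
    }
}
```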
INVERTED INDEX
§ Segment partitioner: groups multiple batches of preprocessed daily Tweet data from the same partition into bundles called “segments.”
§ Segment indexer: inverts each Tweet in a segment, builds an inverted index and stores it in HDFS.
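At its core, the segment indexer maps each term to the list of Tweet IDs (a posting list) that contain it. The Java sketch below (Java 16+) shows that inversion for one segment; the lower-casing, whitespace tokenizer and in-memory `TreeMap` are simplifying assumptions, whereas the real indexer works on tokens produced during ingestion and writes its output to HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical segment indexer: builds term -> posting list for the Tweets in one segment. */
final class SegmentIndexer {
    record Tweet(long id, String text) {}

    static Map<String, List<Long>> invert(List<Tweet> segment) {
        Map<String, List<Long>> postings = new TreeMap<>(); // terms kept in sorted order
        for (Tweet tweet : segment) {
            // Naive tokenization; distinct() adds one posting per Tweet even if a term repeats.
            String[] tokens = tweet.text().toLowerCase(Locale.ROOT).split("\\s+");
            Arrays.stream(tokens)
                    .filter(token -> !token.isEmpty())
                    .distinct()
                    .forEach(token -> postings.computeIfAbsent(token, k -> new ArrayList<>()).add(tweet.id()));
        }
        return postings;
    }
}
```

Posting lists come out in the order Tweets appear in the segment; a production index would also compress them and store positions or other per-term data.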
SEARCH PROCESS
Earlybird shards:
- The inverted index builders produced hundreds of inverted index segments. These segments were then distributed to machines called Earlybirds. Since each Earlybird machine could serve only a small portion of the full Tweet corpus, Twitter had to introduce sharding.
- A two-dimensional sharding scheme distributes index segments onto serving Earlybirds:
  - multiple time tiers
  - hash partitioning
- Each Earlybird machine is replicated to increase serving capacity and resilience.
Earlybird roots:
- The roots perform a two-level scatter-gather, as shown in the diagram below, merging search results and term statistics histograms.
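A two-level scatter-gather can be sketched as a root that fans a query out to every time tier, where each tier fans out to its hash partitions, with partial results merged on the way back. In this Java sketch (Java 16+) the `Partition` interface, the top-k merge by score and the summing of term-count histograms are assumptions made for illustration; replica selection, timeouts and early termination are omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical two-level scatter-gather over tiers of Earlybird partitions. */
final class EarlybirdRoot {
    record Hit(long tweetId, double score) {}
    record Result(List<Hit> hits, Map<String, Long> termHistogram) {}

    /** One Earlybird partition that answers a query on its slice of the index. */
    interface Partition {
        Result search(String query);
    }

    /** Merges partial results: keeps the k best hits and sums the term histograms. */
    static Result merge(List<Result> parts, int k) {
        List<Hit> hits = new ArrayList<>();
        Map<String, Long> histogram = new HashMap<>();
        for (Result r : parts) {
            hits.addAll(r.hits());
            r.termHistogram().forEach((term, n) -> histogram.merge(term, n, Long::sum));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new Result(List.copyOf(hits.subList(0, Math.min(k, hits.size()))), histogram);
    }

    /** Level 1: scatter to tiers; level 2: each tier scatters to its partitions. */
    static Result search(List<List<Partition>> tiers, String query, int k) {
        List<CompletableFuture<Result>> tierFutures = tiers.stream()
                .map(partitions -> CompletableFuture.supplyAsync(() -> {
                    List<CompletableFuture<Result>> partFutures = partitions.stream()
                            .map(p -> CompletableFuture.supplyAsync(() -> p.search(query)))
                            .toList();
                    // Gather within the tier and reduce to a per-tier top-k.
                    return merge(partFutures.stream().map(CompletableFuture::join).toList(), k);
                }))
                .toList();
        // Gather across tiers at the root.
        return merge(tierFutures.stream().map(CompletableFuture::join).toList(), k);
    }
}
```

Trimming to k at each tier is safe for the final top-k, while the histograms are summed in full so the root sees term statistics for the whole corpus.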
RANKING
§ Different types of content are searched separately.
§ Uniscores: used as a means of blending different content types into the search results.
§ Score unification: each piece of content is assigned a “raw” score, which is then converted into a uniscore.
§ Burst: used to filter out content types with low or no bursts. It is also used to boost the score of the corresponding content types, as a feature for a multi-class classifier that predicts the most likely content type for a query, and in additional components of the ranking system.
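The slides do not say how raw scores become uniscores. One plausible approach, shown here purely as an assumption and not as Twitter's published method, is to map each raw score to its percentile among the raw scores of the same content type, so that Tweet scores and news scores both land on a common 0..1 scale. The Java sketch below implements that empirical-CDF mapping; the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical score unification: raw score -> percentile within its own content type. */
final class Uniscore {
    private final double[] sortedRaw; // reference raw scores for ONE content type

    Uniscore(double[] rawScoresForType) {
        this.sortedRaw = rawScoresForType.clone();
        Arrays.sort(this.sortedRaw);
    }

    /** Fraction of this type's raw scores that are <= raw: an empirical CDF value in [0, 1]. */
    double of(double raw) {
        if (sortedRaw.length == 0) {
            return 0.0;
        }
        // Binary search for the first index whose value is greater than raw.
        int lo = 0;
        int hi = sortedRaw.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedRaw[mid] <= raw) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sortedRaw.length;
    }
}
```

With one `Uniscore` instance per content type, a Tweet with raw score 2 among Tweet scores {1, 2, 3, 4} and a news item with raw score 300 among news scores {100, ..., 500} map to 0.5 and 0.6 respectively, even though their raw scales differ by two orders of magnitude.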
RANKING
So far, the search ranker has chosen News1 followed by Tweet1, and it is presented with three candidates, Tweet2, User Group and News2, from which to pick the content after Tweet1.
News2 has the highest uniscore, but the search ranker picks Tweet2 instead, because a change in type between consecutive content is penalized by decreasing the score of News2, for instance from 0.65 to 0.55.
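The greedy choice in this example can be reproduced in a few lines of Java (Java 16+). The 0.10 penalty matches the 0.65 to 0.55 example, but treating it as a constant subtracted from any candidate whose type differs from the previously shown item is an assumption, as are the uniscores used for Tweet2 and User Group in the usage below.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical next-item selection that penalizes a change of content type. */
final class BlendRanker {
    record Candidate(String name, String type, double uniscore) {}

    static final double TYPE_CHANGE_PENALTY = 0.10; // assumed constant, from the 0.65 -> 0.55 example

    /** Uniscore after the penalty, applied when the type differs from the previous item's type. */
    static double adjusted(Candidate c, String previousType) {
        return c.type().equals(previousType) ? c.uniscore() : c.uniscore() - TYPE_CHANGE_PENALTY;
    }

    /** Picks the candidate with the highest adjusted score to show next. */
    static Candidate pickNext(List<Candidate> candidates, String previousType) {
        return candidates.stream()
                .max(Comparator.comparingDouble(c -> adjusted(c, previousType)))
                .orElseThrow();
    }
}
```

For example, with Tweet2 at 0.60, User Group at 0.50 and News2 at 0.65 after a Tweet, the adjusted scores are 0.60, 0.40 and 0.55, so Tweet2 is chosen; after a news item, News2 would keep its full 0.65 and win.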
RANKING
Normalized image and news counts are matched to one of n = 5 states: 1 average, 2 above and 2 below. The matched-state curves show a more stable quantization of the original sequence, which has the effect of removing small noisy peaks.
The query “Photo” shows three sequences of the number of Tweets over eight 15-minute buckets, from bucket 1 (2 hours ago) to bucket 8 (most recent).
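The quantization into n = 5 states can be approximated with a small dynamic program in the spirit of Kleinberg's burst model: each bucket is assigned one of five levels (two below average, average, two above), and every step between states costs a fixed penalty, so a single small spike is not worth switching for. The level spacing of one standard deviation, the absolute-error fit cost and the `switchCost` parameter are assumptions chosen for illustration, not parameters of Twitter's system.

```java
/** Hypothetical 5-state quantizer: fits a smooth state sequence to per-bucket counts. */
final class BurstQuantizer {
    static final int STATES = 5; // index 0..4 = -2, -1, 0 (average), +1, +2 standard deviations

    /** Returns one state in -2..+2 per bucket, minimizing fit error plus switching cost. */
    static int[] quantize(double[] counts, double switchCost) {
        int n = counts.length;
        if (n == 0) {
            return new int[0];
        }
        double mean = 0;
        for (double x : counts) mean += x;
        mean /= n;
        double var = 0;
        for (double x : counts) var += (x - mean) * (x - mean);
        double sd = Math.sqrt(var / n);
        if (sd == 0) sd = 1; // flat series: every bucket sits at the average state

        double[][] cost = new double[n][STATES]; // best total cost ending in state k at bucket t
        int[][] back = new int[n][STATES];       // predecessor state on that best path
        for (int k = 0; k < STATES; k++) {
            cost[0][k] = Math.abs(counts[0] - (mean + (k - 2) * sd));
        }
        for (int t = 1; t < n; t++) {
            for (int k = 0; k < STATES; k++) {
                double fit = Math.abs(counts[t] - (mean + (k - 2) * sd));
                double best = Double.POSITIVE_INFINITY;
                for (int j = 0; j < STATES; j++) {
                    double viaJ = cost[t - 1][j] + switchCost * Math.abs(k - j);
                    if (viaJ < best) {
                        best = viaJ;
                        back[t][k] = j;
                    }
                }
                cost[t][k] = best + fit;
            }
        }
        // Trace the cheapest path back from the last bucket.
        int k = 0;
        for (int s = 1; s < STATES; s++) {
            if (cost[n - 1][s] < cost[n - 1][k]) k = s;
        }
        int[] states = new int[n];
        for (int t = n - 1; t >= 0; t--) {
            states[t] = k - 2;
            k = back[t][k];
        }
        return states;
    }
}
```

With `switchCost` set to 0 each bucket simply snaps to its nearest level, so a lone spike shows up as a jump; a positive cost suppresses such isolated peaks while a sustained rise, such as counts going from 10 to 30 for several buckets, still produces a state change.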
REFERENCES
§ Anirudh Todi, “TSAR, a TimeSeries AggregatoR,” https://blog.twitter.com/2014/tsar-a-timeseries-aggregator
§ Youngin Shin, “New Twitter search results,” https://blog.twitter.com/2013/new-twitter-search-results
§ Yi Zhuang, “Building a complete Tweet index,” https://blog.twitter.com/2014/building-a-complete-tweet-index
§ J. Kleinberg, “Bursty and Hierarchical Structure in Streams,” Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002
§ B. O'Connor, M. Krieger and D. Ahn, “TweetMotif: Exploratory Search and Topic Summarization for Twitter,” Proc. ICWSM, 2010
THANK YOU!

3ipehhoa
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 

Recently uploaded (20)

Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 

Twitter Search Architecture

  • 2. TWITTER
    User-generated content:
    - Messages of up to 140 characters, called Tweets
    - Informal, free-form language
    - Diverse topics
    - Images, videos and links
    - Spam
    Very high volume, leading to information overload
    "When you've got 5 minutes to fill, Twitter is a great way to fill 35 minutes." (@mattcutts)
  • 3. TWITTER STATS
    - 2 billion queries per day
    - 230 million Tweets per day
    - Under 10 s indexing latency
    - 50 ms average query response time
    - 1 billion registered users
    - 143,199 Tweets per second (peak)
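A quick back-of-the-envelope check on these figures, dividing the daily totals by the 86,400 seconds in a day. These are averages only; real traffic is bursty, which is why the per-second peak is far higher than the daily average rate:

```latex
\frac{230 \times 10^{6}\ \text{Tweets/day}}{86\,400\ \text{s/day}} \approx 2\,662\ \text{Tweets/s},
\qquad
\frac{2 \times 10^{9}\ \text{queries/day}}{86\,400\ \text{s/day}} \approx 23\,148\ \text{queries/s},
\qquad
\frac{143\,199\ \text{Tweets/s (peak)}}{2\,662\ \text{Tweets/s (average)}} \approx 54
```

So the system handles roughly nine queries for every Tweet written, and the recorded peak write rate is about 54 times the average.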
  • 5. WHAT TO SEARCH ON TWITTER?
    - Tweets
    - Images (Tweets that contain images)
    - Users
    - News (Tweets that contain links)
  • 6. SEARCHING FOR "IPAD" ON TWITTER
    More than 50 Tweets mentioning "iPad" were posted within one minute.
  • 7. CUSTOMIZED IR FOR TWITTER
    Features of Twitter's IR system:
    - Modularity
    - Scalability
    - Cost effectiveness
    - Simple interface
    - Incremental development
  • 8. CUSTOMIZED IR FOR TWITTER
    The system consists of four main parts:
    - A batched data aggregation and preprocessing pipeline
    - An inverted index builder
    - Earlybird shards
    - Earlybird roots
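  • 9. CRAWLING TWITTER
    Hosebird API client (hbc):

      import com.google.common.collect.Lists;
      import com.twitter.hbc.core.Client;
      import com.twitter.hbc.core.endpoint.StatusesFilterEndpoint;
      import java.util.List;

      // Configure the filter endpoint before the client is built.
      StatusesFilterEndpoint endpoint = new StatusesFilterEndpoint();
      // Optional: set up some followings and track terms
      List<Long> followings = Lists.newArrayList(1234L, 566788L);
      List<String> terms = Lists.newArrayList("twitter", "api");
      endpoint.followings(followings);
      endpoint.trackTerms(terms);
      // `builder` is a ClientBuilder configured with hosts, authentication,
      // this endpoint and a message processor (not shown on the slide).
      Client hosebirdClient = builder.build();
      hosebirdClient.connect();  // opens the streaming connection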
  • 10. INDEXING TWITTER
    On November 18, 2014, Twitter Inc. announced that Twitter now indexes every public Tweet since 2006. The index is organized as follows:
    - Temporal sharding: the Tweet corpus was first divided into multiple time tiers.
    - Hash partitioning: within each time tier, data was divided into partitions based on a hash function.
    - Earlybird: within each hash partition, data was further divided into chunks called segments. Segments were grouped together based on how many could fit on each Earlybird machine.
    - Replicas: each Earlybird machine is replicated to increase serving capacity and resilience.
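The two-dimensional placement above (time tier first, then hash partition) can be sketched as a small router. This is a minimal illustration under stated assumptions: the class `ShardRouter`, the tier start dates and the partition count are hypothetical, and `Long.hashCode` merely stands in for whatever hash function Twitter actually used.

```java
import java.util.Arrays;

/**
 * Hypothetical two-dimensional shard router: time tier x hash partition.
 * Tier boundaries and partition count are illustrative assumptions,
 * not Twitter's real configuration.
 */
final class ShardRouter {
    private final long[] tierStartEpochSec; // start time of each tier, ascending
    private final int partitionsPerTier;

    ShardRouter(long[] tierStartEpochSec, int partitionsPerTier) {
        this.tierStartEpochSec = tierStartEpochSec.clone();
        Arrays.sort(this.tierStartEpochSec);
        this.partitionsPerTier = partitionsPerTier;
    }

    /** Temporal sharding: index of the newest tier whose start is <= the Tweet's timestamp. */
    int tierFor(long createdAtEpochSec) {
        for (int i = tierStartEpochSec.length - 1; i >= 0; i--) {
            if (createdAtEpochSec >= tierStartEpochSec[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("Tweet predates the oldest tier");
    }

    /** Hash partitioning within a tier; Long.hashCode is a stand-in for the real hash. */
    int partitionFor(long tweetId) {
        return Math.floorMod(Long.hashCode(tweetId), partitionsPerTier);
    }

    String shardFor(long tweetId, long createdAtEpochSec) {
        return "tier" + tierFor(createdAtEpochSec) + "/partition" + partitionFor(tweetId);
    }

    public static void main(String[] args) {
        // Tier starts 2006-01-01, 2012-01-01 and 2014-01-01 (UTC), chosen only for illustration.
        ShardRouter router = new ShardRouter(new long[] {1136073600L, 1325376000L, 1388534400L}, 4);
        System.out.println(router.shardFor(12345L, 1400000000L)); // prints tier2/partition1
    }
}
```

Routing by time first keeps recent Tweets, which most queries target, in a small and hot tier, while the hash spreads each tier's load evenly across machines.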
  • 11. DATA AGGREGATION
    - Engagement aggregator: counts the number of engagements for each Tweet in a given day. These engagement counts are later used as an input for scoring each Tweet.
    - Aggregation: joins multiple data sources on Tweet ID.
    - Ingestion: performs different types of preprocessing, such as language identification, tokenization, text feature extraction and URL resolution.
    - Scorer: computes a score from the features extracted during Ingestion. For the smaller historical indices, this score determined which Tweets were selected into the index.
    - Partitioner: divides the data into smaller chunks using Twitter's hashing algorithm. The final output is stored in HDFS.
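The engagement aggregator and the Tweet-ID join can be sketched in memory as follows. This is only an illustration: the real pipeline ran as batch jobs writing to HDFS, and the record shapes (`Engagement`, `Tweet`, `Joined`) and the `log1p` scoring formula here are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory sketch of two batch steps (engagement aggregation and the Tweet-ID join),
 * followed by a made-up scorer. Record shapes and the score formula are hypothetical.
 */
final class DailyAggregation {
    record Engagement(long tweetId, String day) {}
    record Tweet(long tweetId, String text) {}
    record Joined(long tweetId, String text, long engagements, double score) {}

    /** Engagement aggregator: number of engagements per Tweet on one day. */
    static Map<Long, Long> countEngagements(List<Engagement> events, String day) {
        Map<Long, Long> counts = new HashMap<>();
        for (Engagement e : events) {
            if (e.day().equals(day)) {
                counts.merge(e.tweetId(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Aggregation + scorer: join Tweets with their counts on Tweet ID, then score them. */
    static List<Joined> joinAndScore(List<Tweet> tweets, Map<Long, Long> counts) {
        return tweets.stream()
                .map(t -> {
                    long n = counts.getOrDefault(t.tweetId(), 0L); // no engagements -> 0
                    return new Joined(t.tweetId(), t.text(), n, Math.log1p(n)); // hypothetical score
                })
                .toList();
    }

    public static void main(String[] args) {
        List<Engagement> events = List.of(
                new Engagement(1L, "2014-11-18"), new Engagement(1L, "2014-11-18"),
                new Engagement(2L, "2014-11-18"), new Engagement(1L, "2014-11-17"));
        Map<Long, Long> counts = countEngagements(events, "2014-11-18");
        List<Tweet> tweets = List.of(new Tweet(1L, "new iPad"), new Tweet(3L, "no engagements yet"));
        joinAndScore(tweets, counts).forEach(System.out::println);
    }
}
```

The join is a left join on Tweets, so a Tweet with no engagements on that day still reaches the scorer with a count of zero.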
  • 13. INVERTED INDEX
    - Segment partitioner: groups multiple batches of preprocessed daily Tweet data from the same partition into bundles called "segments".
    - Segment indexer: inverts each Tweet in a segment, builds an inverted index and stores it in HDFS.
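What "inverting" a segment means can be shown with a toy indexer that maps each term to the sorted set of Tweet IDs containing it, plus an AND query over those postings. The class `SegmentIndexer` and its naive tokenizer are hypothetical; the real Ingestion step does language-aware tokenization and much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical segment indexer: inverts the Tweets of one segment into term -> postings. */
final class SegmentIndexer {
    record Tweet(long id, String text) {}

    /** Naive tokenizer: lower-case, split on anything that is not a letter, digit, '#' or '@'. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}#@]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds postings lists of Tweet IDs, kept sorted and free of duplicates. */
    static Map<String, TreeSet<Long>> buildIndex(List<Tweet> segment) {
        Map<String, TreeSet<Long>> index = new TreeMap<>();
        for (Tweet tweet : segment) {
            for (String term : tokenize(tweet.text())) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(tweet.id());
            }
        }
        return index;
    }

    /** AND query: intersect the postings of every query term. */
    static Set<Long> search(Map<String, TreeSet<Long>> index, String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> postings = index.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Long>> index = buildIndex(List.of(
                new Tweet(1L, "New iPad!"), new Tweet(2L, "iPad mini review")));
        System.out.println(search(index, "ipad review")); // prints [2]
    }
}
```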
  • 15. SEARCH PROCESS
    Earlybird shards:
    - The inverted index builders produced hundreds of inverted index segments, which were then distributed to machines called Earlybirds. Because each Earlybird machine could serve only a small portion of the full Tweet corpus, sharding was necessary.
    - A two-dimensional sharding scheme distributes index segments onto serving Earlybirds:
      - multiple time tiers
      - hash partitioning within each tier
    - Each Earlybird machine is replicated to increase serving capacity and resilience.
    Earlybird roots:
    - The roots perform a two-level scatter-gather (shown in the diagram on the original slide), merging search results and term statistics histograms.
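The gather side of a two-level scatter-gather can be sketched as repeated top-k merging: a lower level merges the hits from the shards of one tier, and an upper level merges the per-tier results. Term statistics histograms are merged by element-wise addition. The class `ScatterGatherRoot`, the `Hit` record and the exact split into two levels are assumptions for illustration; the actual RPC fan-out is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical gather logic of a scatter-gather root (network fan-out omitted). */
final class ScatterGatherRoot {
    record Hit(long tweetId, double score) {}

    /** Keeps the k best hits across all responses using a min-heap of size k. */
    static List<Hit> mergeTopK(List<List<Hit>> responses, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> response : responses) {
            for (Hit hit : response) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll(); // evict the current lowest score
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return merged;
    }

    /** Two levels: merge the shards of each tier, then merge the per-tier results. */
    static List<Hit> twoLevel(List<List<List<Hit>>> tiers, int k) {
        List<List<Hit>> perTier = new ArrayList<>();
        for (List<List<Hit>> shards : tiers) {
            perTier.add(mergeTopK(shards, k));
        }
        return mergeTopK(perTier, k);
    }

    /** Term statistics histograms are combined by element-wise summation. */
    static long[] mergeHistograms(List<long[]> histograms) {
        int len = 0;
        for (long[] h : histograms) {
            len = Math.max(len, h.length);
        }
        long[] total = new long[len];
        for (long[] h : histograms) {
            for (int i = 0; i < h.length; i++) {
                total[i] += h[i];
            }
        }
        return total;
    }
}
```

Each level only forwards k hits upward, so the work at the upper level depends on k and the number of tiers rather than on the corpus size.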
  • 18. RANKING
    - Different types of content are searched separately.
    - Uniscores: a common scale used to blend different content types into one search result.
    - Score unification: each piece of content is first assigned a "raw" score, which is then converted into a uniscore.
    - Burst: used to filter out content types with low or no bursts. It is also used to boost the score of the corresponding content types, as a feature for a multi-class classifier that predicts the most likely content type for a query, and in other components of the ranking system.
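The slide does not say how raw scores become uniscores, so the sketch below swaps in a simple, clearly assumed scheme: min-max normalization of raw scores within each content type, multiplied by a per-type boost (which could, for example, come from burst detection). `ScoreUnification`, the boost map and the rule that a single-item type normalizes to 1.0 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical score unification: per-type min-max normalization times a per-type boost. */
final class ScoreUnification {
    record Item(String id, String type, double rawScore) {}
    record Scored(String id, String type, double uniscore) {}

    static List<Scored> unify(List<Item> items, Map<String, Double> typeBoost) {
        // Raw-score range {min, max} for each content type.
        Map<String, double[]> range = new HashMap<>();
        for (Item it : items) {
            range.merge(it.type(), new double[] {it.rawScore(), it.rawScore()},
                    (a, b) -> new double[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
        }
        List<Scored> out = new ArrayList<>();
        for (Item it : items) {
            double[] r = range.get(it.type());
            // Assumption: a type whose items all share one raw score maps to 1.0.
            double norm = r[1] > r[0] ? (it.rawScore() - r[0]) / (r[1] - r[0]) : 1.0;
            out.add(new Scored(it.id(), it.type(), norm * typeBoost.getOrDefault(it.type(), 1.0)));
        }
        out.sort((a, b) -> Double.compare(b.uniscore(), a.uniscore())); // stable, best first
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("t1", "tweet", 10), new Item("t2", "tweet", 20), new Item("t3", "tweet", 30),
                new Item("n1", "news", 0.2), new Item("n2", "news", 0.4));
        unify(items, Map.of("news", 0.8)).forEach(System.out::println);
    }
}
```

Normalizing within each type is what makes a Tweet score and a news score comparable at all, since their raw scores may live on completely different scales (here 10 to 30 versus 0.2 to 0.4).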
  • 19. RANKING
    So far the search ranker has chosen News1 followed by Tweet1, and it is presented with three candidates (Tweet2, User Group and News2) for the position after Tweet1. News2 has the highest uniscore, but the ranker picks Tweet2 instead, because a change of type between consecutive items is penalized: News2's score drops from 0.65 to 0.55, for instance.
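The selection step in this example can be sketched as a greedy pick with a type-change penalty. Only News2's uniscore of 0.65 and the 0.10 penalty come from the slide; the class `TypeAwareRanker`, the scores for Tweet2 and User Group, and the flat subtractive penalty are assumptions.

```java
import java.util.List;

/** Hypothetical greedy step: candidates whose type differs from the previous item lose a penalty. */
final class TypeAwareRanker {
    record Candidate(String id, String type, double uniscore) {}

    /** Returns the candidate with the highest penalty-adjusted uniscore, or null if none. */
    static Candidate pickNext(List<Candidate> candidates, String previousType, double penalty) {
        Candidate best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Candidate c : candidates) {
            double adjusted = c.uniscore() - (c.type().equals(previousType) ? 0.0 : penalty);
            if (adjusted > bestScore) {
                bestScore = adjusted;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tweet2 = 0.60 and User Group = 0.50 are made-up values for illustration.
        List<Candidate> candidates = List.of(
                new Candidate("Tweet2", "tweet", 0.60),
                new Candidate("User Group", "user", 0.50),
                new Candidate("News2", "news", 0.65));
        // The previous item was Tweet1, so News2 drops to 0.55 and Tweet2 (0.60) wins.
        System.out.println(pickNext(candidates, "tweet", 0.10).id()); // prints Tweet2
    }
}
```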
  • 20. RANKING
    Normalized image and news counts are matched to one of n = 5 states: 1 average, 2 above and 2 below. The matched-state curves give a more stable quantization of the original sequence, which removes small noisy peaks.
    For the query "Photo", the figure shows three sequences of Tweet counts over eight 15-minute buckets, from bucket 1 (2 hours ago) to bucket 8 (most recent).
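One way to obtain such a stable five-state quantization is a small dynamic program in the spirit of Kleinberg's burst model: each bucket is assigned a level in {-2, -1, 0, +1, +2} (two below average, average, two above), minimizing squared error plus a cost for every level step between consecutive buckets. This is a swapped-in technique, not necessarily what Twitter used; the class `BurstQuantizer`, the level values and the `switchCost` parameter are assumptions.

```java
import java.util.Arrays;

/**
 * Hypothetical five-state quantizer: a dynamic program (Viterbi-style) chooses the state
 * sequence minimizing squared error plus switchCost per level step between buckets.
 */
final class BurstQuantizer {
    static final int[] LEVELS = {-2, -1, 0, 1, 2};

    static int[] quantize(double[] x, double switchCost) {
        int n = x.length, k = LEVELS.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] cost = new double[n][k]; // best total cost ending in state s at bucket t
        int[][] back = new int[n][k];       // predecessor state on that best path
        for (int s = 0; s < k; s++) {
            cost[0][s] = sq(x[0] - LEVELS[s]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.POSITIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double c = cost[t - 1][p] + switchCost * Math.abs(s - p);
                    if (c < best) {
                        best = c;
                        arg = p;
                    }
                }
                cost[t][s] = best + sq(x[t] - LEVELS[s]);
                back[t][s] = arg;
            }
        }
        int s = 0; // cheapest final state
        for (int j = 1; j < k; j++) {
            if (cost[n - 1][j] < cost[n - 1][s]) {
                s = j;
            }
        }
        int[] out = new int[n];
        for (int t = n - 1; t >= 0; t--) { // trace the best path backwards
            out[t] = LEVELS[s];
            s = back[t][s];
        }
        return out;
    }

    private static double sq(double v) {
        return v * v;
    }

    public static void main(String[] args) {
        // Made-up normalized counts for eight 15-minute buckets, oldest first.
        double[] normalized = {0, 0, 0.6, 0, 0, 0, 2, 2};
        System.out.println(Arrays.toString(quantize(normalized, 0.0))); // [0, 0, 1, 0, 0, 0, 2, 2]
        System.out.println(Arrays.toString(quantize(normalized, 1.0))); // [0, 0, 0, 0, 0, 0, 2, 2]
    }
}
```

With no switching cost each bucket simply snaps to its nearest level, so the small 0.6 blip becomes a spurious "above average" state. With a switching cost of 1.0 the blip is absorbed, while the sustained jump to 2 in the last two buckets survives as a burst.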
  • 21. REFERENCES
    - Anirudh Todi, "TSAR, a TimeSeries AggregatoR", https://blog.twitter.com/2014/tsar-a-timeseries-aggregator
    - Youngin Shin, "New Twitter search results", https://blog.twitter.com/2013/new-twitter-search-results
    - Yi Zhuang, "Building a complete Tweet index", https://blog.twitter.com/2014/building-a-complete-tweet-index
    - J. Kleinberg, "Bursty and Hierarchical Structure in Streams", Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
    - Brendan O'Connor, Michel Krieger and David Ahn, "TweetMotif: Exploratory search and topic summarization for Twitter", Proc. ICWSM, 2010.