Twitter Timeline and Search Distributed System.pptx

Distributed Systems
Twitter Timeline
and Search

Presented By
Md. Rakib Trofder
BSSE-1129
Nazmus Sakib
Ahmed
BSSE-1108
Md. Siam
BSSE-1104

Tweets
Use cases
Followers
Notifications
Emails

Timeline
Home
Timeline
Searching
High
Availability
Use cases

Constraints and
assumptions
02

State assumptions
01
100 million active users
02
500 million tweets per day
03
250 billion read
requests per month
04
10 billion searches per
month

State assumptions
10 deliveries fanout
for a tweet
5 billion tweets delivered
on fanout per day
150 billion tweets
delivered on fanout
per month
500 million
tweets per day
15 billion tweets
per month

State assumptions
Fast Optimized Heavy
Viewing the timeline Reads of tweets Reads and Writes
Timeline & Searching

Calculate usage
250 billion read requests per month * (400 requests per second /
1 billion requests per month)
= 100 thousand read requests per second
15 billion tweets per month * (400 requests per second / 1 billion
requests per month) = 6,000 tweets per second

Calculate usage
150 billion tweets delivered on fanout per month * (400 requests
per second / 1 billion requests per month)
= 60 thousand tweets delivered on fanout per second
10 billion searches per month * (400 requests per second / 1
billion requests per month)= 4,000 search requests per second

Calculate usage
500 million tweets per day * 10 KB per tweet * 30 days per month
= 150 TB of new tweet content per month
150 TB of new tweet content per month * 12 months per year * 3 years
= 5.4 PB of new tweet content in 3 years

Database Selection (Maintaining Own
Timeline)
Indexing
Super Fast Index
Lookups
Faster Search
Inserting of Tweet to the
User’s Profile and Viewing
Own Timeline
Strict Relation
Relational
Database

Database Selection (Delivering tweets and
building the home timeline )
Overload
Writing the Tweets to
Everyone’s Profile who
Follows
Fast Write
Is the Only Solution
Caching
Non Relational
Database

Database Selection (Object Storing )
Object Database
Storing the Media (images,
videos) files

Services
User Info
Service
Tweet Info
Service
Timeline
Service
User Graph
Service
Search Service
Notification
Service Fan Out Service

Use case: User posts a tweet
● The Client posts a tweet to the Web Server, running as a reverse proxy
● The Web Server forwards the request to the Write API server
● The Write API stores the tweet in the user's timeline on a SQL database

Use case: User posts a tweet (Cont)
● The Write API contacts the Fan Out Service, which does the following:
○ Queries the User Graph Service to find the user's followers stored in the Memory
Cache
○ Stores the tweet in the home timeline of the user's followers in a Memory Cache
■ O(n) operation: 1,000 followers = 1,000 lookups and inserts
○ Stores the tweet in the Search Index Service to enable fast searching
○ Stores media in the Object Store
○ Uses the Notification Service to send out push notifications to followers:
■ Uses a Queue (not pictured) to asynchronously send out notifications

Use case: User Views the Home Timeline
● The Client posts a home timeline request to the Web Server
● The Web Server forwards the request to the Read API server
● The Read API server contacts the Timeline Service, which does the following:
○ Gets the timeline data stored in the Memory Cache, containing tweet ids and user ids -
O(1)
○ Queries the Tweet Info Service with a multiget to obtain additional info about the tweet
ids - O(n)
○ Queries the User Info Service with a multiget to obtain additional info about the user
ids - O(n)

Use case: User views the user timeline
● The Client posts a user timeline request to the Web Server
● The Web Server forwards the request to the Read API server
● The Read API retrieves the user timeline from the SQL Database

Use case: User searches keywords
● The Client sends a search request to the Web Server
● The Web Server forwards the request to the Search API server
● The Search API contacts the Search Service, which does the following:
○ Parses/tokenizes the input query, determining what needs to be searched
■ Removes markup
■ Breaks up the text into terms
■ Fixes typos
■ Normalizes capitalization
■ Converts the query to use boolean operation
● Queries the Search Cluster (ie Lucene) for the results:
○ Scatter gathers each server in the cluster to determine if there are any results for the
query
○ Merges, ranks, sorts, and returns the result

Testing
Benchmark Find Bottlenecks

CDN
Load Balancer
Scale the design
DNS

Replicas Master Slave Write
Scale the design

Points of Bottlenecks
SQL Write Master-
Slave
A single SQL write master-
slave might be
overwhelmed
Fan out service
In case of users with
millions of followers

Points of Optimization
(Fan Out Service)
● Avoid fanout of users with millions of followers
● Search to find tweets for highly-followed users
● Merge the results with user’s home tweets
● Re-order and serve them

(SQL Write Master-Slave)
● Federation (Functional Partitioning)
● Sharding
● Denormalization
● SQL Tuning

(Additional)
● Keep only several hundred tweets for each home timeline in the Memory Cache
● Keep only active users' home timeline info in the Memory Cache
● If a user was not previously active in the past 30 days, we could rebuild the timeline
from the SQL Database
● Query the User Graph Service to determine who the user is following
● Get the tweets from the SQL Database and add them to the Memory Cache
● Store only a month of tweets in the Tweet Info Service
● Store only active users in the User Info Service
● The Search Cluster would likely need to keep the tweets in memory to keep latency
low

CREDITS: Diese Präsentationsvorlage wurde von
Slidesgo erstellt, inklusive Icons von Flaticon und
Infografiken & Bilder von Freepik
Thanks!
Any Questions?

Twitter Timeline and Search Distributed System.pptx

Recommended

Recommended

More Related Content

Similar to Twitter Timeline and Search Distributed System.pptx

Similar to Twitter Timeline and Search Distributed System.pptx (20)

More from Md. Rakib Trofder

More from Md. Rakib Trofder (20)

Recently uploaded

Recently uploaded (20)

Twitter Timeline and Search Distributed System.pptx