2. Introduction
• 819 million monthly active users who used Facebook mobile products
as of June 30, 2013.
• 699 million daily active users on average in June 2013.
• 1.15 billion monthly active users as of June 2013.
• 2.5 billion content items shared per day (status updates + wall posts +
photos + videos + comments)
• 2.7 billion Likes per day
• 300 million photos uploaded per day
• 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS)
clusters
• 105 terabytes of data scanned via Hive, Facebook’s Hadoop query
language, every 30 minutes
• 70,000 queries executed on these databases per day
• 500+ terabytes of new data ingested into the databases every day
Given these statistics, Facebook has to use highly scalable technology to
handle this traffic and give its users a faster and safer social experience.
3. Technologies
For faster data transfer
• Cookies and Caches
• GZip compression
• AJAX and JSON
• XMPP messaging
For data storage
• HBase & Haystack
• Zookeeper
• Memcached
• Scribe
4. Cookies and Caches
Cookies are small pieces of data that are stored on your
computer, mobile phone or other device.
A cache is a type of storage used by the web browser. When a page
loads and its resources rarely change, the browser caches its CSS/JS
and reads them from the cache on later visits to reduce data transfer.
These technologies help provide and understand a range of products and
services.
Facebook uses these technologies to do things like:
• make Facebook easier or faster to use;
• enable features and store information about you (including on your
device or in your browser cache) and your use of Facebook;
• deliver, understand and improve advertising;
• monitor and understand the use of FB products and services;
• protect you, others and Facebook.
5. Gzip Compression
Gzip is a software application used for file compression and
decompression.
The server compresses the images, CSS and JS it sends, and the client
machine decompresses them after download. The data and UI are unchanged,
but the amount of data transferred is reduced. Facebook's servers use
Gzip compression to make the web faster.
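As a rough illustration of the compress-then-decompress round trip, here is a minimal sketch using Python's standard gzip module. The CSS payload is a made-up stand-in for a static resource, not real Facebook content.

```python
import gzip

# Hypothetical CSS payload standing in for a static resource.
css = (".button { color: #3b5998; padding: 4px 8px; }\n" * 200).encode("utf-8")

compressed = gzip.compress(css)         # what the server would send
restored = gzip.decompress(compressed)  # what the client reconstructs

print(len(css), "bytes before,", len(compressed), "bytes after")
```

The restored bytes are identical to the original, so only the transfer size changes.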
6. AJAX and JSON
AJAX is a group of interrelated web development techniques used on the
client side to create asynchronous web applications. JSON is the
lightweight data format commonly used to exchange data in AJAX calls.
With AJAX, web applications can send data to, and retrieve data from,
a server asynchronously (in the background) without interfering with the
display and behavior of the existing page.
Data can be retrieved using the XMLHttpRequest object.
Where AJAX and JSON are mainly used in Facebook:
• Like, Comment, Share
• Post story
• Send message
• Load feed
• Dialog Box – likes, Mutual friends etc…
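To make the JSON exchange behind an action such as "Like" concrete, here is a minimal sketch of the server side. The handler name and the fields (action, post_id, likes) are hypothetical and are not Facebook's actual API. In the browser, the request would be sent in the background with XMLHttpRequest.

```python
import json

def handle_like(request_body: str, like_counts: dict) -> str:
    """Hypothetical handler: parse a JSON request, update state, reply with JSON."""
    request = json.loads(request_body)
    post_id = request["post_id"]
    like_counts[post_id] = like_counts.get(post_id, 0) + 1
    return json.dumps({"post_id": post_id, "likes": like_counts[post_id]})

like_counts = {}  # in-memory stand-in for real storage
request_body = json.dumps({"action": "like", "post_id": "p42"})
response = json.loads(handle_like(request_body, like_counts))
print(response)
```

Because only this small JSON document travels over the network, the page can update the like count without reloading.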
7. XMPP Messaging
XMPP stands for Extensible Messaging and Presence Protocol.
XMPP is also called the Jabber protocol.
Facebook chat and messages work on this platform.
Every Facebook user has a unique ID and a personal chat address like
100000874067290@chat.facebook.com. When someone wants to send a
message to that user, a core script converts it to XML and sends it to
the Jabber server.
The recipient then gets the message almost instantly, thanks to highly
reliable servers.
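The conversion of a chat message to XML can be sketched as follows. This is a minimal example that builds a standard XMPP message stanza with Python's xml.etree.ElementTree. The sender address and the build_message helper are hypothetical, and Facebook's real script is not shown in the source.

```python
import xml.etree.ElementTree as ET

def build_message(sender: str, recipient: str, text: str) -> str:
    """Build an XMPP <message/> stanza as an XML string."""
    message = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    body = ET.SubElement(message, "body")
    body.text = text  # ElementTree escapes special characters on output
    return ET.tostring(message, encoding="unicode")

stanza = build_message(
    "100000000000001@chat.facebook.com",  # hypothetical sender
    "100000874067290@chat.facebook.com",
    "Hi <there>",
)
parsed = ET.fromstring(stanza)  # what the receiving side would parse
print(stanza)
```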
9. HBase and Haystack
Horizontal scalability
• HBase & HDFS are elastic by design
• Multiple table shards (regions) per physical server
• On node additions
• Load balancer automatically reassigns shards from overloaded
nodes to new nodes
• Because file system underneath is itself distributed, data for
reassigned regions is instantly servable from the new nodes.
• Regions can be dynamically split into smaller regions.
• Pre-sharding is not necessary
• Splits are near instantaneous!
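The two ideas on this slide, splitting a region and moving regions onto a newly added node, can be sketched in a few lines. This is a simplified model with integer key ranges and a greedy balancer, both assumptions made for illustration. It is not HBase's actual balancer.

```python
def split_region(region):
    """Split a (start, end) key range at its midpoint into two smaller regions."""
    start, end = region
    mid = (start + end) // 2
    return [(start, mid), (mid, end)]

def rebalance(assignment, new_node):
    """Move regions from the busiest nodes to new_node until loads are roughly even."""
    assignment = {node: list(regions) for node, regions in assignment.items()}
    assignment[new_node] = []
    while True:
        busiest = max(assignment, key=lambda n: len(assignment[n]))
        if len(assignment[busiest]) - len(assignment[new_node]) <= 1:
            return assignment
        assignment[new_node].append(assignment[busiest].pop())

assignment = {
    "node1": [(0, 100), (100, 200), (200, 300), (300, 400)],
    "node2": [(400, 500), (500, 600)],
}
balanced = rebalance(assignment, "node3")
halves = split_region((0, 100))
print(balanced, halves)
```

In this model only the region-to-node mapping changes. No data is copied, which mirrors the slide's point that the distributed file system makes reassigned regions servable from the new node.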
10. HBase and Haystack
Automatic failover
• Node failures automatically detected by HBase Master
• Regions on failed node are distributed evenly among surviving
nodes.
• Multiple regions/server model avoids need for substantial
overprovisioning
• HBase Master failover
• 1 active, rest standby
• When active master fails, a standby automatically takes over
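The even redistribution of a failed node's regions can be illustrated with a small sketch. The node and region names are invented, and the least-loaded-first rule is an assumed policy for illustration, not the HBase Master's actual algorithm.

```python
def fail_over(assignment, failed_node):
    """Reassign the failed node's regions, one at a time, to the least loaded survivor."""
    survivors = {n: list(r) for n, r in assignment.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over regions")
    for region in assignment[failed_node]:
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(region)
    return survivors

assignment = {
    "a": ["r1", "r2"],
    "b": ["r3", "r4"],
    "c": ["r5", "r6", "r7", "r8"],
}
after_failure = fail_over(assignment, "c")
print(after_failure)
```

Because each node holds several small regions, the failed node's load is spread across all survivors instead of landing on a single spare machine.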
11. HBase and Haystack
HDFS (Hadoop Distributed File System)
• Fault tolerance (block level replication for redundancy)
• Scalability
• End-to-end checksums to detect and recover from corruptions
• MapReduce for large scale data processing
• HDFS already battle tested inside Facebook
• running petabyte scale clusters
• lot of in-house development and operational experience
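How replication and checksums work together to detect and recover from corruption can be sketched as follows. This is a simplified in-memory model. The helper names, the use of CRC32 and the replica count of 3 are illustrative assumptions, not a description of HDFS internals.

```python
import zlib

def write_block(data: bytes, replicas: int = 3):
    """Store several copies of a block, each with its checksum."""
    checksum = zlib.crc32(data)
    return [{"data": data, "checksum": checksum} for _ in range(replicas)]

def read_block(copies):
    """Return the first copy whose data still matches its checksum."""
    for copy in copies:
        if zlib.crc32(copy["data"]) == copy["checksum"]:
            return copy["data"]
    raise IOError("all replicas are corrupt")

copies = write_block(b"user photo metadata")
copies[0]["data"] = b"user photo metadatb"  # simulate corruption of one replica
recovered = read_block(copies)
print(recovered)
```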
12. Zookeeper
Zookeeper is open-source software that FB uses mainly for two purposes:
• As the controller for implementing sharding and failover of
application servers
• As a store for its discovery service.
Since Zookeeper provides FB with a highly available repository and
notification mechanism, it goes a long way towards helping FB build a
highly available service.
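The "repository and notification mechanism" idea can be illustrated with a tiny in-memory stand-in. The Registry class, the path and the host names below are hypothetical. This is not Zookeeper's API, and real Zookeeper semantics (replication, sessions, watch behavior) differ.

```python
class Registry:
    """In-memory stand-in for a store that notifies watchers on change."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def watch(self, path, callback):
        """Register a callback to run whenever the value at path is set."""
        self._watchers.setdefault(path, []).append(callback)

    def set(self, path, value):
        self._data[path] = value
        for callback in self._watchers.get(path, []):
            callback(path, value)

    def get(self, path):
        return self._data.get(path)

events = []
registry = Registry()
registry.watch("/services/chat", lambda path, value: events.append((path, value)))
registry.set("/services/chat", ["host1:9000", "host2:9000"])
print(events)
```

A discovery client that watches a path learns about new or removed servers as soon as the entry changes, without polling.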
13. Memcached
If you've read anything about scaling large websites, you've probably
heard about memcached.
Memcached is a high-performance, distributed memory object
caching system.
Facebook is the world's largest user of memcached. It uses
memcached to alleviate database load.
Memcached is already fast, but Facebook needs it to be faster and more
efficient than most installations. FB uses more than 800 servers supplying
over 28 terabytes of memory to its users. Over the past year, as
Facebook's popularity has skyrocketed, it has run into a number of
scaling issues. This ever-increasing demand has required modifications
to both the operating system and memcached to achieve the performance
that provides the best possible experience for users.
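One common way a cache alleviates database load is the look-aside pattern: read from the cache first and fall back to the database only on a miss. The sketch below assumes that pattern, since the source does not describe Facebook's exact usage. Plain dictionaries stand in for both memcached and the database.

```python
class CacheAside:
    """Look-aside cache in front of a slower backing store."""

    def __init__(self, database):
        self.database = database
        self.cache = {}    # stands in for a memcached cluster
        self.db_reads = 0  # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database[key]
        self.cache[key] = value
        return value

    def update(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)  # invalidate so the next read is fresh

store = CacheAside({"user:1": "Alice"})
first = store.get("user:1")   # miss: reads the database
second = store.get("user:1")  # hit: served from the cache
print(first, second, store.db_reads)
```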
14. Scribe – Log server
Scribe is a server for aggregating log data streamed in real-time from a
large number of servers. It is designed to be scalable, extensible without
client-side modification, and robust to failure of the network or any
specific machine.
Scribe was developed at Facebook using Apache Thrift and released in
2008 as open source.
Scribe servers are arranged in a directed graph, with each server
knowing only about the next server in the graph. This network
topology allows for adding extra layers of fan-in as a system grows, and
batching messages before sending them between datacenters, without
having any code that explicitly needs to understand datacenter
topology, only a simple configuration.
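The forwarding topology can be sketched as a chain of servers that each know only their next hop and batch messages before sending them on. The LogServer class, the server names and the batch sizes are hypothetical. This is a toy model, not Scribe's implementation or configuration format.

```python
class LogServer:
    """A log server that knows only the next server in the graph."""

    def __init__(self, name, next_server=None, batch_size=3):
        self.name = name
        self.next_server = next_server
        self.batch_size = batch_size
        self.buffer = []  # entries waiting to be forwarded
        self.stored = []  # entries kept by a final (central) server

    def log(self, category, message):
        self.receive([(category, message)])

    def receive(self, entries):
        if self.next_server is None:
            self.stored.extend(entries)
            return
        self.buffer.extend(entries)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer and self.next_server is not None:
            batch, self.buffer = self.buffer, []
            self.next_server.receive(batch)

central = LogServer("central")
aggregator = LogServer("aggregator", next_server=central, batch_size=4)
web1 = LogServer("web1", next_server=aggregator, batch_size=2)
web2 = LogServer("web2", next_server=aggregator, batch_size=2)

for i in range(2):
    web1.log("clicks", f"web1 event {i}")
    web2.log("clicks", f"web2 event {i}")
print(central.stored)
```

Adding another fan-in layer only requires pointing servers at a different next hop. No server needs to know the overall topology.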