SEMINAR PRESENTATION
OVERVIEW OF FACEBOOK
SCALABLE ARCHITECTURE
PRESENTED BY SHARATH BASIL KURIAN
GUIDED BY DR. SUDHEEP ELAYIDOM
CONTENTS
Introduction
How Facebook works
Facebook’s Architecture
Front End
Back End
Conclusions
2
Scalability is essential for web applications with massive numbers of users.
Hundreds of thousands of users accessing continuously lead to slow response time
The current architecture of Facebook is very large and consists of many
technologies and thousands of servers.
In this presentation Facebook’s architecture and how it handles scalability issues is
going to be described.
INTRODUCTION
3
HOW FACEBOOK WORKS
Facebook needs to handle large amounts of information every second.
Facebook continues to be a LAMP (Linux- Apache-Memcached-PHP) website, but
had to change and expand its operation to incorporate other elements and
services.
Personalized and implemented systems also exist inside Facebook, such as
Haystack.
4
FRONT-END
Facebook uses a variety of services, tools, and programming languages to make
up its core infrastructure.
Main Components are
LAMP
PHP & BigPipe
HipHop
At the front-end, the servers run a LAMP (Linux, Apache, MySQL, and PHP) stack
with Memcache.
6
LAMP 7
• Linux is the operating system used.
• It is open source, highly customizable and good for safety.
• Apache is also open source and is the most popular web serve Facebook uses
Linux with the Apache HTTP server.
PHP & BIGPIPE 8
WHY PHP ? ?
• fast iterations
• easy to use
• Simple to learn
• good web programming language with extensive community support.
• BigPipe is a dynamic web page system developed by Facebook.
• The general idea is to perform pipelining of sections through the implementation of various
stages within web browsers and servers.
9
Accessing Webpage
• Browser sends an HTTP request to web server.
• Web server parses the request, pulls data from storage tier then formulates an HTML document
and sends it to the client in an HTTP response.
• HTTP response is transferred over the Internet to browser.
• Browser parses the response from web server, constructs a tree representation of the HTML
document, and downloads CSS and JavaScript resources referenced by the document.
• After downloading resources, browser parses and executes them.
• In this scenario, while the web server is processing and creating the HTML document, the
browser is idle and when the browser is rendering the html page, the web server remains idle.
Normal Web Page Request 10
Displays
Webpage
Browser Web Server
Data
base
HTTP request
Data
HTML Document
11
PIPELINING
• The browser sends an HTTP request to web server.
• Server quickly renders a page skeleton containing the tags and a body.
The HTTP connection to the browser stays open as the page is not yet
finished.
• Browser will start rendering the page.
• The PHP server process is still executing and building the page lets. Once a
page let has been completed it's results are sent to the browser .
• Browser injects the html code for the page let received into the correct place.
If the page let needs any CSS resources those are also downloaded.
• After all page lets have been received the browser starts to load all external
files needed by those page lets asynchronously.
This results in a system where the page is rendered progressively. The initial page
content becomes visible much earlier, which dramatically improves user
perceived latency of the page.
HIP-HOP 12
• PHP compiler.
• Developed by Facebook
• The processing time for PHP language is slow
• Created to minimize server resources.
• Converts PHP scripts into optimized C++ code.
• Some key features of HipHop are:
Easy to implement extensions
Reduces CPU and Memory usage significantly
BACK-END 14
• At the heart of the application back end are the application servers.
• Application servers are responsible for answering all queries and take all the writes
into the system.
• They also interact with a number of services to achieve this
• Employ sharding and caching to process information
HAYSTACK 15
• Haystack is an object store that is designed for sharing photos on Facebook where data
is written once, read often, never modified, and rarely deleted and replaced.
• traditional file systems perform poorly under huge work- load.
• High throughput and low latency.
• Fault-tolerant
• Efficient storage of billions of photos.
• Highly scalable.
• Uses extensive caching in its main memory.
Working of Haystack 16
Structure of Haystack
Traditional methods store files in directories
into several meta files.
.
No of Disk reads required for one photo
becomes high
• Translate filename to i-node
• Read i-node
• Read file
Haystack
• Stores several meta data into a large file
• Stored in main memory
• Data Replicated on physical volumes
helps against hard disk failures etc
SCRIBE 17
• Scalable distributed logging framework
• Useful for logging a wide array of data
• Simple data model
• Built on top of Thrift
THRIFT 18
• Lightweight software framework for cross- language development
• Very quick
•
• Swaps information between various applications developed in different languages.
• Thrift protocol offers serialization between several languages.
• Eg: PHP is used for the front-end, Erlang is used for Chat
MYSQL 19
• Fast and reliable
• Thousands of MySQL servers
•
• Users randomly distributed across these servers
• Relational aspect of DB is not used
• No joins. Logically difficult(Data is distributed randomly)
•
• Primarily key-value store
MEMCACHED 20
• System uses client-server architecture.
• Key-value memory storage system for arbitrary pieces of information that result either from
database research or page rendering.
• Server maintains a key-value vector association
• Client populates this vector and performs research on it.
• The server then stores these values in memory therefore decreases reading time.
• If a server consumes the available memory, it removes older values
Memory Management using
Memcached
21
Memcached based Architecture
Protects the main
database from high
read demands from
users
HADOOP & HIVE 22
• Hadoop is ideal for scalable systems with an enormous need to store and process large
amounts of information.
• Scales horizontally.
• Instead of replacing existing storage systems, Hadoop complements them.
• This process allows existing systems to serve data in real time or provide transactional
interactive business intelligence.
Storing 23
• Apache Hadoop is being used in
three broad types of systems:
• as a warehouse for web analytics
• as storage for a distributed database
• and for MySQL database backups.
CONCLUSIONS 24
• Facebook has gained a tremendous amount of popularity in the last years and
has become the most successful social networking website ever created.
• Facebook realizes the power of social networking and is constantly innovating to
keep their service the best in the business.