Transcript of "How facebook works and function- a complete approach"
(CEO and Co-founder, Team Zenith)
Why Facebook Is Giant
Facebook is synonymous with “social networking”.
People have been “facebooking” each other for about 7 years now,
making Facebook the most used social network, with over 500 million users.
50% of active users log on to Facebook on any given day
Average user has 130 friends
People spend over 700 billion minutes per month on Facebook
There are over 900 million objects that people interact with (pages,
groups, events and community pages)
Average user is connected to 80 community pages, groups and events
Average user creates 90 pieces of content each month
More than 30 billion pieces of content (web links, news stories, blog
posts, notes, photo albums, etc.) shared each month.
Here are a few factoids to give you an idea of the scaling challenge that
Facebook has to deal with:
Facebook serves 570 billion page views per month (according to Google Ad
There are more photos on Facebook than all other photo sites combined
(including sites like Flickr).
More than 3 billion photos are uploaded every month.
Facebook’s systems serve 1.2 million photos per second. This doesn’t
include the images served by Facebook’s CDN.
More than 25 billion pieces of content (status updates, comments, etc) are
shared every month.
Facebook has more than 30,000 servers (and this number is from last year!)
Scaling Challenge Of Facebook
Software That Helps Facebook Scale
In some ways Facebook is still a LAMP site (kind of), but it has had to change and extend its
operation to incorporate a lot of other elements and services, and modify the approach to
Facebook still uses PHP, but it has built a compiler for it so it can be turned into native
code on its web servers, thus boosting performance.
Facebook uses Linux, but has optimized it for its own purposes (especially in terms of
Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and
logic onto the web servers since optimizations are easier to perform there (on the
“other side” of the Memcached layer).
Then there are the custom-written systems, like Haystack, a highly scalable object store
used to serve Facebook’s immense number of photos, or Scribe, a logging system that
can operate at the scale of Facebook (which is far from trivial).
But enough of that. Let’s present (some of) the software that Facebook uses to provide
us all with the world’s largest social network site.
For back-end
FBML (developed at Facebook)
XHP (developed at Facebook)
Technology Used By Facebook
For front-end
PHP is a server-side scripting language designed for web development but also used as a general-purpose
programming language. It stands for PHP: Hypertext Preprocessor.
C++ is a programming language that is general purpose, statically typed, free-form,
multi-paradigm and compiled.
Java is a computer programming language that is concurrent, class-based, object-oriented, and specifically
designed to have as few implementation dependencies as possible. It is intended to let application developers
"write once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be
recompiled to run on another.
Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well
as for use as a scripting or glue language to connect existing components together.
FBML is a software environment provided by the social
networking service Facebook for third-party developers
to create their own applications and services that access
data in Facebook
Erlang is a general-purpose concurrent, garbage-collected
programming language and runtime system. It was designed
by Ericsson to support distributed, fault-tolerant, soft-real-time,
non-stop applications. It supports hot swapping, so that code
can be changed without stopping a system.
XHP is an augmentation of PHP developed at Facebook to allow XML syntax for the
purpose of creating custom and reusable HTML elements.
Ajax is a group of interrelated web development techniques used on the client-side
to create asynchronous web applications. With Ajax, web applications
can send data to, and retrieve data from, a server asynchronously (in
the background) without interfering with the display and behavior of
the existing page. Data can be retrieved using the XMLHttpRequest
object. Despite the name, the use of XML is not required (JSON is
often used instead), and the requests do not need to be asynchronous.
JavaScript is most commonly used as
part of web browsers, whose implementations allow client-side scripts to
interact with the user, control the browser, communicate
asynchronously, and alter the document content that is displayed. It
has also become common in server-side programming, game
development and the creation of desktop applications.
jQuery is a JavaScript library
designed to simplify the client-side scripting of HTML. It was released
in January 2006 at BarCamp NYC by John Resig. It is currently
developed by a team of developers led by Dave Methvin. Used by
over 65% of the 10,000 most visited websites, jQuery is the most
popular JavaScript library in use.
JSON is an open standard format
that uses human-readable text to transmit data objects
consisting of attribute–value pairs. It is used primarily to transmit
data between a server and web application, as an alternative to XML.
Extensible Markup Language (XML) is a markup language that defines
a set of rules for encoding documents in a format that is both human-
readable and machine-readable. It is defined in the XML 1.0
Specification produced by the W3C, and several other related
specifications, all free open standards.
MySQL is (as of July 2013) the world's second most widely used open-source
relational database management system (RDBMS). It is named after co-founder
Michael Widenius's daughter, My. The SQL acronym stands for Structured Query
Language.
Memcached is by now one of the most famous pieces of software on the internet. It’s a
distributed memory caching system which Facebook (and a ton of other sites) use as a
caching layer between the web servers and MySQL servers (since database access is
relatively slow). Through the years, Facebook has made a ton of optimizations to
Memcached and the surrounding software (like optimizing the network stack). Facebook runs
thousands of Memcached servers with tens of terabytes of cached data at any one point in time.
It is likely the world’s largest Memcached installation.
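The caching layer described above follows what is commonly called the cache-aside (look-aside) pattern: read the cache first and fall back to the database only on a miss. The sketch below is a minimal illustration under stated assumptions, not Facebook's implementation. An in-process dictionary stands in for a Memcached cluster, and the names CacheAside, slow_lookup and FAKE_DB are hypothetical.

```python
import time


class CacheAside:
    """Look-aside cache in front of a slow backing store (hypothetical names)."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load_from_db = load_from_db   # slow path, e.g. a MySQL lookup
        self._ttl = ttl_seconds
        self._entries = {}                  # key -> (expiry_time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1                  # served from memory, no database call
            return entry[1]
        self.misses += 1
        value = self._load_from_db(key)     # cache miss: go to the database
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after a write so the next read reloads fresh data.
        self._entries.pop(key, None)


# Stand-in for the database; records every call so the saving is visible.
FAKE_DB = {"user:1": "Alice", "user:2": "Bob"}
db_calls = []


def slow_lookup(key):
    db_calls.append(key)
    return FAKE_DB.get(key)


cache = CacheAside(slow_lookup)
first = cache.get("user:1")    # miss: reaches the database
second = cache.get("user:1")   # hit: answered from the cache
```

Only the first read reaches the stand-in database; the second is answered from memory, which is the saving the caching layer exists to provide.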
Haystack is Facebook’s high-performance photo storage/retrieval system (strictly speaking,
Haystack is an object store, so it doesn’t necessarily have to store photos). It has a ton of work to
do. There are more than 20 billion uploaded photos on Facebook, and each one is saved in four
different resolutions, resulting in more than 80 billion photos.
And it’s not just about being able to handle billions of photos; performance is critical. As we
mentioned previously, Facebook serves around 1.2 million photos per second, a number which
doesn’t include images served by Facebook’s CDN. That’s a staggering number.
Cassandra is a distributed storage system with no single point of failure. It’s one
of the poster children for the NoSQL movement and has been made open source
(it’s even become an Apache project). Facebook uses it for its Inbox search.
Other than Facebook, a number of other services use it, for example Digg.
Scribe is a flexible logging system that Facebook uses for a multitude of purposes
internally. It’s been built to be able to handle logging at the scale of Facebook,
and automatically handles new logging categories as they show up (Facebook has
Presto is an open source distributed SQL query engine for running interactive
analytic queries against data sources of all sizes ranging from gigabytes to
petabytes.
Presto was designed and written from the ground up for interactive analytics and
approaches the speed of commercial data warehouses while scaling to the size of
organizations like Facebook.
Apache Hive is a data warehouse
infrastructure built on top of Hadoop
for providing data summarization,
query, and analysis.
Hadoop Distributed File System (HDFS)
To understand how it’s possible to scale a Hadoop®
cluster to hundreds (and even thousands) of nodes,
you have to start with the Hadoop Distributed File
System (HDFS). Data in a Hadoop cluster is broken
down into smaller pieces (called blocks) and
distributed throughout the cluster. In this way, the
map and reduce functions can be executed on
smaller subsets of your larger data sets, and this
provides the scalability that is needed for big data.
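The idea of breaking data into blocks and running map and reduce functions on each piece can be sketched in a few lines. This is a single-process illustration only, assuming a toy data set and hypothetical function names; it does not use Hadoop or HDFS, where the blocks would live on different machines.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Break the data set into smaller pieces, as HDFS does with file blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words within one block, independently of the others."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


records = ["like share like", "share comment", "like", "comment comment"]
blocks = split_into_blocks(records, block_size=2)
partials = [map_block(block) for block in blocks]   # could run on separate nodes
totals = reduce(reduce_counts, partials, Counter())
```

Because each map_block call touches only its own block, the map step can be spread over as many nodes as there are blocks, which is where the scalability comes from.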
BigPipe is a dynamic web page serving system that Facebook has developed.
Facebook uses it to serve each web page in sections (called “pagelets”) for
For example, the chat window is retrieved separately, the news feed is retrieved
separately, and so on. These pagelets can be retrieved in parallel, which is where
the performance gain comes in, and it also gives users a site that works even if
some part of it is deactivated or broken.
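A rough sketch of the pagelet idea: each section of the page is produced by its own function, the functions run in parallel, and a failure in one leaves the others intact. This is an illustration under assumptions, not BigPipe itself. The pagelet names and render functions are hypothetical, and a thread pool stands in for the real serving pipeline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_news_feed():
    time.sleep(0.05)                       # simulate a slower backend
    return "<div id='feed'>news feed</div>"


def render_chat():
    time.sleep(0.01)
    return "<div id='chat'>chat</div>"


def render_broken():
    raise RuntimeError("pagelet backend unavailable")


def serve_page(pagelets):
    """Render all pagelets in parallel; a failed pagelet becomes an empty slot."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(pagelets)) as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = ""         # the rest of the page still works
    return results


page = serve_page({
    "feed": render_news_feed,
    "chat": render_chat,
    "ads": render_broken,
})
```

The total time is roughly that of the slowest pagelet rather than the sum of all of them, and the broken pagelet does not prevent the feed and chat sections from being served.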
Hadoop and Hive
Hadoop is an open source map-reduce implementation that makes it
possible to perform calculations on massive amounts of data. Facebook uses this
for data analysis (and as we all know, Facebook has massive amounts of data). Hive
originated from within Facebook, and makes it possible to use SQL queries against
Hadoop, making it easier for non-programmers to use.
Both Hadoop and Hive are open source (Apache projects) and are used by a
number of big services, for example Yahoo and Twitter.
Facebook uses several different languages for its different services. PHP is used for the
front-end, Erlang is used for Chat, Java and C++ are also used in several places (and perhaps
other languages as well). Thrift is an internally developed cross-language framework that
ties all of these different languages together, making it possible for them to talk to each
other. This has made it much easier for Facebook to keep up its cross-language
Facebook has made Thrift open source and support for even more languages has been
Varnish is an HTTP accelerator which can act as a load balancer and
also cache content which can then be served lightning-fast.
Facebook uses Varnish to serve photos and profile pictures,
billions of requests every day. Like almost everything Facebook uses, Varnish is
Epoll Server using Erlang
Accessed using thrift
Inverted index stored in HBase
epoll - I/O event notification facility
The Epoll event mechanism is designed to scale to larger numbers of
connections than select and poll.
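To make the event-notification idea concrete, here is a small sketch using Python's standard selectors module, whose DefaultSelector picks the most efficient mechanism on the platform (epoll on Linux). It is an illustration only and has nothing to do with the Erlang server mentioned above. A local socket pair stands in for a client connection.

```python
import selectors
import socket

# DefaultSelector chooses epoll on Linux, kqueue on BSD/macOS, and so on.
selector = selectors.DefaultSelector()

# A connected pair of sockets stands in for a client connection.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

# Ask to be notified when `right` has data to read.
selector.register(right, selectors.EVENT_READ, data="right-end")

# Nothing has been written yet, so a non-blocking poll reports no events.
idle = selector.select(timeout=0)

# Once data arrives, the selector reports only the sockets that are ready.
left.send(b"ping")
ready = selector.select(timeout=1)
received = [(key.data, key.fileobj.recv(16)) for key, _events in ready]

selector.unregister(right)
selector.close()
left.close()
right.close()
```

The program never loops over idle connections; it is told which ones are ready, which is why this style of mechanism copes with very large numbers of mostly idle connections.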
HBase is an open source, non-relational, distributed database
modeled after Google's BigTable and is written in Java.
The Graph API
The Graph API presents a simple, consistent view of the Facebook
social graph, uniformly representing objects in the graph
(e.g., people, photos, events, and pages) and the connections
between them (e.g., friend relationships, shared content, and photo
RESTful API for accessing data on the Facebook graph.
OAuth 2.0-based authentication.
JSON modeling of objects and connections.
Every object in the social graph has a unique ID. You can access the
properties of an object by requesting -
Alternatively, people and pages with usernames can be accessed
using their username as an ID. All responses are JSON objects.
Specifications - http://developers.facebook.com/docs/api
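The ID-based addressing and JSON responses described above can be sketched as follows. The request URL itself is missing from this transcript, so the base URL below is an assumption, and the response body is a made-up example rather than real Graph API output. No network request is made and no authentication is shown.

```python
import json
from urllib.parse import quote

# Assumed base URL; the transcript does not show the actual request format.
GRAPH_BASE = "https://graph.facebook.com"


def object_url(object_id):
    """Build the address of one object from its ID or username."""
    return f"{GRAPH_BASE}/{quote(str(object_id), safe='')}"


# Hypothetical response body, for illustration only.
sample_response = '{"id": "12345", "name": "Example Page", "username": "examplepage"}'

obj = json.loads(sample_response)           # all responses are JSON objects
url_by_id = object_url(obj["id"])
url_by_username = object_url(obj["username"])
```

Both addresses refer to the same hypothetical object, once by numeric ID and once by username.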
Facebook Markup Language
FBML is a variant-evolved subset of HTML with some elements
It allows Facebook Application developers to customize the "look
and feel" of their applications, to a limited extent.
It is the specification of how to encode content so that
Facebook's servers can read and publish it.
FBML plays an important role in building applications. FBML is used
to tap in to various Facebook elements when building applications.
It operates a lot like HTML and it gives the ability to do various tasks
with ease such as:
sending a user e-mail
creating a two-column form
embedding Flash video
creating a dashboard
posting on a wall
displaying a header…etc
Facebook’s New Messages
• The new Messages interweaves your chats, texts
and emails. It’s a central place to control all of
your private communication, both on and off
• Simply put, it can be a single inbox for all of your
messages, no matter how you choose to send
• A facebook.com Email Address
• SMS From Facebook
• Chat History
Open Source Software For mobile
xctool is a replacement for Apple's xcodebuild that makes it easier to build
and test iOS and Mac products. It's especially helpful for continuous
Rebound is a Java library that models spring dynamics. Rebound spring
models can be used to create animations that feel natural by introducing
real world physics to your application.
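What "modeling spring dynamics" means can be shown with a short numeric sketch: a value is pulled toward its target by a spring force and slowed by friction, and the resulting positions are used as animation frames. This is a generic damped-spring integration written in Python, not Rebound's Java API. The tension and friction values are arbitrary illustrative choices.

```python
def simulate_spring(start, end, tension=170.0, friction=26.0, dt=1 / 60, steps=120):
    """Return one position per frame for a value animating from start to end."""
    position, velocity = start, 0.0
    frames = []
    for _ in range(steps):
        # Spring force pulls toward the end value; friction opposes velocity.
        acceleration = tension * (end - position) - friction * velocity
        velocity += acceleration * dt       # semi-implicit Euler step
        position += velocity * dt
        frames.append(position)
    return frames


frames = simulate_spring(0.0, 1.0)          # two seconds at 60 frames per second
```

The value accelerates away from the start, slows as it nears the target and settles there, which is what makes the motion feel physical rather than linear.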
Buck is a build system for Android that encourages the creation of small, reusable
modules consisting of code and resources. Because Android applications are
predominantly written in Java, Buck also functions as a Java build system.
Powers the Ringmark testing framework at rng.io, as donated to the W3C
Coremob Community Group.
Facebook SDK for iOS
Use the Facebook SDK for iOS to integrate with Facebook, help build
engaging social apps, and get more installs.
Use the Facebook SDK for Android to integrate with Facebook, help build
engaging social apps, and get more installs
fishhook is a very simple library that enables dynamically rebinding
symbols in Mach-O binaries running on iOS in the simulator and on
Open Source Software For Web
React uses a declarative paradigm that makes it easier to reason about your
application. It's efficient: React computes the minimal set of changes necessary to keep your
DOM up-to-date. And it's flexible: React works with the libraries and frameworks that you
HipHop VM (HHVM) is an open-source virtual machine designed for executing
programs written in PHP. HHVM uses a just-in-time compilation approach to achieve
superior performance while maintaining the flexibility that PHP developers are accustomed to.
HipHop VM (and before it HPHPc) has realized more than a 5x increase in throughput for
Facebook compared with Zend PHP 5.2.
Huxley is a test-like system for catching visual regressions in Web applications.
It watches you browse, takes screenshots, and tells you when they change
Regenerator is a source transformer enabling ECMAScript 6 generator functions
cleaner alternative to using callbacks when writing asynchronous server-side
Use the Facebook SDK for PHP to integrate with Facebook, help build engaging social
apps, and get more users.
Some other tools are
Tornado is a Python web framework and asynchronous networking library,
originally developed at FriendFeed. By using non-blocking network I/O, Tornado
can scale to tens of thousands of open connections, making it ideal for long polling,
WebSockets, and other applications that require a long-lived connection to each user.
Open Source Software For Data
Presto is an open source distributed SQL query engine for running interactive
analytic queries against data sources of all sizes ranging from gigabytes to
petabytes.
Facebook's branch of the Oracle MySQL v5.6 database
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large
number of nodes and be robust to network and node failures.
There is a scribe server running on every node in the system, configured to aggregate
messages and send them to a central scribe server (or servers) in larger groups. If the
central scribe server isn’t available the local scribe server writes the messages to a file on
local disk and sends them when the central server recovers. The central scribe server(s) can
write the messages to the files that are their final destination, typically on an NFS filer or a
distributed filesystem, or send them to another layer of scribe servers.
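The store-and-forward behaviour described above can be sketched as follows. This is a simplified illustration, not Scribe's code. The class names are hypothetical, and an in-memory list stands in for the file on local disk.

```python
class CentralServer:
    """Stand-in for the central scribe server."""

    def __init__(self):
        self.available = True
        self.received = []

    def send(self, messages):
        if not self.available:
            raise ConnectionError("central server unavailable")
        self.received.extend(messages)


class LocalRelay:
    """Stand-in for the scribe server running on each node."""

    def __init__(self, central):
        self._central = central
        self.spool = []          # plays the role of the file on local disk

    def log(self, category, message):
        # Always try to deliver anything spooled earlier, in original order.
        pending = self.spool + [(category, message)]
        try:
            self._central.send(pending)
            self.spool = []
        except ConnectionError:
            self.spool = pending  # keep everything until the server recovers


central = CentralServer()
relay = LocalRelay(central)

relay.log("web", "request 1")
central.available = False         # central server goes down
relay.log("web", "request 2")
relay.log("ads", "click 1")
spooled_during_outage = len(relay.spool)
central.available = True          # central server recovers
relay.log("web", "request 3")
```

Messages logged during the outage are held locally and delivered, in order, with the first message sent after recovery.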
Open Source Software For Infra
RocksDB is an embeddable persistent key-value store for fast storage. RocksDB
can also be the foundation for a client-server database but our current focus is
on embedded workloads.
The Open Compute Project Foundation is a rapidly growing community of
engineers around the world whose mission is to design and enable the delivery
of the most efficient server, storage and data center hardware designs for
pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code
navigations, or style-preserving source-to-source transformations such as refactorings on source
Swift is an easy-to-use, annotation-based Java library for creating Thrift serializable
types and services.
Folly is an open-source C++ library developed and used at Facebook. It is a
library of C++11 components designed with practicality and efficiency in
mind. It complements (as opposed to competing against) offerings such as
Boost and of course std. In fact, we embark on defining our own
component only when something we need is either not available, or does
not meet the needed performance profile.
FlashCache is a general purpose writeback block cache for Linux.
Some other relevant tools:
Gradual releases and dark launches
Facebook has a system called Gatekeeper that lets it run
different code for different sets of users.
This lets Facebook do gradual releases of new
features, activate certain features only for Facebook
Gatekeeper also lets Facebook do something called
“dark launches”, which is to activate elements of a
certain feature behind the scenes before it goes live.
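One common way to run different code for different sets of users is to hash each user into a stable bucket and enable a feature for a growing share of buckets. The sketch below illustrates that general technique. It is an assumption about how such a gate could work, not a description of Gatekeeper's internals, and all names in it are hypothetical.

```python
import hashlib


def bucket(feature, user_id, buckets=100):
    """Map a user to a stable bucket in [0, buckets) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(feature, user_id, rollout_percent, allowlist=frozenset()):
    """Enable for allowlisted users, plus a fixed percentage of everyone else."""
    if user_id in allowlist:
        return True
    return bucket(feature, user_id) < rollout_percent


users = range(10_000)
enabled_10 = {u for u in users if is_enabled("new_inbox", u, 10)}
enabled_50 = {u for u in users if is_enabled("new_inbox", u, 50)}
```

Because the bucket depends only on the feature name and user ID, a user who sees the feature at 10% keeps it at 50%, so a release can be widened gradually without users flipping back and forth.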
Facebook has also widgetized large portions of their
application, meaning that widgets can be written in an
appropriate language instead of simply using PHP.
These widgets interface with the other parts of the
application through the use of internal APIs.
Like many other big sites, Facebook uses a Content
delivery network (CDN) to help serve static content.
And then of course there is the huge data center
Facebook is building in Oregon to help it scale out with
even more servers.
Shocking facts about Facebook
A third of all divorce filings in 2011 contained
the word “Facebook”.
Iceland used Facebook to rewrite its
Adding the number 4 to the end of Facebook’s
URL will automatically direct you to Mark
Facebook pays $500 to anyone who can hack
A couple got murdered because they de-
friended someone on Facebook
A man was ordered to apologize on Facebook
or go to jail.