Facebook's Nuts and Bolts - The Technologies Behind the World's Largest Social Network
1. DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
MADANAPALLE INSTITUTE OF TECHNOLOGY AND SCIENCE
(UGC-AUTONOMOUS)
A Seminar Presentation
On
FACEBOOK
[The Nuts and Bolts – Technology]
By
M. Koushik Reddy
12691A0546
Under the guidance of
N. Sudhakar Yadav
M.Tech
Asst. Professor
3. So what's all the Hype?
What exactly is Facebook®?
• Facebook® is a “social networking website”
• Facebook® is a free service that allows you to
create an online page to connect with friends,
family, or make new friends with anyone
anywhere.
• On your Facebook® page you can share
pictures, personal information, messages,
and videos, join groups, and add applications.
4. Introduction
• Here are a few factoids to give you an idea of the scaling challenge that
Facebook has to deal with:
• Facebook serves 570 billion page views per month (according to Google
Ad Planner).
• There are more photos on Facebook than all other photo sites combined
(including sites like Flickr).
• More than 3 billion photos are uploaded every month.
• Facebook’s systems serve 1.2 million photos per second.
• More than 25 billion pieces of content (status updates, comments, etc.) are
shared every month.
• Facebook has more than 30,000 servers (and this number is from last year!)
5. Languages:
Front End: (client side)
- JavaScript
Back End: (server side)
- Hack, PHP (HHVM)
- C++, Java
- Python, Erlang
- D, XHP
- Haskell
6. JavaScript: (Front End)
It is a high-level, dynamic, untyped, and
interpreted programming language.
• It is supported by all modern web browsers without plug-ins.
Sample code:
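// Check whether the visitor is already logged in to Facebook.
FB.getLoginStatus(function (response) {
  if (response.status === 'connected') {
    console.log('Logged in.');
  } else {
    // Not connected: open the Facebook login dialog.
    FB.login();
  }
});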
7. Back End:(Server side)
Hack:
Hack is a programming language for the HipHop Virtual
Machine (HHVM), created by Facebook as a dialect of PHP.
• It is open source, licensed under the BSD License.
• Hack allows programmers to use both dynamic typing and static typing.
• Introduced on March 20, 2014.
Sample code:
<?hh
echo 'Hello World';
• An important point: unlike PHP, Hack code and HTML do not mix.
In PHP you can normally mix PHP and HTML code together in the same file; in Hack you cannot.
8. PHP vs. Hack
• Hack is a dialect of PHP, so the two share most of their syntax, and both can be served behind Apache.
• Hack adds functionality and features on top of PHP and helps clean up
some of its inconsistencies.
9. Back End:(Server side)
Erlang:
It is a general-purpose, concurrent, garbage-collected programming
language and runtime system.
• It was originally designed by Ericsson.
• It supports hot swapping, so code can be changed without stopping a
system.
• It provides language-level features for creating and managing processes
with the aim of simplifying concurrent programming.
• All concurrency is explicit in Erlang: processes communicate
using message passing instead of shared variables, which removes the need
for explicit locks.
10. Back End:(Server side)
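Continued…
Sample code:
An Erlang function that uses recursion to count to ten:
-module(count_to_ten).
-export([count_to_ten/0]).
% Public entry point: start counting from 0.
count_to_ten() -> do_count(0).
% Base case: stop once the counter reaches 10.
do_count(10) -> 10;
% Recursive case: count one step higher.
do_count(Value) -> do_count(Value + 1).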
13. Back End:(Server side)
Continued…
System overview: User Interface (chat in the browser):
• Channel (Erlang): message queuing and delivery.
Queues messages in each user’s “channel”.
Delivers messages as responses to long-polling HTTP requests.
• Presence (C++): aggregates online info in memory (pull-based presence)
• Chatlogger (C++): stores conversations between page loads
• Web tier (PHP): serves our vanilla web requests
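The queue-then-deliver behaviour described for the Channel component can be sketched in a few lines. This is only an illustration in JavaScript, not Facebook's Erlang implementation: the names `Channel`, `publish`, and `poll` are hypothetical, the `respond` callback stands in for completing an HTTP response, and request timeouts and multiple simultaneous polls are left out.

```javascript
// Hypothetical per-user channel: queue messages, deliver them on the next poll.
class Channel {
  constructor() {
    this.queue = [];    // messages waiting for this user's next poll
    this.parked = null; // response callback of a poll that is waiting for data
  }

  // A sender calls this to deliver a message to the channel's owner.
  publish(message) {
    if (this.parked) {
      // A long-poll request is already waiting: answer it right away.
      const respond = this.parked;
      this.parked = null;
      respond([message]);
    } else {
      this.queue.push(message);
    }
  }

  // The browser's long-polling request lands here. `respond` stands in for
  // completing the HTTP response (an assumption of this sketch).
  poll(respond) {
    if (this.queue.length > 0) {
      const pending = this.queue;
      this.queue = [];
      respond(pending);
    } else {
      this.parked = respond; // hold the request open until a message arrives
    }
  }
}
```

A poll that finds queued messages is answered immediately; a poll that finds none is held open and answered by the next `publish`, which is the essence of long polling.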
14. Back End:(Server side)
Haskell:
Haskell is a standardized, general-purpose purely functional
programming language, with non-strict semantics and strong static
typing.
Sample code:
A “Hello, World!” program:
module Main where
main :: IO ()
main = putStrLn "Hello, World!"
15. Back End:(Server side)
Haskell at Facebook?
Fighting spam with Haskell:
Sigma:
One of the weapons in the fight against spam, malware, and other abuse on
Facebook is a system called Sigma.
• Its job is to proactively identify malicious actions on Facebook, such as spam,
phishing attacks, posting links to malware, etc.
• Bad content detected by Sigma is removed automatically so that it doesn't show up
in your News Feed.
• Sigma is a rule engine, which means it runs a set of rules, called policies.
• These policies make it possible for us to identify and block malicious interactions
before they affect people on Facebook.
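The idea of a rule engine that runs a set of policies can be illustrated with a small sketch. It is written in JavaScript for consistency with the front-end sample, not in Haskell, and it is not Sigma's real design: the policy names, the `linkHosts` and `postsInLastMinute` fields, and the `evaluateAction` function are all hypothetical.

```javascript
// Hypothetical policies; real Sigma policies are not shown in these slides.
const examplePolicies = [
  {
    name: "links-to-blocked-host",
    matches: (action) => action.linkHosts.some((host) => host === "malware.example"),
  },
  {
    name: "posting-too-fast",
    matches: (action) => action.postsInLastMinute > 20,
  },
];

// Run every policy against one user action and collect the ones that fire.
function evaluateAction(action, policies) {
  const violated = policies
    .filter((policy) => policy.matches(action))
    .map((policy) => policy.name);
  return { allowed: violated.length === 0, violated };
}
```

Because each policy is a side-effect-free function of the action, policies can be added or removed without touching the engine, which is one reason a purely functional language suits this kind of system.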
16. Back End:(Server side)
Continued…
Why Haskell in Sigma?
• In Sigma, FXL (Feature eXtraction Language)
was replaced with Haskell.
Reasons for the replacement:
1. Purely functional and strongly
typed.
2. Push code changes to production in
minutes.
3. Performance.
4. Support for interactive development.
17. Database
Which databases does Facebook actually use?
• About a billion people use Facebook. It stores every transaction for 800
million users and handles more than 60 million queries per second.
• Users interact with their peers and friends through wall posts, photo uploads,
and sharing information about events and other meaningful information.
• Facebook uses several database technologies.
Databases used at Facebook:
• MySQL
• HBase
• Cassandra
18. Databases
MySQL: Facebook primarily uses MySQL
for structured data storage such as wall posts,
user information, timeline, etc.
• This data is replicated between their various data centers.
20. Database
HBase:
HBase is an open-source, non-relational, distributed database modeled after Google's BigTable and written in Java.
• It is developed as part of the Apache Software Foundation's Apache Hadoop project.
• It runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like
capabilities for Hadoop.
• HBase now serves several data-driven websites, including Facebook's Messaging
Platform.
21. HBase Architecture:
• In HBase, tables are split into regions and are served by the region servers.
• Regions are vertically divided by column families into “Stores”.
• Stores are saved as files in HDFS.
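To make the table-to-region split concrete, here is a minimal sketch of how a row key could be mapped to the region server that holds it. It assumes the usual HBase convention that each region covers a contiguous range of row keys; the region boundaries, the server names, and the `locateRegion` function are invented for illustration, and plain string comparison stands in for HBase's byte-wise key ordering.

```javascript
// Hypothetical layout: one table split into three regions by row-key range.
// startKey is inclusive, endKey is exclusive; null means "no upper bound".
const exampleRegions = [
  { startKey: "", endKey: "g", server: "regionserver-1" },
  { startKey: "g", endKey: "p", server: "regionserver-2" },
  { startKey: "p", endKey: null, server: "regionserver-3" },
];

// Find the region whose key range contains the given row key.
function locateRegion(regions, rowKey) {
  return regions.find(
    (region) =>
      rowKey >= region.startKey &&
      (region.endKey === null || rowKey < region.endKey)
  );
}
```

For example, `locateRegion(exampleRegions, "mike").server` picks the second region, because "mike" sorts between "g" and "p".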
22. Continued…
The Master Server:
• Assigns regions to the region servers, with the help of Apache ZooKeeper.
• Handles load balancing of the regions across region servers. It unloads the busy servers
and shifts the regions to less occupied servers.
• Is responsible for operations such as creation of tables and column families.
Regions:
Regions are portions of a table that have been split up and spread across the region
servers.
ZooKeeper:
ZooKeeper is an open-source project that provides services such as maintaining
configuration information, naming, and distributed synchronization.
• Clients use ZooKeeper to locate region servers before communicating with them.
23. HBase in Facebook Messaging
Messaging Data:
• Small/medium-sized data: HBase
o Search index
o Small message bodies
• Attachments and large messages: Haystack
o Used for our existing photo/video store
25. Continued…
Write path in HBase:
• In HBase, a write is first appended
to a write-ahead log on HDFS and
buffered in memory; the buffered
messages are then flushed to
files (HFiles) stored in HDFS.
Read path:
• Similarly, messages can be
read back from the HFiles.
26. Software And Techniques
The Front End:
• Linux & Apache
• Memcache
• Haystack
• BigPipe
The Back End:
• Thrift (protocol)
• Scribe (log server)
• HipHop for PHP
27. Software And Techniques
The Front End:
Linux & Apache:
Linux is a Unix-like computer operating
system kernel.
• It’s open source, very customizable, and good for security.
• Facebook runs Apache HTTP Server on the Linux operating system.
• Apache is also free and is the most popular open source web server in use.
28. Software And Techniques
Memcache:
• Facebook makes heavy use of
Memcached, a memory caching system that is
used to speed up dynamic database-driven
websites by caching data and
objects in RAM to reduce read
time.
• Having a caching system allows
Facebook to be as fast as it is at
recalling your data.
• When your data is already cached, Facebook
doesn’t have to go to the database; it
just fetches your data from the
cache based on your user ID.
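The lookup pattern described above (check the cache first, fall back to the database only on a miss) can be sketched as follows. This is an illustration only: an in-process `Map` stands in for Memcached, and the `UserCache` class, the `user:<id>` key format, and the `loadFromDatabase` callback are assumptions of the sketch, not Facebook's code.

```javascript
// Hypothetical cache-first lookup keyed by user ID.
class UserCache {
  constructor(loadFromDatabase) {
    this.cache = new Map();                   // stands in for Memcached (data held in RAM)
    this.loadFromDatabase = loadFromDatabase; // slow path, supplied by the caller
    this.hits = 0;
    this.misses = 0;
  }

  get(userId) {
    const key = `user:${userId}`;
    if (this.cache.has(key)) {
      this.hits += 1;                         // served from RAM, no database query
      return this.cache.get(key);
    }
    this.misses += 1;
    const value = this.loadFromDatabase(userId);
    this.cache.set(key, value);               // remember it for the next request
    return value;
  }

  // Drop a cached entry, for example after the user's data changes.
  invalidate(userId) {
    this.cache.delete(`user:${userId}`);
  }
}
```

Only the first request for a given user reaches the database; repeated requests are answered from memory until the entry is invalidated.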
29. Software And Techniques
Facebook Photos and Haystack:
• The Photos application is one of Facebook’s most popular features.
• Users have uploaded over 15 billion photos, which makes Facebook the
biggest photo sharing website.
• For each uploaded photo, Facebook generates and stores four images of
different sizes, which translates to a total of 60 billion images and
1.5PB of storage.
• The current growth rate is 220 million new photos per week, which
translates to 25TB of additional storage consumed weekly.
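The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the slide's numbers and assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes); the function name `haystackEstimates` is made up for the example.

```javascript
// Back-of-the-envelope check of the storage figures quoted above.
function haystackEstimates() {
  const photos = 15e9;        // photos uploaded so far
  const sizesPerPhoto = 4;    // images stored per uploaded photo
  const totalBytes = 1.5e15;  // 1.5 PB, assuming 1 PB = 10^15 bytes
  const weeklyPhotos = 220e6; // new photos per week
  const weeklyBytes = 25e12;  // 25 TB, assuming 1 TB = 10^12 bytes

  const storedImages = photos * sizesPerPhoto;       // 60 billion images
  const avgBytesStored = totalBytes / storedImages;  // 25,000 bytes per image
  const weeklyImages = weeklyPhotos * sizesPerPhoto; // 880 million images per week
  const avgBytesWeekly = weeklyBytes / weeklyImages; // roughly 28,400 bytes per image
  return { storedImages, avgBytesStored, weeklyImages, avgBytesWeekly };
}
```

Both estimates land in the 25 to 30 KB range per stored image, so the totals and the weekly growth figures are consistent with each other.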
30. Haystack in facebook
• Haystack is Facebook’s high-performance photo storage/retrieval system.
• A highly scalable object store used to serve Facebook’s immense amount of
photos.
• Implements an HTTP-based photo server which stores photos in a generic
object store called Haystack.
31. Software And Techniques
BigPipe:
A dynamic web page serving system developed by Facebook.
• BigPipe is a fundamental redesign of the dynamic web page serving system.
• BigPipe breaks the page generation process into several stages
• The first three stages are executed by the web server, and the last four stages are
executed by the browser.