Facebook is a social networking website where users can post comments, share photographs, post links to news or other interesting content on the web, chat live, and watch short-form video. You can even order food on Facebook if that's what you want to do. Shared content can be made publicly accessible, shared only among a select group of friends or family, or shared with a single person.
This document provides an introduction and overview of Apache Hadoop. It discusses how Hadoop provides the ability to store and analyze large datasets in the petabyte range across clusters of commodity hardware. It compares Hadoop to other systems like relational databases and HPC and describes how Hadoop uses MapReduce to process data in parallel. The document outlines how companies are using Hadoop for applications like log analysis, machine learning, and powering new data-driven business features and products.
This document provides an outline for a student talk on NoSQL databases. It introduces NoSQL databases and discusses their characteristics and uses. It then covers different types of NoSQL databases including key-value, column, document, and graph databases. Examples of specific NoSQL databases like MongoDB, Cassandra, HBase, Riak, and Neo4j are provided. The document also discusses concepts like CAP theorem, replication, sharding, and provides comparisons of different database types.
Course "Machine Learning and Data Mining" for the degree of Computer Engineering at the Politecnico di Milano. In this lecture we give an overview of the mining of data streams.
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup, 2013-06-17 - spark-project
Slides from Tathagata Das's talk at the Spark Meetup entitled "Deep Dive with Spark Streaming" on June 17, 2013, in Sunnyvale, California, at Plug and Play. Tathagata Das is the lead developer on Spark Streaming and a PhD student in computer science in the UC Berkeley AMPLab.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
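The map/sort/reduce flow described above can be sketched in a few lines. This is an in-memory illustration of the programming model, not Hadoop itself; the splits and word-count logic are made up for the example.

```python
# A minimal sketch of the MapReduce flow: map over independent input splits,
# sort/shuffle the intermediate (key, value) pairs, then reduce per key.
from itertools import groupby
from operator import itemgetter

def map_phase(split):
    # Emit one (word, 1) pair per word in the split.
    for word in split.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Aggregate all values seen for one key.
    return (key, sum(values))

splits = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each split is processed independently (in parallel on a real cluster).
intermediate = [pair for split in splits for pair in map_phase(split)]

# Shuffle/sort: the framework groups the map outputs by key.
intermediate.sort(key=itemgetter(0))

# Reduce: one call per distinct key.
counts = dict(
    reduce_phase(key, (v for _, v in group))
    for key, group in groupby(intermediate, key=itemgetter(0))
)
print(counts["the"])  # 3 occurrences of "the" across all splits
```

On a real cluster the map tasks run on different nodes and the sorted intermediate data is written to and read from the file system, but the shape of the computation is the same.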
The document provides an overview of Hadoop and its ecosystem. It discusses the history and architecture of Hadoop, describing how it uses distributed storage and processing to handle large datasets across clusters of commodity hardware. The key components of Hadoop include HDFS for storage, MapReduce for processing, and an ecosystem of related projects like Hive, HBase, Pig and Zookeeper that provide additional functions. Advantages are its ability to handle unlimited data storage and high speed processing, while disadvantages include lower speeds for small datasets and limitations on data storage size.
The 'macro view' on BigQuery: we started with an overview and some typical uses, then moved on to project hierarchy, access control, and security. At the end, we touch on tools and demos.
The document discusses YARN (Yet Another Resource Negotiator), which is the cluster resource management layer of Hadoop. It describes the limitations of the previous Hadoop 1.0 architecture where MapReduce was responsible for both data processing and resource management. YARN was created to address these limitations by separating resource management from data processing. It discusses the components of YARN including the Resource Manager, Node Manager, Containers, and Application Master. It also provides examples of workloads that can run on YARN beyond MapReduce and describes the YARN architecture and how applications run on the YARN framework.
Hadoop MapReduce is an open source framework for distributed processing of large datasets across clusters of computers. It allows parallel processing of large datasets by dividing the work across nodes. The framework handles scheduling, fault tolerance, and distribution of work. MapReduce consists of two main phases: the map phase, where the data is processed as key-value pairs, and the reduce phase, where the outputs of the map phase are aggregated together. It provides an easy programming model for developers to write distributed applications for large scale processing of structured and unstructured data.
This document discusses Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It describes how Hadoop uses HDFS for distributed storage and fault tolerance, YARN for resource management, and MapReduce for parallel processing of large datasets. It provides details on the architecture of HDFS including the name node, data nodes, and clients. It also explains the MapReduce programming model and job execution involving map and reduce tasks. Finally, it states that as data volumes continue rising, Hadoop provides an affordable solution for large-scale data handling and analysis through its distributed and scalable architecture.
The document discusses eBay's architecture and strategies for maintaining scalability and agility. It describes eBay's large scale, including billions of daily interactions. It also outlines eBay's transition to more automated, cloud-based infrastructure and a next generation service-oriented platform. This is intended to improve development productivity while allowing faster innovation and time-to-market through increased infrastructure and platform services.
Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop automatically manages data replication and platform failure to ensure very large data sets can be processed efficiently in a reliable, fault-tolerant manner. Common uses of Hadoop include log analysis, data warehousing, web indexing, machine learning, financial analysis, and scientific applications.
This document provides an overview of Aneka, an open-source cloud computing platform. It discusses that Aneka allows developers to build applications that can run on private and public clouds using programming models like MapReduce. The key components of Aneka are its SDK that provides APIs and tools for application development, and a runtime engine that manages deployment and execution. Aneka provides features like scalability, heterogeneity, and cost savings. It can be deployed on various hardware and supports dynamic resource allocation and security.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
Google App Engine (GAE) is a platform as a service that allows developers to build and host web applications in Google's data centers. GAE applications are sandboxed and automatically scale based on traffic. GAE provides a computing environment with common web technologies, an admin console, scalable infrastructure, and SDK. It compares favorably to AWS with automatic scaling, large data storage, and programming language support, though developers must follow Google's policies and porting applications can be difficult. GAE offers cost savings, performance, and reliability though fees do apply for high resource usage.
Hive was initially developed by Facebook to manage large amounts of data stored in HDFS. It uses a SQL-like query language called HiveQL to analyze structured and semi-structured data. Hive compiles HiveQL queries into MapReduce jobs that are executed on a Hadoop cluster. It provides mechanisms for partitioning, bucketing, and sorting data to optimize query performance.
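The bucketing mechanism mentioned above amounts to hashing the bucketing column and taking it modulo the bucket count, so that rows with the same key always land in the same bucket. The sketch below illustrates the idea only: the column name, the rows, and the use of `crc32` as the hash are assumptions for the example, not Hive's actual implementation.

```python
# Sketch of Hive-style bucketing: assign each row to one of N buckets by
# hashing the bucketing column, so bucketed joins and sampling can read
# only the matching buckets instead of scanning the whole table.
import zlib
from collections import defaultdict

NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    # Stable hash of the bucketing column, modulo the bucket count.
    # (Hive uses its own hash function; crc32 stands in for illustration.)
    return zlib.crc32(key.encode()) % NUM_BUCKETS

rows = [("alice", 10), ("bob", 20), ("carol", 30), ("alice", 40)]
buckets = defaultdict(list)
for user, amount in rows:
    buckets[bucket_for(user)].append((user, amount))

# All rows sharing a key land in the same bucket, so a join on `user`
# only needs to pair up corresponding buckets from each table.
assert bucket_for("alice") == bucket_for("alice")
```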
This document provides an overview of Apache Flink internals. It begins with an introduction and recap of Flink programming concepts. It then discusses how Flink programs are compiled into execution plans and executed in a pipelined fashion, as opposed to being executed eagerly like regular code. The document outlines Flink's architecture including the optimizer, runtime environment, and data storage integrations. It also covers iterative processing and how Flink handles iterations both by unrolling loops and with native iterative datasets.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
The document discusses choosing between SQL and NoSQL databases. It covers the evolution of data architectures from traditional client-server models to newer distributed NoSQL solutions. It provides an overview of different data store types like SQL, NoSQL, key-value, document, column family, and graph databases. The document advises picking the right data model based on business needs, use cases, data storage requirements, and growth patterns then evaluating solutions based on pros and cons. It concludes that for large, growing data, both SQL and NoSQL solutions may be needed.
This document discusses AJAX (Asynchronous JavaScript and XML). It defines AJAX as a group of interrelated web development techniques used on the client-side to create interactive web applications. AJAX allows web pages to be updated asynchronously by exchanging small amounts of data with the server without reloading the entire page. The document outlines the technologies that power AJAX like HTML, CSS, XML, JavaScript, and XMLHttpRequest and how they work together to enable asynchronous updates on web pages.
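On the server side, the exchange described above just means returning a small payload instead of a full page; the browser's XMLHttpRequest (or fetch) consumes it asynchronously. The sketch below shows a minimal JSON endpoint and queries it in-process, with `urllib` standing in for the browser; the `/api/notifications` path and `unread` field are invented for the example.

```python
# Minimal JSON endpoint plus an in-process request, illustrating the small
# asynchronous data exchange that AJAX performs instead of a page reload.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Return a small JSON fragment, not a whole HTML page.
        body = json.dumps({"unread": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/api/notifications"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
print(data["unread"])  # the page could update this badge without reloading
server.shutdown()
```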
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It was created to support applications handling large datasets operating on many servers. Key Hadoop technologies include MapReduce for distributed computing, and HDFS for distributed file storage inspired by Google File System. Other related Apache projects extend Hadoop capabilities, like Pig for data flows, Hive for data warehousing, and HBase for NoSQL-like big data. Hadoop provides an effective solution for companies dealing with petabytes of data through distributed and parallel processing.
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases like online and offline processing more efficiently than alternatives like RabbitMQ. Kafka works by partitioning topics into segments spread across clusters of machines, and replicates across these partitions for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
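The commit-log design described above can be modeled in a few lines: a topic is a set of append-only partitions, messages with the same key go to the same partition, and each consumer tracks its own read offset. This is a toy model of the concept, not the Kafka API; the class, keys, and messages are invented for illustration.

```python
# Toy model of a partitioned commit log: appends go to a partition chosen
# by key hash, reads are non-destructive, and consumers keep their offsets.
import zlib

class TopicLog:
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition (hash % N), preserving per-key order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def consume(self, partition: int, offset: int):
        # Reading does not remove messages; each consumer owns its offset.
        return self.partitions[partition][offset:]

log = TopicLog(num_partitions=3)
p = log.produce("user-42", "clicked")
log.produce("user-42", "purchased")

# Two independent consumers read the same partition from different offsets.
print(log.consume(p, 0))  # ['clicked', 'purchased']
print(log.consume(p, 1))  # ['purchased']
```

Because messages are retained rather than deleted on read, the same log can feed both online consumers and offline batch jobs, which is what makes it work as a central data pipeline.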
A simple presentation about different big data stream processing systems such as Spark, Samza, and Storm, and the differences between their architectures and purposes. In addition, we talk about streaming-layer tools such as Kafka and RabbitMQ. This presentation refers to this paper: https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Apache Flink: Real-World Use Cases for Streaming Analytics - Slim Baltagi
This face-to-face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0, announced on March 8th, 2016 by the Apache Software Foundation (link), marks a new era of Big Data analytics, and in particular real-time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiple verticals such as financial services, healthcare, advertisement, oil and gas, retail, and telecommunications.
In this talk, you learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Lecture 4: Big Data Technology Foundations - hktripathy
The document discusses big data architecture and its components. It explains that big data architecture is needed when analyzing large datasets over 100GB in size or when processing massive amounts of structured and unstructured data from multiple sources. The architecture consists of several layers including data sources, ingestion, storage, physical infrastructure, platform management, processing, query, security, monitoring, analytics and visualization. It provides details on each layer and their functions in ingesting, storing, processing and analyzing large volumes of diverse data.
Hadoop, Pig, and Twitter (NoSQL East 2009) - Kevin Weil
A talk on the use of Hadoop and Pig inside Twitter, focusing on the flexibility and simplicity of Pig, and the benefits of that for solving real-world big data problems.
Facebook [The Nuts and Bolts Technology] - Koushik Reddy
This document provides a summary of a seminar presentation on Facebook and the nuts and bolts of the technology behind it. The presentation covered several key topics:
Languages used at Facebook including JavaScript, PHP, C++, Java, Python, Erlang, and Haskell. Databases used include MySQL, HBase, and Cassandra. Software and technologies discussed were Linux, Apache, Memcache, Haystack, BigPipe, Thrift, Scribe, and HipHop for PHP. The presentation provided details on how some of these technologies are applied at Facebook, such as Erlang for chat messaging, Haskell for spam detection, and HBase for photo storage.
Overview of Facebook Scalable Architecture - Rishikese MR
The document provides an overview of Facebook's scalable architecture presented by Sharath Basil Kurian. It discusses how Facebook uses a variety of technologies like LAMP stack, PHP, Memcached, HipHop, Haystack, Scribe, Thrift, Hadoop and Hive to handle large amounts of user data and scale to support its massive user base. The architecture includes front-end components like PHP and BigPipe to dynamically render pages and back-end databases and caches like MySQL, Memcached and Haystack to efficiently store and retrieve user data.
Architecture Patterns - Open Discussion - Nguyen Tung
This document provides an overview of software architecture fundamentals and patterns, with a focus on architectures for scalable systems. It discusses key quality attributes for architecture like performance, reliability, and scalability. Common patterns for scalable systems are described, including load balancing, map-reduce, and caching. The document also provides a detailed look at architectures used at Facebook, including the architectures for Facebook's website, chat service, and handling of big data. Key aspects of each system are summarized, including the technologies and design principles used.
How Facebook Works and Functions: A Complete Approach - Prakhar Gethe
Facebook uses a variety of technologies to handle its massive scale, including PHP, C++, Java, Python, and custom technologies like FBML and XHP. It relies on databases like MySQL, Memcached, Haystack and Cassandra, and systems like Scribe, Presto, and Hadoop to store and retrieve massive amounts of user data and content. Technologies like Ajax, JSON, JavaScript, jQuery are used on the front-end to power Facebook's interactive features.
Web development refers to tasks associated with developing websites, including web design, content development, and client-side/server-side scripting. There are different types of web developers such as front-end developers who code the front-end using HTML, CSS, and JavaScript, and back-end developers who build the server-side logic using languages like PHP, Ruby, or Python. A web development stack typically includes a front-end framework, back-end programming language, database, and content management system. Popular stacks include LAMP (Linux, Apache, MySQL, PHP), LEMP (Linux, Nginx, MySQL, PHP), and MERN (MongoDB, Express, React, Node). Companies use different technologies
Facebook uses several technologies to handle large amounts of user data and traffic on its platform. These include cookies and caches to store and access frequently used data more quickly. Technologies like gzip compression reduce data transfer sizes. AJAX and JSON are used to asynchronously retrieve and send data to and from servers without interfering with page displays. XMPP messaging allows for real-time messaging between users. Large databases like HBase provide horizontal scalability and automatic failover, with ZooKeeper coordinating sharding and failover. Memcached alleviates database load, and Scribe aggregates log data in real time from many servers. Together, these technologies allow Facebook to efficiently store, access, and analyze massive amounts of user data every day on its social media platform.
Facebook deployed a new messaging application using Apache Hadoop and HBase to address high write throughput needs. This was due to the large volume of messages sent daily and the denormalized data model requiring multiple writes per message. Hadoop provided scalable storage through HDFS and HBase allowed for fast random lookups needed for the application. Enhancements were made to Hadoop and HBase to support the real-time requirements of this and other new applications at Facebook.
Data infrastructure at Facebook with reference to the conference paper " Data warehousing and analytics infrastructure at facebook"
Datewarehouse
Hadoop - Hive - scrive
LAMP is a shorthand term for a web application platform consisting of Linux, Apache, MySQL and one of Perl or PHP or Python. Together, these open-source tools provide a world-class platform for deploying web applications. LAMP has been touted as "the killer app" of the open-source world.
php with wordpress and mysql ppt by Naveen TokasNAVEEN TOKAS
This document discusses PHP, MySQL, and WordPress. It provides an overview of each:
PHP was created in 1994 and has evolved through several versions. It is a widely used open source scripting language that powers many popular websites. MySQL is a popular open source database that works well with PHP. WordPress is a free and open source content management system built with PHP and MySQL that powers over 60 million websites, making it the most popular blogging platform. It allows for easy publishing and management of content on the web.
Webinar - Windows Server 2016 for Nonprofits and Libraries - 2017-01-10TechSoup
Visit http://www.techsoup.org to access donated technology for nonprofits and libraries!
Learn about the features and functionality of Microsoft's Windows Server. You get a peek "under the hood" to see some of the newest features from Microsoft's principal program manager for the Windows Server program, Jeff Woolsey.
This document provides an overview of the LAMP web development stack, including its components (Linux, Apache, MySQL, PHP), why it is a popular choice for web applications, how each component works, how to implement a LAMP-based application, and the benefits of using LAMP such as ease of use, deployment and coding. It notes that LAMP is well-suited for applications that don't require large data exchanges or complex state maintenance. The conclusion reiterates that PHP, HTML, and databases will continue to dominate web design.
This document provides an overview of the LAMP web development stack, including its components (Linux, Apache, MySQL, PHP), why it is a popular choice for building web applications, how each component works together, and the benefits it provides such as ease of use, scalability, and local development. Some key points are that LAMP allows for quick development of data-driven web applications, uses open source tools, and provides a low-cost way to deploy websites and applications.
LAMP is a shorthand term for a popular open source web development platform consisting of Linux, Apache, MySQL, and PHP (LAMP). Together, these components provide a robust, scalable, and secure environment for building dynamic websites. LAMP has gained widespread adoption as it offers a full-stack solution that is free, flexible, and powerful enough to support many enterprise applications.
Facebook uses several technologies to handle large amounts of user data and traffic efficiently:
1. Cookies, caches, Gzip compression, and AJAX/JSON help speed up data transfer and front-end performance.
2. Large-scale data storage uses HBase and Haystack for elastic distributed storage, and Zookeeper for coordination during sharding and failover.
3. Additional technologies include Memcached for caching, and Scribe for centralized log aggregation to analyze site performance.
Presented on Tuesday, August 7, at the 2018 LRCN (Librarians' Registration Council of Nigeria) National Workshop on Electronic Resource Management Systems in Libraries, held at the University of Nigeria, Nsukka, Enugu State, Nigeria
This file contains full report of online fitness gym.And it was prepare by Abhishek, Saurav and Jitendra. If any query please contact at abhishek96patel@gmail.com
New ICT Trends and Issues of LibrarianshipLiaquat Rahoo
The document summarizes a one-day workshop on new ICT trends and issues in librarianship. It will cover topics like the introduction of ICT in libraries, different types of libraries supported by ICT, necessary ICT infrastructure, software for library automation, digital repositories, and web applications. The workshop will be held at the Institute of Modern Sciences and Arts on April 17, 2016.
The LAMP stack is a well know and ubiquitous web development stack, but have you heard of MEAN? It's an up and coming stack that's unified by a single language, JavaScript. Learn the basic components of the MEAN stack as well as practical use case and applications.
Introduction to Modern and Emerging Web TechnologiesSuresh Patidar
2017 is here and we are already a couple of days in!
A lot happened in the software development world in 2016. There were new releases of popular programming languages, new versions of important frameworks, and new tools. Let’s discuss some of the most important releases, and find out which skills you can learn that would be a great investment for your time in 2017!
2. THE "SOCIAL MEDIA" REVOLUTION: A STUDY AND ANALYSIS OF THE PHENOMENON
Ahmad Yar
BS Computer Science
Bahauddin Zakariya University Multan (BZU), Sahiwal Campus
Email: ahmadyark1@gmail.com
Mobile: +92303 9464551
3. What are Distributed Systems?
A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
A distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system.
Two aspects:
• Independent computers
• A single coherent system
The World Wide Web (WWW) is the biggest example of a distributed system.
4. What is Facebook?
A portal for social networking
Interact with friends
Share photos and/or videos
Community organizing
Email and instant messaging
Various forms of interpersonal communication
Operated and privately owned by Facebook, Inc.
5. Who Created Facebook?
Mark Zuckerberg created Facebook at Harvard University in 2004 with his roommate Dustin Moskovitz and fellow student Chris Hughes.
• Initially created for college students
• Then extended to include high school students
• Now open to anyone over the age of 13
A keen computer programmer, Mr Zuckerberg, who studied psychology at Harvard, had already developed a number of social-networking websites:
• Coursematch
• Facemash
6. Idea & Creation of Facebook
Before launching his own site, Zuckerberg had briefly worked on a rival social network, HarvardConnection, for fellow students Divya Narendra and Cameron and Tyler Winklevoss.
In February 2004 Mr Zuckerberg launched "The facebook", as it was originally known; the name was taken from the sheets of paper distributed to freshmen, profiling students and staff. Within 24 hours, 1,200 Harvard students had signed up, and after one month over half of the undergraduate population had a profile.
The network was promptly extended to other Boston universities, the Ivy League and eventually all US universities. It became Facebook.com in August 2005 after the address was purchased for $200,000. US high schools could sign up from September 2005, and the site then began to spread worldwide, reaching UK universities the following month.
7. Growth of Facebook
Social Network | Feb. 2008  | Feb. 2009  | Growth
Facebook       | 20,043,000 | 65,704,000 | +228%
8. Facebook Architecture
Front End
LAMP stack: Linux, Apache, MySQL, PHP (plus Facebook's BigPipe)
Why LAMP?
• Easy to learn
• Great documentation
• Huge community
• Lots of frameworks
This is the front-end stack used by Facebook.
9. Linux & Apache
• LINUX is a computer operating system kernel.
• It's open source, very customizable, and good for security.
• Facebook runs the Linux operating system on Apache HTTP servers.
In many ways, Linux is similar to other operating systems you may have used before, such as Windows, OS X, or iOS. Like other operating systems, Linux has a graphical interface, and the types of software you are accustomed to using on other operating systems, such as word processing applications, have Linux equivalents. In many cases, the software's creator may have made a Linux version of the same program you use on other systems. If you can use a computer or other electronic device, you can use Linux.
10. Linux & Apache (continued)
• APACHE is also free and is the most popular open-source web server in use.
Facebook's messaging system was recently added to the platform, supported by Apache HBase, a database-like layer built on Hadoop designed to handle billions of messages per day. HBase was chosen for the application's requirements around consistency, availability, partition tolerance, data model and scalability.
HBase supports Facebook's billions of messages, and capacity can be increased with minimal overhead and no downtime. It offers high write throughput, efficient low-latency access, strong consistency semantics within a data center, efficient random reads from disk, high availability (especially for disaster recovery), fault isolation, and atomic read-modify-write primitives.
11. PHP & BigPipe
• PHP is a dynamically typed, interpreted scripting language.
• Facebook uses PHP because it is a good web programming language with extensive support and an active developer community, and it is good for rapid iteration.
The Facebook SDK (software development kit) for PHP is a library with powerful features that enable PHP developers to easily integrate Facebook Login and make requests to the Graph API. It also plays well with the Facebook SDK for JavaScript to give the front-end user the best possible experience. And it doesn't end there: the SDK for PHP also makes it easy to upload photos and videos and to send batch requests to the Graph API, among other things.
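Under the hood, the SDK calls described above are plain HTTPS requests against the Graph API. A minimal sketch in Python of building such a request URL (the API version, field names and token here are illustrative placeholders, not a real credential):

```python
# Build a Facebook Graph API request URL by hand -- a toy illustration of
# what SDKs like the PHP SDK do under the hood. The token value is fake.
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

def graph_url(path, access_token, api_version="v19.0", **params):
    """Return the full URL for a Graph API GET request."""
    params["access_token"] = access_token
    return f"{GRAPH_BASE}/{api_version}/{path}?{urlencode(params)}"

url = graph_url("me", "FAKE_TOKEN", fields="id,name")
print(url)
```

An SDK adds authentication flows, batching and error handling on top of this, but the request shape stays the same.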
12. BigPipe: Pipelining
• BigPipe is a dynamic web page serving system developed by Facebook. The general idea is to pipeline sections of a page through various stages within web browsers and servers.
The traditional page-serving model that BigPipe overlaps works like this:
1. The browser sends an HTTP request to the web server.
2. The web server parses the request, pulls data from the storage tier, then formulates an HTML document and sends it to the client in an HTTP response.
3. The HTTP response is transferred over the Internet to the browser.
4. The browser parses the response, constructs a tree representation of the HTML document, and downloads the CSS and JavaScript resources referenced by the document.
5. After downloading the JavaScript resources, the browser parses and executes them.
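The pipelining idea can be sketched as a server that flushes a page skeleton immediately and then streams each page section ("pagelet") as it becomes ready, rather than waiting for the whole page. This is a toy model with illustrative names, not Facebook's actual implementation:

```python
# Toy sketch of BigPipe-style pagelet flushing: the server sends the page
# skeleton immediately, then streams each pagelet's HTML as it is ready,
# instead of composing the whole page before responding.

def render_page(pagelet_sources):
    """Yield chunks of a page: skeleton first, then one chunk per pagelet."""
    yield "<html><body><div id='skeleton'>loading...</div>"
    for pagelet_id, render in pagelet_sources:
        # Each pagelet is rendered (possibly slowly) and flushed on its own,
        # so the browser can display earlier pagelets while later ones render.
        yield f"<script>inject({pagelet_id!r}, {render()!r})</script>"
    yield "</body></html>"

chunks = list(render_page([
    ("news_feed", lambda: "<ul><li>story</li></ul>"),
    ("chat", lambda: "<div>online friends</div>"),
]))
print(chunks[0])   # the skeleton is sent before any pagelet is rendered
```

The win is overlap: while the server renders the news feed, the browser is already parsing the skeleton and downloading static resources.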
13. HipHop
• A PHP compiler developed by Facebook.
• Created to minimize server resources, since processing time for the PHP language is slow.
• Converts PHP scripts into optimized C++ code.
14. Back End
• The back end consists of the application servers.
• Application servers are responsible for answering all queries and handling all writes into the system.
• Facebook's back-end services are written in a variety of programming languages including C++, Java, Python, and Erlang.
Key back-end components:
• Haystack
• Scribe
• MySQL
• Memcached
• Cassandra
• Storing
15. Haystack
• Haystack is an object store designed for sharing photos on Facebook, where data is written once, read often, never modified, and rarely deleted.
• Efficient storage of billions of photos.
• Highly scalable.
• Uses extensive caching in main memory.
The new photo infrastructure merges the photo serving tier and storage tier into one physical tier. It implements an HTTP-based photo server which stores photos in a generic object store called Haystack. The main requirement for the new tier was to eliminate any unnecessary metadata overhead for photo read operations, so that each read I/O operation reads only actual photo data (instead of file-system metadata). Haystack can be broken down into these functional layers:
• HTTP server
• Photo Store
• Haystack Object Store
• File system
• Storage
16. SCRIBE
• Simple data model
• Scalable distributed logging framework
• Useful for logging a wide array of data
• Built on top of Thrift
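The core Haystack trick from slide 15 — one big append-only volume plus a small in-memory index, so a photo read costs a single seek rather than a file-system metadata lookup — can be sketched as follows (a toy model; names and layout are illustrative, not Facebook's actual needle format):

```python
# Toy model of a Haystack-style store: one big append-only file plus an
# in-memory index of {photo_id: (offset, size)}, so each read is a single
# seek + read instead of a per-photo file-system metadata lookup.
import io

class ToyHaystack:
    def __init__(self):
        self.volume = io.BytesIO()   # stands in for one large on-disk file
        self.index = {}              # photo_id -> (offset, size)

    def put(self, photo_id, data):
        offset = self.volume.seek(0, io.SEEK_END)   # append-only write
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]         # in-memory lookup
        self.volume.seek(offset)                    # one seek...
        return self.volume.read(size)               # ...one read

store = ToyHaystack()
store.put(1, b"jpeg-bytes-1")
store.put(2, b"jpeg-bytes-2")
print(store.get(1))  # b'jpeg-bytes-1'
```

Because photos are written once and never modified, the append-only layout is safe, and deletes can simply drop the index entry.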
17. SCRIBE
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It was designed to be scalable, extensible without client-side modification, and robust to failure of the network or of any specific machine.
Scribe was developed at Facebook and released in 2008 as open source. Scribe servers are arranged in a directed graph, with each server knowing only about the next server in the graph. This network topology allows for adding extra layers of fan-in as a system grows, and for batching messages before sending them between datacenters, without any code that explicitly needs to understand datacenter topology; only a simple configuration is required.
Scribe is designed with reliability in mind, but without requiring heavyweight protocols or expensive disk usage. Scribe spools data to disk on any node to handle intermittent connectivity or node failure, but doesn't sync a log file for every message. This creates the possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure; however, this degree of reliability is suitable for most Facebook use cases.
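The spooling behaviour described above can be sketched in a few lines: a node forwards messages downstream, and when the link fails it holds them locally and drains the spool once the link recovers. All names here are illustrative, and the in-memory deque stands in for Scribe's on-disk spool:

```python
# Toy sketch of Scribe-style local spooling: a node forwards log messages
# downstream; if the downstream link fails, messages are spooled locally and
# re-sent, in order, once the link recovers.
from collections import deque

class ToyScribeNode:
    def __init__(self, send):
        self.send = send          # callable: delivers one message downstream
        self.spool = deque()      # stands in for the on-disk spool file

    def log(self, message):
        self.spool.append(message)
        self.flush()

    def flush(self):
        while self.spool:
            try:
                self.send(self.spool[0])
            except ConnectionError:
                return            # downstream is down; keep messages spooled
            self.spool.popleft()  # delivered: drop from the spool

delivered = []
link_up = [False]

def send(msg):
    if not link_up[0]:
        raise ConnectionError
    delivered.append(msg)

node = ToyScribeNode(send)
node.log("page_view uid=1")       # link down: message stays in the spool
link_up[0] = True
node.log("page_view uid=2")       # link restored: spool drains in order
print(delivered)                  # ['page_view uid=1', 'page_view uid=2']
```

Note the trade-off the slide mentions: the spool survives link failures, but since it is not synced per message, a crash can still lose a small tail of data.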
18. MySQL & Memcached
MySQL
• Facebook utilizes MySQL because of its speed and reliability.
• Thousands of MySQL servers.
• Users are randomly distributed across these servers.
• The relational aspect of the DB is not used: no joins, which would be logically difficult since data is distributed randomly.
• Primarily used as a key-value store.
Memcached
• Protects the main database from high read demand from users.
• Memcached is a memory caching system that is used to speed up dynamic database-driven websites (like Facebook).
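The way Memcached "protects" MySQL is the classic cache-aside pattern: reads hit the cache first and fall through to the database only on a miss, while writes invalidate the cached entry. A minimal sketch (the dict-backed cache and db are stand-ins, not real client APIs):

```python
# Sketch of the cache-aside pattern Memcached enables: reads hit the cache
# first and fall through to the database only on a miss; writes invalidate
# the cached entry so the next read repopulates it.
class CacheAside:
    def __init__(self, db):
        self.db = db        # stands in for a MySQL shard
        self.cache = {}     # stands in for a Memcached cluster
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]         # cache hit: database untouched
        self.db_reads += 1                 # cache miss: go to the database
        value = self.db[key]
        self.cache[key] = value            # populate the cache for next time
        return value

    def set(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)          # invalidate, don't update, the cache

store = CacheAside({"user:1": "Ahmad"})
store.get("user:1")        # miss: reads the database
store.get("user:1")        # hit: served from the cache
print(store.db_reads)      # 1
```

Invalidating rather than updating on write is a common design choice: it avoids writing values into the cache that may never be read again.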
19. Cassandra
Cassandra is a database management system designed to handle large amounts of data spread out across many servers. It powers Facebook's Inbox Search feature and provides a structured key-value store with eventual consistency.
Storing
Apache Hadoop is used in three broad types of systems:
• as a warehouse for web analytics
• as storage for a distributed database
• and for MySQL database backups.
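Spreading keys across many servers, as Cassandra does, is commonly done with consistent hashing: each node owns an arc of a hash ring, and adding or removing a node only moves the keys on its arc. A minimal sketch (not Cassandra's actual partitioner; node names are illustrative):

```python
# Minimal consistent-hash ring, the kind of partitioning scheme systems like
# Cassandra use to spread keys across servers: each node owns an arc of the
# ring, and a key belongs to the first node clockwise from its hash.
import bisect
import hashlib

def _h(s):
    """Deterministic hash of a string onto the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((_h(n), n) for n in nodes)

    def node_for(self, key):
        hashes = [p for p, _ in self.points]
        # First ring point at or past the key's hash, wrapping around.
        i = bisect.bisect(hashes, _h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["db1", "db2", "db3"])
owner = ring.node_for("inbox:user:42")
print(owner in {"db1", "db2", "db3"})  # True: every key maps to some node
```

Real systems add virtual nodes (many ring points per server) to even out the arc sizes, but the lookup logic is the same.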
20. Fault Tolerance
The ability of a system to continue functioning in the event of a partial failure. Though the system continues to function, overall performance may be affected.
Two main reasons for the occurrence of a fault:
1) Hardware or software failure. 2) Unauthorized access.
Why do we need fault tolerance?
Fault tolerance is needed in order to provide three main features to distributed systems:
1) Reliability: focuses on continuous service without any interruptions.
2) Availability: concerned with the readiness of the system.
3) Security: prevents any unauthorized access.
21. Phases in Fault Tolerance
• Implementation of a fault-tolerance technique depends on the design, configuration and application of a distributed system.
• In general, designers have suggested some general principles, which have been followed:
1) Fault Detection
2) Fault Diagnosis
3) Evidence Generation
4) Assessment
5) Recovery
22. Fault Detection
• Constantly monitoring the performance and comparing it with the expected outcome.
• A fault is reported if there is a deviation from the expected outcome.
Fault Diagnosis
• Done to understand the nature of the fault and the possible root cause.
Evidence Generation
• A report is generated based on the outcome of the fault diagnosis.
Assessment
• Understanding the extent of the damage caused by the faulty component.
• Done by examining the flow of information that has passed out from the faulty component to the rest of the system.
• A virtual boundary is created.
Recovery
• Making the system fault-free and restoring it to a consistent state: forward recovery and backward recovery.
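The "monitor and compare against the expected outcome" idea behind fault detection is often implemented with heartbeats: every node is expected to check in within a timeout, and a node whose last heartbeat is too old is flagged as faulty. A minimal sketch (node names and the integer clock are illustrative):

```python
# Sketch of heartbeat-based fault detection: each node is expected to report
# within `timeout` time units, and a node whose last heartbeat is older than
# that deviates from the expected outcome and is reported as faulty.
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}     # node -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def faulty_nodes(self, now):
        """Nodes whose heartbeat deviates from the expected schedule."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

mon = HeartbeatMonitor(timeout=5)
mon.heartbeat("web1", now=0)
mon.heartbeat("web2", now=3)
print(mon.faulty_nodes(now=7))   # ['web1'] -- web1 missed its deadline
```

Detection like this only starts the pipeline above; diagnosis, assessment and recovery then decide what actually failed and how to restore a consistent state.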
23. Fault Tolerance Techniques
Replication
• Creating multiple copies (replicas) of data items and storing them at different sites.
• The main idea is to increase availability: if a node fails at one site, the data can still be accessed from a different site.
• Has its limitations too, such as data consistency and the degree of replication.
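The availability benefit of replication can be sketched directly: writes go to every live replica, and a read succeeds as long as at least one replica holding the data is still up. This toy model deliberately ignores the consistency issues the slide mentions (all names are illustrative):

```python
# Toy sketch of replication for availability: writes go to every live
# replica, and a read succeeds as long as any replica holding the key is up.
class ReplicatedStore:
    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]  # one dict per site
        self.up = [True] * n_replicas

    def write(self, key, value):
        for i, rep in enumerate(self.replicas):
            if self.up[i]:
                rep[key] = value          # propagate the write to every site

    def read(self, key):
        for i, rep in enumerate(self.replicas):
            if self.up[i] and key in rep:
                return rep[key]           # any surviving copy will do
        raise KeyError(key)

store = ReplicatedStore(n_replicas=3)
store.write("profile:1", "Ahmad")
store.up[0] = False                # one site fails...
print(store.read("profile:1"))     # ...the data is still available elsewhere
```

The limitation is visible in the code too: a write issued while a replica is down leaves that replica stale, which is exactly the consistency problem the next slide lists.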
24. LIMITATIONS
Replication
• Difficult to manage as the number of replicas or copies increases.
• Consistency and the degree of replication are major issues.
Checkpointing
• Loss of computation performed since the last checkpoint.
• Checkpoint length, checkpoint frequency and storage are major issues.
25. Concurrency
• A situation in which two or more users access the same record at the same time is called concurrency.
• Concurrency control ensures that correct results are generated from parallel operations.
Why concurrency control?
• Concurrency control is needed because there are a lot of things that can go wrong.
• Each transaction by itself can be correct, but concurrency generates problems such as:
• The lost update problem
• The dirty read problem
• The incorrect summary problem
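The lost update problem, and the mutual exclusion that prevents it, can be shown in a few lines: two threads each read a balance and write back an incremented value. Without a lock around the read-modify-write, one increment can overwrite the other and be lost; with the lock, every update survives:

```python
# The lost update problem and its fix: two threads each perform a
# read-modify-write on a shared balance. The lock makes each increment
# atomic; removing it would allow one thread's write to overwrite the
# other's, silently losing updates.
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        with lock:                 # remove this lock and updates may be lost
            current = balance      # read
            balance = current + 1  # write back

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)   # 20000 -- with the lock, no update is lost
```

Database concurrency control (locking, timestamps, MVCC) generalizes this same idea from a single counter to whole transactions.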
26. RacerD
• Facebook has worked hard on concurrent programming, and has shared its newest debugging tool: RacerD, an open-source race detector.
• RacerD was open-sourced by the company in 2017.
• Dedicated to identifying concurrency bugs in source code.
• RacerD statically analyzes Java code to detect potential concurrency bugs. The analysis does not attempt to prove the absence of concurrency issues; rather, it searches for a high-confidence class of data races.
• RacerD doesn't try to check all code for concurrency issues. There are two signals it looks for:
1. A class/method explicitly annotated as thread-safe
2. Use of a lock via the synchronized keyword
27. Scalability
• Scalability is an attribute that describes the ability of a process, network, software or organization to grow and manage increased demand. A system, business or software that is described as scalable has an advantage because it is more adaptable to the changing needs or demands of its users or clients.
28. Facebook's scaling challenge
Before we get into the details, here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:
• Facebook serves 570 billion page views per month (according to Google Ad Planner).
• There are more photos on Facebook than on all other photo sites combined (including sites like Flickr). More than 3 billion photos are uploaded every month.
• Facebook's systems serve 1.2 million photos per second. This doesn't include the images served by Facebook's CDN.
• More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
• Facebook has more than 30,000 servers (and this number is from last year).
Software that helps Facebook scale:
1. LAMP 2. PHP 3. Linux 4. MySQL 5. Memcached 6. HipHop 7. Haystack 8. BigPipe 9. Cassandra 10. Scribe 11. Hadoop & Hive 12. Thrift
29. Here's a look at Facebook's rapidly growing data center campuses around the world:
• Prineville, Oregon: 2.15 million square feet of data center space in Prineville by 2021.
• Altoona, Iowa: 2.5 million square feet of data center space. The campus features three data centers between 468,000 SF and 496,000 SF. In 2016 the company added a 100,000 SF cold storage facility.
• Clonee, Ireland: 621,000 square feet of data center space.
• Fort Worth, Texas: 2.5 million square feet of data center space.
30. • Los Lunas, New Mexico: announced in Sept. 2016, nearly 3 million square feet of data center space.
• Papillion, Nebraska: announced in March 2018, 2.6 million square feet of space.
• New Albany, Ohio: Facebook is investing $750 million in a 900,000 square foot data center in New Albany, an Ohio town that also hosts a cloud computing data center for Amazon Web Services.
• Henrico County, Virginia: Facebook will spend $750 million to build a 970,000 square foot data center.
• Newton County, Georgia: announced in February 2017, Facebook is investing about $750 million in the facility in Newton County, about 40 miles east of downtown Atlanta, where it is building two data centers spanning 970,000 square feet. The buildings will be fully operational in 2020.
31. Openness
Openness means being open in terms of sharing information so employees know what's going on and, crucially, feel heard. It also means being, and expecting, an openness to different ways of working, different styles, different opinions, and, critically, feedback. It means openness to change.
In distributed systems, openness concerns whether the system can be extended in various ways without disturbing the existing system and services:
• Hardware extensions: adding peripherals, memory, communication interfaces
• Software extensions: operating system features, communication protocols
32. Openness: key aspects
Openness is supported by:
• Public interfaces
• Standardized communication protocols
1. Be personal:
Don't try to be something you're not, or someone else. Be yourself. That includes being vulnerable and honest. If something isn't working, or is worrying you, share it. If you've struggled with something that's relevant and learned a lesson or two along the way, share it. Sharing your own perspective on an event, a trend, or a challenge makes you more relatable and builds trust. Share a story.
"We can tag others and it is a much more elegant way to have a conversation, versus the email conversations that we were having a lot of times." — Stacie Sherer, SVP Corporate Communications, Weight Watchers
33. 2. Internal before external:
Just about everything should be shared internally before it's shared externally. It gives us the opportunity to get feedback, prepare for public feedback, and refine and practice our broader messages before going to the public.
3. Feedback:
Root your programs in feedback and use data to support them wherever possible. Often the feedback helps you figure out what point you're trying to make. And be clear about what kind of feedback you want, where and how you want it shared, and what you've learned or what changes you've made from the feedback.
Feedback also helps everyone get better together. Without it, people can see the problems and become complacent or jaded if they don't think their opinion matters or that their insight can make a difference.
34. Transparency
Transparency is the concealment (hiding) from the user and the application programmer of the separation of the components of a distributed system.
• Access transparency: local and remote resources are accessed in the same way.
• Location transparency: users are unaware of the location of resources.
• Migration transparency: resources can migrate without a name change.
• Replication transparency: users are unaware of the existence of multiple copies of resources.
• Failure transparency: users are unaware of the failure of individual components.
• Concurrency transparency: users are unaware of sharing resources with others.
35. Facebook released its latest Transparency Report, in which the social network shares information on government requests for user data, noting that these requests had increased globally by around 4 percent compared to the first half of 2017, though U.S. government-initiated requests stayed roughly the same. In addition, the company added a new report to accompany the usual Transparency Report, focused on detailing how and why Facebook takes action on enforcing its Community Standards, specifically in the areas of graphic violence, sexual activity, terrorist propaganda, hate speech, spam and fake accounts. Facebook notes that this is very much a work in progress and that it will likely improve its methodology over time.
Government requests for account data increased globally by around 4% compared to the first half of 2017, from 78,890 to 82,341 requests. In the US, government requests remained roughly even at 32,742, of which 62% included a non-disclosure order prohibiting Facebook from notifying the user, up from 57% during the first half of 2017.
36. During the second half of 2017, the number of pieces of content Facebook restricted based on local law fell from 28,036 to 14,294. The previous cycle's figures had been inflated primarily by content restrictions in Mexico related to the video of a tragic school shooting.
There were 46 disruptions of Facebook services in 12 countries in the second half of 2017, compared to 52 disruptions in nine countries in the first half. Facebook says it continues to be deeply concerned by internet disruptions, which prevent people from communicating with family and friends and also threaten the growth of small businesses.
The report also includes data covering the volume and nature of copyright, trademark and counterfeit reports received, as well as the amount of content affected by those reports. During this period, on Facebook and Instagram, 2,776,665 pieces of content were taken down based on 373,934 copyright reports, 222,226 pieces of content based on 61,172 trademark reports, and 459,176 pieces of content based on 28,680 counterfeit reports.