How Facebook Works and Functions: A Complete Approach


Published in: Social Media, Technology

  1. 1. The Technology Behind Facebook Revealed
     Presented by: Prakhar Gethe (CEO and Co-founder, Team Zenith)
  2. 2. Why Facebook Is Giant
     • Facebook has become synonymous with social networking.
     • People have been “facebooking” each other for about 7 years now, making Facebook the most used social network, with over 500 million users worldwide.
     • 50% of active users log on to Facebook on any given day.
     • The average user has 130 friends.
     • People spend over 700 billion minutes per month on Facebook.
     • There are over 900 million objects that people interact with (pages, groups, events and community pages).
     • The average user is connected to 80 community pages, groups and events.
     • The average user creates 90 pieces of content each month.
     • More than 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) are shared each month.
  3. 3. Scaling Challenge Of Facebook
     Here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:
     • Facebook serves 570 billion page views per month (according to Google Ad Planner).
     • There are more photos on Facebook than on all other photo sites combined (including sites like Flickr).
     • More than 3 billion photos are uploaded every month.
     • Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
     • More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
     • Facebook has more than 30,000 servers (and this number is from last year!).
  4. 4. Software That Helps Facebook Scale
     In some ways Facebook is still a LAMP site (kind of), but it has had to change and extend its operation to incorporate a lot of other elements and services, and to modify its approach to existing ones. For example:
     • Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.
     • Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).
     • Facebook uses MySQL, but primarily as a persistent key-value store, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
     • Then there are the custom-written systems, like Haystack, a highly scalable object store used to serve Facebook’s immense number of photos, or Scribe, a logging system that can operate at the scale of Facebook (which is far from trivial).
     But enough of that. Let’s present (some of) the software that Facebook uses to provide us all with the world’s largest social network site.
  5. 5. Technology Used By Facebook
     • For back end: PHP, C++, Java, Python, FBML (developed at Facebook), Erlang, XHP (developed at Facebook)
     • Database: mysql-5.6, Memcached, Haystack, Cassandra, Scribe, Presto
     • For front end: Ajax, JSON, JavaScript, jQuery
  6. 6. For Back-end
     • PHP: PHP is a server-side scripting language designed for web development but also used as a general-purpose programming language. The name stands for "PHP: Hypertext Preprocessor".
     • C++: C++ is a programming language that is general purpose, statically typed, free-form, multi-paradigm and compiled.
     • Java: Java is a computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be recompiled to run on another.
     • Python: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together.
  7. 7.
     • FBML: FBML is a software environment provided by the social networking service Facebook for third-party developers to create their own applications and services that access data in Facebook.
     • Erlang: Erlang is a general-purpose, concurrent, garbage-collected programming language and runtime system. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping, so that code can be changed without stopping a system.
     • XHP: XHP is an augmentation of PHP developed at Facebook to allow XML syntax for the purpose of creating custom and reusable HTML elements.
  8. 8. For Front-end
     • Ajax: Ajax (an acronym for Asynchronous JavaScript and XML) is a group of interrelated web development techniques used on the client side to create asynchronous web applications. With Ajax, web applications can send data to, and retrieve data from, a server asynchronously (in the background) without interfering with the display and behavior of the existing page. Data can be retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not required (JSON is often used instead), and the requests do not need to be asynchronous.
     • JavaScript: JavaScript (JS) is an interpreted computer programming language. As part of web browsers, implementations allow client-side scripts to interact with the user, control the browser, communicate asynchronously, and alter the document content that is displayed. It has also become common in server-side programming, game development and the creation of desktop applications.
  9. 9.
     • jQuery: jQuery is a multi-browser (cf. cross-browser) JavaScript library designed to simplify the client-side scripting of HTML. It was released in January 2006 at BarCamp NYC by John Resig. It is currently developed by a team of developers led by Dave Methvin. Used by over 65% of the 10,000 most visited websites, jQuery is the most popular JavaScript library in use today.
     • JSON: JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and a web application, as an alternative to XML.
     • XML: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all free open standards.
  10. 10. Database Technologies
     • mysql-5.6: MySQL is (as of July 2013) the world's second most widely used open-source relational database management system (RDBMS). It is named after co-founder Michael Widenius's daughter, My. The "SQL" part stands for Structured Query Language.
     • Memcached: Memcached is by now one of the most famous pieces of software on the internet. It’s a distributed memory caching system which Facebook (and a ton of other sites) use as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack). Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world’s largest Memcached installation.
     • Haystack: Haystack is Facebook’s high-performance photo storage/retrieval system (strictly speaking, Haystack is an object store, so it doesn’t necessarily have to store photos). It has a ton of work to do. There are more than 20 billion uploaded photos on Facebook, and each one is saved in four different resolutions, resulting in more than 80 billion stored images. And it’s not just about being able to handle billions of photos; performance is critical. As mentioned previously, Facebook serves around 1.2 million photos per second, a number which doesn’t include images served by Facebook’s CDN. That’s a staggering number.
  11. 11.
     • Cassandra: Cassandra is a distributed storage system with no single point of failure. It’s one of the poster children for the NoSQL movement and has been made open source (it’s even become an Apache project). Facebook uses it for its Inbox search. Other than Facebook, a number of other services use it, for example Digg.
     • Scribe: Scribe is a flexible logging system that Facebook uses for a multitude of purposes internally. It’s been built to be able to handle logging at the scale of Facebook, and automatically handles new logging categories as they show up (Facebook has hundreds).
     • Presto: Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
  12. 12.
     • Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
     • Hadoop Distributed File System (HDFS): To understand how it’s possible to scale a Hadoop cluster to hundreds (and even thousands) of nodes, you have to start with the Hadoop Distributed File System (HDFS). Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing.
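The idea of splitting data into blocks so that map and reduce can run on small subsets can be illustrated with a toy word count. This is a single-process sketch, not Hadoop's API: the function names are made up for illustration, and real HDFS blocks are large byte ranges spread across machines rather than short lists of lines.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Break a list of records into fixed-size blocks (toy stand-in for HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words inside one block only, independent of other blocks."""
    return Counter(word for line in block for word in line.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


lines = ["a b", "b c", "c c", "a"]
blocks = split_into_blocks(lines, 2)           # two blocks of two lines each
partials = [map_block(b) for b in blocks]      # could run on different nodes
totals = reduce(reduce_counts, partials, Counter())
```

Because each `map_block` call touches only its own block, the map step could be handed to as many workers as there are blocks, which is where the scalability described above comes from.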
  13. 13. Other Applications
     • BigPipe: BigPipe is a dynamic web page serving system that Facebook has developed. Facebook uses it to serve each web page in sections (called “pagelets”) for optimal performance. For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes in, and it also gives users a site that works even if some part of it is deactivated or broken.
     • Hadoop and Hive: Hadoop is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses this for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated from within Facebook, and makes it possible to use SQL queries against Hadoop, making it easier for non-programmers to use. Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
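The pagelet idea (independent sections fetched in parallel, with one broken section not taking down the page) can be sketched like this. It is a simplified illustration, not BigPipe itself: the pagelet names and render functions are hypothetical, and the real system streams pagelets to the browser as they become ready rather than collecting them in a dict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_chat():
    return "<div id='chat'>chat</div>"


def render_news_feed():
    return "<div id='feed'>feed</div>"


def render_broken():
    # Simulates a pagelet whose backend is down.
    raise RuntimeError("pagelet backend unavailable")


# Hypothetical pagelet registry: name -> function producing that section's HTML.
PAGELETS = {"chat": render_chat, "feed": render_news_feed, "ads": render_broken}


def render_page(pagelets):
    """Render all pagelets in parallel; a failing pagelet yields an empty section."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = ""  # the rest of the page still works
    return results


page = render_page(PAGELETS)
```

Here the failing `ads` pagelet ends up as an empty string while `chat` and `feed` render normally, mirroring the "works even if some part is broken" property.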
  14. 14.
     • Thrift: Facebook uses several different languages for its different services. PHP is used for the front end, Erlang is used for Chat, and Java and C++ are also used in several places (and perhaps other languages as well). Thrift is an internally developed cross-language framework that ties all of these different languages together, making it possible for them to talk to each other. This has made it much easier for Facebook to keep up its cross-language development. Facebook has made Thrift open source, and support for even more languages has been added.
     • Varnish: Varnish is an HTTP accelerator which can act as a load balancer and also cache content, which can then be served lightning-fast. Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source.
  15. 15.
     • For Chat: an epoll server using Erlang, accessed using Thrift.
     • Message Search: an inverted index stored in HBase.
     • epoll: epoll is an I/O event notification facility. The epoll event mechanism is designed to scale to larger numbers of connections than select and poll.
     • HBase: HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java.
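An inverted index, the structure named above for message search, maps each word to the set of messages containing it. The sketch below keeps the index in memory for illustration; storing it in HBase, as the slide describes, is out of scope here, and the function names and sample messages are assumptions.

```python
from collections import defaultdict


def build_index(messages):
    """Map each lowercase word to the set of message IDs that contain it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in text.lower().split():
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return IDs of messages containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


messages = {1: "lunch at noon", 2: "Lunch tomorrow", 3: "see you at noon"}
index = build_index(messages)
```

A query such as `search(index, "at noon")` only intersects two small sets instead of scanning every message, which is why this layout suits search over large inboxes.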
  16. 16. The Graph API
     • The Graph API presents a simple, consistent view of the Facebook social graph, uniformly representing objects in the graph (e.g., people, photos, events, and pages) and the connections between them (e.g., friend relationships, shared content, and photo tags).
     • RESTful API for accessing data on the Facebook graph.
     • OAuth 2.0 based authentication.
     • JSON modeling of objects and connections.
     • Every object in the social graph has a unique ID. You can access the properties of an object by requesting -
     • Alternatively, people and pages with usernames can be accessed using their username as an ID. All responses are JSON objects.
     • Specifications -
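The object model described above (unique IDs, usernames as alternative IDs, connections between objects, JSON responses) can be mimicked with a small in-memory sketch. Nothing here calls the real Graph API: the IDs, usernames, the `likes` connection, the `{"data": [...]}` wrapper, and the error shape are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical sample data; real IDs, fields, and connection names will differ.
OBJECTS = {
    "1001": {"id": "1001", "type": "user", "name": "Example User"},
    "2002": {"id": "2002", "type": "page", "name": "Example Page"},
}
USERNAMES = {"example.user": "1001"}          # username -> object ID
CONNECTIONS = {("1001", "likes"): ["2002"]}   # (object ID, connection) -> IDs


def resolve(id_or_username):
    """Accept either a numeric-style ID or a username and return the object ID."""
    return USERNAMES.get(id_or_username, id_or_username)


def get_object(id_or_username):
    """Return the object's properties as a JSON string."""
    obj = OBJECTS.get(resolve(id_or_username))
    if obj is None:
        return json.dumps({"error": "not found"})  # assumed error shape
    return json.dumps(obj)


def get_connection(id_or_username, name):
    """Return the objects linked to this one through the named connection."""
    ids = CONNECTIONS.get((resolve(id_or_username), name), [])
    return json.dumps({"data": [OBJECTS[i] for i in ids]})
```

Looking up `"example.user"` and `"1001"` returns the same JSON document, which is the "username as an ID" behaviour the slide describes.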
  17. 17. Facebook Markup Language
     • FBML is an evolved variant of HTML: a subset with some elements removed.
     • It allows Facebook application developers to customize the "look and feel" of their applications, to a limited extent.
     • It is the specification of how to encode content so that Facebook's servers can read and publish it.
     • FBML plays an important role in building applications. It is used to tap into various Facebook elements when building applications.
     • It operates a lot like HTML and makes it easy to do various tasks such as:
       • sending a user an e-mail
       • creating a two-column form
       • embedding Flash video
       • creating a dashboard
       • posting on a wall
       • displaying a header, etc.
  18. 18. Facebook’s New Messages
     • The new Messages interweaves your chats, texts and emails. It’s a central place to control all of your private communication, both on and off Facebook.
     • Simply put, it can be a single inbox for all of your messages, no matter how you choose to send them.
     • An email address
     • SMS from Facebook
     • Chat history
  19. 19. Open Source Software For Mobile
     • xctool: xctool is a replacement for Apple's xcodebuild that makes it easier to build and test iOS and Mac products. It's especially helpful for continuous integration.
     • Rebound: Rebound is a Java library that models spring dynamics. Rebound spring models can be used to create animations that feel natural by introducing real-world physics to your application.
     • Buck: Buck is a build system for Android that encourages the creation of small, reusable modules consisting of code and resources. Because Android applications are predominantly written in Java, Buck also functions as a Java build system.
  20. 20.
     • Powers the Ringmark testing framework at, as donated to the W3C Coremob Community Group.
     • Facebook SDK for iOS: Use the Facebook SDK for iOS to integrate with Facebook, help build engaging social apps, and get more installs.
     • facebook-android-sdk: Use the Facebook SDK for Android to integrate with Facebook, help build engaging social apps, and get more installs.
     • fishhook: fishhook is a very simple library that enables dynamically rebinding symbols in Mach-O binaries running on iOS in the simulator and on devices.
  21. 21. Open Source Software For Web
     • React: React is a JavaScript library for building user interfaces. React uses a declarative paradigm that makes it easier to reason about your application. It's efficient: React computes the minimal set of changes necessary to keep your DOM up to date. And it's flexible: React works with the libraries and frameworks that you already know.
     • HHVM: HipHop VM (HHVM) is an open-source virtual machine designed for executing programs written in PHP. HHVM uses a just-in-time compilation approach to achieve superior performance while maintaining the flexibility that PHP developers are accustomed to. HipHop VM (and before it HPHPc) has realized more than a 5x increase in throughput for Facebook compared with Zend PHP 5.2.
     • Huxley: Huxley is a test-like system for catching visual regressions in web applications. It watches you browse, takes screenshots, and tells you when they change.
  22. 22.
     • Regenerator: Regenerator is a source transformer enabling ECMAScript 6 generator functions (yield) in JavaScript-of-today (ES5). The generator syntax provides a much cleaner alternative to using callbacks when writing asynchronous server-side code.
     • facebook-php-sdk: Use the Facebook SDK for PHP to integrate with Facebook, help build engaging social apps, and get more users.
     Some other tools are:
     • node-haste
     • jstransform
     • rebound
     • Tornado: Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
  23. 23. Open Source Software For Data
     • Presto: Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.
     • mysql-5.6: Facebook's branch of the Oracle MySQL v5.6 database.
     • Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a Scribe server running on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups. If the central Scribe server isn’t available, the local Scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central Scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of Scribe servers.
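The store-and-forward behaviour described for Scribe (spool to local disk while the central server is down, replay once it recovers) can be sketched as follows. This is not Scribe's implementation or API: `CentralServer`, `LocalServer`, the tab-separated line format, and the spool file are assumptions made for the illustration.

```python
import os
import tempfile


class CentralServer:
    """Toy central aggregator that can be switched off to simulate an outage."""

    def __init__(self):
        self.available = True
        self.received = []

    def send(self, messages):
        if not self.available:
            raise ConnectionError("central server unavailable")
        self.received.extend(messages)


class LocalServer:
    """Toy per-node logger: forwards messages, spooling to disk during outages."""

    def __init__(self, central, spool_path):
        self.central = central
        self.spool_path = spool_path

    def log(self, category, message):
        line = f"{category}\t{message}"
        try:
            self.flush_spool()            # replay older messages first, keeping order
            self.central.send([line])
        except ConnectionError:
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(line + "\n")

    def flush_spool(self):
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as spool:
            lines = spool.read().splitlines()
        if lines:
            self.central.send(lines)      # raises if the server is still down
        os.remove(self.spool_path)


spool_file = os.path.join(tempfile.mkdtemp(), "spool.log")
central = CentralServer()
local = LocalServer(central, spool_file)

local.log("web", "m1")
central.available = False   # outage: m2 and m3 are written to local disk
local.log("web", "m2")
local.log("web", "m3")
central.available = True    # recovery: the spool is replayed before m4
local.log("web", "m4")
```

After the run, the central server has received all four messages in their original order and the spool file is gone, so an outage delays log delivery but does not lose it.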
  24. 24. Open Source Software For Infra
     • RocksDB: RocksDB is an embeddable persistent key-value store for fast storage. RocksDB can also be the foundation for a client-server database, but its current focus is on embedded workloads.
     • Open Compute: The Open Compute Project Foundation is a rapidly growing community of engineers around the world whose mission is to design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing.
     • pfff: pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
     • Swift: Swift is an easy-to-use, annotation-based Java library for creating Thrift serializable types and services.
  25. 25.
     • Folly: Folly is an open-source C++ library developed and used at Facebook. It is a library of C++11 components designed with practicality and efficiency in mind. It complements (as opposed to competing against) offerings such as Boost and of course std. In fact, Facebook embarks on defining its own component only when something it needs is either not available or does not meet the needed performance profile.
     • Flashcache: FlashCache is a general-purpose writeback block cache for Linux.
     Some other relevant tools:
     • tornado
     • pyaib
     • watchman
     • hhvm
  26. 26. Gradual releases and dark launches
     • Facebook has a system, Gatekeeper, that lets it run different code for different sets of users.
     • This lets Facebook do gradual releases of new features, activate certain features only for Facebook employees, etc.
     • Gatekeeper also lets Facebook do something called “dark launches”, which means activating elements of a certain feature behind the scenes before it goes live.
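One common way to implement this kind of gating is to hash each user into a stable bucket and enable the feature for buckets below a rollout percentage. The sketch below illustrates that general technique only; it is not Gatekeeper's actual design, and the `bucket` and `is_enabled` names, the SHA-256 bucketing, and the employee override are assumptions.

```python
import hashlib


def bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, rollout_percent, employees=frozenset()):
    """Decide whether this user should run the new code path for a feature."""
    if user_id in employees:
        return True  # employees-only activation, regardless of rollout percentage
    return bucket(feature, user_id) < rollout_percent
```

Because the bucket depends only on the feature name and user ID, a user who sees the feature at 10% keeps seeing it as the rollout grows to 50% and then 100%, which is what makes the release gradual rather than random per request.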
  27. 27.
     • Facebook has also widgetized large portions of its application, meaning that widgets can be written in an appropriate language instead of simply using PHP. These widgets interface with the other parts of the application through the use of internal APIs.
     • Like many other big sites, Facebook uses a content delivery network (CDN) to help serve static content.
     • And then of course there is the huge data center Facebook is building in Oregon to help it scale out with even more servers.
  28. 28. Shocking facts about Facebook
     • A third of all divorce filings in 2011 contained the word "Facebook".
     • Iceland used Facebook to rewrite its constitution!
     • Adding the number 4 to the end of Facebook’s URL will automatically direct you to Mark Zuckerberg’s wall.
     • Facebook pays $500 to anyone who can hack into it!
     • A couple got murdered because they de-friended someone on Facebook.
     • A man was ordered to apologize on Facebook or go to jail.
     Read more at