Advertisement
Advertisement

More Related Content

Advertisement
Advertisement

OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.

  1. SEMINAR PRESENTATION OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE PRESENTED BY SHARATH BASIL KURIAN GUIDED BY DR. SUDHEEP ELAYIDOM
  2. CONTENTS  Introduction  How Facebook works  Facebook’s Architecture  Front End  Back End  Conclusions 2
  3. Scalability is essential for web applications with massive numbers of users. Hundreds of thousands of users accessing continuously lead to slow response time The current architecture of Facebook is very large and consists of many technologies and thousands of servers. In this presentation Facebook’s architecture and how it handles scalability issues is going to be described. INTRODUCTION 3
  4. HOW FACEBOOK WORKS  Facebook needs to handle large amounts of information every second.  Facebook continues to be a LAMP (Linux- Apache-Memcached-PHP) website, but had to change and expand its operation to incorporate other elements and services.  Personalized and implemented systems also exist inside Facebook, such as Haystack. 4
  5. 5 Architecture of Facebook Components of Facebook
  6. FRONT-END  Facebook uses a variety of services, tools, and programming languages to make up its core infrastructure.  Main Components are LAMP PHP & BigPipe HipHop  At the front-end, the servers run a LAMP (Linux, Apache, MySQL, and PHP) stack with Memcache. 6
  7. LAMP 7 • Linux is the operating system used. • It is open source, highly customizable and good for safety. • Apache is also open source and is the most popular web serve Facebook uses Linux with the Apache HTTP server.
  8. PHP & BIGPIPE 8 WHY PHP ? ? • fast iterations • easy to use • Simple to learn • good web programming language with extensive community support. • BigPipe is a dynamic web page system developed by Facebook. • The general idea is to perform pipelining of sections through the implementation of various stages within web browsers and servers.
  9. 9 Accessing Webpage • Browser sends an HTTP request to web server. • Web server parses the request, pulls data from storage tier then formulates an HTML document and sends it to the client in an HTTP response. • HTTP response is transferred over the Internet to browser. • Browser parses the response from web server, constructs a tree representation of the HTML document, and downloads CSS and JavaScript resources referenced by the document. • After downloading resources, browser parses and executes them. • In this scenario, while the web server is processing and creating the HTML document, the browser is idle and when the browser is rendering the html page, the web server remains idle.
  10. Normal Web Page Request 10 Displays Webpage Browser Web Server Data base HTTP request Data HTML Document
  11. 11 PIPELINING • The browser sends an HTTP request to web server. • Server quickly renders a page skeleton containing the tags and a body. The HTTP connection to the browser stays open as the page is not yet finished. • Browser will start rendering the page. • The PHP server process is still executing and building the page lets. Once a page let has been completed it's results are sent to the browser . • Browser injects the html code for the page let received into the correct place. If the page let needs any CSS resources those are also downloaded. • After all page lets have been received the browser starts to load all external files needed by those page lets asynchronously. This results in a system where the page is rendered progressively. The initial page content becomes visible much earlier, which dramatically improves user perceived latency of the page.
  12. HIP-HOP 12 • PHP compiler. • Developed by Facebook • The processing time for PHP language is slow • Created to minimize server resources. • Converts PHP scripts into optimized C++ code. • Some key features of HipHop are: Easy to implement extensions Reduces CPU and Memory usage significantly
  13. 13
  14. BACK-END 14 • At the heart of the application back end are the application servers. • Application servers are responsible for answering all queries and take all the writes into the system. • They also interact with a number of services to achieve this • Employ sharding and caching to process information
  15. HAYSTACK 15 • Haystack is an object store that is designed for sharing photos on Facebook where data is written once, read often, never modified, and rarely deleted and replaced. • traditional file systems perform poorly under huge work- load. • High throughput and low latency. • Fault-tolerant • Efficient storage of billions of photos. • Highly scalable. • Uses extensive caching in its main memory.
  16. Working of Haystack 16 Structure of Haystack Traditional methods store files in directories into several meta files. . No of Disk reads required for one photo becomes high • Translate filename to i-node • Read i-node • Read file Haystack • Stores several meta data into a large file • Stored in main memory • Data Replicated on physical volumes helps against hard disk failures etc
  17. SCRIBE 17 • Scalable distributed logging framework • Useful for logging a wide array of data • Simple data model • Built on top of Thrift
  18. THRIFT 18 • Lightweight software framework for cross- language development • Very quick • • Swaps information between various applications developed in different languages. • Thrift protocol offers serialization between several languages. • Eg: PHP is used for the front-end, Erlang is used for Chat
  19. MYSQL 19 • Fast and reliable • Thousands of MySQL servers • • Users randomly distributed across these servers • Relational aspect of DB is not used • No joins. Logically difficult(Data is distributed randomly) • • Primarily key-value store
  20. MEMCACHED 20 • System uses client-server architecture. • Key-value memory storage system for arbitrary pieces of information that result either from database research or page rendering. • Server maintains a key-value vector association • Client populates this vector and performs research on it. • The server then stores these values in memory therefore decreases reading time. • If a server consumes the available memory, it removes older values
  21. Memory Management using Memcached 21 Memcached based Architecture Protects the main database from high read demands from users
  22. HADOOP & HIVE 22 • Hadoop is ideal for scalable systems with an enormous need to store and process large amounts of information. • Scales horizontally. • Instead of replacing existing storage systems, Hadoop complements them. • This process allows existing systems to serve data in real time or provide transactional interactive business intelligence.
  23. Storing 23 • Apache Hadoop is being used in three broad types of systems: • as a warehouse for web analytics • as storage for a distributed database • and for MySQL database backups.
  24. CONCLUSIONS 24 • Facebook has gained a tremendous amount of popularity in the last years and has become the most successful social networking website ever created. • Facebook realizes the power of social networking and is constantly innovating to keep their service the best in the business.
  25. REFERENCES 25 • Research.fb.com • https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf • https://www.fb.com/barrigas.pdf
  26. QUERIES ?? 26
  27. THANK YOU ! 27
Advertisement