
Architecture Patterns - Open Discussion

Architecture Patterns - Open Discussion
- Scalable System Design
- Facebook Architecture


Architecture Patterns - Open Discussion

  1. 1. Software Architecture Fundamentals. Tung Nguyen (tungnq@fsoft.com.vn), Solution Architect, Jun-2014. Architecture Patterns Open Discussion: Scalable System Design, Facebook Architecture
  2. 2. Architecture Portfolios Scalable System Design Principles
  3. 3. Key Quality Attributes: Availability, Performance, Reliability, Scalability, Manageability, Cost. http://www.aosabook.org/en/distsys.html
  4. 4. "Scalability" is not equivalent to "Raw Performance" Understand environmental workload conditions that the system is design for Understand who is your priority customers Scale out and Not scale up Keep your code modular and simple Don't guess the bottleneck, Measure it Plan for growth What Should We Focus On? 4 http://horicky.blogspot.com/2008/02/scalable-system-design.html
  5. 5. Common Techniques: Server Farm/Cluster (real-time access), Data Partitioning, Map/Reduce (batch parallel processing), Content Delivery Network (static cache), Cache Engine (dynamic cache), Resource Pool, Calculate an approximate result, Filtering at the source, Asynchronous Processing
  6. 6. Architecture Portfolios Scalable System Design Patterns
  7. 7. Commonly Used Scalable System Design Patterns. Load Balancer •A dispatcher determines which worker instance will handle a request based on different policies. Scatter and Gather •A dispatcher multicasts requests to all workers in a pool. Each worker computes a local result and sends it back to the dispatcher, which consolidates them into a single response and sends it back to the client. Result Cache •A dispatcher first looks up whether the request has been made before and tries to find the previous result to return, in order to save the actual execution. Shared Space •All workers monitor information from the shared space and contribute partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached. Pipe and Filter •All workers are connected by pipes across which data flows. MapReduce •Targets batch jobs where disk I/O is the major bottleneck. It uses a distributed file system so that disk I/O can be done in parallel. Bulk Synchronous Parallel •A lock-step execution across all workers, coordinated by a master. Execution Orchestrator •An intelligent scheduler/orchestrator schedules ready-to-run tasks (based on a dependency graph) across a cluster of dumb workers.
  8. 8. Load Balancer (1) •There is a dispatcher that determines which worker instance will handle the request based on different policies. •The application should ideally be "stateless" so any worker instance can handle the request.
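A minimal sketch of the dispatcher idea, assuming stateless workers; the round-robin policy and the worker functions are illustrative, not how any particular load balancer is implemented.

```python
import itertools

class LoadBalancer:
    """Dispatcher that chooses which worker instance handles each request."""
    def __init__(self, workers):
        self._workers = itertools.cycle(workers)  # policy: simple round-robin

    def handle(self, request):
        worker = next(self._workers)
        return worker(request)  # workers are stateless, so any instance will do

def worker_a(req): return f"A handled {req}"
def worker_b(req): return f"B handled {req}"

lb = LoadBalancer([worker_a, worker_b])
print(lb.handle("req-1"))  # A handled req-1
print(lb.handle("req-2"))  # B handled req-2
```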
  9. 9. Load Balancer (2) •Multi-Datacenter Architecture •3-Tier Architecture Rightscale: Cloud_Computing_System_Architecture_Diagrams
  10. 10. Load Balancer (3) •Multi-Tier Architecture with Memcached Rightscale: Cloud_Computing_System_Architecture_Diagrams
  11. 11. Scatter and Gather •This pattern is used in search engines such as Yahoo and Google to handle users' keyword search requests.
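A toy sketch of scatter and gather for a keyword search, assuming each worker owns one shard of the index; the shard data and the consolidation step (a merged, sorted hit list) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(workers, query):
    """Multicast the query to all workers, then consolidate their partial results."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        partials = list(pool.map(lambda worker: worker(query), workers))
    return sorted(set().union(*partials))  # gather: merge the per-shard hit sets

# Each worker searches only its own shard of the index (toy data).
def shard_one(query): return {"doc1", "doc3"} if query == "cat" else set()
def shard_two(query): return {"doc2"} if query == "cat" else set()

print(scatter_gather([shard_one, shard_two], "cat"))  # ['doc1', 'doc2', 'doc3']
```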
  12. 12. Result Cache •The dispatcher first looks up whether the request has been made before and tries to find the previous result to return, in order to save the actual execution.
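A minimal sketch of the result-cache dispatcher; the in-process dict stands in for whatever cache store is actually used, and the worker function is illustrative.

```python
cache = {}  # previously computed results, keyed by request

def dispatch(request, execute):
    """Return a cached result when the same request was seen before; otherwise run the worker."""
    if request in cache:
        return cache[request]      # hit: the actual execution is skipped
    result = execute(request)      # miss: do the expensive work once
    cache[request] = result
    return result

def expensive_search(query):
    return f"results for {query!r}"

print(dispatch("cat", expensive_search))  # computed by the worker
print(dispatch("cat", expensive_search))  # served from the cache
```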
  13. 13. Shared Space 13
  14. 14. Pipe and Filter •It is a very common EAI (Enterprise Application Integration) pattern.
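A small sketch of pipe and filter: each filter consumes the output of the previous one, so stages can be added or reordered independently. The particular filters (strip, drop empties, uppercase) are illustrative.

```python
def pipeline(filters, items):
    """Push the data through the filters in order; each filter's output feeds the next pipe."""
    for f in filters:
        items = f(items)
    return items

def strip(items):      return (s.strip() for s in items)
def drop_empty(items): return (s for s in items if s)
def upper(items):      return (s.upper() for s in items)

print(list(pipeline([strip, drop_empty, upper], ["  a ", "", "b"])))  # ['A', 'B']
```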
  15. 15. Map Reduce •The model targets batch jobs where disk I/O is the major bottleneck. It uses a distributed file system so that disk I/O can be done in parallel.
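A single-process word-count sketch of the MapReduce model; in a real deployment the map calls run in parallel over splits of a distributed file system and the shuffle groups keys across the network, which this toy version only hints at.

```python
from collections import defaultdict

def map_phase(document):
    # map: emit (key, value) pairs; runs once per input split in a real cluster
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # shuffle + reduce: group by key and sum the counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["to be or not to be", "scale out not up"]
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))  # {'to': 2, 'be': 2, 'or': 1, 'not': 2, 'scale': 1, ...}
```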
  16. 16. Bulk Synchronous Parallel 16
  17. 17. Execution Orchestrator 17
  18. 18. Architecture Portfolios Facebook Architecture -Facebook Web Site -Chat Service @ Facebook -Big Data @ Facebook
  19. 19. Facebook Web Site 19
  20. 20. Facebook Web Site Statistics •Users –More than 400 million active users. –50% of active users log in each day. –The average user has 130 friends on the site. •Activity –A user spends an average of 55 minutes per day. –More than 60 million status updates each day. –More than 100 million photos uploaded each day. •Platform –Currently 500K active applications. –About 250 apps have more than 1 million users. –About 60 million users use FB Connect from external web sites each month.
  21. 21. Facebook Web Site Technical Challenges. Challenges •High Concurrency •High Data Volumes •Multilevel Hierarchical Data. OK to live with •Not mission critical •Cached data is fine •Write failures are tolerable
  22. 22. Facebook Web Site Architecture •General Design Principles –Use open source where possible –Unix philosophy •Keep individual components simple yet performant •Combine as necessary •Concentrate on clean interface points –Build everything for scale –Try to minimize failure points –Simplicity, Simplicity, Simplicity.
  23. 23. Facebook Web Site Architecture •Presentation Layer: web servers running PHP, behind a Load Balancer, basically assemble data from the lower layers and present it on the page. •Services: backend services are mainly implemented in C++ (other languages can be used). Very fast. •Cache: Memcached is used to cache almost everything that is needed to produce a page. •Database: an array of MySQL servers is used in an interesting way to store the data. •File System: an internally developed file system, Haystack, is used to store photos.
  24. 24. Facebook Web Site Architecture
  25. 25. Facebook Web Site Technology Stack
  26. 26. Facebook Web Site Web Tier at Facebook •Presentation Layer: PHP •Issues: –High CPU and memory consumption. –Interoperability with C++ is challenging. –The language doesn't encourage good programming in the large. –Initialization cost of each page scales with the size of the code base. [Chart: Relative Performance of Language Runtime (lower is better): Ruby, PHP, Perl, Python, Erlang, C#, Java, C++]
  27. 27. Facebook Web Site Web Tier at Facebook •Optimizing PHP –Op-code optimization –APC improvements •Lazy loading •Cache priming –Custom extensions •Memcache client extension •Serialization format •Logging, Stats Collection, Monitoring •Asynchronous event-handling mechanism
  28. 28. Facebook Web Site Web Tier at Facebook •HipHop –Source code transformer •Transforms PHP into highly optimized C++ and then compiles it using g++ –50% reduction in CPU usage compared with Apache + PHP –Facebook's API tier can serve twice the traffic using 30% less CPU –It embeds a simple web server on top of libevent. https://github.com/facebook/hhvm
  29. 29. Facebook Web Site Web Tier at Facebook •Tornado: Facebook's real-time web framework for Python –Tornado is a relatively simple, non-blocking web server framework written in Python –It is designed to handle thousands of simultaneous connections, making it ideal for real-time web services. http://www.tornadoweb.org/
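A minimal Tornado handler along the lines of the framework's own hello-world example; the endpoint and port are illustrative.

```python
import tornado.ioloop
import tornado.web

class StatusHandler(tornado.web.RequestHandler):
    def get(self):
        # one single-threaded, non-blocking IOLoop serves many simultaneous connections
        self.write({"online": True})

if __name__ == "__main__":
    app = tornado.web.Application([(r"/status", StatusHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```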
  30. 30. Facebook Web Site Web Tier at Facebook •BigPipe: first breaks web pages into multiple chunks called pagelets
  31. 31. Facebook Web Site Memcached •Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. •Alleviates database load •Over 25TB of in-memory caching on more than 800 servers •Multi-gets are used to make the system more efficient •Facebook contributed UDP support and performance enhancements to Memcached. http://memcached.org/
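A cache-aside sketch with a multi-get, using the pymemcache client as one possible Python binding; the key scheme and the load_from_db callback are illustrative assumptions, not Facebook's actual caching code.

```python
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))  # assumes a memcached instance on the default port

def get_profiles(user_ids, load_from_db):
    """Fetch user profiles, hitting memcached first and the database only for misses."""
    keys = {f"user:{uid}": uid for uid in user_ids}
    cached = mc.get_many(list(keys))              # one multi-get instead of N round trips
    missing = [uid for key, uid in keys.items() if key not in cached]
    if missing:
        fresh = load_from_db(missing)             # only the cache misses reach MySQL
        mc.set_many({f"user:{uid}": blob for uid, blob in fresh.items()})
        cached.update({f"user:{uid}": blob for uid, blob in fresh.items()})
    return cached
```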
  32. 32. Facebook Web Site MySQL Database •MySQL –Fast and reliable –Thousands of MySQL servers •Users are randomly distributed across these servers –The relational aspect of the DB is not used •No joins; logically difficult (data is distributed randomly) •Primarily a key-value store –Customizations •Custom partitioning scheme –A global id is assigned to all data •Custom archiving scheme –Based on frequency and recency of data on a per-user basis
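A toy sketch of partitioning key-value data by a global id; the modulo placement and shard count are illustrative only, not Facebook's actual partitioning scheme.

```python
N_SHARDS = 4  # illustrative; the real number of MySQL servers is in the thousands

def shard_for(global_id):
    """Map a global id to one shard; every lookup for that id goes to the same server."""
    return global_id % N_SHARDS

def put(shards, global_id, field, value):
    # each shard is treated as a plain key-value store: no joins, no cross-shard queries
    shards[shard_for(global_id)][(global_id, field)] = value

def get(shards, global_id, field):
    return shards[shard_for(global_id)].get((global_id, field))

shards = [dict() for _ in range(N_SHARDS)]
put(shards, 1337, "name", "Alice")
print(get(shards, 1337, "name"))  # Alice
```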
  33. 33. Facebook Web Site Memcached & MySQL at Facebook
  34. 34. Facebook Web Site Services •Many services written in C++, Python, Java –AdServer –Search –Network Selector –News Feed –Blogfeeds –CSSParser –Mobile –ShareScraper
  35. 35. Facebook Web Site Services •Services Philosophy –Create a service only if required •Real overhead for deployment, maintenance, and a separate code base •Another failure point –Create a common framework and toolset that allows for easier creation of services •Thrift •Scribe •ODS, Alerting, Monitoring service –Use the right language
  36. 36. Facebook Web Site Services •How to communicate between these services? •Thrift –http://incubator.apache.org/thrift/ –Lightweight software framework for cross-language development –Provides an IDL and statically generates code –Supported bindings: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml
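A sketch of what calling such a service from Python might look like, assuming a SearchService module generated by the Thrift compiler; the service name, host, port, and query method are hypothetical.

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from search_service import SearchService  # hypothetical code generated from a Thrift IDL

# open a buffered socket transport speaking the binary wire protocol
transport = TTransport.TBufferedTransport(TSocket.TSocket("search01", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = SearchService.Client(protocol)

transport.open()
hits = client.query("architecture patterns")  # hypothetical method defined in the IDL
transport.close()
```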
  37. 37. Architecture Portfolios Facebook Architecture -Facebook Web Site -Chat Service @ Facebook -Big Data @ Facebook
  38. 38. Chat Service @ Facebook Statistics & Challenges •Statistics –Facebook has 200M active users –800+ million user messages / day –7+ million active channels at peak –1GB+ in / sec at peak –100+ channel machines •System challenges –How does synchronous messaging work on the Web? •"Presence" is hard to scale –Need a system to queue and deliver messages –Millions of connections, mostly idle –Need logging, at least between page loads –Make it work in Facebook's environment
  39. 39. Chat Service @ Facebook System Overview 39
  40. 40. Chat Service @ Facebook System Overview •User Interface –Mix of client-side JavaScript and server-side PHP –Works around transport errors and browser differences –Regular AJAX for sending messages and fetching conversation history –Periodic AJAX polling for the list of online friends –AJAX long-polling for messages (Comet) •Back End –Discrete responsibilities for each service •Communicate via Thrift –Channel (Erlang): message queuing and delivery •Queues messages in each user's "channel" •Delivers messages as responses to long-polling HTTP requests –Presence (C++): aggregates online info in memory (pull-based presence) –Chatlogger (C++): stores conversations between page loads –Web tier (PHP): serves vanilla web requests
  41. 41. Chat Service @ Facebook System Overview 41
  42. 42. Chat Service @ Facebook Message Send 42
  43. 43. Chat Service @ Facebook Channel Server 43
  44. 44. Chat Service @ Facebook Channel Server 44 •One channel per user •Web tier delivers messages for that user •Channel State: short queue of sequenced messages •Long poll for streaming (Comet) –Clients make an HTTP request –Server replies when a message is ready –One active request per browser tab
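A single-process sketch of a per-user channel with a short queue of sequenced messages and a blocking long-poll; the real Erlang channel servers work differently, so this only illustrates the queuing and Comet delivery idea.

```python
import itertools
import threading
from collections import deque

class Channel:
    """One channel per user: a short queue of sequenced messages plus a long-poll wait."""
    def __init__(self, maxlen=100):
        self._seq = itertools.count(1)
        self._queue = deque(maxlen=maxlen)        # only recent messages are kept
        self._ready = threading.Condition()

    def push(self, message):
        with self._ready:
            self._queue.append((next(self._seq), message))
            self._ready.notify_all()               # wake any pending long-poll request

    def long_poll(self, after_seq, timeout=30):
        """Block until a message newer than after_seq arrives (or the timeout expires)."""
        with self._ready:
            self._ready.wait_for(
                lambda: any(seq > after_seq for seq, _ in self._queue), timeout)
            return [(seq, msg) for seq, msg in self._queue if seq > after_seq]
```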
  45. 45. Architecture Portfolios Facebook Architecture -Facebook Web Site -Chat Service @ Facebook -Big Data @ Facebook
  46. 46. Big Data @ Facebook Overview
  47. 47. Big Data @ Facebook Data Infrastructure Overview •Data Infrastructure @ FB built on open source technologies: •Many committers across all the projects •Plans to open source other parts of the data stack •Figure out a model to stay in sync and still work at FB speed
  48. 48. Big Data @ Facebook Life of a tag for data infrastructure
  49. 49. Big Data @ Facebook Life of a tag for data infrastructure •Technology: –Log collection - Scribe –Realtime analytics - Puma –Batch analytics - Hive –Ad hoc analytics - Peregrine –Periodic analytics - Nocron
  50. 50. Big Data @ Facebook Warehouse Architecture •4TB of compressed new data added per day. •135TB of compressed data scanned per day. •7500+ Hive jobs on the production cluster per day. •80K compute hours per day.
  51. 51. Big Data @ Facebook Warehouse Architecture •Scribe –http://developers.facebook.com/scribe/ –Scalable distributed logging framework –Useful for logging a wide array of data –Simple data model –Built on top of Thrift •Hive –A system for managing and querying structured data built on top of Hadoop. •Map Reduce for execution •HDFS for storage •Metadata in an RDBMS –Key Building Principles: •SQL as a familiar data warehousing tool. •Extensibility –Types, Functions, Formats, Scripts. •Scalability and Performance •Interoperability.
  52. 52. Big Data @ Facebook Warehouse Architecture •Hive Architecture
  53. 53. Big Data @ Facebook Warehouse Architecture •Data Flow Architecture
  54. 54. Open Discussion !!! •Memcache at FB - https://www.facebook.com/video/video.php?v=631826881803 •http://www.gargasz.info/facebook-discovering-software-architecture/ •http://www.infoq.com/presentations/Facebook-Software-Stack •http://www.infoq.com/presentations/Scale-at-Facebook •http://www.slideshare.net/AditiTechnologies/facebook-architecture-breaking-it-open •http://stackoverflow.com/questions/3533948/facebook-architecture •http://www.quora.com/Facebook-Engineering/What-is-Facebooks-architecture •http://www.slideshare.net/AditiTechnologies/google-architecture-breaking-it-open
