Facebook Architecture



Aditya Agarwal
Director of Engineering
11/22/2008
Agenda
 1   Architecture Overview

 2   PHP, MySQL, Memcache

 3   Thrift, Scribe, Tools

 4   News Feed Architecture
At a Glance
   The Social Graph
   120M+ active users
   50B+ PVs per month
   10B+ Photos
   1B+ connections
   50K+ Plat...
General Design Principles
    Use open source where possible
▪

          Explore making optimizations where needed
      ...
Architecture Overview

        LAMP           +      Services
        PHP                    AdServer
                    ...
PHP

    Good web programming language
▪

         Extensive library support for web development
     ▪


         Active ...
PHP: What we Learnt
    Tough to scale for large code bases
▪

          Weak typing
      ▪

          Limited opportunit...
PHP: Customizations
    Op-code optimization
▪

    APC improvements
▪

         Lazy loading
     ▪


         Cache prim...
MySQL
    Fast, reliable
▪



    Used primarily as <key,value> store
▪

          Data randomly distributed amongst large...
MySQL: What We Learnt (ing)
    Logical migration of data is very difficult
▪




    Create a large number of logical dbs...
MySQL: What we Learnt (ing)
    Most data access is for recent data
▪

          Optimize table layout for recency
      ▪...
MySQL: Customizations
    No extensive native MySQL modifications
▪



    Custom partitioning scheme
▪

         Global i...
MySQL: Customizations
    Graph based data-access libraries
▪

         Loosely typed objects (nodes) with limited datatyp...
Memcache
    High-Performance, distributed in-memory hash table
▪

    Used to alleviate database load
▪

    Primary form...
Memache: Customizations
    Memache over UDP
▪

         Reduce memory overhead of thousands of TCP connection buffers
   ...
Let’s put this into action
Under the Covers
    Get my profile data
▪

          Fetch from cache, potentially go to my DB (based on user-id)
      ▪...
LAMP is not Perfect
LAMP is not Perfect
    PHP+MySQL+Memcache works for a large class of problems but not for
▪

    everything
         PHP ...
Services Philosophy
    Create a service iff required
▪

          Real overhead for deployment, maintenance, separate cod...
Thrift




High-Level Goal: Enable transparent interaction between these.
                                                ...
Thrift
    Lightweight software framework for cross-language development
▪

    Provide IDL, statically generate code
▪

 ...
Hasn’t this been done before?                      (yes.)


    SOAP
▪

           XML, XML, and more XML
       ▪



    ...
Thrift: Why?
    It’s quick. Really quick.
•


    Less time wasted by individual developers
•
         No duplicated netw...
Scribe
    Scalable distributed logging framework
▪

    Useful for logging a wide array of data
▪

          Search Redol...
Other Tools
    SMC (Service Management Console)
▪

         Centralized configuration
     ▪


         Used to determine...
Other Tools
    ODS
▪

         Used to log and view historical trends for any stats published by service
     ▪


       ...
Open Source
    Thrift
▪

          http://developers.facebook.com/thrift/
      ▪




    Scribe
▪
          http://devel...
NewsFeed – The Goodz
NewsFeed – The Work
                                                                                       friends’
      ...
Search – The Goodz
Search – The Work
                    Thrift


                                        search tier
                       ...
Questions?

More info at www.facebook.com/eblog


Aditya Agarwal
aditya@facebook.com
Qcon
Upcoming SlideShare
Loading in...5
×

Qcon

22,609

Published on

Published in: Technology
5 Comments
121 Likes
Statistics
Notes
  • New version http://www.mediafire.com/download/7gr3eyz6u26b89n

    Facebook hacking 2013 is easier than people think.There are different types of automated tools(both free and paid) that can do it for you, brute force crackers,sql injectors,astreak decoders etc. I personally recommend this free brute force cracker called chinaman facebook hacker. see link to it on my channel.why i recommend this? cuz its the easiest to use and guaranteed to crack your victims password(some softwares give you the wrong password) and it uses an online server to crack passwords
    hack facebook password 2013
    facebook hack password
    facebook hack
    facebook hack password
    facebook hack password free
    facebook hack attack
    facebook hack mediafire
    facebook hack password 2013 free download
    facebook hack no survey
    facebook hack password
    Download From Here http://bit.ly/mighty1
    -------------------------------------------------------------------------

    how to hack facebook
    how to hack facebook passwords
    how to hack facebook accounts
    how to hack facebook passwords for free no download
    how to hack facebook passwords without software
    how to hack facebook accounts for free no download
    how to hack facebook passwords without downloading anything
    how to hack facebook passwords for free
    how to hack facebook accounts for free
    how to hack facebook accounts without downloading
    hack facebook password
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • ODS
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • THANKS ! Good weekend.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • More Than 600 Free Downloadable Old Bucharest Pictures http://www.bucuresti.hi2.ro
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Open source is the right choice to make it. And this large scale applications are very good example to demonstrate the value, power and greatness of open source.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
22,609
On Slideshare
0
From Embeds
0
Number of Embeds
19
Actions
Shares
0
Downloads
1,719
Comments
5
Likes
121
Embeds 0
No embeds

No notes for slide

Qcon

  1. 1. Facebook Architecture Aditya Agarwal Director of Engineering 11/22/2008
  2. 2. Agenda 1 Architecture Overview 2 PHP, MySQL, Memcache 3 Thrift, Scribe, Tools 4 News Feed Architecture
  3. 3. At a Glance The Social Graph 120M+ active users 50B+ PVs per month 10B+ Photos 1B+ connections 50K+ Platform Apps 400K+ App Developers
  4. 4. General Design Principles Use open source where possible ▪ Explore making optimizations where needed ▪ Unix Philosophy ▪ Keep individual components simple yet performant ▪ Combine as necessary ▪ Concentrate on clean interface points ▪ Build everything for scale ▪ Try to minimize failure points ▪ Simplicity, Simplicity, Simplicity! ▪
  5. 5. Architecture Overview LAMP + Services PHP AdServer Search Memcache Network Selector News Feed MySQL Blogfeeds CSSParser php! Mobile ShareScraper !php Thrift Scribe ODS Tools
  6. 6. PHP Good web programming language ▪ Extensive library support for web development ▪ Active developer community ▪ Good for rapid iteration ▪ Dynamically typed, interpreted scripting language ▪
  7. 7. PHP: What we Learnt Tough to scale for large code bases ▪ Weak typing ▪ Limited opportunities for static analysis, code optimizations ▪ Not necessarily optimized for large website use case ▪ E.g. No dynamic reloading of files on web server ▪ Linearly increasing cost per included file ▪ Extension framework is difficult to use ▪
  8. 8. PHP: Customizations Op-code optimization ▪ APC improvements ▪ Lazy loading ▪ Cache priming ▪ More efficient locking semantics for variable cache data ▪ Custom extensions ▪ Memcache client extension ▪ Serialization format ▪ Logging, Stats collection, Monitoring ▪ Asynchronous event-handling mechanism ▪
  9. 9. MySQL Fast, reliable ▪ Used primarily as <key,value> store ▪ Data randomly distributed amongst large set of logical instances ▪ Most data access based on global id ▪ Large number of logical instances spread out across physical nodes ▪ Load balancing at physical node level ▪ No read replication ▪
  10. 10. MySQL: What We Learnt (ing) Logical migration of data is very difficult ▪ Create a large number of logical dbs, load balance them over varying ▪ number of physical nodes No joins in production ▪ Logically difficult (because data is distributed randomly) ▪ Easier to scale CPU on web tier ▪
  11. 11. MySQL: What we Learnt (ing) Most data access is for recent data ▪ Optimize table layout for recency ▪ Archive older data ▪ Don’t ever store non-static data in a central db ▪ CDB makes it easier to perform certain aggregated queries ▪ Will not scale ▪ Use services or memcache for global queries ▪ E.g.: What are the most popular groups in my network ▪
  12. 12. MySQL: Customizations No extensive native MySQL modifications ▪ Custom partitioning scheme ▪ Global id assigned to all data ▪ Custom archiving scheme ▪ Based on frequency and recency of data on a per-user basis ▪ Extended Query Engine for cross-data center replication, cache ▪ consistency
  13. 13. MySQL: Customizations Graph based data-access libraries ▪ Loosely typed objects (nodes) with limited datatypes (int, varchar, text) ▪ Replicated connections (edges) ▪ Analogous to distributed foreign keys ▪ Some data collocated ▪ Example: User profile data and all of user’s connections ▪ Most data distributed randomly ▪
  14. 14. Memcache High-Performance, distributed in-memory hash table ▪ Used to alleviate database load ▪ Primary form of caching ▪ Over 25TB of in-memory cache ▪ Average latency < 200 micro-seconds ▪ Cache serialized PHP data structures ▪ Lots and lots of multi-gets to retrieve data spanning across graph edges ▪
  15. 15. Memache: Customizations Memache over UDP ▪ Reduce memory overhead of thousands of TCP connection buffers ▪ Application-level flow control (optimization for multi-gets) ▪ On demand aggregation of per-thread stats ▪ Reduces global lock contention ▪ Multiple Kernel changes to optimize for Memcache usage ▪ Distributing network interrupt handling over multiple cores ▪ Opportunistic polling of network interface ▪
  16. 16. Let’s put this into action
  17. 17. Under the Covers Get my profile data ▪ Fetch from cache, potentially go to my DB (based on user-id) ▪ Get friend connections ▪ Cache, if not DB (based on user-id) ▪ In parallel, fetch last 10 photo album ids for each of my friends ▪ Multi-get; individual cache misses fetches data from db (based on photo- ▪ album id) Fetch data for most recent photo albums in parallel ▪ Execute page-specific rendering logic in PHP ▪ Return data, make user happy ▪
  18. 18. LAMP is not Perfect
  19. 19. LAMP is not Perfect PHP+MySQL+Memcache works for a large class of problems but not for ▪ everything PHP is stateless ▪ PHP not the fastest executing language ▪ All data is remote ▪ Reasons why services are written ▪ Store code closer to data ▪ Compiled environment is more efficient ▪ Certain functionality only present in other languages ▪
  20. 20. Services Philosophy Create a service iff required ▪ Real overhead for deployment, maintenance, separate code-base ▪ Another failure point ▪ Create a common framework and toolset that will allow for easier ▪ creation of services Thrift ▪ Scribe ▪ ODS, Alerting service, Monitoring service ▪ Use the right language, library and tool for the task ▪
  21. 21. Thrift High-Level Goal: Enable transparent interaction between these. …and some others too.
  22. 22. Thrift Lightweight software framework for cross-language development ▪ Provide IDL, statically generate code ▪ Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell ▪ etc. Transports: Simple Interface to I/O ▪ Tsocket, TFileTransport, TMemoryBuffer ▪ Protocols: Serialization Format ▪ TBinaryProtocol, TJSONProtocol ▪ Servers ▪ Non-Blocking, Async, Single Threaded, Multi-threaded ▪
  23. 23. Hasn’t this been done before? (yes.) SOAP ▪ XML, XML, and more XML ▪ CORBA ▪ Bloated? Remote bindings? ▪ COM ▪ Face-Win32ClientSoftware.dll-Book ▪ Pillar ▪ Slick! But no versioning/abstraction. ▪ Protocol Buffers ▪
  24. 24. Thrift: Why? It’s quick. Really quick. • Less time wasted by individual developers • No duplicated networking and protocol code • Less time dealing with boilerplate stuff • Write your client and server in about 5 minutes • Division of labor • Work on high-performance servers separate from applications • Common toolkit • Fosters code reuse and shared tools •
  25. 25. Scribe Scalable distributed logging framework ▪ Useful for logging a wide array of data ▪ Search Redologs ▪ Powers news feed publishing ▪ A/B testing data ▪ Weak Reliability ▪ More reliable than traditional logging but not suitable for database ▪ transactions. Simple data model ▪ Built on top of Thrift ▪
  26. 26. Other Tools SMC (Service Management Console) ▪ Centralized configuration ▪ Used to determine logical service -> physical node mapping ▪
  27. 27. Other Tools ODS ▪ Used to log and view historical trends for any stats published by service ▪ Useful for service monitoring, alerting ▪
  28. 28. Open Source Thrift ▪ http://developers.facebook.com/thrift/ ▪ Scribe ▪ http://developers.facebook.com/scribe/ ▪ PHPEmbed ▪ http://developers.facebook.com/phpembed/ ▪ More good stuff ▪ http://developers.facebook.com/opensource.php ▪
  29. 29. NewsFeed – The Goodz
  30. 30. NewsFeed – The Work friends’ actions Leaf Server web tier Html Leaf Server Actions (Scribe) PHP home.php Leaf Server user Leaf Server return view state view aggregators state storage friends’ actions? aggregating... - Most arrows indicate thrift calls ranking...
  31. 31. Search – The Goodz
  32. 32. Search – The Work Thrift search tier slave slave master slave index index index index user web tier Scribe live db PHP change index logs files Indexing service DB Tier Updates
  33. 33. Questions? More info at www.facebook.com/eblog Aditya Agarwal aditya@facebook.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×