SlideShare a Scribd company logo
1 of 34
Download to read offline
Facebook Architecture



Aditya Agarwal
Director of Engineering
11/22/2008
Agenda
 1   Architecture Overview

 2   PHP, MySQL, Memcache

 3   Thrift, Scribe, Tools

 4   News Feed Architecture
At a Glance
   The Social Graph
   120M+ active users
   50B+ PVs per month
   10B+ Photos
   1B+ connections
   50K+ Platform Apps
   400K+ App Developers
General Design Principles
    Use open source where possible
▪

          Explore making optimizations where needed
      ▪


    Unix Philosophy
▪

          Keep individual components simple yet performant
      ▪


          Combine as necessary
      ▪


          Concentrate on clean interface points
      ▪


    Build everything for scale
▪

    Try to minimize failure points
▪


    Simplicity, Simplicity, Simplicity!
▪
Architecture Overview

        LAMP           +      Services
        PHP                    AdServer
                               Search
        Memcache               Network Selector
                               News Feed
        MySQL                  Blogfeeds
                               CSSParser
              php!             Mobile
                               ShareScraper


                                     !php
                     Thrift
                     Scribe
                     ODS
                     Tools
PHP

    Good web programming language
▪

         Extensive library support for web development
     ▪


         Active developer community
     ▪




    Good for rapid iteration
▪

         Dynamically typed, interpreted scripting language
     ▪
PHP: What we Learnt
    Tough to scale for large code bases
▪

          Weak typing
      ▪

          Limited opportunities for static analysis, code optimizations
      ▪




    Not necessarily optimized for large website use case
▪

          E.g. No dynamic reloading of files on web server
      ▪




    Linearly increasing cost per included file
▪



    Extension framework is difficult to use
▪
PHP: Customizations
    Op-code optimization
▪

    APC improvements
▪

         Lazy loading
     ▪


         Cache priming
     ▪


         More efficient locking semantics for variable cache data
     ▪


    Custom extensions
▪

         Memcache client extension
     ▪


         Serialization format
     ▪


         Logging, Stats collection, Monitoring
     ▪


         Asynchronous event-handling mechanism
     ▪
MySQL
    Fast, reliable
▪



    Used primarily as <key,value> store
▪

          Data randomly distributed amongst large set of logical instances
      ▪

          Most data access based on global id
      ▪




    Large number of logical instances spread out across physical nodes
▪

          Load balancing at physical node level
      ▪




    No read replication
▪
MySQL: What We Learnt (ing)
    Logical migration of data is very difficult
▪




    Create a large number of logical dbs, load balance them over varying
▪

    number of physical nodes


    No joins in production
▪

          Logically difficult (because data is distributed randomly)
      ▪




    Easier to scale CPU on web tier
▪
MySQL: What we Learnt (ing)
    Most data access is for recent data
▪

          Optimize table layout for recency
      ▪

          Archive older data
      ▪




    Don’t ever store non-static data in a central db
▪

          CDB makes it easier to perform certain aggregated queries
      ▪

          Will not scale
      ▪




    Use services or memcache for global queries
▪

          E.g.: What are the most popular groups in my network
      ▪
MySQL: Customizations
    No extensive native MySQL modifications
▪



    Custom partitioning scheme
▪

         Global id assigned to all data
     ▪




    Custom archiving scheme
▪

         Based on frequency and recency of data on a per-user basis
     ▪




    Extended Query Engine for cross-data center replication, cache
▪
    consistency
MySQL: Customizations
    Graph based data-access libraries
▪

         Loosely typed objects (nodes) with limited datatypes (int, varchar, text)
     ▪


         Replicated connections (edges)
     ▪


         Analogous to distributed foreign keys
     ▪




    Some data collocated
▪

         Example: User profile data and all of user’s connections
     ▪




    Most data distributed randomly
▪
Memcache
    High-Performance, distributed in-memory hash table
▪

    Used to alleviate database load
▪

    Primary form of caching
▪

    Over 25TB of in-memory cache
▪


    Average latency < 200 micro-seconds
▪

    Cache serialized PHP data structures
▪


    Lots and lots of multi-gets to retrieve data spanning across graph edges
▪
Memache: Customizations
    Memache over UDP
▪

         Reduce memory overhead of thousands of TCP connection buffers
     ▪


         Application-level flow control (optimization for multi-gets)
     ▪




    On demand aggregation of per-thread stats
▪

         Reduces global lock contention
     ▪




    Multiple Kernel changes to optimize for Memcache usage
▪

         Distributing network interrupt handling over multiple cores
     ▪


         Opportunistic polling of network interface
     ▪
Let’s put this into action
Under the Covers
    Get my profile data
▪

          Fetch from cache, potentially go to my DB (based on user-id)
      ▪


    Get friend connections
▪

          Cache, if not DB (based on user-id)
      ▪


    In parallel, fetch last 10 photo album ids for each of my friends
▪

          Multi-get; individual cache misses fetches data from db (based on photo-
      ▪

          album id)

    Fetch data for most recent photo albums in parallel
▪


    Execute page-specific rendering logic in PHP
▪

    Return data, make user happy
▪
LAMP is not Perfect
LAMP is not Perfect
    PHP+MySQL+Memcache works for a large class of problems but not for
▪

    everything
         PHP is stateless
     ▪


         PHP not the fastest executing language
     ▪


         All data is remote
     ▪


    Reasons why services are written
▪

         Store code closer to data
     ▪


         Compiled environment is more efficient
     ▪


         Certain functionality only present in other languages
     ▪
Services Philosophy
    Create a service iff required
▪

          Real overhead for deployment, maintenance, separate code-base
      ▪


          Another failure point
      ▪


    Create a common framework and toolset that will allow for easier
▪

    creation of services
          Thrift
      ▪


          Scribe
      ▪


          ODS, Alerting service, Monitoring service
      ▪


    Use the right language, library and tool for the task
▪
Thrift




High-Level Goal: Enable transparent interaction between these.
                                                                 …and some others too.
Thrift
    Lightweight software framework for cross-language development
▪

    Provide IDL, statically generate code
▪

    Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell
▪

    etc.
    Transports: Simple Interface to I/O
▪

         Tsocket, TFileTransport, TMemoryBuffer
     ▪


    Protocols: Serialization Format
▪

         TBinaryProtocol, TJSONProtocol
     ▪


    Servers
▪

         Non-Blocking, Async, Single Threaded, Multi-threaded
     ▪
Hasn’t this been done before?                      (yes.)


    SOAP
▪

           XML, XML, and more XML
       ▪



    CORBA
▪

           Bloated? Remote bindings?
       ▪



    COM
▪

           Face-Win32ClientSoftware.dll-Book
       ▪



    Pillar
▪

           Slick! But no versioning/abstraction.
       ▪



    Protocol Buffers
▪
Thrift: Why?
    It’s quick. Really quick.
•


    Less time wasted by individual developers
•
         No duplicated networking and protocol code
     •

         Less time dealing with boilerplate stuff
     •

         Write your client and server in about 5 minutes
     •




    Division of labor
•
         Work on high-performance servers separate from applications
     •



    Common toolkit
•
         Fosters code reuse and shared tools
     •
Scribe
    Scalable distributed logging framework
▪

    Useful for logging a wide array of data
▪

          Search Redologs
      ▪


          Powers news feed publishing
      ▪


          A/B testing data
      ▪


    Weak Reliability
▪

          More reliable than traditional logging but not suitable for database
      ▪

          transactions.

    Simple data model
▪

    Built on top of Thrift
▪
Other Tools
    SMC (Service Management Console)
▪

         Centralized configuration
     ▪


         Used to determine logical service -> physical node mapping
     ▪
Other Tools
    ODS
▪

         Used to log and view historical trends for any stats published by service
     ▪


         Useful for service monitoring, alerting
     ▪
Open Source
    Thrift
▪

          http://developers.facebook.com/thrift/
      ▪




    Scribe
▪
          http://developers.facebook.com/scribe/
      ▪




    PHPEmbed
▪

          http://developers.facebook.com/phpembed/
      ▪




    More good stuff
▪

          http://developers.facebook.com/opensource.php
      ▪
NewsFeed – The Goodz
NewsFeed – The Work
                                                                                       friends’
                                                                                       actions
                                                                         Leaf Server
                                      web tier
                        Html
                                                                         Leaf Server
                                                     Actions (Scribe)
                                        PHP
                     home.php                                            Leaf Server
     user

                                                                         Leaf Server
                                          return
                                        view state



                                       view                             aggregators
                                       state
                                      storage                                             friends’
                                                                                          actions?
                                                                         aggregating...
- Most arrows indicate thrift calls                                      ranking...
Search – The Goodz
Search – The Work
                    Thrift


                                        search tier
                                         slave             slave   master     slave
                                        index             index    index    index
user
         web tier
                      Scribe     live              db
        PHP                    change            index
                                logs              files




                                           Indexing service




                                           DB Tier
               Updates
Questions?

More info at www.facebook.com/eblog


Aditya Agarwal
aditya@facebook.com

More Related Content

What's hot

Technology stack of social networks [MTS]
Technology stack of social networks [MTS]Technology stack of social networks [MTS]
Technology stack of social networks [MTS]philmaweb
 
7 Stages of Scaling Web Applications
7 Stages of Scaling Web Applications7 Stages of Scaling Web Applications
7 Stages of Scaling Web ApplicationsDavid Mitzenmacher
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...Robert Mederer
 
Yahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsYahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsConSanFrancisco123
 
Twitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsTwitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsAditya Rao
 
Java Web Start - How Zhara POS Works
Java Web Start - How Zhara POS WorksJava Web Start - How Zhara POS Works
Java Web Start - How Zhara POS WorksYohan Liyanage
 
Open Ap Is State Of The Market
Open Ap Is State Of The MarketOpen Ap Is State Of The Market
Open Ap Is State Of The MarketConSanFrancisco123
 
Performance metrics for a social network
Performance metrics for a social networkPerformance metrics for a social network
Performance metrics for a social networkThierry Schellenbach
 
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...George White
 
Flickr Architecture Presentation
Flickr Architecture PresentationFlickr Architecture Presentation
Flickr Architecture Presentationeraz
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesfeng1212
 
Debugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VIIDebugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VIIMartin Leyrer
 

What's hot (18)

Technology stack of social networks [MTS]
Technology stack of social networks [MTS]Technology stack of social networks [MTS]
Technology stack of social networks [MTS]
 
7 Stages of Scaling Web Applications
7 Stages of Scaling Web Applications7 Stages of Scaling Web Applications
7 Stages of Scaling Web Applications
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
 
Yahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsYahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely Bedfellows
 
Twitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsTwitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessons
 
Java Web Start - How Zhara POS Works
Java Web Start - How Zhara POS WorksJava Web Start - How Zhara POS Works
Java Web Start - How Zhara POS Works
 
Open Ap Is State Of The Market
Open Ap Is State Of The MarketOpen Ap Is State Of The Market
Open Ap Is State Of The Market
 
SPDY Talk
SPDY TalkSPDY Talk
SPDY Talk
 
Performance metrics for a social network
Performance metrics for a social networkPerformance metrics for a social network
Performance metrics for a social network
 
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...
Imagine 2014: The Devil is in the Details How to Optimize Magento Hosting to ...
 
Flickr Architecture Presentation
Flickr Architecture PresentationFlickr Architecture Presentation
Flickr Architecture Presentation
 
Fashiolista
FashiolistaFashiolista
Fashiolista
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
MongoDB
MongoDBMongoDB
MongoDB
 
RESTful Rails2
RESTful Rails2RESTful Rails2
RESTful Rails2
 
Debugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VIIDebugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VII
 
RBS in SharePoint
RBS in SharePointRBS in SharePoint
RBS in SharePoint
 
Life in the Fast Lane: Full Speed XPages!, #dd13
Life in the Fast Lane: Full Speed XPages!, #dd13Life in the Fast Lane: Full Speed XPages!, #dd13
Life in the Fast Lane: Full Speed XPages!, #dd13
 

Similar to Qcon

Facebook architecture
Facebook architectureFacebook architecture
Facebook architecturedrewz lin
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecturemysqlops
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构yiditushe
 
Caching and tuning fun for high scalability @ FrOSCon 2011
Caching and tuning fun for high scalability @ FrOSCon 2011Caching and tuning fun for high scalability @ FrOSCon 2011
Caching and tuning fun for high scalability @ FrOSCon 2011Wim Godden
 
Extending The My Sql Data Landscape
Extending The My Sql Data LandscapeExtending The My Sql Data Landscape
Extending The My Sql Data LandscapeRonald Bradford
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
Memcached, presented to LCA2010
Memcached, presented to LCA2010Memcached, presented to LCA2010
Memcached, presented to LCA2010Mark Atwood
 
From One to a Cluster
From One to a ClusterFrom One to a Cluster
From One to a Clusterguestd34230
 
Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Wim Godden
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At CraigslistMySQLConference
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...confluent
 
Caching and tuning fun for high scalability @ FOSDEM 2012
Caching and tuning fun for high scalability @ FOSDEM 2012Caching and tuning fun for high scalability @ FOSDEM 2012
Caching and tuning fun for high scalability @ FOSDEM 2012Wim Godden
 
ReST Vs SOA(P) ... Yawn
ReST Vs SOA(P) ... YawnReST Vs SOA(P) ... Yawn
ReST Vs SOA(P) ... Yawnozten
 
High Performance Php My Sql Scaling Techniques
High Performance Php My Sql Scaling TechniquesHigh Performance Php My Sql Scaling Techniques
High Performance Php My Sql Scaling TechniquesZendCon
 

Similar to Qcon (20)

20080611accel
20080611accel20080611accel
20080611accel
 
20081022cca
20081022cca20081022cca
20081022cca
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构
 
20080528dublinpt1
20080528dublinpt120080528dublinpt1
20080528dublinpt1
 
The Web Scale
The Web ScaleThe Web Scale
The Web Scale
 
Rest Vs Soap Yawn2289
Rest Vs Soap Yawn2289Rest Vs Soap Yawn2289
Rest Vs Soap Yawn2289
 
Caching and tuning fun for high scalability @ FrOSCon 2011
Caching and tuning fun for high scalability @ FrOSCon 2011Caching and tuning fun for high scalability @ FrOSCon 2011
Caching and tuning fun for high scalability @ FrOSCon 2011
 
Extending The My Sql Data Landscape
Extending The My Sql Data LandscapeExtending The My Sql Data Landscape
Extending The My Sql Data Landscape
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Memcached, presented to LCA2010
Memcached, presented to LCA2010Memcached, presented to LCA2010
Memcached, presented to LCA2010
 
From One to a Cluster
From One to a ClusterFrom One to a Cluster
From One to a Cluster
 
Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011Caching and tuning fun for high scalability @ phpBenelux 2011
Caching and tuning fun for high scalability @ phpBenelux 2011
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At Craigslist
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
 
Caching and tuning fun for high scalability @ FOSDEM 2012
Caching and tuning fun for high scalability @ FOSDEM 2012Caching and tuning fun for high scalability @ FOSDEM 2012
Caching and tuning fun for high scalability @ FOSDEM 2012
 
ReST Vs SOA(P) ... Yawn
ReST Vs SOA(P) ... YawnReST Vs SOA(P) ... Yawn
ReST Vs SOA(P) ... Yawn
 
High Performance Php My Sql Scaling Techniques
High Performance Php My Sql Scaling TechniquesHigh Performance Php My Sql Scaling Techniques
High Performance Php My Sql Scaling Techniques
 

Recently uploaded

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Qcon

  • 1.
  • 3. Agenda 1 Architecture Overview 2 PHP, MySQL, Memcache 3 Thrift, Scribe, Tools 4 News Feed Architecture
  • 4. At a Glance The Social Graph 120M+ active users 50B+ PVs per month 10B+ Photos 1B+ connections 50K+ Platform Apps 400K+ App Developers
  • 5. General Design Principles Use open source where possible ▪ Explore making optimizations where needed ▪ Unix Philosophy ▪ Keep individual components simple yet performant ▪ Combine as necessary ▪ Concentrate on clean interface points ▪ Build everything for scale ▪ Try to minimize failure points ▪ Simplicity, Simplicity, Simplicity! ▪
  • 6. Architecture Overview LAMP + Services PHP AdServer Search Memcache Network Selector News Feed MySQL Blogfeeds CSSParser php! Mobile ShareScraper !php Thrift Scribe ODS Tools
  • 7. PHP Good web programming language ▪ Extensive library support for web development ▪ Active developer community ▪ Good for rapid iteration ▪ Dynamically typed, interpreted scripting language ▪
  • 8. PHP: What we Learnt Tough to scale for large code bases ▪ Weak typing ▪ Limited opportunities for static analysis, code optimizations ▪ Not necessarily optimized for large website use case ▪ E.g. No dynamic reloading of files on web server ▪ Linearly increasing cost per included file ▪ Extension framework is difficult to use ▪
  • 9. PHP: Customizations Op-code optimization ▪ APC improvements ▪ Lazy loading ▪ Cache priming ▪ More efficient locking semantics for variable cache data ▪ Custom extensions ▪ Memcache client extension ▪ Serialization format ▪ Logging, Stats collection, Monitoring ▪ Asynchronous event-handling mechanism ▪
  • 10. MySQL Fast, reliable ▪ Used primarily as <key,value> store ▪ Data randomly distributed amongst large set of logical instances ▪ Most data access based on global id ▪ Large number of logical instances spread out across physical nodes ▪ Load balancing at physical node level ▪ No read replication ▪
  • 11. MySQL: What We Learnt (ing) Logical migration of data is very difficult ▪ Create a large number of logical dbs, load balance them over varying ▪ number of physical nodes No joins in production ▪ Logically difficult (because data is distributed randomly) ▪ Easier to scale CPU on web tier ▪
  • 12. MySQL: What we Learnt (ing) Most data access is for recent data ▪ Optimize table layout for recency ▪ Archive older data ▪ Don’t ever store non-static data in a central db ▪ CDB makes it easier to perform certain aggregated queries ▪ Will not scale ▪ Use services or memcache for global queries ▪ E.g.: What are the most popular groups in my network ▪
  • 13. MySQL: Customizations No extensive native MySQL modifications ▪ Custom partitioning scheme ▪ Global id assigned to all data ▪ Custom archiving scheme ▪ Based on frequency and recency of data on a per-user basis ▪ Extended Query Engine for cross-data center replication, cache ▪ consistency
  • 14. MySQL: Customizations Graph based data-access libraries ▪ Loosely typed objects (nodes) with limited datatypes (int, varchar, text) ▪ Replicated connections (edges) ▪ Analogous to distributed foreign keys ▪ Some data collocated ▪ Example: User profile data and all of user’s connections ▪ Most data distributed randomly ▪
  • 15. Memcache High-Performance, distributed in-memory hash table ▪ Used to alleviate database load ▪ Primary form of caching ▪ Over 25TB of in-memory cache ▪ Average latency < 200 micro-seconds ▪ Cache serialized PHP data structures ▪ Lots and lots of multi-gets to retrieve data spanning across graph edges ▪
  • 16. Memache: Customizations Memache over UDP ▪ Reduce memory overhead of thousands of TCP connection buffers ▪ Application-level flow control (optimization for multi-gets) ▪ On demand aggregation of per-thread stats ▪ Reduces global lock contention ▪ Multiple Kernel changes to optimize for Memcache usage ▪ Distributing network interrupt handling over multiple cores ▪ Opportunistic polling of network interface ▪
  • 17. Let’s put this into action
  • 18. Under the Covers Get my profile data ▪ Fetch from cache, potentially go to my DB (based on user-id) ▪ Get friend connections ▪ Cache, if not DB (based on user-id) ▪ In parallel, fetch last 10 photo album ids for each of my friends ▪ Multi-get; individual cache misses fetches data from db (based on photo- ▪ album id) Fetch data for most recent photo albums in parallel ▪ Execute page-specific rendering logic in PHP ▪ Return data, make user happy ▪
  • 19. LAMP is not Perfect
  • 20. LAMP is not Perfect PHP+MySQL+Memcache works for a large class of problems but not for ▪ everything PHP is stateless ▪ PHP not the fastest executing language ▪ All data is remote ▪ Reasons why services are written ▪ Store code closer to data ▪ Compiled environment is more efficient ▪ Certain functionality only present in other languages ▪
  • 21. Services Philosophy Create a service iff required ▪ Real overhead for deployment, maintenance, separate code-base ▪ Another failure point ▪ Create a common framework and toolset that will allow for easier ▪ creation of services Thrift ▪ Scribe ▪ ODS, Alerting service, Monitoring service ▪ Use the right language, library and tool for the task ▪
  • 22. Thrift High-Level Goal: Enable transparent interaction between these. …and some others too.
  • 23. Thrift Lightweight software framework for cross-language development ▪ Provide IDL, statically generate code ▪ Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell ▪ etc. Transports: Simple Interface to I/O ▪ Tsocket, TFileTransport, TMemoryBuffer ▪ Protocols: Serialization Format ▪ TBinaryProtocol, TJSONProtocol ▪ Servers ▪ Non-Blocking, Async, Single Threaded, Multi-threaded ▪
  • 24. Hasn’t this been done before? (yes.) SOAP ▪ XML, XML, and more XML ▪ CORBA ▪ Bloated? Remote bindings? ▪ COM ▪ Face-Win32ClientSoftware.dll-Book ▪ Pillar ▪ Slick! But no versioning/abstraction. ▪ Protocol Buffers ▪
  • 25. Thrift: Why? It’s quick. Really quick. • Less time wasted by individual developers • No duplicated networking and protocol code • Less time dealing with boilerplate stuff • Write your client and server in about 5 minutes • Division of labor • Work on high-performance servers separate from applications • Common toolkit • Fosters code reuse and shared tools •
  • 26. Scribe Scalable distributed logging framework ▪ Useful for logging a wide array of data ▪ Search Redologs ▪ Powers news feed publishing ▪ A/B testing data ▪ Weak Reliability ▪ More reliable than traditional logging but not suitable for database ▪ transactions. Simple data model ▪ Built on top of Thrift ▪
  • 27. Other Tools SMC (Service Management Console) ▪ Centralized configuration ▪ Used to determine logical service -> physical node mapping ▪
  • 28. Other Tools ODS ▪ Used to log and view historical trends for any stats published by service ▪ Useful for service monitoring, alerting ▪
  • 29. Open Source Thrift ▪ http://developers.facebook.com/thrift/ ▪ Scribe ▪ http://developers.facebook.com/scribe/ ▪ PHPEmbed ▪ http://developers.facebook.com/phpembed/ ▪ More good stuff ▪ http://developers.facebook.com/opensource.php ▪
  • 31. NewsFeed – The Work friends’ actions Leaf Server web tier Html Leaf Server Actions (Scribe) PHP home.php Leaf Server user Leaf Server return view state view aggregators state storage friends’ actions? aggregating... - Most arrows indicate thrift calls ranking...
  • 32. Search – The Goodz
  • 33. Search – The Work Thrift search tier slave slave master slave index index index index user web tier Scribe live db PHP change index logs files Indexing service DB Tier Updates
  • 34. Questions? More info at www.facebook.com/eblog Aditya Agarwal aditya@facebook.com