Social Networking at ScaleSanjeev KumarFacebook
Outline 1   What makes scaling Facebook challenging? 2   Evolution of Software Architecture 3   Evolution of Datacenter Ar...
845M users worldwide  2004   2005   2006   2009   2010           500M                      700B   30B       2.5Mdaily acti...
What makes scaling Facebook challenging?▪  Massive    scale▪  Social   Graph is central to everything on the site▪  Rapidl...
Traditional websites                        Bob’s data                                              Bob’s Beth’s data     ...
Social Graph      People are only one dimension of the social graph
Facebook: The data is interconnectedCommon operation: Query the social graph    Bob                  Beth              Eri...
Social Graph Cont’d▪  Highly   connected ▪    4.74 average degree-of-separation between users on Facebook ▪    Made denser...
Social Graph Cont’d▪  System   Implications of Social Graph ▪    Expensive to query ▪    Difficult to partition ▪    Highly...
What makes scaling Facebook challenging?▪  Massive    scale▪  Social   Graph: Querying is expensive at every level▪  Rapid...
Product Launches500M                                                                                                      ...
Rapidly evolving product▪  Facebook       is a platform ▪    External developers are innovating as well▪  One      integra...
What makes scaling Facebook challenging?▪  Massive    scale▪  Social   Graph: Querying is expensive at every level▪  Rapid...
Complex infrastructure▪  Large   number of Software components ▪    Multiple Storage systems ▪    Multiple Caching Systems...
Outline 1   What makes scaling Facebook challenging? 2   Evolution of Software Architecture 3   Evolution of Datacenter Ar...
Evolution of the Software ArchitectureEvolution of each of these 4 tiers                          Web TierCache Tier      ...
Evolution of the Software ArchitectureEvolution of Web Tier                          Web TierCache Tier                   ...
Web Tier▪  Stateless   request processing ▪    Gather Data: from storage tiers ▪    Transform: Ranking (for Relevance) and...
Generation 1: Zend Interpreter for PHP▪  Reasonably     fast (for an interpreter)▪  Rapid   development ▪    Don’t have to...
Generation 2: HipHop Compiler for PHP        C++       Java                                  Relative Execution Time      ...
Generation 3: HipHop Virtual Machine                                                                HHVM                  ...
Web Tier Facts▪  Execution   time only a small factor in user-perceived performance ▪    Can potentially use less powerful...
Evolution of the Software ArchitectureEvolution of Storage Tier                            Web TierCache Tier             ...
Evolution of a Storage Tier▪  Multiple      storage systems at Facebook ▪    MySQL ▪    HBase (NoSQL) ▪    Haystack (for B...
Generation 1: Commercial Filers▪  New     Photos Product                              NFS Storage▪  First   build it the e...
Generation 2: Gen 1 Optimized▪  Optimization     Example:                             NFS Storage Optimized ▪    Cache NFS...
Generation 3: Haystack [OSDI’10]▪  Custom         Solution                                                     Superblock ...
Generation 4: Tiered Storage▪  Usage      characteristics ▪    Fat tail of accesses: everyone has friends J ▪    A large ...
BLOB Storage Facts▪  Hot   and Warm data. Little cold data.▪  Low   CPU utilization ▪    Single digit percentages▪  Fixed ...
Evolution of the Software ArchitectureEvolution of Cache Tier                            Web TierCache Tier               ...
First few Generations: Memcache                         Web TierCache Tier: Memcache            Look-Aside Cache          ...
Memcache limitations▪  “Values”   are opaque ▪    End up moving huge amounts of data across the network▪  Storage   hierar...
Alternative Caching Tier: Tao                    Web TierCache Tier: Tao                         1. Has a data model      ...
Tao Cont’d▪  Data        Model  ▪    Objects (Nodes)  ▪    Associations (edges)  ▪    Have “type” and data▪  Simple       ...
Tao opens up possibilities▪  Alternate      storage systems ▪    Multiple storage systems      ▪    To accommodate differe...
Cache Tier Facts▪  Memcache ▪    Low CPU utilization ▪    Little use for Flash since it is bottlenecked on network▪  Tao ▪...
Evolution of the Software ArchitectureEvolution of Services Tier                         Web TierCache Tier               ...
Life before ServicesExample: Wish your friend a Happy Birthday                        Web Tier                         Ine...
A more complex service: News FeedAggregation of your friends’ activityOne of many (100s) services at Facebook
News Feed Product characteristics▪  Real-time   distribution ▪    Along edges on the Social Graph▪  Writer   can potential...
News Feed Service      User Update                   Query        [ Write ]                  [ Read ]     Service: News Fe...
Two approaches: Push vs. Pull▪  Push    approach                          ▪  Pull   approach  ▪    Distribute actions by r...
News Feed Service: Big Picture          User Update                      Query            [ Write ]                     [ ...
News Feed Service: Writes        User Update              Query          [ Write ]             [ Read ]   Service: News Fe...
News Feed Service: Reads        User Update                  Query          [ Write ]                 [ Read ]   Service: ...
News Feed Service: Scalability        User Update                  Query          [ Write ]                 [ Read ]      ...
News Feed Service: Reliability▪  Dealing      with (daily) failures ▪    Large number of failure types      ▪    Hardware/...
New Feed Service Facts▪  Number    of leafs dominate the number of aggregators ▪    Reads are more expensive than writes ▪...
Evolution of the Software ArchitectureSummary                            Web Tier HipHop Compiler & VMCache Tier Memcache ...
Outline 1   What makes scaling Facebook challenging? 2   Evolution of Software Architecture 3   Evolution of Datacenter Ar...
Recall: Characteristics of Facebook▪  Massive   Scale▪  Social   Graph ▪    Expensive to query ▪    Hard to partition ▪   ...
Implications▪  On      Datacenters ▪    Small number of massive datacenters (currently 4)▪  On      Servers ▪    Minimize ...
Servers                        Data Center Server      AMD
        Intel
Chassis   MotherboardMotherboard                 ...
Evaporative cooling system
Open Compute▪  Custom    datacenters & servers▪  Minimizes   power loss ▪    POE of 1.07▪  Vanity   Free design ▪    Desig...
Outline 1   What makes scaling Facebook challenging? 2   Evolution of Software Architecture 3   Evolution of Datacenter Ar...
(c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0
Hpca2012 facebook keynote
Hpca2012 facebook keynote
Upcoming SlideShare
Loading in...5
×

Hpca2012 facebook keynote

12,211

Published on

Published in: Technology

Hpca2012 facebook keynote

  1. 1. Social Networking at ScaleSanjeev KumarFacebook
  2. 2. Outline 1 What makes scaling Facebook challenging? 2 Evolution of Software Architecture 3 Evolution of Datacenter Architecture
  3. 3. 845M users worldwide 2004 2005 2006 2009 2010 500M 700B 30B 2.5Mdaily active users minutes spent pieces of content sites using on the site every shared each social plugins month month
  4. 4. What makes scaling Facebook challenging?▪  Massive scale▪  Social Graph is central to everything on the site▪  Rapidly evolving product▪  Complex Infrastructure
  5. 5. Traditional websites Bob’s data Bob’s Beth’s data data Bob Beth Julie’s data Sue’s data Bob Julie Sue Dan’s data Erin’s data Dan ErinHorizontally scalable
  6. 6. Social Graph People are only one dimension of the social graph
  7. 7. Facebook: The data is interconnectedCommon operation: Query the social graph Bob Beth Erin Servers
  8. 8. Social Graph Cont’d▪  Highly connected ▪  4.74 average degree-of-separation between users on Facebook ▪  Made denser by our connections to places, interests, etc.▪  Examples of Queries on Social Graph ▪  What are the most interesting updates from my connections? ▪  Who are my connections in real-life who I am not connected to on Facebook? ▪  What are the most relevant events tonight near me and related to my interests? Or that my friends are going to?
  9. 9. Social Graph Cont’d▪  System Implications of Social Graph ▪  Expensive to query ▪  Difficult to partition ▪  Highly customized for each user ▪  Large working sets (Fat tail)
  10. 10. What makes scaling Facebook challenging?▪  Massive scale▪  Social Graph: Querying is expensive at every level▪  Rapidly evolving product▪  Complex Infrastructure
  11. 11. Product Launches500M ? Questions New2011 Profile Messages 800M 2010 Groups iPad App Video Calling Music Timeline Unified Mobile Sites 2010 2010 Mobile Event </> 2010 Places Photos Update 2010 2010 Social Plugins400M Open2010 Graph 2010300M200M The Stream 2009 Translations 2008100M Sign Up Platform launch New Apps New Apps NewsFeed 2007 February 2004 2004/2005 2006 0M2004 2011
  12. 12. Rapidly evolving product▪  Facebook is a platform ▪  External developers are innovating as well▪  One integrated product ▪  Changes in one part have major implications on other parts ▪  For e.g. Timeline surfaces some of the older photos▪  System Implications ▪  Build for flexibility (avoid premature optimizations) ▪  Revisit design tradeoffs (they might have changed)
  13. 13. What makes scaling Facebook challenging?▪  Massive scale▪  Social Graph: Querying is expensive at every level▪  Rapidly evolving product▪  Complex Infrastructure
  14. 14. Complex infrastructure▪  Large number of Software components ▪  Multiple Storage systems ▪  Multiple Caching Systems ▪  100s of specialized services▪  Often deploy cutting-edge hardware ▪  At our scale, we are early adopters of new hardware▪  Failure is routine▪  Systems implications ▪  Keep things as simple as possible
  15. 15. Outline 1 What makes scaling Facebook challenging? 2 Evolution of Software Architecture 3 Evolution of Datacenter Architecture
  16. 16. Evolution of the Software ArchitectureEvolution of each of these 4 tiers Web TierCache Tier Services Tier Storage Tier
  17. 17. Evolution of the Software ArchitectureEvolution of Web Tier Web TierCache Tier Services Tier Storage Tier
  18. 18. Web Tier▪  Stateless request processing ▪  Gather Data: from storage tiers ▪  Transform: Ranking (for Relevance) and Filtering (for Privacy) ▪  Presentation: Generate HTML▪  Runs PHP code ▪  Widely used for web development ▪  Dynamically typed scripting language▪  Integrated product è One single source tree for all the entire code ▪  Same “binary” on every web tier box▪  Scalability: Efficiently process each request
  19. 19. Generation 1: Zend Interpreter for PHP▪  Reasonably fast (for an interpreter)▪  Rapid development ▪  Don’t have to recompile during testing▪  But: at scale, performance matters C++ Java Relative Execution Time C# Ocaml Ruby Python PHP Zend 0 5 10 15 20 25 30 35 40 45
  20. 20. Generation 2: HipHop Compiler for PHP C++ Java Relative Execution Time C# Ocaml Ruby Python PHP ZendPHP HipHop 0 5 10 15 20 25 30 35 40 45▪  Technically challenging, Impressive gains, Still room for improvement▪  But: takes time to compile (slows down development) ▪  Solution: HipHop interpreter ▪  But: Interpreter and compiler sometimes disagree ▪  Performance Gains are slowing. Can we improve performance further?
  21. 21. Generation 3: HipHop Virtual Machine HHVM Interpreter PHP AST Bytecode Parser Bytecode HHVM JIT Generator Optimizer▪  Best of both worlds ▪  Common path, well-specified bytecode semantics ▪  Potential performance upside from dynamic specialization▪  Work-In-Progress
  22. 22. Web Tier Facts▪  Execution time only a small factor in user-perceived performance ▪  Can potentially use less powerful processors ▪  Throughput matters more than latency (True for other tiers as well)▪  Memory management (allocation/free) is a significant remaining cost ▪  Copy-on-Write in HipHop implementation▪  Poor Instruction Cache Performance ▪  Partly due to the one massive binary▪  Web load predictable in aggregate ▪  Can use less dynamic techniques to save power ▪  Potentially even turn off machines. Failure rates is an open question?
  23. 23. Evolution of the Software ArchitectureEvolution of Storage Tier Web TierCache Tier Services Tier Storage Tier
  24. 24. Evolution of a Storage Tier▪  Multiple storage systems at Facebook ▪  MySQL ▪  HBase (NoSQL) ▪  Haystack (for BLOBS) ç▪  Case Study: BLOB storage ▪  BLOB: Binary Large Objects (Photos, Videos, Email attachments, etc.) ▪  Large files, No updates/appends, Sequential reads ▪  More than 100 petabytes ▪  250 million photos uploaded per day
  25. 25. Generation 1: Commercial Filers▪  New Photos Product NFS Storage▪  First build it the easy way ▪  Commercial Storage Tier + HTTP server ▪  Each Photo is stored as a separate file▪  Quickly up and running ▪  Reliably Store and Serve Photos▪  But: Inefficient ▪  Limited by IO rate and not storage density ▪  Average 10 IOs to serve each photo ▪  Wasted IO to traverse the directory structure
  26. 26. Generation 2: Gen 1 Optimized▪  Optimization Example: NFS Storage Optimized ▪  Cache NFS handles to reduce wasted IO directory inode •  owner info operations •  size •  timestamps▪  Reduce the number of IO operations per •  blocks photo by 3X directory data •  inode #▪  But: •  filename ▪  Still expensive: High end storage boxes file inode ▪  Still inefficient: Still IO bound and wasting IOs •  owner info •  size •  timestamps •  blocks data
  27. 27. Generation 3: Haystack [OSDI’10]▪  Custom Solution Superblock ▪  Commodity Storage Hardware Magic No Needle 1 ▪  Optimized for 1 IO operation per request Key ▪  File system on top of a file system Flags ▪  Compact Index in memory Needle 2 ▪  Metadata and data laid out contiguously Photo▪  Efficient from IO perspective Checksum▪  But: Needle 3 ▪  Problem has changed now Single Disk IO to read/write a photo
  28. 28. Generation 4: Tiered Storage▪  Usage characteristics ▪  Fat tail of accesses: everyone has friends J ▪  A large fraction of the tier is no longer IO limited (new) ▪  Storing efficiency matters much more than serving efficiency▪  Approach: Tiered Storage ▪  Last layer optimized for storage efficiency and durability ▪  Fronted by caching tier optimized for serving efficiency▪  Working-In-Progress
  29. 29. BLOB Storage Facts▪  Hot and Warm data. Little cold data.▪  Low CPU utilization ▪  Single digit percentages▪  Fixed memory need ▪  Enough for the index ▪  Little use for anything more▪  Next generation will use denser storage systems ▪  Do we even bother with hardware raid? ▪  Details to be publicly released soon
  30. 30. Evolution of the Software ArchitectureEvolution of Cache Tier Web TierCache Tier Services Tier Storage Tier
  31. 31. First few Generations: Memcache Web TierCache Tier: Memcache Look-Aside Cache Key-Value Store Does one thing very well Does little else Improved performance by 10X Storage Tier
  32. 32. Memcache limitations▪  “Values” are opaque ▪  End up moving huge amounts of data across the network▪  Storage hierarchy exposed to web tier ▪  Harder to explore alternative storage solutions ▪  Harder to keep consistent ▪  Harder to protect the storage tier from thundering herds
  33. 33. Alternative Caching Tier: Tao Web TierCache Tier: Tao 1. Has a data model 2. Write-Through Cache 3. Abstracts the storage tier Storage Tier
  34. 34. Tao Cont’d▪  Data Model ▪  Objects (Nodes) ▪  Associations (edges) ▪  Have “type” and data▪  Simple graph operations on them ▪  Efficient: Content-aware ▪  Can be performed on the caching tier▪  In production for a couple of years ▪  Serving a big portion of data accesses
  35. 35. Tao opens up possibilities▪  Alternate storage systems ▪  Multiple storage systems ▪  To accommodate different use case (access patterns)▪  Even more powerful Graph operations▪  Multi-Tiered caching
  36. 36. Cache Tier Facts▪  Memcache ▪  Low CPU utilization ▪  Little use for Flash since it is bottlenecked on network▪  Tao ▪  Much higher CPU load ▪  Will continue to increase as it supports more complex operations ▪  Could use Flash in a multi-tiered cache hierarchy
  37. 37. Evolution of the Software ArchitectureEvolution of Services Tier Web TierCache Tier Services Tier Storage Tier
  38. 38. Life before ServicesExample: Wish your friend a Happy Birthday Web Tier Inefficient and MessyCache Tier •  Potentially access hundreds of machines •  Solution: Nightly cron jobs •  Issues with corner cases What about more complex problems? Solution: Build Specialized Services Storage Tier
  39. 39. A more complex service: News FeedAggregation of your friends’ activityOne of many (100s) services at Facebook
  40. 40. News Feed Product characteristics▪  Real-time distribution ▪  Along edges on the Social Graph▪  Writer can potentially broadcast to very large audience▪  Reader wants different & dynamic ways to filter data ▪  Average user has 1000s of stories per day from friends/pages ▪  Friend list, Recency, Aggregation, Ranking, etc.
  41. 41. News Feed Service User Update Query [ Write ] [ Read ] Service: News Feed▪  Build and maintain an index: Distributed▪  Rank: Multiple ranking algorithms
  42. 42. Two approaches: Push vs. Pull▪  Push approach ▪  Pull approach ▪  Distribute actions by reader ▪  Distribute actions by writer ▪  Write broadcasts, read one location ▪  Write one location, read gathers▪  Pull model is preferred because ▪  More dynamic: Easier to iterate ▪  “In a social graph, the number of incoming edges is much smaller than the outgoing ones.” 9,000,000 621
  43. 43. News Feed Service: Big Picture User Update Query [ Write ] [ Read ] Service: News Feed Aggregators Leafs▪  Pull Model ▪  Leafs: One copy of the entire index. Stored in memory (Soft state) ▪  Aggregators: Aggregate results on the read path (Stateless)
  44. 44. News Feed Service: Writes User Update Query [ Write ] [ Read ] Service: News Feed Aggregators Leafs▪  On User update (Write) ▪  Index sharded by Writer ▪  Need to update one leaf
  45. 45. News Feed Service: Reads User Update Query [ Write ] [ Read ] Service: News Feed Aggregators Leafs▪  On Query (Read) ▪  Query all leafs ▪  Then do aggregation/ranking
  46. 46. News Feed Service: Scalability User Update Query [ Write ] [ Read ] Service: News Feed Aggregators Leafs▪  1000s of machines ▪  Leafs: Multiple sets. Each set (10s of machines) has the entire index ▪  Aggregators: Stateless. Scale with load.
  47. 47. News Feed Service: Reliability▪  Dealing with (daily) failures ▪  Large number of failure types ▪  Hardware/software ▪  Servers/Networks ▪  Intermittent/Permanent ▪  Local/Global▪  Keep the software architecture simple ▪  Stateless components are a plus▪  For example, on read requests: ▪  If a leaf is inaccessible, failover the request to a different set ▪  If an aggregator is inaccessible, just pick another
  48. 48. New Feed Service Facts▪  Number of leafs dominate the number of aggregators ▪  Reads are more expensive than writes ▪  Every read (query) involves one aggregator and every leaf in the set▪  Very high network load between aggregator and leafs ▪  Important to keep a full leaf set within a single rack on machines ▪  Uses Flash on leafs to ensure this
  49. 49. Evolution of the Software ArchitectureSummary Web Tier HipHop Compiler & VMCache Tier Memcache & Tao New Feed Services Tier Storage Tier BLOB Storage
  50. 50. Outline 1 What makes scaling Facebook challenging? 2 Evolution of Software Architecture 3 Evolution of Datacenter Architecture
  51. 51. Recall: Characteristics of Facebook▪  Massive Scale▪  Social Graph ▪  Expensive to query ▪  Hard to partition ▪  Large working set (Fat tail)▪  Product is rapidly evolving▪  Hardware failures are routine
  52. 52. Implications▪  On Datacenters ▪  Small number of massive datacenters (currently 4)▪  On Servers ▪  Minimize the “classes” (single digit) of machines deployed ▪  Web Tier, Cache Tier, Storage Tier, and a couple of special configurations▪  Started with ▪  Leased datacenters + Standard server configurations from vendors▪  Moving to ▪  Custom built datacenters + custom servers ▪  Continue to rely on a small number of machine “classes”
  53. 53. Servers Data Center Server AMD
 Intel
Chassis MotherboardMotherboard Electrical MechanicalPower
 Battery Triplet Supply Cabinet Rack
  54. 54. Evaporative cooling system
  55. 55. Open Compute▪  Custom datacenters & servers▪  Minimizes power loss ▪  POE of 1.07▪  Vanity Free design ▪  Designed for ease of operations▪  Designs are open-sourced ▪  More on the way
  56. 56. Outline 1 What makes scaling Facebook challenging? 2 Evolution of Software Architecture 3 Evolution of Datacenter Architecture Questions?
  57. 57. (c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×