Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

TAO: Facebook's Distributed Data Store for the Social Graph


Published on

This is a presentation about Facebook's social graph storage system, from the first Papers We Love Iasi meetup, in January 2016.

Published in: Internet
  • Be the first to comment

TAO: Facebook's Distributed Data Store for the Social Graph

  1. 1. Papers We Love Iasi Kickstart
  2. 2. Presentation online
  3. 3.
  4. 4. Why PWL Iasi? ● Because it’s cool ● There is no theory-oriented community ● Discuss language/ framework-agnostic topics ● Bring together academia and industry ● Get more people (and institutions?) interested in actual Research & Development
  5. 5. 4 presentations in 2016
  6. 6. Who? Adrian-Tudor Panescu Software Engineer Adrian Burlacu PhD, Automatic Control and Applied Informatics, TUIASI Alexandru Archip PhD, Computer Engineering, TUIASI Marius Kloetzer PhD, Automatic Control and Applied Informatics, TUIASI Alexandra Adam The Human Side™
  7. 7. And you! ● We are looking for speakers/ moderators! ○ ~1 hour presentation/ open discussion on a significant paper on your favourite subject/ field ● And for sponsors ○ Mainly for ensuring that we have a venue for the meetups ● Feel free to contact me:
  8. 8.
  9. 9. Code of Conduct and other info:
  10. 10. TAO: Facebook’s Distributed Data Store for the Social Graph Bronson et al., Facebook Inc., USENIX’ 13
  11. 11. 2
  12. 12. ● Billion reads and million writes per second ● Petabyte-sized data set ● Geographically spread ● Users are unique and impatient ○ Privacy constraints must be satisfied at view time How do we store and serve all this? 3
  13. 13. Before TAO Cache Query Store 4
  14. 14. The 3 contributions ● Characterize Facebook’s workload ● Describe a proper data model ● Present an actual large-scale implementation 5
  15. 15. The Associations and Objects 6
  16. 16. The data model ● There are only 2 data types: nodes and edges ○ Labeled directed multigraph ○ You need only 2 tables in the DB ● Facebook leverages certain application characteristics: ○ They don’t need a full graph query API ○ “Most of the data is old, but many of the queries are for the newest subset” ● “Likely to be useful for any application domain that needs to efficiently generate fine-grained customized content from highly interconnected data.” 7
  17. 17. “TAO provides basic access to the nodes and edges of a constantly changing graph in data centers across multiple regions. It is optimised heavily for reads, and explicitly favours efficiency and availability over consistency.” 8
  18. 18. CAP Principle Consistency Availability Partition tolerance TAO Towards Robust Distributed Systems, Eric A. Brewer, 2000 9
  19. 19. 10
  20. 20. Summary ● 1 storage layer (MySQL) ● 2 cache layers (custom, LRU) directly implementing the graph abstraction ○ Leader (DB I/O), protects DB from thundering herds ○ Follower (Client I/O) ○ Consistency maintained via asynchronous maintenance messages ● A full copy of Facebook’s data is stored in a cluster of data centers in geographical proximity ● A region has a master and multiple slaves (per shard!) deployments ○ Writes are always forwarded first to the master 11
  21. 21. Consistency ● Remember that we are eventually consistent! ● Write request to slaves forwarded to master ○ If applied, all slaves are informed ○ Follower caches are invalidated via maintenance messages ● Propagated changesets use a version number to solve conflicts generated by stale data ● The master DB is the single source of truth ○ Requests can be marked as critical and will always be forwarded to the master DB (e.g., logins) ● Replication lag: <1s (85%), <3s (99%), <10s (99.8%) 12
  22. 22. Evaluation: request types Random sample of 6.5 million requests over 40 days 13
  23. 23. Evaluation: read latency Overall hit rate: 96.4% 14
  24. 24. Evaluation: write latency Send packet US West - Netherlands - US West: 150ms 15
  25. 25. Evaluation: hit rate vs. throughput 16
  26. 26. Related work ● Spanner: Google’s globally distributed database ● Redis: in-memory key-value store ● Dynamo, Voldemort, COPS: distributed key-value store ● BigTable, PNUTS, SimpleDB, HBase: NoSQL (NoACID) ● Pig Latin, Pregel: graph processing 17
  27. 27. Conclusions ● Paper describes a solution to a practical problem ● Data model, API and implementation for a read-intensive, eventually-consistent, geographically-distributed graph ● Simple data model, layered cache which incorporates application logic ● Interesting to see how they leverage domain knowledge to optimize the system ● Evaluation on real data from production system 18
  28. 28. Thank you!