
Social databases - A brief overview

A brief review of database technologies used on social media sites.

Published in: Software

  1. 1. SOCIAL DATA AND DBS: IVAN SANCHEZ, JULIO SALINAS, MARLENE ROBLES
  2. 2. CONTENTS • Background • Case studies: • Twitter: real-time search (Earlybird) • Facebook: storage • LinkedIn: storage (Voldemort) • Conclusion • References
  3. 3. BACKGROUND: OSN ● Huge amount of data, diverse and changing over time: likes, shares, comments, logins, page views, search queries. ● New approaches are needed to manipulate it. ● Distributed databases, NoSQL. ● How to retrieve the data (search relevance, recommendations, security against abusive behavior, newsfeed features). ● Goal: massive scaling of demand: Unstructured, Semi- ● Activity figures: Facebook: 2.7M likes & comments/day; Twitter: 500M tweets/day; LinkedIn: 300+ M users (2 new/s), 200 group conversations/min
  4. 4. STORING AND QUERYING AT TWITTER ● Storage: o MySQL used as a key-value store. o FlockDB for the Twitter social graph. ● Desired queries: o Trending topics o Breaking news o Sentiment
  5. 5. REAL TIME SEARCH AT TWITTER: EARLYBIRD
  6. 6. TAO AND THE FACEBOOK SOCIAL GRAPH
  7. 7. TAO o Architecture and data model: ▪ Objects: (id) → (otype, (key → value)∗) ▪ Associations: (id1, atype, id2) → (time, (key → value)∗) o MySQL as the storage layer. o Main challenges: ▪ Efficiency at scale. ▪ Very fast response time. ▪ High read availability.
  8. 8. Professional Social Network Data-Driven Features: ● Recommendation system (people you may know) ● People search (job search, candidates) ● Who viewed your profile? ● Events you may be
  9. 9. STORAGE - VOLDEMORT ● Highly available distributed key-value store. ● 10 Voldemort clusters (100+ nodes), 9 of them using BDB. ● Layered design; all layers share a single interface: - Put/Delete/Get - Flexible - Every layer decorates the next one
  10. 10. STORAGE - VOLDEMORT Voldemort provides: • High availability • Low latency • Distribution, like a distributed hash table (DHT). Storage engine data on each node: • Compact index • Data files
  11. 11. DISTRIBUTED HASHING ALGORITHM This slide is from Roshan Sumbaly & Jay Kreps! (thanks Rosh & Jay)
  12. 12. SUMMARY (system: problem solved; main advantages) ● Earlybird: real-time search; fast indexing, concurrency management ● TAO: storing the Facebook social graph; very fast response time, high read availability ● Voldemort: simple data partitioning to meet scalability needs; highly scalable, seamless replication
  13. 13. CONCLUSION • The choice of database system depends on the needs of the application and on the primary type of information in the social network. • Many OSNs have developed their own solutions to cope with the ever-growing nature of big data and its challenges. • In summary, the main features a data solution should have are: • Storage of huge amounts of data. • Fast reads and low latency. • Processing of big data (meaningful results). • Streaming and indexing are critical.
  14. 14. EXTREMELY DIFFICULT QUESTIONS 1. Why did LinkedIn need to build its own solution, Voldemort? 2. How does TAO resolve the challenges it was built for? 3. How does the real-time search service work at Twitter?
  15. 15. REFERENCES ● A. Auradkar, C. Botev, S. Das, D. De Maagd, A. Feinberg, P. Ganti, … J. Zhang, “Data Infrastructure at LinkedIn,” in 2012 IEEE 28th International Conference on Data Engineering (ICDE), pp. 1370–1381, 2012. ● N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, and H. Li, “TAO: Facebook’s distributed data store for the social graph,” in USENIX ATC, 2013. ● N. Ruflin, H. Burkhart, and S. Rizzotti, “Social-data storage-systems,” Databases Soc. Networks - DBSocial ’11, pp. 7–12, 2011. ● A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, and H. Liu, “Data warehousing and analytics infrastructure at Facebook,” Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, Indianapolis, Indiana, USA, pp. 1013–1020, 2010. ● D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel, “Finding a Needle in Haystack: Facebook’s Photo Storage,” in OSDI, 2010, vol. 2010, pp. 47–60. ● M. Busch, K. Gade, B. Larson, P. Lok, S. Luckenbill, and J. Lin, “Earlybird: Real-Time Search at Twitter,” Proceedings of the 2012 IEEE 28th International Conference on Data Engineering. IEEE Computer Society, pp. 1360–1369, 2012.
  16. 16. REFERENCES (II) ● D. Borthakur, J. Gray, J. Sen Sarma, K. Muthukkaruppan, N. Spiegelberg, H. Kuang, K. Ranganathan, D. Molkov, A. Menon, S. Rash, R. Schmidt, and A. Aiyer, “Apache Hadoop goes realtime at Facebook,” Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. ACM, Athens, Greece, pp. 1071–1080, 2011. ● C. Chen, F. Li, B. C. Ooi, and S. Wu, “TI: an efficient indexing mechanism for real-time search on tweets,” Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. ACM, Athens, Greece, pp. 649–660, 2011. ● G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, “Fast data in the era of big data: Twitter’s real-time related query suggestion architecture,” Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, New York, New York, USA, pp. 1147–1158, 2013. ● S. Cohen and B. Kimelfeld, “A Social Network Database that Learns How to Answer Queries,” 2013.
  17. 17. LINKS ● https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson ● http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf ● http://www.slideshare.net/linkedin/jay-kreps-on-project-voldemort-scaling-simple-storage-at-linkedin ● http://data.linkedin.com/ ● http://www.infoq.com/presentations/Project-Voldemort-at-Gilt-Groupe
  18. 18. THE END
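
Supplementary sketch for slide 7 (TAO data model). The two mappings on that slide, objects keyed by id and associations keyed by (id1, atype, id2), can be illustrated with a small in-memory structure. This is only a sketch under stated assumptions: the class and method names (`GraphStore`, `add_object`, `add_assoc`, `newest_first`) are hypothetical and are not Facebook's API, and nothing here reflects TAO's caching or its MySQL storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Obj:
    """Object: (id) -> (otype, (key -> value)*)"""
    otype: str
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Assoc:
    """Association: (id1, atype, id2) -> (time, (key -> value)*)"""
    time: int
    data: Dict[str, str] = field(default_factory=dict)


class GraphStore:
    """Hypothetical in-memory store keyed the same way as the slide's model."""

    def __init__(self) -> None:
        self.objects: Dict[int, Obj] = {}
        self.assocs: Dict[Tuple[int, str, int], Assoc] = {}

    def add_object(self, oid: int, otype: str, **data: str) -> None:
        self.objects[oid] = Obj(otype, dict(data))

    def add_assoc(self, id1: int, atype: str, id2: int, time: int, **data: str) -> None:
        self.assocs[(id1, atype, id2)] = Assoc(time, dict(data))

    def newest_first(self, id1: int, atype: str) -> List[int]:
        """Return id2 values for (id1, atype), most recent association first."""
        matches = [(assoc.time, key[2]) for key, assoc in self.assocs.items()
                   if key[0] == id1 and key[1] == atype]
        return [id2 for _, id2 in sorted(matches, reverse=True)]


# Illustrative data only: two users, one post, and three typed edges.
store = GraphStore()
store.add_object(1, "user", name="alice")
store.add_object(2, "user", name="bob")
store.add_object(3, "post", text="hello")
store.add_assoc(1, "friend", 2, time=100)
store.add_assoc(1, "authored", 3, time=200)
store.add_assoc(2, "likes", 3, time=250)
```

Keeping the timestamp in the association value is what makes "most recent first" listings cheap to express, which matches the read-heavy access pattern the slide lists among TAO's challenges.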

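
Supplementary sketch for slides 9 to 11 (Voldemort storage). The slides describe a layered design in which every layer exposes the same Put/Delete/Get interface and decorates the next one, with keys spread over nodes like a distributed hash table. The Python below is a hedged illustration of that idea, not Voldemort's actual code or API: all class names are hypothetical, the ring is a generic consistent-hash ring that may differ from the scheme shown on slide 11, and replication is left out entirely.

```python
import bisect
import hashlib
from typing import Dict, Optional


class Store:
    """Single interface shared by every layer: get / put / delete."""

    def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    def put(self, key: str, value: bytes) -> None:
        raise NotImplementedError

    def delete(self, key: str) -> None:
        raise NotImplementedError


class MemoryStore(Store):
    """Bottom layer: stands in for a per-node storage engine."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class CountingStore(Store):
    """Decorator layer: counts calls, then delegates to the next layer."""

    def __init__(self, inner: Store) -> None:
        self.inner = inner
        self.calls = 0

    def get(self, key: str) -> Optional[bytes]:
        self.calls += 1
        return self.inner.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.calls += 1
        self.inner.put(key, value)

    def delete(self, key: str) -> None:
        self.calls += 1
        self.inner.delete(key)


class RoutingStore(Store):
    """Top layer: routes each key to one node via a consistent-hash ring."""

    def __init__(self, nodes: Dict[str, Store], points_per_node: int = 8) -> None:
        self.nodes = nodes
        # Each node owns several points on the ring to even out the load.
        self.ring = sorted((self._hash(f"{name}#{i}"), name)
                           for name in nodes for i in range(points_per_node))
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(text: str) -> int:
        return int(hashlib.sha256(text.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes[self.node_for(key)].get(key)

    def put(self, key: str, value: bytes) -> None:
        self.nodes[self.node_for(key)].put(key, value)

    def delete(self, key: str) -> None:
        self.nodes[self.node_for(key)].delete(key)


nodes = {f"node{i}": CountingStore(MemoryStore()) for i in range(3)}
store = RoutingStore(nodes)
store.put("member:42", b"profile")
```

Because every layer implements the same three methods, layers can be stacked or swapped without the caller noticing, which is the flexibility the slide attributes to the layered design.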