Lessons from Highly Scalable Architectures at Social Networking Sites

What are the techniques and technologies used by popular social networking sites such as Facebook, Twitter, Tumblr, Pinterest, or Instagram? How do they architect their systems to scale to hundreds of millions of visits per day?

Transcript

  • 1. Software Engineering in a Cloud World: Lessons from highly scalable architectures at social networking sites. Patrick Senti, patrick.senti@gmail.com
  • 2. Social Networking – Trends 2012: more users, a higher share of time, for longer. Source: State of Media: The Social Media Report 2012, Nielsen, http://is.gd/LYHmnm
  • 3. User Adoption Faster for New Entrants. [Chart: user growth in millions (logarithmic scale) against years since launch, for Facebook, Twitter, Tumblr, Instagram and Pinterest.] Source: author's compilation of company data, press statements, technical blogs & presentations
  • 4. Staggering Volumes
    - Tumblr: 500 million page views/day; ~40k read requests/second; ~1 million writes/second; ~3 TB of new data/day; 1000 servers; 20 engineers
    - Facebook: 2.7 billion likes (counter)/day; 300 million photos/day; 70'000 queries/day; 500 TB of new data/day; "tens of thousands" of servers; ~1700 engineers
    - Twitter: ~25'000 tweets/second (peak); ~250 million tweets/day (1000/second average); 6 billion API calls/day (70'000/second); ~8 TB of new data/day (80 MB/second); 500 engineers (of 1000 total employees)
    - Pinterest: 2.3 billion page views/month; 50% growth rate (visitors, March 2012); 150 web servers, 90 caching servers, 70 database instances, 35 for logging/internal use; 410 TB of user data; ~65 employees (NB: until end of 2011, 12)
    Sources: http://is.gd/mpdOPN, http://is.gd/1vJ1il, http://is.gd/58X8ns, http://is.gd/LGexI6, http://is.gd/tZfNPA, http://is.gd/bcpCJc, http://is.gd/kXVEEF
  • 5. Methodology
    ● Author's synthesis: information collected 2010–2012; mostly secondary research conducted on the internet
    ● Sources of information: engineering blogs of the social network companies; public presentations at industry conferences; research reports; technology documentation; author's own data analysis
    ● Threats to validity: subjective selection of information sources; non-systematic analysis and synthesis of the data gathered
  • 6. Typical Scalability Approaches
    ● Load balancing
    ● Static content on dedicated servers
    ● Caching
    ● Database partitioning
    ● Replication (high availability)
    ● (How) Do these work at social-network scale?
  • 7. Facebook
    Functionality: a type of blog; user profile with personal data; users 'friend' each other; post public or private messages
    Software architecture: (diagram, see source)
    Data center: owned by Facebook
    Source: Aaditya Agarwal, Facebook Architecture, QCon 2008, London
  • 8. Twitter
    Functionality: 140-character messages; users follow each other; posts can contain pictures, media links etc.
    Software architecture: Ruby on Rails, Erlang; since 2009: JVM, Scala; MySQL; Memcached; Unicorn (Mongrel) web server
    Data center: dedicated data center (outsourced)
    Source: Krikorian R., Twitter's Real Time Architecture, QCon NYC 2012
  • 9. tumblr
    Functionality: microblogging; users follow each other; dashboard similar to a Facebook page
    Software architecture: PHP, Ruby, Scala; Redis, HBase, MySQL; Memcache; Thrift
    Data center: started at Rackspace; now co-located, dedicated
    Sources: Tumblr Architecture – 15 Billion Page Views a Month and Harder to Scale than Twitter, HighScalability blog; tumblr.com
  • 10. Pinterest
    Functionality: photo-sharing pinboards; categorize images, share with others; mostly used by women (2012: 83%)
    Software architecture: Python; Django
    Data center: Amazon EC2, EBS, S3
    Sources: pinterest.com; Jackson B., Pinterest growth driven by Amazon cloud scalability, 04.2012, techworld.com
  • 11. Instagram
    Functionality: smartphone photo sharing; post to other social networks; send messages
    Software architecture: Python, Django; PostgreSQL; Redis; Nginx; Node.js; Android
    Data center: started with a single small-scale PC (up to 30+ million users); now 100+ instances at Amazon (EC2, EBS, S3 for photos)
    Employees: 2010: 2 engineers; 2012: 5 engineers, and that is the total employee count
    Sources: Wikipedia; What Powers Instagram: Hundreds of Instances, Dozens of Technologies, Instagram Engineering Blog
  • 12. Scalability Options
    ● Scale up (more CPUs, RAM, disk): transparent scalability, scales 'out of the box'; but complex hardware (high cost), specialised knowledge, more complex software (multi-core), expensive licenses
    ● Scale out (more machines): simple hardware (low cost), scale by numbers; but difficult to implement, difficult to maintain (a myth?), more complex software
    ● Either way: scale by parallelization; partition for fault tolerance; replicate for reliability
    ● This means: decouple components; process asynchronously; monitor to operate
  • 13. Caching
    ● Goal: reduce response times for web site & data access
    ● Product: memcached (open source, initially developed 2003)
    ● Benefit: all accesses (read & write) are O(1)
  • 14. memcached
    Features:
    ● Remote-accessible in-memory key/value cache
    ● Least Recently Used (LRU) eviction
    ● Shared-nothing, distributed architecture
    Implementation:
    ● memcached nodes map to key ranges (client-side hashing – no SPOF): server = hash_f(key) % #servers
    ● Multi-threaded, event-based async network I/O (200'000 requests/s at Facebook)
    ● Single-node fault tolerance by a consistent hashing scheme
    [Diagram: web servers behind a load balancer, each client hashing keys onto one of several memcached nodes]
    Source: memcached.org
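    The selection rule on the slide is simple enough to sketch in a few lines of Python (a minimal illustration, not any site's production code; the server list and key are made up):

        import hashlib

        SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

        def pick_server(key):
            # server = hash_f(key) % #servers, as on the slide.
            h = int(hashlib.md5(key.encode()).hexdigest(), 16)
            return SERVERS[h % len(SERVERS)]

        print(pick_server("user:42"))  # always the same node for a given key

    The weakness of the modulo scheme is that changing the number of servers remaps almost every key, which is exactly what consistent hashing (next slide) avoids.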
  • 15. Consistent Hashing in a nutshell
    ● 'Traditional' hashing: buckets contain a pre-defined range => at worst this requires re-building the full cache; every node may be affected
    ● Consistent hashing: buckets are located on a ring and contain keys up to a pre-defined limit => at worst, only the keys of the failing node need to be re-mapped
    ● server = min(s | s.location >= (hash_f(key) % #locations))
    [Diagram: key ranges placed around a hash ring]
    Source: David Karger et al., Web caching with consistent hashing, Computer Networks, Vol 31, 1999
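    A compact way to see the ring rule in code (again only a sketch; the node names are invented, and real clients place many virtual points per node to even out the load):

        import bisect
        import hashlib

        def hash_point(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        class Ring:
            def __init__(self, nodes):
                # Each node owns the arc of the ring up to its location.
                self._points = sorted((hash_point(n), n) for n in nodes)
                self._locations = [p for p, _ in self._points]

            def node_for(self, key):
                # First node whose location >= hash(key), wrapping around:
                # the min(s | s.location >= hash_f(key)) rule from the slide.
                i = bisect.bisect_left(self._locations, hash_point(key))
                return self._points[i % len(self._points)][1]

        ring = Ring(["cache1", "cache2", "cache3"])
        print(ring.node_for("user:42"))

    If one node fails, only the keys on its arc move to the clockwise neighbour; all other nodes keep their keys.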
  • 16. Memcached Results
    ● Results at Twitter: 100s of servers; 20 TB of data covering >30 services; 2 trillion queries/day (>23 million queries/second); a modified memcached, released as "Twemcache"
    ● Key objectives: high availability; predictable performance; dynamic adaptation of size (grow/shrink); monitoring of cache effectiveness
    Source: Chris Aniszczyk, Caching with Twemcache, 07.2012, Twitter Engineering Blog
  • 17. Shard your data
    ● Shards are horizontal partitions (e.g. by user, time, ...)
    ● Distributed to multiple physical nodes => parallelized data access: node = hash_f(user_id) % #nodes
    ● Data is typically denormalized
    ● Similar data is replicated to all shards – e.g. static data
    [Diagram: a web server's db-client routing user ids {A, ..., F}, {G, ..., L}, ... to nodes 1-4]
  • 18. Sharding Results
    ● Impressive results at Facebook: 1800 MySQL servers; 4 ms reads, 5 ms writes; 60M queries/second (peak); 20x growth (overall data, over two years)
    ● What works: shard by user – group similar data into the same shard; link across shards by storing cross-references in both shards (two-way access, as sketched below); consistent hashing; fault tolerance – a single-instance failure only affects a subset of users
    ● What doesn't: joins across shards – not possible efficiently; sharding by time – one shard keeps running "hot"; sharding by function – non-uniform distribution, hot spots, unique access patterns; fixed hashing – nodes become unbalanced, difficult to grow or shrink
    Source: Facebook Techtalks, MySQL & HBase, December 5, 2011
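    The "cross-references in both shards" point is worth a sketch. The schema, the shard_for rule and the use of in-memory SQLite are illustrative assumptions, not Facebook's implementation:

        import sqlite3

        NUM_SHARDS = 4

        def shard_for(user_id):
            # Fixed modulo routing, for brevity; the slide recommends
            # consistent hashing instead.
            return user_id % NUM_SHARDS

        shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
        for db in shards:
            db.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")

        def add_friendship(a, b):
            # Write the edge to BOTH users' shards, so either side can
            # list its friends locally, without a cross-shard join.
            shards[shard_for(a)].execute(
                "INSERT INTO friends VALUES (?, ?)", (a, b))
            shards[shard_for(b)].execute(
                "INSERT INTO friends VALUES (?, ?)", (b, a))

        add_friendship(1, 6)
        print(shards[shard_for(6)].execute(
            "SELECT friend_id FROM friends WHERE user_id = 6").fetchall())

    The cost is a duplicated write per edge, which is the usual trade-off for keeping every read local to one shard.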
  • 19. Managing shards
    ● Results at Tumblr: 200 DB servers, grouped into 5 global pools / 58 shard pools; 28 TB; 100 billion rows; no DBAs – 2 engineers keep this running at 50% of their time
    ● Jetpants – a DB management toolkit: clone slaves efficiently; split shards into new shards; master promotions; a command line to work with the topology
    ● Open sourced: https://github.com/tumblr/jetpants
    Source: Elias E., Managing Large Sharded Topologies with Jetpants, 12.2012, Percona Live MySQL Conference
  • 20. Asynchronous & Distributed Work
    ● Problem: do more work in less time
    ● Solution: distributed, asynchronous processing; MapReduce
    ● Requirements: split a work job into multiple pieces; distribute the work; collect the results; fault tolerance
    ● Technologies: message queueing; Gearman; Hadoop / Pig
  • 21. Asynchronous Work Example
    ● Instagram push notifications and image uploads: all uploads go into a task queue; ~200 worker processes asynchronously process the images (see the sketch below)
    ● Gearman: open source; a framework to distribute work; load balancing; no SPOF
    Sources: gearman.org; What Powers Instagram: Hundreds of Instances, Dozens of Technologies, 2012, Instagram Engineering Blog
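    The shape of that pattern fits in a few lines of Python. This sketch uses the standard multiprocessing module in place of Gearman, and the resize step is a stand-in, so treat it as an illustration of the queue/worker split rather than Instagram's setup:

        import multiprocessing as mp

        def worker(tasks):
            # Workers pull jobs at their own pace; slow image work never
            # blocks the web request that enqueued it.
            for photo_id in iter(tasks.get, None):  # None = shutdown signal
                print(f"worker resizing photo {photo_id}")

        if __name__ == "__main__":
            tasks = mp.Queue()
            workers = [mp.Process(target=worker, args=(tasks,)) for _ in range(4)]
            for w in workers:
                w.start()
            for photo_id in range(10):
                tasks.put(photo_id)      # the "upload" path just enqueues
            for _ in workers:
                tasks.put(None)          # one shutdown signal per worker
            for w in workers:
                w.join()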
  • 22. Apache Hadoop
    ● What it is: a distributed MapReduce engine; fault tolerant; asynchronous job scheduling; scalable (e.g. a 4000-node cluster sorts 1 TB in 62 seconds); written in Java
    ● Data storage: HDFS – distributed storage, scalable to multiple PB; data replicated among 3 nodes; block storage of 64 MB/block; no SPOF
    ● Apache Pig: a high-level query language
    Sources: Apache Hadoop, Wikipedia, The Free Encyclopedia, accessed January 8, 2013; Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2010
  • 23. Results
    ● NoSQL at Twitter: store 7 TB of new data/day; at HD speed of ~80 MB/s, writing that to a single disk takes ~24.3 hours => writes and reads must be parallelized
    ● Analysis using Pig: counting all tweets (12 billion) takes 5 minutes
    Source: Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2010
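    Pig scripts compile down to MapReduce jobs. The three phases are easy to mimic in plain Python (a conceptual sketch with toy data; on Hadoop each phase runs across many nodes in parallel):

        from collections import defaultdict

        tweets = ["hello world", "hello twitter", "scaling twitter"]

        # Map: emit (word, 1) for every word of every record.
        pairs = [(word, 1) for tweet in tweets for word in tweet.split()]

        # Shuffle: group the pairs by key, as Hadoop does between phases.
        groups = defaultdict(list)
        for word, count in pairs:
            groups[word].append(count)

        # Reduce: aggregate each group independently (hence parallelizable).
        totals = {word: sum(counts) for word, counts in groups.items()}
        print(totals)  # {'hello': 2, 'world': 1, 'twitter': 2, 'scaling': 1}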
  • 24. Simplified Queries. Source: Weil K., NoSQL at Twitter, 04.2010, NoSQL EU 2010
  • 25. Service Oriented Architecture, "Onion-Style"
    ● Outer services: public (e.g. REST); user interface; typically scripted (Python, Ruby, JavaScript)
    ● Inner services: private & highly efficient; data access, calculation etc.; workers to accomplish work in parallel; a mix of languages (Java, Scala, Python, C, ...)
    ● Fire hose: a highly available, scalable service bus; distributes services as needed; typically asynchronous
  • 26. Tumblr Firehose
    ● Results: 4 x CPUs @ 72 GB RAM, 2 disks; provide 1 week of streams (1 week of Tumblr posts); ~400k messages/second
    ● Apache Kafka: O(1) persistent message queue; several 100K messages/s; pub/sub interface; new posts enter the system here
    ● Finagle (internal API, Thrift): an asynchronous RPC system for JVM-hosted languages (Java, Scala, ...); connection pools, failure detectors, failover, load balancing, back-pressure, ...
    ● Apache Zookeeper (cluster): distributed coordination; highly available
    ● Public API: JSON, consumed by HTTP clients
    Source: Blake M., Tumblr Firehose – The Gory Details, 2012, Tumblr Engineering Blog
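    To make the pub/sub side concrete, here is a minimal consumer against a Kafka topic. It uses the third-party kafka-python package and an invented topic name; Tumblr's public firehose is the JSON/HTTP API, so this only illustrates the internal pattern:

        from kafka import KafkaConsumer  # pip install kafka-python

        consumer = KafkaConsumer(
            "posts",                           # hypothetical topic name
            bootstrap_servers="localhost:9092",
            auto_offset_reset="earliest",      # replay the retained stream
        )

        for message in consumer:
            # Kafka appends to a persistent log in O(1); each subscriber
            # tracks its own offset and reads at its own pace.
            print(message.offset, message.value)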
  • 27. SOA revisited – network efficiency
    ● Consumer: 1. serialize the request; 2. wait for the response; 3. deserialize it
    ● Interface: CORBA, HTTP/JSON, WSDL/XML/SOAP, ...
    ● Provider: 1. deserialize; 2. provide the response; 3. serialize
    ● Is this efficient?
  • 28. Apache Thrift – optimized wire protocol
    ● What it is: a cross-language service implementation; a human-readable interface definition language (non-XML); a code-generation engine (C++, Java, Python, JavaScript, ...); a binary wire protocol
    ● Benefits: low-overhead serialization/deserialization; native language bindings (no XML parsing or XSD); an efficient protocol implementation
  • 29. Thrift example
    Interface definition:

        struct UserProfile {
          1: i32 uid,
          2: string name,
          3: string blurb
        }
        service UserStorage {
          void store(1: UserProfile user),
          UserProfile retrieve(1: i32 uid)
        }

    Client (Python):

        # Make an object
        up = UserProfile(uid=1, name="Test User", blurb="Thrift is great")
        # Talk to a server via TCP sockets, binary protocol
        transport = TSocket.TSocket("localhost", 9090)
        transport.open()
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        # Use the service we already defined
        service = UserStorage.Client(protocol)
        service.store(up)
        up2 = service.retrieve(1)

    Service implementation (C++):

        class UserStorageHandler : virtual public UserStorageIf {
         public:
          UserStorageHandler() {
            // Your initialization goes here
          }
          void store(const UserProfile& user) {
            // Your implementation goes here
            printf("store\n");
          }
          void retrieve(UserProfile& _return, const int32_t uid) {
            // Your implementation goes here
            printf("retrieve\n");
          }
        };

    Source: thrift.apache.org
  • 30. Serialization / Deserialization Performance
    ● Benchmark: CPU Core i7 2.7 GHz; serialization of a service message (the media descriptor of a video)
    ● Results: Thrift reduces serialization time by 66%, deserialization time by 92%, and message size by 19%
    Source: author's testing
  • 31. redis: In-Memory DB
    ● Problem: require the speed of a cache with the query semantics, persistence and fault tolerance of a DB cluster
    ● Solution: redis.io – a distributed in-memory DB
    ● Fast: O(1) access times – 100'000 writes/second, 80'000 reads/second
    ● Fault-tolerant: master shards replicate asynchronously to slaves
    ● Datatypes: strings, hashes, lists, sets, sorted sets
    ● Complex queries: intersection, subset, sort, ...
    ● More than just a DB: pub/sub channels
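    A few of those features in code, via the redis-py client (the client choice, keys and values are assumptions for illustration; a Redis server is expected on localhost):

        import redis  # pip install redis

        r = redis.Redis(host="localhost", port=6379)

        # O(1) key/value access, like memcached but persistent.
        r.set("user:1:name", "Test User")
        print(r.get("user:1:name"))                 # b"Test User"

        # Set datatype with a server-side intersection (a "complex query").
        r.sadd("followers:alice", "bob", "carol")
        r.sadd("followers:dave", "carol", "erin")
        print(r.sinter("followers:alice", "followers:dave"))  # {b"carol"}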
  • 32. redis results
    ● tumblr notifications: >7'500 notifications/second (well above MySQL's maximum concurrency limit); <5 ms response-time requirement; Redis delivers 30'000 requests/second
    Source: Blake M., Staircar: Redis-powered notifications, 07.2011, Tumblr Engineering Blog
  • 33. Automate everything & Monitor
    ● If just two engineers are to run 100+ servers, maintain dozens of databases and scale a system to 30+ million users ...
    ● ... automation is like air to breathe ...
    ● ... and monitoring is the lifeline (e.g. the dashboard at Twitter)
    Source: Adams J., Scaling Twitter, 2010, Chirp Conference
  • 34. Cell Architecture
    ● Self-contained cells of data + logic; each cell itself is made up of a cluster of nodes (an application server cluster plus a metadata store, HBase)
    ● Cells provide internal failover => reliability
    ● Clients reach a cell through a discovery service, using consistent hashing by user id => scalability
    Source: Malik P., Scaling the Messages Application Back End, 04.11, facebook Engineering's Notes
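    A toy version of that routing path (the cell names, the health check and the modulo stand-in for consistent hashing are all invented for illustration):

        import hashlib

        CELLS = {
            "cell-1": ["app-1a", "app-1b", "app-1c"],
            "cell-2": ["app-2a", "app-2b", "app-2c"],
        }

        def cell_for(user_id):
            # Stable user -> cell mapping; a real discovery service would
            # use the consistent-hash ring shown earlier.
            h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
            return sorted(CELLS)[h % len(CELLS)]

        def route(user_id, is_healthy):
            # Internal failover: try the nodes of the user's cell in turn;
            # other cells are never involved.
            for node in CELLS[cell_for(user_id)]:
                if is_healthy(node):
                    return node
            raise RuntimeError("no healthy node left in cell")

        print(route(42, lambda node: node != "app-1a"))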
  • 35. Summary
    ● Scalability: cache; data sharding; in-memory DB; efficient wire protocols
    ● Flexibility: SOA – layered (outer, inner services), decoupled, asynchronous (firehose); automation
    ● Reliability: replication; cell architecture
  • 36. Take Away for Application Development
    ● Scalability => distribution: loosely coupled components (accessible via APIs, services); shared nothing; efficiency at every level
    ● Reliability => replication: automation; monitoring; fast provisioning of replicas
    ● Flexibility => simplification: build for simple use; abstract to simplify (e.g. Pig/Hadoop, Redis as in-memory DB); API-everything
  • 37. Paradigm Shift?
    ● The new normal: <5 engineers; 100s of machines; PBs of data; distributed workloads; horizontal scalability
    ● Drivers: low barriers to entry – free or low-cost hosting; declining cost of CPU, storage and networking; web-scale-ready open-source software
  • 38. Q&A. Thank you
  • 39. What we haven't covered
    ● CAP Theorem
    ● A/B Testing
    ● NoSQL Databases