Scalable Social Architectures by Biren Gandhi


Slides from Biren Gandhi's GITPRO session on scalability lessons from Facebook and Zynga.


  1. Scalable Social Architectures
     Think Early, Think Often, Think Through
     birengandhi
  2. Scalability by the numbers
     Source: (as of Dec 1, 2010)
  3. Scalability by the numbers
     500M active users, 50% log in EVERY DAY
     Source:
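For a rough sense of scale on slide 3 (assuming, purely for illustration, that the daily logins were spread evenly over 24 hours, which real traffic is not), 50% of 500M users logging in every day works out to:

```latex
\frac{0.5 \times 500{,}000{,}000\ \text{logins/day}}{86{,}400\ \text{s/day}} \approx 2{,}894\ \text{logins/s}
```

Real traffic has daily peaks, so the peak rate sits above this average.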
  4. Isn’t that great?
     Exponential Growth is AWESOME … or PAINFUL
  5. Scaling Challenges
     - Think Early
     - Think Often
     - Think Through
  6. Scaling Challenges
     And the biggest challenge is … One Size Does NOT Fit All
  7. Social Web Architectures
     [Diagram: Public Network, Firewall, LB/Proxy, Web and App Tier, Caching, DB Tier, and Platform API Calls, the last labeled the "Social" part]
  8. Social Game Architectures
     [Diagram: the same stack as slide 7 plus a CDN with an Upload/Miss path; callout: the CDN is not "nice to have", but "must have"]
  9. Gaming Stack Summary
  10. Obvious Challenges
      - Persistence
      - Caching
      - Web Server
      - Client Perception
      - Data
  11. Persistence
      - RDBMS (MySQL)
        - Data modeling: de-normalization is your friend
        - Slaving
        - Sharding
      - NoSQL
        - HBase, Cassandra, MongoDB, Redis, …
        - Membase
      - Graph databases
        - Neo4j, InfiniteGraph, …
      - Traffic patterns
        - Read/write mix
        - Consistency models
  12. MySQL Sharding Example
      [Diagram: two App Tier panels with masters M0 and M1 in each; slaves S0 and S1 in one panel, S0 through S3 plus S'0 through S'3 in the other]
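The diagram pairs each master (M0, M1) with read slaves. A minimal routing sketch, assuming users are placed on shards by `user_id % number_of_shards` (the deck does not say how shards are chosen), writes go to a shard's master, and reads rotate across its slaves. `Shard`, `ShardRouter`, and the host names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One master plus its read slaves (hypothetical host names)."""
    master: str
    slaves: list[str]
    _next_slave: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Round-robin iterator over the slaves for read traffic.
        self._next_slave = itertools.cycle(self.slaves)

    def host_for(self, write: bool) -> str:
        # Writes (and reads on a shard with no slaves) go to the master.
        if write or not self.slaves:
            return self.master
        return next(self._next_slave)


class ShardRouter:
    def __init__(self, shards: list[Shard]) -> None:
        self.shards = shards

    def shard_for(self, user_id: int) -> Shard:
        # Assumed placement rule: user_id modulo the shard count.
        return self.shards[user_id % len(self.shards)]

    def host_for(self, user_id: int, write: bool = False) -> str:
        return self.shard_for(user_id).host_for(write)


router = ShardRouter([
    Shard("m0", ["m0-s0", "m0-s1"]),
    Shard("m1", ["m1-s0", "m1-s1"]),
])
print(router.host_for(123, write=True))  # m1 (123 % 2 == 1)
print(router.host_for(42))               # m0-s0
print(router.host_for(42))               # m0-s1
```

Two caveats follow from this design: reads served by slaves can lag behind the master (the "consistency models" bullet on slide 11), and with plain modulo placement, changing the shard count remaps most users, so growing the cluster means migrating data or switching to a lookup table from user to shard.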
  13. Caching
      - In-process (MMO)
        - Java vs. scripting languages
      - APC
        - Mostly read-only global data
        - Priming after releases
        - Serialization
      - Memcache
        - Functional (vertical) sharding
        - Statistical (horizontal) sharding
        - Granularity & compression of stored objects
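The caching slide lists serialization, priming after releases, and the granularity and compression of stored objects. A minimal cache-aside sketch under stated assumptions: a plain dict stands in for APC or memcache, objects are serialized as JSON, and payloads of 1 KB or more are zlib-compressed behind a one-byte flag. The threshold, the flag byte, and the `CacheAside` class are illustrative choices, not the deck's.

```python
import json
import zlib

COMPRESS_THRESHOLD = 1024  # bytes; hypothetical cutoff for compression


def encode(obj) -> bytes:
    # Serialize compactly; compress only when the payload is large enough to pay off.
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(raw) >= COMPRESS_THRESHOLD:
        return b"z" + zlib.compress(raw)
    return b"r" + raw


def decode(blob: bytes):
    flag, payload = blob[:1], blob[1:]
    if flag == b"z":
        payload = zlib.decompress(payload)
    return json.loads(payload.decode("utf-8"))


class CacheAside:
    def __init__(self, store, loader):
        self.store = store    # dict-like stand-in for APC or memcache
        self.loader = loader  # falls back to the database on a miss

    def get(self, key):
        blob = self.store.get(key)
        if blob is not None:
            return decode(blob)
        value = self.loader(key)
        self.store[key] = encode(value)
        return value

    def prime(self, keys):
        # Warm hot keys after a release, before real traffic hits the database.
        for key in keys:
            self.store[key] = encode(self.loader(key))


loads = []


def load_user(key):
    loads.append(key)  # stand-in for a database read
    return {"id": key, "inventory": list(range(500))}


cache = CacheAside({}, load_user)
cache.prime(["u:1:123"])  # warm the key before traffic arrives
cache.get("u:1:123")      # served from cache, no second load
print(loads)              # ['u:1:123']
```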
  14. Memcached Example
      Key format: <function>:<version>:<value> (e.g. user:1:123, or better, u:1:123)
      [Diagram: horizontal sharding vs. vertical sharding]
  15. Memcached Example
      Key format: <function>:<version>:<value> (e.g. user:1:123, or better, u:1:123)
      [Diagram: horizontal sharding (uid % n) vs. vertical sharding (feature?)]
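The key format and the two sharding schemes on the memcached slides can be sketched as follows, assuming vertical sharding means one memcached pool per feature prefix and horizontal sharding means `uid % n` within that pool. The pool names and the gift feature are hypothetical.

```python
# Hypothetical pools: vertical sharding gives each feature its own servers.
POOLS = {
    "u": ["user-cache-0", "user-cache-1", "user-cache-2"],  # user objects
    "g": ["gift-cache-0", "gift-cache-1"],                  # gift objects
}


def make_key(function: str, version: int, value) -> str:
    """Build a <function>:<version>:<value> key, e.g. u:1:123."""
    return f"{function}:{version}:{value}"


def server_for(function: str, uid: int) -> str:
    pool = POOLS[function]        # vertical: choose the pool by feature
    return pool[uid % len(pool)]  # horizontal: uid % n within the pool


print(make_key("u", 1, 123))  # u:1:123
print(server_for("u", 123))   # user-cache-0 (123 % 3 == 0)
print(server_for("g", 123))   # gift-cache-1 (123 % 2 == 1)
```

Bumping the version segment (u:1:123 to u:2:123) is one way to retire every cached object of an old shape at once: the old keys are simply never read again and age out.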
  16. Web Tier
      - Easy to scale (stateless)
        - Cloud-friendly auto-grow policies
        - Sticky sessions?
      - H/W or S/W load balancers
        - HAProxy, Zeus for cloud environments
      - DNS could be a bottleneck
        - Round-robin DNS
      - Tech choices
        - Apache, Nginx + FastCGI
        - Node.js on the server side
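The web tier is easy to scale because it is stateless, and the slide asks whether sticky sessions are needed. A minimal sketch of the two routing policies a load balancer might apply: plain round robin, which only works if any server can handle any request, and sticky routing, which hashes a session ID so a session keeps landing on the same server. This is not HAProxy or Zeus configuration; the `Balancer` class and backend names are hypothetical.

```python
import hashlib
import itertools


class Balancer:
    def __init__(self, backends: list[str]) -> None:
        self.backends = list(backends)
        self._rotation = itertools.cycle(self.backends)

    def round_robin(self) -> str:
        # Stateless tier: any backend can serve the next request.
        return next(self._rotation)

    def sticky(self, session_id: str) -> str:
        # Sticky sessions: a stable hash pins each session to one backend.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        return self.backends[int.from_bytes(digest[:8], "big") % len(self.backends)]


lb = Balancer(["web1", "web2", "web3"])
print([lb.round_robin() for _ in range(4)])                  # ['web1', 'web2', 'web3', 'web1']
print(lb.sticky("session-abc") == lb.sticky("session-abc"))  # True
```

The trade-off: stickiness keeps per-session state on one server, and with modulo hashing, adding or losing a backend moves sessions, which works against auto-grow policies.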
  17. Client Tier
      - CDN
        - Static assets (images, JS, CSS, etc.)
        - Invalidation/update schemes (versioning)
        - UGC (user-generated content) can be a little challenging
      - Flash (or graphics engines)
        - Watch out for frame rates
        - Users' machines are not as powerful
      - JavaScript
        - Frameworks (jQuery, Prototype, etc.)
        - Tons of great advice on minifying, compressing, obfuscating
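For the CDN "invalidation/update schemes (versioning)" bullet, one common approach (an assumption here; the deck does not name a scheme) is to put a content hash in each static asset's file name, so a changed file gets a new URL and stale CDN copies never need explicit invalidation. `versioned_name` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Return e.g. js/game.<hash>.js for js/game.js; the hash changes when the content does."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("js/game.js", b"console.log('v1');")
new = versioned_name("js/game.js", b"console.log('v2');")
print(old, new)  # two different URLs for the two versions of the file
```

Pages then reference the versioned name, so the asset itself can be served with a long cache lifetime.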
  18. Data
      - Versioning
        - Coexistence of old and new data
        - Rollback during catastrophic events
      - Integrity
        - Checksum (costly)
        - Field-level validation (not cheap either)
      - Gaming content
        - Lightweight CMS (integrated into the art pipeline)
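The versioning and integrity bullets can be combined in one small sketch, under these assumptions: each record carries a schema version, older versions are upgraded on read so old and new data can coexist, and a CRC32 checksum over the serialized record detects corruption. The v1-to-v2 migration and the record layout are hypothetical.

```python
import json
import zlib

CURRENT_VERSION = 2


def upgrade_v1_to_v2(data: dict) -> dict:
    # Hypothetical migration: v2 moves "coins" into a nested "wallet".
    data = dict(data)
    data["wallet"] = {"coins": data.pop("coins", 0)}
    return data


UPGRADES = {1: upgrade_v1_to_v2}


def save(data: dict, version: int = CURRENT_VERSION) -> bytes:
    body = json.dumps({"v": version, "data": data}, sort_keys=True).encode("utf-8")
    # Prefix a 4-byte CRC32 of the body for integrity checking.
    return zlib.crc32(body).to_bytes(4, "big") + body


def load(blob: bytes) -> dict:
    stored, body = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: record is corrupt")
    record = json.loads(body)
    version, data = record["v"], record["data"]
    # Upgrade step by step so records written by any older release stay readable.
    while version < CURRENT_VERSION:
        data = UPGRADES[version](data)
        version += 1
    return data


old_blob = save({"coins": 5}, version=1)  # written by the previous release
print(load(old_blob))                     # {'wallet': {'coins': 5}}
```

As the slide warns, checksumming has a cost, so a design like this might verify only when reading from persistent storage rather than on every cache hit.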
  19. Not-so-obvious Challenges
      - Data Center vs. Cloud
      - Network Port Saturation
      - CPU Utilization
      - Connection Bloat
      - Runtime Configuration
      - Programming Models
      - Deployment Discipline
  20. Not-so-obvious Challenges
      - Data center vs. cloud
        - Virtualization is good, but slow
        - Little control over optimization
        - High failure rates
        - Cloud performance: 1/5 to 1/8 of the data center
      - Network saturation
        - Traffic going across (DB/Memcache boxes)
      - CPU utilization
        - One core is overworked
  21. Not-so-obvious Challenges
      - Connection bloat
        - Database/Memcached
        - Persistent connections
        - Proxies (mysql-proxy, moxi, memagent, etc.)
      - Runtime configuration
        - Controlled rollout of features
        - Fire-fighting defense
        - Home-grown, Apache ZooKeeper, …
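The runtime configuration bullets (controlled rollout, fire-fighting defense) can be sketched as percentage-based feature flags. Assumptions: the config is a plain dict here, whereas the slide names home-grown systems or Apache ZooKeeper as the real store; users are bucketed by hashing the feature name with the user ID so each user gets a stable decision; and setting `enabled` to False acts as a kill switch. The feature names are hypothetical.

```python
import hashlib

# Hypothetical runtime config; a real deployment would load and refresh this remotely.
CONFIG = {
    "new_gift_ui": {"enabled": True, "percent": 10},    # controlled rollout to ~10%
    "leaderboard": {"enabled": False, "percent": 100},  # switched off while fire-fighting
}


def bucket(feature: str, uid: int) -> int:
    # Stable bucket in [0, 100) per (feature, user) pair.
    digest = hashlib.sha256(f"{feature}:{uid}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(feature: str, uid: int, config=CONFIG) -> bool:
    flag = config.get(feature)
    if not flag or not flag["enabled"]:
        return False  # unknown or killed features are off
    return bucket(feature, uid) < flag["percent"]


print(is_enabled("leaderboard", 42))  # False: switched off
share = sum(is_enabled("new_gift_ui", uid) for uid in range(10_000)) / 10_000
print(share)  # roughly 0.1
```

Raising `percent` widens the rollout without a deploy, and users already in the rollout stay in it because their bucket does not change.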
  22. Not-so-obvious Challenges
      - Programming models
        - External platform APIs
        - Failure semantics
          - Make async, as much as possible
          - Graceful subsystem failures
        - Keep all of a user's data in the same shard
        - Things WILL fail when you least expect them to
      - Constant performance evaluation
        - Profiling tools (xhprof, xdebug, etc. for PHP)
        - HipHop compiler
        - PHP extensions for common code
        - Heterogeneous languages for subcomponents
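For the failure-semantics bullets (make calls async, let subsystems fail gracefully), a minimal asyncio sketch: two page dependencies are fetched concurrently, each with a timeout and a fallback value, so a slow external platform API degrades one part of the page instead of blocking all of it. `slow_friends_api`, `gifts_service`, and the timeout values are hypothetical stand-ins.

```python
import asyncio


async def call_with_fallback(make_call, fallback, timeout: float):
    """Await make_call(); on timeout or connection error, return fallback instead."""
    try:
        return await asyncio.wait_for(make_call(), timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return fallback


async def slow_friends_api(uid: int) -> list[int]:
    await asyncio.sleep(1.0)  # simulates an external platform API that is lagging
    return [uid + 1, uid + 2]


async def gifts_service(uid: int) -> list[str]:
    return [f"gift-for-{uid}"]  # simulates a healthy internal subsystem


async def load_game_page(uid: int) -> dict:
    # Both calls run concurrently; each one fails on its own terms.
    friends, gifts = await asyncio.gather(
        call_with_fallback(lambda: slow_friends_api(uid), fallback=[], timeout=0.1),
        call_with_fallback(lambda: gifts_service(uid), fallback=[], timeout=0.1),
    )
    return {"friends": friends, "gifts": gifts}


print(asyncio.run(load_game_page(7)))  # {'friends': [], 'gifts': ['gift-for-7']}
```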
  23. Deployment Disciplines
      - Tools
        - Hudson/Bamboo (deployment)
        - Munin/Nagios (monitoring)
      - Staging environment
        - As close to production as possible
      - User downtime
        - 404 pages
        - Transparent handling
      - Cache priming
        - DB failures after every deployment
      - Deployment time
        - 1000+ servers
        - P2P solutions
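The deployment slide pairs 1000+ servers with P2P solutions. An idealized back-of-envelope model shows why: if only the build server pushes, to a fixed number of hosts per round, the round count grows linearly with the fleet; if every host that already has the build forwards it to one more host per round, the number of holders doubles each round. The model assumes equal-length rounds and ignores bandwidth limits and failures; the function names and the 10-way fan-out are hypothetical.

```python
import math


def rounds_central(servers: int, parallel_pushes: int) -> int:
    # Only the build server sends, to `parallel_pushes` hosts per round.
    return math.ceil(servers / parallel_pushes)


def rounds_p2p(servers: int) -> int:
    # Every holder forwards to one new host per round, so holders double.
    holders, rounds = 1, 0        # start: only the build server holds the build
    while holders < servers + 1:  # +1 counts the build server itself
        holders *= 2
        rounds += 1
    return rounds


print(rounds_central(1000, 10))  # 100
print(rounds_p2p(1000))          # 10 (2**10 = 1024 >= 1001 holders)
```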
  24. Summary
      - Metrics-driven optimizations
        - You can't improve something you can't measure
        - Every smart person has an 'opinion'
      - Leverage automated monitoring tools
        - Let machines work while humans rest
      - Any system is only as robust as its weakest link
      - It's an ONGOING process, a journey … better get used to it
      Think Early, Think Often, Think Through
  25. Questions?
      Thank You