6. Scaling Challenges: And the biggest challenge is … One Size Does NOT Fit All. (Scalable Social Architectures)
7. Social Web Architectures. [Diagram labels: Firewall; Caching; Web and App Tier; Public Network; LB/Proxy; DB Tier; Platform API Calls; “Social” Part]
8. Social Game Architectures. Not “nice to have”, but “must have”. [Diagram labels: CDN; Upload/Miss; Firewall; Caching; Web and App Tier; Public Network; LB/Proxy; DB Tier; Platform API Calls]
13. Caching. In-process (MMO): Java vs. Scripting Languages. APC: Mostly read-only global data; Priming after releases; Serialization. Memcache: Functional (vertical) Sharding; Statistical (horizontal) Sharding; Granularity & Compression of Stored Objects.
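The "Granularity & Compression of Stored Objects" point can be illustrated with a minimal sketch. It assumes JSON serialization, zlib compression, and a 1 KB threshold; none of these choices come from the slides, and the names `encode`, `decode`, and `COMPRESS_THRESHOLD` are hypothetical.

```python
import json
import zlib

# Assumed cut-off: objects smaller than this are stored uncompressed,
# because compressing tiny values costs CPU without saving much memory.
COMPRESS_THRESHOLD = 1024

def encode(obj) -> bytes:
    """Serialize an object for the cache, compressing only large payloads."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(raw) >= COMPRESS_THRESHOLD:
        return b"Z" + zlib.compress(raw)   # one-byte tag: compressed
    return b"R" + raw                      # one-byte tag: raw

def decode(blob: bytes):
    """Reverse of encode(); the leading tag byte says how to read the rest."""
    tag, payload = blob[:1], blob[1:]
    if tag == b"Z":
        payload = zlib.decompress(payload)
    return json.loads(payload)
```

The one-byte tag lets compressed and uncompressed entries coexist in the same cache, so the threshold can be tuned later without flushing existing data.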
14. Memcached Example. Key Format: <function>:<version>:value (e.g. user:1:123 or better u:1:123). [Diagram labels: Horizontal Sharding; Vertical Sharding]
15. Memcached Example. Key Format: <function>:<version>:value (e.g. user:1:123 or better u:1:123). [Diagram labels: Horizontal Sharding (uid % n); Vertical Sharding (feature?)]
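A minimal sketch of how the key format and the two sharding styles might combine. The pool names and host names are placeholders, and `make_key` and `pick_server` are hypothetical helpers rather than part of any memcached client; the reading of the diagram (vertical sharding by feature, horizontal sharding by uid % n) is an assumption.

```python
# Hypothetical pool layout: one server pool per function/feature prefix.
# Host names are placeholders, not taken from the slides.
FEATURE_POOLS = {
    "u": ["cache-user-0", "cache-user-1", "cache-user-2"],
    "inv": ["cache-inv-0", "cache-inv-1"],
}

def make_key(function: str, version: int, value) -> str:
    """Build a key in the <function>:<version>:value format."""
    return f"{function}:{version}:{value}"

def pick_server(function: str, uid: int) -> str:
    """Choose a cache server for a user's entry."""
    pool = FEATURE_POOLS[function]   # vertical sharding: pool chosen by feature
    return pool[uid % len(pool)]     # horizontal sharding: uid % n within the pool

key = make_key("u", 1, 123)          # "u:1:123"
server = pick_server("u", 123)       # 123 % 3 == 0, so "cache-user-0"
```

One likely reason for the version field is that bumping it after a release makes old entries unreachable without an explicit flush, and the short prefix ("u" rather than "user") saves bytes on every key.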
16. Web Tier. Easy to Scale (Stateless): Cloud-friendly auto-grow policies; Sticky sessions? H/W or S/W Load Balancers: HAProxy, Zeus for Cloud environments. DNS could be a bottleneck: Round Robin DNS. Tech Choices: Apache, Nginx + FastCGI; Node.js on the server side.
17. Client Tier. CDN: Static assets (images, JS, CSS etc.); Invalidation/update schemes (versioning); UGC might be a little challenging. Flash (or Graphic Engines): Watch out for frame rates; Users’ machines are not as powerful. JavaScript Frameworks (jQuery, Prototype, etc.): Tons of great advice on minifying, compressing, obfuscating.
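One common way to implement the "Invalidation/update schemes (versioning)" idea is to put a content hash in the asset file name, so a changed file gets a new URL and the CDN never serves a stale copy. The sketch below assumes that approach; `CDN_BASE` and `versioned_url` are hypothetical names, and the slides do not say which scheme was actually used.

```python
import hashlib

CDN_BASE = "https://cdn.example.com"  # placeholder host

def versioned_url(path: str, content: bytes) -> str:
    """Return a CDN URL whose file name embeds a short hash of the content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = path.rpartition(".")
    if not dot:
        # No file extension: append the hash at the end.
        return f"{CDN_BASE}/{path}.{digest}"
    return f"{CDN_BASE}/{stem}.{digest}.{ext}"

old = versioned_url("js/app.js", b"console.log('v1');")
new = versioned_url("js/app.js", b"console.log('v2');")
# old and new differ, so browsers and the CDN fetch the new file
# while cached copies of the old one can live on indefinitely.
```

User-generated content is harder to handle this way because it changes outside the release cycle, which may be what the "UGC" bullet is pointing at.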
18. Data Versioning: Coexistence of old and new data; Rollback during catastrophic events. Integrity: Checksum (costly); Field-level Validation (not cheap either). Gaming Content: Lightweight CMS (integrated in art pipeline).
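A minimal sketch of how old and new data might coexist: each stored record carries a version number and a checksum, and older records are upgraded when read. The envelope layout, the CRC32 checksum, and the `gems` field are all assumptions made for illustration.

```python
import json
import zlib

CURRENT_VERSION = 2

def pack(record: dict, version: int = CURRENT_VERSION) -> str:
    """Wrap a record in an envelope holding its version and a checksum."""
    body = json.dumps(record, sort_keys=True)
    envelope = {"v": version, "crc": zlib.crc32(body.encode("utf-8")), "body": body}
    return json.dumps(envelope)

def upgrade_v1_to_v2(record: dict) -> dict:
    """Hypothetical migration: version 2 added a 'gems' field."""
    upgraded = dict(record)
    upgraded.setdefault("gems", 0)
    return upgraded

UPGRADES = {1: upgrade_v1_to_v2}

def unpack(blob: str) -> dict:
    """Verify the checksum, then upgrade step by step to CURRENT_VERSION."""
    envelope = json.loads(blob)
    body = envelope["body"]
    if zlib.crc32(body.encode("utf-8")) != envelope["crc"]:
        raise ValueError("checksum mismatch")
    record = json.loads(body)
    version = envelope["v"]
    while version < CURRENT_VERSION:
        record = UPGRADES[version](record)
        version += 1
    return record
```

Because old records are never rewritten in place, rolling back a release only requires the previous code to ignore versions it does not know; the checksum pass on every read is the cost the slide calls out.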
19. Not-so-obvious Challenges: Data Center vs. Cloud; Network Port Saturation; CPU Utilization; Connection Bloat; Runtime Configuration; Programming Models; Deployment Discipline.
20. Not-so-obvious Challenges. Data Center vs. Cloud: Virtualization is good, but slow; Little control over optimization; High failure rates. Network Saturation: Traffic going across (DB/Memcache boxes). CPU Utilization: One core is overworked. Cloud Perf. 1/5 to 1/8 of Data Center.
21. Not-so-obvious Challenges. Connection Bloat: Database/Memcached Persistent Connections; Proxies (mysql-proxy, moxi, memagent etc.). Runtime Configuration: Controlled rollout of features; Fire-fighting defense; Home-grown, Apache ZooKeeper, …
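The "Controlled rollout of features" bullet can be sketched as a percentage-based feature flag. The in-memory `FLAGS` dict below stands in for whatever runtime configuration store is used (home-grown or ZooKeeper-backed); the flag names and the hashing scheme are assumptions.

```python
import zlib

# Stand-in for a runtime configuration store: feature name -> percent of users.
FLAGS = {"new_gift_flow": 25, "leaderboard": 0}

def is_enabled(feature: str, uid: int, flags=FLAGS) -> bool:
    """Return True if this user falls inside the feature's rollout percentage."""
    percent = flags.get(feature, 0)          # unknown features are off
    # Stable bucket in [0, 100): the same user always gets the same answer,
    # and hashing the feature name keeps buckets independent across features.
    bucket = zlib.crc32(f"{feature}:{uid}".encode("utf-8")) % 100
    return bucket < percent
```

Raising the percentage only ever adds users, and setting it to 0 at runtime acts as a kill switch, which is one way to read "Fire-fighting defense".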
22. Not-so-obvious Challenges. Programming Models: External platform APIs; Failure semantics; Make async, as much as possible; Graceful subsystem failures; Organize all of a user's data into the same shard; Things WILL fail when you least expect them to. Constant Performance Evaluation: Profiling Tools (xhprof, xdebug etc. for PHP); HipHop Compiler; PHP Extensions for Common Code; Heterogeneous languages for subcomponents.
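A minimal sketch of "Graceful subsystem failures" around an external platform API call: the call runs with a timeout, and any error or timeout yields a fallback value instead of failing the whole request. The helper name `call_with_fallback`, the pool size, and the 0.5 s default are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool for outbound calls; the size is an arbitrary assumption.
_pool = ThreadPoolExecutor(max_workers=4)

def call_with_fallback(fn, fallback, timeout_s: float = 0.5):
    """Run fn() off the request path; return fallback on timeout or error."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()      # best effort; an already running call is not interrupted
        return fallback
    except Exception:
        return fallback      # the subsystem failed; degrade instead of crashing

def fetch_friends():
    """Hypothetical platform call that is currently failing."""
    raise ConnectionError("platform API unavailable")

friends = call_with_fallback(fetch_friends, fallback=[])   # returns []
```

The page can then render with an empty friend list rather than an error, which matches the slide's warning that things will fail when least expected.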
23. Deployment Disciplines. Tools: Hudson/Bamboo (deployment); Munin/Nagios (monitoring). Staging environment: As close to production as possible. User Downtime: 404 Pages; Transparent Handling. Cache Priming: DB failures after every deployment. Deployment Time: 1000+ servers; P2P solutions.
24. Summary. Metrics-driven Optimizations: You can’t improve something you can’t measure; Every smart person has an ‘opinion’. Leverage automated monitoring tools: Let machines work while humans rest. Any system is only as robust as its weakest link. It’s an ONGOING process – a journey … better get used to it. Think Early, Think Often, Think Through.