Everything I Learned About Scaling Online Games I Learned at Google and eBay [Part 1, QConSF 2013]


While the worlds of ecommerce, search, and application platforms might seem as far from the gaming industry as one might imagine, lessons learned in those environments are surprisingly applicable to online games. Real-time games in particular face many of the same challenges faced -- and solved -- by companies like eBay and Google. They are extremely latency-sensitive, are subject to unpredictable growth and scalability curves, and exhibit extremely spiky load profiles. The real-time player experience is critical to the success of the company -- if a game is down or slow, players will leave and never come back. This session will discuss how experiences with large-scale websites like eBay and Google have informed our approach to building, testing, and operating real-time games at KIXEYE.

This session tells several war stories from eBay and Google about performance, consistency, iterative development, and autoscaling. It then connects those experiences to what we are now doing in our next-generation gaming platform at KIXEYE.

Published in: Internet, Technology, Business

  1. Everything I Learned About Scaling Online Games I Learned at Google and eBay
     Randy Shoup @randyshoup
  2. Background
     CTO at KIXEYE
       • Making awesome games awesomer (and scalabler and reliabler)
     Director of Engineering for Google App Engine
       • World’s largest Platform-as-a-Service
     Chief Engineer at eBay
       • Multiple generations of eBay’s real-time search infrastructure
  3. Engineering “Fun”
     Whole user / player experience
       • Think holistically about the full end-to-end experience of the user
       • UX, functionality, performance, bugs, etc.
     All useful metrics are *proxies* for fun
       • Performance: load time, frame rate, lag
       • Technology: latency, availability
       • Business: acquisition, retention, monetization
  4. Real-Time Strategy Games are …
     Real-time
     Spiky
     Diverse
     Constantly evolving
     Constantly pushing boundaries
     Technically and operationally demanding
  5. Know Your Requirements
     Less is more
       • More wood, fewer arrows
       • Solve 100% of one problem rather than 50% of two
       • Release one great feature instead of two iffy ones
     Understand the requirements
       • e.g., Battle replay
       • Ephemeral combat
       • Immutable recording
       • Manageable storage footprint
  6. Know Your Bottlenecks
     Log everything
     Monitor relentlessly
     Measure bottlenecks and attack the first
       • “When you solve problem one, problem two gets a promotion”
       • Theory of Constraints: attacking *any* other problem yields no improvement
     Accept that your intuition is WRONG (!)
  7. Know Your Distributions
     “Normal” distribution is *not* normal
       • Only works for quantities physically constrained on both sides, clustered around a mean
       • E.g., adult height or weight
     Leads to invalid analysis and conclusions
       • Removing outliers
       • Ignoring real problems
       • Your (trained) intuition is WRONG (!)
  8. Know Your Distributions
     Exponential (“Long Tail”) distribution *much* more common
       • Income, latency, human connections, etc.
       • Also easy to reason about – only single parameter
     Percentiles are your best friends (!)
       • Reasonably characterize any distribution
       • Measure 90%ile, 99%ile, 99.9%ile
       • Focus on the real problems
       • Mean and Standard Deviation are useless
  9. Layering and Responsibility
     Multiple layers
       • Client
       • Game server
       • Services
       • Persistence
     Clarify roles and responsibilities
       • Client- vs. server-authoritative
       • Google service layering (+)
  10. Distribution of Data / Work
      Load-balancing (for stateless work)
        • Web servers, proxies
        • Most services
      Sharding (for stateful work)
        • Combat servers
        • Matchmaking
        • Leaderboards
        • Databases
  11. Services
      Simple, well-defined interface
      Single-purpose
      Modular and independent
      Small team
      Autonomy and responsibility
  12. Component Isolation
      Combat server for TOME
        • Highly “twitchy” real-time MOBA combat
        • Very latency-sensitive
      Real-time interactions isolated to a single, ephemeral component
        • No coordination with any central service
      Highly dynamic load distribution
        • Router assigns battle to least-loaded server
        • Requires latency-fairness between players
  13. Asynchrony: Do Work Up Front
      Custom asset pipeline
        • Spriting, compression, etc.
      Pre-render “movies” instead of real-time particle effects
      Tons of caching
  14. Asynchrony: Client Liveness
      Client continues seamlessly if disconnected
        • Gameplay more important than immediate synchronization
      Event loop for rendering
        • Keep up with the frame rate (!)
      Default to background processing
        • Refresh assets
        • Save client state
  15. Asynchrony: Reactive Server
      Minimize request latency
        • Respond as rapidly as possible to client
        • Queue events / messages for complex work
        • Service interactions via reliable events
      Functional Reactive programming
        • Heavy use of Scala and Akka
        • Never block (!)
        • eBay, Google programming models (-)
  16. Small, Independent Teams
      Studio System
        • Full-stack, independent game teams
        • Near-complete autonomy on technology choices, development processes
      Vendor-customer discipline
        • Google service teams (+)
      Reduces contention and coherence
  17. Hire and Retain Top People
      Hire ‘A’ Players
        • Difference between top and bottom performers is not 1.5x; it’s 10x (!)
        • (+) Google hiring process
      Virtuous Cycle
        • A players bring A players
        • B players bring C players
        • Constantly raise the bar
      Reduces contention and coherence
  18. Play to People’s Strengths
      People are not cogs, not fungible
        • (-) eBay “Train seats”
        • Destroyed incentives, personal pride, long-term ownership
      Align work with skills and passion
        • Symphony instead of Factory (!)
        • Skills in Flash, Scala, etc.
        • Build customizability for target developer, not builder (DSL >> code)
  19. Small Details Matter
      In the very large, the very small matters a *lot*
        • Subatomic physics and cosmology
        • eBay and variable-byte encoding (+)
        • GAE and memcache slab memory allocation (+)
      Discipline is *which* details matter
        • Combat server and memory contention
        • 40% improvement from six characters …
        • “const ”
  20. Join us! [jobs]
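
Slide 8 ("Know Your Distributions") recommends percentiles over the mean for long-tailed quantities such as latency. The Python sketch below illustrates the point on synthetic data. The exponential sample, the 50 ms scale, the seed, and the nearest-rank `percentile` helper are all assumptions made for illustration; none of them come from the talk.

```python
import math
import random
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, with 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds, drawn from an exponential
# distribution with a mean of about 50 ms (an assumption, not real data).
random.seed(2013)
latencies_ms = sorted(random.expovariate(1 / 50) for _ in range(10_000))

mean_ms = statistics.fmean(latencies_ms)
summary = {p: percentile(latencies_ms, p) for p in (50, 90, 99, 99.9)}

print(f"mean = {mean_ms:.1f} ms")
for p, value in summary.items():
    print(f"p{p} = {value:.1f} ms")
```

On a sample like this the mean lands above the median and far below the 99th percentile, so it describes neither the typical request nor the slow ones that players actually notice.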
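
Slide 10 ("Distribution of Data / Work") separates load-balancing for stateless work from sharding for stateful work. A minimal Python sketch of that distinction follows. The `RoundRobin` class, the `shard_for` function, the server names, and the hash-modulo scheme are hypothetical choices for illustration, not KIXEYE's actual design.

```python
import hashlib
import itertools


class RoundRobin:
    """Stateless work: any server will do, so rotate through them."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


def shard_for(key, shard_count):
    """Stateful work: the same key must always reach the same shard.

    Uses a stable hash (not Python's per-process randomized hash()) so
    every process computes the same answer.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


web = RoundRobin(["web-1", "web-2", "web-3"])
print([web.pick() for _ in range(4)])     # rotates, then wraps around
print(shard_for("player-42", 8))          # same shard on every call
```

Hash-modulo placement is the simplest option, but changing `shard_count` moves most keys to a different shard. Consistent hashing or an explicit lookup table are common alternatives when shards are added or removed often.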
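
Slide 12 ("Component Isolation") says a router assigns each battle to the least-loaded server. One way to sketch that idea in Python is shown below. The `BattleRouter` class, its method names, and the use of active-battle count as the load measure are assumptions, since the slides do not describe the real router.

```python
class BattleRouter:
    """Places each new battle on the server with the fewest active battles."""

    def __init__(self, server_names):
        # Load is approximated by active battle count (an assumption).
        self.active = {name: 0 for name in server_names}
        self.placement = {}

    def assign(self, battle_id):
        # min() over a dict returns the first key with the lowest count,
        # so ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        self.placement[battle_id] = server
        return server

    def finish(self, battle_id):
        # Battles are ephemeral: release the slot when the battle ends.
        server = self.placement.pop(battle_id)
        self.active[server] -= 1


router = BattleRouter(["combat-1", "combat-2"])
for battle in ("b1", "b2", "b3"):
    print(battle, "->", router.assign(battle))
```

Because each battle lives entirely on the server it was assigned to, the router only needs to be consulted at battle start and end, which matches the slide's point about avoiding coordination with a central service during combat.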
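
Slide 15 ("Asynchrony: Reactive Server") describes responding to the client as quickly as possible and queueing events for complex work. The talk names Scala and Akka for this. The sketch below swaps in Python's `asyncio` to show the same shape in a few lines; the request format, the `handle_request` and `worker` names, and the stand-in "work" are all hypothetical.

```python
import asyncio


async def handle_request(queue, request):
    """Acknowledge immediately; leave the heavy work to a background worker."""
    queue.put_nowait(request)  # unbounded queue, so this never waits
    return {"status": "accepted", "id": request["id"]}


async def worker(queue, results):
    """Drain queued events one at a time without blocking the event loop."""
    while True:
        request = await queue.get()
        await asyncio.sleep(0)  # stand-in for slow work such as persistence
        results.append(request["id"])
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    results = []
    task = asyncio.create_task(worker(queue, results))

    acks = []
    for i in range(3):
        acks.append(await handle_request(queue, {"id": i, "action": "save_state"}))

    await queue.join()  # wait until every queued event has been processed
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return acks, results


acks, results = asyncio.run(main())
print(acks)
print(results)
```

An in-memory queue like this loses events if the process dies, so it does not cover the slide's "reliable events" bullet. A durable message broker would be needed for that.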
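
Slide 19 ("Small Details Matter") cites variable-byte encoding at eBay as a small detail that paid off. The slides do not give the format eBay used, so the Python sketch below assumes one common variant: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last (the scheme used by LEB128 and Protocol Buffers varints).

```python
def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode one integer; return (value, offset of the next unread byte)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


for number in (5, 300, 70_000):
    encoded = encode_varint(number)
    print(number, "->", encoded.hex(), f"({len(encoded)} bytes)")
```

Small values take a single byte instead of a fixed four or eight, which is why the technique suits data dominated by small integers, such as gaps between sorted document IDs in a search index.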