Everything I Learned About Scaling Online Games I Learned at Google and eBay: Scalability at KIXEYE

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1fBeVJ0.

Randy Shoup shares war stories from eBay and Google about performance, consistency, iterative development, and autoscaling, connecting them with experiences building KIXEYE's gaming platform. Filmed at qconsf.com.

As CTO, Randy Shoup brings 25 years of engineering leadership to KIXEYE, ranging from tiny startups to Internet-scale companies. Prior to KIXEYE, he was Director of Engineering at Google, leading several teams building Google App Engine. Prior to Google, he was CTO and Co-Founder of Shopilly, an ecommerce startup, and spent 6 1/2 years as Chief Engineer and Distinguished Architect at eBay.

Transcript of "Everything I Learned About Scaling Online Games I Learned at Google and eBay: Scalability at KIXEYE"

  1. Everything I Learned About Scaling Online Games I Learned at Google and eBay
     Randy Shoup
     @randyshoup
     linkedin.com/in/randyshoup
  2. Watch the video with slide synchronization on InfoQ.com!
     http://www.infoq.com/presentations/kixeye-scalability
     InfoQ.com: News & Community Site
       • 750,000 unique visitors/month
       • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
       • Post content from our QCon conferences
       • News 15-20 / week
       • Articles 3-4 / week
       • Presentations (videos) 12-15 / week
       • Interviews 2-3 / week
       • Books 1 / month
  3. Presented at QCon San Francisco
     www.qconsf.com
     Purpose of QCon
       - to empower software development by facilitating the spread of knowledge and innovation
     Strategy
       - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
       - speakers and topics driving the evolution and innovation
       - connecting and catalyzing the influencers and innovators
     Highlights
       - attended by more than 12,000 delegates since 2007
       - held in 9 cities worldwide
  4. Background
     CTO at KIXEYE
       • Making awesome games awesomer (and scalabler and reliabler)
     Director of Engineering for Google App Engine
       • World’s largest Platform-as-a-Service
     Chief Engineer at eBay
       • Multiple generations of eBay’s real-time search infrastructure
  5. Engineering “Fun”
     Whole user / player experience
       • Think holistically about the full end-to-end experience of the user
       • UX, functionality, performance, bugs, etc.
     All useful metrics are *proxies* for fun
       • Performance: load time, frame rate, lag
       • Technology: latency, availability
       • Business: acquisition, retention, monetization
  6. Real-Time Strategy Games are …
     Real-time
     Spiky
     Diverse
     Constantly evolving
     Constantly pushing boundaries
     → Technically and operationally demanding
  7. Know Your Requirements
     Less is more
       • More wood, fewer arrows
       • Solve 100% of one problem rather than 50% of two
       • Release one great feature instead of two iffy ones
     Understand the requirements
       • e.g., Battle replay
       • Ephemeral combat
       • Immutable recording
       • Manageable storage footprint
  8. Know Your Bottlenecks
     Log everything
     Monitor relentlessly
     Measure bottlenecks and attack the first
       • “When you solve problem one, problem two gets a promotion”
       • Theory of Constraints: attacking *any* other problem yields no improvement
     Accept that your intuition is WRONG (!)
  9. Know Your Distributions
     “Normal” distribution is *not* normal
       • Only works for quantities physically constrained on both sides, clustered around a mean
       • E.g., adult height or weight
     Leads to invalid analysis and conclusions
       • Removing outliers
       • Ignoring real problems
       • Your (trained) intuition is WRONG (!)
  10. Know Your Distributions
     Exponential (“Long Tail”) distribution *much* more common
       • Income, latency, human connections, etc.
       • Also easy to reason about – only single parameter
     Percentiles are your best friends (!)
       • Reasonably characterize any distribution
       • Measure 90%ile, 99%ile, 99.9%ile
       • Focus on the real problems
       • Mean and Standard Deviation are useless
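The percentile advice on slides 9 and 10 can be sketched in a few lines of Python. This is an illustration only: the nearest-rank `percentile` helper, the `summarize` function, and the synthetic exponential latencies (mean 50 ms, fixed seed) are assumptions made for the example, not data or code from the talk.

```python
import math
import random
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(sorted_values) / 100.0)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Report the mean next to the percentiles the slide recommends measuring."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
        "p99.9": percentile(ordered, 99.9),
    }


# Hypothetical data: 10,000 latencies from an exponential distribution
# with a 50 ms mean (an assumption for illustration, not measured data).
rng = random.Random(42)
samples = [rng.expovariate(1 / 50.0) for _ in range(10_000)]
summary = summarize(samples)
for name, value in summary.items():
    print(f"{name:>6}: {value:8.1f} ms")
```

On long-tailed data like this, the mean sits above the median and says little about the slowest requests, which is where the 99%ile and 99.9%ile become informative.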
  11. Layering and Responsibility
     Multiple layers
       • Client
       • Game server
       • Services
       • Persistence
     Clarify roles and responsibilities
       • Client- vs. server-authoritative
       • Google service layering (+)
  12. Distribution of Data / Work
     Load-balancing (for stateless work)
       • Web servers, proxies
       • Most services
     Sharding (for stateful work)
       • Combat servers
       • Matchmaking
       • Leaderboards
       • Databases
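The split on slide 12 between load-balancing and sharding can be sketched as follows. The class names, server names, and the hash-modulo scheme are all hypothetical choices for illustration; the talk does not say how KIXEYE routes or shards.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Stateless work: any backend can serve any request, so just rotate."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class ShardMap:
    """Stateful work: the same key must always reach the same shard."""

    def __init__(self, shards):
        self._shards = list(shards)

    def shard_for(self, key):
        # A stable hash (unlike Python's per-process salted hash()) keeps
        # the key-to-shard mapping identical across processes and restarts.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]


web = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([web.pick() for _ in range(4)])  # ['web-1', 'web-2', 'web-3', 'web-1']

leaderboards = ShardMap(["lb-shard-0", "lb-shard-1", "lb-shard-2", "lb-shard-3"])
print(leaderboards.shard_for("player:1001") == leaderboards.shard_for("player:1001"))  # True
```

Hash-modulo sharding remaps most keys when the shard count changes; consistent hashing or a lookup table is a common alternative when shards are added often.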
  13. Services
     Simple, well-defined interface
     Single-purpose
     Modular and independent
     Small team
     Autonomy and responsibility
  14. Component Isolation
     Combat server for TOME
       • Highly “twitchy” real-time MOBA combat
       • Very latency-sensitive
     Real-time interactions isolated to a single, ephemeral component
       • No coordination with any central service
     Highly dynamic load distribution
       • Router assigns battle to least-loaded server
       • Requires latency-fairness between players
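"Router assigns battle to least-loaded server" might look roughly like the sketch below. `BattleRouter`, the server names, and the use of active-battle count as the load metric are assumptions for illustration; the slide does not describe the real router or how it handles latency-fairness.

```python
class BattleRouter:
    """Places each new battle on the server with the fewest active battles."""

    def __init__(self, servers):
        self._load = {name: 0 for name in servers}
        self._placement = {}

    def assign(self, battle_id):
        # Ties are broken by server name so placement is deterministic.
        server = min(self._load, key=lambda name: (self._load[name], name))
        self._load[server] += 1
        self._placement[battle_id] = server
        return server

    def server_for(self, battle_id):
        # All traffic for a battle goes to the one server that owns it.
        return self._placement[battle_id]

    def finish(self, battle_id):
        # Battles are ephemeral: release the slot when combat ends.
        server = self._placement.pop(battle_id)
        self._load[server] -= 1


router = BattleRouter(["combat-a", "combat-b"])
print(router.assign("battle-1"))  # combat-a
print(router.assign("battle-2"))  # combat-b
print(router.assign("battle-3"))  # combat-a
print(router.assign("battle-4"))  # combat-b
```

Once a battle is placed, every interaction for it stays on that one server, which matches the slide's point that real-time combat needs no coordination with a central service.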
  15. Asynchrony: Do Work Up Front
     Custom asset pipeline
       • Spriting, compression, etc
     Pre-render “movies” instead of real-time particle effects
     Tons of caching
  16. Asynchrony: Client Liveness
     Client continues seamlessly if disconnected
       • Gameplay more important than immediate synchronization
     Event loop for rendering
       • Keep up with the frame rate (!)
     Default to background processing
       • Refresh assets
       • Save client state
  17. Asynchrony: Reactive Server
     Minimize request latency
       • Respond as rapidly as possible to client
       • Queue events / messages for complex work
       • Service interactions via reliable events
     Functional Reactive programming
       • Heavy use of Scala and Akka
       • Never block (!)
       • eBay, Google programming models (-)
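The "respond rapidly, queue the complex work" pattern on slide 17 can be sketched as below. The talk names Scala and Akka; this example swaps in Python's `asyncio` purely as a stand-in, and the handler, worker, and 10 ms sleep are hypothetical. An in-memory queue is also not a "reliable" event channel in the slide's sense.

```python
import asyncio


async def handle_request(event_queue, request):
    """Acknowledge immediately; defer the expensive part to the queue."""
    event_queue.put_nowait(request)  # unbounded queue, so this never waits
    return {"status": "accepted", "id": request["id"]}


async def worker(event_queue, results):
    """Drain queued events in the background without blocking handlers."""
    while True:
        event = await event_queue.get()
        await asyncio.sleep(0.01)  # stand-in for slow work (hypothetical)
        results.append(event["id"])
        event_queue.task_done()


async def main():
    event_queue = asyncio.Queue()
    results = []
    worker_task = asyncio.create_task(worker(event_queue, results))

    acks = [await handle_request(event_queue, {"id": i}) for i in range(3)]
    done_at_ack = len(results)  # nothing has been processed yet

    await event_queue.join()  # wait for the background work to finish
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass
    return acks, done_at_ack, results


acks, done_at_ack, results = asyncio.run(main())
print(acks[0], done_at_ack, results)
```

The client-facing handler returns before any slow work runs; the worker catches up afterwards, so request latency is decoupled from processing time.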
  18. Small, Independent Teams
     Studio System
       • Full-stack, independent game teams
       • Near-complete autonomy on technology choices, development processes
     Vendor-customer discipline
       • Google service teams (+)
     Reduces contention and coherence
  19. Hire and Retain Top People
     Hire ‘A’ Players
       • Difference between top and bottom performers is not 1.5x; it’s 10x (!)
       • (+) Google hiring process
     Virtuous Cycle
       • A players bring A players
       • B players bring C players
       • Constantly raise the bar
     Reduces contention and coherence
  20. Play to People’s Strengths
     People are not cogs, not fungible
       • (-) eBay “Train seats”
       • Destroyed incentives, personal pride, long-term ownership
     Align work with skills and passion
       • Symphony instead of Factory (!)
       • Skills in Flash, Scala, etc.
       • Build customizability for target developer, not builder (DSL >> code)
  21. Small Details Matter
     In the very large, the very small matters a *lot*
       • Subatomic physics and cosmology
       • eBay and variable-byte encoding (+)
       • GAE and memcache slab memory allocation (+)
     Discipline is *which* details matter
       • Combat server and memory contention
       • 40% improvement from six characters …
       • “const ”
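Slide 21 mentions variable-byte encoding without giving details. The sketch below shows one widely used variant (little-endian base-128 with a continuation bit, as in LEB128 and Protocol Buffers varints). Whether eBay used this exact layout is an assumption; other variable-byte schemes order the bytes or flag the final byte differently.

```python
def encode_varint(value):
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # final byte: continuation bit clear
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode one integer starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


print(encode_varint(1))    # b'\x01'
print(encode_varint(300))  # b'\xac\x02'
print(decode_varint(encode_varint(300)))  # (300, 2)
```

Small values, which dominate many real workloads, take one byte instead of a fixed four or eight, which is the kind of small detail that adds up at very large scale.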
  22. Join us! www.kixeye.com [jobs]