The Game of Operations
and
The Operation of Games
Randy Shoup
@randyshoup
linkedin.com/in/randyshoup
DevOpsDays Silicon Valley, June 28 2014
Background
CTO at KIXEYE
• Real-time strategy games for web and mobile
Director of Engineering for Google App
Engine
• World’s largest Platform-as-a-Service
Chief Engineer at eBay
• Multiple generations of eBay’s real-time
search infrastructure
1973: Xerox PARC and
SuperPaint
en.wikipedia.org/wiki/SuperPaint
www.computerhistory.org/collections/catalog/X1001.89B
40 Years Later …
tomeimmortalarena.com
Real-Time Strategy Games are …
• Real-time
• Spiky
• Computationally intensive
• Constantly evolving
• Constantly pushing boundaries
→ Technically and operationally demanding
Operating Games: Goals
Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business
for long
• Fun includes game mechanics, feature set, uptime,
performance
Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build,
buy, borrow
Cost Efficiency
• More output for less
The Game of Operations
Cloud
• All studios and services moving to AWS
• Strong focus on automation
Services
• Small, focused teams
• Clean, well-defined interface to customers
DevOps Culture
• One team across development and ops
The Game of Operations
Cloud
Services
DevOps Culture
Why Cloud? (The Obvious)
Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load
Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision
Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down
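The overprovisioning point can be made concrete with a little arithmetic. The sketch below compares instance-hours paid for when capacity is fixed at the peak with instance-hours paid for when capacity follows demand. The function name and the demand figures are hypothetical and chosen only for illustration; they are not KIXEYE data.

```python
def instance_hours(hourly_demand, fixed_capacity=None):
    """Return instance-hours paid for over the given hours.

    hourly_demand: instances needed in each hour (illustrative values).
    fixed_capacity: if given, that many instances run every hour
    (owned or rented hardware); if None, capacity follows demand.
    """
    if fixed_capacity is None:
        # Elastic case: pay only for what each hour actually needs.
        return sum(hourly_demand)
    # Fixed case: pay for the full fleet whether it is busy or idle.
    return fixed_capacity * len(hourly_demand)


# Hypothetical spiky day, compressed to six hours.
demand = [2, 2, 3, 10, 4, 2]
elastic = instance_hours(demand)
provisioned_for_peak = instance_hours(demand, fixed_capacity=max(demand))
```

The gap between the two totals is the "utilization risk" that the slide refers to.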
Why Cloud? (The Less
Obvious)
Instance Shaping
• Instance shapes to fit most parts of the
solution space (compute-intensive,
IO-intensive, etc.)
• If one shape does not fit, try another
Service Quality
• Amazon and Google know how to run data
centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric
and external peering
Why Cloud? (Fundamental
Forces)
Economics
• Nearly impossible to beat Google / Amazon
buying power or operating efficiencies
• 2010s in computing are like 1910s in electric
power
Developer Adoption
• It Just Works ™
• Makes it easy to fall in love with infrastructure :)
“Soon it will be just as common to
run your own data center as it is
to run your own electric power
generation”
-- me
Autoscaling
Games are very spiky
• Very unpredictable
• Huge variability between peak and trough
Hits are self-reinforcing
Automation Work at KIXEYE
Resilient Clients
• Clients back off in response to latency
• Clients continue gameplay despite network
disruption
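One way a client might back off in response to latency is sketched below: the delay between requests doubles while responses are slow and halves again once they recover, with random jitter so that clients do not retry in lockstep. The class name, thresholds, and delays are assumptions for illustration, not KIXEYE's actual client code.

```python
import random


class LatencyBackoff:
    """Hypothetical client-side throttle driven by observed latency."""

    def __init__(self, threshold_s=0.5, base_delay_s=0.1, max_delay_s=10.0):
        self.threshold_s = threshold_s      # latency considered "slow"
        self.base_delay_s = base_delay_s    # delay when the service is healthy
        self.max_delay_s = max_delay_s      # upper bound on the delay
        self.delay_s = base_delay_s

    def record(self, latency_s):
        """Update the delay from one observed response latency."""
        if latency_s > self.threshold_s:
            # Service looks overloaded: double the delay, up to the cap.
            self.delay_s = min(self.delay_s * 2, self.max_delay_s)
        else:
            # Service has recovered: halve the delay, down to the base.
            self.delay_s = max(self.delay_s / 2, self.base_delay_s)
        return self.delay_s

    def next_wait(self, rng=random):
        """Pick a jittered wait in [0, delay_s] before the next request."""
        return rng.uniform(0, self.delay_s)
```

A game client would call `record()` after each response and sleep for `next_wait()` before the next request, continuing local gameplay in the meantime.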
Elastic Services
• Services grow / shrink based on load
• Service Cluster == AWS Auto Scaling group
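The grow / shrink decision can be modeled as simple arithmetic: pick the instance count that brings per-instance load back to a target, clamped to a minimum and maximum group size. The sketch below is only that model. The function and parameter names are hypothetical, and it does not call any AWS API; in practice the scaling policy is configured on the group itself.

```python
import math


def desired_instances(current, load_per_instance, target_per_instance,
                      min_size=1, max_size=100):
    """Return the instance count that restores the per-instance load target.

    current: instances running now (must be positive).
    load_per_instance: observed average load per instance (e.g. requests/s).
    target_per_instance: load each instance should carry (must be positive).
    """
    if current <= 0 or target_per_instance <= 0:
        raise ValueError("current and target_per_instance must be positive")
    total_load = current * load_per_instance
    wanted = math.ceil(total_load / target_per_instance)
    # Never shrink below min_size or grow beyond max_size.
    return max(min_size, min(max_size, wanted))
```

With 4 instances at 90 requests/s each and a target of 60, the sketch asks for 6 instances; with 10 instances at 10 requests/s each and a target of 50, it shrinks to 2.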
Automation Work at KIXEYE
Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• Zero-downtime red-black deployment
• Futures: canarying, auto-rollback
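The idea behind a zero-downtime red-black deployment can be sketched as follows: the new cluster is brought up next to the live one, traffic switches only after a health check passes, and the old cluster is kept on standby for rollback. This is a toy in-memory model; `Cluster`, `Router`, and the functions are hypothetical names and do not reflect Asgard's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cluster:
    version: str
    healthy: bool = True


@dataclass
class Router:
    live: Cluster                       # cluster currently taking traffic
    standby: Optional[Cluster] = None   # previous cluster, kept for rollback


def red_black_deploy(router: Router, new_cluster: Cluster,
                     health_check: Callable[[Cluster], bool]) -> bool:
    """Switch traffic to new_cluster only if it passes health_check."""
    if not health_check(new_cluster):
        # The old cluster never stopped serving, so there is no downtime.
        return False
    router.standby = router.live
    router.live = new_cluster
    return True


def rollback(router: Router) -> None:
    """Send traffic back to the previous cluster."""
    if router.standby is None:
        raise RuntimeError("no previous cluster to roll back to")
    router.live, router.standby = router.standby, router.live
```

Auto-rollback, listed under futures, would amount to calling `rollback()` automatically when post-switch monitoring reports a problem.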
Manageability
• Puppet for configuration management
• Flume -> Elasticsearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and
alerting
The Game of Operations
Cloud
Services
DevOps Culture
Service Teams
• Give teams autonomy
• Freedom to choose technology, methodology,
working environment
• Responsibility for the results of those choices
• Hold them accountable for *results*
• Give a team a goal, not a solution
• Let team own the best way to achieve the
goal
KIXEYE Service Chassis
• Goal: “chassis” for building scalable game
services
• Minimal resources, minimal direction
• 3 people x 1 month
• Consider building on NetflixOSS
Team exceeded expectations
• Co-developed chassis, transport layer, service
template, build pipeline, red-black deployment,
etc.
• Operability and manageability from the beginning
• 15 minutes from no code to running service in
AWS (!)
• Open-sourced at github.com/kixeye
Micro-Services
Single-purpose
Simple, well-defined interface
Modular and independent
Small teams
Autonomy and responsibility
[Diagram: services A, B, C, D, E]
Transition to Service
Relationships
Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)
Service-Level Agreement (SLA)
• Promise of service levels by the provider
• Customer needs to be able to rely on the
service, like a utility
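An SLA is easier to reason about when the promised service level is translated into a downtime budget. The sketch below converts an availability target into allowed minutes of downtime per period. The 99.9% figure in the usage line is a hypothetical target, not one stated in this talk.

```python
def allowed_downtime_minutes(availability_target, period_days=30):
    """Return the minutes of downtime permitted by an availability target.

    availability_target: fraction of time the service must be up (0 < t <= 1).
    period_days: length of the measurement period in days.
    """
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    minutes_in_period = period_days * 24 * 60
    return (1 - availability_target) * minutes_in_period


# Hypothetical target: 99.9% over a 30-day period, roughly 43 minutes.
budget = allowed_downtime_minutes(0.999)
```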
Transition to Service
Relationships
Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and
provider
• Motivates both sides to optimize
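Charging for usage can be as simple as splitting a service's cost across customers in proportion to what each consumed. The sketch below shows that proportional split. The studio names, the usage unit, and the cost figure are hypothetical.

```python
def allocate_cost(total_cost, usage_by_customer):
    """Split total_cost across customers in proportion to their usage.

    usage_by_customer: mapping of customer name to units consumed
    (for example requests or instance-hours).
    """
    total_usage = sum(usage_by_customer.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        name: total_cost * used / total_usage
        for name, used in usage_by_customer.items()
    }


# Hypothetical month: two studios share a service that cost 1000.
bill = allocate_cost(1000.0, {"studio_a": 300, "studio_b": 100})
```

A studio that halves its usage sees a smaller bill, and a provider that lowers the total cost lowers every bill, which is the incentive alignment described above.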
The Game of Operations
Cloud
Services
DevOps Culture
One Team (!)
• Act as one team across development,
product, operations, etc.
• Solve problems instead of blaming and
pointing fingers
• Political games are not as fun as real-time
strategy games :)
Everyone Is Responsible for
Prod
Everyone’s incentives are aligned
Everyone is strongly motivated to have solid
instrumentation and monitoring
“DevOps is a reorg”
– Adrian Cockcroft
Blame-Free Post-Mortems
Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it
Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong
Blame-Free Post-Mortems
Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?
Engineers compete to take responsibility (!)
“Failure is not falling down but
refusing to get back up”
– Theodore Roosevelt
Transition to DevOps
Organization
• Studios make user-visible games
• Services provide common endpoints
Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs
Transition On-call
• Use primary / secondary on-call as
apprenticeship
“You Build It, You Run It”
– Everyone
Recap: The Game of
Operations
Cloud
Services
DevOps
Come Join Us!
DevOps Whiskey Tasting, July 22
333 Bush St., San Francisco
kixeyeloveswhiskey.eventbrite.com
Hiring in SF, Seattle, Victoria,
Brisbane, Amsterdam
www.kixeye.com/jobs
