DevOpsDays Silicon Valley 2014 - The Game of Operations

Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques his team has put into practice at KIXEYE: Cloud infrastructure, Service teams, and DevOps Culture. He talks about elastic workloads, micro-services, configuration automation, and a common service "chassis". He further discusses the organizational and technical disciplines of team autonomy, internal vendor-customer relationships, and, of course, "you build it, you run it"!

Published in: Technology, Business

  1. The Game of Operations and The Operation of Games
      Randy Shoup
      @randyshoup
      linkedin.com/in/randyshoup
      DevOpsDays Silicon Valley, June 28 2014
  2. Background
      CTO at KIXEYE
        • Real-time strategy games for web and mobile
      Director of Engineering for Google App Engine
        • World’s largest Platform-as-a-Service
      Chief Engineer at eBay
        • Multiple generations of eBay’s real-time search infrastructure
  3. 1973: Xerox PARC and SuperPaint
      en.wikipedia.org/wiki/SuperPaint
      www.computerhistory.org/collections/catalog/X1001.89B
  4. 40 Years Later …
      tomeimmortalarena.com
  5. Real-Time Strategy Games are …
        • Real-time
        • Spiky
        • Computationally-intensive
        • Constantly evolving
        • Constantly pushing boundaries
      -> Technically and operationally demanding
  6. Operating Games: Goals
      Player Fun
        • If players aren’t playing, we don’t have a business
        • If players aren’t having fun, we don’t have a business for long
        • Fun includes game mechanics, feature set, uptime, performance
      Developer Productivity and Satisfaction
        • We are a vendor; the studios are our customers
        • Must be *strictly better* than the alternatives of build, buy, borrow
      Cost Efficiency
        • More output for less
  7. The Game of Operations
      Cloud
        • All studios and services moving to AWS
        • Strong focus on automation
      Services
        • Small, focused teams
        • Clean, well-defined interface to customers
      DevOps Culture
        • One team across development and ops
  8. The Game of Operations
      Cloud
      Services
      DevOps Culture
  9. Why Cloud? (The Obvious)
      Provisioning Speed
        • Minutes, not weeks
        • Autoscaling in response to load
      Near-Infinite Capacity
        • No need to predict and plan for growth
        • No need to defensively overprovision
      Pay For What You Use
        • No “utilization risk” from owning / renting
        • If it’s not in use, spin it down
  10. Why Cloud? (The Less Obvious)
      Instance Shaping
        • Instance shapes to fit most parts of the solution space (compute-intensive, IO-intensive, etc.)
        • If one shape does not fit, try another
      Service Quality
        • Amazon and Google know how to run data centers
        • Battle-tested and highly automated
        • World-class networking, both cluster fabric and external peering
  11. Why Cloud? (Fundamental Forces)
      Economics
        • Nearly impossible to beat Google / Amazon buying power or operating efficiencies
        • 2010s in computing are like 1910s in electric power
      Developer Adoption
        • It Just Works ™
        • Makes it easy to fall in love with infrastructure :)
  12. “Soon it will be just as common to run your own data center as it is to run your own electric power generation” -- me
  13. Autoscaling
      Games are very spiky
        • Very unpredictable
        • Huge variability between peak and trough
      Hits are self-reinforcing
  14. Automation Work at KIXEYE
      Resilient Clients
        • Clients back off in response to latency
        • Clients continue gameplay despite network disruption
      Elastic Services
        • Services grow / shrink based on load
        • Service Cluster == AWS Auto Scale Group
  15. Automation Work at KIXEYE
      Build / Deploy Pipeline
        • One button
        • Puppet -> Packer -> AMI -> Asgard
        • Zero-downtime red-black deployment
        • Futures: canarying, auto-rollback
      Manageability
        • Puppet for configuration management
        • Flume -> ElasticSearch / Kibana for logging
        • Shinken -> PagerDuty for monitoring and alerting
  16. The Game of Operations
      Cloud
      Services
      DevOps Culture
  17. Service Teams
        • Give teams autonomy
        • Freedom to choose technology, methodology, working environment
        • Responsibility for the results of those choices
        • Hold them accountable for *results*
        • Give a team a goal, not a solution
        • Let team own the best way to achieve the goal
  18. KIXEYE Service Chassis
        • Goal: “chassis” for building scalable game services
        • Minimal resources, minimal direction
        • 3 people x 1 month
        • Consider building on NetflixOSS
      Team exceeded expectations
        • Co-developed chassis, transport layer, service template, build pipeline, red-black deployment, etc.
        • Operability and manageability from the beginning
        • 15 minutes from no code to running service in AWS (!)
        • Open-sourced at github.com/kixeye
  19. Micro-Services
        • Single-purpose
        • Simple, well-defined interface
        • Modular and independent
        • Small teams
        • Autonomy and responsibility
      (Diagram: services A, B, C, D, E)
  20. Transition to Service Relationships
      Vendor – Customer Relationship
        • Friendly and cooperative, but structured
        • Clear ownership and division of responsibility
        • Customer can choose to use service or not (!)
      Service-Level Agreement (SLA)
        • Promise of service levels by the provider
        • Customer needs to be able to rely on the service, like a utility
  21. Transition to Service Relationships
      Charging and Cost Allocation
        • Charge customers for *usage* of the service
        • Aligns economic incentives of customer and provider
        • Motivates both sides to optimize
  22. The Game of Operations
      Cloud
      Services
      DevOps Culture
  23. One Team (!)
        • Act as one team across development, product, operations, etc.
        • Solve problems instead of blaming and pointing fingers
        • Political games are not as fun as real-time strategy games :)
  24. Everyone Is Responsible for Prod
      Everyone’s incentives are aligned
      Everyone is strongly motivated to have solid instrumentation and monitoring
  25. “DevOps is a reorg” – Adrian Cockcroft
  26. Blame-Free Post-Mortems
      Learn from mistakes and improve
        • What did you do -> What did you learn
        • Take emotion and personalization out of it
      Post-mortem After Every Incident
        • Document exactly what happened
        • What went right
        • What went wrong
  27. Blame-Free Post-Mortems
      Open and Honest Discussion
        • What contributed to the incident?
        • What could we have done better?
      Engineers compete to take responsibility (!)
  28. “Failure is not falling down but refusing to get back up” – Theodore Roosevelt
  29. Transition to DevOps
      Organization
        • Studios make user-visible games
        • Services provide common endpoints
      Training / Retraining
        • Common bootcamp
        • Train devs as Ops, Ops as devs
      Transition On-call
        • Use primary / secondary on-call as apprenticeship
  30. “You Build It, You Run It” – Everyone
  31. Recap: The Game of Operations
      Cloud
      Services
      DevOps
  32. Come Join Us!
      DevOps Whiskey Tasting, July 22
      333 Bush St., San Francisco
      kixeyeloveswhiskey.eventbrite.com
      Hiring in SF, Seattle, Victoria, Brisbane, Amsterdam
      www.kixeye.com/jobs
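
Slide 14 says that clients back off in response to latency. The slides do not show how KIXEYE implements this, so the following is only a minimal sketch of one common approach: exponential backoff driven by observed latency, with random jitter. The function names, the latency threshold, and the delay bounds are all assumptions made for illustration.

```python
import random


def next_delay(current_delay, observed_latency, latency_threshold=0.5,
               base_delay=0.1, max_delay=30.0, factor=2.0):
    """Return the wait (in seconds) before the client's next request.

    All parameters are hypothetical; the talk does not give real values.
    """
    if observed_latency > latency_threshold:
        # The service looks slow: multiply the wait, capped at max_delay.
        return min(max_delay, max(base_delay, current_delay * factor))
    # Latency is healthy again: drop back to the base delay.
    return base_delay


def with_jitter(delay, rng=random):
    """Pick a random wait in [0, delay] so clients do not retry in lockstep."""
    return rng.uniform(0, delay)
```

A client would call `next_delay` after each response, then sleep for `with_jitter(delay)` before the next request. Resetting to the base delay on the first healthy response is a design choice of this sketch; a gentler decay would also be reasonable.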
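
Slide 15 lists zero-downtime red-black deployment, with auto-rollback as future work. As a rough in-memory sketch of the general idea (not Asgard's API and not KIXEYE's pipeline), traffic moves to the new cluster only once it is healthy, and the old cluster stays available for rollback. `Cluster`, `Router`, `red_black_deploy`, and `rollback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    version: str
    healthy: bool = True  # stand-in for a real health check


@dataclass
class Router:
    active: Cluster                    # cluster currently receiving traffic
    standby: Optional[Cluster] = None  # previous cluster, kept for rollback


def red_black_deploy(router, new_cluster):
    """Switch traffic to new_cluster only if it passes its health check."""
    if not new_cluster.healthy:
        return False  # the old cluster keeps serving; nothing changes
    router.standby = router.active
    router.active = new_cluster
    return True


def rollback(router):
    """Swap traffic back to the previous cluster, if one is still around."""
    if router.standby is None:
        return False
    router.active, router.standby = router.standby, router.active
    return True
```

Because the old cluster keeps running until the switch succeeds, a failed deploy never takes traffic away from a working version.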
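
Slide 21 proposes charging customers for usage of a service. One simple way to do that, shown here purely as an assumption since the talk gives no formula, is to split a service's total cost in proportion to each customer's measured usage. The function name, the studio names, and the numbers are hypothetical.

```python
def allocate_cost(total_cost, usage_by_customer):
    """Split total_cost across customers in proportion to their usage."""
    total_usage = sum(usage_by_customer.values())
    if total_usage == 0:
        # Nobody used the service, so nobody is charged.
        return {name: 0.0 for name in usage_by_customer}
    return {
        name: total_cost * usage / total_usage
        for name, usage in usage_by_customer.items()
    }


# Example: a 1000.0 bill split between two studios by request count.
charges = allocate_cost(1000.0, {"studio_a": 300, "studio_b": 100})
# charges == {"studio_a": 750.0, "studio_b": 250.0}
```

A studio that cuts its request volume sees its share of the bill fall, which is the incentive alignment the slide describes.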
