Gaming in the Cloud: How Gearbox Software Uses Amazon Web Services to Reach Millions of Gamers



Gearbox Software uses cloud technology to power the SHiFT service and communicate directly with millions of fans. This presentation describes some of the things we have done with SHiFT in Borderlands 2 and how we have used the cloud to reach a broad audience.

  • Gearbox is an award-winning independent video game developer based in Plano, TX. Shot of our playtest lab, where our User Research department conducts studies on how people play and respond to our games.
  • Borderlands 2 received a lot of awards in 2012: Game of the Year from X-Play; IGN People's Choice Award for Best Overall Game; Most Played New Game from Raptr; Best Cooperative Multiplayer from Game Informer; Best Shooter and Character of the Year (Claptrap) from Spike, Editor's Pick. The list goes on and on, with other awards coming in from US Military Gamers, PlayStation Blog, Wired, Yahoo, Complex, Mature Gaming, Rev3Games, The Speaky's (Kotaku Community Awards), All That's Epic and the community-voted G4TV Videogame Deathmatch, just to name a few.
  • Artists at work
  • Common area in our studio
  • Borderlands introduced in 2009. Co-op shooter looter. FPS action, action-RPG mechanics. 4-player cooperative, drop-in drop-out.
  • Borderlands 2 released in 2012; to date, over 16.5 million sales in the franchise. Refined shooter looter, enhanced co-op play. Built SHiFT and Spark to connect to the community. A new initiative, something we've never done before.
  • Customer-facing, Fan and reward-focused
  • Linux, Ruby, Rails, MongoDB, Redis, Puppet, Java, MySQL, Hadoop. All running on Amazon Web Services.
  • Big believers in open source; we love technology that has strong user communities. Linux, Ruby, Rails, MongoDB, Redis, Puppet, Java, MySQL, Hadoop. All running on Amazon Web Services, tying it together.
  • 23 products and features of AWS in use today. We feel like AWS has been a huge enabler for Spark, especially with a small team like ours.
  • Launch night was big; the first weekend was the peak. Sustained traffic for the first couple of months, boosted by DLC, eventually settling into a stable player base.
  • I decided we needed a beta to prove out what we were doing
  • Launched Friday, September 9, 2011
  • Bad (or maybe good) luck that the servers were up just long enough that by BTest launch their clocks had drifted enough to expose the bug: we rejected tickets from the future. Patch Tuesday is a thing for a reason. We also release Micropatches on Tuesdays; we built our hotfix workflow around this lesson.
  • We started in March and looked at Steam data. Predictable decline to the planned July launch.
  • Unfortunately, R&D caused our schedule to slip a bit, so we didn't launch until September. Meanwhile, marketing was doing what they do best: selling our game! A couple of things happened: the Steam Summer Sale, and we announced Borderlands 2!
  • Coming out of the BTest1 experience, we knew we had some unanswered questions. We wanted to try new tools and infrastructure, and we wanted to get experience with them with less than a year to go until the Borderlands 2 retail launch.
  • Steep learning curve, as the team was familiar with a traditional IT environment. We had experience with virtualization, but on a much smaller scale for internal resources. Still, we jumped right in and started digesting the APIs, and had an environment up and running pretty quickly, with a LONG list of things to improve on post-BTest2. Launched Tuesday, December 13, 2011.
  • Nice validation of our decision to deploy on AWS
  • 9 months from BTest2 to finish up the core system implementation and get ready for the vertical AAA launch. Constant communication with Business and Marketing to understand the expected Day 1 / Week 1 sales.
  • Things generally worked out OK! There were a few issues to solve in the first week of launch, but the team largely survived unscathed.
  • Valley to peak was about a factor of 4. Ran custom scripts to change capacity; very painful.
  • While looking for real-time code redemption results, I issued a bad query that impacted some monitoring. It took most of the afternoon and evening to recover. Redis failover scripts did not work as expected. Restart monitoring node, stabilize cluster. Move some monitoring functionality to a new node. Lessons: try not to intermingle monitoring for different components. Be extra careful querying 100MM-record datasets!
  • Right at 2pm, the code drops and traffic spikes; players had to redeem the code in the main menu. Traffic does not recover for quite some time, and users' play is interrupted.
  • Implemented ASGs based on AMIs. Completely overhauled deployment. Implemented centralized log collection and search. Fixed memory leaks in some apps. Implemented VPC. Made apps stateless (mostly). ActiveMQ is something we are still using but want to move away from, as it is hard to scale dynamically.
  • Find opportunities to get things built that improve the platform as a whole. Did some other things too.
  • Later that year, integrated News back into Borderlands 2. Community team thanked us. Live team thanked us as well.
  • Using EMR, we were able to finally get a handle on our data
  • Confidence in our systems and data processing felt real. Launched a sweepstakes called the Borderlands 2 $100,000 Loot Hunt. Gave away cash and prizes for playing our game. 30-day event. Daily challenge: kill this enemy and earn a special weapon reward. Community goal: take that weapon and kill some other enemies with it, working towards the total as a community. Day 7 rewards for hitting all the goals: a bonus weapon drop on the daily challenge.
  • Nearly killed me. Everything on the backend; tweak the game via Micropatches. EMR for data processing. Data Pipeline to tie it together... but we eventually replaced that, as it didn't feel ready for us.
  • Tremendous participation: over 1 million entries from fans. Put together an infographic as a result.
  • iOS and Android apps integrated with SHiFT. Built a new service for tracking Item Collection. Built an OAuth service to permit logins. Connects to the existing News and Account services.
  • Team of 10 launched Spark and supports it today. You built it, you operate it! Some specialization in the team, but in general a lot of collaboration. Everyone works together to keep Spark running.
  • Started with just 3 services (Auth, Configuration, Telemetry). Built up over time, now over 25 apps in the backend; we saw this with Aliens bringing News, and LootTheWorld bringing OAuth and Item Collections. Every new piece gives an almost geometric increase in capability to the platform.
  • Kinesis can solve our ActiveMQ issues. Give Data Pipeline a whirl again as it matures. Implement CloudFormation to make deploying an entire application and full environment turnkey.
  • Keep our eyes open. Azure has some compelling services. Xbox Live Cloud Compute, powering the new Titanfall game on Xbox One, is very interesting. I hear Google has great performance.
  • New titles in development. Designers find new ways to use existing services.
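
The BTest clock-drift incident in the notes above (servers drifting until client tickets looked like they came "from the future") suggests keeping timestamp tolerances as server-side tunables. The talk does not describe the actual fix, so the following Python is only a minimal sketch of one possible approach; `TicketPolicy`, `ticket_is_valid`, and the default values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TicketPolicy:
    """Server-side tunables (hypothetical names and defaults)."""
    max_age_s: float = 300.0        # reject tickets older than this
    max_future_skew_s: float = 0.0  # how far "in the future" a ticket may be


def ticket_is_valid(issued_at: float, server_now: float, policy: TicketPolicy) -> bool:
    """Accept a ticket if its timestamp falls inside the configured window."""
    age = server_now - issued_at
    if age < -policy.max_future_skew_s:
        return False  # ticket appears to come from too far in the future
    return age <= policy.max_age_s


# A ticket stamped 30 seconds ahead of the server clock.
strict = TicketPolicy()
tolerant = TicketPolicy(max_future_skew_s=120.0)
print(ticket_is_valid(1030.0, 1000.0, strict))    # False
print(ticket_is_valid(1030.0, 1000.0, tolerant))  # True
```

Because the tolerance lives in a policy object rather than in client code, it could in principle be adjusted on the server without shipping a client patch, which matches the "server tunability" lesson from the beta.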

    2. 2. A WORD ABOUT ME I’ve been programming for 25+ years Making games since 1995; at Gearbox for 12 years Network Programming on multiple titles • Halo: Combat Evolved (2003: PC) • Brothers in Arms: Road to Hill 30 (2005: PC/Xbox) • Brothers in Arms: Hell’s Highway (2008: PC/PS3/Xbox 360) • Borderlands (2009: PC/PS3/Xbox 360) • Borderlands 2 (2012: PC/PS3/Xbox 360) Currently directing Spark team and SHiFT
    6. 6. BASED IN PLANO, TX
    8. 8. OUR BRANDS
    9. 9. OUR BRANDS
    10. 10. OUR BRANDS
    11. 11. OUR BRANDS
    13. 13. A WORD ABOUT BORDERLANDS 2 Franchise sales of over 16.5 Million
    14. 14. SHIFT SHiFT: Our online service • In-game, Web, Mobile
    15. 15. WHY BUILD THIS? Games are increasingly social, connected experiences Next generation of games: Always on, Always Connected AAA Games must go beyond the box • Embrace the web and mobile, companion experiences • Engage with players any time, anywhere • Build the brand Ultimately, all about the customer • Connection directly to the fans • Enable the community to forge connections
    16. 16. SPARK Our backend platform • Internal name, describes the Team and Technology • Small team of 10 devops • Services-Oriented Architecture
    17. 17. SPARK
    18. 18. Amazon EC2, Amazon EMR, Amazon Kinesis, Amazon Route 53, Elastic Load Balancing, Amazon VPC, Auto Scaling, Amazon S3, Amazon EBS, CloudFront, DynamoDB, Amazon RDS, ElastiCache, Amazon Redshift, CloudWatch, AWS Data Pipeline, AWS CloudFormation, AWS CloudTrail, IAM, Amazon SES, Amazon SNS, Amazon SQS, virtual private cloud
    20. 20. THE CHALLENGE OF AAA GAMES Startups & mobile teams reference a soft launch, gradual run-up to inflection point (John Mayer tweets about Words with Friends) [Chart axis: Day 0, Day 1, Day 30, Day 120]
    21. 21. THE CHALLENGE OF AAA GAMES AAA game launches are the opposite: Vertical, long tail and plateau [Chart: Startup vs. AAA curves; axis: Day 0, Day 1, Day 30, Day 120]
    22. 22. BUILDING THE SERVICE Research Build a team Start coding Ship it 2-3 years later? …. This isn’t easy. Is there a better way?
    23. 23. BUILDING A BETA
    24. 24. BUILDING A BETA We used Borderlands 1 as a testbed for Borderlands 2 Built on Slicehost • At the time all Gearbox websites were hosted there • Ran our own MySQL and ActiveMQ instances Manually provisioned hardware and configured software • Took a couple of weeks to get everything working • A bit of a painful, heroic effort
    25. 25. BENEFITS OF BETA Clock synchronization problem on server • Servers slowly drifted away from game clients • Some crash reports early… • …By Saturday morning, all clients crashing! • Workaround server side, instantly fixed crashes! Lessons • Some test vectors are very difficult to predict • Server tunability is incredibly valuable • Tuesdays are the Best Days! (Not Friday!)
    26. 26. BETA CAPACITY PLANNING Looked at Steam data in March • Predictable decline to July Launch [Chart axis: March, May, July, September]
    27. 27. BETA CAPACITY PLANNING We shipped BTest in September… Steam Summer Sale! Borderlands 2 announced! [Chart: Planned vs. Actual; axis: March, May, July, September]
    28. 28. BETA CAPACITY PLANNING Scrambled to handle dramatically higher load • Resized DBs, more servers, reconfiguration • Painful! Lessons: • Pay close attention and adjust constantly • Be plugged in to PR and Business • Be agile, use tools to help agility
    29. 29. DO ANOTHER BETA!
    30. 30. SPARK -> CLOUD BTest1 was hard to operate on Slicehost • Capacity hard to adjust, and we didn’t get it right • We knew we needed to design for more flexibility • Tools didn’t support the agility we needed BTest2 Shipped on Amazon Web Services • EC2, RDS, ELB • Puppet to configure instances • Steep learning curve, but paid off • Didn’t get everything right…
    31. 31. BTEST2: HOLIDAY STABILITY We launched and were pretty stable However, problem Christmas evening! • Our game was still selling, new people playing • Queues were backing up, not severe • A few days later, CPU is pegged! • The Cloud to the rescue! Deploy more bigger! Lessons: • Queue storage in cloud gave wiggle room • It was actually pretty easy to recover from CPU peg • Capacity planning still hard!
    32. 32. BTEST2: MISSED OPPORTUNITIES New to AWS, Deployed classic EC2 instances Skipped VPC • This turned out to be a mistake • More difficult to secure some resources like we wanted • Had to build load balancing logic into app layer Lessons: • Embrace as much of the feature set as you can • Don’t be afraid to choose long term over short term • Especially for a Beta!
    33. 33. MOVING TO LAUNCH
    34. 34. LAUNCHING BORDERLANDS 2 Borderlands 2 launch: September 18, 2012 Applied some lessons from BTest2 • Doubled down on load testing • Improved our usage of Puppet and Capistrano • Pre-warmed our ELBs with Amazon and established LOC Latest capacity info from industry friends and experts projected we would survive • But still, wave of terror washed over me at T-6 hrs • Capacity planning is hard!
    36. 36. DAY 2: KEEPING TELEMETRY GOING Launch week capacity was tough to manage We wanted to keep costs in check, but had not implemented AWS Auto-Scaling Groups Manually add/remove instances at set times
    37. 37. SHIFT CODES! A week post-launch we were stable enough to use SHiFT Codes • Randy got things started with some quick tests • Engaged directly with devops team to measure results • Got a little TOO engaged…
    38. 38. SHIFT CODES: CHAOS Lessons: Try not to intermingle monitoring for different components Be extra careful querying 100MM record datasets!
    39. 39. SHIFT CODES: UNEXPECTED BEHAVIOR Telemetry traffic pattern changes when a code drops Users Save & Exit game, wait to redeem in menu Causes spike and lull in telemetry traffic
    40. 40. TAKING SPARK TO 1.0 We shipped Borderlands 2 on something like a 0.8 Spent next 6 months improving every aspect of platform
    41. 41. ADDED MORE SERVICES AND TITLES Borderlands 2 was a success! Quickly integrated into Aliens: Colonial Marines Developed a News service to communicate directly to fans
    43. 43. BEYOND THE GAME
    44. 44. GOT EXPERIENCE WITH HADOOP & EMR [Chart: processing time in hours (log scale, 1 to 10000) for 1 month of raw data. Generation 1: 3 months; Generation 2: 3 days; Generation 3: 3 hours]
    45. 45. SWEEPSTAKES! Borderlands 2 Game of the Year Edition release October 2013
    46. 46. BEHIND THE LOOT HUNT Inception to ship in 2 months • No changes to the Game or Core systems Goals • Put the R&D EMR effort into production • Try ElastiCache with Redis • Learn something about running a live community event Amazon EMR AWS Data Pipeline
    51. 51. LOOT THE WORLD!
    52. 52. LOOT THE WORLD!
    53. 53. LOOT THE WORLD
    54. 54. WHY DID WE SUCCEED? Great team that believed in the vision Adopt Devops Mentality
    55. 55. WHY DID WE SUCCEED? Start simple and build piece-by-piece Learn as you go • Optimize • Refactor • Measure
    56. 56. WHAT’S NEXT? New services look appealing to us Amazon Kinesis AWS Data Pipeline AWS CloudFormation
    57. 57. WHAT’S NEXT? Evaluate other Cloud Providers
    59. 59. GEARBOX IS HIRING! Come join the team! • Designers • Artists • Programmers • Devops Jimmy Sieben @jimmys
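
Slide 36 and its note describe launch-week capacity being changed by hand with custom scripts while traffic swung by about a factor of 4 from valley to peak. As a rough illustration of the arithmetic such a script has to do, here is a minimal Python sketch; `desired_instances`, the per-instance throughput, and the headroom value are all hypothetical, and nothing here calls AWS or reflects Gearbox's actual scripts.

```python
import math


def desired_instances(requests_per_s: float,
                      per_instance_rps: float,
                      headroom: float = 0.25,
                      min_instances: int = 2,
                      max_instances: int = 100) -> int:
    """Return an instance count covering the load plus headroom, clamped to bounds."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    needed = math.ceil(requests_per_s * (1.0 + headroom) / per_instance_rps)
    return max(min_instances, min(max_instances, needed))


# Hypothetical valley and peak loads, with the peak at 4x the valley.
valley = desired_instances(1000.0, per_instance_rps=250.0)
peak = desired_instances(4000.0, per_instance_rps=250.0)
print(valley, peak)  # 5 20
```

An Auto Scaling group, which the team adopted after launch, hands this kind of calculation to policies evaluated by AWS instead of scripts run at set times.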