Building a World in the Clouds
MMO Architecture on AWS
Jeffrey Berube
Director of Technical Operations
Red 5 Studios, Inc....

Building a World in the Clouds: MMO Architecture on AWS (MBL304) | AWS re:Invent 2013

Can you really build the infrastructure required to bring a massively multiplayer online game (MMO) to life in the cloud? This session discusses the evolution of Red 5 Studios' Firefall, a free-to-play MMO. Firefall runs entirely on the AWS platform and allows players from around the world to play together in the cloud. The session covers some of the design decisions made over the last two years, including the things that worked well and those that did not. The session also presents some of the solutions Red 5 implemented to ease the transition from dedicated data center hardware to virtual servers in AWS.


  1. Building a World in the Clouds: MMO Architecture on AWS
     Jeffrey Berube, Director of Technical Operations, Red 5 Studios, Inc.
     November 15, 2013
     © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Overview
     • What is Firefall?
     • Why build in the cloud?
     • Infrastructure Goals
     • Evolution of the Platform
     • Tools for Success
  3. What is Firefall?
  4. What is Firefall? (continued)
     • Free-to-play cooperative open world shooter
     • “Shardless” world
     • Instance-based maps
       – Both persistent and transient map types
       – Possible for many instances of a map to exist at the same time
  7. Why?
  8. Why build in the cloud?
     • Players are unpredictable
     • Developers are unpredictable
     • Cyclical player behavior opens up opportunities for significant cost savings
  9. Why build in the cloud?
     • Players are unpredictable
       – Forecasts can be (and usually are) wrong
         • Too little hardware, too many players
           – Bad for everyone
         • Too much hardware, too few players
           – Good for players (sort of) but bad for the business
       – What if they don’t stick around?
     • Developers are unpredictable
     • Cyclical player behavior opens up opportunities for significant cost savings
  11. Why build in the cloud?
      • Players are unpredictable
      • Developers are unpredictable
        – Active development has risks
          • Performance can change drastically
          • New services can “appear” the day of the patch
        – MMOs are ALWAYS being actively developed!
          • (If you want to be successful…)
      • Cyclical player behavior opens up opportunities for significant cost savings
  13. Player Graph [figure: player count over 36 hours]
  14. Player Graph with Server Overlay [figure]
  15. Player Graph with Efficient Server Overlay [figure]
      • 35%–37.5% Savings
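The savings on slide 15 come from running fewer servers during off-peak hours. As a rough illustration of that arithmetic, the sketch below compares provisioning for the peak hour around the clock against following demand hour by hour. The player counts, the per-server capacity, and the function names are hypothetical; none of them come from the talk, and the result is not meant to reproduce the 35%–37.5% figure.

```python
def servers_needed(players, players_per_server):
    """Smallest number of servers that can hold `players` (ceiling division)."""
    return (players + players_per_server - 1) // players_per_server


def savings_fraction(hourly_players, players_per_server):
    """Fraction of server-hours saved by following demand each hour
    instead of running peak capacity for every hour."""
    peak_servers = servers_needed(max(hourly_players), players_per_server)
    static_hours = peak_servers * len(hourly_players)
    dynamic_hours = sum(
        servers_needed(p, players_per_server) for p in hourly_players
    )
    return 1 - dynamic_hours / static_hours


# Hypothetical cycle: quiet, ramping up, peak, winding down.
hourly_players = [100, 200, 400, 200]
print(f"{savings_fraction(hourly_players, players_per_server=100):.1%}")
```

The sketch assumes at least one non-zero hour; a real calculation would also account for spare capacity and instance start-up time.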
  16. Infrastructure Goals
  17. Infrastructure Goals
      • Deployment and Recovery: How do we make site management better?
        – Expansion
        – Scalability
        – Disaster Recovery
        – Self-Healing
      • Platform: How can the platform make the player experience better?
        – Downtime
        – Player Mobility
  18. Infrastructure Goals: Deployment and Recovery
  19. Deployment and Recovery Goals
      • Quick regional expansion
      • On-demand scalability
      • Disaster recovery with minimal downtime
      • Self-healing
  20. Deployment and Recovery Goals
      • Quick regional expansion
        – Traditionally, expansion is a multi-month process
          • Contracts, purchase and shipping, installation, etc.
        – Today, adding a region is about a week-long task
          • Additional improvements are in the works
      • On-demand scalability
      • Disaster recovery with minimal downtime
      • Self-healing
  22. Deployment and Recovery Goals
      • Quick regional expansion
      • On-demand scalability
        – Automated scale up and down without* limits
          • *Instance sizes desired may not always be available, however
      • Disaster recovery with minimal downtime
      • Self-healing
  24. Deployment and Recovery Goals
      • Quick regional expansion
      • On-demand scalability
      • Disaster recovery with minimal downtime
        – Traditionally, DR sites are expensive and are not always properly maintained
        – Our goal is to automate disaster recovery safely
          • We do a lot manually at present
      • Self-healing
  26. Infrastructure Goals: Platform
  27. Platform Goals
      • Zero downtime game updates
      • Players can play globally without restrictions
  28. Platform Goals
      • Zero downtime game updates
        – Blue-Green deployment
        – Doesn’t preclude scheduled maintenance
          • Some things are more safely done offline
      • Players can play globally without restrictions
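Slide 28 names blue-green deployment as the route to zero downtime updates. The general pattern can be sketched as follows: the new build goes onto the idle stack, and traffic switches only after that stack passes a health check. This is a minimal sketch of the pattern, not Red 5's implementation; the class, the method names, and the `health_check` callback are hypothetical.

```python
class BlueGreenRouter:
    """Tracks two stacks and which one currently receives new traffic."""

    def __init__(self, live_version):
        # "blue" starts live; "green" is idle until the first deployment.
        self.stacks = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version, health_check):
        """Install `version` on the idle stack and switch over if healthy.

        Returns True when traffic was switched, False when the live stack
        was left untouched because the health check failed.
        """
        target = self.idle
        self.stacks[target] = version
        if not health_check(target, version):
            return False
        self.live = target
        return True


router = BlueGreenRouter(live_version="1.0")
switched = router.deploy("1.1", health_check=lambda stack, version: True)
print(switched, router.live, router.stacks)
```

Because the previous stack keeps its old build after a switch, rolling back is the same operation in reverse.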
  30. Platform Goals
      • Zero downtime game updates
      • Players can play globally without restrictions
        – Characters won’t be held hostage
          • Player data is available everywhere they want to be
          • We don’t charge a player so that they can play with their friends
        – Prefer closest healthy region, however
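The "prefer closest healthy region" rule on slide 30 can be illustrated with a short selection function. This is only a sketch: the talk does not say how closeness or health is measured, so the latency figures, the health flags, and the `pick_region` name are assumptions. The region names are the ones shown in the architecture diagrams.

```python
def pick_region(latency_ms, healthy):
    """Return the lowest-latency region that is marked healthy.

    latency_ms: dict of region name -> measured latency for this player
    healthy:    dict of region name -> bool (missing entries count as unhealthy)
    Returns None when no region is healthy.
    """
    candidates = [r for r in latency_ms if healthy.get(r, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: latency_ms[r])


# Hypothetical measurements for a player in Europe.
latency_ms = {"EU-West-1": 25, "US-East-1": 95, "US-West-2": 160}
healthy = {"EU-West-1": False, "US-East-1": True, "US-West-2": True}
print(pick_region(latency_ms, healthy))
```

Here the nearest region is unhealthy, so the player is sent to the next closest one instead of being blocked.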
  31. Evolution of the Platform
  32. The Beginning, March 2011
  33. The Beginning, March 2011 [architecture diagram]
      • Diagram labels: INET, CORP, AWS, US-West-1, Availability Zone: b
      • Groups: Outside-Game
      • AWS Services in Use: Elastic Compute Cloud (EC2)
  34. Alpha, October 2011
  35. Alpha, October 2011 [architecture diagram]
      • Diagram labels: INET, Operator, CORP, AWS, AWS ELB, US-West-1, Availability Zone: b
      • Groups: Outside-Game, Outside-LB, Inside-LB, Inside-Game, Inside-Core, Inside-App, Inside-DB
      • Node labels: Mx, HP, PvP, MM, MD, MCP, Chef, Ic, I, C, Ar, Log, U, Ad
      • AWS Services in Use: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Load Balancing (ELB), Simple Queue Service (SQS)
  36. Closed Beta, April 2012
  37. Closed Beta, April 2012 [architecture diagram]
      • Diagram labels: INET, Operator, CORP, AWS, AWS ELB, AWS RDS, US-West-1, Availability Zone: b, Availability Zone: c
      • Groups: Outside-Game, Outside-LB, Inside-LB, Inside-Game, Inside-Core, Inside-App
      • Node labels: OW, PvP, HP, MM, MD, MCP, Chef, I, Ic, Gr, Log, C, U, Ar, Ad, S
      • AWS Services in Use: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Load Balancing (ELB), Simple Queue Service (SQS), Relational Database Service (RDS), ElastiCache
  38. Gamescom, August 2012
  39. Gamescom, August 2012 [architecture diagram]
      • Diagram labels: INET, Operator, CORP, HQ, AWS, AWS ELB, VPC, EU-West-1, US-East-1, US-West-2, Availability Zone: b
      • Groups: Outside-Game, Outside-LB, Inside-Game, Inside-AppTasks, Inside-Core, Inside-App, Inside-DB
      • Node labels: OW, NPE, PvE, PvP, HP, Task, MD, MM, MCP, Log, Ic, C, U, Ar, L, In, Ad, Co, S, A, P, W, I, Chef, Gr
      • AWS Services in Use: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Load Balancing (ELB), Simple Queue Service (SQS), Relational Database Service (RDS), ElastiCache, CloudFront, Virtual Private Cloud (VPC)
  40. Open Beta, July 2013
  41. Open Beta, July 2013 [architecture diagram]
      • Diagram labels: INET, Operator, CORP, HQ, Ops, AWS, AWS ELB, VPC, EU-West-1, US-East-1, US-West-2, Availability Zone: b
      • Groups: Outside-Game, Outside-LB, Inside-LB, Inside-Game, Inside-AppTasks, Inside-Core, Inside-App, Inside-DB, Inside-Search
      • Node labels: ES, OW, NPE, PvE, PvP, HP, MCP, MD, MM, Task, Log, Ic, C, U, Ar, L, In, Ad, Co, S, A, P, W, Gr, I, Chef
      • AWS Services in Use: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Load Balancing (ELB), Simple Queue Service (SQS), CloudFront, Virtual Private Cloud (VPC), Elastic MapReduce (EMR)
  42. Today, November 2013
  43. Today, November 2013 [architecture diagram]
      • Diagram labels: INET, Operator, CORP, HQ, Ops, AWS, AWS ELB, VPC, EU-West-1, US-East-1, US-West-2, AP-NorthEast-1, SA-East-1, Availability Zone: b
      • Groups: Outside-Game, Outside-LB, Inside-LB, Inside-Game, Inside-AppTasks, Inside-Core, Inside-App, Inside-DB, Inside-Search
      • Node labels: ES, OW, NPE, PvE, PvP, HP, MD, MM, MCP, Task, Chef, Log, Ic, C, U, Ar, L, In, Ad, Co, S, A, P, W, Gr, I
      • AWS Services in Use: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Load Balancing (ELB), Simple Queue Service (SQS), CloudFront, Virtual Private Cloud (VPC), Elastic MapReduce (EMR)
  44. Tools for Success
  45. Third-party Tools
      • Opscode Chef
      • collectd
      • Icinga (Nagios)
      • Graphite
      • Graylog2
      • HAProxy
      • Keepalived
      • elasticsearch
      • Others (Bluepill, Thin, memcached, RabbitMQ, etc.)
  46. Third-party Services
      • Dyn
        – Global Load Balancing
        – DNS Failover
      • PagerDuty
        – On-call scheduling
        – Phone calls!
      • Pingdom
        – Transaction monitoring
      • Duo Security
  47. Internal Tools
      • Architect
      • Cartographer
      • Dashboards (Everywhere)
  48. Internal Tools
      • Architect
        – Everything heartbeats
        – Service-specific data aggregation
      • Cartographer
      • Dashboards (Everywhere)
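Slide 48 says only that "everything heartbeats" to Architect; its internals are not described. As a generic illustration of heartbeat tracking, the sketch below records the last time each service checked in and reports the ones that have gone quiet. The class name, the timeout, and the service identifiers are all hypothetical.

```python
class HeartbeatRegistry:
    """Remembers the most recent heartbeat time for each service."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def beat(self, service_id, now):
        """Record a heartbeat from `service_id` at time `now` (seconds)."""
        self.last_seen[service_id] = now

    def missing(self, now):
        """Return the services whose last heartbeat is older than the timeout."""
        return sorted(
            service_id
            for service_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


registry = HeartbeatRegistry(timeout_seconds=30)
registry.beat("matchmaker-1", now=0)
registry.beat("gameserver-7", now=20)
print(registry.missing(now=40))
```

Passing `now` explicitly rather than reading the clock inside the class keeps the sketch deterministic and easy to test.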
  50. Internal Tools
      • Architect
      • Cartographer
        – Builds new game server stacks
        – Replaces failed game server components
        – Scales up (or down) the servers within a pool depending on player demand
      • Dashboards (Everywhere)
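The Cartographer responsibilities on slide 50 (replacing failed components and scaling a pool with demand) amount to a reconciliation step: compare the healthy servers with what demand calls for, then launch or retire the difference. The sketch below shows one way such a step could look. It is not Cartographer's actual logic; the function names, the one-spare-server policy, and the size limits are assumptions.

```python
def desired_pool_size(players, players_per_server, spare=1, min_size=1, max_size=50):
    """Servers needed for the current players plus spare capacity, kept within limits."""
    needed = (players + players_per_server - 1) // players_per_server + spare
    return max(min_size, min(max_size, needed))


def plan_pool_changes(servers, players, players_per_server):
    """Decide what to do with a pool.

    servers: dict of server id -> True if healthy, False if failed
    Returns the failed servers to remove, plus how many servers to
    launch or retire so the healthy count matches the desired size.
    """
    failed = sorted(sid for sid, ok in servers.items() if not ok)
    healthy_count = sum(1 for ok in servers.values() if ok)
    delta = desired_pool_size(players, players_per_server) - healthy_count
    return {
        "remove_failed": failed,
        "launch": max(delta, 0),
        "retire": max(-delta, 0),
    }


pool = {"gs-1": True, "gs-2": False, "gs-3": True}
print(plan_pool_changes(pool, players=350, players_per_server=100))
```

Counting only healthy servers means a failed server is replaced by the same launch count that handles growth, so the two cases do not need separate code paths.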
  52. Internal Tools
      • Architect
      • Cartographer
      • Dashboards (Everywhere)
        – Multiple data sources
          • Graphite
          • Production databases
          • Business Intelligence databases
