October 21st 2015, Cloud Gaming Architectures: From Social to Mobile to MMO, Mark Bate
The AWS Pop-up Loft in Berlin is open for a short time only. From October 15 to November 13, 2015, you have the unique opportunity to be part of something special. Become a Loft member now for free and get exclusive access to the attractive Loft offers. http://aws.amazon.com/de/start-ups/loft/de-loft/
3. Traditional: Rigid vs. AWS: Elastic
[Chart: server capacity vs. player demand]
• Traditional (rigid): excess capacity means wasted $$; unmet demand means upset players and missed revenue :(
• AWS (elastic): scale to what you need, pay for what you use
5. Common game back-end concepts
Think in terms of APIs
HTTP + JSON
Get friends, leaderboard
Binary asset data
Multiplayer servers
High availability
Scalability
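As a rough illustration of "think in terms of APIs" with HTTP + JSON, the sketch below routes two hypothetical endpoints, `/friends` and `/leaderboard`, to handlers that return JSON bodies. The paths, parameter names, field names, and in-memory data are assumptions made for this example and are not taken from the talk.

```python
import json

# Hypothetical in-memory data standing in for a real database.
FRIENDS = {"anna": ["bob", "joe"]}
SCORES = {"anna": 1200, "bob": 900, "joe": 1500}


def get_friends(player):
    """Return the friend list for one player."""
    return {"player": player, "friends": FRIENDS.get(player, [])}


def get_leaderboard(limit=10):
    """Return the top scores, highest first."""
    ranked = sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True)
    return {"leaderboard": [{"player": p, "score": s} for p, s in ranked[:limit]]}


def handle_request(method, path, params=None):
    """Map an HTTP-style request to (status code, JSON body)."""
    params = params or {}
    if method == "GET" and path == "/friends":
        return 200, json.dumps(get_friends(params.get("player", "")))
    if method == "GET" and path == "/leaderboard":
        return 200, json.dumps(get_leaderboard(int(params.get("limit", 10))))
    return 404, json.dumps({"error": "not found"})
```

Keeping each handler a plain function that returns a dictionary makes it easy to put any web framework or load balancer in front of it later.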
6. Core (HA) game back end
• Choose region
• >=2 Availability Zones
• Amazon EC2 for app
• Elastic Load Balancing
• Amazon RDS database
• Multi-AZ
[Diagram: Region, ELB]
7. Scale it way out
• Amazon S3 for game data
• Assets
• UGC
• Analytics
[Diagram: Region, ELB]
8. Scale it way out
• Amazon S3 for game data
• Assets
• UGC
• Analytics
• ... with Amazon CloudFront!
[Diagram: Region, ELB, CloudFront CDN]
9. Scale it way out
• Amazon S3 for game data
• Assets
• UGC
• Analytics
• ... with CloudFront!
• Auto Scaling group
• Capacity on demand
• Respond to users
• Automatic healing
[Diagram: Region, ELB, CloudFront CDN]
10. Scale it way out
• Amazon S3 for game data
• Assets
• UGC
• Analytics
• ... with CloudFront!
• Auto Scaling group
• Capacity on demand
• Respond to users
• Automatic healing
• Amazon ElastiCache
• Memcached
• Redis
[Diagram: Region, ELB, CloudFront CDN]
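One common way to use a cache such as Memcached or Redis in front of a database is the cache-aside pattern. The sketch below is a minimal illustration under stated assumptions: a plain dictionary stands in for ElastiCache, and the class and method names are hypothetical rather than part of any AWS SDK.

```python
class CacheAside:
    """Read helper: check the cache first, fall back to the database."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # dict standing in for Memcached/Redis
        self.load_from_db = load_from_db  # function that reads one key from the DB
        self.db_reads = 0                 # counter to show how often the DB is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database work
        value = self.load_from_db(key)    # cache miss: read from the database
        self.db_reads += 1
        self.cache[key] = value           # populate the cache for later reads
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row is written."""
        self.cache.pop(key, None)
```

Repeated reads of the same key reach the database only once, which is why caching helps read-heavy data far more than write-heavy data.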
11. Writing is painful
• Games are write heavy
• Caching of limited use
• Key value
• Binary structures
• Database = bottleneck
[Diagram: Region, ELB, CloudFront CDN]
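One possible reading of "key value" and "binary structures" is that a whole player record is serialized into a compact blob and written under a single key. The sketch below illustrates that idea only; the record layout, the key naming, and the dictionary standing in for a key-value store are all assumptions.

```python
import struct

# Hypothetical record layout: level (uint16), coins (uint32), hearts (uint8).
# "<" means little-endian with no padding, so the blob is always 7 bytes.
PLAYER_FORMAT = "<HIB"

store = {}  # dict standing in for a key-value store


def save_player(user_id, level, coins, hearts):
    """Serialize the whole player record and write it under one key."""
    store["player:%s" % user_id] = struct.pack(PLAYER_FORMAT, level, coins, hearts)


def load_player(user_id):
    """Read one key and unpack it back into (level, coins, hearts)."""
    return struct.unpack(PLAYER_FORMAT, store["player:%s" % user_id])
```

Each save is a single small write keyed by player, which avoids multi-row updates in a relational schema.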
17. Related sessions
DAT204 NoSQL? No Worries: Building Scalable Applications on AWS NoSQL Services
DAT401 Amazon DynamoDB Deep Dive: Schema Design, Indexing, JSON, Search, and More
GAM401 Serverless Mobile Game Development with Amazon Cognito, AWS Lambda, and Amazon DynamoDB
18. Customer Story: Devsisters
2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cloud Gaming Architectures from Mobile to Social to MMO
19. What to expect from the session
• How we started
• How we improved our design
• Tips and tricks
• Retrospect
20. Cookie Run
21. Cookie Run video
22. About Cookie Run
• 70M~ downloads
• 10M DAU
• Top free 1st in 10 countries
• Top free 10th in 38 countries
23. More about Devsisters and Cookie Run
24. How We Started
25. In early 2013…
Lack of infrastructure, lack of developers, no hope
(1 server developer / 0 system engineers)
Only 1 game in service
Ovenbreak 2
- AWS US East
Cookie Run
- Only 1 person, 1 month left
26. Goal
• Highly reliable
• Quality assured
• Scalable design
• Auto configuring and scaling
• Real-time monitoring system
• Log system
27. First design
• Game server: Java, Spring MVC, MySQL 5.5
• Operation tool: Python, Django, Boto
• Monitoring: Amazon CloudWatch, Zabbix, StatsD, Graphite
28. First design
29. After 11 days
30. Design Improvements
31. Design improvements
• Redesigning the back end
• Improving the logging system
• Improving the game patch system
• Adding global user ranking system
32. Redesigning the back end
Situation
Players send game hearts to each other. Back ends do the bookkeeping
- Plan A: Used MySQL for storing data
Trouble: MySQL can’t keep up; too many rows (100M ~)
- Plan B: Gave unlimited hearts to users! Disabled the feature
Trouble: Not so bad, but need to come up with a better solution
33. Redesigning the back end
Solution
MySQL → NoSQL (Couchbase)
Use MySQL for game data (shop data, stage data, …)
Use NoSQL for user data (user items, level, coin, …)
34. Before
35. After
36. Improving the logging system
Situation
We need real-time log querying capability
Solution
Real-time log viewing system based on ELK
37. Before
38. After
39. Real-time log viewing system
40. Improving the game patch system
Situation
App Store binary size limit
Some resources need to be downloaded on demand
Wanted to distribute patches without App Store update
Solution
Constructed a decent patch system
Based on Amazon S3 and Amazon CloudFront
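The talk does not describe how the patch system works internally. One common design, assumed here purely for illustration, is to publish a manifest of file hashes next to the assets on Amazon S3 and CloudFront and let the client download only files that are missing or changed. The function names and manifest shape below are hypothetical.

```python
import hashlib


def file_digest(data):
    """Content hash used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def files_to_download(manifest, local_files):
    """Return the paths whose local copy is missing or differs from the manifest.

    manifest: {path: expected sha256 hex digest}, as published with the patch
    local_files: {path: bytes} currently on the device
    """
    stale = []
    for path, expected in sorted(manifest.items()):
        data = local_files.get(path)
        if data is None or file_digest(data) != expected:
            stale.append(path)
    return stale
```

Because only the manifest changes when a patch ships, the client can pick up new resources without an App Store update, and unchanged files keep being served from the CDN cache.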
41. Before
42. After
43. Improving the logging system
Situation
Total log size >10 TB; want to analyze all logs
Situation forced us to look for big data solutions
Solution
Adopted big data platforms using Amazon EMR or Amazon EC2
Eventually migrated to Spark and Spark SQL
44. Before
45. After
46. Spark
47. Log dashboard
48. Adding global user ranking system
Situation
Want to introduce global user ranking system
Solution
Use a skip-list-based ordered set with ElastiCache
…with custom caching and a lot of optimization techniques
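To make the ordered-set idea concrete, here is a minimal in-memory sketch of a ranking structure. It is not the Devsisters implementation: a sorted Python list maintained with `bisect` stands in for the skip list (so inserts are O(n) rather than O(log n)), and the class and method names are hypothetical.

```python
import bisect


class Leaderboard:
    """In-memory stand-in for an ordered set keyed by score."""

    def __init__(self):
        self.scores = {}   # player -> current score
        self.ordered = []  # sorted list of (-score, player), best first

    def add(self, player, score):
        """Insert or update a player's score."""
        if player in self.scores:
            old = (-self.scores[player], player)
            self.ordered.pop(bisect.bisect_left(self.ordered, old))
        self.scores[player] = score
        bisect.insort(self.ordered, (-score, player))

    def rank(self, player):
        """Return the 0-based rank (0 is the highest score)."""
        entry = (-self.scores[player], player)
        return bisect.bisect_left(self.ordered, entry)

    def top(self, n):
        """Return the n best (player, score) pairs."""
        return [(p, -s) for s, p in self.ordered[:n]]
```

The key property is that both "what is my rank?" and "who is in the top N?" are answered from one ordered structure instead of a full table scan.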
49. Before
50. After
51. Tips and Tricks
52. WARNING
THIS CAN HAPPEN TO YOU
BASED ON THE TRUE STORY OF OUR TEAM
53. Auto Scaling gotchas
Frequency: More than 10 times in 2 years
Many users connect to the game simultaneously
• During holiday seasons
• At the start of in-game events
• When bulk push notifications are sent
• Or for reasons unknown
Booting instances takes several minutes,
which isn’t quick enough to handle spiky loads
We have to predict traffic surges and prepare beforehand
54. Our bulk push system
55. Auto Scaling gotchas
Don’t set a minimum instance count of 1 or 2
If one machine dies, service fails
Use multiple Availability Zones
Sometimes instance availability of a single AZ can run out
Use multiple AZs with ELB cross-zone load balancing
56. Auto Scaling gotchas
Set scale-out (scale-in) policy meticulously
scale-out: +4 when Latency >= 0.1 for 2 minutes
scale-in: -2 when CPUUtilization < 10 for 2 minutes
Sometimes scale-up can be a useful option
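The asymmetric policy above (add 4 instances quickly, remove 2 slowly) can be sketched as a small decision function. This is an illustration only, not Auto Scaling's actual evaluation logic. It assumes one metric sample per minute, Latency in seconds, and CPUUtilization in percent; the minimum of 3 and maximum of 40 instances are hypothetical bounds.

```python
def decide_scaling(samples, periods=2):
    """Return the instance-count change for the newest metric samples.

    samples: list of per-minute dicts with "latency" (seconds) and "cpu"
    (percent), oldest first. A rule fires only if it held for each of the
    last `periods` samples.
    """
    recent = samples[-periods:]
    if len(recent) < periods:
        return 0  # not enough data yet
    if all(s["latency"] >= 0.1 for s in recent):
        return +4  # scale out aggressively
    if all(s["cpu"] < 10 for s in recent):
        return -2  # scale in conservatively
    return 0


def apply_change(current, change, minimum=3, maximum=40):
    """Clamp the new group size to the configured (hypothetical) bounds."""
    return max(minimum, min(maximum, current + change))
```

Scaling out in larger steps than scaling in reflects the point made earlier in the talk: new instances take minutes to boot, so being briefly over-provisioned is cheaper than being under-provisioned during a spike.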
57. Chef server failure
Auto Scaling group relying on Chef server is dangerous
Chef server is a single point of failure (SPOF)
May become unresponsive when too many servers start simultaneously
Errors happen in unexpected places!
58. Couchbase storage failure
Hardware problems can occur in EC2 instances
The worst, the most hopeless system failure
A front-end API server can crash; that’s OK
But if you are maintaining a database on EC2, this can be a tragedy
It really happens
59. Couchbase storage failure
June 2015
A monumental hell gate in our company history
Server down for 12 consecutive hours because of a disk error in Couchbase
Also, our daily backup script had not worked for 1 week prior to the shutdown
Some data were restored via replication
The remaining data were restored by applying the lost week’s logs to the previously backed-up data
Lesson learned: Replicas are necessary. Verify your backups.
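The "confirm backups" lesson can be turned into an automated check that alerts when the latest backup is missing, empty, or too old. The sketch below is an assumption about how such a check might look, not the team's actual tooling; the function name, the 26-hour threshold for a daily backup, and the dates in the usage example are hypothetical.

```python
from datetime import datetime, timedelta


def backup_is_fresh(last_backup_at, size_bytes, now, max_age=timedelta(hours=26)):
    """Return (ok, reason) for the most recent backup.

    A backup counts as healthy only if it exists, is non-empty, and is recent.
    """
    if last_backup_at is None:
        return False, "no backup found"
    if size_bytes <= 0:
        return False, "backup is empty"
    age = now - last_backup_at
    if age > max_age:
        return False, "backup is %d hours old" % (age.total_seconds() // 3600)
    return True, "ok"
```

For example, with `now = datetime(2015, 6, 10, 12, 0)`, a backup taken at `datetime(2015, 6, 3, 12, 0)` is reported as stale, which is the kind of week-long gap described above. Running such a check from the monitoring system, rather than from the backup script itself, means a silently broken script still raises an alarm.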
60. Overseas network failure
Frequency: More than 5 times over 2 years
This situation has really happened
ISPs cut costs, leading to overseas packet loss
Just call AWS
61. Final Design Review
62. First design
63. Final
64. Future Plans
65. Future plans
Transactional log system
High-latency / packet-loss networks: QUIC
…
Entertain the world!
66. Thank you!
jaeman@devsisters.com
68. Amazon Cognito
[Diagram: Identity Providers / Your own Auth; Unique Identities (Joe, Anna, Bob); Any Device, Any Platform; Any AWS Service (Mobile Analytics, S3, DynamoDB, Kinesis)]
Helps implement security best practices
Securely access any AWS service from mobile device; it simplifies the interaction with AWS Identity and Access Management
Support multiple login providers
Easily integrate with major login providers for authentication, or use your own authentication system
Unique users vs. devices
Manage unique identities; automatically recognize unique user across devices and platforms
69. Synchronize data across devices with Amazon Cognito
• Sync game state across OS, devices
• State transition (link multiple accounts)
• Sync user profiles across OS, devices, web
70. Related sessions
GAM401 Serverless Mobile Game Development with Amazon Cognito, AWS Lambda, and Amazon DynamoDB
MBL402 Mobile Identity Management and Data Synchronization Using Amazon Cognito
WRK202 Build a Scalable Mobile App on Serverless, Event-Triggered, Back-End Logic
72. Multiplayer game servers
[Diagram: Region]
• API back-end app
• Core session
• Matchmaking
• S3 + CloudFront
• DLC, assets
• Game saves
• UGC
• Public server tier
• Direct client socket
• Scale on players
74. Multiplayer game servers
① Login via API
② Request matchmaking
③ Get game server IP
④ Connect to server
⑤ Pull down assets
⑥ Other players join
[Diagram: Region]
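Steps ② and ③ of the flow above (request matchmaking, get a game server IP) can be sketched as a small in-memory matchmaker. This is a simplified illustration under stated assumptions: the class, its methods, the 8-player default, and the "fill the fullest server first" rule are all hypothetical choices, not details from the talk.

```python
class Matchmaker:
    """Tracks game servers and hands out the address of one with a free slot."""

    def __init__(self, max_players=8):
        self.max_players = max_players
        self.servers = {}  # ip -> set of player ids currently assigned

    def register_server(self, ip):
        """Called when a game server instance comes online."""
        self.servers.setdefault(ip, set())

    def request_match(self, player_id):
        """Return the IP of the fullest server that still has room, or None."""
        open_servers = [
            (len(players), ip)
            for ip, players in self.servers.items()
            if len(players) < self.max_players
        ]
        if not open_servers:
            return None  # caller should launch another server and retry
        _, ip = max(open_servers)
        self.servers[ip].add(player_id)
        return ip
```

Packing players onto the fullest available server leaves other servers empty, so the public server tier can scale in on player count. A `None` result is the signal to start another game server through the APIs.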
76. Related sessions
GAM403 From 0 to 60 Million Player Hours in 400 Billion Star Systems
GAM404 Evolve: Hunting Monsters in a Low Latency Multiplayer Game on Amazon EC2
GAM407 Quiplash: The Multiscreen, Multidevice, Multiplayer Game for 10,000
77. Wrap it up already
Use Auto Scaling to save money
Amazon CloudFront + Amazon S3 for download and upload
Painful DIYDB? No! Use Amazon DynamoDB
Dynamically manage game servers using the APIs
• Even multiregion!