Use AWS to learn how much players love your game by analyzing in-game metrics to measure engagement and retention. Start simple by uploading data to S3 and analyzing it with Redshift. Add more game data sources and dive deeper with cohort analysis. Finally, I cover real-time analytics with Kinesis and Spark.
1.
AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect
2.
Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue
17.
Plumbing
① Create S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app:
http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Set up Tableau to access Redshift
⑥ Go to the Beach
18.
Loading Redshift from S3
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Scheduled Redshift Load using Data Pipeline:
http://aws.amazon.com/articles/1143507459230804
19.
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
[Diagram: EC2, External Analytics]
20.
Logrotate to S3
/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}
Blog Entry on Log Rotation:
http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
21.
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
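The cleanup step above could look roughly like the following per-line transform, which turns an Apache access log line into a comma-delimited row. This is only a sketch: the output columns, the `apache_to_csv` helper, and the sample log line are assumptions for illustration, and a real EMR job would wrap logic like this in a Hadoop, Hive, or Pig step.

```python
import re
from datetime import datetime

# Apache common log format: host ident user [timestamp] "METHOD path proto" status bytes
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def apache_to_csv(line):
    """Return 'date,user,method,path,status', or None if the line does not parse."""
    m = APACHE_RE.match(line)
    if m is None:
        return None  # leave unparseable lines for separate inspection
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return ",".join([
        ts.date().isoformat(),
        m.group("user"),
        m.group("method"),
        m.group("path"),
        m.group("status"),
    ])

# Hypothetical sample line, not taken from the talk
sample = '10.0.0.1 - nateware [24/Jan/2014:10:15:32 +0000] "GET /api/login HTTP/1.1" 200 512'
print(apache_to_csv(sample))
```

Paths that contain commas would need quoting before a comma-delimited load into Redshift.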
22.
Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales Beyond Petabytes
• Transient
24.
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
25.
Loading Redshift from DynamoDB
-- readratio is required for DynamoDB loads; 50 (percent of the table's
-- provisioned read throughput) is an illustrative value
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
28.
Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
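A minimal sketch of how a game client or server might emit lines in the format above. The slide does not name the columns; event date, user ID, and action match the SQL used later in the talk, while `session_id` for the third field and the `format_event` helper are assumptions.

```python
import csv
import io
from datetime import date

def format_event(event_date: date, user_id: str, session_id: str, action: str) -> str:
    """Return one comma-delimited event line, without a trailing newline."""
    buf = io.StringIO()
    # csv.writer quotes fields if they ever contain a comma
    csv.writer(buf, lineterminator="").writerow(
        [event_date.isoformat(), user_id, session_id, action]
    )
    return buf.getvalue()

line = format_event(date(2014, 1, 24), "nateware", "e4df", "login")
print(line)
```

Batches of such lines can be written to a file and uploaded to the S3 bucket on a schedule.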
29.
Measure Retention: Repeated Plays
create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
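To try the view above without a Redshift cluster, the sketch below runs an equivalent query in SQLite from Python. SQLite has no `date_trunc`, so `strftime('%Y-%m-01', ...)` stands in for it. The `session_id` column name and the `player2` rows are made up for illustration.

```python
import sqlite3

EVENTS = [
    ("2014-01-24", "nateware", "e4df", "login"),
    ("2014-01-24", "nateware", "e4df", "gamestart"),
    ("2014-01-24", "nateware", "e4df", "gameend"),
    ("2014-01-25", "nateware", "a88c", "login"),
    ("2014-01-25", "nateware", "a88c", "friendlist"),
    ("2014-01-25", "nateware", "a88c", "gamestart"),
    # Hypothetical second player, active in two different months
    ("2014-01-30", "player2", "b7c1", "login"),
    ("2014-02-02", "player2", "c9d0", "login"),
    ("2014-02-02", "player2", "c9d0", "gamestart"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
conn.executemany("insert into events values (?, ?, ?, ?)", EVENTS)

# Same shape as the Redshift view, with strftime replacing date_trunc
conn.execute("""
    create view events_by_user_by_month as
    select user_id,
           strftime('%Y-%m-01', event_date) as month_active,
           count(*) as total_events
    from events
    group by user_id, month_active
""")

rows = conn.execute(
    "select * from events_by_user_by_month order by user_id, month_active"
).fetchall()
for row in rows:
    print(row)
```

A user who shows up in several `month_active` rows is a returning player, which is the retention signal the slide is after.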
31.
Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
– Month game installed
– Demographics
• Run analysis by cohort
– Join with metrics
• Use Redshift, since it speaks SQL
– Example of where SQL is a good fit
32.
Creating Cohorts with Redshift
create view cohort_by_first_event_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
group by user_id;
http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
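As a sketch of the "join with metrics" step, the example below builds the cohort view above in SQLite and joins it back to the events to count active users per cohort per month. `strftime` again stands in for Redshift's `date_trunc`, and the users `u1` to `u3` and their events are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table events (event_date text, user_id text, action text);
insert into events values
  ('2014-01-24', 'u1', 'login'),
  ('2014-02-03', 'u1', 'login'),
  ('2014-01-25', 'u2', 'login'),
  ('2014-02-10', 'u3', 'login'),
  ('2014-03-01', 'u3', 'login');

create view cohort_by_first_event_date as
select user_id,
       strftime('%Y-%m-01', min(event_date)) as first_month
from events
group by user_id;
""")

# For each cohort (first month seen), count distinct users active in each month
retention = conn.execute("""
    select c.first_month,
           strftime('%Y-%m-01', e.event_date) as month_active,
           count(distinct e.user_id) as active_users
    from events e
    join cohort_by_first_event_date c on c.user_id = e.user_id
    group by c.first_month, month_active
    order by c.first_month, month_active
""").fetchall()

for row in retention:
    print(row)
```

Here the January cohort starts with two users and keeps one of them in February, which is the kind of relative metric cohorts make visible.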
34.
Moar Cohorts
• Define multiple cohorts
– By activity, time, demographics
– As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
– Retention by date
– Retention by demographic
– Retention by average plays/month quartile
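One way to read "average plays/month quartile" is sketched below: compute each user's average number of plays per active month, rank users by it, and split the ranking into four groups. The `plays_quartile_cohorts` helper, the simple rank-based split, and the sample data are all assumptions. In Redshift a window function such as `ntile(4)` would be a more natural tool, though it distributes uneven groups slightly differently.

```python
from collections import defaultdict

def plays_quartile_cohorts(plays):
    """plays: iterable of (user_id, month) pairs, one pair per 'gamestart' event.

    Returns {user_id: quartile}, where 1 holds the users with the fewest
    average plays per active month and 4 the users with the most.
    """
    per_user_month = defaultdict(lambda: defaultdict(int))
    for user_id, month in plays:
        per_user_month[user_id][month] += 1
    avg = {u: sum(m.values()) / len(m) for u, m in per_user_month.items()}
    # Sort by average plays, breaking ties by user ID for a stable result
    ranked = sorted(avg, key=lambda u: (avg[u], u))
    n = len(ranked)
    return {u: (i * 4) // n + 1 for i, u in enumerate(ranked)}

# Hypothetical play counts for four users
plays = (
    [("a", "2014-01")]
    + [("b", "2014-01")] * 2 + [("b", "2014-02")] * 2
    + [("c", "2014-01")] * 6
    + [("d", "2014-01")] * 3 + [("d", "2014-02")] * 5
)
cohorts = plays_quartile_cohorts(plays)
print(cohorts)
```

The resulting mapping can be loaded as another cohort table and joined with the same retention metrics.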
37.
Cohorts by Type of Activity
create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;
40.
Real-Time Analytics
Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?
Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different?
• How many people are playing today? Variance?
41.
Why Real-Time Analytics?
30x in 24 hours
What if you ran a promo?
42.
Real-Time Tools
Spark
• High-Performance Hadoop Alternative
• Developed at UC Berkeley
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in memory)
• Runs on EMR
Kinesis
• Amazon fully-managed streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data up to 1MB/sec, 1000 TPS
• Data stored for 24 hours
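The per-shard limits above translate into a quick sizing estimate. The sketch below is back-of-the-envelope arithmetic only: `shards_needed` is a hypothetical helper, 1MB is treated as 1,000,000 bytes, and the traffic figures are invented.

```python
import math

SHARD_BYTES_PER_SEC = 1_000_000   # "1MB/sec" per shard, taken as 10^6 bytes (assumption)
SHARD_RECORDS_PER_SEC = 1000      # "1000 TPS" per shard

def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Estimate the shards required to ingest a steady event rate."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    # Whichever limit is hit first decides the shard count; always at least one
    return max(by_count, by_bytes, 1)

# Many small events: limited by record count
print(shards_needed(5000, 200))
# Fewer, larger events: limited by bytes per second
print(shards_needed(500, 5000))
```

Peak traffic, such as a promotion, is the rate to size for rather than the daily average.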
43.
Back To Basics [Dubstep Remix]
• Always Batch Due to S3
44.
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
45.
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
46.
Introducing Amazon Kinesis
Service for Real-Time Big Data Ingestion
[Diagram: Data Sources send records to an AWS Endpoint; the stream's Shards (Shard 1, Shard 2, ... Shard N) span multiple Availability Zones; consumers are App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], and App.4 [Machine Learning]; outputs are S3, DynamoDB, and Redshift]
47.
Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
[Diagram: many Producers sending PUTs to a Kinesis stream with Shards 1 through n]
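How a partition key spreads PUTs across shards can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch assumes the shards split the space into equal-width ranges and that the digest is read as a big-endian integer. Real streams report each shard's actual range, and `shard_for_key` is a hypothetical helper, not part of any SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return a zero-based shard index, assuming equal-width hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count)
    return hash_value * shard_count // HASH_SPACE

for game_id in ["e4b5", "30a4", "6ebd"]:
    print(game_id, "-> shard", shard_for_key(game_id, 4))
```

Using something like a game ID as the partition key keeps all records for one game on the same shard, while many distinct keys spread the load.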
50.
Death in Real-Time
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
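A consumer of the "kills" stream could aggregate records like those above as they arrive. The sketch below only shows the aggregation step on a few of the records, counting kills per map. `kills_per_map` is a hypothetical helper, and reading from Kinesis itself is left out.

```python
import json
from collections import Counter

# A few of the JSON payloads shown above, as they might arrive from the stream
records = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}',
    '{"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
]

def kills_per_map(raw_records):
    """Count kill events per map from JSON-encoded records."""
    counts = Counter()
    for raw in raw_records:
        counts[json.loads(raw)["map"]] += 1
    return counts

counts = kills_per_map(records)
print(counts.most_common())
```

The same loop could bucket the `coord` field into grid cells to find where players die most on each map.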
52.
Put A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming