Start Simple
• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Profit!
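The events file above can be produced with very little code. The following is a minimal sketch in Python, assuming the four columns are date, user, session ID, and event name (the slide does not label them); `write_events` is a hypothetical helper, not part of any SDK.

```python
import csv
import io
from datetime import date


def write_events(out, events):
    """Write (day, user, session, event) tuples as CSV lines to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    for day, user, session, event in events:
        writer.writerow([day.isoformat(), user, session, event])


# In-memory buffer stands in for the on-device events file.
buf = io.StringIO()
write_events(buf, [
    (date(2014, 1, 24), "nateware", "e4df", "login"),
    (date(2014, 1, 24), "nateware", "e4df", "gamestart"),
])
print(buf.getvalue(), end="")
```

On a device, the same function could be pointed at a file opened in append mode, and that file is what would be uploaded to S3 periodically.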
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
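The cleanup step can be pictured as mapping each source format onto one common record shape. The sketch below is plain Python rather than an actual EMR job, and the output fields (`date`, `user`, `source`, `event`) are assumptions chosen for illustration. It handles a device CSV line and an Apache access-log line.

```python
import re
from datetime import datetime

# Apache common/combined log prefix: host ident user [timestamp] "request" status size
APACHE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def clean_device_line(line):
    """Parse a device line of the form date,user,session,event."""
    day, user, _session, event = line.strip().split(",")
    return {"date": day, "user": user, "source": "device", "event": event}


def clean_apache_line(line):
    """Parse an Apache access-log line; return None if it does not match."""
    m = APACHE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "date": ts.date().isoformat(),
        "user": m.group("user"),
        "source": "apache",
        "event": m.group("method") + " " + m.group("path"),
    }


device = clean_device_line("2014-01-24,nateware,e4df,login")
apache = clean_apache_line(
    '203.0.113.7 - nateware [24/Jan/2014:10:15:32 +0000] "GET /friends HTTP/1.1" 200 512'
)
```

Records in this shape could then be written to the clean bucket in a single format that Redshift loads without per-source handling.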
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
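Loading directly from DynamoDB is done with the Redshift `COPY` command and a `dynamodb://` source, which matches DynamoDB attribute names to Redshift column names. The sketch below only builds the SQL text. The table names and role ARN are hypothetical, and `dynamodb_copy_sql` is an illustrative helper that does no escaping, so it should only be given trusted identifiers.

```python
def dynamodb_copy_sql(redshift_table, dynamodb_table, iam_role_arn, read_ratio=50):
    """Build a Redshift COPY statement that reads from a DynamoDB table.

    read_ratio is the percentage of the DynamoDB table's provisioned read
    throughput that the load is allowed to consume.
    """
    if read_ratio < 1:
        raise ValueError("read_ratio must be a positive percentage")
    return (
        f"COPY {redshift_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' READRATIO {read_ratio};"
    )


# Hypothetical names, for illustration only.
sql = dynamodb_copy_sql(
    "players", "GamePlayers", "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
)
print(sql)
```

A modest `READRATIO` keeps the load from starving the live game of DynamoDB read capacity.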
Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
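A writer sends each event to the stream as it happens instead of batching it into a file. The sketch below assumes a client object exposing `put_record(StreamName=..., Data=..., PartitionKey=...)`, which is the shape of the boto3 Kinesis client call. `RecordingClient` is a stand-in so the example runs without AWS credentials, and the stream name and JSON event layout are assumptions.

```python
import json


class RecordingClient:
    """Stand-in for a Kinesis client; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000", "SequenceNumber": str(len(self.calls))}


def put_event(client, stream_name, event):
    """Serialize one event as JSON and write it to the stream.

    The user name is used as the partition key so that one player's
    events land on the same shard and stay in order.
    """
    data = json.dumps(event, sort_keys=True).encode("utf-8")
    return client.put_record(StreamName=stream_name, Data=data, PartitionKey=event["user"])


client = RecordingClient()
put_event(client, "game-events", {
    "date": "2014-01-24", "user": "nateware", "session": "e4df", "event": "login",
})
```

With real credentials, `boto3.client("kinesis")` could be passed in place of `RecordingClient()`.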
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
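A custom worker is just a reader that pulls records off the stream and does something with them. The sketch below shows only the processing step, assuming JSON records that carry a unique `id` field (a hypothetical field, not in the sample data). A stream can deliver the same record more than once, so the worker skips IDs it has already seen before counting events.

```python
import json
from collections import Counter


def aggregate(records, seen_ids=None):
    """Count events by name, skipping records whose id was already processed.

    records  -- iterable of JSON-encoded bytes or strings
    seen_ids -- optional set carried across batches to de-duplicate between calls
    """
    seen = set() if seen_ids is None else seen_ids
    counts = Counter()
    for raw in records:
        event = json.loads(raw)
        if event["id"] in seen:
            continue  # duplicate delivery, ignore
        seen.add(event["id"])
        counts[event["event"]] += 1
    return counts


batch = [
    b'{"id": "1", "user": "nateware", "event": "login"}',
    b'{"id": "2", "user": "nateware", "event": "gamestart"}',
    b'{"id": "2", "user": "nateware", "event": "gamestart"}',  # redelivered
]
counts = aggregate(batch)
```

The resulting counts are the kind of aggregate a worker might flush to S3 or Redshift at intervals.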
Amazon Kinesis
A service for ingesting big data in real time
[Diagram: Data Sources -> AWS Endpoint -> Shard 1, Shard 2, ... Shard N (across three Availability Zones)]
[Consumers: App.1 (Aggregate & De-Duplicate), App.2 (Metric Extraction), App.3 (Sliding Window Analysis), App.4 (Machine Learning)]
[Outputs: S3, DynamoDB, Redshift]
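Each record's partition key decides which shard it lands on: Kinesis hashes the key with MD5 to a 128-bit integer and picks the shard whose hash key range contains that value. The sketch below assumes the hash space is split evenly across `shard_count` shards and that the digest is read as a big-endian integer. Real streams can have uneven ranges after splits and merges, so treat `shard_for_key` as an illustration only.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard that would own this partition key,
    assuming shard_count shards with equal, contiguous hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


index = shard_for_key("nateware", 4)
```

Because the mapping is deterministic, all records with the same partition key go to the same shard, which is what preserves per-key ordering.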
Clash of Clans
• Kinesis: Real-time data stream of in-game activity
• Multiple Kinesis applications: Dashboards, analytics and storage
• Redshift: Business intelligence reporting and interactive queries
• S3 and Glacier: Data storage and long-term archival
[Diagram: In-game activity -> Amazon Kinesis -> Kinesis-enabled apps on EC2 (real-time clickstream processing app, in-game engagement trends dashboard) -> S3 aggregate statistics, clickstream archive, Redshift -> Business-intelligence user]