27. More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
28. Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
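A minimal sketch of the cleanup step, written as plain Python rather than an actual EMR job. It assumes device events arrive as one JSON object per line and web server events arrive in Apache common log format. The JSON field names (`ts`, `user`, `event`) and the function name `normalize_line` are hypothetical, chosen only for illustration.

```python
import json
import re
from datetime import datetime

# Apache common log format: host ident user [time] "request" status size
APACHE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def normalize_line(line):
    """Return (date, user, event) for a raw log line, or None if unparseable."""
    line = line.strip()
    if not line:
        return None
    if line.startswith("{"):
        # Assumed device format: one JSON object per line.
        try:
            rec = json.loads(line)
            return (rec["ts"][:10], rec["user"], rec["event"])
        except (ValueError, KeyError, TypeError):
            return None
    match = APACHE_RE.match(line)
    if match:
        # Apache timestamps look like 24/Jan/2014:10:00:00 +0000
        stamp = match.group("time").split()[0]
        try:
            day = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        except ValueError:
            return None
        return (day.strftime("%Y-%m-%d"), match.group("user"), match.group("path"))
    return None
```

Lines that match neither format return `None`, so the job can count or quarantine them instead of loading them into the clean bucket.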
29. Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
30. Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
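The direct load can be sketched with Redshift's `COPY` command, which accepts a `dynamodb://` source and matches DynamoDB attribute names to Redshift column names. The helper below only builds the SQL text. The function name, table names, and role ARN are hypothetical, and the slides do not say which credentials form was used, so `IAM_ROLE` here is an assumption.

```python
import re

# Conservative check so table names cannot smuggle extra SQL into the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")


def build_dynamodb_copy(redshift_table, dynamodb_table, iam_role_arn, read_ratio=50):
    """Build a Redshift COPY statement that loads from a DynamoDB table.

    read_ratio is the percentage of the DynamoDB table's provisioned read
    throughput that the load is allowed to consume.
    """
    for name in (redshift_table, dynamodb_table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected characters in table name: {name!r}")
    if "'" in iam_role_arn:
        raise ValueError("IAM role ARN must not contain quotes")
    if not 1 <= read_ratio <= 200:
        raise ValueError("read_ratio must be between 1 and 200")
    return (
        f"COPY {redshift_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )
```

A lower `read_ratio` leaves more read capacity for the live game, at the cost of a slower load.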
32. Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
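As a rough illustration of how rows like these can be consumed, the sketch below groups events by session. The slide does not name the columns, so reading them as date, user, session ID, and event is an assumption, as is the function name `events_by_session`.

```python
import csv
from collections import defaultdict
from io import StringIO

SAMPLE = """\
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
"""


def events_by_session(text):
    """Group event names by (date, user, session_id), keeping file order."""
    sessions = defaultdict(list)
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        date, user, session_id, event = row
        sessions[(date, user, session_id)].append(event)
    return dict(sessions)


if __name__ == "__main__":
    for key, events in events_by_session(SAMPLE).items():
        print(key, events)
```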
33. Back To Basics [Dubstep Remix]
• Always Batch Due to S3
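One way to picture the batching: buffer log lines and hand them off as a single object once a size or age threshold is reached. This is only a sketch. `BatchBuffer`, its thresholds, and the `flush_fn` callback (which would wrap the real S3 upload) are hypothetical names, not something taken from the slides.

```python
import time


class BatchBuffer:
    """Collect lines and flush them as one batch by count or by age."""

    def __init__(self, flush_fn, max_lines=1000, max_age_seconds=300.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn          # called with the batch as one string
        self.max_lines = max_lines
        self.max_age_seconds = max_age_seconds
        self.clock = clock                # injectable for testing
        self.lines = []
        self.opened_at = None

    def add(self, line):
        if not self.lines:
            # First line of a new batch starts the age timer.
            self.opened_at = self.clock()
        self.lines.append(line)
        too_big = len(self.lines) >= self.max_lines
        too_old = self.clock() - self.opened_at >= self.max_age_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Send the pending batch, if any, and reset the buffer."""
        if self.lines:
            self.flush_fn("\n".join(self.lines) + "\n")
            self.lines = []
            self.opened_at = None
```

The age check only runs when a new line arrives, so a caller that needs a hard upper bound on delay would also call `flush()` from a timer.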
34. Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
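A minimal sketch of one writer, assuming a boto3-style Kinesis client whose `put_record` takes `StreamName`, `Data`, and `PartitionKey`. The stream name, the JSON payload shape, and the helper `put_event` are hypothetical. `RecordingClient` is a stand-in so the example runs without AWS credentials.

```python
import json


def put_event(client, stream_name, user, session_id, event):
    """Send one game event to a Kinesis stream through a boto3-style client."""
    payload = json.dumps({"user": user, "session": session_id, "event": event})
    return client.put_record(
        StreamName=stream_name,
        Data=payload.encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        PartitionKey=user,
    )


class RecordingClient:
    """Stand-in for a real client; it only remembers what was sent."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000", "SequenceNumber": str(len(self.calls))}


if __name__ == "__main__":
    client = RecordingClient()
    put_event(client, "game-events", "nateware", "e4df", "login")
    print(client.calls)
```

With real AWS access, `client` would instead come from `boto3.client("kinesis")`.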
35. Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
36. Amazon Kinesis
A service for ingesting big data in real time
[Diagram: Data Sources → AWS Endpoint → Shard 1, Shard 2, ... Shard N (replicated across three Availability Zones) → App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning] → S3, DynamoDB, Redshift]
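To make the shard layout concrete: Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below assumes the shards split the space evenly, which a real stream need not do after shards are split or merged. `shard_for_key` is a hypothetical helper, not part of any SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard that would own this partition key,
    assuming shard_count shards with equal-sized hash key ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Scale the 128-bit hash into the range 0 .. shard_count - 1.
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    for user in ("nateware", "alice", "bob"):
        print(user, "-> shard", shard_for_key(user, 4))
```

Because the mapping depends only on the key, every record for a given user lands on the same shard, which is why a heavily used key can overload a single shard.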
38. Clash of Clans
• Kinesis: Real-time data stream of in-game activity
• Multiple Kinesis applications: Dashboards, analytics and storage
• Redshift: Business intelligence reporting and interactive queries
• S3 and Glacier: Data storage and long term archival
[Diagram labels: In-game activity; Amazon Kinesis; Kinesis-enabled apps on EC2; Real-time clickstream processing app; EC2: In-game engagement trends dashboard; Clickstream archive; S3; Aggregate statistics; Redshift; Business-intelligence user]