Transforming Mobile Push Notifications with Big Data 
Dennis Waldron, Data Engineering 
Pablo Varela, Systems Engineering
Who is Plumbee? 
● 12.8M Installs 
● 209K Daily Active Users 
● 818K Monthly Active Users 
● Social Games Studio 
● Mirrorball Slots & Bingo 
● Facebook Canvas, iOS
Data Providers 
In-house data = 99.9% of all data 
In Total: 
● 98TB (907 days of data) 
● All stored in Amazon S3 
Daily: 
● 78GB compressed 
● ~450M events/day 
● 4,800 events/second (peak)
Architecture - Overview 
[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Log Aggregators -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Daily Batch Processing on Amazon EMR (Elastic MapReduce), with DataPipeline -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) by Plumbee Employees]
Game Data 
● Collect everything! 
● RPC events intercepted by annotated endpoints. (Requests) 
● All mutating state changes recorded: 
○ DynamoDB, MySQL, Memcache (Blob Updates) 
● Custom Telemetry (Other): 
○ Client: click tracking, loading time statistics, GPU data... 
○ Server: promotions, transactions, Facebook user data... 
[Diagram: End Users (Desktop & Mobile) -> Application/Game Servers (Amazon Web Services) -> DynamoDB, MySQL, MemCache; this generates game data. Chart labels as extracted: RPC, 77%, 9%, OTHER 15%]
Game Data - Example RPC Endpoint Annotation 
/** 
 * Example annotation 
 */ 
@SQSRequestLog(requestMessage = SpinRequest.class) 
@RequestMapping("/spin") 
public SpinResponse spin(SpinRequest spinRequest) { 
    … 
}
Example Event - userStats 
● All events are recorded in JSON. 
● Structure: 
○ Headers 
○ Categorization Data (metadata) 
○ Payload (message) 
● Important Headers: 
○ timestamp 
○ testVariant 
○ plumbeeUid
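A minimal sketch of how an event with this three-part structure might be assembled. Only `timestamp`, `testVariant` and `plumbeeUid` come from the slide; the top-level key names, the metadata fields and the payload contents are assumptions made for illustration.

```python
import json
import time


def build_event(plumbee_uid, test_variant, event_type, sub_type, payload, timestamp=None):
    """Assemble an event with headers, categorization metadata, and a payload."""
    return {
        # Header names are taken from the slide.
        "headers": {
            "timestamp": int(time.time() * 1000) if timestamp is None else timestamp,
            "testVariant": test_variant,
            "plumbeeUid": plumbee_uid,
        },
        # Hypothetical key names for the categorization data.
        "metadata": {"type": event_type, "subType": sub_type},
        # Hypothetical payload; the real message depends on the event.
        "message": payload,
    }


event = build_event(466264, "control", "rpc", "rpc-spin", {"bet": 100},
                    timestamp=1416268800000)
line = json.dumps(event, sort_keys=True)  # one JSON document per event
```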
Architecture - Collection 
[Diagram: the pipeline from the Architecture - Overview slide, repeated]
Data Collection (I) - PUT 
[Diagram: Application/Game Servers (Producers) -> Events (JSON) -> SQS Queue -> Log Aggregators (Consumers)]
What is SQS (Simple Queue Service)? 
A cloud-based message queue for transmitting 
messages between producers and consumers 
SQS Provides: 
● ACK/FAIL semantics 
● Unlimited number of messages 
● Scales transparently 
● Buffer zone
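The ACK/FAIL semantics listed above can be illustrated with a toy in-memory queue. This is a sketch of the idea only, not the SQS API: the class and method names are hypothetical. In SQS itself, a message that is received but not deleted becomes visible again once its visibility timeout expires.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a message queue with ACK/FAIL semantics."""

    def __init__(self):
        self._visible = deque()   # messages waiting for a consumer
        self._in_flight = {}      # messages handed out but not yet acked
        self._next_id = 0

    def put(self, body):
        self._visible.append((self._next_id, body))
        self._next_id += 1

    def get(self):
        """Hand out the next message; it stays in flight until acked or failed."""
        if not self._visible:
            return None
        msg_id, body = self._visible.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Consumer succeeded: drop the message for good."""
        del self._in_flight[msg_id]

    def fail(self, msg_id):
        """Consumer failed: make the message visible again for redelivery."""
        self._visible.append((msg_id, self._in_flight.pop(msg_id)))

    def __len__(self):
        return len(self._visible) + len(self._in_flight)


queue = InMemoryQueue()
queue.put('{"type": "rpc"}')
msg_id, body = queue.get()
queue.fail(msg_id)             # simulated consumer crash: message is redelivered
msg_id2, body2 = queue.get()
queue.ack(msg_id2)             # processed successfully: message is gone
```

The queue acts as the buffer zone: producers keep writing while a failed consumer recovers, and nothing is lost in between.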
Data Collection (II) - GET 
[Diagram: SQS Queue -> Apache Flume (Consumers) -> Amazon S3 (Simple Storage Service)]
What is Apache Flume? 
A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data 
S3 Data: 
● Partitioned by: date / type / sub_type 
● Compressed with: Snappy 
● Aggregated in 512MB chunks
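A sketch of how a date / type / sub_type partition prefix could be derived from an event. The exact key layout in the bucket is not given in the slides, so the format below is an assumption, as is the function name.

```python
from datetime import datetime, timezone


def partition_prefix(timestamp_ms, event_type, sub_type):
    """Build a hypothetical date/type/sub_type prefix from an event's headers."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return "{}/{}/{}/".format(day, event_type, sub_type)


prefix = partition_prefix(1416268800000, "rpc", "rpc-spin")
```

Partitioning this way lets a later query read only the objects for one day and one event type instead of scanning the whole bucket.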
Data Collection (III) - Flume 
[Diagram: SQS Queue -> Flume Agent (Source (Custom) -> Channel (File Based) -> Sink (HDFS)) -> S3 Bucket. Transactions A, B and C: A + B + C = Flow]
● Pluggable component architecture 
● Durability via transactions 
● File channel uses Elastic Block Store (EBS) volumes (network-attached storage) 
○ Protects against hardware failure 
● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source 
Architecture - Processing 
[Diagram: the pipeline from the Architecture - Overview slide, repeated]
Extract, Transform, Load 
● Daily activity 
● Orchestrated by Amazon DataPipeline 
● Includes generation of reports 
● Configured with JSON 
What is DataPipeline? 
A cloud-based data workflow service that 
helps you process and move data between 
different AWS services 
RESOURCE COMMAND SCHEDULE
Extract & Transform (I) 
What is Elastic MapReduce? 
A cloud-based MapReduce implementation, built on top of the open-source Hadoop framework, for processing vast amounts of data. 
Two phases: 
● Map() Procedure -> Filtering & Sorting 
● Reduce() -> Summary operation 
[Diagram: RAW DATA (Penguin, Horse, Cake, ...) -> MAP() -> SORTED QUEUES (one per key) -> REDUCE() -> RESULT: Penguin: 4, Horse: 3, Cake: 2]
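The two phases can be sketched in plain Python, without Hadoop. The order of the raw records is an assumption; the counts match the result on the slide.

```python
from itertools import groupby

raw_data = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
            "Penguin", "Horse", "Horse", "Penguin"]


def map_phase(records):
    """Emit a (key, 1) pair for every record."""
    return [(record, 1) for record in records]


def reduce_phase(pairs):
    """Sort the pairs by key, then sum the counts within each key group."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return {key: sum(count for _, count in group)
            for key, group in groupby(ordered, key=lambda pair: pair[0])}


result = reduce_phase(map_phase(raw_data))
```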
Extract & Transform (II) 
What is Hive? 
An open-source Apache project which provides a SQL-like interface to summarize, query and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure. 
● Not really SQL, HQL -> HiveQL 
● No transactions, materialized views, 
limited subquery support, ... 
SELECT plumbeeuid, 
       COUNT(*) AS spins 
FROM eventlog 
-- Partitioned data access 
WHERE event_date = '2014-11-18' 
  AND event_type = 'rpc' 
  AND event_sub_type = 'rpc-spin' 
-- Aggregation 
GROUP BY plumbeeuid; 
Table: Eventlog 
● Mounted on top of raw data 
● SerDe provides JSON parsing 
● Target data via partition filters
Extract & Transform (III) 
● Hive has limitations! 
○ Speed, JSON 
● Most of our transformations use: 
Streaming MapReduce Jobs 
What is Streaming? 
“A Hadoop utility that allows you to create 
and run MapReduce jobs using any 
executable script as a mapper or reducer” 
import json
import sys
for line in sys.stdin:
    data = json.loads(line)
    # Emit a tab-separated key/value pair: "<plumbeeUid>\t1"
    print('%s\t%d' % (data['plumbeeUid'], 1))
Emits key-value pairs: 
466264 => 1, 376166 => 1 
983131 => 1, 466264 => 1 
Hadoop sorts and shuffles the data making sure 
matching keys are processed by a single reducer! 
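import sys
from collections import defaultdict

results = defaultdict(int)
for line in sys.stdin:
    # Each input line is "<plumbeeUid>\t<count>"
    plumbee_uid, count = line.rstrip('\n').split('\t')
    results[plumbee_uid] += int(count)
print(dict(results))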
[Diagram: JSON rpc-spin Data -> map() -> reduce() -> Result: { 466264: 2, 376166: 1, 983131: 1 }]
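The shuffle step between the mapper and the reducer can be simulated locally. This is a sketch of hash partitioning under stated assumptions: `crc32` stands in for whatever hash the cluster uses, and the function names and the number of reducers are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key; the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_lines, num_reducers):
    """Group tab-separated key/value lines by reducer, sorted by key."""
    buckets = defaultdict(list)
    for line in mapper_lines:
        key, _ = line.split("\t")
        buckets[partition(key, num_reducers)].append(line)
    return {reducer: sorted(lines) for reducer, lines in buckets.items()}


def reduce_counts(lines):
    """Sum the counts per key, as the streaming reducer does."""
    totals = defaultdict(int)
    for line in lines:
        key, count = line.split("\t")
        totals[key] += int(count)
    return dict(totals)


mapper_output = ["466264\t1", "376166\t1", "983131\t1", "466264\t1"]
merged = {}
for reducer, lines in shuffle(mapper_output, num_reducers=2).items():
    # No key appears in two buckets, so merging the partial results is safe.
    merged.update(reduce_counts(lines))
```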
Load (I) - Problem 
[Diagram: Raw S3 JSON Data -> EMR Transformation (Hive & Streaming Jobs) -> Aggregated Data (Results); label: 5.4TB]
EMR Transformed data: 
● Referred to as aggregates 
● Stored in S3 
● Accessible via EMR cluster 
Problem 
● We don’t run long-lived EMR clusters. 
EMR: 
● Requires specialist knowledge 
● Is slow: processing and booting happen “offline”. 
Use Amazon Redshift for fast “online” data access
What is Redshift? 
A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets. 
Power comes from: 
● Query parallelization 
● Column-oriented design 
Redshift Provides: 
● Low latency JDBC and ODBC access 
● Fault Tolerance 
● Automated Backups 
Load (II) - Redshift 
Redshift (x3 nodes): 0.33s 
EMR (x20 nodes): 135.46s
Load (II) - Column-Oriented Databases 
Row-oriented Database - MySQL 
ID   First Name   Last Name   Country 
1    Penguin      Situation   GB 
2    Cheese       Labs        US 
3    Horse        Barracks    GB 
● Easy to add/modify records 
● Could read irrelevant data. 
● Great for fast lookups (OLTP) 
Column-oriented Database - Redshift 
ID   First Name   Last Name   Country 
1    Penguin      Situation   GB 
2    Cheese       Labs        US 
3    Horse        Barracks    GB 
● Only read in relevant data 
● Adding rows requires multiple updates to column data. 
● Great for aggregation queries (OLAP)
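The trade-off can be shown with the same three records held in both layouts. This is an in-memory illustration only, not how MySQL or Redshift store data on disk; the helper names are hypothetical.

```python
from collections import Counter

# Row-oriented layout: each record is stored together.
rows = [
    (1, "Penguin", "Situation", "GB"),
    (2, "Cheese", "Labs", "US"),
    (3, "Horse", "Barracks", "GB"),
]

# Column-oriented layout: each column is stored together.
columns = {
    "id": [r[0] for r in rows],
    "first_name": [r[1] for r in rows],
    "last_name": [r[2] for r in rows],
    "country": [r[3] for r in rows],
}


def lookup(row_id):
    """OLTP-style lookup: the row layout returns the whole record in one read."""
    return next(r for r in rows if r[0] == row_id)


def add_record(record):
    """One append in the row layout, but one append per column in the column layout."""
    rows.append(record)
    for name, value in zip(columns, record):
        columns[name].append(value)


# OLAP-style aggregation: only the country column is touched.
users_per_country = Counter(columns["country"])
```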
Architecture - Revisit 
[Diagram: the pipeline from the Architecture - Overview slide, repeated]
Q&A
Targeted Push Notifications
Mirrorball Slots: Kingdom of Riches
Mirrorball Slots: Challenges 
● recurring timed event 
● collect symbols from non-winning 
spins 
● get free coins if enough symbols are 
collected
Some players ask for notifications
Use Cases
Building blocks
Data Collection
Data Collection 
Players 
Amazon Redshift
Architecture - Overview 
[Diagram: Trigger -> Publisher -> Segmentation Workers (Targeting, using Amazon Redshift and Amazon S3) -> Batch Processors -> Amazon SNS (Mobile Push) -> Players]
User Targeting
User targeting 
Run SQL queries directly against Redshift 
[Diagram: SQL Query -> Amazon Redshift -> User Segment]
User targeting: Query example 
-- Target all mobile users 
SELECT plumbee_uid, arn 
FROM mobile_user
User targeting: Query example (II) 
-- Target lapsed users (1 week lapse) 
SELECT plumbee_uid, arn 
FROM mobile_user 
WHERE last_play_time < DATEADD(day, -7, GETDATE()) -- assumes last_play_time is a timestamp column
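The lapsed-user query can be tried locally with SQLite standing in for Redshift. This is a sketch only: the table columns follow the slide, but the sample rows, the placeholder ARNs and the `lapsed_users` helper are made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def lapsed_users(conn, now, lapse_days=7):
    """Return (plumbee_uid, arn) for users whose last play is older than the cutoff."""
    cutoff = (now - timedelta(days=lapse_days)).strftime("%Y-%m-%d %H:%M:%S")
    return conn.execute(
        "SELECT plumbee_uid, arn FROM mobile_user "
        "WHERE last_play_time < ? ORDER BY plumbee_uid",
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)")
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        (466264, "arn:example:endpoint/a", "2014-11-01 09:00:00"),  # lapsed
        (376166, "arn:example:endpoint/b", "2014-11-17 21:30:00"),  # recently active
    ],
)
segment = lapsed_users(conn, now=datetime(2014, 11, 18))
```

Passing `now` in explicitly keeps the segment reproducible, which helps when checking a targeting query before a campaign runs.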
Demo (I) 
Mobile MBS Notifications
Architecture - Mobile Push 
[Diagram: the push architecture from the preceding Architecture - Overview slide, repeated]
Amazon Simple Notification Service
What is SNS? 
“Amazon Simple Notification Service (Amazon 
SNS) is a fast, flexible, fully managed push 
messaging service”
Amazon SNS
Amazon SNS: Device Registration 
[Diagram: Players -> register device -> Game Servers -> register -> Amazon SNS; Game Servers -> event -> SQS Analytics Queue -> Amazon Redshift]
Amazon SNS: ARN Retrieval 
private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) { 
    CreatePlatformEndpointRequest request = 
        new CreatePlatformEndpointRequest() 
            .withPlatformApplicationArn(platformApplicationArn) 
            .withToken(deviceToken); 
    CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request); 
    return result.getEndpointArn(); 
}
Amazon SNS: Analytics Event 
private String registerEndpointForApplicationAndPlatform(final long plumbeeUid, 
        String platformARN, String platformToken) { 
    final String deviceEndpointARN = getArnForDeviceEndpoint(platformARN, platformToken); 
    sqsLogger.queueMessage(new HashMap<String, Object>() {{ 
        put("notification", "register"); 
        put("plumbeeUid", plumbeeUid); 
        put("provider", platformName); 
        put("endpoint", deviceEndpointARN); 
    }}, null); 
    return deviceEndpointARN; 
}
Amazon SNS: Mobile Push 
private void publishMessage(UserData userData, String jsonPayload) { 
    amazonSNS.publish(new PublishRequest() 
        .withTargetArn(userData.getEndpoint()) 
        .withMessageStructure("json") 
        .withMessage(jsonPayload)); 
} 
Payload example 
{"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}
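A sketch of building such a payload programmatically. The slide shows only the `default` entry. The `APNS` and `GCM` entries below are an addition for illustration: when the message structure is `json`, SNS accepts per-platform entries whose values are themselves JSON-encoded strings. The helper name is hypothetical.

```python
import json


def build_push_payload(text):
    """Build a message body for a publish call that uses the "json" message structure."""
    return json.dumps({
        # Required fallback, as on the slide.
        "default": text,
        # Optional per-platform overrides; each value is a JSON string, not an object.
        "APNS": json.dumps({"aps": {"alert": text}}),
        "GCM": json.dumps({"data": {"message": text}}),
    })


payload = build_push_payload(
    "The 5 day Halloween Challenge has started today! Touch to play NOW!")
```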
Architecture - Orchestration 
[Diagram: the push architecture from the preceding Architecture - Overview slide, repeated]
Amazon Simple Workflow
What is Amazon SWF? 
“Amazon Simple Workflow (Amazon SWF) is a 
task coordination and state management 
service for cloud applications.”
What Amazon SWF provides 
● consistent execution state management 
● workflow executions and tasks tracking 
● non-duplicated dispatch of tasks 
● task routing and queuing 
● the AWS Flow Framework
Mobile Push: Scheduling 
[Diagram: Trigger -> Publish Service -> Amazon Simple Workflow]
Mobile Push: Targeting 
[Diagram: Amazon SWF -> query -> Worker (Segmentation) on Amazon EC2 -> query -> Amazon Redshift; target users -> Amazon S3]
Mobile Push: Processing 
[Diagram: Amazon SWF -> batch 1-N -> Workers (Processing): Read data + push -> publish -> push -> End User]
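Splitting a user segment into batches 1-N can be sketched as below. The batch size and the shape of the segment records are assumptions; the slides do not state them.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical segment of (plumbee_uid, endpoint ARN) pairs.
segment = [("uid-%d" % i, "arn:example:endpoint/%d" % i) for i in range(7)]
work_items = list(batches(segment, batch_size=3))
```

Each batch can then be handed to a separate processing worker, so one slow or failed batch does not hold up the rest.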
Mobile Push: Reporting 
[Diagram: Amazon SWF -> send -> Worker (Reporting) on Amazon EC2 -> send -> Amazon SES]
Demo (II)
Q&A

Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Transforming Mobile Push Notifications with Big Data

  • 1. Transforming Mobile Push Notifications with Big Data Dennis Waldron, Data Engineering Pablo Varela, Systems Engineering
  • 2. Who is Plumbee? ● 12.8M Installs ● 209K Daily Active Users ● 818K Monthly Active Users ● Social Games Studio ● Mirrorball Slots & Bingo ● Facebook Canvas, iOS
  • 3. Data Providers Inhouse data = 99.9% of all data In Total: ● 98TB (907 days of data) ● All stored in Amazon S3 Daily: ● 78GB compressed ● ~450M events/day ● 4,800 events/second (peak)
  • 4. Architecture - Overview Events (JSON) Daily Batch Processing Aggregates Application/Game Servers End Users (Desktop & Mobile) Log Aggregators Amazon S3 Amazon EMR (Elastic MapReduce) DataPipeline (Simple Storage Service) Amazon Redshift Plumbee Employees Analytics (SQL Queries) SQS Analytics Queue Events (JSON)
  • 5. Amazon Web Service Application/Game Servers End Users (Desktop & Mobile) ● Collect everything! ● RPC events intercepted by annotated endpoints. (Requests) ● All mutating state changes recorded: ○ DynamoDB, MySQL, Memcache (Blobs Updates) ● Custom Telemetry (Other): ○ Client: click tracking, loading time statistics, GPU data... ○ Server: promotions, transactions, Facebook user data... Game Data MySQL MemCache RPC 77% 9% OTHER 15% GENERATES DynamoDB
  • 6. Game Data - Example RPC Endpoint Annotation /** * Example annotation */ @SQSRequestLog(requestMessage = SpinRequest.class) @RequestMapping("/spin") public SpinResponse spin(SpinRequest spinRequest) { … }
  • 7. Example Event - userStats ● All events are recorded in JSON. ● Structure: ○ Headers ○ Categorization Data (metadata) ○ Payload (message) ● Important Headers: ○ timestamp ○ testVariant ○ plumbeeUid
  • 8. Architecture - Collection Analytics (SQL Queries) Daily Batch Processing Aggregates Application/Game Servers End Users (Desktop & Mobile) Amazon S3 Amazon EMR (Elastic MapReduce) DataPipeline (Simple Storage Service) Amazon Redshift Plumbee Employees Log Aggregators Events (JSON) SQS Analytics Queue Events (JSON)
  • 9. Data Collection (I) - PUT Application/Game Servers Events (JSON) SQS Queue Log Aggregators Producers Consumers What is SQS (Simple Queue Service)? A cloud-based message queue for transmitting messages between producers and consumers SQS Provides: ● ACK/FAIL semantics ● Unlimited number of messages ● Scales transparently ● Buffer zone
  • 10. Data Collection (II) - GET SQS Queue What is Apache Flume? A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data Apache Flume Consumers Amazon S3 (Simple Storage Service) S3 Data: ● Partitioned by: date / type / sub_type ● Compressed with: Snappy ● Aggregated in 512MB chunks
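
The date / type / sub_type partitioning described on this slide could be sketched as a small key-building helper. This is only an illustration: the slides do not show the real S3 key format, so the `events` root, the separator layout, and the function name `partition_prefix` are all assumptions.

```python
from datetime import date


def partition_prefix(event_date: date, event_type: str, event_sub_type: str,
                     root: str = "events") -> str:
    """Build a hypothetical S3 key prefix partitioned by date / type / sub_type.

    The layout is an assumption for illustration, not Plumbee's actual scheme.
    """
    return f"{root}/{event_date.isoformat()}/{event_type}/{event_sub_type}/"


prefix = partition_prefix(date(2014, 11, 18), "rpc", "rpc-spin")
print(prefix)
```

Keeping the partition values in the key prefix is what lets a later query touch only the objects for one day and one event type instead of scanning everything.
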
  • 11. Data Collection (III) - Flume Flume Agent Source (Custom) Sink (HDFS) SQS Queue Channel (File Based) ● Pluggable component architecture ● Durability via transactions ● File channel uses Elastic Block Store (EBS) volumes (network attached storage) ○ Protects against hardware failure ● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source S3 Bucket Transactions A + B + C = Flow A B C
  • 12. Architecture - Processing Events (JSON) Daily Batch Processing Aggregates Application/Game Servers End Users (Desktop & Mobile) Amazon S3 Amazon EMR (Elastic MapReduce) DataPipeline (Simple Storage Service) Amazon Redshift Plumbee Employees Analytics (SQL Queries) SQS Analytics Queue Events (JSON)
  • 13. Extract, Transform, Load ● Daily activity ● Orchestrated by Amazon DataPipeline ● Includes generation of reports ● Configured with JSON What is DataPipeline? A cloud-based data workflow service that helps you process and move data between different AWS services RESOURCE COMMAND SCHEDULE
  • 14. Extract & Transform (I) What is Elastic Map Reduce? Cloud-based MapReduce implementation to process vast amounts of data built on top of the open-sourced Hadoop framework. Two phases: ● Map() Procedure -> Filtering & Sorting ● Reduce() -> Summary operation Example: RAW DATA: Penguin Horse Cake Cake Penguin Penguin Penguin Horse Horse -> MAP() -> SORTED QUEUES: Cake Cake | Horse Horse Horse | Penguin Penguin Penguin Penguin -> REDUCE() -> RESULT: Cake: 2, Horse: 3, Penguin: 4
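
The Penguin / Horse / Cake diagram on this slide can be reproduced with a minimal in-memory sketch of the two phases. It is not how EMR executes a job (there is no distribution here), and the function names are chosen for illustration only.

```python
from collections import defaultdict

# Raw input records, as shown in the slide's diagram.
raw_data = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
            "Penguin", "Penguin", "Horse", "Horse"]


def map_phase(records):
    """Emit one (key, 1) pair per input record."""
    return [(word, 1) for word in records]


def shuffle(pairs):
    """Group values by key in sorted key order (the slide's 'sorted queues')."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Summarize each group by summing its values."""
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(raw_data)))
print(result)
```
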
  • 15. Extract & Transform (II) What is Hive? An open-source Apache project which provides a SQL-like interface to summarize, query and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure. ● Not really SQL, HQL -> HiveQL ● No transactions, materialized views, limited subquery support, ... SELECT plumbeeuid, COUNT(*) AS spins FROM eventlog -- Partitioned data access WHERE event_date = '2014-11-18' AND event_type = 'rpc' AND event_sub_type = 'rpc-spin' -- Aggregation GROUP BY plumbeeuid; Table: Eventlog ● Mounted on top of raw data ● SerDe provides JSON parsing ● Target data via partition filters
  • 16. Extract & Transform (III) ● Hive has limitations! ○ Speed, JSON ● Most of our transformations use: Streaming MapReduce Jobs What is Streaming? “A Hadoop utility that allows you to create and run MapReduce jobs using any executable script as a mapper or reducer” map(): for line in sys.stdin: data = json.loads(line); print(str(data['plumbeeUid']) + '\t' + '1') Emits key-value pairs: 466264 => 1, 376166 => 1, 983131 => 1, 466264 => 1 Hadoop sorts and shuffles the data, making sure matching keys are processed by a single reducer! reduce(): results = defaultdict(int) for line in sys.stdin: plumbee_uid, count = line.split('\t'); results[plumbee_uid] += int(count) print(results) Input: JSON rpc-spin data Result: { 466264: 2, 376166: 1, 983131: 1 }
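
The mapper and reducer on this slide can be tried locally without a Hadoop cluster. The sketch below is an assumption-laden stand-in: `sorted()` replaces Hadoop's sort-and-shuffle step, both phases run in one process rather than as separate scripts reading `sys.stdin`, and the sample events carry only the `plumbeeUid` header.

```python
import json
from collections import defaultdict


def mapper(lines):
    """Emit 'plumbeeUid<TAB>1' for every JSON event line."""
    for line in lines:
        data = json.loads(line)
        yield f"{data['plumbeeUid']}\t1"


def reducer(lines):
    """Sum the counts per key; Hadoop routes matching keys to one reducer."""
    results = defaultdict(int)
    for line in lines:
        plumbee_uid, count = line.rstrip("\n").split("\t")
        results[plumbee_uid] += int(count)
    return dict(results)


# Hypothetical rpc-spin events, reduced to the single header used here.
events = [json.dumps({"plumbeeUid": uid})
          for uid in (466264, 376166, 983131, 466264)]

# sorted() stands in for Hadoop's sort-and-shuffle between the two phases.
spins = reducer(sorted(mapper(events)))
print(spins)
```

In a real streaming job each function would live in its own executable script, reading lines from standard input and printing tab-separated pairs to standard output.
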
  • 17. Results Load (I) - Problem Raw S3 JSON Data Aggregated Data EMR Transformed data: ● Referred to as aggregates ● Stored in S3 ● Accessible via EMR cluster EMR Transformation (Hive & Streaming Jobs) 5.4TB Problem ● We don’t run long-lived EMR clusters. EMR: ● Requires specialist knowledge ● Is slow, both processing and booting (“offline”). Use Amazon Redshift for fast “online” data access
  • 18. Load (II) - Redshift What is Redshift? A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets. Power comes from: ● Query parallelization ● Column-oriented design Redshift Provides: ● Low latency JDBC and ODBC access ● Fault Tolerance ● Automated Backups Redshift (x3 nodes): 0.33s EMR (x20 nodes): 135.46s
  • 19. Load (II) - Column-Oriented Databases Row-oriented Database - MySQL ID First Name Last Name Country 1 Penguin Situation GB 2 Cheese Labs US 3 Horse Barracks GB ● Easy to add/modify records ● Could read irrelevant data ● Great for fast lookups (OLTP) Column-oriented Database - Redshift ID First Name Last Name Country 1 Penguin Situation GB 2 Cheese Labs US 3 Horse Barracks GB ● Only read in relevant data ● Adding rows requires multiple updates to column data ● Great for aggregation queries (OLAP)
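
The trade-off on this slide can be illustrated with the same three records held in both layouts. This is a toy in-memory sketch, not a description of how MySQL or Redshift store data on disk; the variable and function names are made up for the example.

```python
from collections import Counter

# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "first_name": "Penguin", "last_name": "Situation", "country": "GB"},
    {"id": 2, "first_name": "Cheese", "last_name": "Labs", "country": "US"},
    {"id": 3, "first_name": "Horse", "last_name": "Barracks", "country": "GB"},
]

# Column-oriented layout: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def users_per_country_rows(records):
    """Aggregation over rows has to walk every full record."""
    return Counter(record["country"] for record in records)


def users_per_country_columns(cols):
    """Aggregation over columns reads only the one relevant column."""
    return Counter(cols["country"])


def append_row(cols, record):
    """Adding a record to the column layout updates every column."""
    for name, value in record.items():
        cols[name].append(value)


print(users_per_country_rows(rows), users_per_country_columns(columns))
```
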
  • 20. Architecture - Revisit Daily Batch Processing Aggregates Application/Game Servers End Users (Desktop & Mobile) Amazon S3 Amazon EMR (Elastic MapReduce) DataPipeline (Simple Storage Service) Amazon Redshift Plumbee Employees Analytics (SQL Queries) Log Aggregators Events (JSON) SQS Analytics Queue Events (JSON)
  • 21. Q&A
  • 24. Mirrorball Slots: Challenges ● recurring timed event ● collect symbols from non-winning spins ● get free coins if enough symbols are collected
  • 25. Some players ask for notifications
  • 29. Data Collection Players Amazon Redshift
  • 30. Architecture - Overview Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 32. User targeting Run SQL queries directly against Redshift SQL Query Amazon Redshift User Segment
  • 33. User targeting: Query example -- Target all mobile users SELECT plumbee_uid, arn FROM mobile_user
  • 34. User targeting: Query example (II) -- Target lapsed users (1 week lapse) SELECT plumbee_uid, arn FROM mobile_user WHERE last_play_time < (now - 7 days)
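
The `now - 7 days` condition in the lapsed-user query is shorthand. One way to try the idea locally is sketched below, using Python's built-in `sqlite3` as a stand-in for Redshift. The table columns beyond those named on the slide, the sample rows, the placeholder ARNs, and the fixed value of `now` are all assumptions made for a reproducible example.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed "now" so the example is reproducible; a real job would use the clock.
now = datetime(2014, 11, 18, 12, 0, 0)
cutoff = now - timedelta(days=7)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)"
)
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        (466264, "arn:example:endpoint/a", "2014-11-17 09:00:00"),  # active
        (376166, "arn:example:endpoint/b", "2014-11-01 09:00:00"),  # lapsed
        (983131, "arn:example:endpoint/c", "2014-11-10 08:00:00"),  # lapsed
    ],
)

# Target lapsed users: last play is more than one week before "now".
lapsed = conn.execute(
    "SELECT plumbee_uid, arn FROM mobile_user "
    "WHERE last_play_time < ? ORDER BY plumbee_uid",
    (cutoff.strftime("%Y-%m-%d %H:%M:%S"),),
).fetchall()
print(lapsed)
```
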
  • 35. Demo (I) Mobile MBS Notifications
  • 36. Architecture - Mobile Push Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 38. What is SNS? “Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, fully managed push messaging service”
  • 41. Amazon SNS: Device Registration Players Game Servers SQS Analytics Queue Amazon Redshift Amazon SNS register device event register
  • 42. Amazon SNS: ARN Retrieval private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) { CreatePlatformEndpointRequest request = new CreatePlatformEndpointRequest() .withPlatformApplicationArn(platformApplicationArn) .withToken(deviceToken); CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request); return result.getEndpointArn(); }
  • 43. Amazon SNS: Analytics Event private String registerEndpointForApplicationAndPlatform( final long plumbeeUid, String platformARN, String platformToken) { final String deviceEndpointARN = getArnForDeviceEndpoint( platformARN , platformToken ); sqsLogger.queueMessage( new HashMap<String, Object>() {{ put( "notification", "register"); put( "plumbeeUid", plumbeeUid ); put( "provider", platformName ); put( "endpoint", deviceEndpointARN ); }}, null); return deviceEndpointARN; }
  • 44. Amazon SNS: Mobile Push private void publishMessage(UserData userData, String jsonPayload) { amazonSNS.publish(new PublishRequest() .withTargetArn( userData.getEndpoint()) .withMessageStructure( "json") .withMessage( jsonPayload )); } Payload example {"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}
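
When the message structure is `json`, the message body is a JSON object with a required `default` entry, and platform-specific entries are themselves JSON-encoded strings. The sketch below builds such a payload. The slide shows only the `default` key, so the `APNS` entry and the helper name `build_sns_payload` are assumptions added for illustration.

```python
import json


def build_sns_payload(text: str) -> str:
    """Build a message body for a publish call that uses MessageStructure 'json'.

    The APNS entry is an illustrative assumption; the slide shows only 'default'.
    """
    return json.dumps({
        "default": text,
        # Platform-specific values must be JSON documents encoded as strings.
        "APNS": json.dumps({"aps": {"alert": text}}),
    })


payload = build_sns_payload(
    "The 5 day Halloween Challenge has started today! Touch to play NOW!"
)
print(payload)
```
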
  • 45. Architecture - Orchestration Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 47. What is Amazon SWF? “Amazon Simple Workflow (Amazon SWF) is a task coordination and state management service for cloud applications.”
  • 48. What Amazon SWF provides ● consistent execution state management ● workflow executions and tasks tracking ● non-duplicated dispatch of tasks ● task routing and queuing ● the AWS Flow Framework
  • 49. Architecture - Orchestration Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 50. Mobile Push: Scheduling Trigger Publish Service Amazon Simple Workflow
  • 51. Mobile Push: Targeting query query target users Amazon SWF Amazon EC2 Worker (Segmentation) Amazon Redshift Amazon S3
  • 52. Mobile Push: Processing batch 1-N publish push Workers (Processing) Amazon SWF Read data + push End User
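
Splitting a user segment into batches 1-N for the processing workers might look like the sketch below. The batch size, the tuple shape `(plumbee_uid, arn)`, and the function name are assumptions; the slides do not say how batches are sized or stored.

```python
from typing import Iterator, List, Sequence, Tuple

User = Tuple[int, str]  # (plumbee_uid, endpoint ARN), as returned by the targeting query


def split_into_batches(segment: Sequence[User], batch_size: int) -> Iterator[List[User]]:
    """Yield consecutive slices of the segment, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(segment), batch_size):
        yield list(segment[start:start + batch_size])


# Hypothetical segment of five users with placeholder ARNs.
segment = [(uid, f"arn:example:endpoint/{uid}") for uid in range(1, 6)]
batch_list = list(split_into_batches(segment, batch_size=2))
print([len(batch) for batch in batch_list])
```

Each batch could then be handed to a separate worker, so that a failed batch can be retried without publishing again to users in the others.
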
  • 53. Mobile Push: Reporting send send Amazon SWF Amazon EC2 Worker (Reporting) Amazon SES
  • 55. Q&A