© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Building a Lake of Wisdom
Henri Heiskanen
GAM306
November 27, 2017
Who am I?
Hollywood Production Executive...
Moonlighting as Data Engineering Lead at Rovio Games
20 years of software development
5 years of analytics
“Lake of Wisdom”
Agenda
• Rovio and data
• Building the lake
• Learnings
Possible takeaways
• How should I build my data lake?
• How should I handle schema evolution?
• What are the limitations of Amazon Athena that I need to consider?
• Are there services out there that would make this easier to accomplish?
• Could this be our high-level architecture in the future?
• How should I partition my data in S3?
Rovio and data
About Rovio
• Creator of Angry Birds
• Founded in 2003
• HQ in Espoo, Finland
• ~400 employees in Finland, Sweden, UK, China, and US
• Games-first entertainment company
  • Games
  • Brand licensing
• 3.7 billion game downloads*
• 80 million monthly active users and 11 million daily active users**
• Angry Birds Movie
* at June 30, 2017
** during the three months ended June 30, 2017
Rovio games and data
Market research, performance marketing, game optimization
[Figures: bar chart of Puzzle & Casino subgenres; funnel of Impression, Click, Install, Purchase]
Our data philosophy
• Single truth
• Sharing is caring
• We own our future
???
Pre-lake architecture Q1/2017
Our data philosophy revisited
“It’s not perfect, but don’t we have more important things to do?”
• Single truth ¯_(ツ)_/¯: multiple dashboards with different DAU
• Sharing is caring: limited access to game dashboards and raw data
• We own our future: I suppose…
…And then this happened!
Our analytics platform provider got acquired by another mobile games company!
Let’s have a quick look at what we already had
What is Beacon?
• Payment
• Ads
• Push
• Analytics
• A/B testing
• Segmentation
What is missing?
The ability for data analysts to build game-specific dashboards using game events joined with portfolio-wide games business data.
What we were set to do
BUY or BUILD
“Well, whatever… but do it in seven weeks!”
Building the lake
What do we need?
• Petabyte-scale “data warehouse” with fast query access to raw datasets
• Ability to efficiently query and process data from multiple systems
• Data visualization SDK for data analysts to create dashboards that can be integrated into our Beacon UI
Data lakes are all the rage
“A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files.” –Wikipedia
But we have a data lake already, then?
• Human latency due to cluster launching and missing schemas
• Query latency due to inefficient data formats
• Only a few people knew how to access the raw data and wanted to
Why Amazon Athena?
We are fluent with AWS
Data and processing layers separated
Pay-per-use billing
Performance is at least on par with alternative solutions
Amazon Athena performance
SELECT m.t, COUNT(DISTINCT s.aid1) AS dau
FROM abba_raw
WHERE processdate = 20170116
GROUP BY m.t;

Data format         Database X   Presto (21 x m3.xlarge)   Amazon Athena
JSON + SEQ + gzip   N/A          9 min                     1 min 50 sec
ORC                 N/A          37 sec                    9 sec
DB native           35 sec       N/A                       N/A
We need columnar format
Flatten
Re-partition
Manage schema
We also need profiles
• Analysis is almost always done for a specific player cohort
• In the old system, each raw event row was enriched with player state
• Can we build a system where we do not need to bake in player state, but instead join dynamically with acceptable performance?
How about joins in Amazon Athena?
select b.registration_date, avg(a.level_id) as avg_level
from (
    select
        max(cast(m_level_id as integer)) as level_id,
        s_aid2 as player_id
    from default.abba_orc
    where m_t = 'level_finished' and processdate = 20170116
    group by s_aid2
) as a
left join wolery_orc b
    on a.player_id = b.player_id
    and b.app_id = 'Abba'
group by b.registration_date;
Run time: 12.73 seconds
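The idea of joining events with profiles at query time, rather than baking player state into every event row, can be tried out on a laptop. The sketch below is only an illustration: it uses SQLite from the Python standard library instead of Athena, and the `events` and `profiles` tables with their rows are made-up stand-ins for the event and profile tables in the query above.

```python
import sqlite3

# Hypothetical miniature stand-ins for the event and profile tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (player_id TEXT, event_type TEXT, level_id INTEGER);
    CREATE TABLE profiles (player_id TEXT, app_id TEXT, registration_date TEXT);
    INSERT INTO events VALUES
        ('p1', 'level_finished', 3),
        ('p1', 'level_finished', 7),
        ('p2', 'level_finished', 5),
        ('p2', 'level_started', 9);
    INSERT INTO profiles VALUES
        ('p1', 'Abba', '2017-01-10'),
        ('p2', 'Abba', '2017-01-10');
""")

# Highest finished level per player, joined with the profile at query time.
rows = conn.execute("""
    SELECT b.registration_date, AVG(a.level_id) AS avg_level
    FROM (
        SELECT player_id, MAX(level_id) AS level_id
        FROM events
        WHERE event_type = 'level_finished'
        GROUP BY player_id
    ) AS a
    LEFT JOIN profiles AS b
        ON a.player_id = b.player_id AND b.app_id = 'Abba'
    GROUP BY b.registration_date
""").fetchall()
print(rows)
```

Player state lives only in `profiles`, so a changed profile never requires rewriting event rows.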
Profiles and daily aggregates
• Profile is the user journey across the Rovio portfolio
  • Origin: paid, xpromo, network, organic
  • Time spent
  • Purchases
  • Predictions: LTV, churn, cheating…
• Aggregates are game agnostic daily/cumulative activities
  • Time spent on one day / cumulative time spent until that day
• Data in Amazon Redshift
Amazon Athena
Technology selections
Rovio data crash course
Analytics events
• Produced by game clients and services
• Have standard headers and custom message body
• In JSON format
  • Protobuf packets encoded from game clients
{
  "o": {
    "ts": "2015-02-20T13:29:06.054+0000"
  },
  "s": {
    "aid1": "AN7624...",
    "cid": "analyticstestapp_7bc0125b",
    "cver": "1.6.0"
  },
  "t": {
    "dt": "iPad3,1",
    "geo": "FI",
    "os": "iOS",
    "osv": "7.0.4"
  },
  "m": {
    "t": "Conquer_The_World",
    "weapon_of_choice": "water pistol"
  }
}
Kafka topics
• Service topics contain events from all games
• Game topics have events from a single game
  • Game clients and/or server
Raw event Amazon S3 sink
• Events from Kafka are stored to Amazon S3 as compressed sequence files with one JSON object per row
• Partitioned by topic and processing date
audit.ads/
audit.session/
  processdate=20170926/
    1_0_00000000002252646951.gz
    1_0_00000000002252832986.gz
    1_0_00000000002252943052.gz
    1_0_00000000002253053431.gz
    ...
  processdate=20170925/
  processdate=20170924/
  ...
audit.ua/
audit.wallet/
collector.AngryBirdsFriends/
collector.Crimson/
collector.Yellow/
collector.nibblers2_2c37376e/
collector.ocean_2ee459c8/
collector.tntgame_149fb4d3/
...
Target: ORC Amazon S3 sink
• Events stored as compressed ORC files
• Partitioned by game, topic, processing date, and event type
  • Additional event type partition improved performance over ORC indexing
app_id=Abba/
app_id=AngryBirdsClassic/
  topic=audit.ads/
  topic=audit.identity/
  topic=audit.session/
  topic=audit.supermoon/
  topic=audit.wallet/
  topic=collector.angrybirdsclassic/
    processdate=20170926/
      eventtype=mightyleaguedemotion/
      eventtype=mightyleagueentered/
        000058_0
    processdate=20170925/
    processdate=20170924/
    ...
app_id=BadPiggiesFull/
...
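The key=value folder layout above is what lets Hive-compatible engines map folders to partition columns. As a rough illustration only, here is how such a prefix could be built and parsed; the helper names `partition_prefix` and `parse_prefix` are invented for this sketch and are not from the talk.

```python
def partition_prefix(app_id, topic, processdate, eventtype):
    """Build a Hive-style key prefix: one key=value folder per partition column."""
    parts = [
        ("app_id", app_id),
        ("topic", topic),
        ("processdate", processdate),
        ("eventtype", eventtype),
    ]
    return "/".join(f"{key}={value}" for key, value in parts) + "/"


def parse_prefix(prefix):
    """Recover the partition values (as strings) from such a prefix."""
    return dict(segment.split("=", 1) for segment in prefix.strip("/").split("/"))


prefix = partition_prefix("AngryBirdsClassic", "collector.angrybirdsclassic",
                          20170926, "mightyleagueentered")
print(prefix)
print(parse_prefix(prefix))
```

A query that filters on `processdate` and `eventtype` then only needs to read the matching folders.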
Target: ORC table DDL
• Flat structure
• Schema per game
• Table per Kafka topic
• Partitioned by date and event type
-- One STRING column per flattened JSON field
CREATE EXTERNAL TABLE abba.client_events (
  `o_ts` STRING,
  `s_aid1` STRING,
  `s_cid` STRING,
  `s_cver` STRING,
  `t_dt` STRING,
  `t_geo` STRING,
  `t_os` STRING,
  `t_osv` STRING,
  `m_weapon_of_choice` STRING
)
PARTITIONED BY (processdate INT, eventtype STRING)
STORED AS ORC
LOCATION 's3://bucket/app_id=Abba/topic=collector.abba';
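The flat column names in the DDL (`o_ts`, `s_aid1`, `m_weapon_of_choice`) follow from joining the nested JSON keys with an underscore. The talk does this step in Hive; the snippet below is just a small Python sketch of the same naming rule, with `flatten` being a name made up for the example.

```python
def flatten(event, sep="_"):
    """Flatten nested dicts: {"s": {"aid1": "x"}} becomes {"s_aid1": "x"}."""
    flat = {}
    for key, value in event.items():
        if isinstance(value, dict):
            # Recurse and prefix every nested key with the parent key.
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


event = {
    "o": {"ts": "2015-02-20T13:29:06.054+0000"},
    "s": {"aid1": "AN7624...", "cver": "1.6.0"},
    "m": {"t": "Conquer_The_World", "weapon_of_choice": "water pistol"},
}
row = flatten(event)
print(sorted(row))
```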
Problem: ORC schema evolution
• Years of raw data from multiple games and services
• Monthly game updates introduce new fields
• Reprocessing everything takes time and is expensive
• Solution: Maintain ORC file compatibility
  • Do not remove fields
  • Do not change the data type of a field
  • Append the new fields to the end
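The three compatibility rules can be expressed as a small merge function. This is a minimal sketch under the assumption that a schema is an ordered list of (name, type) pairs; `evolve_schema` is a hypothetical helper and not the pipeline code used at Rovio.

```python
def evolve_schema(old, new):
    """Merge newly discovered fields into an existing column list.

    `old` and `new` are lists of (name, type) pairs. Existing columns keep
    their position and type, columns missing from `new` are kept, and
    unseen columns are appended at the end.
    """
    old_types = dict(old)
    for name, dtype in new:
        if name in old_types and old_types[name] != dtype:
            raise ValueError(
                f"type change for {name}: {old_types[name]} -> {dtype}")
    appended = [(name, dtype) for name, dtype in new if name not in old_types]
    return list(old) + appended


current = [("o_ts", "string"), ("s_aid1", "string")]
discovered = [("s_aid1", "string"), ("m_level_id", "string")]
merged = evolve_schema(current, discovered)
print(merged)
```

Because old columns never move or change type, files written with an earlier schema should remain readable with the newer column list.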
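Schema discovery Spark job
import com.amazonaws.services.s3.AmazonS3Client;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

// TableMeta, listFolders, and writeToS3 are helpers not shown on the slide.
static void discover(AmazonS3Client s3, JavaSparkContext sc, String input, String output,
                     String processdate, String[] whiteList) throws Exception {
    SQLContext sql = new SQLContext(sc);
    for (TableMeta folder : listFolders(s3, input, whiteList)) {
        String path = folder.getPath() + (folder.getPath().endsWith("/") ? "" : "/") + "processdate=" + processdate;
        // Each sequence-file record carries one JSON document as its value.
        JavaPairRDD<Text, Text> rdd = sc.sequenceFile(path, Text.class, Text.class);
        JavaRDD<String> raw = rdd.map(tuple -> tuple._2().toString());
        // Infer the schema; jsonRDD(...) is deprecated in favor of read().json(...).
        Dataset<Row> df = sql.read().json(raw);
        df.printSchema();
        // write json schema
        writeToS3(s3, folder, df.schema().prettyJson(), output, ".json");
    }
}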
Schema discovery pipeline
Problem: Number of partitions
• Angry Birds 2 has ~100 event types and 2 years of ORC data
• 100 * 365 * 2 = 73,000 partitions
• ADD COLUMNS did not work
• Schema update needs DROP and CREATE TABLE
• MSCK REPAIR and DROP TABLE time out
• Workaround: Add and remove partitions in batches of 10 and parallelize
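A rough sketch of the batching workaround: generate `ALTER TABLE ... ADD PARTITION` statements that each carry at most ten partitions, which could then be submitted in parallel. The event type names are placeholders, the date range is shortened to three days, and submitting the statements is left out.

```python
from itertools import product


def add_partition_statements(table, dates, event_types, batch_size=10):
    """Yield ALTER TABLE statements that each add at most `batch_size` partitions."""
    specs = [
        f"PARTITION (processdate={date}, eventtype='{event_type}')"
        for date, event_type in product(dates, event_types)
    ]
    for start in range(0, len(specs), batch_size):
        batch = " ".join(specs[start:start + batch_size])
        yield f"ALTER TABLE {table} ADD IF NOT EXISTS {batch}"


event_types = [f"event_{i}" for i in range(100)]  # hypothetical names
dates = range(20170101, 20170104)                 # three days, for brevity
statements = list(add_partition_statements("abba.client_events", dates, event_types))

print(100 * 365 * 2)    # partitions for 100 event types over two years
print(len(statements))  # statements needed for the three sample days
```

At ten partitions per statement, the full 73,000 partitions would take 7,300 statements, which is why running them in parallel matters.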
Data conversion Hive job
SET hive.optimize.sort.dynamic.partition=false;
INSERT OVERWRITE TABLE tmp.abba_orc_tmp_${hiveconf:processdate}
PARTITION (app_id, topic, processdate, eventtype)
SELECT
    `h`.`v` AS h_v,
    `m`.`ad_type` AS m_ad_type,
    `m`.`ANDROID_ADVERTISING_ID` AS m_ANDROID_ADVERTISING_ID,
    `m`.`arena_birds_used` AS m_arena_birds_used,
    `m`.`arena_entry` AS m_arena_entry,
    `m`.`arena_position` AS m_arena_position,
    `s`.`aid1` AS s_aid1,
    'Abba' AS app_id,
    'collector.abba' AS topic,
    processdate AS processdate,
    -- Normalize the event name: lowercase, whitespace to '__', other non-word characters to '_'
    regexp_replace(regexp_replace(lower(substr(m.t,1,100)), '\\s', '__'), '[^\\w]', '_') AS eventtype
FROM abba.client_collector
WHERE processdate=${hiveconf:processdate}
-- Split the listed event types into ten distribution keys each
DISTRIBUTE BY CASE WHEN eventtype IN ('frame_unlocked', 'performance_event', 'feat_progress', 'level_finished',
    'level_started', 'currency_change') THEN CONCAT(eventtype, ABS(HASH(o_ts))%10) ELSE eventtype END;
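The `DISTRIBUTE BY` clause appends a hash-derived digit to a handful of event types, presumably the high-volume ones, so that their rows are spread over ten keys instead of one. The sketch below mimics that idea in Python; `zlib.crc32` is only a stand-in for Hive's `HASH()`, and the function name is invented for the example.

```python
import zlib

# Event types listed in the DISTRIBUTE BY clause of the Hive job.
SALTED_EVENT_TYPES = {"frame_unlocked", "performance_event", "feat_progress",
                      "level_finished", "level_started", "currency_change"}


def distribution_key(eventtype, o_ts, buckets=10):
    """Return the key rows are distributed by; listed types get a 0..buckets-1 suffix."""
    if eventtype in SALTED_EVENT_TYPES:
        # crc32 stands in for Hive's HASH(); only the spreading matters here.
        salt = zlib.crc32(o_ts.encode("utf-8")) % buckets
        return f"{eventtype}{salt}"
    return eventtype


timestamps = [f"2017-01-16T00:00:{second:02d}" for second in range(60)]
keys = {distribution_key("level_finished", ts) for ts in timestamps}
print(sorted(keys))
print(distribution_key("rare_event", timestamps[0]))
```

All other event types keep a single key, so each of them still goes to a single reducer.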
Data conversion pipeline
The lake architecture Q3/2017
So…what was our deliverable?
Well-defined schema…
Table              Description
client_events      Analytics events sent from the game client
identity_events    Analytics events sent from login service
session_events     Analytics events sent from new login service
ads_events         Impression and click information received from ads service
wallet_events      IAP purchase events received from wallet service
supermoon_events   Rule matching information from Manage tool
player_profile     Player profile information such as registration date and user origin
player_daily       Cumulative daily player activity and purchase information
…You can query in Amazon Athena…
…And publish insight in Beacon
…And as a bonus we got the lake
Learnings
ORC vs. Parquet vs. something else
“ORC is newest and fastest but Parquet is more widely supported at the moment”
“INSERT INTO” support
“We are running reporting queries in Hive and Presto, which brings extra cost and complexity to the system”
External metastore support
“We keep two identical metastores up-to-date”
Mind the number of partitions
“The biggest performance problems come from running the DDL statements”
Concurrent query limits
“The default concurrent query limit of five is far from what a data-hungry enterprise needs”
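One way to live with a concurrency cap is to throttle on the client side. The following is only a sketch of that idea: a thread pool with five workers keeps at most five queries in flight. `run_query` is a hypothetical placeholder that sleeps instead of calling any real service.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 5  # the default limit quoted on the slide

lock = threading.Lock()
running = 0
peak = 0


def run_query(sql):
    """Hypothetical placeholder for submitting a query and waiting for it."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # pretend the query takes a moment
    with lock:
        running -= 1
    return f"done: {sql}"


queries = [f"SELECT {i}" for i in range(20)]
# The pool size bounds how many queries are in flight at the same time.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(run_query, queries))

print(len(results), "queries finished, peak concurrency", peak)
```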
New products and services
“Stay informed on the new product launches and constantly evolve your architecture”
Schema discovery is easy, schema management can get complicated
“Maybe take a look at AWS Glue instead of building everything by yourself”
Bias
“Unbiased information is hard to find. For every blog there is a counter blog. Test your own use case, if possible.”
Amazon Athena, Presto, Hive, and Spark are great together
“Amazon Athena provides the data for a wider audience, but our data architecture allows us to access the Lake of Wisdom with other tools for more advanced use cases, if needed”
Thank You!

 
SID331_Architecting Security and Governance Across a Multi-Account Strategy
SID331_Architecting Security and Governance Across a Multi-Account StrategySID331_Architecting Security and Governance Across a Multi-Account Strategy
SID331_Architecting Security and Governance Across a Multi-Account Strategy
 
Containers on AWS - re:Invent Comes to London 2.0
Containers on AWS - re:Invent Comes to London 2.0Containers on AWS - re:Invent Comes to London 2.0
Containers on AWS - re:Invent Comes to London 2.0
 
Intro to Amazon AI Services
Intro to Amazon AI ServicesIntro to Amazon AI Services
Intro to Amazon AI Services
 
An Introduction to AI Services on AWS - Web Summit Lisbon
An Introduction to AI Services on AWS -  Web Summit LisbonAn Introduction to AI Services on AWS -  Web Summit Lisbon
An Introduction to AI Services on AWS - Web Summit Lisbon
 
CON320_Monitoring, Logging and Debugging Containerized Services
CON320_Monitoring, Logging and Debugging Containerized ServicesCON320_Monitoring, Logging and Debugging Containerized Services
CON320_Monitoring, Logging and Debugging Containerized Services
 
Manage Infrastructure Securely at Scale and Eliminate Operational Risks - DEV...
Manage Infrastructure Securely at Scale and Eliminate Operational Risks - DEV...Manage Infrastructure Securely at Scale and Eliminate Operational Risks - DEV...
Manage Infrastructure Securely at Scale and Eliminate Operational Risks - DEV...
 
Devoxx: Building AI-powered applications on AWS
Devoxx: Building AI-powered applications on AWSDevoxx: Building AI-powered applications on AWS
Devoxx: Building AI-powered applications on AWS
 
Serverless Stream Processing Tips & Tricks - BDA311 - Chicago AWS Summit
Serverless Stream Processing Tips & Tricks - BDA311 - Chicago AWS SummitServerless Stream Processing Tips & Tricks - BDA311 - Chicago AWS Summit
Serverless Stream Processing Tips & Tricks - BDA311 - Chicago AWS Summit
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
 
re:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized Servicesre:Invent CON320 Tracing and Debugging for Containerized Services
re:Invent CON320 Tracing and Debugging for Containerized Services
 
Analyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAnalyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon Kinesis
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

GAM306_Building a Lake of Wisdom

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT. Building a Lake of Wisdom. Henri Heiskanen, GAM306, November 27, 2017
  • 2. Who am I? Hollywood Production Executive... Moonlighting as Data Engineering Lead at Rovio Games. 20 years of software development. 5 years of analytics.
  • 3. “Lake of Wisdom”
  • 4. Agenda: Rovio and data; Building the lake; Learnings
  • 5. Possible takeaways: How should I build my data lake? How should I handle schema evolution? What are the limitations of Amazon Athena that I need to consider? Are there services out there that would make this easier to accomplish? Could this be our high-level architecture in the future? How should I partition my data in S3?
  • 6. Rovio and data
  • 7. About Rovio • Creator of Angry Birds • Founded in 2003 • HQ in Espoo, Finland • ~400 employees in Finland, Sweden, UK, China, and US • Games-first entertainment company • Games • Brand licensing • 3.7 billion game downloads* • 80 million monthly active users and 11 million daily active users** • Angry Birds Movie (* at June 30, 2017; ** during the three months ended June 30, 2017)
  • 8. Rovio games and data: Market research; Performance marketing; Game optimization. [Chart: Puzzle & Casino subgenres: Tap & Clear Puzzle Shooter Casino Card Casino Bingo Puzzle Card Puzzle Words String Match] [Diagram labels: Impression, Click, Install, Purchase]
  • 9. Our data philosophy: Single truth; Sharing is caring; We own our future; ???
  • 10. Pre-lake architecture Q1/2017
  • 11. Our data philosophy revisited. Single truth ¯\_(ツ)_/¯ “It’s not perfect, but don’t we have more important things to do?” Multiple dashboards with different DAU. Sharing is caring. We own our future. Limited access to game dashboards and raw data. I suppose…
  • 12. …And then this happened! Our analytics platform provider got acquired by another mobile games company!
  • 13. Let’s have a quick look at what we already had
  • 14. What is Beacon? Payment, Ads, Push, Analytics, A/B testing, Segmentation
  • 15. What is missing? The ability for data analysts to build game-specific dashboards using game events joined with portfolio-wide games business data.
  • 16. What we were set to do: BUY or BUILD. “Well, whatever… but do it in seven weeks!”
  • 17. Building the lake
  • 18. What do we need? Petabyte-scale “data warehouse” with fast query access to raw datasets. Ability to efficiently query and process data from multiple systems. Data visualization SDK for data analysts to create dashboards that can be integrated into our Beacon UI.
  • 19. Data lakes are all the rage. “A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files.” –Wikipedia
  • 20. But we have a data lake already then? Human latency due to cluster launching and lacking schemas. Query latency due to inefficient data formats. Only a few people who knew how and wanted to access the raw data.
  • 21. Why Amazon Athena? We are fluent with AWS. Data and processing layers separated. Pay-per-use billing. Performance is at least on par with alternative solutions.
  • 22. Amazon Athena performance
      Query:
        SELECT m.t, COUNT(DISTINCT s.aid1) AS dau
        FROM abba_raw
        WHERE processdate = 20170116
        GROUP BY m.t;
      Data format        | Database X | Presto (21 x m3.xlarge) | Amazon Athena
      JSON + SEQ + gzip  | N/A        | 9 min                   | 1 min 50 sec
      ORC                | N/A        | 37 sec                  | 9 sec
      DB native          | 35 sec     | N/A                     | N/A
  • 23. We need columnar format: Flatten; Re-partition; Manage schema
  • 24. We also need profiles. Analysis is almost always done for a specific player cohort. In the old system, each raw event row was enriched with player state. Can we build a system where we do not need to bake in player state, but instead join dynamically with acceptable performance?
  • 25. How about joins in Amazon Athena? Run time: 12.73 seconds
      select b.registration_date, avg(a.level_id) as avg_level
      from (
        select max(cast(m_level_id as integer)) as level_id, s_aid2 as player_id
        from default.abba_orc
        where m_t = 'level_finished' and processdate = 20170116
        group by s_aid2
      ) as a
      left join wolery_orc b on a.player_id = b.player_id and b.app_id = 'Abba'
      group by b.registration_date;
  • 26. Profiles and daily aggregates • Profile is the user journey across the Rovio portfolio • Origin: paid, xpromo, network, organic • Time spent • Purchases • Predictions: LTV, churn, cheating… • Aggregates are game-agnostic daily/cumulative activities • Time spent on one day / cumulative time spent until that day • Data in Amazon Redshift
  • 27. Technology selections: Amazon Athena
  • 28. Rovio data crash course
  • 29. Analytics events • Produced by game clients and services • Have standard headers and custom message body • In JSON format • Protobuf packets encoded from game clients
      {
        "o": { "ts": "2015-02-20T13:29:06.054+0000" },
        "s": {
          "aid1": "AN7624...",
          "cid": "analyticstestapp_7bc0125b",
          "cver": "1.6.0"
        },
        "t": {
          "dt": "iPad3,1",
          "geo": "FI",
          "os": "iOS",
          "osv": "7.0.4"
        },
        "m": {
          "t": "Conquer_The_World",
          "weapon_of_choice": "water pistol"
        }
      }
  • 30. Kafka topics • Service topics contain events from all games • Game topics have events from a single game • Game clients and/or server
  • 31. Raw event Amazon S3 sink • Events from Kafka are stored to Amazon S3 as compressed sequence files with one JSON object per row • Partitioned by topic and processing date
      audit.ads/
      audit.session/
        processdate=20170926/
          1_0_00000000002252646951.gz
          1_0_00000000002252832986.gz
          1_0_00000000002252943052.gz
          1_0_00000000002253053431.gz
          ...
        processdate=20170925/
        processdate=20170924/
        ...
      audit.ua/
      audit.wallet/
      collector.AngryBirdsFriends/
      collector.Crimson/
      collector.Yellow/
      collector.nibblers2_2c37376e/
      collector.ocean_2ee459c8/
      collector.tntgame_149fb4d3/
      ...
  • 32. Target: ORC Amazon S3 sink • Events stored as compressed ORC files • Partitioned by game, topic, processing date, and event type • Additional event type partition improved performance over ORC indexing
      app_id=Abba/
      app_id=AngryBirdsClassic/
        topic=audit.ads/
        topic=audit.identity/
        topic=audit.session/
        topic=audit.supermoon/
        topic=audit.wallet/
        topic=collector.angrybirdsclassic/
          processdate=20170926/
            eventtype=mightyleaguedemotion/
            eventtype=mightyleagueentered/
              000058_0
          processdate=20170925/
          processdate=20170924/
          ...
      app_id=BadPiggiesFull/
      ...
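The four-level partition layout of the ORC sink can be illustrated with a small Python sketch. The function names `orc_partition_prefix` and `parse_partition_prefix` are hypothetical helpers introduced here for illustration; only the key names (`app_id`, `topic`, `processdate`, `eventtype`) and the example values come from the slide.

```python
def orc_partition_prefix(app_id: str, topic: str, processdate: int, eventtype: str) -> str:
    """Build the S3 key prefix for one partition, in the order shown on the slide."""
    return (
        f"app_id={app_id}/"
        f"topic={topic}/"
        f"processdate={processdate}/"
        f"eventtype={eventtype}/"
    )


def parse_partition_prefix(prefix: str) -> dict:
    """Split a Hive-style 'key=value/' prefix back into its partition keys."""
    parts = [segment for segment in prefix.split("/") if segment]
    return dict(segment.split("=", 1) for segment in parts)


prefix = orc_partition_prefix(
    "AngryBirdsClassic", "collector.angrybirdsclassic", 20170926, "mightyleagueentered"
)
print(prefix)
print(parse_partition_prefix(prefix))
```

Because every level is a `key=value` directory, a query that filters on `processdate` and `eventtype` only needs to read the objects under the matching prefixes.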
  • 33. Target: ORC table DDL • Flat structure • Schema per game • Table per Kafka topic • Partitioned by date and event type
      CREATE EXTERNAL TABLE abba.client_events (
        `o_ts` STRING,
        `s_aid1` STRING,
        `s_cid` STRING,
        `s_cver` STRING,
        `t_dt` STRING,
        `t_geo` STRING,
        `t_os` STRING,
        `t_osv` STRING,
        `m_weapon_of_choice` STRING
      )
      PARTITIONED BY (processdate INT, eventtype STRING)
      STORED AS ORC
      LOCATION 's3://bucket/app_id=Abba/topic=collector.abba';
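The column names in the DDL (`o_ts`, `s_aid1`, `t_os`, ...) look like the nested JSON event with each path joined by an underscore. A minimal Python sketch of that flattening step follows; `flatten_event` is a hypothetical helper, and the talk itself does this in Hive rather than Python.

```python
import json


def flatten_event(event: dict, sep: str = "_") -> dict:
    """Flatten nested objects into one level, joining key paths with `sep`."""
    flat = {}

    def walk(node: dict, prefix: str) -> None:
        for key, value in node.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, name)  # descend into nested objects
            else:
                flat[name] = value

    walk(event, "")
    return flat


# Example event taken from the "Analytics events" slide.
raw = """
{
  "o": {"ts": "2015-02-20T13:29:06.054+0000"},
  "s": {"aid1": "AN7624...", "cid": "analyticstestapp_7bc0125b", "cver": "1.6.0"},
  "t": {"dt": "iPad3,1", "geo": "FI", "os": "iOS", "osv": "7.0.4"},
  "m": {"t": "Conquer_The_World", "weapon_of_choice": "water pistol"}
}
"""
row = flatten_event(json.loads(raw))
print(list(row))
```

In the target table the `m_t` value is not stored as a column; it becomes the `eventtype` partition.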
  • 34. Problem: ORC schema evolution • Years of raw data from multiple games and services • Monthly game updates introduce new fields • Reprocessing everything takes time and is expensive • Solution: Maintain ORC file compatibility • Do not remove fields • Do not change the data type of a field • Append the new fields to the end
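The three compatibility rules can be sketched as a schema merge in Python. This is an illustration under assumptions, not Rovio's implementation: `evolve_schema` is a hypothetical helper, schemas are modeled as ordered lists of (name, type) pairs, and raising an error on a type change is one possible policy.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (column name, type name)


def evolve_schema(current: List[Field], discovered: List[Field]) -> List[Field]:
    """Merge newly discovered fields into the current schema.

    - Fields missing from `discovered` are kept (never removed).
    - A field whose type differs is rejected (never changed).
    - New fields are appended to the end, preserving existing column order.
    """
    known = dict(current)
    merged = list(current)
    for name, dtype in discovered:
        if name not in known:
            merged.append((name, dtype))
            known[name] = dtype
        elif known[name] != dtype:
            raise ValueError(
                f"type change for {name!r}: {known[name]} -> {dtype} breaks compatibility"
            )
    return merged


current = [("o_ts", "string"), ("s_aid1", "string")]
discovered = [("s_aid1", "string"), ("m_level_id", "string")]
print(evolve_schema(current, discovered))
```

Keeping existing columns in place and only appending means the older ORC files can still be read with the newer table definition, so they do not need to be reprocessed.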
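  • 35. Schema discovery Spark job
      import com.amazonaws.services.s3.AmazonS3Client;
      import org.apache.hadoop.io.Text;
      import org.apache.spark.api.java.JavaPairRDD;
      import org.apache.spark.api.java.JavaRDD;
      import org.apache.spark.api.java.JavaSparkContext;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Row;
      import org.apache.spark.sql.SQLContext;

      // TableMeta, listFolders, and writeToS3 are helpers that are not shown on the slide.
      static void discover(AmazonS3Client s3, JavaSparkContext sc, String input, String output,
                           String processdate, String[] whiteList) throws Exception {
        SQLContext sql = new SQLContext(sc);
        for (TableMeta folder : listFolders(s3, input, whiteList)) {
          // Read one day of raw events for this topic folder.
          String path = folder.getPath() + (folder.getPath().endsWith("/") ? "" : "/")
              + "processdate=" + processdate;
          JavaPairRDD<Text, Text> rdd = sc.sequenceFile(path, Text.class, Text.class);
          // The sequence file value holds one JSON object per row.
          JavaRDD<String> raw = rdd.map(tuple -> tuple._2().toString());
          // Let Spark infer the schema from the JSON (jsonRDD is not available with Dataset<Row>).
          Dataset<Row> df = sql.read().json(raw);
          df.printSchema();
          // write json schema
          writeToS3(s3, folder, df.schema().prettyJson(), output, ".json");
        }
      }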
  • 36. Schema discovery pipeline
  • 37. Problem: Number of partitions • Angry Birds 2 has ~100 event types and 2 years of ORC data • 100 * 365 * 2 = 73,000 partitions • ADD COLUMNS did not work • Schema update needs DROP and CREATE TABLE • MSCK REPAIR and DROP TABLE time out • Workaround: Add and remove partitions in batches of 10 and parallelize
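The batching workaround can be sketched in Python as a generator of `ALTER TABLE ... ADD PARTITION` statements, ten partitions per statement. The helper names, the placeholder event type names, and the start date are assumptions for illustration; submitting the statements to Athena in parallel is left out.

```python
from datetime import date, timedelta
from itertools import islice


def partition_specs(start, days, event_types):
    """Yield one PARTITION (...) clause per (day, event type) pair."""
    for offset in range(days):
        processdate = (start + timedelta(days=offset)).strftime("%Y%m%d")
        for event_type in event_types:
            yield f"PARTITION (processdate={processdate}, eventtype='{event_type}')"


def batched(items, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def add_partition_statements(table, specs, batch_size=10):
    """Build one ALTER TABLE statement per batch of partition clauses."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS " + " ".join(chunk)
        for chunk in batched(specs, batch_size)
    ]


# ~100 event types over 2 years, as on the slide; the names are placeholders.
event_types = [f"event_{i}" for i in range(100)]
specs = list(partition_specs(date(2015, 11, 27), 365 * 2, event_types))
statements = add_partition_statements("abba.client_events", specs)
print(len(specs), len(statements))  # 73000 7300
```

With batches of 10, the 73,000 partitions turn into 7,300 short DDL statements that can be spread across parallel workers.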
  • 38. Data conversion Hive job
      SET hive.optimize.sort.dynamic.partition=false;
      INSERT OVERWRITE TABLE tmp.abba_orc_tmp_${hiveconf:processdate}
      PARTITION (app_id, topic, processdate, eventtype)
      SELECT
        `h`.`v` AS h_v,
        `m`.`ad_type` AS m_ad_type,
        `m`.`ANDROID_ADVERTISING_ID` AS m_ANDROID_ADVERTISING_ID,
        `m`.`arena_birds_used` AS m_arena_birds_used,
        `m`.`arena_entry` AS m_arena_entry,
        `m`.`arena_position` AS m_arena_position,
        `s`.`aid1` AS s_aid1,
        'Abba' AS app_id,
        'collector.abba' AS topic,
        processdate AS processdate,
        regexp_replace(regexp_replace(lower(substr(m.t,1,100)), '\\s', '__'), '[^\\w]', '_') AS eventtype
      FROM abba.client_collector
      WHERE processdate=${hiveconf:processdate}
      DISTRIBUTE BY CASE
        WHEN eventtype IN ('frame_unlocked', 'performance_event', 'feat_progress',
                           'level_finished', 'level_started', 'currency_change')
        THEN CONCAT(eventtype, ABS(HASH(o_ts))%10)
        ELSE eventtype
      END;
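Two ideas in the Hive job can be mirrored in a short Python sketch: normalizing the event name into a safe partition value, and salting the distribution key of high-volume event types so their rows spread over ten reducers. Both functions are hypothetical illustrations; in particular, `zlib.crc32` is only a stand-in for Hive's `HASH`, so the bucket numbers will not match Hive's.

```python
import re
import zlib

# High-volume event types listed in the DISTRIBUTE BY clause of the Hive job.
HOT_EVENT_TYPES = {
    "frame_unlocked", "performance_event", "feat_progress",
    "level_finished", "level_started", "currency_change",
}


def normalize_event_type(raw: str) -> str:
    """Lowercase, cap at 100 chars, map whitespace to '__' and other non-word chars to '_'."""
    value = raw[:100].lower()
    value = re.sub(r"\s", "__", value, flags=re.ASCII)
    return re.sub(r"[^\w]", "_", value, flags=re.ASCII)


def distribution_key(eventtype: str, o_ts: str, buckets: int = 10) -> str:
    """Append a hash bucket to hot event types; leave other event types unsalted."""
    if eventtype in HOT_EVENT_TYPES:
        return f"{eventtype}{zlib.crc32(o_ts.encode('utf-8')) % buckets}"
    return eventtype


print(normalize_event_type("Conquer_The_World"))
print(normalize_event_type("Level Finished!"))
print(distribution_key("level_finished", "2015-02-20T13:29:06.054+0000"))
```

Without the salt, every row of a hot event type would be routed to a single reducer; with it, those rows are split into up to ten groups while low-volume event types still produce one group each.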
  • 39. Data conversion pipeline
  • 40. The lake architecture Q3/2017
  • 41. So…what was our deliverable?
  • 42. Well-defined schema…
      Table            | Description
      client_events    | Analytics events sent from the game client
      identity_events  | Analytics events sent from login service
      session_events   | Analytics events sent from new login service
      ads_events       | Impression and click information received from ads service
      wallet_events    | IAP purchase events received from wallet service
      supermoon_events | Rule matching information from Manage tool
      player_profile   | Player profile information such as registration date and user origin
      player_daily     | Cumulative daily player activity and purchase information
  • 43. …You can query in Amazon Athena…
  • 44. …And publish insight in Beacon
  • 45. …And as a bonus we got the lake
  • 46. Learnings
  • 47. ORC vs. Parquet vs. something else: “ORC is newest and fastest but Parquet is more widely supported at the moment”
  • 48. “INSERT INTO” support: “We are running reporting queries in Hive and Presto, which brings extra cost and complexity to the system”
  • 49. External metastore support: “We keep two identical metastores up-to-date”
  • 50. Mind the number of partitions: “The biggest performance problems come from running the DDL statements”
  • 51. Concurrent query limits: “The default concurrent query limit of five is far from what a data-hungry enterprise needs”
  • 52. New products and services: “Stay informed on the new product launches and constantly evolve your architecture”
  • 53. Schema discovery is easy, schema management can get complicated: “Maybe take a look at AWS Glue instead of building everything by yourself”
  • 54. Bias: “Unbiased information is hard to find. For every blog there is a counter blog. Test your own use case, if possible.”
  • 55. Amazon Athena, Presto, Hive, and Spark are great together: “Amazon Athena provides the data for a wider audience, but our data architecture allows us to access the Lake of Wisdom with other tools for more advanced use cases, if needed”
  • 56. Thank You!