This is a talk given for the Scalable Internet Services Masters-level Computer Science class at UCLA and UCSB. It briefly discusses the server architecture for the game League of Legends before going into depth about how the data warehouse can hold petabytes of player data. Discussion of message queue architecture and scalability occurs along the way.
Big Data at Riot Games – Using Hadoop to Understand Player Experience - Stamp... - StampedeCon
At the StampedeCon 2013 Big Data conference in St. Louis, Riot Games discussed Using Hadoop to Understand and Improve Player Experience. Riot Games aims to be the most player-focused game company in the world. To fulfill that mission, it’s vital we develop a deep, detailed understanding of players’ experiences. This is particularly challenging since our debut title, League of Legends, is one of the most played video games in the world, with more than 32 million active monthly players across the globe. In this presentation, we’ll discuss several use cases where we sought to understand and improve the player experience, the challenges we faced to solve those use cases, and the big data infrastructure that supports our capability to provide continued insight.
Achieving Continuous Delivery: An Automation Story - jimi-c
Continuous Deployment is the act of deploying software constantly. The idea is that if "release early, release often" is good, releasing very often is better. It's not trivial: automation is part of the battle, and testing is another. Learn to use tools like Jenkins and Ansible to move from deploying software once a month to 15 times every hour, and why you'll want to.
Presented at PyCon 2015 in Montreal
Would you ever play an online game if you were not able to communicate with your teammates? Isn’t it fun if you can make new friends, arrange pre-made games and celebrate your victories with people you like to play with?
Riot Games' League of Legends handles millions of online players at any given time. Each chat server is responsible for routing over 1 billion real-time events a day. In order to support the overwhelming user base, be prepared for future growth, and pave the road for upcoming features, the chat infrastructure had to be designed and built with the utmost care, so that it would never fail the players.
In this talk I would like to present how we achieved linear scalability, improved the overall fault tolerance, created a framework for real time code upgrades and got ready for the new features we want to ship. I will also discuss in detail why we chose to use Erlang as a foundation for the system, and why we migrated our data from MySQL to Riak.
Mobile Library Development - stuck between a pod and a jar file - Zan Markan... - Codemotion
Isaac Newton, the father of modern software engineering, called it "standing on the shoulders of giants". Modern development is exciting as it gets easier and easier, partly because of the wealth of resources available at our fingertips. One category of these resources is libraries, SDKs, and frameworks. This talk will be a guide to the considerations that go into building a library for both iOS/Swift and Java/Android, taking cues from my personal experience as well as from studying how the leaders in the field do it.
Scaling Your First 1000 Containers with Docker - Atlassian
Deploying large numbers of containers to production can be a difficult proposition if you don’t approach the problem with the right strategy – one that's appropriate for both your developers and the size of your operations team. Choosing a strategy lets you codify your deployment patterns in a repeatable manner and reuse them over hundreds of deployments without incurring unnecessary cost and complexity.
Using Atlassian’s PaaS as a model, we will discuss important milestones as you scale from a single container to tens, hundreds, and eventually to a thousand containers. At what points should you begin to embrace log aggregation? How about monitoring and metrics collection? Orchestration and clustering solutions? Learn how to incorporate ever more sophisticated third-party solutions as you go, to achieve cost-effective and stable management of your containers in production.
Webinar: Queues with RabbitMQ - Lorna Mitchell - Codemotion
Queues are a great addition to any application that has some tasks that need processing asynchronously. This could be sending a confirmation email, resizing an avatar, or recalculating a running total of some kind; in all those cases it would be cool to send the response back to the user and then sort out that task later. This session looks at how to use a RabbitMQ job queue in your application. It also looks at how to design elegant and robust long-running workers that will consume the jobs from the queue and process them. This session is ideal for technical leads, developers and architects alike.
Cracking the nut, solving edge AI with Apache tools and frameworks - Timothy Spann
Using the FLaNK stack for Edge AI and Streaming AI.
Apache Flink, Apache Kafka, Apache NiFi, Apache Kudu, DJL, Apache MXNet, Apache OpenNLP, Apache Tika, Apache Hue, Apache Hadoop, Apache HDFS
Presented at AI DevWorld 2020 virtual
Autoscaling Best Practices - WebPerf Barcelona Oct 2014 - Marc Cluet
This talk is an evolution of the one presented at FOSDEM'14. We talk about common practices and methodologies for autoscaling, and also cover some best practices and the global scope of autoscaling inside your infrastructure.
Learn how, in less than 6 months and with a 1-person team, BigPanda (http://bigpanda.io) went from no infrastructure automation to having all of its infrastructure automated with Ansible. Learn how BigPanda handles zero-downtime infrastructure updates and connects Ansible with its chat infrastructure, plus some strategies for managing automation projects with very small teams.
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -... - Chris Fregly
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data. Static models and stale data are not sufficient to power today's modern, AI-first enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production. Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we've pioneered:
* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E).
The Continuous "VOTE" techniques have proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage, from continuous model training (offline) to live model serving (online).
Attendees will learn to create continuous machine learning pipelines in production with PipelineAI, TensorFlow, and Kafka.
From Code to the Monkeys: Continuous Delivery at Netflix - Dianne Marsh
At Netflix, we continue to improve upon our continuous delivery process. We thrive in a hybrid environment, where every developer is able to deploy code, and with that freedom comes the responsibility for ensuring that our customers are not negatively impacted. We have constructed Open Source tools toward a Continuous Delivery solution. In this presentation, from QConSF 2013, you will learn about our tool chain so that you can determine which tools make sense in your environment.
Using Apache MXNet in production deep learning streaming pipelines - Timothy Spann
As a Data Engineer I am often tasked with taking Machine Learning and Deep Learning models into production, sometimes in the cloud and sometimes at the edge. I have developed Java code that allows us to run these models at the edge and as part of a sensor/webcam/images/data stream. I have developed custom interfaces in Apache NiFi to enable real-time classification against MXNet models directly through the Java API or through DJL.AI's Java interface. I will demo running models on NVIDIA Jetson Nanos and NVIDIA Xavier NX devices as well as in the cloud.
Technologies utilized: Apache MXNet, DJL.AI, NVIDIA Jetson Nano, NVIDIA Jetson Xavier, Apache NiFi, MiNiFi, Java, Python.
Serverless in Production, an experience report (AWS UG South Wales) - Yan Cui
AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm has created new challenges to old problems - how do you test a cloud-hosted function locally? How do you monitor them? What about logging and config management? And how do we start migrating from existing architectures?
In this talk Yan and Scott will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent... - Amazon Web Services
Parse is a BaaS for mobile developers that is built entirely on AWS. With over 150,000 mobile apps hosted on Parse, the stability of the platform is our primary concern, but it coexists with rapid growth and a demanding release schedule. This session is a technical discussion of the current architecture and the design decisions that went into scaling the platform rapidly and robustly over the past year and a half. We talk about some of the lessons learned managing and scaling MongoDB, Cassandra, Redis, and MySQL in the cloud. We also discuss how Parse went from launching individual instances using Chef to managing clusters of hosts with Auto Scaling groups, with instance discovery and registry handled by ZooKeeper, thus enabling us to manage vastly larger sets of services with fewer human resources. This session is useful to anyone who is trying to scale up from startup to established platform without sacrificing agility.
API World Apache NiFi 101 (2021)
https://github.com/tspannhw/EverythingApacheNiFi
Apache NiFi 101 with Apache Pulsar: integration, basics, no spaghetti flows
https://emamo.com/event/api-world-2021/s/pro-talk-api-apache-nifi-101-introduction-and-best-practices-orYbeW
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately, using the all-Apache FLiP stack we can do this with ease! Streaming AI-powered analytics from the edge to the data center is now a simple use case. With MiNiFi we can ingest the data, run data checks and cleansing, run machine learning and deep learning models, and route our data in real time to Apache NiFi and/or Apache Pulsar for further transformations and processing. Apache Flink will provide our advanced streaming capabilities, fed in real time via Apache Pulsar topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi.
Timothy Spann
Developer Advocate @ StreamNative
ex-Principal Field Engineer @ Cloudera
ex-Senior Sales Engineer @ Hortonworks
ex-Senior Field Engineer at Pivotal
Serverless in production, an experience report (FullStack 2018) - Yan Cui
AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm has created new challenges to old problems - how do you test a cloud-hosted function locally? How do you monitor them? What about logging and config management? And how do we start migrating from existing architectures?
In this talk Yan and Scott will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer... - Chris Fregly
Perform Online Predictions using Slack
A/B and multi-armed bandit model comparison
Train Online Models with Kafka Streams
Create new models quickly
Deploy to production safely
Mirror traffic to validate online performance
Any Framework, Any Hardware, Any Cloud
Dashboard to manage the lifecycle of models from local development to live production
Generates optimized runtimes for the models
Custom targeting rules, shadow mode, and percentage-based rollouts to safely test features in live production
Continuous model training, model validation, and pipeline optimization
https://youtu.be/zpkH9oiIovU
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/258276286/
Related Links
PipelineAI Home: https://pipeline.ai
PipelineAI Community Edition: https://community.pipeline.ai
PipelineAI GitHub: https://github.com/PipelineAI/pipeline
PipelineAI Quick Start: https://quickstart.pipeline.ai
Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
YouTube Videos: https://youtube.pipeline.ai
SlideShare Presentations: https://slideshare.pipeline.ai
Slack Support: https://joinslack.pipeline.ai
Web Support and Knowledge Base: https://support.pipeline.ai
Email Support: help@pipeline.ai
Red Hat Nordics 2020 - Apache Camel 3 the next generation of enterprise integ... - Claus Ibsen
In this session, we'll focus on:
Camel 3: Demos of how Camel 3, Camel K and Camel Quarkus all work together, and will provide insights into Camel’s role in the next major release of Red Hat Integration products.
Camel K: This serverless integration platform provides low-code/no-code capabilities, where integrations can be snapped together quickly using the powers from integration patterns and Camel’s extensive set of connectors.
Camel Quarkus: The fast runtime of Quarkus, used together with Camel K and Knative, brings awesome serverless features, such as auto-scaling, scaling to zero, and event-based communication, with great integration capabilities from Apache Camel.
You will also hear about the latest Camel sub-project, Camel Kafka Connector, which makes it possible to use all the Camel components as Kafka Connect connectors.
Finally, we bring details of the roadmap for what is coming up in the Camel projects.
Cowboy dating with big data - TechDays at Lohika 2020 - b0ris_1
A story about the things that happen when data platforms are developed by people who are not data engineers, and the pitfalls and mistakes that can be made.
This will help you understand what data engineering is about.
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
For AAA games there is now a consumer expectation that the developer has a post-release strategy. This strategy goes beyond just DLC content: users expect to receive bug fixes, balancing updates, game-mode variations and constant tuning of the game experience. So how can you architect your game technology to facilitate all of this? Stewart explains the unique patching system developed for Crysis 3 multiplayer, which allowed the team to hot-patch pretty much any asset or data used by the game. He also details the supporting telemetry, server and testing infrastructure required to support this, along with some interesting lessons learned.
Leveraging Open Source to Manage SAN Performance - brettallison
Scope: the primary focus of this presentation is how to leverage open source software to help manage shared storage performance. The storage server will be the focus, with particular emphasis on ESS. This is a small, one-off solution.
Netflix - Pig with Lipstick by Jeff Magnusson - Hakka Labs
In this talk, Jeff Magnusson, Manager of Data Platform Architecture at Netflix, discusses Lipstick, a tool that visualizes and monitors the progress and performance of Apache Pig scripts. This talk was recorded at Samsung R&D.
While Pig provides a great level of abstraction between MapReduce and dataflow logic, once scripts reach a sufficient level of complexity, it becomes very difficult to understand how data is being transformed and manipulated across MapReduce jobs. The recently open sourced Lipstick solves this problem. Jeff emphasizes the architecture, implementation, and future of Lipstick, as well as various use cases around using Lipstick at Netflix (e.g. examples of using Lipstick to improve speed of development and efficiency of new and existing scripts).
Jeff manages the Data Platform Architecture group at Netflix where he is helping to build a service oriented architecture that enables easy access to large scale cloud based analytical processing and analysis of data across the organization. Prior to Netflix, he received his PhD from the University of Florida focusing on database system implementation.
Slides from the Big Data Gurus meetup at Samsung R&D, August 14, 2013
This presentation covers the high level architecture of the Netflix Data Platform with a deep dive into the architecture, implementation, use cases, and future of Lipstick (https://github.com/Netflix/Lipstick) - our open source tool for graphically analyzing and monitoring the execution of Apache Pig scripts.
Netflix uses Apache Pig to express many complex data manipulation and analytics workflows. While Pig provides a great level of abstraction between MapReduce and data flow logic, once scripts reach a sufficient level of complexity, it becomes very difficult to understand how data is being transformed and manipulated across MapReduce jobs. To address this problem, we created (and open sourced) a tool named Lipstick that visualizes and monitors the progress and performance of Pig scripts.
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms - Anant Corporation
During this lunch, we’ll review open-source reverse ETL tools to uncover how to send data back to SaaS systems.
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the... - Amazon Web Services
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology, and Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. The combination of the two can power advanced analytics, not only for what has happened in the past but also to make intelligent predictions about the future. Please join this webinar to learn how to get the most value from your data for your data-driven business.
Learning Objectives:
How to scale your Redshift queries with user-defined functions (UDFs)
How to apply Machine learning to historical data in Amazon Redshift
How to visualize your data with Amazon QuickSight
Present a reference architecture for advanced analytics
Who Should Attend:
Application developers looking to add UDFs or predictive analytics to their applications; database administrators that need to meet the demands of data-driven organizations; decision makers looking to derive more insight from their data
This presentation was given to the Dublin Node (JS) Community on May 29th 2014.
Presented by: Chris Lawless, Kevin Yu Wei Xia, Fergal Carroll @phergalkarl, Ciarán Ó hUallacháin, and Aman Kohli @akohli
View IT operations as a flow of data (sources of truth) through work-cells (automation processes) to deliver value to the customer.
There should be only one source of truth for every piece of configuration data.
Device configurations are a poor source of truth.
Metadata and Provenance for ML Pipelines with Hopsworks - Jim Dowling
This talk describes the scale-out, consistent metadata architecture of Hopsworks and how we use it to support custom metadata and provenance for ML pipelines with the Hopsworks Feature Store, NDB, and ePipe. The talk is here: https://www.youtube.com/watch?v=oPp8PJ9QBnU&feature=emb_logo
Similar to Riot Games Scalable Data Warehouse Lecture at UCSB / UCLA (20)
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group ("MCG") expects demand to grow and supply to evolve, facilitated through institutional investment rotation out of offices and into work from home ("WFH"), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
2. SEAN MALONEY
BIG DATA ENGINEER
WHO IS THIS GUY?
- Lead developer on Riot's ETL tools
- Intern at Appfolio
FUN FACT: Was a student in this class 4 years ago
3. MOVING MOUNTAINS OF DATA
1. INTRODUCTION
2. THE GAME PLATFORM: OUR MAIN DATA SOURCE
3. HOW WE INGEST AND QUERY DATA
4. HOW WE SCALE IN AWS
5. CONCLUSION - SEAN'S PRO TIPS
14. [Diagram] The game platform's databases: services (Chat, Store, Audit, Game, etc.) sit on Oracle Coherence (an in-memory DB); the same schemas are replicated across a primary DB, a hot backup DB, and a 2nd backup DB that feeds ETL.
18. [Pipeline diagram] Ingestion -> storage -> query / views -> visualization tools:
- Pull-based / ETL ingestion: FuETL (OLTP game data, external data sources)
- Push-based ingestion: Honu (anything pushed to it, e.g. server logs)
- Storage: the master warehouse
- Query / views: batch queries, single-row queries, aggregate queries, plus data auditing
19. [The same pipeline diagram, repeated.]
20. Distributed ETL software written in Ruby.
- Scales horizontally
- Same ETL applied to multiple regions / datacenters
- Self-service UI with SQL query templating
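To make the templating idea concrete, here is a minimal sketch in Java of how a per-region SQL template might be expanded. The placeholder syntax, region list, and helper names are illustrative assumptions, not the deck's actual implementation:

import java.util.List;
import java.util.Map;

public class TemplatedEtl {
    // One query template, expanded once per region / datacenter.
    static final String TEMPLATE =
        "INSERT INTO games_played "
        + "(SELECT * FROM games_played_${region} WHERE date >= '${start_date}')";

    public static void main(String[] args) {
        for (String region : List.of("na", "euw", "kr")) {
            String sql = render(TEMPLATE, Map.of("region", region, "start_date", "2015-10-25"));
            System.out.println(sql); // a real worker would submit this to the warehouse
        }
    }

    static String render(String template, Map<String, String> params) {
        String out = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }
}

The appeal of this design is that one vetted query definition can be stamped out across every region and datacenter, rather than maintaining a separate hand-written job per region.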
28. [Diagram] ETL webapp architecture: a webapp (view layer: backbone.js, Bootstrap CSS) and a command-line tool sit on top of core libraries consisting of a Task Service (Tasks, Task DAO), a Helper Service (Helpers, Helper DAO), and an Environment Service (Environment DAO, Env. Task DAO, Env. Helper DAO); a scheduler process and worker processes drive the tasks and helpers through task / helper controllers.
29. [The same webapp architecture diagram, repeated.]
30. [The same webapp architecture diagram, repeated.]
31. [The same webapp architecture diagram, repeated.]
35. Idempotency
Idempotent: an operation that will produce the same result whether executed once or multiple times.
EXAMPLE:
Non-idempotent: x = x * 5; submitting a purchase
Idempotent: abs( abs(x) ) = abs(x); cancelling a purchase
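The same distinction, as a runnable sketch (the values are arbitrary):

public class IdempotencyDemo {
    public static void main(String[] args) {
        int x = -3;
        // Non-idempotent: applying the operation again changes the result.
        x = x * 5;  // -15
        x = x * 5;  // -75, so running twice is not the same as running once
        // Idempotent: applying the operation again changes nothing.
        int once  = Math.abs(-3);            // 3
        int twice = Math.abs(Math.abs(-3));  // still 3: abs(abs(x)) == abs(x)
        System.out.println(x + " " + once + " " + twice);
    }
}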
36. Idempotent?
In the transactional OLTP world….
INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= ‘2015-10-25’)
37. Idempotent?
In the big data / OLAP world….
INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= ‘2015-10-25’)
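The point the two slides are driving at: in an OLTP database, a primary key or unique constraint can reject the duplicate rows from a re-run, but big data / OLAP tables usually have no such constraints, so running the INSERT above twice simply doubles the data. One common way to make such a load idempotent in a Hive-style warehouse is to overwrite the affected partitions instead of appending. A sketch over JDBC (the HiveServer2 URL is a placeholder, and this assumes games_played is partitioned by date, with the partition column last in the select list as dynamic partitioning requires):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IdempotentLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://warehouse:10000/default");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            // INSERT OVERWRITE replaces each touched partition's contents,
            // so re-running the load yields the same rows, not duplicates.
            stmt.execute("INSERT OVERWRITE TABLE games_played PARTITION (date) "
                + "SELECT * FROM games_played_na WHERE date >= '2015-10-25'");
        }
    }
}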
41. Message Queues
● AMAZON SIMPLE QUEUE SERVICE
● APACHE ACTIVEMQ
● RABBITMQ
● HORNETQ
● MICROSOFT MQ (MSMQ)
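To show the shape of the pattern with the first option above, here is a minimal producer/consumer sketch against Amazon SQS using the AWS SDK for Java v2; the queue URL and message body are placeholders, not anything from Riot's system:

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class QueueDemo {
    public static void main(String[] args) {
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/game-events"; // placeholder

        try (SqsClient sqs = SqsClient.create()) {
            // Producer: the game platform pushes an event.
            sqs.sendMessage(SendMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .messageBody("{\"game_id\": 42, \"event\": \"game_finished\"}")
                    .build());

            // Consumer: an ETL worker pulls events and acknowledges each one
            // by deleting it; unacknowledged messages are redelivered later.
            for (Message m : sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .maxNumberOfMessages(10)
                    .build()).messages()) {
                System.out.println("processing " + m.body());
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .receiptHandle(m.receiptHandle())
                        .build());
            }
        }
    }
}

Redelivery after a crash is exactly why the idempotency discussion matters: the consumer may see the same message more than once.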
42. [The same pipeline diagram, repeated.]
43. Honu
- Self-service, custom HTTP edge service (Java)
- Fronted by an ELB in front of ~40 autoscaled m1.xlarge instances
- Forwards JSON data indirectly to S3
- The batches then need to be unpacked and converted into Hive tables
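Honu's actual code isn't shown in the deck, but the shape of such an edge service is easy to sketch with the JDK's built-in HTTP server. The path, in-memory batching, and deferred S3 hand-off below are illustrative assumptions only:

import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EdgeService {
    // Events accumulate here; a background job would flush batches toward S3.
    static final List<String> batch = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/v1/events", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                // Accept raw JSON as-is: self-service means no schema enforcement here.
                batch.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            exchange.sendResponseHeaders(202, -1); // accepted; processing happens later
            exchange.close();
        });
        server.start();
    }
}

Accepting and acknowledging quickly, then unpacking into Hive tables later, is what makes the "indirectly to S3" step cheap for producers.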
44. Honu
- Self-service, custom HTTP edge service (Java API)
- Custom collector infrastructure (Java), derived from Netflix Suro
- Deployed in every data center worldwide and also in AWS
50. Idempotency
Use application logic to make processing idempotent:
msg = queue.pop();
if (processed_games.contains(msg.game_id)) {
  return; // already processed, do nothing
} else {
  process_game(msg);
}
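For this check to work across many workers and survive restarts, processed_games needs to live in a shared store. The deck doesn't name one; purely as an illustrative assumption, a DynamoDB conditional write (AWS SDK for Java v2) lets exactly one delivery claim each game_id:

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class IdempotentConsumer {
    static final DynamoDbClient ddb = DynamoDbClient.create();

    // Returns true the first time a game_id is seen; false on redeliveries.
    static boolean markProcessed(String gameId) {
        try {
            ddb.putItem(PutItemRequest.builder()
                    .tableName("processed_games") // hypothetical table keyed on game_id
                    .item(Map.of("game_id", AttributeValue.builder().s(gameId).build()))
                    .conditionExpression("attribute_not_exists(game_id)")
                    .build());
            return true;
        } catch (ConditionalCheckFailedException alreadySeen) {
            return false; // a previous delivery already claimed this game
        }
    }
}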
51. THE DOWNSIDE
What's in there?
- The data team doesn't know everything that is submitted
Compliance
- Are we violating international data laws?
Inconsistent data structure
- It's formatted however the developer submits it
52. SELF SERVICE: HOW?
- User documentation: no one likes writing it, but it helps a lot
- Onboarding training: get new coworkers in the know
- Familiar protocols: use REST or RPC so developers are on the same page
- Focus on UX: your tools need to be easy for non-technical people to use
53. [The same pipeline diagram, repeated.]
57. [The same pipeline diagram, repeated, introducing the data auditing stage.]
58. Warehouse Auditing Service
- REST micro-service built with Java and Docker
- Reports and visualizations we can use to find problems
- Compares the source (the game platform) against the target (the warehouse)
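As a minimal illustration of source-vs-target comparison, the sketch below counts rows on both sides for one day over JDBC. The connection URLs and table names are placeholders, and a real audit would compare far more than row counts:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CountAudit {
    static long count(String jdbcUrl, String sql) throws Exception {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        String day = "2015-10-25";
        long source = count("jdbc:mysql://platform-db/games",       // placeholder source (platform)
                "SELECT COUNT(*) FROM games_played WHERE date = '" + day + "'");
        long target = count("jdbc:hive2://warehouse:10000/default", // placeholder target (warehouse)
                "SELECT COUNT(*) FROM games_played WHERE date = '" + day + "'");
        System.out.println(source == target
                ? "OK: " + source + " rows on both sides"
                : "MISMATCH: source=" + source + " target=" + target);
    }
}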
68. AWS Infrastructure Today
[Diagram] Components include RDS, EMR, EC2, DynamoDB (point data store), and Platfora. Workloads include ETL, loading, auditing ETL, telemetry collectors, a data dictionary, a metastore, the ETL app DB, analytics / Hue, data science, fraud, Rocana (real-time dashboard), Solr (real time), and a Point Data Service. S3 is the source of "truth"; networking runs through a VPC over multiple AWS Direct Connect links.
70. SEAN'S PRO TIPS OF THE DAY (DO and DON'T)
- Don't wait: create S3 permissions and naming standards early
- Get an auditing solution for DW accuracy
- Allocate time for tuning AWS infrastructure
- Don't forget to track cost: AWS bills can surprise you
- Don't underestimate simple problems in big data
- Prepare for multiple data access patterns
- Keep idempotency in mind and use MQ architecture
- Don't stop. Believing
71. CHAMPION MASTERY
- Custom rewards for mastering different champions
- An intensive query that spans every game that every player has played
- Improves player engagement
72. PLAYER SUPPORT
- Full copy of our data warehouse in DynamoDB
- Hive -> DynamoDB dynamic partition support
- Player Support can answer questions faster than ever
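The deck doesn't spell out the copy mechanism, but one public route for a Hive-to-DynamoDB copy is the EMR DynamoDB connector's Hive storage handler. A sketch over JDBC, staying consistent with the earlier snippets; the endpoint, table, and column names are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WarehouseToDynamo {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:hive2://warehouse:10000/default");
             Statement s = c.createStatement()) {
            // External Hive table backed by a DynamoDB table (EMR DynamoDB connector).
            s.execute("CREATE EXTERNAL TABLE IF NOT EXISTS games_played_ddb "
                + "(game_id string, player_id string, stats string) "
                + "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' "
                + "TBLPROPERTIES ('dynamodb.table.name' = 'games_played', "
                + "'dynamodb.column.mapping' = 'game_id:game_id,player_id:player_id,stats:stats')");
            // Copy warehouse rows into DynamoDB via plain HiveQL.
            s.execute("INSERT OVERWRITE TABLE games_played_ddb "
                + "SELECT game_id, player_id, stats FROM games_played");
        }
    }
}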
73. OFFENSIVE CHAT DETECTION
- The data science team queries all chat messages in game
- Sentiment analysis and classification
- Identifies negative, offensive players and mutes them automatically