RIOT GAMES
SOME CATCHY STATEMENT ABOUT WORKFLOWS
AND YORDLES
MATT GOEKE
INTRODUCTION
ABOUT THE SPEAKER
•  Previous workflow architecture
•  What Oozie is
•  How we incorporated Oozie
– Relational Data Pipeline
– Non-relational Data Pipeline
•  Lessons learned
•  Where we’re headed
THIS PRESENTATION IS ABOUT…
•  Developer and publisher of League of Legends
•  Founded 2006 by gamers for gamers
•  Player experience focused
– Needless to say, data is pretty important to
understanding the player experience!
WHO IS RIOT GAMES?
LEAGUE OF LEGENDS
ARCHITECTURE
HIGH LEVEL OVERVIEW
(Diagram: Client, Mobile, and WWW traffic feeding the platform)
WHY WORKFLOWS?
•  Retry a series of jobs in the event of failure
•  Execute jobs at a specific time or when data is
available
•  Correctly order job execution based on
resolved dependencies
•  Provide a common framework for
communication and execution of
production processes
•  Use the workflow to couple resources
instead of having a monolithic code base
PREVIOUS ARCHITECTURE
(Diagram: Audit / Platform / LoL databases in Europe, Korea, and North America, pulled by CRON + Pentaho + custom ETL + Sqoop into a Hive data warehouse, which feeds Tableau and MySQL/Pentaho for analysts and business analysts)
ISSUES WITH PREVIOUS PROCESS
•  All of the ETL processes were run on one node
which limited concurrency
•  If our main runner execution died then the
whole ETL for that day would need to be
restarted
•  No reporting of what was run or the
configuration of the ETL without log diving on
the actual machine
•  No retries (outside of native MR tasks) and no
good way to rerun a previous config if the
underlying code has been changed
SOLUTION
(Diagram: the same regional sources, now orchestrated by Oozie into the Hive data warehouse, feeding Tableau and MySQL/Pentaho for analysts and business analysts)
OOZIE
WHAT IS OOZIE?
•  Oozie is a workflow scheduler system to
manage Apache Hadoop jobs
•  Oozie is integrated with the rest of the Hadoop
stack supporting several types of Hadoop jobs
out of the box as well as system specific jobs
•  Oozie is a scalable, reliable and extensible
system
WHY OOZIE?
NATIVE HADOOP INTEGRATION
No need to create custom hooks for job submission
HORIZONTALLY SCALABLE
Jobs are spread across available mappers
OPEN SOURCE
The project has strong community backing and has committers from several companies
VERBOSE REPORTING
Logging and debugging is extremely quick with the web console and SQL
HADOOP ECOSYSTEM
(Diagram: Oozie sits on top of Pig, Sqoop, Hive, and Java, which run as MapReduce over HDFS)
LAYERS OF OOZIE
•  A Bundle batches 1..N Coordinators
•  A Coordinator triggers a Workflow (WF job)
•  A Workflow chains 1..N Actions
•  An Action runs an MR / Pig / Java / Hive / Sqoop job
WORKFLOW ACTION: JAVA
<action name="java-node">
  <java>
    <job-tracker>foo:9001</job-tracker>
    <name-node>bar:9000</name-node>
    <main-class>com.riotgames.MyMainClass</main-class>
    <java-opts>-Dfoo</java-opts>
    <arg>bar</arg>
  </java>
  <ok to="next"/>
  <error to="error"/>
</action>
•  Workflow actions are the most granular unit of work
WORKFLOW ACTION: MAPREDUCE
<action name="myfirstHadoopJob">
  <map-reduce>
    <job-tracker>foo:9001</job-tracker>
    <name-node>bar:9000</name-node>
    <prepare>
      <delete path="hdfs://foo:9000/usr/foo/output-data"/>
    </prepare>
    <job-xml>/myfirstjob.xml</job-xml>
    <configuration>
      <property>
        <name>mapred.input.dir</name>
        <value>/usr/foo/input-data</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>/usr/foo/output-data</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>${firstJobReducers}</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="myNextAction"/>
  <error to="errorCleanup"/>
</action>
•  Each action has a type, and each type has a defined set of key:values that can be used to configure it
•  The action must also specify which action to transition to based on success or failure
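The same action skeleton applies to the other ecosystem tools. A hedged sketch of a Sqoop action, assuming a build that ships the Sqoop action extension (the connect string, table, and target dir are made-up values):

```xml
<action name="sqoop-node">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>foo:9001</job-tracker>
    <name-node>bar:9000</name-node>
    <!-- the Sqoop command line, minus the leading "sqoop" -->
    <command>import --connect jdbc:mysql://foo/mydb --table MY_TABLE --target-dir /usr/foo/sqoop-out -m 1</command>
  </sqoop>
  <ok to="myNextAction"/>
  <error to="errorCleanup"/>
</action>
```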
THE WORKFLOW ENGINE
(Diagram: a DAG running from Start through decision, fork, and join nodes to End, with MapReduce, Java, Sqoop, Hive, HDFS, and Shell action nodes along the way)
•  Oozie runs workflows in the form of DAGs (directed acyclic graphs)
•  Each element in this workflow is an action
•  Some node types are processed internally by Oozie vs. farmed out to the cluster
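The control nodes above combine in a single definition. A minimal sketch of a fork/join workflow (node names are hypothetical; the action bodies are elided):

```xml
<workflow-app name="fork-join-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="forking"/>
  <fork name="forking">
    <path start="mr-node"/>
    <path start="hive-node"/>
  </fork>
  <!-- both branches must transition to the same join node -->
  <action name="mr-node">
    ...
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <action name="hive-node">
    ...
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <join name="joining" to="end"/>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```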
WORKFLOW EXAMPLE
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="java-node"/>
  <action name="java-node">
  ...
  </action>
  <end name="end"/>
  <kill name="fail"/>
</workflow-app>
•  This workflow will run the action defined as java-node
COORDINATOR
•  Oozie coordinators can execute workflows based on time and data dependencies
•  Each coordinator specifies a workflow to execute upon meeting its trigger criteria
•  Coordinators can pass variables to the workflow layer, allowing for dynamic resolution
(Diagram: Client → Oozie Server [Coordinator → Workflow] → Hadoop)
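A data dependency is expressed with a dataset plus an input event; the coordinator then holds the workflow until the dataset instance exists. A hedged sketch (paths and names are made up):

```xml
<coordinator-app name="data-trigger-coord" frequency="${coord:days(1)}"
                 start="${COORD_START}" end="${COORD_END}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="2013-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://bar:9000/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- the workflow is held until today's instance of the dataset exists -->
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path>
    </workflow>
  </action>
</coordinator-app>
```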
EXAMPLE COORDINATOR
<?xml version="1.0"?>
<coordinator-app end="${COORD_END}" frequency="${coord:hours(1)}"
  name="test_job_coord" start="${COORD_START}" timezone="UTC"
  xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path>
    </workflow>
  </action>
</coordinator-app>
•  This coordinator will run every hour and invoke the workflow found in the /test_job folder
BUNDLE
(Diagram: Client → Oozie Server, where a Bundle contains multiple Coordinators, each driving an Oozie Workflow on Hadoop)
•  Bundles are higher-level abstractions that batch a set of coordinators together
•  There is no explicit dependency between coordinators within a bundle, but it can be used to more formally define a data pipeline
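A bundle definition is itself a small XML file listing the coordinator app paths it batches. A hedged sketch (names and paths are hypothetical):

```xml
<bundle-app name="etl-bundle" xmlns="uri:oozie:bundle:0.1">
  <coordinator name="schema1-coord">
    <app-path>hdfs://bar:9000/user/hadoop/oozie/app/schema1</app-path>
  </coordinator>
  <coordinator name="schema2-coord">
    <app-path>hdfs://bar:9000/user/hadoop/oozie/app/schema2</app-path>
  </coordinator>
</bundle-app>
```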
THE INTERFACE
Multiple ways to interact with Oozie:
•  Web Console (read only)
•  CLI
•  Java client
•  Web Service Endpoints
•  Directly with the DB using SQL
The Java client and CLI are just abstractions over the web service endpoints, and it is easy to extend this functionality in your own apps.
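Because everything bottoms out in the web service endpoints, any HTTP client can talk to the server. A small stdlib-only sketch that just builds the WS URLs (the host, port, and job id are hypothetical; the `/v1/job/{id}?show=info` and `/v1/admin/status` paths are from the Oozie WS API):

```java
import java.net.URI;

// Minimal helper for addressing the Oozie web service directly,
// the same endpoints the CLI and Java client wrap.
public class OozieWs {
    private final String base; // e.g. "http://foo:11000/oozie"

    public OozieWs(String base) {
        this.base = base;
    }

    // Info document for one job (status, actions, configuration, ...)
    public URI jobInfoUrl(String jobId) {
        return URI.create(base + "/v1/job/" + jobId + "?show=info");
    }

    // Server status endpoint, handy as a quick health check
    public URI adminStatusUrl() {
        return URI.create(base + "/v1/admin/status");
    }

    public static void main(String[] args) {
        OozieWs ws = new OozieWs("http://foo:11000/oozie");
        System.out.println(ws.jobInfoUrl("0000001-130606-oozie-W"));
        System.out.println(ws.adminStatusUrl());
    }
}
```

From here an `HttpURLConnection` GET against either URI returns the JSON the web console itself renders.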
PIECES OF A DEPLOYABLE
The list of components that are needed for a scheduled
workflow:
•  Coordinator.xml
Contains the scheduler definition and path to
workflow.xml
•  Workflow.xml
Contains the job definition
•  Libraries
Optional jar files
•  Properties file (also possible through WS call)
Initial parameterization and mandatory
specification of coordinator path
JOB.PROPERTIES
NAME_NODE=hdfs://foo:9000
JOB_TRACKER=bar:9001
oozie.libpath=${NAME_NODE}/user/hadoop/oozie/share/lib
oozie.coord.application.path=${NAME_NODE}/user/hadoop/oozie/app/test_job
Important note:
•  Any variable put into the job.properties will be inherited by the coordinator / workflow
•  E.g. given the key:value workflow_name=test_job, you can access it using ${workflow_name}
COORDINATOR SUBMISSION
•  Deploy the workflow and coordinator to HDFS
$ hadoop fs -put test_job oozie/app/
•  Submit and run the workflow job
$ oozie job -run -config job.properties
•  Check the coordinator status on the web console
WEB CONSOLE
(Screenshots: coordinator list, coordinator details, job details, job DAG, action details, and the Hadoop JobTracker)
A USE CASE: HOURLY JOBS
Replace a current CRON job that runs a bash script once a day (6):
•  The shell will execute a Java main which pulls data from a filestream (1), dumps it to HDFS and then runs a MapReduce job on the files (2). It will then email a person when the report is done (3).
•  It should start within X amount of time (4)
•  It should complete within Y amount of time (5)
•  It should retry Z times on failure (automatic)
WORKFLOW.XML
<workflow-app name="filestream_wf" xmlns="uri:oozie:workflow:0.1">
  <start to="java-node"/>
  <action name="java-node">
    <java>
      <job-tracker>foo:9001</job-tracker>
      <name-node>bar:9000</name-node>
      <main-class>org.foo.bar.PullFileStream</main-class>
      <arg>argument1</arg>
    </java>
    <ok to="mr-node"/>
    <error to="fail"/>
  </action>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>foo:9001</job-tracker>
      <name-node>bar:9000</name-node>
      <configuration>
      ...
      </configuration>
    </map-reduce>
    <ok to="email-node"/>
    <error to="fail"/>
  </action>
  ...
  <action name="email-node">
    <email xmlns="uri:oozie:email-action:0.1">
      <to>customer@foo.bar</to>
      <cc>employee@foo.bar</cc>
      <subject>Email notification</subject>
      <body>The wf completed</body>
    </email>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <end name="end"/>
  <kill name="fail"/>
</workflow-app>
COORDINATOR.XML
<?xml version="1.0"?>
<coordinator-app end="${COORD_END}" frequency="${coord:days(1)}"
  name="daily_job_coord" start="${COORD_START}" timezone="UTC"
  xmlns="uri:oozie:coordinator:0.1" xmlns:sla="uri:oozie:sla:0.1">
  <action>
    <workflow>
      <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path>
    </workflow>
    <sla:info>
      <sla:nominal-time>${coord:nominalTime()}</sla:nominal-time>
      <sla:should-start>${X * MINUTES}</sla:should-start>
      <sla:should-end>${Y * MINUTES}</sla:should-end>
      <sla:alert-contact>foo@bar.com</sla:alert-contact>
    </sla:info>
  </action>
</coordinator-app>
•  The daily frequency satisfies (6); sla:should-start and sla:should-end satisfy (4) and (5)
WORKFLOWS @
USE CASE 1 – Global Data Means Global Data Problems
WORKFLOWS: RELATIONAL
(Diagram: the previous architecture with Oozie in place of CRON + Pentaho + custom ETL, pulling the regional Audit / Platform / LoL databases into the Hive data warehouse for Tableau, MySQL/Pentaho, analysts, and business analysts)
WORKFLOWS: RELATIONAL
(Diagram: Extract from the REGION X Audit / Platform / LoL DBs into Hive staging, then Transform into final Hive tables. Temp tables map 1:1 with the DB table meta; final tables provide more descriptive column naming and native type conversions.)
Oozie Actions
1. [Java] Check the partitions for the table and pull the latest date found. Write the key:value pair for the latest date back out to a properties file so that it can be referenced by the rest of the workflow.
2. [Sqoop] If the table is flagged as dynamically partitioned, pull data from the table from the latest partition (referencing the output of the Java node) through today's date. If not, pull data just for the current date.
3. [Hive] Copy the table from the updated partitions from the staging DB to the prod DB while also performing column name and type conversions.
4. [Java] Grab row counts for both source and Hive across the dates pulled. Write this, as well as some other meta, out to an audit DB for reporting (Validation).
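Handing a value from a Java action back to the workflow relies on Oozie's capture-output mechanism: with `<capture-output/>` set on the action, Oozie points the system property `oozie.action.output.properties` at a file, and any `java.util.Properties` written there become available downstream as `${wf:actionData('java-node')['key']}`. A hedged sketch of step 1 above (the property name `latest_date` and the stubbed lookup are our own inventions):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// A Java action main that writes its result where Oozie expects
// captured output, so later actions can reference it.
public class PartitionCheck {
    public static void main(String[] args) throws IOException {
        String latestDate = findLatestPartitionDate(); // stubbed below

        Properties out = new Properties();
        out.setProperty("latest_date", latestDate);

        // Oozie sets this system property for actions with <capture-output/>
        String path = System.getProperty("oozie.action.output.properties");
        try (OutputStream os = new FileOutputStream(new File(path))) {
            out.store(os, null);
        }
    }

    // In the real job this would query the Hive partitions / metastore.
    private static String findLatestPartitionDate() {
        return "2013-06-01";
    }
}
```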
AUDITING
•  We have a Tableau report pointing at the output
audit data for a rapid high level view of the health
of our ETLs
SINGLE TABLE ACTION FLOW
(Diagram: Start → Initialize-node → Sqoop-node → Oozie-node → End for the extraction actions; the Oozie-node kicks off the transform workflow of Hive-node and Audit-node)
•  This action flow is done once per table
•  The Oozie action allows us to asynchronously run the Hive staging->prod action and the auditing action. It is a Java action which uses the Oozie java client and submits key:value pairs to another workflow.
FULL SCHEMA WORKFLOW
(Diagram: Start → Table 1 extraction actions → Table 2 extraction actions → … → Table N extraction actions → End, with each table's extraction kicking off its own transform workflow)
•  We have one of these workflows per schema
•  Different schemas have a different number of tables (e.g. ranging from 5-20 tables)
•  We could fork and do each of these table extractions in parallel, but we are trying to limit the I/O load we create on the sources
COORDINATORS
•  We have one coordinator per schema workflow
•  Currently coordinators are staged in groups based on schema type
(Diagram: Schema 1..N coordinators, each triggering its schema workflow)
IMPORTANT NUMBERS
•  20+ Regions
•  5+ DBs per region
•  5-20 Tables per DB
20 * 5 * 12 (avg) = ~1200 tables!
TOO UNWIELDY?
•  Not if you have a good deployment pipeline!
DEPLOYMENT STACK
DEPLOYMENT STACK: JAVA
•  The Java project compiles into the library that is used by the workflows
•  It also contains some custom functionality for interacting with the Oozie WS endpoints / Oozie DB tables
DEPLOYMENT STACK: PYTHON
•  The Python project dynamically generates all of our workflow/coordinator XML files. It has multiple YML configs which hold the meta associated with all of the tables. It also interacts with a DB table for the various DB connection meta.
DEPLOYMENT STACK: GITHUB
•  GitHub houses all of the Big Data group's code bases no matter the language.
DEPLOYMENT STACK: JENKINS
•  Jenkins polls GitHub and builds either set of artifacts (Java lib / tar containing workflows/coordinators) whenever it detects changes. It deploys the build artifacts to a simple mount point.
DEPLOYMENT STACK: CHEF
•  The Chef cookbook will check for the version declared for both sets of artifacts and grab them from the mount point. It runs a shell which deploys the deflated workflows/coordinators and mounts the jar lib file.
IMPORTANT NUMBERS
•  20+ Regions
•  5+ DBs per region
•  5-20 Tables per DB
20 * 5 * 12 (avg) = ~1200 tables per day!
1 person < 5 hours a week!
USE CASE 2 – Dashboarding Cloud Data
WORKFLOWS: NON-RELATIONAL
(Diagram: Client / Mobile / WWW events flow through Honu into the Hive data warehouse; a self-service app holds the workflow and meta; dashboards, analysts, and business analysts consume the results)
WORKFLOWS: NON-RELATIONAL
(Diagram: Honu source tables → Transform → Derived tables → Message to an external queue. Derived tables are filtered datasets joined from 1 or more sources; Amazon SQS is a message queue we use for asynchronous communication.)
Oozie Actions
1. [Java] Check that the required partitions for the derived query exist and contain data. Send a message to an SNS endpoint if a partition exists but contains no rows.
2. [Hive] Perform the table transformation query on the selected partition(s). This query can filter any subset of source columns and join any number of source tables.
3. [Java] Send an SQS message to an external queue based on the consumer type. Consumers will pull from these queues regularly and update the various dashboard artifacts.
•  End result is that our dashboards get updated either hourly or daily depending on the workflow
LESSONS
LESSON #1
Distros and Versioning
•  If you choose to go with a distro for your Hadoop stack, be extremely vigilant about upgrading to the latest versions whenever possible. You will receive a lot more community support and a lot fewer headaches if you are not running into bugs that were patched in trunk over a year ago!
LESSON #2
Solidify Deployment
•  The usefulness of Oozie can degrade as
complexity creeps into your pipeline. If
you do not work towards an
automated deployment pipeline at the
early stages of your development, you
will quickly find maintenance costs
rising significantly over time.
LESSON #3
Extend Capabilities
•  Don’t feel limited to using tools based
on the supplied APIs. Feel free to
implement harnesses that extend
capabilities and submit them back to
the community – we will welcome them
with open arms :)
LESSON #4
Ask for Help!
•  Oozie is an open source project and is
getting new members/organizations
every day. Don't spend multiple hours
trying to solve an issue that many of us
have already worked through.
•  There is also a large amount of
documentation both in the wikis AND
archived listserv responses – leverage
them both!
THE FUTURE
CONTINUE INCREASING VELOCITY
June 2012 → July 2013
•  MySQL tables: 180 → 1200
•  Pipeline events/day: 0 → 7+ billion
•  Workflows: Cronjob + Pentaho → Oozie
•  Environment: Datacenter → DC + AWS
•  SLA: 1 day → 2 hours
•  Event tracking: 2+ weeks (DB update); dependencies on DBA, ETL, and Tools teams; downtime (3h min.) → 10 minutes; self-service; no downtime
OUR IMMEDIATE GOALS
•  Improve Self-service workflow & tooling
•  Realtime event aggregation
•  Global Data Infrastructure
•  Replace legacy audit/event logging services
CHALLENGE: MAKE IT GLOBAL
•  Data centers across the globe since latency has a huge effect on gameplay → log data scattered around the world
•  Large presence in Asia -- some areas (e.g., PH) have bandwidth challenges or bandwidth is expensive
CHALLENGE: WE HAVE BIG DATA
+  chat logs
+  detailed gameplay event tracking
+  so on…
500G DAILY STRUCTURED DATA
> 7PB GAME EVENT DATA
3MM SUBSCRIBERS
448+ MM VIEWS – RIOT YOUTUBE CHANNEL
OUR AUDACIOUS GOALS
•  Have deep, real-time understanding of our systems from player experience and operational standpoints
•  Have the ability to identify, understand and react to meaningful trends in real time
•  Build a world-class data and analytics organization
– Deeply understand players across the globe
– Apply that understanding to improve games for players
– Deeply understand our entire ecosystem, including social media
SHAMELESS HIRING PLUG
Like most everybody else at this conference… we're hiring!
THE RIOT MANIFESTO
•  PLAYER EXPERIENCE FIRST
•  CHALLENGE CONVENTION
•  FOCUS ON TALENT AND TEAM
•  TAKE PLAY SERIOUSLY
•  STAY HUNGRY, STAY HUMBLE
And yes, you can play games at work. It's encouraged!
MATT GOEKE
mgoeke@riotgames.com
THANK YOU! QUESTIONS?
Oozie @ Riot Games

  • 1. RIOT GAMES SOME CATCHY STATEMENT ABOUT WORKFLOWS AND YORDLES MATT GOEKE
  • 4. THIS PRESENTATION IS ABOUT… •  Previous workflow architecture •  What Oozie is •  How we incorporated Oozie – Relational Data Pipeline – Non-relational Data Pipeline •  Lessons learned •  Where we’re headed
  • 5. WHO IS RIOT GAMES? •  Developer and publisher of League of Legends •  Founded 2006 by gamers for gamers •  Player experience focused – Needless to say, data is pretty important to understanding the player experience!
  • 10. WHY WORKFLOWS? •  Retry a series of jobs in the event of failure •  Execute jobs at a specific time or when data is available •  Correctly order job execution based on resolved dependencies •  Provide a common framework for communication and execution of production processes •  Use the workflow to couple resources instead of having a monolithic code base
  • 11. PREVIOUS ARCHITECTURE [diagram: CRON + Pentaho + Custom ETL + Sqoop moving data from the North America, Europe, and Korea regions (Audit / Platform / LoL databases) into the Hive Data Warehouse and MySQL, consumed by analysts and business analysts via Tableau and Pentaho]
  • 12. ISSUES WITH PREVIOUS PROCESS •  All of the ETL processes were run on one node, which limited concurrency •  If our main runner execution died, the whole ETL for that day would need to be restarted •  No reporting of what was run or of the ETL configuration without log diving on the actual machine •  No retries (outside of native MR tasks) and no good way to rerun a previous config if the underlying code has changed
  • 13. PREVIOUS ARCHITECTURE [same diagram as slide 11]
  • 14. SOLUTION [diagram: same pipeline as slide 11, with the CRON + Pentaho + Custom ETL + Sqoop block replaced by Oozie]
  • 16. WHAT IS OOZIE? •  Oozie is a workflow scheduler system to manage Apache Hadoop jobs •  Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box as well as system-specific jobs •  Oozie is a scalable, reliable and extensible system
  • 17. WHY OOZIE? NATIVE HADOOP INTEGRATION: no need to create custom hooks for job submission. HORIZONTALLY SCALABLE: jobs are spread across available mappers. OPEN SOURCE: the project has strong community backing and committers from several companies. VERBOSE REPORTING: logging and debugging are extremely quick with the web console and SQL.
  • 23. LAYERS OF OOZIE [diagram: Bundle → Coordinator (1..N) → Workflow → Action (1..N); each action runs an MR / Pig / Java / Hive / Sqoop job]
  • 24. LAYERS OF OOZIE [same diagram as slide 23]
  • 25. WORKFLOW ACTION: JAVA <action name="java-node"> <java> <job-tracker>foo:9001</job-tracker> <name-node>bar:9000</name-node> <main-class>com.riotgames.MyMainClass</main-class> <java-opts>-Dfoo</java-opts> <arg>bar</arg> </java> <ok to="next"/> <error to="error"/> </action> •  Workflow actions are the most granular unit of work
  • 26. WORKFLOW ACTION: JAVA [build: same XML as slide 25, with the java-node step highlighted]
  • 27. WORKFLOW ACTION: JAVA [build: same XML, with the ok transition java-node → next highlighted]
  • 28. WORKFLOW ACTION: JAVA [build: same XML, with the error transition java-node → error highlighted]
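Unflattened from the transcript, slide 25's Java action reads as follows (the foo and bar host names are the slide's placeholders; the slide's unclosed `<arg>` element is closed here):

```xml
<action name="java-node">
    <java>
        <job-tracker>foo:9001</job-tracker>
        <name-node>bar:9000</name-node>
        <main-class>com.riotgames.MyMainClass</main-class>
        <java-opts>-Dfoo</java-opts>
        <arg>bar</arg>
    </java>
    <ok to="next"/>
    <error to="error"/>
</action>
```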
  • 29. WORKFLOW ACTION: MAPREDUCE <action name="myfirstHadoopJob"> <map-reduce> <job-tracker>foo:9001</job-tracker> <name-node>bar:9000</name-node> <prepare> <delete path="hdfs://foo:9000/usr/foo/output-data"/> </prepare> <job-xml>/myfirstjob.xml</job-xml> <configuration> <property> <name>mapred.input.dir</name> <value>/usr/foo/input-data</value> </property> <property> <name>mapred.output.dir</name> <value>/usr/foo/input-data</value> </property> <property> <name>mapred.reduce.tasks</name> <value>${firstJobReducers}</value> </property> </configuration> </map-reduce> <ok to="myNextAction"/> <error to="errorCleanup"/> </action>
  • 30. WORKFLOW ACTION: MAPREDUCE [same XML as slide 29] •  Each action has a type, and each type has a defined set of key:values that can be used to configure it
  • 31. WORKFLOW ACTION: MAPREDUCE [same XML as slide 29] •  Each action has a type, and each type has a defined set of key:values that can be used to configure it •  The action must also specify which actions to transition to on success or failure
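Reassembled and indented, the map-reduce action from slide 29 looks like this. Note one assumption: the slide lists `/usr/foo/input-data` for both `mapred.input.dir` and `mapred.output.dir`; the output value below is changed to `/usr/foo/output-data` on the assumption that it should match the path deleted in the prepare block.

```xml
<action name="myfirstHadoopJob">
    <map-reduce>
        <job-tracker>foo:9001</job-tracker>
        <name-node>bar:9000</name-node>
        <prepare>
            <!-- clear any stale output before the job runs -->
            <delete path="hdfs://foo:9000/usr/foo/output-data"/>
        </prepare>
        <job-xml>/myfirstjob.xml</job-xml>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>/usr/foo/input-data</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <!-- presumably output-data, matching the prepare path; the slide shows input-data -->
                <value>/usr/foo/output-data</value>
            </property>
            <property>
                <name>mapred.reduce.tasks</name>
                <value>${firstJobReducers}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="myNextAction"/>
    <error to="errorCleanup"/>
</action>
```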
  • 32. LAYERS OF OOZIE [same diagram as slide 23]
  • 33. THE WORKFLOW ENGINE •  Oozie runs workflows in the form of DAGs (directed acyclic graphs) •  Each element in this workflow is an action •  Some node types are processed internally to Oozie vs. farmed out to the cluster [diagram: Start → decision → fork → MapReduce / Java / Sqoop / Hive / HDFS / Shell actions → join → End]
  • 34. WORKFLOW EXAMPLE <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1"> <start to="java-node"/> <action name="java-node"> ... </action> <end name="end"/> <kill name="fail"/> </workflow-app> •  This workflow will run the action defined as java-node
  • 35. WORKFLOW EXAMPLE [build: same XML as slide 34, with the start → java-node transition highlighted]
  • 36. WORKFLOW EXAMPLE [build: same XML, with start → java-node → end highlighted]
  • 37. WORKFLOW EXAMPLE [build: same XML, with the error transition java-node → fail highlighted]
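Slide 34's workflow, reindented; the action body is elided on the slide (the `...` is kept as-is), and in practice a kill node would also carry a `<message>` child:

```xml
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="java-node"/>
    <action name="java-node">
        ...
    </action>
    <end name="end"/>
    <kill name="fail"/>
</workflow-app>
```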
  • 38. LAYERS OF OOZIE [same diagram as slide 23]
  • 39. COORDINATOR •  Oozie coordinators can execute workflows based on time and data dependencies •  Each coordinator specifies a workflow to execute upon meeting its trigger criteria •  Coordinators can pass variables to the workflow layer, allowing for dynamic resolution [diagram: Client → Oozie Coordinator → Oozie Workflow inside the Oozie Server, executing on Hadoop]
  • 40. EXAMPLE COORDINATOR <?xml version="1.0" ?><coordinator-app end="${COORD_END}" frequency="${coord:hours(1)}" name="test_job_coord" start="${COORD_START}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1"> <action> <workflow> <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path> </workflow> </action> </coordinator-app> •  This coordinator will run every hour and invoke the workflow found in the /test_job folder
  • 41. EXAMPLE COORDINATOR [same content as slide 40]
  • 42. EXAMPLE COORDINATOR [same content as slide 40]
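Slide 40's coordinator with the transcript's line-wrapping artifacts repaired (`${COORD_START}`, the schema URI, and the app-path were each split across lines):

```xml
<?xml version="1.0"?>
<coordinator-app name="test_job_coord"
                 frequency="${coord:hours(1)}"
                 start="${COORD_START}" end="${COORD_END}"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
    <action>
        <workflow>
            <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path>
        </workflow>
    </action>
</coordinator-app>
```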
  • 44. Oozie 1 2 3 4 5 6 7 BUNDLE Client Oozie Coordinator Oozie Workflow Oozie Server Hadoop Oozie Coordinator Oozie Workflow Oozie Bundle •  Bundles are higher-level abstractions that batch a set of coordinators together •  There is no explicit dependency between coordinators within a bundle, but a bundle can be used to more formally define a data pipeline
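As a hedged sketch of what a bundle definition looks like (the coordinator names and HDFS paths below are placeholders, not from this deck):

```xml
<!-- Minimal bundle batching two schema coordinators; names/paths are illustrative -->
<bundle-app name="etl_bundle" xmlns="uri:oozie:bundle:0.1">
  <coordinator name="schema1_coord">
    <app-path>hdfs://bar:9000/user/hadoop/oozie/app/schema1</app-path>
  </coordinator>
  <coordinator name="schema2_coord">
    <app-path>hdfs://bar:9000/user/hadoop/oozie/app/schema2</app-path>
  </coordinator>
</bundle-app>
```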
  • 45. 1 2 3 4 5 6 7 Oozie THE INTERFACE Multiple ways to interact with Oozie: •  Web Console (read only) •  CLI •  Java client •  Web Service Endpoints •  Directly with the DB using SQL The Java client and CLI are just abstractions over the web service endpoints, so it is easy to extend this functionality in your own apps.
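Since the CLI and Java client boil down to HTTP calls against the web service endpoints, extending them is mostly a matter of building the right URLs. A minimal sketch (the server URL and job id here are placeholder values):

```java
// Illustrative helper showing the shape of an Oozie v1 web service URL
// that the CLI and Java client wrap; the host and job id are placeholders.
public class OozieUrls {
    public static String jobInfo(String serverUrl, String jobId) {
        // GET <server>/v1/job/<id>?show=info returns the job's status as JSON
        return serverUrl + "/v1/job/" + jobId + "?show=info";
    }

    public static void main(String[] args) {
        System.out.println(jobInfo("http://foo:11000/oozie", "0000001-oozie-W"));
    }
}
```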
  • 46. 1 2 3 4 5 6 7 Oozie PIECES OF A DEPLOYABLE The list of components that are needed for a scheduled workflow: •  Coordinator.xml Contains the scheduler definition and path to workflow.xml •  Workflow.xml Contains the job definition •  Libraries Optional jar files •  Properties file (also possible through WS call) Initial parameterization and mandatory specification of coordinator path
  • 47. 1 2 3 4 5 6 7 Oozie JOB.PROPERTIES NAME_NODE=hdfs://foo:9000 JOB_TRACKER=bar:9001 oozie.libpath=${NAME_NODE}/user/hadoop/oozie/share/lib oozie.coord.application.path=${NAME_NODE}/user/hadoop/oozie/app/test_job Important note: •  Any variable put into the job.properties will be inherited by the coordinator / workflow •  E.g. given the key:value workflow_name=test_job you can access it using ${workflow_name}
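The inheritance above is plain ${key} substitution. A rough sketch of that behavior (this is not Oozie's actual EL resolver, just an illustration of how workflow_name=test_job becomes visible as ${workflow_name}):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy ${var} resolver illustrating how job.properties keys propagate to the
// coordinator/workflow layers. Unknown keys are left untouched.
public class PropertyResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String resolve(String template, Map<String, String> props) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("oozie/app/${workflow_name}",
                Map.of("workflow_name", "test_job"))); // oozie/app/test_job
    }
}
```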
  • 48. 1 2 3 4 5 6 7 Oozie COORDINATOR SUBMISSION •  Deploy the workflow and coordinator to HDFS $ hadoop fs -put test_job oozie/app/ •  Submit and run the coordinator job $ oozie job -run -config job.properties •  Check the coordinator status on the web console
  • 52. WEB CONSOLE: JOB DETAILS1 2 3 4 5 6 7 Oozie
  • 53. WEB CONSOLE: JOB DAG1 2 3 4 5 6 7 Oozie
  • 54. WEB CONSOLE: JOB DETAILS1 2 3 4 5 6 7 Oozie
  • 57. A USE CASE: HOURLY JOBS 1 2 3 4 5 6 7 Oozie Replace a current CRON job that runs a bash script once a day (6): •  The shell will execute a Java main which pulls data from a filestream (1), dumps it to HDFS, and then runs a MapReduce job on the files (2). It will then email a person when the report is done (3). •  It should start within X amount of time (4) •  It should complete within Y amount of time (5) •  It should retry Z times on failure (automatic)
  • 58. WORKFLOW.XML 1 2 3 4 5 6 7 Oozie <workflow-app name="filestream_wf" xmlns="uri:oozie:workflow:0.1"> <start to="java-node"/> <action name="java-node"> <java> <job-tracker>foo:9001</job-tracker> <name-node>bar:9000</name-node> <main-class>org.foo.bar.PullFileStream</main-class> <arg>argument1</arg> </java> <ok to="mr-node"/> <error to="fail"/> </action> <action name="mr-node"> <map-reduce> <job-tracker>foo:9001</job-tracker> <name-node>bar:9000</name-node> <configuration> ... </configuration> </map-reduce> <ok to="email-node"/> <error to="fail"/> </action> ... <action name="email-node"> <email xmlns="uri:oozie:email-action:0.1"> <to>customer@foo.bar</to> <cc>employee@foo.bar</cc> <subject>Email notification</subject> <body>The wf completed</body> </email> <ok to="myotherjob"/> <error to="errorcleanup"/> </action> <end name="end"/> <kill name="fail"/> </workflow-app>
  • 62. COORDINATOR.XML 1 2 3 4 5 6 7 Oozie <?xml version="1.0" ?><coordinator-app end="${COORD_END}" frequency="${coord:days(1)}" name="daily_job_coord" start="${COORD_START}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1" xmlns:sla="uri:oozie:sla:0.1"> <action> <workflow> <app-path>hdfs://bar:9000/user/hadoop/oozie/app/test_job</app-path> </workflow> <sla:info> <sla:nominal-time>${coord:nominalTime()}</sla:nominal-time> <sla:should-start>${X * MINUTES}</sla:should-start> <sla:should-end>${Y * MINUTES}</sla:should-end> <sla:alert-contact>foo@bar.com</sla:alert-contact> </sla:info> </action> </coordinator-app>
  • 66. Use Case 1 1 2 3 4 5 6 7 USE CASE 1 – Global Data Means Global Data Problems
  • 67. Use Case 1 WORKFLOWS: RELATIONAL1 2 3 4 5 6 7 Tableau Hive Data Warehouse Oozie MySQLPentaho Analysts EUROPE Audit Plat LoL KOREA Audit Plat LoL NORTH AMERICA Audit Plat LoL Business Analyst
  • 69. Use Case 1 WORKFLOWS: RELATIONAL1 2 3 4 5 6 7 Hive Final Tables provide more descriptive column naming and native type conversions REGION X Audit Plat LoL Hive Staging Transform Temp Tables map 1:1 with DB table meta Extract Oozie Actions
  • 70. Use Case 1 WORKFLOWS: RELATIONAL 1 2 3 4 5 6 7 1. [Java] Check the partitions for the table and pull the latest date found. Write the key:value pair for the latest date back out to a properties file so that it can be referenced by the rest of the workflow.
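Step 1 relies on Oozie's capture-output mechanism for Java actions: with <capture-output/> in the action definition, the main writes a properties file to the path Oozie supplies, and downstream nodes read the keys via ${wf:actionData('java-node')['latest_date']}. A minimal sketch (the class name and date value are placeholders, and the partition lookup itself is omitted):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Sketch of a Java action emitting a key:value pair for the rest of the
// workflow. Oozie passes the output file path through the
// "oozie.action.output.properties" system property when <capture-output/>
// is set on the action.
public class EmitLatestDate {
    static void writeOutput(File out, String latestDate) throws IOException {
        Properties props = new Properties();
        props.setProperty("latest_date", latestDate);
        try (OutputStream os = new FileOutputStream(out)) {
            props.store(os, null);
        }
    }

    public static void main(String[] args) throws IOException {
        File out = new File(System.getProperty("oozie.action.output.properties"));
        writeOutput(out, "2013-06-01"); // placeholder for the real partition lookup
    }
}
```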
  • 71. Use Case 1 WORKFLOWS: RELATIONAL 1 2 3 4 5 6 7 2. [Sqoop] If the table is flagged as dynamically partitioned, pull data from the latest partition (referencing the output of the Java node) through today's date. If not, pull data just for the current date.
  • 72. Use Case 1 WORKFLOWS: RELATIONAL 1 2 3 4 5 6 7 3. [Hive] Copy the updated partitions from the staging DB to the prod DB while also performing column name and type conversions.
  • 73. Use Case 1 WORKFLOWS: RELATIONAL 1 2 3 4 5 6 7 4. [Java] Grab row counts for both the source and Hive across the dates pulled. Write this, along with some other meta, out to an audit DB for reporting (validation).
  • 74. Use Case 1 AUDITING1 2 3 4 5 6 7 •  We have a Tableau report pointing at the output audit data for a rapid high level view of the health of our ETLs
  • 77. Use Case 1 SINGLE TABLE ACTION FLOW 1 2 3 4 5 6 7 End Initialize-node Hive-node Audit-node Sqoop-node Oozie-node Start •  This action flow is done once per table Extraction actions Transform workflow •  The Oozie action allows us to asynchronously run the Hive staging->prod action and the auditing action. It is a Java action that uses the Oozie Java client and submits key:value pairs to another workflow.
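The submission itself is just a java.util.Properties payload handed to the Oozie Java client; a sketch of building it (the client call is omitted, the table/date keys are hypothetical names, and only oozie.wf.application.path is the standard property naming the workflow to run):

```java
import java.util.Properties;

// Builds the configuration an "Oozie action" might submit to another
// workflow via the Oozie Java client. Only oozie.wf.application.path is a
// real Oozie property; the other keys are illustrative.
public class SubWorkflowPayload {
    static Properties build(String nameNode, String appPath,
                            String table, String latestDate) {
        Properties conf = new Properties();
        conf.setProperty("oozie.wf.application.path", nameNode + appPath);
        conf.setProperty("table_name", table);       // hypothetical key
        conf.setProperty("latest_date", latestDate); // hypothetical key
        return conf;
    }
}
```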
  • 81. Table 1 Extraction actions Use Case 1 FULL SCHEMA WORKFLOW1 2 3 4 5 6 7 End Start Table 1 Transform workflow Table 2 Extraction actions Table 2 Transform workflow Table N Extraction actions Table N Transform workflow •  We have one of these workflows per schema •  Different schemas have a different number of tables (e.g. range from 5-20 tables) •  We could fork and do each of these table extractions in parallel but we are trying to limit the I/O load we create on the sources
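Had we wanted the parallel variant, Oozie's fork/join nodes express it directly; a hedged fragment (the action names are placeholders):

```xml
<!-- Illustrative fork/join: run two table extractions concurrently, then join -->
<fork name="fork-tables">
  <path start="table1-extract"/>
  <path start="table2-extract"/>
</fork>
<!-- ... the two extraction actions, each with <ok to="join-tables"/> ... -->
<join name="join-tables" to="end"/>
```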
  • 82. Use Case 1 COORDINATORS1 2 3 4 5 6 7 Schema 1 Workflow Schema 1 Coordinator •  We have one coordinator per schema workflow •  Currently coordinators are staged in groups based on schema type. Schema 2 Workflow Schema 2 Coordinator Schema N Workflow Schema N Coordinator
  • 83. •  20+ Regions •  5+ DBs per region •  5-20 Tables per DB 20 * 5 * 12(avg) = ~1200 tables! Use Case 1 IMPORTANT NUMBERS1 2 3 4 5 6 7
  • 85. •  Not if you have a good deployment pipeline! Use Case 1 TOO UNWIELDY?1 2 3 4 5 6 7
  • 86. Use Case 1 DEPLOYMENT STACK1 2 3 4 5 6 7
  • 87. Use Case 1 DEPLOYMENT STACK: JAVA1 2 3 4 5 6 7 •  The java project compiles into the library that is used by the workflows •  It also contains some custom functionality for interacting with the Oozie WS endpoints / Oozie DB Tables
  • 88. Use Case 1 DEPLOYMENT STACK: PYTHON 1 2 3 4 5 6 7 •  The Python project dynamically generates all of our workflow/coordinator XML files. It has multiple YML configs which hold the meta associated with all of the tables. It also interacts with a DB table for the various DB connection meta.
  • 89. Use Case 1 DEPLOYMENT STACK: GITHUB1 2 3 4 5 6 7 •  GitHub houses all of the Big Data group’s code bases no matter the language.
  • 90. Use Case 1 DEPLOYMENT STACK: JENKINS1 2 3 4 5 6 7 •  Jenkins polls GitHub and builds either set of artifacts (Java lib / tar containing workflows/coordinators) whenever it detects changes. It deploys the build artifacts to a simple mount point.
  • 91. Use Case 1 DEPLOYMENT STACK: CHEF1 2 3 4 5 6 7 •  The Chef cookbook will check for the version declared for both sets of artifacts and grab them from the mount point. It runs a shell which deploys the deflated workflows/coordinators and mounts the jar lib file.
  • 95. •  20+ Regions •  5+ DBs per region •  5-20 Tables per DB 20 * 5 * 12(avg) = ~1200 tables per day! 1 person < 5 hours a week! Use Case 1 IMPORTANT NUMBERS1 2 3 4 5 6 7
  • 96. Use Case 2 USE CASE 2 – Dashboarding Cloud Data1 2 3 4 5 6 7
  • 97. Use Case 2 WORKFLOWS: NON-RELATIONAL 1 2 3 4 5 6 7 Dashboard Hive Data Warehouse Honu Analysts Business Analyst Client Mobile WWW Self Service App (Workflow and Meta)
  • 99. WORKFLOWS: NON-RELATIONAL 1 2 3 4 5 6 7 External Queue Amazon SQS is a message queue we use for asynchronous communication HONU SOURCE TABLES Audit Plat LoL Honu Derived Message Derived Tables are filtered datasets joined from 1 or more sources Transform Oozie Actions Use Case 2
  • 100. WORKFLOWS: NON-RELATIONAL 1 2 3 4 5 6 7 Use Case 2 1. [Java] Check that the required partitions for the derived query exist and contain data. Send a message to an SNS endpoint if a partition exists but contains no rows.
  • 101. WORKFLOWS: NON-RELATIONAL 1 2 3 4 5 6 7 Use Case 2 2. [Hive] Perform the table transformation query on the selected partition(s). This query can filter any subset of source columns and join any number of source tables.
  • 102. WORKFLOWS: NON-RELATIONAL 1 2 3 4 5 6 7 Use Case 2 3. [Java] Send an SQS message to an external queue based on the consumer type. Consumers will pull from these queues regularly and update the various dashboard artifacts.
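The notification in step 3 is just a small message body dropped on the queue; a hedged sketch of building one (the field names and values are made up, and the actual SQS client call is omitted):

```java
// Illustrative message-body builder for the step-3 SQS notification.
// Field names are placeholders, not the production schema.
public class DashboardMessage {
    static String body(String table, String partition) {
        return "{\"table\":\"" + table + "\",\"partition\":\"" + partition + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(body("derived_events", "2013-06-01-00"));
    }
}
```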
  • 103. WORKFLOWS: NON-RELATIONAL1 2 3 4 5 6 7 •  End result is that our dashboards get updated either hourly or daily depending on the workflow Use Case 2
  • 105. LESSONS LESSON #1 Distros and Versioning •  If you choose to go with a distro for your Hadoop stack, be extremely vigilant about upgrading to the latest versions whenever possible. You will receive a lot more community support and far fewer headaches if you are not running into bugs that were patched in trunk over a year ago! 1 2 3 4 5 6 7
  • 106. LESSONS LESSON #2 Solidify Deployment •  The usefulness of Oozie can degrade as complexity creeps into your pipeline. If you do not work towards an automated deployment pipeline at the early stages of your development, you will quickly find maintenance costs rising significantly over time. 1 2 3 4 5 6 7
  • 107. LESSONS LESSON #3 Extend Capabilities •  Don't feel limited to using tools based on the supplied APIs. Feel free to implement harnesses that extend capabilities and submit them back to the community – we will welcome them with open arms :) 1 2 3 4 5 6 7
  • 108. LESSONS LESSON #4 Ask for Help! •  Oozie is an open source project and is getting new members/organizations every day. Don't spend multiple hours trying to solve an issue that many of us have already worked through. •  There is also a large amount of documentation in both the wikis AND archived listserv responses – leverage them both! 1 2 3 4 5 6 7
  • 110. 1 2 3 4 5 6 7 THE FUTURE CONTINUE INCREASING VELOCITY
  June 2012 → July 2013:
  MySQL tables: 180 → 1200
  Pipeline events/day: 0 → 7+ billion
  Workflows: Cronjob + Pentaho → Oozie
  Environment: Datacenter → DC + AWS
  SLA: 1 day → 2 hours
  Event tracking: 2+ weeks (DB update), dependencies on DBA + ETL + Tools teams, downtime (3h min.) → 10 minutes, self-service, no downtime
  • 111. OUR IMMEDIATE GOALS1 2 3 4 5 6 7 THE FUTURE •  Improve Self-service workflow & tooling •  Realtime event aggregation •  Global Data Infrastructure •  Replace legacy audit/event logging services
  • 112. CHALLENGE: MAKE IT GLOBAL •  Data centers across the globe, since latency has a huge effect on gameplay → log data scattered around the world •  Large presence in Asia -- some areas (e.g., PH) have bandwidth challenges or bandwidth is expensive 1 2 3 4 5 6 7 THE FUTURE
  • 113. CHALLENGE: WE HAVE BIG DATA +  chat logs +  detailed gameplay event tracking +  so on… 1 2 3 4 5 6 7 500G DAILY STRUCTURED DATA | >7PB GAME EVENT DATA | 3MM SUBSCRIBERS, 448+ MM VIEWS ON THE RIOT YOUTUBE CHANNEL THE FUTURE
  • 114. OUR AUDACIOUS GOALS Have deep, real-time understanding of our systems from player experience and operational standpoints 1 2 3 4 5 6 7 Have ability to identify, understand and react to meaningful trends in real time Build a world-class data and analytics organization •  Deeply understand players across the globe •  Apply that understanding to improve games for players •  Deeply understand our entire ecosystem, including social media THE FUTURE
  • 115. SHAMELESS HIRING PLUG1 2 3 4 5 6 7 THE FUTURE Like most everybody else at this conference… we’re hiring! PLAYER EXPERIENCE FIRST CHALLENGE CONVENTION FOCUS ON TALENT AND TEAM TAKE PLAY SERIOUSLY STAY HUNGRY, STAY HUMBLE THE RIOT MANIFESTO
  • 116. SHAMELESS HIRING PLUG1 2 3 4 5 6 7 And yes, you can play games at work. It’s encouraged! THE FUTURE