Oozie – Now and Beyond
Presented by Mona Chitnis | Hadoop User Group, Yahoo Sunnyvale, October 16, 2013
Team In Action

§  Alejandro Abdelnur
§  Mohammad Islam
§  Rohini Palaniswamy
§  Robert Kanter
§  Virag Kothari
§  Mona Chitnis
§  Ryota Egashira
§  Michelle Chiang
§  Bowen Zhang

Yahoo Confidential & Proprietary
OVERVIEW
Why Oozie?

The Problem
§  Doing something on the grid often required multiple steps
›  MapReduce job
›  Pig job
›  Streaming job
›  HDFS operation (mkdir, chmod, etc.)…
§  Multiple ad-hoc solutions existed
›  custom job control
›  shell scripts
›  cron…
§  Cost of building and running apps was high
›  development and applications engineering
›  support, operations, and hardware

The Need
§  Workflow scheduler with better support for grid jobs (native integration with Hadoop)
›  orchestrate dependency between jobs
›  execute at a specific time or on data availability
›  retry jobs in the event of failures (reliable)
›  sync (clocked dataset) awareness
›  async (unspecified scheduling freq) data awareness
§  Common framework for communication and execution of production process

§  A server-based workflow system to manage Hadoop jobs
§  Horizontally scalable and extensible system
§  Open-source
§  Workflows to couple resources instead of having a monolithic code base

Oozie – A Workflow Engine
§  Oozie executes workflows defined as a DAG of jobs
§  Job types include MapReduce, Pig, Hive, shell script, custom Java code, etc.
§  Introduced in Oozie 1.x
[Diagram: example workflow DAG. M/R, Pig, Java and FS job nodes are linked through start, fork, join, decision (MORE / ENOUGH), end and kill nodes, with OK and ERROR transitions.]

§  Control-flow nodes: start, kill, end | fork, join, decision
§  Action nodes: map reduce, pig, hive, distcp, java, fs, sub-workflow, shell, ssh, email
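
To make the control-flow and action nodes concrete, here is a minimal Python sketch of walking such a DAG. It is not Oozie code: the node names, the dictionary layout and the `run` function are assumptions made for illustration, each fork branch is limited to a single action, and the fork names its join directly (in a real workflow the branches transition to the join themselves). Branches run one after another here, whereas Oozie runs them in parallel.

```python
# Hypothetical in-memory workflow: node names and layout are illustrative only.
WORKFLOW = {
    "start": {"type": "start", "to": "mr-1"},
    "mr-1": {"type": "action", "ok": "fork-1", "error": "kill"},
    "fork-1": {"type": "fork", "paths": ["mr-2", "pig-1"], "join": "join-1"},
    "mr-2": {"type": "action", "ok": "join-1", "error": "kill"},
    "pig-1": {"type": "action", "ok": "join-1", "error": "kill"},
    "join-1": {"type": "join", "to": "fs-1"},
    "fs-1": {"type": "action", "ok": "end", "error": "kill"},
    "end": {"type": "end"},
    "kill": {"type": "kill"},
}


def run(workflow, failing=frozenset()):
    """Walk the workflow and return (final_state, executed_actions).

    `failing` holds the names of actions that should take their error
    transition instead of their ok transition.
    """
    executed = []

    def run_action(name):
        executed.append(name)
        node = workflow[name]
        return node["error"] if name in failing else node["ok"]

    current = "start"
    while True:
        node = workflow[current]
        kind = node["type"]
        if kind == "end":
            return "SUCCEEDED", executed
        if kind == "kill":
            return "KILLED", executed
        if kind in ("start", "join"):
            current = node["to"]
        elif kind == "action":
            current = run_action(current)
        elif kind == "fork":
            # Sequential stand-in for parallel branches; all must reach the join.
            targets = [run_action(path) for path in node["paths"]]
            all_joined = all(target == node["join"] for target in targets)
            current = node["join"] if all_joined else "kill"
        else:
            raise ValueError(f"unknown node type: {kind}")


if __name__ == "__main__":
    print(run(WORKFLOW))
    print(run(WORKFLOW, failing={"pig-1"}))
```

A failing action in either branch sends the walk to the kill node, which mirrors the ERROR transition in the diagram.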

Example M/R Action
[Figure: map-reduce action XML with callouts for JT and NN, Mapper and Reducer, Input Directory and Output Directory, and Queue Name]

Workflow State Transitions
[Figure: workflow state transition diagram. Source: Chicago HUG, Dec 2012]

Oozie (Coordinator) – A Scheduler
§  Oozie executes workflows based on
›  time dependency (frequency)
›  data dependency
§  Introduced in 2.x

[Diagram: the Oozie Client calls the Oozie Server through the WS API; inside the server, the Oozie Coordinator checks data availability against HDFS/HCat and starts the Oozie Workflow]

Oozie (Bundle) – A Pipeline Framework
§  Users can define and execute a “bundle” of coordinator apps
›  large-scale data processing (inter-related coordinators)
›  operability and manageability of pipelines
§  Users can start/stop/suspend/resume/rerun at the bundle level
§  Introduced in 3.x; bundles are optional

[Diagram: the Oozie Client calls the Oozie Server through the WS API; inside the server, the Bundle sits above the Oozie Coordinator, which checks data availability against HDFS/HCat and starts the Oozie Workflow]

Layers of Abstraction in Oozie
1. Bundle: a Bundle contains Coord Jobs
2. Coordinator: a Coord Job contains Coord Actions, each running a WF Job
3. Workflow: a WF Job contains M/R Jobs and Pig Jobs

Architectural Overview
[Diagram: architecture of Oozie (Java Web-App)]
§  Web Services (JSON/REST API) with Security: WS API and WS Callback
§  DAG Engine
§  Commands: submit, start, rerun, callback, suspend, resume, kill, signal job, info, check action, start action, end action, notification
›  executed asynchronously via the Command Queue, by the Command Executor Thread Pool
§  Action Executors: M/R, Pig, fs, sub-wf
›  pluggable, to support additional action types
§  Other components shown: Recovery Daemon Thread, WF store, WF lib, Oracle DB, Instrumentation

Oozie Security, Multi-tenancy and Scalability
[Diagram: request flow between the End User, the Oozie Server and the Hadoop Cluster (YARN RM)]
1. Auth.: the End User authenticates with the Oozie Server (Kerberos, Y! specific)
2. Create Launcher Job (super-user)
3. Execute User Job (doAs): the Launcher Mapper starts the actual M/R job
4. Response
5. Async Callback
USE CASES
Use Cases and Common Patterns

Use Case 1: Time Triggers
Execute your workflow every 15 minutes

[Timeline: the workflow runs at 00:15, 00:30, 00:45 and 01:00]
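
A time trigger can be pictured as a list of nominal times, one per run. The following is a minimal Python sketch of that idea rather than anything taken from Oozie; the function name `nominal_times` and the start and end values are assumptions chosen to match the timeline.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """Return the nominal time of every run in the half-open range [start, end)."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += timedelta(minutes=frequency_minutes)
    return times


if __name__ == "__main__":
    # Assumed window: first run at 00:15, scheduling stops before 01:15.
    start = datetime(2013, 10, 16, 0, 15)
    end = datetime(2013, 10, 16, 1, 15)
    for moment in nominal_times(start, end, 15):
        print(moment.strftime("%H:%M"))
```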

Use Case 2: Time and Data Triggers
Materialize your workflows every hour, but only run them when the input data is ready (the data is loaded to the grid every hour)

[Timeline: actions are materialized at 01:00, 02:00, 03:00 and 04:00; each one runs only after the check "Input Data Exists?" on Hadoop succeeds]

Use Case 2: Time and Data Triggers
<coordinator-app name="coord1" frequency="${1*HOURS}" …>
  <!-- Dataset Definition -->
  <datasets>
    <dataset name="logs" frequency="${1*HOURS}" initial-instance="2009-01-01T23:59Z">
      <uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
    </dataset>
  </datasets>
  <!-- Input Events Definition, with time of coordinator action materialized (created) -->
  <input-events>
    <data-in name="inputLogs" dataset="logs">
      <instance>${current(0)}</instance>
    </data-in>
  </input-events>
  <!-- Action Definition -->
  <action>
    <workflow>
      <app-path>hdfs://bar:9000/usr/abc/logsprocessor-wf</app-path>
      <configuration>
        <property> <name>inputData</name><value>${dataIn('inputLogs')}</value> </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
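
The way a dataset instance turns into a concrete input path can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Oozie's implementation: it assumes that instance 0 is the latest dataset instance at or before the action's nominal time, counted from `initial-instance` in steps of the dataset frequency, and it ignores time zones. The function names `current_instance` and `resolve_uri` are hypothetical.

```python
from datetime import datetime, timedelta


def current_instance(initial_instance, frequency, nominal_time, offset=0):
    """Return the dataset instance `offset` steps away from the latest
    instance at or before `nominal_time` (assumed meaning of instance 0)."""
    elapsed = nominal_time - initial_instance
    index = elapsed // frequency  # whole dataset periods since the first instance
    return initial_instance + (index + offset) * frequency


def resolve_uri(template, instance_time):
    """Fill ${YEAR}, ${MONTH}, ${DAY} and ${HOUR} from an instance time."""
    fields = {
        "${YEAR}": f"{instance_time.year:04d}",
        "${MONTH}": f"{instance_time.month:02d}",
        "${DAY}": f"{instance_time.day:02d}",
        "${HOUR}": f"{instance_time.hour:02d}",
    }
    for placeholder, value in fields.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    template = "hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}"
    initial = datetime(2009, 1, 1, 23, 59)
    # Assumed nominal time of one materialized coordinator action.
    nominal = datetime(2009, 1, 2, 5, 0)
    instance = current_instance(initial, timedelta(hours=1), nominal)
    print(resolve_uri(template, instance))
```

In this model the workflow only starts once the resolved path exists, which is the data trigger described in this use case.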

Use Case 3: Rolling Window
Access 15-minute datasets and roll them up into hourly datasets

[Timeline: the 15-minute instances 00:15, 00:30, 00:45 and 01:00 roll up into the 01:00 hourly dataset; 01:15, 01:30, 01:45 and 02:00 roll up into the 02:00 hourly dataset]
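
The rolling window can be sketched as follows: each hourly run reads the four 15-minute instances that end at its nominal time, so consecutive runs never share an input. This is an illustrative Python sketch; the function name `rolling_window` and the dates are assumptions.

```python
from datetime import datetime, timedelta


def rolling_window(nominal_time, dataset_frequency, instances):
    """Return `instances` dataset times ending at `nominal_time`, oldest first."""
    return [
        nominal_time - step * dataset_frequency
        for step in range(instances - 1, -1, -1)
    ]


if __name__ == "__main__":
    quarter_hour = timedelta(minutes=15)
    for hour in (1, 2):
        run_time = datetime(2013, 10, 16, hour, 0)
        inputs = rolling_window(run_time, quarter_hour, 4)
        print(run_time.strftime("%H:%M"), [t.strftime("%H:%M") for t in inputs])
```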

Use Case 4: Sliding Window
Access the last 24 hours of data, and roll them up every hour

[Timeline: hourly instances 01:00 … 24:00 roll up at 24:00; instances 02:00 … +1 day 01:00 roll up at +1 day 01:00; instances 03:00 … +1 day 02:00 roll up at +1 day 02:00]
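
Unlike the rolling window, consecutive sliding windows overlap: each hourly run re-reads 23 of the 24 instances used by the previous run. The Python sketch below illustrates this; the function name `sliding_window` and the dates are assumptions, and 24:00 on the timeline is written as 00:00 of the next day.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def sliding_window(nominal_time, window_hours=24):
    """Return the hourly instances covering the window that ends at nominal_time."""
    return [nominal_time - step * HOUR for step in range(window_hours - 1, -1, -1)]


if __name__ == "__main__":
    first = sliding_window(datetime(2013, 10, 17, 0, 0))   # "24:00" on the timeline
    second = sliding_window(datetime(2013, 10, 17, 1, 0))  # "+1 day 01:00"
    print("first window: ", first[0], "to", first[-1])
    print("second window:", second[0], "to", second[-1])
    print("instances shared by both runs:", len(set(first) & set(second)))
```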
Where are We Today

Proven Scale and Multi-tenancy
§  2.8 M jobs/month
§  13,000 jobs/server/day
§  16% of all Hadoop jobs
§  75 products
§  255 monthly users
§  2,000+ projects
§  5.4 M compute hrs/month
§  770,000 workflows
§  Between 1 and 8 actions
§  Avg. 4 actions/workflow
§  250 coordinator jobs/day
§  17 clusters
§  67% of Oozie jobs kicked off through the coordinator

Mix Of Job Types For Workflows
[Stacked bar chart: share of jobs by type (Pig, MapReduce, Java, Other) on a 0% to 100% axis; segment labels 39%, 29%, 28% and 4%]

SAMPLE USE OF JOB TYPES (Pig, MapReduce, Java, Others)
§  Data processing/filtering
§  Aggregation
§  Publishing data (HDFS/HCat)
§  Legacy code and logic
§  Distcp and shell
§  Data copy/transfer
FEATURE DEEP-DIVE
What’s New in Oozie

Existing Features (Oozie 3.x)
§  HBase access through Oozie, via credentials
§  HCatalog access through Oozie, via credentials
§  Email action
§  DistCp action (intra as well as inter-cluster copy)
§  Shell action (run any script e.g. perl, python, hadoop CLI)
§  Workflow dry-run & Fork-Join validation
§  Bulk monitoring (REST API)
§  Coordinator EL functions for parameterized workflows
§  Job DAG

HBase Credentials
§  Add in workflow.xml
›  Add a "credentials" section. The type is "hbase".
›  Specify the action that uses the credentials (the cred attribute).
›  Put hbase-site.xml in the Oozie application path, and use <file> in workflow.xml to put hbase-site.xml in the distributed cache. A copy of hbase-site.xml can be found at gateway:/home/gs/conf/hbase/hbase-site.xml.
›  Put the JARs guava-*.jar, zookeeper-*.jar, hbase-*.jar and protobuf-java-*.jar in the workflow "lib" dir
§  Make sure you are using Oozie XSD version 0.3 or above for the tag.

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.3">
  <credentials>
    <credential name="hbase.cert" type="hbase">
      <!-- optional properties - zookeeper.znode.parent, hbase.zookeeper.quorum -->
    </credential>
  </credentials>
  <start to="map-reduce-action" />
  <action name="map-reduce-action" cred="hbase.cert">
    <map-reduce>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>SampleMapperHBase</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.oozie.example.DemoReducer</value>
        </property>
      </configuration>
      <file>hbase-site.xml#hbase-site.xml</file>
    </map-reduce>

§  Refer to http://twiki.corp.yahoo.com/view/CCDI/UseHbaseCred

Oozie 4.0
1. HCatalog Integration
2. Job Notifications
3. SLA Monitoring

1. HCatalog Integration
§  Oozie now supports HCatalog datasets, in addition to HDFS
›  Query HCat server directly -OR-
›  Receive 'partition created' notifications
§  With HDFS datasets, poll NameNode to check data availability
›  Delay
›  Single source

[Diagram: Oozie repeatedly asks the NameNode "data exists?" for the HDFS paths /data/click/2013/03/10, /data/click/2013/03/11, /data/click/2013/03/12, …]

Latest Oozie 4.0 Features
HCatalog Integration
›  The HCat metastore has info about HDFS datasets, locations and file formats.
›  Using the HCat loader and storer, datasets can be consumed uniformly using Pig, Hive and Map/Reduce in Oozie, using the "database, table, partition" abstraction.
›  Oozie is notified of partition availability via JMS messages, to trigger workflows immediately
›  Use the JARs hcatalog-core.jar, webhcat-java-client.jar, hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar and libfb303.jar in the workflow 'lib'
§  Docs http://oozie.apache.org/docs/4.0.0/DG_HCatalogIntegration.html

<coordinator-app name="hcat-coord" … >
  <datasets>
    <dataset name="inp-logs" frequency="${coord:hours(1)}">
      <uri-template>${hcatNode}/${db}/${table}/ds=${YEAR}-${MONTH}-${DAY};region=${region}</uri-template>
      <done-flag></done-flag>
    </dataset>
    <dataset name="out-logs" frequency="${coord:days(1)}">
      <uri-template>${hcatNode}/${db}/${outputtable}/ds=${dataOut};region=${region}</uri-template>
      <done-flag></done-flag>
    </dataset>
  ...
  <property>
    <name>FILTER</name>
    <value>${coord:dataInPartitionFilter('input', 'pig')}</value>

Pig action script:
A = load '$DB.$TABLE' using org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY $FILTER;
C = foreach B generate foo, bar;
store C into '$OUTPUT_DB.$OUTPUT_TABLE' USING org.apache.hcatalog.pig.HCatStorer('$OUTPUT_PARTITION');

With HCatalog + Notifications
High-level Diagram
[Diagram: the components are the Data Producer, HDFS, HCatalog, a Message Bus (e.g., ActiveMQ) and Oozie]
§  The Data Producer produces data (distcp, pig, M/R..) into HDFS, e.g. /data/click/2013/03/12
§  The Data Producer updates metadata in HCatalog: ALTER TABLE click ADD PARTITION(data='2013/03/12') location 'hdfs://data/click/2013/03/12'
1. Query/Poll Partition (Oozie asks HCatalog)
2. Register Topic (Oozie registers with the Message Bus)
3. Push notification <New Partition> (HCatalog publishes to the Message Bus)
4. Notify New Partition (the Message Bus notifies Oozie)
§  Start workflow

2. Job Notifications
§  Notification events are sent on jobs' status changes
§  Messages are sent to the configured JMS-compliant message broker
§  Users should write message listeners to listen on select topics (e.g. username)
§  To filter more, apply JMS selectors on messages.
›  E.g. user, jobid, app-type, status, msg-type (JOB or SLA).
§  Docs http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html

Filter desired app-types for notification:
<property>
  <name>oozie.service.EventHandlerService.filter.app.types</name>
  <value>workflow_job, workflow_action, coordinator_job, coordinator_action</value>
</property>

Notification Msg Example: Coordinator Action Failure Event
›  Header (Selectors)
•  AppType – Coordinator_Action
•  Status - FAILURE
•  User
•  App-Name
›  Message Body (JSON)
•  ID (coord action id)
•  Parent ID (coord Job ID)
•  NominalTime
•  StartTime
•  EndTime
•  Status - FAILED, KILLED, SUSPENDED, TIMEDOUT
•  Error-Code, Error-Message (if KILLED or FAILED)
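
The split between header selectors and a JSON body can be imitated in memory with plain Python. This sketch does not talk to a JMS broker and does not use Oozie's actual property names: the header keys simply reuse the labels on the slide, and the user names, IDs and error text are made up for illustration. A real listener would hand a selector expression to the broker rather than filter on the client.

```python
import json


def matches(headers, wanted):
    """True when every wanted header equals the value carried by the message."""
    return all(headers.get(key) == value for key, value in wanted.items())


def select(messages, wanted):
    """Decode the JSON body of every message whose headers match `wanted`."""
    return [json.loads(m["body"]) for m in messages if matches(m["headers"], wanted)]


# Hypothetical messages: keys follow the slide labels, values are invented.
MESSAGES = [
    {
        "headers": {"AppType": "Coordinator_Action", "Status": "FAILURE", "User": "joe"},
        "body": json.dumps({"ID": "coord-action-1", "Status": "KILLED",
                            "Error-Code": "E-EXAMPLE"}),
    },
    {
        "headers": {"AppType": "Coordinator_Action", "Status": "SUCCESS", "User": "joe"},
        "body": json.dumps({"ID": "coord-action-2", "Status": "SUCCEEDED"}),
    },
    {
        "headers": {"AppType": "Workflow_Job", "Status": "FAILURE", "User": "amy"},
        "body": json.dumps({"ID": "wf-job-1", "Status": "FAILED"}),
    },
]

if __name__ == "__main__":
    failures = select(MESSAGES, {"AppType": "Coordinator_Action", "Status": "FAILURE"})
    for body in failures:
        print(body["ID"], body["Status"])
```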
3. SLA Monitoring
§  Oozie can actively track SLAs on jobs'
›  Start-time, End-time, Duration
§  Event Status
›  START_MET, START_MISS
›  END_MET, END_MISS
›  DURATION_MET, DURATION_MISS
§  At any time, the SLA processing stage will reflect:
›  Not_Started <-- Job not yet begun
›  In_Process <-- Job started and is running, and SLAs are being tracked
›  Met <-- caused by an END_MET
›  Miss <-- caused by an END_MISS
§  Access/Filter SLA info via
›  Web-console dashboard
›  REST API
›  JMS Messages
›  Email alert
§  Docs http://oozie.apache.org/docs/4.0.0/DG_SLAMonitoring.html

<workflow-app xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2" name="sla-wf">
  ...
  <end name="end"/>
  <sla:info>
    <sla:nominal-time>${nominalTime}</sla:nominal-time>
    <sla:should-start>${shouldStart}</sla:should-start>
    <sla:should-end>${shouldEnd}</sla:should-end>
    <sla:max-duration>${duration}</sla:max-duration>
    <sla:alert-events>start_miss,end_miss</sla:alert-events>
    <sla:alert-contact>joe@yahoo</sla:alert-contact>
  </sla:info>
</workflow-app>
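
How the MET and MISS events relate to the SLA settings can be sketched in Python. This is a simplified model, not Oozie's SLA service: it assumes that the expected start and end are deadlines derived from the nominal time, and it evaluates all three checks once the job has finished, whereas Oozie tracks them while the job runs. The function names `sla_events` and `sla_status` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def sla_events(expected_start, expected_end, max_duration, actual_start, actual_end):
    """Return the three event statuses for a finished job."""
    return [
        "START_MET" if actual_start <= expected_start else "START_MISS",
        "DURATION_MET" if actual_end - actual_start <= max_duration else "DURATION_MISS",
        "END_MET" if actual_end <= expected_end else "END_MISS",
    ]


def sla_status(events):
    """Final SLA status: Met is caused by END_MET, Miss by END_MISS."""
    return "Met" if "END_MET" in events else "Miss"


if __name__ == "__main__":
    nominal = datetime(2013, 10, 16, 1, 0)
    events = sla_events(
        expected_start=nominal + timedelta(minutes=10),  # assumed start deadline
        expected_end=nominal + timedelta(minutes=30),    # assumed end deadline
        max_duration=timedelta(minutes=15),
        actual_start=nominal + timedelta(minutes=12),
        actual_end=nominal + timedelta(minutes=25),
    )
    print(events, sla_status(events))
```

In this example the job starts two minutes late but still finishes before its end deadline, so a start miss does not prevent the final status from being Met.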

SLA Monitoring Dashboard
[Screenshot: SLA monitoring dashboard]
Demo

Checking Oozie Job
1. CLI (yoozie_client)
$ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://localhost:8020/user/joe/workflows/map-reduce
Status        : SUCCEEDED
Run           : 0
User          : joe
Group         : users
Created       : 2009-05-26 05:01
Started       : 2009-05-26 05:01
Ended         : 2009-05-26 05:01
Actions
------------------------------------------------------------------------------------------------------------------------
Action Name  Type        Status  Transition  External Id            External Status  Error Code  Start             End
------------------------------------------------------------------------------------------------------------------------
hadoop1      map-reduce  OK      end         job_200904281535_0254  SUCCEEDED        -           2009-05-26 05:01  2009-05-26 05:01
------------------------------------------------------------------------------------------------------------------------

Checking / Debugging Oozie Jobs
2. Web-Console
e.g. http://my-oozie-server:4080/oozie

Docs - https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook
What else is out there?
Oozie at ASF

Oozie vs. Other Workflow Systems

                    | Oozie                                           | Azkaban              | Luigi
Champion            | Yahoo! (now ASF)                                | LinkedIn             | Spotify
Apache Affiliation  | TLP                                             | License only         | License only
Language            | Java                                            | Java                 | Python
Adoption            | High, part of all standard Hadoop distributions | Low                  | Low
Code Complexity     | High (>100K lines)                              | Medium (<50K lines)  | Low (<10K lines)
Hadoop Job Support  | Extensive built-in support                      | Limited job types    | Limited job types
Docs & Support      | Excellent                                       | Limited              | Limited
Auth.               | Kerberos, custom                                | xml-based, custom    | Linux-based
Reruns              | Yes (recovery, retries at all levels)           | Partial              | After removing output, idempotent
UI                  | Average                                         | Good                 | -
Roadmap

The Next Release
§  Scalability and performance improvements to handle higher loads
›  More 1 and 5 min frequency jobs
§  High Availability with Load Balancing
§  Flexible Cron-Based Scheduling
§  Handling cluster rolling upgrades for Hadoop 2.0

Q & A
More Related Content

What's hot

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Rohit Agrawal
 
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NYahoo Developer Network
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12mislam77
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessYahoo Developer Network
 
Oozie Summit 2011
Oozie Summit 2011Oozie Summit 2011
Oozie Summit 2011mislam77
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowTXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowMatt Ray
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data storesTomas Doran
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...All Things Open
 
Learn you some Ansible for great good!
Learn you some Ansible for great good!Learn you some Ansible for great good!
Learn you some Ansible for great good!David Lapsley
 

What's hot (20)

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10
 
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification Process
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
Oozie Summit 2011
Oozie Summit 2011Oozie Summit 2011
Oozie Summit 2011
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
 
October 2014 HUG : Oozie HA
October 2014 HUG : Oozie HAOctober 2014 HUG : Oozie HA
October 2014 HUG : Oozie HA
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Hadoop HDFS
Hadoop HDFS Hadoop HDFS
Hadoop HDFS
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowTXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data stores
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...
 
Gradle - Build System
Gradle - Build SystemGradle - Build System
Gradle - Build System
 
Learn you some Ansible for great good!
Learn you some Ansible for great good!Learn you some Ansible for great good!
Learn you some Ansible for great good!
 

Viewers also liked

처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, CoordinatorKim Log
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Sumeet Singh
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
Hue architecture in the Hadoop ecosystem and SQL Editor
Hue architecture in the Hadoop ecosystem and SQL EditorHue architecture in the Hadoop ecosystem and SQL Editor
Hue architecture in the Hadoop ecosystem and SQL EditorRomain Rigaux
 
HiveServer2 for Apache Hive
HiveServer2 for Apache HiveHiveServer2 for Apache Hive
HiveServer2 for Apache HiveCarl Steinbach
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache PigTapan Avasthi
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
An introduction to hadoop
An introduction to hadoopAn introduction to hadoop
An introduction to hadoopMinJae Kang
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 

Viewers also liked (13)

Hive tuning
Hive tuningHive tuning
Hive tuning
 
처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
Hue architecture in the Hadoop ecosystem and SQL Editor
Hue architecture in the Hadoop ecosystem and SQL EditorHue architecture in the Hadoop ecosystem and SQL Editor
Hue architecture in the Hadoop ecosystem and SQL Editor
 
HiveServer2 for Apache Hive
HiveServer2 for Apache HiveHiveServer2 for Apache Hive
HiveServer2 for Apache Hive
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
An introduction to hadoop
An introduction to hadoopAn introduction to hadoop
An introduction to hadoop
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 

Similar to October 2013 HUG: Oozie 4.x

How to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauHow to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauCodemotion
 
Rapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 PlatformRapid Application Development with WSO2 Platform
Rapid Application Development with WSO2 PlatformWSO2
 
October 2013 HUG: Oozie 4.x

  • 1. Oozie – Now and Beyond. Presented by Mona Chitnis | Hadoop User Group, Yahoo Sunnyvale, October 16, 2013
  • 2. Team In Action: Alejandro Abdelnur, Mohammad Islam, Rohini Palaniswamy, Robert Kanter, Virag Kothari, Mona Chitnis, Ryota Egashira, Michelle Chiang, Bowen Zhang
  • 4. Overview: Why Oozie?
      The Problem
      • Doing something on the grid often required multiple steps
          • MapReduce job
          • Pig job
          • Streaming job
          • HDFS operation (mkdir, chmod, etc.)…
      • Multiple ad-hoc solutions existed
          • custom job control
          • shell scripts
          • cron…
      • Cost of building and running apps was high
          • development and applications engineering
          • support, operations, and hardware
      The Need
      • Workflow scheduler with better support for grid jobs (native integration with Hadoop)
          • orchestrate dependency between jobs
          • execute at specific time or on data availability
          • retry jobs in the event of failures (reliable)
      • Common framework for communication and execution of production process
          • sync (clocked dataset) awareness
          • async (unspecified scheduling freq) data awareness
      • Horizontally scalable and extensible system
          • Open-source
          • Workflows to couple resources instead of having a monolithic code base
      A server-based workflow system to manage Hadoop jobs
  • 5. Overview: Oozie – A Workflow Engine
      • Oozie executes a workflow defined as a DAG of jobs
      • Job types include MapReduce, Pig, Hive, shell script, custom Java code, etc.
      • Introduced in Oozie 1.x
      • Control-flow nodes: start, kill, end | fork, join, decision
      • Action nodes: map reduce, pig, hive, distcp, java, fs, sub-workflow, shell, ssh, email
      [Diagram: example workflow DAG. Nodes: start, M/R jobs, fork, Pig job, join, decision (branches MORE and ENOUGH), Java, FS job, end; transitions labelled OK and ERROR, with a kill node.]
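As a rough illustration of the control-flow and action nodes listed on this slide, a workflow definition might look like the sketch below. It is not taken from the deck: the workflow name, node names, paths and the `needMore` property are hypothetical, and `nameNode` is assumed to be supplied in the job properties. Only `fs` actions are used to keep the sketch short.

```xml
<!-- Sketch only: names, paths and the needMore property are hypothetical. -->
<workflow-app name="sketch-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="split"/>

    <!-- fork: start two actions in parallel -->
    <fork name="split">
        <path start="make-dir-a"/>
        <path start="make-dir-b"/>
    </fork>

    <action name="make-dir-a">
        <fs>
            <mkdir path="${nameNode}/tmp/sketch/a"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <action name="make-dir-b">
        <fs>
            <mkdir path="${nameNode}/tmp/sketch/b"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <!-- join: wait for both forked paths before continuing -->
    <join name="merge" to="check"/>

    <!-- decision: pick the next node from a job property -->
    <decision name="check">
        <switch>
            <case to="make-dir-extra">${wf:conf('needMore') eq 'true'}</case>
            <default to="done"/>
        </switch>
    </decision>

    <action name="make-dir-extra">
        <fs>
            <mkdir path="${nameNode}/tmp/sketch/extra"/>
        </fs>
        <ok to="done"/>
        <error to="fail"/>
    </action>

    <!-- kill: reached through any error transition -->
    <kill name="fail">
        <message>Failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="done"/>
</workflow-app>
```

Because a workflow is a DAG, the decision node can only branch forward; it cannot loop back to an earlier node.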
  • 6. Overview: Example M/R Action [Figure: map-reduce action definition annotated with the callouts JT and NN, Mapper, Reducer, Input Directory, Output Directory, Queue Name]
  • 7. Overview: Workflow State Transitions [Figure: state transition diagram. Source: Chicago HUG, Dec 2012]
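The figure on this slide did not survive extraction. A minimal sketch of a map-reduce action covering the same six callouts could look as follows. The class names, paths, transition targets and the `jobTracker`, `nameNode` and `queueName` properties are assumptions for illustration, not the content of the original slide. The fragment belongs inside a `<workflow-app>`.

```xml
<!-- Sketch only: class names, paths and property values are hypothetical. -->
<action name="mr-node">
    <map-reduce>
        <!-- JT and NN -->
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Mapper -->
            <property>
                <name>mapred.mapper.class</name>
                <value>com.example.SketchMapper</value>
            </property>
            <!-- Reducer -->
            <property>
                <name>mapred.reducer.class</name>
                <value>com.example.SketchReducer</value>
            </property>
            <!-- Input directory -->
            <property>
                <name>mapred.input.dir</name>
                <value>/user/${wf:user()}/input</value>
            </property>
            <!-- Output directory -->
            <property>
                <name>mapred.output.dir</name>
                <value>/user/${wf:user()}/output</value>
            </property>
            <!-- Queue name -->
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```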
  • 8. Overview: Oozie (Coordinator) – A Scheduler
      • Oozie executes workflows based on
          • time dependency (frequency)
          • data dependency
      • Introduced in 2.x
      [Diagram: Oozie Client, WS API, Oozie Server (Oozie Coordinator, Oozie Workflow), Check Data Availability, HDFS/HCat]
  • 9. Overview: Oozie (Bundle) – A Pipeline Framework
      • Users can define and execute a "bundle" of coordinator apps
          • large scale data processing (inter-related coordinators)
          • operability and manageability of pipelines
      • Users can start/stop/suspend/resume/rerun at the bundle level
      • Introduced in 3.x; bundles are optional
      [Diagram: Oozie Client, WS API, Oozie Server (Bundle, Oozie Coordinator, Oozie Workflow), Check Data Availability, HDFS/HCat]
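To make the bundle idea concrete, a bundle definition that groups two coordinator apps might look like the sketch below. The bundle and coordinator names, application paths, kick-off time and the `reportQueue` property are hypothetical; `nameNode` is assumed to come from the job properties.

```xml
<!-- Sketch only: names, paths and times are hypothetical. -->
<bundle-app name="sketch-bundle" xmlns="uri:oozie:bundle:0.2">
    <controls>
        <!-- when the bundle should start submitting its coordinators -->
        <kick-off-time>2013-10-16T00:00Z</kick-off-time>
    </controls>

    <coordinator name="ingest-coord">
        <app-path>${nameNode}/apps/ingest-coord</app-path>
    </coordinator>

    <coordinator name="report-coord">
        <app-path>${nameNode}/apps/report-coord</app-path>
        <configuration>
            <property>
                <name>reportQueue</name>
                <value>default</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>
```

Operations such as suspend, resume or kill issued against the bundle job then apply to both coordinators together.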
  • 10. Overview: Layers of Abstraction in Oozie
      1. Bundle: a bundle groups coordinator jobs (Coord Job)
      2. Coordinator: coordinator actions (Coord Action), each paired with a workflow job (WF Job)
      3. Workflow: M/R jobs and Pig jobs
  • 11. Overview: Architectural Overview
      [Diagram: Oozie (Java Web-App). Labels recoverable from the figure:]
      • Web Services (JSON/REST API), Security
      • WS API and WS Callback operations: submit, start, rerun, callback, suspend, resume, kill, signal, job info
      • DAG Engine
      • Commands (check action, start action, end action, notification), executed asynchronously via the Command Queue
      • Command Executor Thread Pool, Recovery Daemon Thread, Instrumentation
      • WF store, WF lib, Oracle DB
      • Action Executors (M/R, Pig, fs, sub-wf): pluggable, to support additional action types
  • 12. Overview: Oozie Security, Multi-tenancy and Scalability
      1. Auth. (Kerberos, Y! specific) between End User and Oozie Server
      2. Create Launcher Job (super-user)
      3. Execute User Job (doAs)
      4. Response
      5. Async Callback
      [Diagram: End User, Oozie Server, Hadoop Cluster (YARN RM), Launcher Mapper, Actual M/R Job]
  • 14. Use Cases and Common Patterns: Use Case 1: Time Triggers. Execute your workflow every 15 minutes. [Timeline: 00:15, 00:30, 00:45, 01:00]
  • 15. Use Cases and Common Patterns: Use Case 2: Time and Data Triggers. Materialize your workflow every hour, but run it only when the input data (which is loaded to the grid every hour) is ready. [Timeline: 01:00, 02:00, 03:00, 04:00, each run gated by the check "Hadoop: Input Data Exists?"]
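A time-only trigger like this needs nothing more than a coordinator frequency. The sketch below assumes a hypothetical coordinator name, start and end times, and workflow path; only the 15-minute frequency comes from the slide.

```xml
<!-- Sketch only: name, start/end times and app path are hypothetical. -->
<coordinator-app name="every-15-min" frequency="${coord:minutes(15)}"
                 start="2013-10-16T00:15Z" end="2013-10-17T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- workflow to run at 00:15, 00:30, 00:45, 01:00, ... -->
            <app-path>${nameNode}/apps/sketch-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```

With no `<input-events>` section, each materialized action runs as soon as its nominal time arrives.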
  • 16. Use Cases and Common Patterns: Use Case 2: Time and Data Triggers
      <coordinator-app name="coord1" frequency="${coord:hours(1)}" …>
        <!-- Dataset definition -->
        <datasets>
          <dataset name="logs" frequency="${coord:hours(1)}" initial-instance="2009-01-01T23:59Z">
            <uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
          </dataset>
        </datasets>
        <!-- Input events definition: current(0) is the dataset instance at the time the coordinator action is materialized (created) -->
        <input-events>
          <data-in name="inputLogs" dataset="logs">
            <instance>${coord:current(0)}</instance>
          </data-in>
        </input-events>
        <!-- Action definition -->
        <action>
          <workflow>
            <app-path>hdfs://bar:9000/usr/abc/logsprocessor-wf</app-path>
            <configuration>
              <property>
                <name>inputData</name>
                <value>${coord:dataIn('inputLogs')}</value>
              </property>
            </configuration>
          </workflow>
        </action>
      </coordinator-app>
  • 17. Use Cases and Common Patterns: Use Case 3: Rolling Window. Access 15-minute datasets and roll them up into hourly datasets. [Timeline: 00:15, 00:30, 00:45, 01:00 roll up into 01:00; 01:15, 01:30, 01:45, 02:00 roll up into 02:00]
  • 18. Use Cases and Common Patterns: Use Case 4: Sliding Window. Access the last 24 hours of data and roll them up every hour. [Timeline: 01:00 … 24:00 roll up into 24:00; 02:00 … +1 day 01:00 roll up into +1 day 01:00; 03:00 … +1 day 02:00 roll up into +1 day 02:00]
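One way to express this rolling window is an hourly coordinator whose input is a range of four instances of a 15-minute dataset. The sketch below is an assumption-laden illustration: dataset names, URI templates, dates and the workflow path are hypothetical. At the 01:00 action, `coord:current(-3)` through `coord:current(0)` resolve to the 00:15, 00:30, 00:45 and 01:00 instances.

```xml
<!-- Sketch only: names, paths and dates are hypothetical. -->
<coordinator-app name="rollup-15min-to-hourly" frequency="${coord:hours(1)}"
                 start="2013-10-16T01:00Z" end="2013-10-17T01:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="quarterHourly" frequency="${coord:minutes(15)}"
                 initial-instance="2013-10-16T00:15Z" timezone="UTC">
            <uri-template>${nameNode}/data/15min/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
        </dataset>
        <dataset name="hourly" frequency="${coord:hours(1)}"
                 initial-instance="2013-10-16T01:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/hourly/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- the four 15-minute instances ending at the action's nominal time -->
        <data-in name="window" dataset="quarterHourly">
            <start-instance>${coord:current(-3)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <output-events>
        <data-out name="rolledUp" dataset="hourly">
            <instance>${coord:current(0)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${nameNode}/apps/rollup-wf</app-path>
            <configuration>
                <property>
                    <name>inputData</name>
                    <value>${coord:dataIn('window')}</value>
                </property>
                <property>
                    <name>outputDir</name>
                    <value>${coord:dataOut('rolledUp')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Consecutive actions consume disjoint groups of four instances, which is what distinguishes a rolling window from a sliding one.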
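A sliding window can be sketched the same way, with both the coordinator and the dataset running hourly and each action reading the 24 most recent instances. The names, paths and dates below are hypothetical; the start time is chosen so that the first action already has 24 instances available.

```xml
<!-- Sketch only: names, paths and dates are hypothetical. -->
<coordinator-app name="sliding-24h" frequency="${coord:hours(1)}"
                 start="2013-10-17T00:00Z" end="2013-10-18T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="hourlyLogs" frequency="${coord:hours(1)}"
                 initial-instance="2013-10-16T01:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- 24 hourly instances: current(-23) is 23 hours before the nominal time -->
        <data-in name="last24h" dataset="hourlyLogs">
            <start-instance>${coord:current(-23)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${nameNode}/apps/sliding-wf</app-path>
            <configuration>
                <property>
                    <name>inputData</name>
                    <value>${coord:dataIn('last24h')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Each hourly action shares 23 of its 24 input instances with the previous one, so the window slides forward by one hour per run.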
  • 19. Where are We Today: Proven Scale and Multi-tenancy
      • 2.8 M jobs/month
      • 13,000 jobs/server day
      • 16% of all Hadoop jobs
      • 75 products
      • 255 monthly users
      • 2,000+ projects
      • 5.4 M compute hrs/month
      • 770,000 workflows
          • Between 1-8 actions
          • Avg. 4 actions/workflow
      • 250 coordinator jobs/day
      • 17 clusters
      • 67% of Oozie jobs kicked off through the coordinator
  • 20. Where are We Today. Mix Of Job Types For Workflows. [Stacked bar chart, 0% to 100% of jobs, with segments for Pig, MapReduce, Java and Other; the segment labels read 39%, 29%, 28% and 4%.] Sample use of job types (Pig, MapReduce, Java, Others such as DistCp and shell):
      §  Data processing/filtering
      §  Aggregation
      §  Publishing data (HDFS/HCat)
      §  Legacy code and logic
      §  Data copy/transfer
  • 22. What’s New in Oozie. Existing Features (Oozie 3.x)
      §  HBase access through Oozie, via credentials
      §  HCatalog access through Oozie, via credentials
      §  Email action
      §  DistCp action (intra- as well as inter-cluster copy)
      §  Shell action (run any script, e.g. perl, python, hadoop CLI)
      §  Workflow dry-run and fork-join validation
      §  Bulk monitoring (REST API)
      §  Coordinator EL functions for parameterized workflows
      §  Job DAG
  • 23. What’s New in Oozie. HBase Credentials
      §  In workflow.xml:
          §  Add a "credentials" section with type "hbase".
          §  Specify which action uses the credentials (the cred attribute).
      §  Put hbase-site.xml in the Oozie application path, and use <file> in workflow.xml to put hbase-site.xml in the distributed cache. A copy of hbase-site.xml can be found at gateway:/home/gs/conf/hbase/hbase-site.xml.
      §  Put the jars guava-*.jar, zookeeper-*.jar, hbase-*.jar and protobuf-java-*.jar in the workflow "lib" dir.
      §  Make sure you are using Oozie XSD version 0.3 or above, which the credentials tag requires.
      <workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.3">
        <credentials>
          <credential name="hbase.cert" type="hbase">
            <!-- optional properties: zookeeper.znode.parent, hbase.zookeeper.quorum -->
          </credential>
        </credentials>
        <start to="map-reduce-action"/>
        <action name="map-reduce-action" cred="hbase.cert">
          <map-reduce>
            <configuration>
              <property>
                <name>mapred.mapper.class</name>
                <value>SampleMapperHBase</value>
              </property>
              <property>
                <name>mapred.reducer.class</name>
                <value>org.apache.oozie.example.DemoReducer</value>
              </property>
            </configuration>
            <file>hbase-site.xml#hbase-site.xml</file>
          </map-reduce>
          ...
        </action>
        ...
      </workflow-app>
      §  Refer to http://twiki.corp.yahoo.com/view/CCDI/UseHbaseCred
  • 24. What’s New in Oozie. Oozie 4.0: (1) HCatalog Integration, (2) Job Notifications, (3) SLA Monitoring
  • 25. What’s New in Oozie. 1 HCatalog Integration
      §  Oozie now supports HCatalog datasets, in addition to HDFS
          §  Query the HCat server directly -OR-
          §  Receive ‘partition created’ notifications
      §  With HDFS datasets, Oozie polls the NameNode to check data availability
          §  Delay
          §  Single source
      [Diagram: Oozie repeatedly asks the NameNode "data exists?" for HDFS paths such as /data/click/2013/03/10, /data/click/2013/03/11 and /data/click/2013/03/12.]
  • 26. What’s New in Oozie. Latest Oozie 4.0 Features. HCatalog Integration
      ›  The HCat metastore has info about HDFS datasets, locations and file formats.
      ›  Using the HCat loader and storer, datasets can be consumed uniformly by Pig, Hive and Map/Reduce in Oozie, using the "database, table, partition" abstraction.
      ›  Oozie is notified of partition availability via JMS messages, so it can trigger workflows immediately.
      ›  Use the JARs hcatalog-core.jar, webhcat-java-client.jar, hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar and libfb303.jar in the workflow ‘lib’ dir.
      §  Docs: http://oozie.apache.org/docs/4.0.0/DG_HCatalogIntegration.html
      <coordinator-app name="hcat-coord" … >
        <datasets>
          <dataset name="inp-logs" frequency="${coord:hours(1)}">
            <uri-template>${hcatNode}/${db}/${table}/ds=${YEAR}-${MONTH}-${DAY};region=${region}</uri-template>
            <done-flag></done-flag>
          </dataset>
          <dataset name="out-logs" frequency="${coord:days(1)}">
            <uri-template>${hcatNode}/${db}/${outputtable}/ds=${dataOut};region=${region}</uri-template>
            <done-flag></done-flag>
          </dataset>
        ...
        <property>
          <name>FILTER</name>
          <value>${coord:dataInPartitionFilter('input', 'pig')}</value>
        </property>
      Pig action script:
        A = load '$DB.$TABLE' using org.apache.hcatalog.pig.HCatLoader();
        B = FILTER A BY $FILTER;
        C = foreach B generate foo, bar;
        store C into '$OUTPUT_DB.$OUTPUT_TABLE' USING org.apache.hcatalog.pig.HCatStorer('$OUTPUT_PARTITION');
  • 27. What’s New in Oozie. With HCatalog + Notifications. High-level Diagram. [Diagram: the Data Producer produces data (distcp, pig, M/R..) into HDFS at /data/click/2013/03/12, then updates the metadata in HCatalog: ALTER TABLE click ADD PARTITION(data='2013/03/12') location 'hdfs://data/click/2013/03/12'.]
  • 28. What’s New in Oozie. With HCatalog + Notifications. High-level Diagram. [Diagram: alongside the Data Producer and HDFS, Oozie (1) queries/polls HCatalog for the partition and (2) registers a topic on the Message Bus (e.g., ActiveMQ).]
  • 29. What’s New in Oozie. With HCatalog + Notifications. High-level Diagram. [Diagram: the Data Producer produces data (distcp, pig, M/R..) into HDFS at /data/click/2013/03/12 and updates the metadata in HCatalog: ALTER TABLE click ADD PARTITION(data='2013/03/12') location 'hdfs://data/click/2013/03/12'. Oozie (1) queries/polls HCatalog for the partition and (2) registers a topic on the Message Bus (e.g., ActiveMQ). HCatalog then (3) pushes a <New Partition> notification to the Message Bus, which (4) notifies Oozie of the new partition, and Oozie starts the workflow.]
  • 30. What’s New in Oozie. Latest Oozie 4.0 Features. 2 Job Notifications
      §  A notification event is sent on each job status change
      §  Messages are sent to the configured JMS-compliant message broker
      §  Users should write message listeners to listen on select topics (e.g. username)
      §  To filter further, apply JMS selectors on messages
          §  E.g. user, jobid, app-type, status, msg-type (JOB or SLA)
      §  Docs: http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html
      Filter desired app-types for notification:
      <property>
        <name>oozie.service.EventHandlerService.filter.app.types</name>
        <value>workflow_job, workflow_action, coordinator_job, coordinator_action</value>
      </property>
      Notification Msg Example: Coordinator Action Failure Event
      ›  Header (Selectors)
          •  AppType: Coordinator_Action
          •  Status: FAILURE
          •  User
          •  App-Name
      ›  Message Body (JSON)
          •  ID (coord action id)
          •  Parent ID (coord Job ID)
          •  NominalTime
          •  StartTime
          •  EndTime
          •  Status: FAILED, KILLED, SUSPENDED, TIMEDOUT
          •  Error-Code, Error-Message (if KILLED or FAILED)
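The selector idea can be sketched without a message broker: a listener keeps only the messages whose header fields equal the values it asks for. This is plain Python over dictionaries, not a JMS client, and the header keys below simply reuse the names listed on the slide as an assumption about their spelling.

```python
def matches(headers, criteria):
    """Return True when every requested header field has the requested value."""
    return all(headers.get(key) == value for key, value in criteria.items())


# Hypothetical message headers, for illustration only.
messages = [
    {"user": "joe", "app-type": "COORDINATOR_ACTION", "status": "FAILURE", "msg-type": "JOB"},
    {"user": "joe", "app-type": "WORKFLOW_JOB", "status": "SUCCESS", "msg-type": "JOB"},
    {"user": "ann", "app-type": "COORDINATOR_ACTION", "status": "FAILURE", "msg-type": "SLA"},
]

# Keep only joe's failed coordinator actions.
wanted = {"user": "joe", "app-type": "COORDINATOR_ACTION", "status": "FAILURE"}
selected = [m for m in messages if matches(m, wanted)]
```

With a real broker the same filter would be passed as a selector expression when the consumer is created, so unwanted messages are never delivered to the listener.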
  • 31. What’s New in Oozie. Latest Oozie 4.0 Features. 3 SLA Monitoring
      §  Oozie can actively track SLAs on jobs’ start time, end time and duration
      §  Event status
          §  START_MET, START_MISS
          §  END_MET, END_MISS
          §  DURATION_MET, DURATION_MISS
      §  At any time, the SLA processing stage will reflect:
          §  Not_Started <-- Job not yet begun
          §  In_Process <-- Job started and is running, and SLAs are being tracked
          §  Met <-- caused by an END_MET
          §  Miss <-- caused by an END_MISS
      §  Access/filter SLA info via
          §  Web-console dashboard
          §  REST API
          §  JMS messages
          §  Email alert
      §  Docs: http://oozie.apache.org/docs/4.0.0/DG_SLAMonitoring.html
      <workflow-app xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2" name="sla-wf">
        ...
        <end name="end"/>
        <sla:info>
          <sla:nominal-time>${nominalTime}</sla:nominal-time>
          <sla:should-start>${shouldStart}</sla:should-start>
          <sla:should-end>${shouldEnd}</sla:should-end>
          <sla:max-duration>${duration}</sla:max-duration>
          <sla:alert-events>start_miss,end_miss</sla:alert-events>
          <sla:alert-contact>joe@yahoo</sla:alert-contact>
        </sla:info>
      </workflow-app>
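The relationship between the three SLA checks and the final processing stage can be sketched as follows. This is a simplified model for a job that has already finished, using absolute times rather than the offsets a workflow definition would carry. The class and function names are hypothetical, not Oozie's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sla:
    should_start: datetime   # latest acceptable start time
    should_end: datetime     # latest acceptable end time
    max_duration: timedelta  # longest acceptable run time


def sla_events(sla, actual_start, actual_end):
    """Return the start, end and duration event statuses for a finished job."""
    duration = actual_end - actual_start
    return [
        "START_MET" if actual_start <= sla.should_start else "START_MISS",
        "END_MET" if actual_end <= sla.should_end else "END_MISS",
        "DURATION_MET" if duration <= sla.max_duration else "DURATION_MISS",
    ]


def final_stage(events):
    """Per the slide, the final stage is driven only by the end event."""
    return "Met" if "END_MET" in events else "Miss"


sla = Sla(datetime(2013, 10, 16, 1, 10), datetime(2013, 10, 16, 1, 40), timedelta(minutes=20))
events = sla_events(sla, datetime(2013, 10, 16, 1, 15), datetime(2013, 10, 16, 1, 30))
```

In this example the job starts five minutes late but still finishes on time, so a start_miss alert would fire while the overall stage ends as Met.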
  • 32. What’s New in Oozie. SLA Monitoring Dashboard
  • 33. Demo. Checking Oozie Job. 1. CLI (yoozie_client)
      $ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
      ------------------------------------------------------------
      Workflow Name : map-reduce-wf
      App Path      : hdfs://localhost:8020/user/joe/workflows/map-reduce
      Status        : SUCCEEDED
      Run           : 0
      User          : joe
      Group         : users
      Created       : 2009-05-26 05:01
      Started       : 2009-05-26 05:01
      Ended         : 2009-05-26 05:01
      Actions
      ------------------------------------------------------------
      Action Name  Type        Status  Transition  External Id            External Status  Error Code  Start             End
      ------------------------------------------------------------
      hadoop1      map-reduce  OK      end         job_200904281535_0254  SUCCEEDED        -           2009-05-26 05:01  2009-05-26 05:01
      ------------------------------------------------------------
  • 34. Demo. Checking / Debugging Oozie Jobs. 2. Web-Console, e.g. http://my-oozie-server:4080/oozie (Docs: https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook)
  • 35. What else is out there?
  • 36. Oozie at ASF. Oozie vs. Other Workflow Systems
      Attribute           | Oozie                                           | LinkedIn's system    | Spotify's system
      Champion            | Yahoo! (now ASF)                                | LinkedIn             | Spotify
      Apache Affiliation  | TLP                                             | License only         | License only
      Language            | Java                                            | Java                 | Python
      Adoption            | High, part of all standard Hadoop distributions | Low                  | Low
      Code Complexity     | High (>100K lines)                              | Medium (<50K lines)  | Low (<10K lines)
      Hadoop Job Support  | Extensive built-in support                      | Limited job types    | Limited job types
      Docs & Support      | Excellent                                       | Limited              | Limited
      Auth.               | Kerberos, custom                                | XML-based, custom    | Linux-based
      Reruns              | Yes (recovery, retries at all levels)           | Partial              | After removing output, idempotent
      UI                  | Average                                         | Good                 | -
  • 37. Roadmap. The Next Release
      §  Scalability and performance improvements to handle higher loads
      §  More 1- and 5-minute frequency jobs
      §  High availability with load balancing
      §  Flexible cron-based scheduling
      §  Handling cluster rolling upgrades for Hadoop 2.0
  • 38. Q & A