SlideShare a Scribd company logo
1 of 45
Download to read offline
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
What is Oozie
Oozie is a workflow scheduler system to manage
Apache Hadoop jobs.
Its a system for running workflows of dependent jobs
Oozie Workflow jobs are Directed Acyclical Graphs
(DAGs) of actions.
Oozie is integrated with the rest of the Hadoop stack
supporting several types of Hadoop jobs out of the
box (such as Java map-reduce, Streaming map-reduce,
Pig, Hive, Sqoop and Distcp) as well as system specific
jobs (such as Java programs and shell scripts).
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie Features
Designed to scale
Can manage the timely execution of thousands of
workflows in a Hadoop cluster
Makes rerunning failed workflows more tractable
Runs as a service in the cluster
Clients can submit workflow definitions for
immediate or later execution
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie in Hadoop Eco-System
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie Components
Composed of 2 parts:
• Workflow engine
 Stores and runs workflows composed of different types of
Hadoop jobs
• Coordinator engine
 Runs workflow jobs based on predefined schedules and data
availability
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow
Workflow is a DAG(Directed Acyclic Graph) of
action nodes and control-flow nodes.
Action node
• performs a workflow task, such as moving files in HDFS,
running a MapReduce, Streaming, Pig, or Hive job
Control-flow node
• governs the workflow execution between actions
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Schedular
Oozie executes workflow based on:
• Time Dependency (Frequency)
• Data Dependency
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie Server Setup
Oozie is distributed as two separate packages, a
client package (oozie-client) and a server package
(oozie).
We will install oozie server which also installs
oozie-client.
$ yum –y install oozie
When you install Oozie from an RPM, Oozie server
creates all configuration, documentation and
runtime files in the standard Unix directories, as
follows:
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie Server Setup
Type of File Where installed
Binaries /usr/lib/oozie/
Configuration /etc/oozie/conf/
Documentation /user/share/doc/oozie/
Examples /user/share/doc/oozie/
Sharelib TAR.GZ /usr/lib/oozie/
Data /var/lib/oozie/
Logs /var/log/oozie/
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Configuring Oozie to Use MySQL
Oozie needs a database to store all the workflow
job information
We will be configuring it to use Mysql as database
Step 1: Install and start MySQL 5.x
$ yum –y install mysql-server
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Configuring Oozie to Use MySQL
Step 2: Create the Oozie database and Oozie
MySQL user
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Configuring Oozie to Use MySQL
Step 3: Configure Oozie to use MySQL
• Edit properties in the oozie-site.xml file as follows:
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Configuring Oozie to Use MySQL
Step 4: Add the MySQL JDBC driver JAR to Oozie
• $ ln -s /usr/share/java/mysql-connector-java.jar
/var/lib/oozie/mysql-connector-java.jar
Step 5:Creating the Oozie Database Schema
After configuring Oozie database information and
creating the corresponding database, create the
Oozie database schema. Oozie provides a database
tool for this purpose.
• $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create –
run
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Configuring Oozie to Use MySQL
You should see output such as the following:
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Enabling the Oozie Web Console
By default Oozie does not enable web console.
Following steps must be followed to enable it
Step 1: Download the Library
• $ wget http://dev.sencha.com/deploy/ext-2.2.zip
Step 2: Install the Library
• $ unzip ext-2.2.zip
• $ cp -r ext-2.2 /var/lib/oozie/
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Installing the Oozie ShareLib in HDFS
The Oozie installation bundles Oozie ShareLib, which
contains all of the necessary JARs to enable workflow
jobs to run streaming, DistCp, Pig, Hive, and Sqoop
actions.
ShareLib must be copied in the home directory of
oozie user in HDFS:
• $ sudo –u hdfs hadoop fs –mkdir /user/oozie
• $ sudo –u hdfs hadoop fs –chown oozie:oozie /user/oozie
• $ mkdir /tmp/ooziesharelib
• $ cd /tmp/ooziesharelib
• $ tar –xzf /user/lib/oozie/oozie-sharelib.tar.gz
• $ sudo –u oozie hadoop fs –put share /user/oozie/share
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Starting, Stopping, and Accessing the
Oozie Server
Starting the Oozie Server
• $ service oozie start
Stopping the Oozie Server
• $ service oozie stop
Accessing the Oozie Server with the Oozie Client
• The Oozie client is a command-line utility that interacts with the Oozie server
via the Oozie web-services API
• Use the /usr/bin/oozie script to run the Oozie client.
• For example, if you want to invoke the client on the same machine where the
Oozie server is running:
• $ oozie admin –oozie http://localhost:11000/oozie -status
– System mode: NORMAL
Accessing the Oozie Server with a Browser
• If you have enabled the Oozie web console by adding the ExtJS library, you can
connect to the console at
• http://localhost:11000/oozie
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Defining an Oozie Workflow
Workflow definitions are written in XML using the
Hadoop Process Definition Language
Consists of 2 components
• Control Node
 Start
 End
 Decision
 Fork
 Join
 Kill
• Action Node
 Map-reduce
 Pig, etc..
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Control Flow Nodes
Start Control Node
• The start node is the entry point for a workflow job
• It indicates the first workflow node the workflow job must
transition to
• When a workflow is started, it automatically transitions to the
node specified in the start
• A workflow definition must have one start node
Syntax
 <workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:0.1">
 ...
 <start to="[NODE-NAME]"/>
 ...
 </workflow-app>
The node
name(action) from
which the
workflow should
start
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Control Flow Nodes
End Control Node
• The end node is the end for a workflow job
• Indicates that the workflow job has completed successfully
• When a workflow job reaches the end it finishes successfully
• If one or more actions started by the workflow job are executing
when the end node is reached, the actions will be killed
• A workflow definition must have one end node.
Syntax
 <workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:0.1">
 ...
 <end name="[NODE-NAME]"/>
 ...
 </workflow-app>
The node
name(action) on
which the
workflow should
end
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Control Flow Nodes
Kill Control Node
• The kill node allows a workflow job to kill itself
• When a workflow job reaches the kill it finishes in error
• If one or more actions started by the workflow job are executing
when the kill node is reached, the actions will be killed
• A workflow definition may have zero or more kill nodes
Syntax
 <workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:0.1">
 ...
 <kill name="[NODE-NAME]">
 <message>[MESSAGE-TO-LOG]</message>
 </kill>
 ...
 </workflow-app>
If the workflow
execution reaches
this node the
workflow will be
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Control Flow Nodes
Decision Control Node
• Enables a workflow to make a selection on the execution path to
follow
• The behavior of a decision node can be seen as a switch-case
statement
• Predicates are evaluated in order or appearance until one of them
evaluates to true and the corresponding transition is taken
• If none of the predicates evaluates to true the default transition is
taken
Syntax
 <decision name="[NODE-NAME]">
 <switch> <case to="[NODE_NAME]">[PREDICATE]</case>
 ...
 <case to="[NODE_NAME]">[PREDICATE]</case>
 <default to="[NODE_NAME]"/>
 </switch> </decision>
Switch case to
decide between
the execution of
nodes
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Control Flow Nodes
Fork and Join Control Nodes
• A fork node splits one path of execution into multiple concurrent
paths of execution
• A join node waits until every concurrent execution path of a
previous fork node arrives to it
• The fork and join nodes must be used in pairs
• Actions at fork runs parallel
Syntax
 <fork name="[FORK-NODE-NAME]">
 <path start="[NODE-NAME]" />
 ...
 <path start="[NODE-NAME]" />
 </fork>
 <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow Action Nodes
Action Basis
• Action Computation/Processing is always remote
• Actions are Asynchronous
• Actions have two transitions, ok and error
• Action Recovery
 Oozie provides recovery capabilities when starting or ending
actions
 Recovery strategies differ on the nature of failure
 For non-transient failures action is suspended
 For transient failures Oozie will perform retries after a fixed
time interval
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow Action Nodes
Fs(HDFS) Action
• The fs action allows to manipulate files and directories in HDFS from a
workflow application
• The supported commands are move , delete and mkdir
• The FS commands are executed synchronously from within the FS action
• Syntax
 <action name="[NODE-NAME]">
 <fs>
 <delete path='[PATH]'/>
 ...
 <mkdir path='[PATH]'/>
 ...
 <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>
 </fs> <ok to="[NODE-NAME]"/>
 <error to="[NODE-NAME]"/>
 </action>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow Action Nodes
Pig Action
• The pig action starts a Pig job
• The workflow job will wait until the pig job completes
before continuing to the next action
• The pig action has to be configured with the job-tracker,
name-node, pig script and the necessary parameters and
configuration to run the Pig job.
• The configuration properties are loaded in the following
order, job-xml and configuration , and later values override
earlier values.
• Hadoop mapred.job.tracker and fs.default.name properties
must not be present in the job-xml and inline configuration
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow Action Nodes
Pig Action
• Syntax
 <pig>
 <job-tracker>[JOB-TRACKER]</job-tracker>
 <name-node>[NAME-NODE]</name-node>
 <prepare> <delete path="[PATH]"/>
 ... <mkdir path="[PATH]"/> ... </prepare>
 <job-xml>[JOB-XML-FILE]</job-xml>
necessary
configuration
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Workflow Action Nodes
 <configuration>
 <property>
 <name>[PROPERTY-NAME]</name> <value>[PROPERTY-
VALUE]</value>
 </property>
 ... </configuration>
 <script>[PIG-SCRIPT]</script>
 <param>[PARAM-VALUE]</param>
 ... <param>[PARAM-VALUE]</param>
 <argument>[ARGUMENT-VALUE]</argument>
 ... <argument>[ARGUMENT-VALUE]</argument>
 <file>[FILE-PATH]</file>
 ... <archive>[FILE-PATH]</archive>
 ... </pig>
Cluster
wide
configura
tion
Pig script, its
parameters
and arguments
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Oozie Job States
A workflow job can have be in any of the following states:
• PREP: When a workflow job is first created it will be in PREP state. The
workflow job is defined but it is not running.
• RUNNING: When a CREATED workflow job is started it goes
into RUNNING state, it will remain in RUNNING state while it does not
reach its end state, ends in error or it is suspended.
• SUSPENDED: A RUNNING workflow job can be suspended, it will
remain in SUSPENDED state until the workflow job is resumed or it is
killed.
• SUCCEEDED: When a RUNNING workflow job reaches the end node it
ends reaching the SUCCEEDED final state.
• KILLED: When a CREATED , RUNNING or SUSPENDED workflow job is
killed by an administrator or the owner via a request to Oozie the
workflow job ends reaching the KILLED final state.
• FAILED: When a RUNNING workflow job fails due to an unexpected
error it ends reaching the FAILED final state.
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Example
$ cp /usr/share/doc/oozie-3.3.2+49/oozie-
examples.tar.gz .
$ tar -xvf oozie-examples.tar.gz
$ hadoop fs -put examples/ .
$ cd examples/apps/pig/
$ oozie job --oozie http://localhost:11000/oozie
-config job.properties –run
$ oozie job -oozie http://localhost:11000/oozie
-info <job_id>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Understand the Example
Pig Script
• $ cat id.pig
 A = load '$INPUT' using PigStorage(':');
 B = foreach A generate $0 as id;
 store B into '$OUTPUT' USING PigStorage();
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Understand the Example
Workflow xml
• $ cat workflow.xml
 <workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
 <start to="pig-node"/>
 <action name="pig-node">
 <pig>
 <job-tracker>${jobTracker}</job-tracker>
 <name-node>${nameNode}</name-node>
 <prepare>
 <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
 </prepare>
 <configuration>
 <property>
 <name>mapred.job.queue.name</name>
 <value>${queueName}</value>
 </property>
 <property>
 <name>mapred.compress.map.output</name>
 <value>true</value>
 </property>
 </configuration>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Understand the Example
 <script>id.pig</script>
 <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-
data/text</param>
 <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-
data/pig</param>
 </pig>
 <ok to="end"/>
 <error to="fail"/>
 </action>
 <kill name="fail">
 <message>Pig failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
 </kill>
 <end name="end"/>
 </workflow-app>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
Understand the Example
$ cat job.properties
 nameNode=hdfs://localhost:8020
 jobTracker=localhost:8021
 queueName=default
 examplesRoot=examples
 oozie.use.system.libpath=true
 oozie.wf.application.path=${nameNode}/user/${user.name}/$
{examplesRoot}/apps/pig
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
$ cd /root/examples/apps/demo
$ cat workflow.xml
 <workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
 <start to="cleanup-node"/>
 <action name="cleanup-node">
 <fs>
 <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-
data/demo"/>
 </fs>
 <ok to="fork-node"/>
 <error to="fail"/>
 </action>
 <fork name="fork-node">
 <path start="pig-node"/>
 <path start="streaming-node"/>
 </fork>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <action name="pig-node">
 <pig>
 <job-tracker>${jobTracker}</job-tracker>
 <name-node>${nameNode}</name-node>
 <prepare>
 <delete
path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-
data/demo/pig-node"/>
 </prepare>
 <configuration>
 <property>
 <name>mapred.job.queue.name</name>
 <value>${queueName}</value>
 </property>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <property>
 <name>mapred.map.output.compress</name>
 <value>false</value>
 </property>
 </configuration>
 <script>id.pig</script>
 <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-
data/text</param>

<param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-
data/demo/pig-node</param>
 </pig>
 <ok to="join-node"/>
 <error to="fail"/>
 </action>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <action name="streaming-node">
 <map-reduce>
 <job-tracker>${jobTracker}</job-tracker>
 <name-node>${nameNode}</name-node>
 <prepare>
 <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-
data/demo/streaming-node"/>
 </prepare>
 <streaming>
 <mapper>/bin/cat</mapper>
 <reducer>/usr/bin/wc</reducer>
 </streaming>
 <configuration>
 <property>
 <name>mapred.job.queue.name</name>
 <value>${queueName}</value>
 </property>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <property>
 <name>mapred.input.dir</name>
 <value>/user/${wf:user()}/${examplesRoot}/input-
data/text</value>
 </property>
 <property>
 <name>mapred.output.dir</name>
 <value>/user/${wf:user()}/${examplesRoot}/output-
data/demo/streaming-node</value>
 </property>
 </configuration>
 </map-reduce>
 <ok to="join-node"/>
 <error to="fail"/>
 </action>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <join name="join-node" to="mr-node"/>
 <action name="mr-node">
 <map-reduce>
 <job-tracker>${jobTracker}</job-tracker>
 <name-node>${nameNode}</name-node>
 <prepare>
 <delete
path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-
data/demo/mr-node"/>
 </prepare>
 <configuration>
 <property>
 <name>mapred.job.queue.name</name>
 <value>${queueName}</value>
 </property>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <property>
 <name>mapred.mapper.class</name>

<value>org.apache.oozie.example.DemoMapper</value>
 </property>
 <property>
 <name>mapred.mapoutput.key.class</name>
 <value>org.apache.hadoop.io.Text</value>
 </property>
 <property>
 <name>mapred.mapoutput.value.class</name>
 <value>org.apache.hadoop.io.IntWritable</value>
 </property>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <property>
 <name>mapred.reducer.class</name>
 <value>org.apache.oozie.example.DemoReducer</value>
 </property>
 <property>
 <name>mapred.map.tasks</name>
 <value>1</value>
 </property>
 <property>
 <name>mapred.input.dir</name>
 <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-
node,/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>
 </property>
 <property>
 <name>mapred.output.dir</name>
 <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node</value>
 </property>
 </configuration>
 </map-reduce>
 <ok to="decision-node"/>
 <error to="fail"/>
 </action>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <decision name="decision-node">
 <switch>
 <case to="hdfs-
node">${fs:exists(concat(concat(concat(concat(concat(name
Node, '/user/'), wf:user()), '/'), examplesRoot), '/output-
data/demo/mr-node')) == "true"}</case>
 <default to="end"/>
 </switch>
 </decision>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
 <action name="hdfs-node">
 <fs>
 <move source="${nameNode}/user/${wf:user()}/${examplesRoot}/output-
data/demo/mr-node"
 target="/user/${wf:user()}/${examplesRoot}/output-data/demo/final-
data"/>
 </fs>
 <ok to="end"/>
 <error to="fail"/>
 </action>
 <kill name="fail">
 <message>Demo workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
 </kill>
 <end name="end"/>
 </workflow-app>
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
A Workflow Job
• At the end of the Job Completion you will see
something like this:

More Related Content

Similar to oozieee.pdf

Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Alfredo Krieg
 
2016_1201_gangler_ppt
2016_1201_gangler_ppt2016_1201_gangler_ppt
2016_1201_gangler_ppt
Secure-24
 

Similar to oozieee.pdf (20)

Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
 
24HOP Introduction to Linux for SQL Server DBAs
24HOP Introduction to Linux for SQL Server DBAs24HOP Introduction to Linux for SQL Server DBAs
24HOP Introduction to Linux for SQL Server DBAs
 
Omaha (Google Update) server
Omaha (Google Update) serverOmaha (Google Update) server
Omaha (Google Update) server
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot Games
 
DevOps for database
DevOps for databaseDevOps for database
DevOps for database
 
Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
October 2013 HUG: Oozie 4.x
October 2013 HUG: Oozie 4.xOctober 2013 HUG: Oozie 4.x
October 2013 HUG: Oozie 4.x
 
Osquery
OsqueryOsquery
Osquery
 
Real World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js ApplicationsReal World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js Applications
 
Osgi
OsgiOsgi
Osgi
 
Oracle Fusion Middleware provisioning with Puppet
Oracle Fusion Middleware provisioning with PuppetOracle Fusion Middleware provisioning with Puppet
Oracle Fusion Middleware provisioning with Puppet
 
Monitoring Oracle Databases with Opsview
Monitoring Oracle Databases with OpsviewMonitoring Oracle Databases with Opsview
Monitoring Oracle Databases with Opsview
 
SDN Onboarding: Open vSwitch CLIs, OpenDaylight
SDN Onboarding: Open vSwitch CLIs, OpenDaylightSDN Onboarding: Open vSwitch CLIs, OpenDaylight
SDN Onboarding: Open vSwitch CLIs, OpenDaylight
 
AWS Application Migration Service-Hands-On Guide
AWS Application Migration Service-Hands-On GuideAWS Application Migration Service-Hands-On Guide
AWS Application Migration Service-Hands-On Guide
 
[HES2013] Virtually secure, analysis to remote root 0day on an industry leadi...
[HES2013] Virtually secure, analysis to remote root 0day on an industry leadi...[HES2013] Virtually secure, analysis to remote root 0day on an industry leadi...
[HES2013] Virtually secure, analysis to remote root 0day on an industry leadi...
 
Making Sense out of Amazon ECS
Making Sense out of Amazon ECSMaking Sense out of Amazon ECS
Making Sense out of Amazon ECS
 
Case Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New WorldCase Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New World
 
Tampere Technical University - Seminar Presentation in testind day 2016 - Sca...
Tampere Technical University - Seminar Presentation in testind day 2016 - Sca...Tampere Technical University - Seminar Presentation in testind day 2016 - Sca...
Tampere Technical University - Seminar Presentation in testind day 2016 - Sca...
 
Splunk: Forward me the REST of those shells
Splunk: Forward me the REST of those shellsSplunk: Forward me the REST of those shells
Splunk: Forward me the REST of those shells
 
2016_1201_gangler_ppt
2016_1201_gangler_ppt2016_1201_gangler_ppt
2016_1201_gangler_ppt
 

Recently uploaded

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
LuisMiguelPaz5
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
ℂall Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class ℂall Girl Serviℂe...
ℂall Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class ℂall Girl Serviℂe...ℂall Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class ℂall Girl Serviℂe...
ℂall Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class ℂall Girl Serviℂe...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 

oozieee.pdf

  • 1. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 What is Oozie Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Its a system for running workflows of dependent jobs Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
  • 2. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Features Designed to scale Can manage the timely execution of thousands of workflows in a Hadoop cluster Makes rerunning failed workflows more tractable Runs as a service in the cluster Clients can submit workflow definitions for immediate or later execution
  • 3. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie in Hadoop Eco-System
  • 4. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Components Composed of 2 parts: • Workflow engine  Stores and runs workflows composed of different types of Hadoop jobs • Coordinator engine  Runs workflow jobs based on predefined schedules and data availability
  • 5. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Workflow is a DAG(Directed Acyclic Graph) of action nodes and control-flow nodes. Action node • performs a workflow task, such as moving files in HDFS, running a MapReduce, Streaming, Pig, or Hive job Control-flow node • governs the workflow execution between actions
  • 6. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Schedular Oozie executes workflow based on: • Time Dependency (Frequency) • Data Dependency
  • 7. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Server Setup Oozie is distributed as two separate packages, a client package (oozie-client) and a server package (oozie). We will install oozie server which also installs oozie-client. $ yum –y install oozie When you install Oozie from an RPM, Oozie server creates all configuration, documentation and runtime files in the standard Unix directories, as follows:
  • 8. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Server Setup Type of File Where installed Binaries /usr/lib/oozie/ Configuration /etc/oozie/conf/ Documentation /user/share/doc/oozie/ Examples /user/share/doc/oozie/ Sharelib TAR.GZ /usr/lib/oozie/ Data /var/lib/oozie/ Logs /var/log/oozie/
  • 9. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Oozie needs a database to store all the workflow job information We will be configuring it to use Mysql as database Step 1: Install and start MySQL 5.x $ yum –y install mysql-server
  • 10. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 2: Create the Oozie database and Oozie MySQL user
  • 11. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 3: Configure Oozie to use MySQL • Edit properties in the oozie-site.xml file as follows:
  • 12. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 4: Add the MySQL JDBC driver JAR to Oozie • $ ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar Step 5:Creating the Oozie Database Schema After configuring Oozie database information and creating the corresponding database, create the Oozie database schema. Oozie provides a database tool for this purpose. • $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create – run
  • 13. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL You should see output such as the following:
  • 14. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Enabling the Oozie Web Console By default Oozie does not enable web console. Following steps must be followed to enable it Step 1: Download the Library • $ wget http://dev.sencha.com/deploy/ext-2.2.zip Step 2: Install the Library • $ unzip ext-2.2.zip • $ cp -r ext-2.2 /var/lib/oozie/
  • 15. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Installing the Oozie ShareLib in HDFS The Oozie installation bundles Oozie ShareLib, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions. ShareLib must be copied in the home directory of oozie user in HDFS: • $ sudo –u hdfs hadoop fs –mkdir /user/oozie • $ sudo –u hdfs hadoop fs –chown oozie:oozie /user/oozie • $ mkdir /tmp/ooziesharelib • $ cd /tmp/ooziesharelib • $ tar –xzf /user/lib/oozie/oozie-sharelib.tar.gz • $ sudo –u oozie hadoop fs –put share /user/oozie/share
  • 16. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Starting, Stopping, and Accessing the Oozie Server Starting the Oozie Server • $ service oozie start Stopping the Oozie Server • $ service oozie stop Accessing the Oozie Server with the Oozie Client • The Oozie client is a command-line utility that interacts with the Oozie server via the Oozie web-services API • Use the /usr/bin/oozie script to run the Oozie client. • For example, if you want to invoke the client on the same machine where the Oozie server is running: • $ oozie admin –oozie http://localhost:11000/oozie -status – System mode: NORMAL Accessing the Oozie Server with a Browser • If you have enabled the Oozie web console by adding the ExtJS library, you can connect to the console at • http://localhost:11000/oozie
  • 17. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Defining an Oozie Workflow Workflow definitions are written in XML using the Hadoop Process Definition Language Consists of 2 components • Control Node  Start  End  Decision  Fork  Join  Kill • Action Node  Map-reduce  Pig, etc..
  • 18. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Control Flow Nodes Start Control Node • The start node is the entry point for a workflow job • It indicates the first workflow node the workflow job must transition to • When a workflow is started, it automatically transitions to the node specified in the start • A workflow definition must have one start node Syntax  <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">  ...  <start to="[NODE-NAME]"/>  ...  </workflow-app> The node name(action) from which the workflow should start
  • 19. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Control Flow Nodes End Control Node • The end node is the end for a workflow job • Indicates that the workflow job has completed successfully • When a workflow job reaches the end it finishes successfully • If one or more actions started by the workflow job are executing when the end node is reached, the actions will be killed • A workflow definition must have one end node. Syntax  <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">  ...  <end name="[NODE-NAME]"/>  ...  </workflow-app> The node name(action) on which the workflow should end
  • 20. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Control Flow Nodes Kill Control Node • The kill node allows a workflow job to kill itself • When a workflow job reaches the kill it finishes in error • If one or more actions started by the workflow job are executing when the kill node is reached, the actions will be killed • A workflow definition may have zero or more kill nodes Syntax  <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">  ...  <kill name="[NODE-NAME]">  <message>[MESSAGE-TO-LOG]</message>  </kill>  ...  </workflow-app> If the workflow execution reaches this node the workflow will be
  • 21. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Control Flow Nodes Decision Control Node • Enables a workflow to make a selection on the execution path to follow • The behavior of a decision node can be seen as a switch-case statement • Predicates are evaluated in order or appearance until one of them evaluates to true and the corresponding transition is taken • If none of the predicates evaluates to true the default transition is taken Syntax  <decision name="[NODE-NAME]">  <switch> <case to="[NODE_NAME]">[PREDICATE]</case>  ...  <case to="[NODE_NAME]">[PREDICATE]</case>  <default to="[NODE_NAME]"/>  </switch> </decision> Switch case to decide between the execution of nodes
  • 22. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Control Flow Nodes Fork and Join Control Nodes • A fork node splits one path of execution into multiple concurrent paths of execution • A join node waits until every concurrent execution path of a previous fork node arrives to it • The fork and join nodes must be used in pairs • Actions at fork runs parallel Syntax  <fork name="[FORK-NODE-NAME]">  <path start="[NODE-NAME]" />  ...  <path start="[NODE-NAME]" />  </fork>  <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
  • 23. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Action Nodes Action Basis • Action Computation/Processing is always remote • Actions are Asynchronous • Actions have two transitions, ok and error • Action Recovery  Oozie provides recovery capabilities when starting or ending actions  Recovery strategies differ on the nature of failure  For non-transient failures action is suspended  For transient failures Oozie will perform retries after a fixed time interval
  • 24. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Action Nodes Fs(HDFS) Action • The fs action allows to manipulate files and directories in HDFS from a workflow application • The supported commands are move , delete and mkdir • The FS commands are executed synchronously from within the FS action • Syntax  <action name="[NODE-NAME]">  <fs>  <delete path='[PATH]'/>  ...  <mkdir path='[PATH]'/>  ...  <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>  </fs> <ok to="[NODE-NAME]"/>  <error to="[NODE-NAME]"/>  </action>
  • 25. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Action Nodes Pig Action • The pig action starts a Pig job • The workflow job will wait until the pig job completes before continuing to the next action • The pig action has to be configured with the job-tracker, name-node, pig script and the necessary parameters and configuration to run the Pig job. • The configuration properties are loaded in the following order, job-xml and configuration , and later values override earlier values. • Hadoop mapred.job.tracker and fs.default.name properties must not be present in the job-xml and inline configuration
  • 26. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Action Nodes Pig Action • Syntax  <pig>  <job-tracker>[JOB-TRACKER]</job-tracker>  <name-node>[NAME-NODE]</name-node>  <prepare> <delete path="[PATH]"/>  ... <mkdir path="[PATH]"/> ... </prepare>  <job-xml>[JOB-XML-FILE]</job-xml> necessary configuration
  • 27. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Action Nodes  <configuration>  <property>  <name>[PROPERTY-NAME]</name> <value>[PROPERTY- VALUE]</value>  </property>  ... </configuration>  <script>[PIG-SCRIPT]</script>  <param>[PARAM-VALUE]</param>  ... <param>[PARAM-VALUE]</param>  <argument>[ARGUMENT-VALUE]</argument>  ... <argument>[ARGUMENT-VALUE]</argument>  <file>[FILE-PATH]</file>  ... <archive>[FILE-PATH]</archive>  ... </pig> Cluster wide configura tion Pig script, its parameters and arguments
  • 28. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Job States A workflow job can have be in any of the following states: • PREP: When a workflow job is first created it will be in PREP state. The workflow job is defined but it is not running. • RUNNING: When a CREATED workflow job is started it goes into RUNNING state, it will remain in RUNNING state while it does not reach its end state, ends in error or it is suspended. • SUSPENDED: A RUNNING workflow job can be suspended, it will remain in SUSPENDED state until the workflow job is resumed or it is killed. • SUCCEEDED: When a RUNNING workflow job reaches the end node it ends reaching the SUCCEEDED final state. • KILLED: When a CREATED , RUNNING or SUSPENDED workflow job is killed by an administrator or the owner via a request to Oozie the workflow job ends reaching the KILLED final state. • FAILED: When a RUNNING workflow job fails due to an unexpected error it ends reaching the FAILED final state.
  • 29. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Example $ cp /usr/share/doc/oozie-3.3.2+49/oozie- examples.tar.gz . $ tar -xvf oozie-examples.tar.gz $ hadoop fs -put examples/ . $ cd examples/apps/pig/ $ oozie job --oozie http://localhost:11000/oozie -config job.properties –run $ oozie job -oozie http://localhost:11000/oozie -info <job_id>
  • 30. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Understand the Example Pig Script • $ cat id.pig  A = load '$INPUT' using PigStorage(':');  B = foreach A generate $0 as id;  store B into '$OUTPUT' USING PigStorage();
  • 31. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Understand the Example Workflow xml • $ cat workflow.xml  <workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">  <start to="pig-node"/>  <action name="pig-node">  <pig>  <job-tracker>${jobTracker}</job-tracker>  <name-node>${nameNode}</name-node>  <prepare>  <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>  </prepare>  <configuration>  <property>  <name>mapred.job.queue.name</name>  <value>${queueName}</value>  </property>  <property>  <name>mapred.compress.map.output</name>  <value>true</value>  </property>  </configuration>
  • 32. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Understand the Example  <script>id.pig</script>  <param>INPUT=/user/${wf:user()}/${examplesRoot}/input- data/text</param>  <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output- data/pig</param>  </pig>  <ok to="end"/>  <error to="fail"/>  </action>  <kill name="fail">  <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>  </kill>  <end name="end"/>  </workflow-app>
  • 33. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Understand the Example $ cat job.properties  nameNode=hdfs://localhost:8020  jobTracker=localhost:8021  queueName=default  examplesRoot=examples  oozie.use.system.libpath=true  oozie.wf.application.path=${nameNode}/user/${user.name}/$ {examplesRoot}/apps/pig
  • 34. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job
  • 35. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job $ cd /root/examples/apps/demo $ cat workflow.xml  <workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">  <start to="cleanup-node"/>  <action name="cleanup-node">  <fs>  <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output- data/demo"/>  </fs>  <ok to="fork-node"/>  <error to="fail"/>  </action>  <fork name="fork-node">  <path start="pig-node"/>  <path start="streaming-node"/>  </fork>
  • 36. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <action name="pig-node">  <pig>  <job-tracker>${jobTracker}</job-tracker>  <name-node>${nameNode}</name-node>  <prepare>  <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output- data/demo/pig-node"/>  </prepare>  <configuration>  <property>  <name>mapred.job.queue.name</name>  <value>${queueName}</value>  </property>
  • 37. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <property>  <name>mapred.map.output.compress</name>  <value>false</value>  </property>  </configuration>  <script>id.pig</script>  <param>INPUT=/user/${wf:user()}/${examplesRoot}/input- data/text</param>  <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output- data/demo/pig-node</param>  </pig>  <ok to="join-node"/>  <error to="fail"/>  </action>
  • 38. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <action name="streaming-node">  <map-reduce>  <job-tracker>${jobTracker}</job-tracker>  <name-node>${nameNode}</name-node>  <prepare>  <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output- data/demo/streaming-node"/>  </prepare>  <streaming>  <mapper>/bin/cat</mapper>  <reducer>/usr/bin/wc</reducer>  </streaming>  <configuration>  <property>  <name>mapred.job.queue.name</name>  <value>${queueName}</value>  </property>
  • 39. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <property>  <name>mapred.input.dir</name>  <value>/user/${wf:user()}/${examplesRoot}/input- data/text</value>  </property>  <property>  <name>mapred.output.dir</name>  <value>/user/${wf:user()}/${examplesRoot}/output- data/demo/streaming-node</value>  </property>  </configuration>  </map-reduce>  <ok to="join-node"/>  <error to="fail"/>  </action>
  • 40. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <join name="join-node" to="mr-node"/>  <action name="mr-node">  <map-reduce>  <job-tracker>${jobTracker}</job-tracker>  <name-node>${nameNode}</name-node>  <prepare>  <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output- data/demo/mr-node"/>  </prepare>  <configuration>  <property>  <name>mapred.job.queue.name</name>  <value>${queueName}</value>  </property>
  • 41. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <property>  <name>mapred.mapper.class</name>  <value>org.apache.oozie.example.DemoMapper</value>  </property>  <property>  <name>mapred.mapoutput.key.class</name>  <value>org.apache.hadoop.io.Text</value>  </property>  <property>  <name>mapred.mapoutput.value.class</name>  <value>org.apache.hadoop.io.IntWritable</value>  </property>
  • 42. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <property>  <name>mapred.reducer.class</name>  <value>org.apache.oozie.example.DemoReducer</value>  </property>  <property>  <name>mapred.map.tasks</name>  <value>1</value>  </property>  <property>  <name>mapred.input.dir</name>  <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/pig- node,/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>  </property>  <property>  <name>mapred.output.dir</name>  <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node</value>  </property>  </configuration>  </map-reduce>  <ok to="decision-node"/>  <error to="fail"/>  </action>
  • 43. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <decision name="decision-node">  <switch>  <case to="hdfs- node">${fs:exists(concat(concat(concat(concat(concat(name Node, '/user/'), wf:user()), '/'), examplesRoot), '/output- data/demo/mr-node')) == "true"}</case>  <default to="end"/>  </switch>  </decision>
  • 44. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job  <action name="hdfs-node">  <fs>  <move source="${nameNode}/user/${wf:user()}/${examplesRoot}/output- data/demo/mr-node"  target="/user/${wf:user()}/${examplesRoot}/output-data/demo/final- data"/>  </fs>  <ok to="end"/>  <error to="fail"/>  </action>  <kill name="fail">  <message>Demo workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>  </kill>  <end name="end"/>  </workflow-app>
  • 45. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Workflow Job • At the end of the Job Completion you will see something like this: