2. Apache Oozie
Apache Oozie is a Java web application used to
schedule Apache Hadoop jobs.
Oozie combines multiple jobs sequentially into
one logical unit of work.
It is integrated with the Hadoop stack.
It is a server-based workflow scheduling system
used to manage Hadoop jobs. It supports:
4. Three types of workflows
• Oozie workflow jobs
• Oozie Bundle
• Coordinator jobs
Oozie workflow jobs
Sequences of actions to be executed.
Oozie bundle
A package of multiple coordinator and
workflow jobs.
Coordinator jobs
Workflow jobs triggered by time and data
availability.
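To make the coordinator idea concrete, here is a hedged sketch of a time-triggered coordinator definition; the app name, start/end times, frequency, and path are illustrative assumptions, not from the slides:

```xml
<!-- Illustrative sketch only; name, times, frequency, and app-path are assumptions. -->
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- Path to the workflow definition this coordinator triggers daily -->
            <app-path>${nameNode}/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```

A coordinator can also declare input datasets, so that each run fires only when the expected data is available, matching the "time and data availability" trigger described above.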
5. • Users are permitted to create Directed Acyclic
Graphs (DAGs) of workflows, which can be run in parallel
and sequentially in Hadoop.
• It consists of two parts:
workflow engine
coordinator engine
Workflow engine
The responsibility of the workflow engine is to store
and run workflows composed of Hadoop jobs.
Coordinator engine
It runs workflow jobs based on predefined
schedules and the availability of data.
6. • Oozie is scalable and can manage the timely
execution of thousands of workflows in a Hadoop
cluster.
• Oozie is also very flexible: one can easily
start, stop, suspend, and rerun jobs.
• Oozie makes it very easy to rerun failed workflows.
7. How it works
• An Oozie workflow consists of action nodes and
control nodes.
An action node represents a workflow task,
• such as moving files into HDFS; running a MapReduce, Pig,
or Hive job; importing data using Sqoop; or
running a shell script or a program written in Java.
Control nodes
• Control the workflow execution between actions
by allowing constructs like conditional logic,
where different branches are followed depending on
the result of an earlier action node.
8. Types of nodes
Start node
Designates the start of the workflow job.
End node
Signals the end of the job.
Error node
Designates the occurrence of an error and the
corresponding error message to be printed.
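The node types above can be sketched in a minimal workflow definition; this is an illustrative outline only, and the app name, action name, and transitions are assumptions:

```xml
<!-- Illustrative sketch only; names and the action body are assumptions. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="count-codes"/>           <!-- start node -->
    <action name="count-codes">         <!-- action node (e.g. a MapReduce job) -->
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="end"/>                  <!-- on success, go to the end node -->
        <error to="fail"/>              <!-- on failure, go to the error node -->
    </action>
    <kill name="fail">                  <!-- error node: print the error message -->
        <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>                   <!-- end node -->
</workflow-app>
```

Each action's `ok`/`error` transitions are what let control nodes route execution down different branches.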
9. Features of Oozie
Using its Web Service APIs, one can control jobs
from anywhere.
Oozie can send email notifications upon
completion of jobs.
Oozie has provision to execute jobs which are
scheduled to run periodically.
18. 9. The status of Oozie can be checked from the command line or the
web console.
19. 10. To set up the Oozie client, copy the client tar file to the
“oozie client” directory and set the path in the .bashrc file.
20. Oozie workflow for IoT data analysis
Assume that the data received from a machine has the following
structure.
21. The goal of the analysis is to find the count of each
status/error code and produce an output with the corresponding
structure.
22. The Oozie workflow comprises a Hadoop streaming MapReduce
job action and an email action that notifies the
success or failure of the job.
The map program parses the status/error code from each
line in the input and emits key-value pairs,
where the key is the status/error code and the value is 1.
The reduce program receives the key-value pairs emitted by
the map program, aggregated by key.
For each key, the reduce program calculates the count and
emits key-value pairs where the key is the status/error code
and the value is the count.
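The map and reduce steps described above can be sketched in Python, the usual choice for Hadoop streaming. The record layout is an assumption (the slides' data structure is not shown here), so the sketch assumes the status/error code is the last comma-separated field; in a real streaming job, mapper and reducer would be separate scripts reading stdin and writing tab-separated key/value lines to stdout.

```python
# Hedged sketch of the streaming map/reduce logic; the field position
# and comma separator are assumptions, not from the slides.
from itertools import groupby

def mapper(lines):
    """Emit (code, 1) for the status/error code parsed from each line.
    Assumes the code is the last comma-separated field."""
    for line in lines:
        fields = line.strip().split(",")
        if fields and fields[-1]:
            yield fields[-1], 1

def reducer(pairs):
    """Pairs arrive sorted by key (the framework's shuffle/sort does
    this); sum the 1s to get the count per status/error code."""
    for code, group in groupby(pairs, key=lambda kv: kv[0]):
        yield code, sum(value for _, value in group)

if __name__ == "__main__":
    # Simulate the streaming pipeline on a few hypothetical records.
    sample = ["ts1,dev1,OK", "ts2,dev1,ERR42", "ts3,dev2,OK"]
    mapped = sorted(mapper(sample))   # framework sorts mapper output by key
    print(dict(reducer(mapped)))      # {'ERR42': 1, 'OK': 2}
```

On success or failure of this action, the workflow's email action sends the notification, as described on the slide above.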