● What is it?
● Why use it?
Oozie – What is it?
● Workflow scheduler for Hadoop
● Manages Hadoop Jobs
● Integrated with many Hadoop apps, e.g. Pig
● Schedules jobs
● A workflow is a collection of actions, e.g.
– MapReduce, Pig, HDFS file system operations
● A workflow is
– Arranged as a DAG (directed acyclic graph)
– Stored as hPDL (an XML process definition language)
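To make the DAG and hPDL points concrete, here is a minimal sketch rather than a production definition. The workflow name, action names, HDFS path and Pig script name are all hypothetical, and the schema version `uri:oozie:workflow:0.5` is an assumption about the target Oozie release. The Python code reads only the success transitions (`start` and each action's `ok`) and confirms that they form an acyclic graph.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical hPDL workflow: an fs action followed by a pig action.
HPDL = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-output"/>
  <action name="clean-output">
    <fs>
      <delete path="${nameNode}/tmp/demo/out"/>
    </fs>
    <ok to="run-pig"/>
    <error to="fail"/>
  </action>
  <action name="run-pig">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>demo.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.5"}  # assumed schema version


def success_path(hpdl_text):
    """Return the nodes on the success path in a valid execution order."""
    root = ET.fromstring(hpdl_text.strip())
    # Map each node to the node it transitions to on success.
    next_node = {"start": root.find("wf:start", NS).get("to")}
    for action in root.findall("wf:action", NS):
        next_node[action.get("name")] = action.find("wf:ok", NS).get("to")
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {}
    for source, target in next_node.items():
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(predecessors).static_order())


path = success_path(HPDL)
print(path)
```

The sketch ignores `error` transitions and decision, fork and join nodes, which a real workflow may also use.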
Oozie – Why use it?
● It is designed for Hadoop
● It is open source
● It is designed for big data
● It allows you to design task workflows
● It allows you to interact with jobs
– Stop, start, suspend, resume, rerun
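The job operations listed above map onto options of the `oozie job` command-line tool. The sketch below only builds the argument lists and does not run anything. The server URL and the job ID are hypothetical, and mapping "stop" to `-kill` is an assumption about what the slide means.

```python
# Hypothetical server URL; 11000 is the usual Oozie port, but check your install.
OOZIE_URL = "http://localhost:11000/oozie"

# Slide wording -> option of the "oozie job" sub-command.
JOB_OPTIONS = {
    "start": "-start",
    "stop": "-kill",      # assumption: "stop" means killing the job
    "suspend": "-suspend",
    "resume": "-resume",
    "rerun": "-rerun",    # workflow reruns usually also need -config <file>
}


def job_command(operation, job_id, oozie_url=OOZIE_URL):
    """Build the argument list for one job operation (does not execute it)."""
    try:
        option = JOB_OPTIONS[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation!r}") from None
    return ["oozie", "job", "-oozie", oozie_url, option, job_id]


# Hypothetical workflow job ID, shown only to illustrate the shape of the call.
cmd = job_command("suspend", "0000001-240101000000000-oozie-oozi-W")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where the Oozie client is installed.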
Oozie – Architecture
● Install Oozie on an edge node, not on the cluster nodes
● Oozie has a client
– Submits jobs and talks to the server
● Oozie has a server
– Controls jobs
– Launches jobs
– Chains workflows: the output of one workflow is the input to the next
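The client and server communicate over HTTP through the Oozie web services API. As a rough sketch, the requests a client might send for a job look like the following. The base URL and job ID are hypothetical, the `v1` path prefix is an assumption about the API version in use, and nothing is sent over the network.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:11000/oozie"  # hypothetical server address


def job_info_request(job_id, base_url=BASE_URL):
    """Build (but do not send) a request for a job's status information."""
    return Request(f"{base_url}/v1/job/{quote(job_id)}?show=info", method="GET")


def job_action_request(job_id, action, base_url=BASE_URL):
    """Build (but do not send) a request that changes a job's state."""
    return Request(f"{base_url}/v1/job/{quote(job_id)}?action={action}", method="PUT")


# Hypothetical job ID used only for illustration.
info = job_info_request("0000001-240101000000000-oozie-oozi-W")
suspend = job_action_request("0000001-240101000000000-oozie-oozi-W", "suspend")
print(info.get_method(), info.full_url)
print(suspend.get_method(), suspend.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running server.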