ADVANCED OOZIE
Alexey Yakubovich
In addition to:
Boris Lublinsky, Kevin T. Smith and Alexey Yakubovich. “Professional Hadoop Solutions”
New cases
• Use cases
• Organizing all data processing steps: top-down
• Regular data ingestion
• Regular data transformation
• Regular report generation

• Extensions
• File movement on HDFS (synchronous Java action)
• Data transfer (synchronous FTP, synchronous SSH)
• Logging / monitoring (beyond Oozie console)
New & rediscovered Oozie features
• 1. JMS notifications (job life cycle, SLA)
http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html

• 2. Overriding the launcher
https://github.com/yahoo/oozie/blob/master/examples/src/main/java/org/apache/oozie/example/DemoPigMain.java
• 3. Unit testing Oozie with MiniOozie
http://oozie.apache.org/docs/4.0.0/ENG_MiniOozie.html
JMS notifications
“Push” JMS notifications for action status, SLA met and SLA miss
Needs a JMS broker to relay the notifications, e.g. Apache ActiveMQ
Needs JMS notification configuration in oozie-site.xml:
oozie.services.ext
oozie.service.EventHandlerService…
oozie.jms.producer.connection.properties (topic)
Notification types
Job status: start, success, failure, suspended …
SLA: start | end | duration && met | miss
Message format: javax.jms.TextMessage with Oozie job-specific headers
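The connection settings above can be illustrated with a small sketch. It assumes the `key#value;key#value` layout that the Oozie documentation uses for `oozie.jms.producer.connection.properties`, and it turns such a string into a `java.util.Properties` object of the kind a JMS consumer would hand to JNDI. The class name `JmsConnectionProps` is hypothetical, and the broker URL and factory names are placeholder values for a local ActiveMQ broker.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Oozie.
final class JmsConnectionProps {

    // Parses "key#value;key#value" into Properties usable as a JNDI environment.
    static Properties parse(String raw) {
        Properties props = new Properties();
        for (String pair : raw.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing or doubled separators
            }
            int sep = trimmed.indexOf('#');
            if (sep <= 0) {
                throw new IllegalArgumentException("Expected key#value, got: " + trimmed);
            }
            props.setProperty(trimmed.substring(0, sep).trim(),
                              trimmed.substring(sep + 1).trim());
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, assumed here for a local ActiveMQ broker.
        String raw = "java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;"
                   + "java.naming.provider.url#tcp://localhost:61616;"
                   + "connectionFactoryNames#ConnectionFactory";
        Properties jndi = parse(raw);
        System.out.println(jndi.getProperty("java.naming.provider.url"));
    }
}
```

A real consumer would pass the resulting properties to a JNDI `InitialContext`, look up the connection factory, and subscribe to the notification topic. That part needs the JMS and broker client libraries and is left out of this sketch.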
Overriding the launcher (cross-cutting concerns)
• Regular Pig job launcher: org.apache.oozie.action.hadoop.PigMain
Reminder: the action executor makes all preparations for submitting an
action as Hadoop job(s). In particular, the PigMain executor invokes
the Pig runtime on an edge (gateway) node.
public class SpecialPigExec extends PigMain {
    // cross-cutting concerns go here, e.g. logging, external services (security, transactions)
}
• Oozie workflow
<action name="pig-special">
    <pig>
        …
        <configuration>
            <property>
                <name>oozie.launcher.action.main.class</name>
                <value>… SpecialPigExec</value>
            </property>
        </configuration>
        …
    </pig>
    …
</action>
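The override pattern on this slide can be sketched without any Oozie dependency. In the sketch below, `StockLauncher` is a hypothetical stand-in for PigMain and `runJob` is an invented hook name, not the real PigMain API. It only shows how a subclass can wrap the stock step with cross-cutting behavior such as logging.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stock launcher (PigMain on the slide).
class StockLauncher {
    final List<String> log = new ArrayList<>();

    // Stands in for the step that actually invokes the job runtime.
    protected void runJob(String[] args) {
        log.add("run " + String.join(" ", args));
    }
}

// Subclass that adds cross-cutting behavior around the stock step.
class AuditedLauncher extends StockLauncher {
    @Override
    protected void runJob(String[] args) {
        log.add("before"); // e.g. open an audit record or contact an external service
        try {
            super.runJob(args);
        } finally {
            log.add("after"); // recorded even if the stock step throws
        }
    }
}

class LauncherDemo {
    public static void main(String[] args) {
        AuditedLauncher launcher = new AuditedLauncher();
        launcher.runJob(new String[] {"-file", "script.pig"});
        System.out.println(launcher.log);
    }
}
```

With a real Oozie launcher, the subclass would extend PigMain and be named in `oozie.launcher.action.main.class`, as in the workflow fragment. The DemoPigMain example linked earlier shows which methods can actually be overridden.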
Unit testing Oozie with MiniOozie
• MiniOozieTestCase is a JUnit test class
• Allows testing of workflow and coordinator applications
• Tests workflows directly from an IDE (Eclipse for sure)

• Does not require access to cluster or running Oozie server
• Runs against the local file system
• Tested on Linux and Mac OS X, configured with Maven (simple)
• Needs most (all) Oozie libraries

Action choice is restricted:
Java actions are straightforward.
Others can be “simulated”.
It is unclear whether this can be combined with PigUnit and Hive standalone mode.
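Because such tests run against the local file system, a test typically has to stage the workflow application in a local directory first. The following is a minimal sketch of that one step using only the JDK. The helper name `LocalWorkflowApp` is hypothetical and the workflow definition is a trivial placeholder. Submitting it through MiniOozie requires the Oozie test libraries and is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of MiniOozie.
final class LocalWorkflowApp {

    // Writes workflow.xml into a fresh temporary directory and returns that directory.
    static Path stage(String workflowXml) {
        try {
            Path appDir = Files.createTempDirectory("wf-app");
            Files.write(appDir.resolve("workflow.xml"),
                        workflowXml.getBytes(StandardCharsets.UTF_8));
            return appDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder workflow that goes straight from start to end.
        String xml = "<workflow-app xmlns=\"uri:oozie:workflow:0.4\" name=\"demo\">"
                   + "<start to=\"end\"/><end name=\"end\"/></workflow-app>";
        Path appDir = stage(xml);
        // A test would use this URI as the application path (oozie.wf.application.path).
        System.out.println(appDir.toUri());
    }
}
```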
