SlideShare a Scribd company logo
ADVANCED OOZIE
Alexey Yakubovich
In Addition to
Boris Lublinsky, Kevin T.
Smith and Alexey
Yakubovich.
“Professional Hadoop
Solutions”
New cases
• Use cases
• Organizing all data processing steps: up-down
• Regular data injection
• Regular data transformation
• Regular report generation

• Extensions
• File movement on HDFS (synch. java action)
• Data transfer (synch - ftp, synch - ssh)
• Logging / monitoring (beyond Oozie console)
New & rediscovered Oozie features
• 1. JMS notifications (job life cycle, SLA)
http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html

• 2. Overriding the launcher
https://github.com/yahoo/oozie/blob/master/examples/src/main/java/org
/apache/oozie/example/DemoPigMain.java
• Unit testing Oozie with MiniOozie
http://oozie.apache.org/docs/4.0.0/ENG_MiniOozie.html
JMS notifications
“Push” JMS notifications for action status, SLA met and SLA
miss
Needs “JMS broker” to interprets notifications
Apache ActiveMQ
Need “JMS notification configuration” in the oozie-site.xml:
oozie.services.ext
oozie.services.EventHandlerService…
oozie.jms.producer.connection.properties (topic)
Notification types
Job status: start, success, failure, suspended …
SLA: start| end| duration && met | miss
Message format: javax.jms.TextMessage with Oozie job specific
headers
Overriding the launcher (cross-cutting concerns)
• Regular Pig job launcher –
org.apache.oozie.action.hadoop.PigMain
Reminder: action executor provides all preparations for submitting
action as a hadoop job(s). In particularly the PigMain executor invokes
the Pig runtime on an Edge (Gateway) node.
public class SpecialPigExec extends PigMain() {
e.g. logging, external services (security,, transactions)
}
• Oozie workflow
<action name=“pig-special”>
<pig>
…
<property>
<name> oozie.launcher.action.main.class </name>
<value> … SpecialPigExec</value>
Unit testing Oozie with MiniOozie
• MiniOozieTestCase is a junit test class
• Allows to test workflow and coordinator applications
• Tests workflow directly from IDE (Eclipse for sure)

• Does not require access to cluster or running Oozie server
• Runs against the local file system
• Tested on Linux and Max OS X, configured with Maven (simple)
• Needs most (all) Oozie libraries

Action choice restricted:
java actions is straight forward.
others can be “simulated”
I can’t tell if possible to combine with PigUnit and Hive standalone
mode.

More Related Content

What's hot

May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopMay 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopYahoo Developer Network
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12mislam77
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot GamesMatt Goeke
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Oozie Summit 2011
Oozie Summit 2011Oozie Summit 2011
Oozie Summit 2011mislam77
 
Boulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckBoulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckWill Sterling
 
Dmp hadoop getting_start
Dmp hadoop getting_startDmp hadoop getting_start
Dmp hadoop getting_startGim GyungJin
 
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜Michitoshi Yoshida
 
RESTFul development with Apache sling
RESTFul development with Apache slingRESTFul development with Apache sling
RESTFul development with Apache slingSergii Fesenko
 
Database Change Management as a Service
Database Change Management as a ServiceDatabase Change Management as a Service
Database Change Management as a ServiceAndrew Solomon
 
Database migration with flyway
Database migration  with flywayDatabase migration  with flyway
Database migration with flywayJonathan Holloway
 
Getting started with agile database migrations for java flywaydb
Getting started with agile database migrations for java flywaydbGetting started with agile database migrations for java flywaydb
Getting started with agile database migrations for java flywaydbGirish Bapat
 

What's hot (20)

Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopMay 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot Games
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
October 2014 HUG : Oozie HA
October 2014 HUG : Oozie HAOctober 2014 HUG : Oozie HA
October 2014 HUG : Oozie HA
 
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
Oozie Summit 2011
Oozie Summit 2011Oozie Summit 2011
Oozie Summit 2011
 
Boulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckBoulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeck
 
JavaCro'14 - Continuous deployment tool – Aleksandar Dostić and Emir Džaferović
JavaCro'14 - Continuous deployment tool – Aleksandar Dostić and Emir DžaferovićJavaCro'14 - Continuous deployment tool – Aleksandar Dostić and Emir Džaferović
JavaCro'14 - Continuous deployment tool – Aleksandar Dostić and Emir Džaferović
 
Dmp hadoop getting_start
Dmp hadoop getting_startDmp hadoop getting_start
Dmp hadoop getting_start
 
Chef
ChefChef
Chef
 
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
 
LiquiBase
LiquiBaseLiquiBase
LiquiBase
 
Gradle - Build System
Gradle - Build SystemGradle - Build System
Gradle - Build System
 
RESTFul development with Apache sling
RESTFul development with Apache slingRESTFul development with Apache sling
RESTFul development with Apache sling
 
Database Change Management as a Service
Database Change Management as a ServiceDatabase Change Management as a Service
Database Change Management as a Service
 
Database migration with flyway
Database migration  with flywayDatabase migration  with flyway
Database migration with flyway
 
Getting started with agile database migrations for java flywaydb
Getting started with agile database migrations for java flywaydbGetting started with agile database migrations for java flywaydb
Getting started with agile database migrations for java flywaydb
 

Viewers also liked

July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessYahoo Developer Network
 
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NYahoo Developer Network
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
A Basic Hive Inspection
A Basic Hive InspectionA Basic Hive Inspection
A Basic Hive InspectionLinda Tillman
 
처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, CoordinatorKim Log
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 

Viewers also liked (9)

July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification Process
 
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
HBase 훑어보기
HBase 훑어보기HBase 훑어보기
HBase 훑어보기
 
A Basic Hive Inspection
A Basic Hive InspectionA Basic Hive Inspection
A Basic Hive Inspection
 
처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator처음 접하는 Oozie Workflow, Coordinator
처음 접하는 Oozie Workflow, Coordinator
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 

Similar to Advanced Oozie

Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Hbase coprocessor with Oozie WF referencing 3rd Party jars
Hbase coprocessor with Oozie WF referencing 3rd Party jarsHbase coprocessor with Oozie WF referencing 3rd Party jars
Hbase coprocessor with Oozie WF referencing 3rd Party jarsJinith Joseph
 
oozieee.pdf
oozieee.pdfoozieee.pdf
oozieee.pdfwwww63
 
The Practice of Alluxio in JD.com
The Practice of Alluxio in JD.comThe Practice of Alluxio in JD.com
The Practice of Alluxio in JD.comAlluxio, Inc.
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoAlluxio, Inc.
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Monitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleMonitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleAdobe
 
Monitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleMonitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleCarsten Ziegeler
 
Monitoring OSGi Applications with the Web Console - Carsten Ziegeler
Monitoring OSGi Applications with the Web Console - Carsten ZiegelerMonitoring OSGi Applications with the Web Console - Carsten Ziegeler
Monitoring OSGi Applications with the Web Console - Carsten Ziegelermfrancis
 
Case Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New WorldCase Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New WorldForgeRock
 
Overview of PaaS: Java experience
Overview of PaaS: Java experienceOverview of PaaS: Java experience
Overview of PaaS: Java experienceIgor Anishchenko
 
Overview of PaaS: Java experience
Overview of PaaS: Java experienceOverview of PaaS: Java experience
Overview of PaaS: Java experienceAlex Tumanoff
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeIan Lumb
 
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.Dimitris Andreadis
 
How to upgrade OJS 2 to OJS 3 latest
How to upgrade OJS 2 to OJS 3 latestHow to upgrade OJS 2 to OJS 3 latest
How to upgrade OJS 2 to OJS 3 latestOpenjournaltheme
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
WildFly AppServer - State of the Union
WildFly AppServer - State of the UnionWildFly AppServer - State of the Union
WildFly AppServer - State of the UnionDimitris Andreadis
 
Node.js in a heterogeneous system
Node.js in a heterogeneous systemNode.js in a heterogeneous system
Node.js in a heterogeneous systemGeeksLab Odessa
 
Java EE 6, Eclipse, GlassFish @EclipseCon 2010
Java EE 6, Eclipse, GlassFish @EclipseCon 2010Java EE 6, Eclipse, GlassFish @EclipseCon 2010
Java EE 6, Eclipse, GlassFish @EclipseCon 2010Ludovic Champenois
 
Devtest: using Lean and Devops practices to bring QA and coders together by L...
Devtest: using Lean and Devops practices to bring QA and coders together by L...Devtest: using Lean and Devops practices to bring QA and coders together by L...
Devtest: using Lean and Devops practices to bring QA and coders together by L...Institut Lean France
 

Similar to Advanced Oozie (20)

Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Hbase coprocessor with Oozie WF referencing 3rd Party jars
Hbase coprocessor with Oozie WF referencing 3rd Party jarsHbase coprocessor with Oozie WF referencing 3rd Party jars
Hbase coprocessor with Oozie WF referencing 3rd Party jars
 
oozieee.pdf
oozieee.pdfoozieee.pdf
oozieee.pdf
 
The Practice of Alluxio in JD.com
The Practice of Alluxio in JD.comThe Practice of Alluxio in JD.com
The Practice of Alluxio in JD.com
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Monitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleMonitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web Console
 
Monitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web ConsoleMonitoring OSGi Applications with the Web Console
Monitoring OSGi Applications with the Web Console
 
Monitoring OSGi Applications with the Web Console - Carsten Ziegeler
Monitoring OSGi Applications with the Web Console - Carsten ZiegelerMonitoring OSGi Applications with the Web Console - Carsten Ziegeler
Monitoring OSGi Applications with the Web Console - Carsten Ziegeler
 
Case Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New WorldCase Study: Plus Retail - Moving from the Old World to the New World
Case Study: Plus Retail - Moving from the Old World to the New World
 
Overview of PaaS: Java experience
Overview of PaaS: Java experienceOverview of PaaS: Java experience
Overview of PaaS: Java experience
 
Overview of PaaS: Java experience
Overview of PaaS: Java experienceOverview of PaaS: Java experience
Overview of PaaS: Java experience
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
 
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
 
How to upgrade OJS 2 to OJS 3 latest
How to upgrade OJS 2 to OJS 3 latestHow to upgrade OJS 2 to OJS 3 latest
How to upgrade OJS 2 to OJS 3 latest
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
WildFly AppServer - State of the Union
WildFly AppServer - State of the UnionWildFly AppServer - State of the Union
WildFly AppServer - State of the Union
 
Node.js in a heterogeneous system
Node.js in a heterogeneous systemNode.js in a heterogeneous system
Node.js in a heterogeneous system
 
Java EE 6, Eclipse, GlassFish @EclipseCon 2010
Java EE 6, Eclipse, GlassFish @EclipseCon 2010Java EE 6, Eclipse, GlassFish @EclipseCon 2010
Java EE 6, Eclipse, GlassFish @EclipseCon 2010
 
Devtest: using Lean and Devops practices to bring QA and coders together by L...
Devtest: using Lean and Devops practices to bring QA and coders together by L...Devtest: using Lean and Devops practices to bring QA and coders together by L...
Devtest: using Lean and Devops practices to bring QA and coders together by L...
 

More from Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Chicago Hadoop Users Group
 

More from Chicago Hadoop Users Group (18)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Recently uploaded

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Product School
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 

Recently uploaded (20)

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

Advanced Oozie

  • 2. In Addition to Boris Lublinsky, Kevin T. Smith and Alexey Yakubovich. “Professional Hadoop Solutions”
  • 3. New cases • Use cases • Organizing all data processing steps: up-down • Regular data injection • Regular data transformation • Regular report generation • Extensions • File movement on HDFS (synch. java action) • Data transfer (synch - ftp, synch - ssh) • Logging / monitoring (beyond Oozie console)
  • 4. New & rediscovered Oozie features • 1. JMS notifications (job life cycle, SLA) http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html • 2. Overriding the launcher https://github.com/yahoo/oozie/blob/master/examples/src/main/java/org /apache/oozie/example/DemoPigMain.java • Unit testing Oozie with MiniOozie http://oozie.apache.org/docs/4.0.0/ENG_MiniOozie.html
  • 5. JMS notifications “Push” JMS notifications for action status, SLA met and SLA miss Needs “JMS broker” to interprets notifications Apache ActiveMQ Need “JMS notification configuration” in the oozie-site.xml: oozie.services.ext oozie.services.EventHandlerService… oozie.jms.producer.connection.properties (topic) Notification types Job status: start, success, failure, suspended … SLA: start| end| duration && met | miss Message format: javax.jms.TextMessage with Oozie job specific headers
  • 6. Overriding the launcher (cross-cutting concerns) • Regular Pig job launcher – org.apache.oozie.action.hadoop.PigMain Reminder: action executor provides all preparations for submitting action as a hadoop job(s). In particularly the PigMain executor invokes the Pig runtime on an Edge (Gateway) node. public class SpecialPigExec extends PigMain() { e.g. logging, external services (security,, transactions) } • Oozie workflow <action name=“pig-special”> <pig> … <property> <name> oozie.launcher.action.main.class </name> <value> … SpecialPigExec</value>
  • 7. Unit testing Oozie with MiniOozie • MiniOozieTestCase is a junit test class • Allows to test workflow and coordinator applications • Tests workflow directly from IDE (Eclipse for sure) • Does not require access to cluster or running Oozie server • Runs against the local file system • Tested on Linux and Max OS X, configured with Maven (simple) • Needs most (all) Oozie libraries Action choice restricted: java actions is straight forward. others can be “simulated” I can’t tell if possible to combine with PigUnit and Hive standalone mode.