© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
April 19, 2018
Artem Ervits – Hortonworks
Clay Baenziger – Bloomberg
Breathing New Life into Apache Oozie
with Apache Ambari Workflow Manager
Poll:
• Who here uses Oozie?
— In production?
With Kerberos?
— Do you use Hue with Oozie?
— How many workflows do you have in production?
1-10? 10-50? 50+?
— How many actions does the largest workflow contain?
1-10? 10-50? 50+?
— Do you use (or want to use) Oozie with:
HBase? Spark? Python? Deployment Automation?
• Do you like XML?
— Do you have a favorite editor for Oozie workflows?
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs, as well as system-specific jobs, out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
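To make the coordinator bullet concrete, here is a minimal sketch of a coordinator that fires once a day, but only once that day's input data has landed. The application name, dates, dataset path and the workflowAppPath property are placeholders chosen for illustration and do not come from the talk.

```xml
<!-- Hypothetical coordinator: all names, dates and paths are assumptions -->
<coordinator-app name="daily-example" frequency="${coord:days(1)}"
                 start="2018-04-19T00:00Z" end="2018-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One dataset instance per day; the _SUCCESS flag marks it as available -->
    <dataset name="raw" frequency="${coord:days(1)}"
             initial-instance="2018-04-19T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/raw/${YEAR}/${MONTH}/${DAY}</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Data trigger: wait for the current day's instance of the dataset -->
    <data-in name="input" dataset="raw">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- The workflow to run each time both triggers are satisfied -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The frequency, start and end attributes carry the time trigger; the input-events section carries the data-availability trigger.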
Actions:
• Map/Reduce
• Hive
• Pig
• HDFS
• Java
• Shell
• Spark
• Sub-Workflow
• E-Mail
• Decision
• Fork
• Join
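As an illustration of how these nodes combine into a DAG, the sketch below forks into two HDFS actions and joins again before ending. The workflow name, node names and paths are invented for this example; only the element structure follows the Oozie workflow schema.

```xml
<!-- Hypothetical workflow: node names and paths are assumptions -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="split"/>
  <!-- Fork: both paths run in parallel -->
  <fork name="split">
    <path start="make-dir-a"/>
    <path start="make-dir-b"/>
  </fork>
  <action name="make-dir-a">
    <fs>
      <mkdir path="${nameNode}/tmp/example/a"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="make-dir-b">
    <fs>
      <mkdir path="${nameNode}/tmp/example/b"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <!-- Join: waits for every forked path before continuing -->
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action names its successor through ok and error transitions, which is what makes the definition a graph rather than a script.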
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, uses a YARN Application Master instead of the MR launcher, adds new actions, and includes code cleanup.
- Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model – workflows, coordinators, bundles
• Complicated setup
• XML
• DAG visualization
• SLA alerting
• Fine-grained authorization
• Easy access to log files
Oozie Complaints – Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine-grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
Oozie Launcher – Prior to Release 5.0
• MR launcher job
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
— Removes MR launcher job
• Design Doc
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
Oozie Workflow Visualization – Prior to 5.0 and After
Prior to 5.0: JUNG. 5.0 and after: Graphviz.
OOZIE-2406: Completely rewrite Graph Generator code
Oozie Fluent Job API – Apache Oozie 5.X (Preview)
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
— A Metrics System for metrics collection
— An Alert Framework
— A dashboard for monitoring the Hadoop cluster
-Paraphrased from http://ambari.apache.org
Ambari Views
• Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom
visualization, management and monitoring features in Ambari Web. A "view" is a way of
extending Ambari that allows 3rd parties to plug in new resource types along with the
APIs, providers and UI to support them. In other words, a view is an application that is
deployed into the Ambari container.”
• Key takeaways:
— One does not need an Ambari-managed (administered) cluster
— Third parties can build view packages to run in the Ambari framework too
— Major views available:
(YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
Workflow Manager – Motivation
• Oozie workflows are defined in XML, which is too verbose
— Provide GUI workflow builder and editor
— Reduce possibility of user-introduced errors
— Provide browser-based workflow manager
• Integration with File Browser
— Includes S3 support
— Can replace existing Oozie web UI
• Oozie is hard-coded to display only 25 actions
— Workflow Manager (WFM) doesn’t have this limit; tested with 300+ action nodes
• Oozie is scalable
— Can scale WFM by standing up multiple Ambari Views servers
Workflow Manager – Workflow Editor Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
Workflow Manager – Execution View Example
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
Workflow Manager – Workflow Design Component
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
Workflow Manager Examples with HBase
• Setup Oozie – Server and Workflows
• Data Definition – Tables, ACLs
• Compactions – Operational
HBase – Setup
Oozie needs HBase Configuration:
• Oozie Server Code (to support HBase delegation tokens)
— In libexec (see Server JARs list)
— In oozie-site.xml:
<property>
<name>oozie.credentials.credentialclasses</name>
<value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
</property>
• Client Workflow Code:
— Add to workflow.xml:
<credentials>
<credential name="myhbase_creds" type="hbase">
[…]
</credential>
</credentials>
— Put all your normal HBase security settings in the credential section
• Server JARs:
(Copy the following to Oozie’s libexec)
— hbase-common.jar
— hbase-client.jar
— hbase-server.jar
— hbase-protocol.jar
— hbase-hadoop2-compat.jar
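For orientation, a credential section for a Kerberized cluster might look roughly like the sketch below. The slide elides the actual properties, so everything inside the credential here is an assumption: the property names are standard HBase and ZooKeeper client settings, while the hostnames and realm are placeholders. Which properties are actually required depends on the cluster.

```xml
<!-- Hypothetical values throughout; copy the real ones from your hbase-site.xml -->
<credentials>
  <credential name="myhbase_creds" type="hbase">
    <property>
      <name>hbase.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>
    <property>
      <name>hbase.master.kerberos.principal</name>
      <value>hbase/_HOST@EXAMPLE.COM</value>
    </property>
    <property>
      <name>hbase.regionserver.kerberos.principal</name>
      <value>hbase/_HOST@EXAMPLE.COM</value>
    </property>
  </credential>
</credentials>
```

An action opts in to the credential by naming it in its cred attribute, so the credential name and the value of cred must match exactly.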
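HBase – Data Definition
create_my_table.rb:
# Create my_table only if it does not already exist.
# select returns a new array, so its result must be kept.
tables = list.select { |table| table.eql?('my_table') }
if tables.empty?
  create 'my_table', {NAME => 'my_col'}
end
exit
HBase Shell: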
<action name="HBASE-Shell" cred="hbase_creds">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hbase</exec>
    <argument>shell</argument>
    <argument>-n</argument>
    <argument>create_my_table.rb</argument>
  </shell>
  <ok to="do_more_things"/>
  <error to="fail"/>
</action>
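The action above assumes that create_my_table.rb is present in the action's working directory when the shell runs. One way to arrange that, not shown on the slide, is the shell action's file element, which ships a file from HDFS alongside the job. In the sketch below the wfAppPath property is a hypothetical name for the workflow application directory.

```xml
<!-- Sketch only: wfAppPath is an assumed property, not taken from the talk -->
<action name="HBASE-Shell" cred="hbase_creds">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hbase</exec>
    <argument>shell</argument>
    <argument>-n</argument>
    <argument>create_my_table.rb</argument>
    <!-- Copies the script into the working directory under the name after '#' -->
    <file>${wfAppPath}/create_my_table.rb#create_my_table.rb</file>
  </shell>
  <ok to="do_more_things"/>
  <error to="fail"/>
</action>
```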
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to the selected number of servers
• Requires read access to /hbase
usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg>
[...]
Usage instructions
-cf <arg>          column families: comma separated eg: a,b,c
-dryRun            Dry run, will just output a list of regions that require compaction based on parameters passed
-minModTime <arg>  Compact if store files have modification time < minModTime
-servers <arg>     Concurrent servers compacting
-table <arg>       table name
...
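Following the same pattern as the data definition action, the compactor could be wrapped in a shell action along these lines. This is a sketch under assumptions: the fully qualified class name is the one this tool is believed to ship under and should be checked against your HBase version, and the action name, table, column family, server count and transitions are placeholders.

```xml
<!-- Hypothetical action: verify the class name against your HBase release -->
<action name="major-compact" cred="hbase_creds">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hbase</exec>
    <!-- Assumed class name for the HBASE-19528 tool -->
    <argument>org.apache.hadoop.hbase.util.compaction.MajorCompactor</argument>
    <argument>-table</argument>
    <argument>my_table</argument>
    <argument>-cf</argument>
    <argument>my_col</argument>
    <argument>-servers</argument>
    <argument>2</argument>
    <!-- Dry run first: only lists the regions that would be compacted -->
    <argument>-dryRun</argument>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Dropping the -dryRun argument would run the compaction itself; scheduling the workflow from a coordinator would make it a recurring operational job.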
More Resources
• Apache Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s Oozie Resources:
— 12-Part Series on WFM: http://bit.ly/2syKUIh
— Oozie Examples: https://github.com/dbist/oozie-examples
• Clay’s Past Oozie Presentations:
— Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
— HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
Demo!
Questions?

More Related Content

What's hot

What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?Hortonworks
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata StorageDataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processingDataWorks Summit
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014alanfgates
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Julian Hyde
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHortonworks
 

What's hot (20)

What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processing
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Creating the Internet of Your Things
Creating the Internet of Your ThingsCreating the Internet of Your Things
Creating the Internet of Your Things
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
 

Similar to Breathing new life into Apache Oozie with Apache Ambari Workflow Manager

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API LanguagesRestlet
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my dataAndrejs Vorobjovs
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17Phil Wilkins
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsApigee | Google Cloud
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationOCTO Technology
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)Daniel Toomey
 
Add Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAdd Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAppDynamics
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIsHolger Reinhardt
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersSimon Haslam
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtechYuta Imai
 
A high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTSA high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTSSmile I.T is open
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxapidays
 

Similar to Breathing new life into Apache Oozie with Apache Ambari Workflow Manager (20)

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API Languages
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIs
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementation
 
Octo API-days 2015
Octo API-days 2015Octo API-days 2015
Octo API-days 2015
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
 
Add Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAdd Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring Toolkit
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIs
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite Customers
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
 
A high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTSA high profile project with Symfony and API Platform: beIN SPORTS
A high profile project with Symfony and API Platform: beIN SPORTS
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
 
yii framework
yii frameworkyii framework
yii framework
 

Recently uploaded

WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 

Recently uploaded (20)

WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 April 19, 2018 Artem Ervits – Hortonworks Clay Baenziger – Bloomberg Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
  • 2. Poll: • Who here uses Oozie? — In production? With Kerberos? — Do you use HUE with Oozie? — How many workflows do you have in production? 1-10? 10-50? 50+? — How many actions does the largest workflow contain? 1-10? 10-50? 50+? — Do you use (or want to use) Oozie with HBase? Spark? Python? Deployment Automation? • Do you like XML? — Do you have a favorite editor for Oozie workflows?
  • 3. Open Source Workflow Managers • Apache Airflow (Incubating) • Luigi by Spotify • Azkaban by LinkedIn • (And of course) Apache Oozie
  • 4. Introduction to Oozie • Oozie is a workflow scheduler system to manage Apache Hadoop jobs. • Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions. • Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability. • Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs out of the box. • Oozie is a scalable, reliable and extensible system. - Paraphrased from http://oozie.apache.org Actions: • Map/Reduce • Hive • Pig • HDFS • Java • Shell • Spark • Sub-Workflow • E-Mail • Decision • Fork • Join
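As a rough illustration of the DAG model described on slide 4, a minimal workflow definition might look like the sketch below. It is not taken from the talk: the workflow name, the node names and the /tmp/example paths are made up for illustration, and ${nameNode} is assumed to be supplied through job.properties. The workflow forks into two HDFS (fs) actions, joins them, and routes any failure to a kill node.

```xml
<!-- Hypothetical example: names and paths are illustrative only -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="example-dag-wf">
    <start to="split"/>

    <!-- Fork: both paths are started in parallel -->
    <fork name="split">
        <path start="make-dir-a"/>
        <path start="make-dir-b"/>
    </fork>

    <action name="make-dir-a">
        <fs>
            <mkdir path="${nameNode}/tmp/example/a"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <action name="make-dir-b">
        <fs>
            <mkdir path="${nameNode}/tmp/example/b"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <!-- Join: waits for every forked path before moving on -->
    <join name="merge" to="end"/>

    <!-- Any action error ends the workflow here -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Every node names its successor explicitly (to, ok, error), which is what makes the definition a graph rather than a script; Oozie rejects definitions that contain cycles.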
  • 5. Oozie Release Timeline • 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs. • 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs. • 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials. • 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability. • 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code clean up. - Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
  • 6. Oozie Complaints • Launcher jobs as map tasks • Dated UI • Confusing object model – workflows, coordinators, bundles • Complicated setup • XML • DAG visualization • SLA alerting • Fine grained authorization • Easy access to log files
  • 7. Oozie Complaints and Improvements • Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770 • Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today) • Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339 • Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666 • XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339 • DAG visualization – solved by Oozie 5.0.0, OOZIE-2406 • SLA alerting – since Oozie 4.0.0, OOZIE-1294 • Fine-grained authorization – targeted for Oozie 5.X, OOZIE-3196 • Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
  • 8. Oozie Launcher – Prior to Release 5.0 • MR launcher job
  • 9. Oozie Launcher – Release 5.0 • OYA: OOZIE-1770: Create Oozie Application Master for YARN — Removes MR launcher job • Design Doc
  • 10. Oozie Documentation – Before Release 5.0 and After • Documentation redesign • OOZIE-3163: Improve documentation rendering: use fluido skin and better config
  • 11. Oozie Workflow Visualization – Prior to 5.0 (Jung) and After (GraphViz) • OOZIE-2406: Completely rewrite Graph Generator code
  • 12. Oozie Fluent Job API – Apache Oozie 5.X (Preview) OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
  • 13. Apache Ambari Ambari Provides: • Provisioning of a Hadoop Cluster • Management of a Hadoop Cluster • Monitoring of a Hadoop Cluster — A Metrics System for metrics collection — An Alert Framework — A dashboard for monitoring the Hadoop cluster - Paraphrased from http://ambari.apache.org
  • 14. Ambari Views • Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.” • Key takeaways: — One does not need an Ambari-managed (administered) cluster — Third parties can build view packages to run in the Ambari framework too — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager • Alternatives: Cloudera Hue, bespoke applications
  • 15. Workflow Manager – Motivation • Oozie workflows are defined in XML – too verbose — Provide GUI workflow builder and editor — Reduce possibility of user-introduced errors — Provide browser-based workflow manager • Integration with File Browser — Includes S3 support — Can replace existing Oozie web UI • Oozie is hard-coded to display only 25 actions — WFM doesn’t have this limit; tested with 300+ action nodes • Oozie is scalable — Can scale WFM by standing up multiple Ambari Views servers
  • 16. Workflow Manager – Workflow Editor Example Workflow Manager: • Available as an Ambari View • Enables visual editing of Oozie workflows • Integrated with file browser • Reduces user input errors • Minimal input required
  • 17. Workflow Manager – Execution View Example • Integrated Dashboard with Workflow Manager View • Manage Oozie jobs • Drill down to logs
  • 18. Workflow Manager – Workflow Design Component
  • 19. Workflow Manager – Workflow Dashboard Component • Good Documentation: HDP 2.6 – Workflow Manager Basics
  • 20. Workflow Manager Examples with HBase • Set up Oozie – Server and Workflows • Data Definition – Tables, ACLs • Compactions – Operational
  • 21. HBase – Setup
    Oozie needs HBase configuration:
    • Oozie Server Code (to support HBase delegation tokens)
      — In libexec (see Server JARs list)
      — In oozie-site.xml:
          <property>
            <name>oozie.credentials.credentialclasses</name>
            <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
          </property>
    • Client Workflow Code:
      — Add to workflow.xml:
          <credentials>
            <credential name="myhbase_creds" type="hbase">
              […]
            </credential>
          </credentials>
      — All your normal HBase security settings go in the credential section
    • Server JARs (copy the following to Oozie’s libexec):
      — hbase-common.jar
      — hbase-client.jar
      — hbase-server.jar
      — hbase-protocol.jar
      — hbase-hadoop2-compat.jar
  • 22. HBase – Data Definition
    create_my_table.rb:
        # Create my_table only if it does not already exist
        tables = list
        # Keep the filtered result; select on its own does not modify the array
        tables = tables.select { |table| table.eql?('my_table') }
        if tables.empty?
          create 'my_table', {NAME => 'my_col'}
        end
        exit
    HBase Shell:
        <action name="HBASE-Shell" cred="hbase_creds">
          <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>hbase</exec>
            <argument>shell</argument>
            <argument>-n</argument>
            <argument>create_my_table.rb</argument>
          </shell>
          <ok to="do_more_things"/>
          <error to="fail"/>
        </action>
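To show how the credential block from slide 21 and the shell action from slide 22 fit together, here is a hedged sketch of a complete workflow.xml. Several things in it are assumptions rather than content from the talk: the workflow name, the single hbase.security.authentication property (the slide elides the real settings, which depend on the cluster), the file element that ships the script from the workflow application directory, and the transitions to end and fail. The one point it is meant to illustrate is that the action's cred attribute must match the credential's name.

```xml
<!-- Hypothetical sketch: names and property values are illustrative only -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="hbase-ddl-wf">
    <credentials>
        <!-- "hbase" must be mapped in oozie.credentials.credentialclasses -->
        <credential name="hbase_creds" type="hbase">
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
            <!-- the cluster's remaining HBase security settings go here -->
        </credential>
    </credentials>

    <start to="HBASE-Shell"/>

    <!-- cred refers to the credential name declared above -->
    <action name="HBASE-Shell" cred="hbase_creds">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>hbase</exec>
            <argument>shell</argument>
            <argument>-n</argument>
            <argument>create_my_table.rb</argument>
            <!-- assumption: the script sits next to workflow.xml in HDFS -->
            <file>create_my_table.rb</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>HBase shell failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

The -n flag runs the HBase shell non-interactively, so a failing script produces a non-zero exit code and the action follows its error transition.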
  • 23. HBase – Compactions
    HBASE-19528: Major Compaction Tool
    • Automatically scales compaction to selected number of servers
    • Requires read access to /hbase
    usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg> [...]
    Usage instructions
     -cf <arg>           column families: comma separated eg: a,b,c
     -dryRun             Dry run, will just output a list of regions that require compaction based on parameters passed
     -minModTime <arg>   Compact if store files have modification time < minModTime
     -servers <arg>      Concurrent servers compacting
     -table <arg>        table name
    ...
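One way to run the compaction tool from a workflow is through a shell action, in the same style as the data definition example. The sketch below is an assumption-laden illustration, not the presenters' workflow: it assumes the tool is invoked through the hbase launcher with the fully qualified class name org.apache.hadoop.hbase.util.compaction.MajorCompactor, that a credential named hbase_creds is declared elsewhere in the workflow, and that my_table, the server count of 2, and the end and fail nodes exist only for illustration. Only the flags listed in the usage text above are used.

```xml
<!-- Hypothetical sketch: table name, server count and node names are illustrative -->
<action name="major-compact" cred="hbase_creds">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>hbase</exec>
        <!-- assumption: class name of the HBASE-19528 tool -->
        <argument>org.apache.hadoop.hbase.util.compaction.MajorCompactor</argument>
        <argument>-servers</argument>
        <argument>2</argument>
        <argument>-table</argument>
        <argument>my_table</argument>
        <!-- dry run: only lists the regions that would be compacted -->
        <argument>-dryRun</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Starting with -dryRun keeps the first run harmless; removing that argument would let the tool actually compact, and a coordinator could then schedule the workflow for off-peak hours.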
  • 24. More Resources • Apache Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html • Artem’s Oozie Resources: — 12-Part Series on WFM: http://bit.ly/2syKUIh — Oozie Examples: https://github.com/dbist/oozie-examples • Clay’s Past Oozie Presentations: — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
  • 25. Demo!
  • 26. Questions?