SlideShare a Scribd company logo
1 of 26
Download to read offline
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
April 19, 2018
Artem Ervits – Hortonworks
Clay Baenziger – Bloomberg
Breathing New Life into Apache Oozie
with Apache Ambari Workflow Manager
© 2018 Bloomberg Finance L.P. All rights reserved.
Poll:
• Who here uses Oozie?
— In production?
— Do you use HUE with Oozie?
— How many workflows have you in production?
1-10? 10-50? 50+?
— How many actions does the largest workflow contain?
1-10? 10-50? 50+?
— Do you use Oozie with (or want to)?
HBase? Spark? Python? Deployment Automation?
• Do you like XML?
— Do you have a favorite editor for Oozie workflows?
© 2018 Bloomberg Finance L.P. All rights reserved.
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
© 2018 Bloomberg Finance L.P. All rights reserved.
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggerd by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as
system specific jobs out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
Actions:
• Map/Reduce
• Hive
• Pig
• HDFS
• Java
• Shell
• Spark
• Sub-Workflow
• E-Mail
• Decision
• Fork
• Join
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high
availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead
of MR launcher, new actions, code clean up.
- Adopted from: Apache Oozie by
Mohammad Kamrul Islam and Aravind Srinivasan
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model – workflows, coordinators, bundles
• Complicated setup
• XML
• DAG visualization
• SLA alerting
• Fine grained authorization
• Easy access to log files
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Complaints Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Launcher – Prior to Release 5.0
• MR launcher job
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
— Removes MR launcher job
• Design Doc
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Workflow Visualiztion – Prior to 5.0 and After
Jung GraphViz
OOZIE-2406: Completely rewrite GraphGenerator code
© 2018 Bloomberg Finance L.P. All rights reserved.
Oozie Fluent Job API – Apache Oozie 5.X (Preview)
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
© 2018 Bloomberg Finance L.P. All rights reserved.
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
— A Metrics System for metrics collection
— An Alert Framework
— A dashboard for monitoring the Hadoop cluster
-Paraphrased from http://ambari.apache.org
© 2018 Bloomberg Finance L.P. All rights reserved.
Ambari Views
• Ambari Views ”offer a systematic way to plug-in UI capabilities to surface custom
visualization, management and monitoring features in Ambari Web. A "view" is a way of
extending Ambari that allows 3rd parties to plug in new resource types along with the
APIs, providers and UI to support them. In other words, a view is an application that is
deployed into the Ambari container.”
• Key take-aways:
— One does not need an Ambari managed (administrated) cluster
— Third parties can build views packages to run in the Ambari framework too
— Major views available:
(YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN
ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Motivation
• Oozie workflows are defined in XML – too verbose
— Provide GUI workflow builder and editor
— Reduce possibility of user introduced errors
— Provide browser based workflow manager
• Integration with File
Browser (includes S3 support)
— Tighter integration with
Ambari in future
— Can replace existing
Oozie web UI
• Oozie is hard-coded to
display only 25 actions
— WFM doesn’t have this
limit; tested with 300+
action nodes
• Oozie is scalable
— Can scale WFM by
standing-up multiple
Ambari Views servers
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Designer Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Execution View
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Design Component
© 2018 Bloomberg Finance L.P. All rights reserved.
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
• Setup
• Data Definition
• Compactions
Workflow Manager Examples with
HBase
© 2018 Bloomberg Finance L.P. All rights reserved.
HBase – Setup
Oozie needs HBase Confguration:
• Oozie Server Code (to support Hbase delegation tokens)
— In libexec (see Server JARs list)
— In oozie-site.xml
<name>oozie.credentials.credentialclasses</name>
<value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
</name>
• Client Workflow Code:
— Add to workflow.xml:
<credentials>
<credential name=”myhbase_creds” type=”hbase”>
[…]
</credential>
</credentials>
— All your normal HBase security settings in the credential section
• Server JARs:
(Copy the following to Oozie’s libexec)
— hbase-common.jar
— hbase-client.jar
— hbase-server.jar
— hbase-protocol.jar
— hbase-hadoop2-compat.jar
© 2018 Bloomberg Finance L.P. All rights reserved.
create_my_table.rb:
tables = list
tables.select { |table|
table.eql?('my_table') }
if tables.empty?
create 'my_table',
{NAME => 'my_col'}
end
exit
HBase – Data Definition
HBase Shell:
<action name="HBASE-Shell" cred="hbase_creds">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>hbase</exec>
<argument>shell</argument>
<argument>-n</argument>
<argument>create_my_table.rb</argument>
</shell>
<ok to="do_more_things"/>
<error to="fail"/>
</action>
© 2018 Bloomberg Finance L.P. All rights reserved.
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to selected number of servers
• Requires read ability to /hbase
usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg>
[...]
Usage instructions
-cf <arg> column families: comma separated eg: a,b,c
-dryRun Dry run, will just output a list of regions that
require compaction based on parameters passed
-minModTime <arg> Compact if store files have
modification time < minModTime
-servers <arg> Concurrent servers compacting
-table <arg> table name
...
© 2018 Bloomberg Finance L.P. All rights reserved.
More Resources
• Oozie Examples: https://github.com/dbist/oozie-examples
• Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh
• Clay’s Past Oozie presentations:
— Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
— HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
Demo!
© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
Questions?

More Related Content

What's hot

Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Lucas Jellema
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Obiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementObiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementStewart Bryson
 
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)Lucas Jellema
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationDataWorks Summit
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceJohn Archer
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Exploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudExploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudAlex Zaballa
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doHortonworks
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in EnterpriseDataWorks Summit
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...DataWorks Summit
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easyDataWorks Summit
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionTravis Wright
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)Kay Lerch
 

What's hot (20)

Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Obiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle ManagementObiee 12C and the Leap Forward in Lifecycle Management
Obiee 12C and the Leap Forward in Lifecycle Management
 
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
6Reinventing Oracle Systems in a Cloudy World (Sangam20, December 2020)
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference application
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Exploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle CloudExploring All options to move your Oracle Databases to the Oracle Cloud
Exploring All options to move your Oracle Databases to the Oracle Cloud
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
ODPi 101: Who we are, What we do
ODPi 101: Who we are, What we doODPi 101: Who we are, What we do
ODPi 101: Who we are, What we do
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
 

Similar to Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerArtem Ervits
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API LanguagesRestlet
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17Phil Wilkins
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)Daniel Toomey
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Edward Wilde
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersSimon Haslam
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIsHolger Reinhardt
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxapidays
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDataWorks Summit
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsApigee | Google Cloud
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationOCTO Technology
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineCsaba Toth
 

Similar to Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (20)

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API Languages
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite Customers
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIs
 
Apache Oozie
Apache OozieApache Oozie
Apache Oozie
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARI
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIs
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementation
 
Octo API-days 2015
Octo API-days 2015Octo API-days 2015
Octo API-days 2015
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
 
yii framework
yii frameworkyii framework
yii framework
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 

Recently uploaded (20)

Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 April 19, 2018 Artem Ervits – Hortonworks Clay Baenziger – Bloomberg Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
  • 2. © 2018 Bloomberg Finance L.P. All rights reserved. Poll: • Who here uses Oozie? — In production? — Do you use HUE with Oozie? — How many workflows have you in production? 1-10? 10-50? 50+? — How many actions does the largest workflow contain? 1-10? 10-50? 50+? — Do you use Oozie with (or want to)? HBase? Spark? Python? Deployment Automation? • Do you like XML? — Do you have a favorite editor for Oozie workflows?
  • 3. © 2018 Bloomberg Finance L.P. All rights reserved. Open Source Workflow Managers • Apache Airflow (Incubating) • Luigi by Spotify • Azkaban by LinkedIn • (And of course) Apache Oozie
  • 4. © 2018 Bloomberg Finance L.P. All rights reserved. Introduction to Oozie • Oozie is a workflow scheduler system to manage Apache Hadoop jobs. • Oozie workflow jobs are Directed Acyclical Graphs (DAGs) of actions. • Oozie coordinator jobs are recurrent Oozie workflow jobs triggerd by time and data availability. • Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as system specific jobs out of the box. • Oozie is a scalable, reliable and extensible system. - Paraphrased from http://oozie.apache.org Actions: • Map/Reduce • Hive • Pig • HDFS • Java • Shell • Spark • Sub-Workflow • E-Mail • Decision • Fork • Join
  • 5. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Release Timeline • 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs. • 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs. • 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials. • 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability. • 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code clean up. - Adopted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
  • 6. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints • Launcher jobs as map tasks • Dated UI • Confusing object model – workflows, coordinators, bundles • Complicated setup • XML • DAG visualization • SLA alerting • Fine grained authorization • Easy access to log files
  • 7. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints Improvements • Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770 • Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today) • Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339 • Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666 • XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339 • DAG visualization – solved by Oozie 5.0.0, OOZIE-2406 • SLA alerting – since Oozie 4.0.0, OOZIE-1294 • Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196 • Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
  • 8. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Prior to Release 5.0 • MR launcher job
  • 9. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Release 5.0 • OYA: OOZIE-1770: Create Oozie Application Master for YARN — Removes MR launcher job • Design Doc
  • 10. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Documentation – Before Release 5.0 and After Documentation redesign OOZIE-3163: Improve documentation rendering: use fluido skin and better config
  • 11. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Workflow Visualiztion – Prior to 5.0 and After Jung GraphViz OOZIE-2406: Completely rewrite GraphGenerator code
  • 12. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Fluent Job API – Apache Oozie 5.X (Preview) OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
  • 13. © 2018 Bloomberg Finance L.P. All rights reserved. Apache Ambari Ambari Provides: • Provisioning of a Hadoop Cluster • Management of a Hadoop Cluster • Monitoring of a Hadoop Cluster — A Metrics System for metrics collection — An Alert Framework — A dashboard for monitoring the Hadoop cluster -Paraphrased from http://ambari.apache.org
  • 14. © 2018 Bloomberg Finance L.P. All rights reserved. Ambari Views • Ambari Views ”offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.” • Key take-aways: — One does not need an Ambari managed (administrated) cluster — Third parties can build views packages to run in the Ambari framework too — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager • Alternatives: Cloudera Hue, bespoke applications
  • 15. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Motivation • Oozie workflows are defined in XML – too verbose — Provide GUI workflow builder and editor — Reduce possibility of user introduced errors — Provide browser based workflow manager • Integration with File Browser (includes S3 support) — Tighter integration with Ambari in future — Can replace existing Oozie web UI • Oozie is hard-coded to display only 25 actions — WFM doesn’t have this limit; tested with 300+ action nodes • Oozie is scalable — Can scale WFM by standing-up multiple Ambari Views servers
  • 16. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Designer Example Workflow Manager: • Available as an Ambari View • Enables visual editing of Oozie workflows • Integrated with file browser • Reduces user input errors • Minimal input required
  • 17. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Execution View • Integrated Dashboard with Workflow Manager View • Manage Oozie jobs • Drill down to logs
  • 18. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Design Component
  • 19. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Dashboard Component Good Documentation: HDP 2.6 – Workflow Manager Basics
  • 20. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 • Setup • Data Definition • Compactions Workflow Manager Examples with HBase
  • 21. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Setup Oozie needs HBase Confguration: • Oozie Server Code (to support Hbase delegation tokens) — In libexec (see Server JARs list) — In oozie-site.xml <name>oozie.credentials.credentialclasses</name> <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value> </name> • Client Workflow Code: — Add to workflow.xml: <credentials> <credential name=”myhbase_creds” type=”hbase”> […] </credential> </credentials> — All your normal HBase security settings in the credential section • Server JARs: (Copy the following to Oozie’s libexec) — hbase-common.jar — hbase-client.jar — hbase-server.jar — hbase-protocol.jar — hbase-hadoop2-compat.jar
  • 22. © 2018 Bloomberg Finance L.P. All rights reserved. create_my_table.rb: tables = list tables.select { |table| table.eql?('my_table') } if tables.empty? create 'my_table', {NAME => 'my_col'} end exit HBase – Data Definition HBase Shell: <action name="HBASE-Shell" cred="hbase_creds"> <shell xmlns="uri:oozie:shell-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <exec>hbase</exec> <argument>shell</argument> <argument>-n</argument> <argument>create_my_table.rb</argument> </shell> <ok to="do_more_things"/> <error to="fail"/> </action>
  • 23. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Compactions HBASE-19528: Major Compaction Tool • Automatically scales compaction to selected number of servers • Requires read ability to /hbase usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg> [...] Usage instructions -cf <arg> column families: comma separated eg: a,b,c -dryRun Dry run, will just output a list of regions that require compaction based on parameters passed -minModTime <arg> Compact if store files have modification time < minModTime -servers <arg> Concurrent servers compacting -table <arg> table name ...
  • 24. © 2018 Bloomberg Finance L.P. All rights reserved. More Resources • Oozie Examples: https://github.com/dbist/oozie-examples • Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html • Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh • Clay’s Past Oozie presentations: — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
  • 25. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Demo!
  • 26. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Questions?