© 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit San Jose 2018
June 20, 2018
Artem Ervits – Hortonworks
Clay Baenziger – Bloomberg
Breathing New Life into Apache Oozie
with Apache Ambari Workflow Manager
Poll:
• Who here uses Oozie?
— In production?
With Kerberos?
— Do you use HUE with Oozie?
— How many workflows do you have in production?
1-10? 10-50? 50+?
— How many actions does the largest workflow contain?
1-10? 10-50? 50+?
— Do you use (or want to use) Oozie with:
HBase? Spark? Python? Deployment Automation?
• Do you like XML?
— Do you have a favorite editor for Oozie workflows?
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
Actions:
• Map/Reduce
• Hive
• Pig
• HDFS
• Java
• Shell
• Spark
• Sub-Workflow
• E-Mail
• Decision
• Fork
• Join
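As a rough illustration of the points above, a workflow is an XML file in which each node names the node to run next, and those transitions are what form the DAG. The sketch below is not from the talk: the workflow name, the node names, and the ${nameNode} parameter and path are hypothetical placeholders, and the schema version should be checked against the Oozie release in use.

```xml
<!-- Hypothetical example: names, paths and parameters are placeholders -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="make-dir"/>
    <!-- An HDFS (fs) action: creates a directory, then follows ok or error -->
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/tmp/example-wf/output"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <!-- Terminal node reached when an action takes its error transition -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Even this single-action workflow needs start, kill and end nodes around it, which gives a sense of how quickly hand-written XML grows as actions are added.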
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code cleanup.
- Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model & XML – workflows, coordinators, bundles
• Complicated setup
• DAG visualization
• SLA alerting
• Fine-grained authorization
• Easy access to log files
Oozie Complaints – Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model & XML – jobs API, patch available, targeted for 5.1, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine-grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Diagnostic Tool in Oozie 5.0.0, OOZIE-2296
Oozie UI – React Mock-Up - OOZIE-3283
• Workflows
Oozie Launcher – Prior to Release 5.0
• MR launcher job
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
— Removes MR launcher job
• Design Doc
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
Oozie Workflow Visualization – Prior to 5.0 and After
Prior to 5.0: JUNG. 5.0 and after: GraphViz.
OOZIE-2406: Completely rewrite Graph Generator code
Oozie Fluent Job API – Apache Oozie 5.1
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
— A Metrics System for metrics collection
— An Alert Framework
— A dashboard for monitoring the Hadoop cluster
- Paraphrased from http://ambari.apache.org
Ambari Views
• Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.”
• Key takeaways:
— One does not need an Ambari-managed (administered) cluster
— Third parties can build view packages to run in the Ambari framework too
— Major views available:
(YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
Workflow Manager – Motivation
• Oozie workflows are defined in XML – too verbose
— Provide GUI workflow builder and editor
— Reduce possibility of user introduced errors
— Provide browser based workflow manager
• Integration with File Browser
— Includes S3 support
— Can replace existing Oozie web UI
• Oozie is hard-coded to display only 25 actions
— WFM doesn’t have this limit; tested with 300+ action nodes
• Oozie is scalable
— Can scale WFM by standing up multiple Ambari Views servers
Workflow Manager – Workflow Editor Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
Workflow Manager – Execution View Example
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
Workflow Manager – Workflow Design Component
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
Art of the Possible
• Scheduling “non-traditional” Hadoop workflows
— Schedule SQL maintenance operations
— Launch SQL Server on Linux in Docker on YARN for tests
— Warming Caches (HBase, LLAP, etc.)
• Administrative Tasks
— Log clean-up
— Clean up crashed/abandoned Hive temporary data
— HBase management
Workflow Manager Examples with HBase
• Set up Oozie – Server and Workflows
• Data Definition – Tables, ACLs
• Compactions – Operational
HBase – Setup
Oozie needs HBase Configuration:
• Oozie Server Code (to support HBase delegation tokens)
— In libexec (see Server JARs list)
— In oozie-site.xml:
<property>
  <name>oozie.credentials.credentialclasses</name>
  <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
</property>
• Client Workflow Code:
— Add to workflow.xml:
<credentials>
  <credential name="myhbase_creds" type="hbase">
    […]
  </credential>
</credentials>
— Put all your normal HBase security settings in the credential section
• Server JARs:
(Copy the following to Oozie’s libexec)
— hbase-common.jar
— hbase-client.jar
— hbase-server.jar
— hbase-protocol.jar
— hbase-hadoop2-compat.jar
HBase – Data Definition
HBase Shell:
<!-- cred must match the name of a <credential> defined in the workflow's <credentials> section -->
<action name="HBASE-Shell" cred="myhbase_creds">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hbase</exec>
    <argument>shell</argument>
    <argument>-n</argument>
    <argument>create_my_table.rb</argument>
  </shell>
  <ok to="do_more_things"/>
  <error to="fail"/>
</action>
create_my_table.rb:
# Create 'my_table' only if it does not already exist.
# Keep the filtered result: select returns a new array and leaves the receiver unchanged.
tables = list.select { |table| table.eql?('my_table') }
if tables.empty?
  create 'my_table', {NAME => 'my_col'}
end
exit
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to selected number of servers
• Requires read access to /hbase
usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg>
       [...]
Usage instructions
 -cf <arg>          column families: comma separated eg: a,b,c
 -dryRun            Dry run, will just output a list of regions that
                    require compaction based on parameters passed
 -minModTime <arg>  Compact if store files have
                    modification time < minModTime
 -servers <arg>     Concurrent servers compacting
 -table <arg>       table name
...
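One way to schedule this tool from Oozie, following the shell-action pattern used for the HBase shell earlier, might look like the sketch below. It is an assumption rather than something shown in the talk: the action name, table name, server count and transition targets are placeholders, and the fully qualified class name should be verified against the HBase release in use.

```xml
<!-- Hypothetical sketch: names and argument values are placeholders -->
<action name="major-compact" cred="myhbase_creds">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>hbase</exec>
        <!-- Assumed class name for the HBASE-19528 tool; verify for your HBase version -->
        <argument>org.apache.hadoop.hbase.util.compaction.MajorCompactor</argument>
        <argument>-table</argument>
        <argument>my_table</argument>
        <argument>-servers</argument>
        <argument>2</argument>
        <!-- Only list the regions that need compaction; remove to compact for real -->
        <argument>-dryRun</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Starting with -dryRun and a small -servers value is a cautious default here; the user running the action still needs read access to /hbase.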
More Resources
• Apache Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s Oozie Resources:
— 12-part series on WFM: http://bit.ly/2syKUIh
— Oozie Examples: https://github.com/dbist/oozie-examples
• Clay’s Past Oozie Presentations:
— Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
— HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
Demo!
Questions?

More Related Content

What's hot

Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processing
DataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
DataWorks Summit
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
DataWorks Summit
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
Enabling real interactive BI on Hadoop
Enabling real interactive BI on HadoopEnabling real interactive BI on Hadoop
Enabling real interactive BI on Hadoop
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
HBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsHBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, Solutions
DataWorks Summit
 
Multitenancy At Bloomberg - HBase and Oozie
Multitenancy At Bloomberg - HBase and OozieMultitenancy At Bloomberg - HBase and Oozie
Multitenancy At Bloomberg - HBase and Oozie
DataWorks Summit
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
DataWorks Summit
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010
DBIS @ Ilmenau University of Technology
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
DataWorks Summit
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
Artem Ervits
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
DataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 

What's hot (20)

Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processing
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Enabling real interactive BI on Hadoop
Enabling real interactive BI on HadoopEnabling real interactive BI on Hadoop
Enabling real interactive BI on Hadoop
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
HBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsHBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, Solutions
 
Multitenancy At Bloomberg - HBase and Oozie
Multitenancy At Bloomberg - HBase and OozieMultitenancy At Bloomberg - HBase and Oozie
Multitenancy At Bloomberg - HBase and Oozie
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 

Similar to Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Artem Ervits
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
DataWorks Summit
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
Joe Crobak
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API Languages
Restlet
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
Mona Chitnis
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
Andrejs Vorobjovs
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
apidays
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17
Phil Wilkins
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIs
Apigee | Google Cloud
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
DataWorks Summit
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Simon Haslam
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
Daniel Toomey
 
Custom Development in SharePoint – What are my options now?
Custom Development in SharePoint – What are my options now?Custom Development in SharePoint – What are my options now?
Custom Development in SharePoint – What are my options now?
Talbott Crowell
 
Building APIs with Apigee Edge and Microsoft Azure
Building APIs with Apigee Edge and Microsoft AzureBuilding APIs with Apigee Edge and Microsoft Azure
Building APIs with Apigee Edge and Microsoft Azure
Apigee | Google Cloud
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIs
Holger Reinhardt
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementation
OCTO Technology
 
Octo API-days 2015
Octo API-days 2015Octo API-days 2015
Octo API-days 2015
Antoine CHANTALOU
 
Add Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAdd Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring Toolkit
AppDynamics
 
Web jobs, Azure Functions and Serverless Computing
Web jobs, Azure Functions and Serverless ComputingWeb jobs, Azure Functions and Serverless Computing
Web jobs, Azure Functions and Serverless Computing
Paris Polyzos
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
Csaba Toth
 

Similar to Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (20)

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
APIdays 2016 - The State of Web API Languages
APIdays 2016  - The State of Web API LanguagesAPIdays 2016  - The State of Web API Languages
APIdays 2016 - The State of Web API Languages
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
 
API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17API Platform Cloud Service best practice - OOW17
API Platform Cloud Service best practice - OOW17
 
Modernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIsModernizing an Existing SOA-based Architecture with APIs
Modernizing an Existing SOA-based Architecture with APIs
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Running SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite CustomersRunning SOA in the Cloud: SOA CS for SOA Suite Customers
Running SOA in the Cloud: SOA CS for SOA Suite Customers
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
 
Custom Development in SharePoint – What are my options now?
Custom Development in SharePoint – What are my options now?Custom Development in SharePoint – What are my options now?
Custom Development in SharePoint – What are my options now?
 
Building APIs with Apigee Edge and Microsoft Azure
Building APIs with Apigee Edge and Microsoft AzureBuilding APIs with Apigee Edge and Microsoft Azure
Building APIs with Apigee Edge and Microsoft Azure
 
SOA - From Webservices to APIs
SOA - From Webservices to APIsSOA - From Webservices to APIs
SOA - From Webservices to APIs
 
Top 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementationTop 7 wrong common beliefs about Enterprise API implementation
Top 7 wrong common beliefs about Enterprise API implementation
 
Octo API-days 2015
Octo API-days 2015Octo API-days 2015
Octo API-days 2015
 
Add Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring ToolkitAdd Apache Web Server to your Unified Monitoring Toolkit
Add Apache Web Server to your Unified Monitoring Toolkit
 
Web jobs, Azure Functions and Serverless Computing
Web jobs, Azure Functions and Serverless ComputingWeb jobs, Azure Functions and Serverless Computing
Web jobs, Azure Functions and Serverless Computing
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit San Jose 2018 June 20, 2018 Artem Ervits – Hortonworks Clay Baenziger – Bloomberg Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
  • 2. Poll: • Who here uses Oozie? — In production? With Kerberos? — Do you use HUE with Oozie? — How many workflows do you have in production? 1-10? 10-50? 50+? — How many actions does the largest workflow contain? 1-10? 10-50? 50+? — Do you use Oozie with (or want to)? HBase? Spark? Python? Deployment Automation? • Do you like XML? — Do you have a favorite editor for Oozie workflows?
  • 3. Open Source Workflow Managers • Apache Airflow (Incubating) • Luigi by Spotify • Azkaban by LinkedIn • (And of course) Apache Oozie
  • 4. Introduction to Oozie • Oozie is a workflow scheduler system to manage Apache Hadoop jobs. • Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions. • Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability. • Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs out of the box. • Oozie is a scalable, reliable and extensible system. - Paraphrased from http://oozie.apache.org Actions: • Map/Reduce • Hive • Pig • HDFS • Java • Shell • Spark • Sub-Workflow • E-Mail • Decision • Fork • Join
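To make the "DAG of actions" idea concrete, here is a minimal workflow sketch that uses the Fork, Join and HDFS (fs) node types listed above. It is an illustration only, not taken from the talk: the workflow name, node names, `/tmp/sketch` paths and the `${nameNode}` property are all hypothetical.

```xml
<!-- Hypothetical sketch: two HDFS actions run in parallel, then join. -->
<workflow-app name="fork-join-sketch" xmlns="uri:oozie:workflow:0.5">
  <start to="split"/>

  <!-- Fork: both paths start concurrently -->
  <fork name="split">
    <path start="make-dir-a"/>
    <path start="make-dir-b"/>
  </fork>

  <action name="make-dir-a">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/a"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <action name="make-dir-b">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/b"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- Join: waits for every forked path before continuing -->
  <join name="merge" to="end"/>

  <kill name="fail">
    <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Every action names an `ok` and an `error` transition, which is what keeps the graph explicit and acyclic.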
  • 5. Oozie Release Timeline • 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs. • 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs. • 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials. • 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability. • 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code cleanup. - Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
  • 6. Oozie Complaints • Launcher jobs as map tasks • Dated UI • Confusing object model & XML – workflows, coordinators, bundles • Complicated setup • DAG visualization • SLA alerting • Fine-grained authorization • Easy access to log files
  • 7. Oozie Complaints and Improvements • Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770 • Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today) • Confusing object model & XML – jobs API, patch available, targeted for 5.1, OOZIE-2339 • Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666 • DAG visualization – solved by Oozie 5.0.0, OOZIE-2406 • SLA alerting – since Oozie 4.0.0, OOZIE-1294 • Fine-grained authorization – targeted for Oozie 5.X, OOZIE-3196 • Easy access to log files – solved by Diagnostic Tool in Oozie 5.0.0, OOZIE-2296
  • 8. Oozie UI – React Mock-Up - OOZIE-3283 • Workflows
  • 9. Oozie Launcher – Prior to Release 5.0 • MR launcher job
  • 10. Oozie Launcher – Release 5.0 • OYA: OOZIE-1770: Create Oozie Application Master for YARN — Removes MR launcher job • Design Doc
  • 11. Oozie Documentation – Before Release 5.0 and After • Documentation redesign • OOZIE-3163: Improve documentation rendering: use fluido skin and better config
  • 12. Oozie Workflow Visualization – Prior to 5.0 (Jung) and After (GraphViz) • OOZIE-2406: Completely rewrite Graph Generator code
  • 13. Oozie Fluent Job API – Apache Oozie 5.1 • OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
  • 14. Apache Ambari • Ambari Provides: • Provisioning of a Hadoop Cluster • Management of a Hadoop Cluster • Monitoring of a Hadoop Cluster — A Metrics System for metrics collection — An Alert Framework — A dashboard for monitoring the Hadoop cluster - Paraphrased from http://ambari.apache.org
  • 15. Ambari Views • Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A 'view' is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.” • Key takeaways: — One does not need an Ambari-managed (administered) cluster — Third parties can build view packages to run in the Ambari framework too — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager • Alternatives: Cloudera Hue, bespoke applications
  • 16. Workflow Manager – Motivation • Oozie workflows are defined in XML – too verbose — Provide GUI workflow builder and editor — Reduce possibility of user-introduced errors — Provide browser-based workflow manager • Integration with File Browser — Includes S3 support — Can replace existing Oozie web UI • Oozie is hard-coded to display only 25 actions — WFM doesn’t have this limit; tested with 300+ action nodes • Oozie is scalable — Can scale WFM by standing up multiple Ambari Views servers
  • 17. Workflow Manager – Workflow Editor Example • Workflow Manager: • Available as an Ambari View • Enables visual editing of Oozie workflows • Integrated with file browser • Reduces user input errors • Minimal input required
  • 18. Workflow Manager – Execution View Example • Integrated Dashboard with Workflow Manager View • Manage Oozie jobs • Drill down to logs
  • 19. Workflow Manager – Workflow Design Component
  • 20. Workflow Manager – Workflow Dashboard Component • Good Documentation: HDP 2.6 – Workflow Manager Basics
  • 21. Art of Possible • Scheduling “non-traditional” Hadoop workflows — Schedule SQL maintenance operations — Launch SQL Server on Linux in Docker on YARN for tests — Warming Caches (HBase, LLAP, etc.) • Administrative Tasks — Log clean-up — Clean-up crashed/abandoned Hive temporary data — HBase management
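An administrative task such as log clean-up becomes a scheduled job once a workflow is wrapped in a coordinator. The fragment below is a minimal sketch under assumptions: the coordinator name and the `${startTime}`, `${endTime}` and `${cleanupWorkflowPath}` properties are hypothetical and would be supplied in the job's properties file.

```xml
<!-- Hypothetical sketch: run a clean-up workflow once a day. -->
<coordinator-app name="daily-log-cleanup"
                 frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that holds the clean-up workflow.xml -->
      <app-path>${cleanupWorkflowPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

This coordinator is purely time-triggered; adding datasets and input events would make it wait for data availability as well.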
  • 22. Workflow Manager Examples with HBase • Setup Oozie – Server and Workflows • Data Definition – Tables, ACLs • Compactions – Operational
  • 23. HBase – Setup
    Oozie needs HBase configuration:
    • Oozie Server Code (to support HBase delegation tokens)
      — In libexec (see Server JARs list)
      — In oozie-site.xml:
          <property>
            <name>oozie.credentials.credentialclasses</name>
            <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
          </property>
    • Client Workflow Code:
      — Add to workflow.xml:
          <credentials>
            <credential name="myhbase_creds" type="hbase">
              […]
            </credential>
          </credentials>
      — All your normal HBase security settings go in the credential section
    • Server JARs: (Copy the following to Oozie’s libexec)
      — hbase-common.jar
      — hbase-client.jar
      — hbase-server.jar
      — hbase-protocol.jar
      — hbase-hadoop2-compat.jar
  • 24. HBase – Data Definition
    create_my_table.rb:
        # Create 'my_table' only if it does not already exist
        tables = list
        tables = tables.select { |table| table.eql?('my_table') }
        if tables.empty?
          create 'my_table', {NAME => 'my_col'}
        end
        exit
    HBase Shell:
        <action name="HBASE-Shell" cred="hbase_creds">
          <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>hbase</exec>
            <argument>shell</argument>
            <argument>-n</argument>
            <argument>create_my_table.rb</argument>
          </shell>
          <ok to="do_more_things"/>
          <error to="fail"/>
        </action>
  • 25. HBase – Compactions
    HBASE-19528: Major Compaction Tool
    • Automatically scales compaction to selected number of servers
    • Requires read access to /hbase
    usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg> [...]
    Usage instructions
      -cf <arg>          column families: comma separated eg: a,b,c
      -dryRun            Dry run, will just output a list of regions that require compaction based on parameters passed
      -minModTime <arg>  Compact if store files have modification time < minModTime
      -servers <arg>     Concurrent servers compacting
      -table <arg>       table name
      ...
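One way to run the compaction tool from a workflow is to reuse the shell-action pattern from the Data Definition slide. This is a sketch under stated assumptions, not the presenters' workflow: the fully qualified class name is assumed from the HBASE-19528 patch, the table name, server count and transition targets are hypothetical, and the options elided by `[...]` in the usage text above may also be required in practice.

```xml
<!-- Hypothetical sketch: dry-run the Major Compaction Tool via a shell action. -->
<action name="major-compact" cred="hbase_creds">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>hbase</exec>
    <!-- Class name assumed from HBASE-19528; verify against your HBase version -->
    <argument>org.apache.hadoop.hbase.util.compaction.MajorCompactor</argument>
    <argument>-table</argument>
    <argument>my_table</argument>
    <argument>-servers</argument>
    <argument>2</argument>
    <!-- Only list the regions that would be compacted -->
    <argument>-dryRun</argument>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Starting with `-dryRun` lets the region list be reviewed in the action's logs before the flag is removed for a real run.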
  • 26. More Resources • Apache Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html • Artem’s Oozie Resources: — 12-Part Series on WFM: http://bit.ly/2syKUIh — Oozie Examples: https://github.com/dbist/oozie-examples • Clay’s Past Oozie Presentations: — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
  • 27. Demo!
  • 28. Questions?