SlideShare a Scribd company logo
Interactive Workflow
Management using
Azkaban
API driven workflow management for Spark
https://github.com/phatak-dev/interactive-azkaban
● Madhukara Phatak
● Technical Lead at Tellius
● Consultant and Trainer at
datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Different Kind of Applications in Spark
● Why Interactive?
● Building an Interactive Application
● Workflow in Big data
● Challenges of Interactive Application
● Azkaban
● Azkaban in manual/batch mode
● Azkaban AJAX API
● Azkaban client in Scala
Big Data Applications
● Typically applications in big data are divided depending
upon the their work loads.
● Major divisions are
○ Batch Applications
○ Streaming Applications
● Most of the existing platforms support both of these
applications these days
● But there is new category of applications are in raise,
they are known as interactive applications
Big data Interactive Applications
● Ability to manipulate data in interactive way
● Exploratory in nature
● Moves away from notion that ETL, Analysis has to be in
silos
● Combines batch and streaming data
● For Development
○ Zepplin, Jupyter Notebook etc
● For Production
○ DataMeer, Tellius,ZoomData etc
Spark and Interactive Applications
● Apache Spark is only big data platform built from
scratch to support interactive applications
● Spark made interactive data exploration using
notebooks popular
● Caching and Intelligent lazy mechanism makes it great
tool for interactive systems
● As spark system combines ETL, Exploration and
Advanced Analytics in one platform, we can do all the
data work in interactive fashion.
Building an Interactive Application
REST based Spark Application
Spark Cluster
REST
API Client
Databa
se
HDFS
Akka-Http
● Framework to build reactive web application/ services
● Build on top AKKA abstractions for concurrency
● Next version of popular REST framework spray
● As stream is the base abstraction, works well with the
spark
● Written in Scala. Has API’s in Java and Scala
● We will use local spark session to interact with Spark
Simple API
● The below is the API we expose
○ /load - for loading the data
○ /view - for looking at the sample data
○ /schedule - for schedule operations
● All these operations are simple, but they give you what
an API based system look like
● We test the API’s using postman to emulate interactive
mode
● Ex : RestService.scala
Workflow management in Big Data
Need of Workflow in Big data
● Most of the tasks we do in big data are repetitive in
nature
● Once we have determined our flow, we want to run it on
new data as and when it arrives
● Two parts -
○ Flow Definition
○ Scheduling
● Use cases
○ ETL, Updating models etc
Workflow for Batch
● Most of the scheduling for batch applications is done
using some kind of scripting
● Many ways are there to define flow and executing
● Once code is tested, code is deployed and scripts are
scheduled
● These scripts define the flow structure and use some
scheduling to run the operations
● Well known frameworks for batch scheduling are
○ Oozie
○ Airflow
Workflow for Streaming
● Streaming frameworks itself most of the time handle the
workflow need of the application
● The spark streaming code defines the flow that needs to
be run
● Spark Streaming Scheduler runs the flow as and when
new data appears
● So rarely we use an external workflow framework for
executing these work loads
Workflow for Interactive Application
● Ability to define the workflows on the fly rather than
fixed workflows as in case of batch
● Ability to schedule and unscheduled using API’s
● Should be able to handle both batch and streaming
sources of data
● Should integrate with the state build up using the
interactions in the interactive mode
● Ability to monitor the status of the running jobs in
realtime
Challenges of scheduling for
interactive
● Most of the workflow systems does not expose REST
API to interact with system to define flow and
scheduling
● Many lack good monitoring system to query the status
of the running tasks which is critical
● Most of the workflow systems run on their own
sandboxed execution engine which makes them hard to
integrate with the application state
● More details [2]
Azkaban
● Azkaban is a workflow job scheduler created at LinkedIn
to run Hadoop Jobs
● Has good support to define the dependencies through
flow mechanism and monitoring of the jobs
● Allows extending the UI to track new metrics
● Supports for multiple runtimes like
○ Hadoop
○ Spark
○ Java
Azkaban Batch Mode
● Azkaban is primarily built for scheduling big data batch
jobs
● It has a simple dsl to define the flows
● It allows us to define different executors for a given flow
● The abstractions
○ Project
○ Flow
● Ex : Running a java flow using Azkaban UI
Azkaban for Interactive Workflows
Azkaban AJAX API
● Though Azkaban is primarily build for the batch jobs, it
has a AJAX API to interact with the workflow system
● This is an API primarily built for the UI to interact with
the engine
● Though it’s not a full fledged REST API, it’s good
enough to build an interactive workflow system with this
API
● This AJAX API makes Azkaban ideal workflow
management system for the interactive applications.
Azkaban Scala Client
● Azkaban AJAX API has some rough edges as it’s not
meant to be work as standard REST API
● Interacting with API directly will be painful in your
application
● azkaban-scala-client is a scala client which makes
interactive with azkaban much easier
● Most of the API’s are exposed using scala, feature
requests are welcomed
● https://github.com/phatak-dev/azkaban-scala-client
Schedule in REST API
● As we understood how to use Azkaban API to interact
with workflow manager now we can use it in our REST
API
● We will use our scala client to interact with azkaban
● The implementation of the flow will do a request to the
rest server in order to use the state available in rest
server
● Ex : Scheduler.scala
References
● http://blog.madhukaraphatak.com/interactive-scheduling
-using-azkaban-setting-up-solo-server/
● http://blog.madhukaraphatak.com/interactive-scheduling
-using-azkaban-challenges-in-scheduling-interactive-wo
rkloads/
● http://azkaban.github.io/azkaban/docs/latest/#ajax-api
● https://github.com/azkaban/azkaban

More Related Content

What's hot

An introduction to Apache Thrift
An introduction to Apache ThriftAn introduction to Apache Thrift
An introduction to Apache Thrift
Mike Frampton
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Edureka!
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
Muralidharan Deenathayalan
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Vadim Y. Bichutskiy
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Spark Summit
 
6.hive
6.hive6.hive
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
Mostafa
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
Flink Forward
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Databricks
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon Aurora
Amazon Web Services
 
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
Terry Cho
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
NTT DATA Technology & Innovation
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
Amazon Web Services Japan
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
Vinoth Chandar
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
Ji-Woong Choi
 
Integrating Microservices with Apache Camel
Integrating Microservices with Apache CamelIntegrating Microservices with Apache Camel
Integrating Microservices with Apache Camel
Christian Posta
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
Databricks
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
Osama Hussein
 

What's hot (20)

An introduction to Apache Thrift
An introduction to Apache ThriftAn introduction to Apache Thrift
An introduction to Apache Thrift
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
 
6.hive
6.hive6.hive
6.hive
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon Aurora
 
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
소프트웨어 개발 트랜드 및 MSA (마이크로 서비스 아키텍쳐)의 이해
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
 
Integrating Microservices with Apache Camel
Integrating Microservices with Apache CamelIntegrating Microservices with Apache Camel
Integrating Microservices with Apache Camel
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 

Viewers also liked

Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
Praveen Thirukonda
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
DataWorks Summit
 
Azkaban
AzkabanAzkaban
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
datamantra
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
Delhi/NCR HUG
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
DataWorks Summit
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
Hakka Labs
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Spark
datamantra
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
Vlad Orlov
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedIn
Russell Jurney
 
Spark Workflow Management
Spark Workflow ManagementSpark Workflow Management
Spark Workflow Management
Romi Kuntsman
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
Christian Gügi
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
Erik Bernhardsson
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
Growth Intelligence
 
Azkaban-en
Azkaban-enAzkaban-en
Azkaban-en
wyukawa
 
Azkaban
AzkabanAzkaban
Azkaban
wyukawa
 

Viewers also liked (20)

Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Azkaban
AzkabanAzkaban
Azkaban
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Spark
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedIn
 
Spark Workflow Management
Spark Workflow ManagementSpark Workflow Management
Spark Workflow Management
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Azkaban-en
Azkaban-enAzkaban-en
Azkaban-en
 
Azkaban
AzkabanAzkaban
Azkaban
 

Similar to Interactive workflow management using Azkaban

A Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache SparkA Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache Spark
datamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
datamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Orchestration service v2
Orchestration service v2Orchestration service v2
Orchestration service v2
Raman Gupta
 
Function Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
Function Mesh for Apache Pulsar, the Way for Simple Streaming SolutionsFunction Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
Function Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
StreamNative
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
Mark Smith
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Integration Microservices
Integration MicroservicesIntegration Microservices
Integration Microservices
Kasun Indrasiri
 
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything togetherSashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
React Conf Brasil
 
The Apollo and GraphQL Stack
The Apollo and GraphQL StackThe Apollo and GraphQL Stack
The Apollo and GraphQL Stack
Sashko Stubailo
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
OpenSistemas
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
Anant Corporation
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
Karan Alang
 
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Eventador
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Dhaval Nagar
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
datamantra
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
John J Zhao
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
datamantra
 

Similar to Interactive workflow management using Azkaban (20)

A Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache SparkA Tool For Big Data Analysis using Apache Spark
A Tool For Big Data Analysis using Apache Spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Orchestration service v2
Orchestration service v2Orchestration service v2
Orchestration service v2
 
Function Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
Function Mesh for Apache Pulsar, the Way for Simple Streaming SolutionsFunction Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
Function Mesh for Apache Pulsar, the Way for Simple Streaming Solutions
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Integration Microservices
Integration MicroservicesIntegration Microservices
Integration Microservices
 
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything togetherSashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
Sashko Stubailo - The GraphQL and Apollo Stack: connecting everything together
 
The Apollo and GraphQL Stack
The Apollo and GraphQL StackThe Apollo and GraphQL Stack
The Apollo and GraphQL Stack
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
Introduction to SQLStreamBuilder: Rich Streaming SQL Interface for Creating a...
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 

More from datamantra

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
datamantra
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
datamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
datamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
datamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
datamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
datamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
datamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
datamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
datamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
datamantra
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
datamantra
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
datamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
datamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
datamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
datamantra
 

More from datamantra (20)

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 

Recently uploaded

New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
tanupasswan6
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
Alireza Kamrani
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
NABLAS株式会社
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
Grant McAlister
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Alexander Teggin
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
tanupasswan6
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
sheetal singh$A17
 
Semantic Web and organizational data .pptx
Semantic Web and organizational data .pptxSemantic Web and organizational data .pptx
Semantic Web and organizational data .pptx
Kanchana Weerasinghe
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
revolutionary575
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Alireza Kamrani
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
rightmanforbloodline
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
sukaniyasunnu
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 

Recently uploaded (20)

New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
 
Semantic Web and organizational data .pptx
Semantic Web and organizational data .pptxSemantic Web and organizational data .pptx
Semantic Web and organizational data .pptx
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 

Interactive workflow management using Azkaban

  • 1. Interactive Workflow Management using Azkaban API driven workflow management for Spark https://github.com/phatak-dev/interactive-azkaban
  • 2. ● Madhukara Phatak ● Technical Lead at Tellius ● Consultant and Trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Different Kind of Applications in Spark ● Why Interactive? ● Building an Interactive Application ● Workflow in Big data ● Challenges of Interactive Application ● Azkaban ● Azkaban in manual/batch mode ● Azkaban AJAX API ● Azkaban client in Scala
  • 4. Big Data Applications ● Typically applications in big data are divided depending upon the their work loads. ● Major divisions are ○ Batch Applications ○ Streaming Applications ● Most of the existing platforms support both of these applications these days ● But there is new category of applications are in raise, they are known as interactive applications
  • 5. Big data Interactive Applications ● Ability to manipulate data in interactive way ● Exploratory in nature ● Moves away from notion that ETL, Analysis has to be in silos ● Combines batch and streaming data ● For Development ○ Zepplin, Jupyter Notebook etc ● For Production ○ DataMeer, Tellius,ZoomData etc
  • 6. Spark and Interactive Applications ● Apache Spark is only big data platform built from scratch to support interactive applications ● Spark made interactive data exploration using notebooks popular ● Caching and Intelligent lazy mechanism makes it great tool for interactive systems ● As spark system combines ETL, Exploration and Advanced Analytics in one platform, we can do all the data work in interactive fashion.
  • 8. REST based Spark Application Spark Cluster REST API Client Databa se HDFS
  • 9. Akka-Http ● Framework to build reactive web application/ services ● Build on top AKKA abstractions for concurrency ● Next version of popular REST framework spray ● As stream is the base abstraction, works well with the spark ● Written in Scala. Has API’s in Java and Scala ● We will use local spark session to interact with Spark
  • 10. Simple API ● The below is the API we expose ○ /load - for loading the data ○ /view - for looking at the sample data ○ /schedule - for schedule operations ● All these operations are simple, but they give you what an API based system look like ● We test the API’s using postman to emulate interactive mode ● Ex : RestService.scala
  • 12. Need of Workflow in Big data ● Most of the tasks we do in big data are repetitive in nature ● Once we have determined our flow, we want to run it on new data as and when it arrives ● Two parts - ○ Flow Definition ○ Scheduling ● Use cases ○ ETL, Updating models etc
  • 13. Workflow for Batch ● Most of the scheduling for batch applications is done using some kind of scripting ● Many ways are there to define flow and executing ● Once code is tested, code is deployed and scripts are scheduled ● These scripts define the flow structure and use some scheduling to run the operations ● Well known frameworks for batch scheduling are ○ Oozie ○ Airflow
  • 14. Workflow for Streaming ● Streaming frameworks itself most of the time handle the workflow need of the application ● The spark streaming code defines the flow that needs to be run ● Spark Streaming Scheduler runs the flow as and when new data appears ● So rarely we use an external workflow framework for executing these work loads
  • 15. Workflow for Interactive Application ● Ability to define the workflows on the fly rather than fixed workflows as in case of batch ● Ability to schedule and unscheduled using API’s ● Should be able to handle both batch and streaming sources of data ● Should integrate with the state build up using the interactions in the interactive mode ● Ability to monitor the status of the running jobs in realtime
  • 16. Challenges of scheduling for interactive ● Most of the workflow systems does not expose REST API to interact with system to define flow and scheduling ● Many lack good monitoring system to query the status of the running tasks which is critical ● Most of the workflow systems run on their own sandboxed execution engine which makes them hard to integrate with the application state ● More details [2]
  • 17. Azkaban ● Azkaban is a workflow job scheduler created at LinkedIn to run Hadoop Jobs ● Has good support to define the dependencies through flow mechanism and monitoring of the jobs ● Allows extending the UI to track new metrics ● Supports for multiple runtimes like ○ Hadoop ○ Spark ○ Java
  • 18. Azkaban Batch Mode ● Azkaban is primarily built for scheduling big data batch jobs ● It has a simple dsl to define the flows ● It allows us to define different executors for a given flow ● The abstractions ○ Project ○ Flow ● Ex : Running a java flow using Azkaban UI
  • 20. Azkaban AJAX API ● Though Azkaban is primarily build for the batch jobs, it has a AJAX API to interact with the workflow system ● This is an API primarily built for the UI to interact with the engine ● Though it’s not a full fledged REST API, it’s good enough to build an interactive workflow system with this API ● This AJAX API makes Azkaban ideal workflow management system for the interactive applications.
  • 21. Azkaban Scala Client ● Azkaban AJAX API has some rough edges as it’s not meant to be work as standard REST API ● Interacting with API directly will be painful in your application ● azkaban-scala-client is a scala client which makes interactive with azkaban much easier ● Most of the API’s are exposed using scala, feature requests are welcomed ● https://github.com/phatak-dev/azkaban-scala-client
  • 22. Schedule in REST API ● As we understood how to use Azkaban API to interact with workflow manager now we can use it in our REST API ● We will use our scala client to interact with azkaban ● The implementation of the flow will do a request to the rest server in order to use the state available in rest server ● Ex : Scheduler.scala