SlideShare a Scribd company logo
Orchestration Service with
Apache Airflow on Kubernetes
Presented By:- Raman Gupta
Sr Computer Scientist, ADOBE
Agenda:-
• Overview
• Orchestration Service: Requirements
• Why we selected Apache Airflow as underlying execution Engine
• Problems Faced with Apache Airflow
• Orchestration Service: Overcoming Challenges
• JSON DSL for workflow Authoring
• Architecture: Highly Scalable and Multi-Tenant Orchestration Service with Apache Airflow
on Kubernetes
 Orchestration Service Data Flow & Storage on Azure
 Running Apache Airflow on Kubernetes
 Orchestration Service support for multi-cluster.
• Key Learnings
Overview
• There are different teams in Adobe which need a workflow management system
to meet variety of use cases like Machine Learning, ETL, Scheduled Reporting etc..
• Each Team has set up its own scheduler like cron or workflow management
system. And they are facing similar kind of challenges while maintaining these
crons or workflow managers in production.
• There is no standardization and lot of duplication of work and effort in learning
and managing these workflow management systems.
• We started working n providing Orchestration Service as a fully managed &
Scalable, Multi-tenant service to help different Adobe teams to author / manage
& schedule multi step workflows.
Orchestration Service: Requirements
• Providing User Friendly APIs to author and Manage workflows lifecycle. Should
not expose underlying Workflow execution Engine.
• Should be multi-tenant and should provide resource guarantees
• Support for different workflow triggers like scheduled based trigger, ad-hoc
trigger, event based trigger
• Should be highly scalable. Should support 1000(s) of concurrent workflow
execution.
• Should manage different nature of workflows like workflow can run from few
minutes to few days.
• Scheduling latency should be within a threshold.
• It should support both static and dynamic workflows.
Why we selected Apache Airflow as underlying
execution Engine
• It has very active community.
• Its easily extensible through plugins.
• Very Rich UI to provide workflows Insights.
• Support for Distributed Executor like Kubernetes.
• Webserver can be extended support RESTful APIs for workflow
operations.
Problems Faced with Apache Airflow
• Workflow Scheduling Latency increases with increase in number of
workflow files.
• Airflow Releases are not backward compatible and might need
downtime.
• WorkflowTask is Memory intensive. Running single task can take
~150 MB of memory.
• Airflow scheduler is recommended to run as singleton so it can scale
vertically and not horizontally
• Kubernetes Executor is still evolving.
Orchestration Service: Overcoming Challenges
• Abstracts Apache Airflow by providing JSON based DSL to author workflow.
At the backend it transforms JSON to Airflow compatible DAG.
• Provides CRUD APIs to manage workflows
• It tracks Airflow Cluster Usage and provides necessary alerts
• To achieve high scalability it supports multiple Apache Airflow installation
and does all the routing.
• Support for custom Airflow config like “dag_dir_list_interval” for different
users.
• Support for Plugin interface to add and expose Airflow Operators as
reusable components.
• Contributions to Apache Airflow to make it more resilient and functionality
rich
As we have seen DAGS are python code comprising of airflow operators and various arguments of the DAGS. Adobe Orchestrator don't expect its
users to write a Python code for their DAGs, instead it provide python generation of DAGS using JSON where they can provide all the properties
and operators of the DAGs.
JSON DSL for workflow Authoring
JSON
validator
Parser and
Python code
generator
passes
Property Type
Id String
Description String
Type properties JSON
Connections List
Retries Number
dependsOn JSON
Using JSON properties
REST API
Create DAG
using JSON
Properties
for
JSON(Table
)
DAG
validator
Checks all
imports,
cycles,
code
syntax of
the
generated
Python
code
Adobe Orchestrator is like a service environment over Airflow which is used by the user. If any issue occurs then users does not get affected by that
and the service takes care of that. e.g. Scheduler goes down if there is any database connection issue, for which Orchestrator provides the AUTO-
RESTART which restarts the scheduler
Service Data Flow &
Storage
Running Apache Airflow on Kubernetes
Orchestration Service support for Multi-Cluster
Key Learnings
• Apache Airflow is not exposed to users so its possible to support different
or multiple Scheduling engine without Impacting usage.
• With JSON based DSL users can construct workflow without knowing
Airflow Primitive(s).
• HTTP interface between Orchestration Service & Airflow Webserver helped
in keeping separation of logic.
• Airflow metastore can become a bottleneck so provided a cleanup Api.
• Monitoring tools to check Airflow scheduler Health and restart it if
required.
• Monitoring & Alerts on number of Workflow files on Airflow Scheduler
otherwise scheduling latency can increase.
Thank You !!

More Related Content

What's hot

Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Gary Stafford
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
Databricks
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity Models
Alan McSweeney
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Databricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
Mostafa Mokhtar
 
Data Modeling is Data Governance
Data Modeling is Data GovernanceData Modeling is Data Governance
Data Modeling is Data Governance
DATAVERSITY
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata Management
DATAVERSITY
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
Databricks
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
Databricks
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Lucian Neghina
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
Gokhan Atil
 
Cyber Security and Data Science
Cyber Security and Data Science Cyber Security and Data Science
Cyber Security and Data Science
Ania Wieczorek
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
Dan Gunter
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 

What's hot (20)

Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity Models
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
 
Data Modeling is Data Governance
Data Modeling is Data GovernanceData Modeling is Data Governance
Data Modeling is Data Governance
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata Management
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Cyber Security and Data Science
Cyber Security and Data Science Cyber Security and Data Science
Cyber Security and Data Science
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 

Similar to Orchestration service v2

DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
John J Zhao
 
Azure serverless architectures
Azure serverless architecturesAzure serverless architectures
Azure serverless architectures
Benoit Le Pichon
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
datamantra
 
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
apidays
 
Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
Amazon Web Services
 
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps EnterpriseSpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
VMware Tanzu
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the aws
Scott Miao
 
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Amazon Web Services
 
Building stateful serverless orchestrations with Azure Durable Azure Function...
Building stateful serverless orchestrations with Azure Durable Azure Function...Building stateful serverless orchestrations with Azure Durable Azure Function...
Building stateful serverless orchestrations with Azure Durable Azure Function...
Callon Campbell
 
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
Tamir Dresher
 
以AWS Lambda與Amazon API Gateway打造無伺服器後端
以AWS Lambda與Amazon API Gateway打造無伺服器後端以AWS Lambda與Amazon API Gateway打造無伺服器後端
以AWS Lambda與Amazon API Gateway打造無伺服器後端
Amazon Web Services
 
WSO2 Intro Webinar - Scale your business with the cloud enabled WSO2 Applica...
WSO2 Intro Webinar -  Scale your business with the cloud enabled WSO2 Applica...WSO2 Intro Webinar -  Scale your business with the cloud enabled WSO2 Applica...
WSO2 Intro Webinar - Scale your business with the cloud enabled WSO2 Applica...WSO2
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
Amazon Web Services
 
Ppt for Online music store
Ppt for Online music storePpt for Online music store
Ppt for Online music store
ADEEBANADEEM
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
Tensult
 
Copper: A high performance workflow engine
Copper: A high performance workflow engineCopper: A high performance workflow engine
Copper: A high performance workflow engine
dmoebius
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
Luciano Resende
 
Azure Functions - Introduction
Azure Functions - IntroductionAzure Functions - Introduction
Azure Functions - Introduction
Venkatesh Narayanan
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
Chinmay Kolhatkar
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
Terry Cho
 

Similar to Orchestration service v2 (20)

DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
 
Azure serverless architectures
Azure serverless architecturesAzure serverless architectures
Azure serverless architectures
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
 
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...
 
Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
 
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps EnterpriseSpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the aws
 
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computati...
 
Building stateful serverless orchestrations with Azure Durable Azure Function...
Building stateful serverless orchestrations with Azure Durable Azure Function...Building stateful serverless orchestrations with Azure Durable Azure Function...
Building stateful serverless orchestrations with Azure Durable Azure Function...
 
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
 
以AWS Lambda與Amazon API Gateway打造無伺服器後端
以AWS Lambda與Amazon API Gateway打造無伺服器後端以AWS Lambda與Amazon API Gateway打造無伺服器後端
以AWS Lambda與Amazon API Gateway打造無伺服器後端
 
WSO2 Intro Webinar - Scale your business with the cloud enabled WSO2 Applica...
WSO2 Intro Webinar -  Scale your business with the cloud enabled WSO2 Applica...WSO2 Intro Webinar -  Scale your business with the cloud enabled WSO2 Applica...
WSO2 Intro Webinar - Scale your business with the cloud enabled WSO2 Applica...
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
 
Ppt for Online music store
Ppt for Online music storePpt for Online music store
Ppt for Online music store
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
 
Copper: A high performance workflow engine
Copper: A high performance workflow engineCopper: A high performance workflow engine
Copper: A high performance workflow engine
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
Azure Functions - Introduction
Azure Functions - IntroductionAzure Functions - Introduction
Azure Functions - Introduction
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 

Recently uploaded

一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
dxobcob
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
An Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering TechniquesAn Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering Techniques
ambekarshweta25
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 

Recently uploaded (20)

一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
An Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering TechniquesAn Approach to Detecting Writing Styles Based on Clustering Techniques
An Approach to Detecting Writing Styles Based on Clustering Techniques
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 

Orchestration service v2

  • 1. Orchestration Service with Apache Airflow on Kubernetes Presented By:- Raman Gupta Sr Computer Scientist, ADOBE
  • 2. Agenda:- • Overview • Orchestration Service: Requirements • Why we selected Apache Airflow as underlying execution Engine • Problems Faced with Apache Airflow • Orchestration Service: Overcoming Challenges • JSON DSL for workflow Authoring • Architecture: Highly Scalable and Multi-Tenant Orchestration Service with Apache Airflow on Kubernetes  Orchestration Service Data Flow & Storage on Azure  Running Apache Airflow on Kubernetes  Orchestration Service support for multi-cluster. • Key Learnings
  • 3. Overview • There are different teams in Adobe which need a workflow management system to meet variety of use cases like Machine Learning, ETL, Scheduled Reporting etc.. • Each Team has set up its own scheduler like cron or workflow management system. And they are facing similar kind of challenges while maintaining these crons or workflow managers in production. • There is no standardization and lot of duplication of work and effort in learning and managing these workflow management systems. • We started working n providing Orchestration Service as a fully managed & Scalable, Multi-tenant service to help different Adobe teams to author / manage & schedule multi step workflows.
  • 4. Orchestration Service: Requirements • Providing User Friendly APIs to author and Manage workflows lifecycle. Should not expose underlying Workflow execution Engine. • Should be multi-tenant and should provide resource guarantees • Support for different workflow triggers like scheduled based trigger, ad-hoc trigger, event based trigger • Should be highly scalable. Should support 1000(s) of concurrent workflow execution. • Should manage different nature of workflows like workflow can run from few minutes to few days. • Scheduling latency should be within a threshold. • It should support both static and dynamic workflows.
  • 5. Why we selected Apache Airflow as underlying execution Engine • It has very active community. • Its easily extensible through plugins. • Very Rich UI to provide workflows Insights. • Support for Distributed Executor like Kubernetes. • Webserver can be extended support RESTful APIs for workflow operations.
  • 6. Problems Faced with Apache Airflow • Workflow Scheduling Latency increases with increase in number of workflow files. • Airflow Releases are not backward compatible and might need downtime. • WorkflowTask is Memory intensive. Running single task can take ~150 MB of memory. • Airflow scheduler is recommended to run as singleton so it can scale vertically and not horizontally • Kubernetes Executor is still evolving.
  • 7. Orchestration Service: Overcoming Challenges • Abstracts Apache Airflow by providing JSON based DSL to author workflow. At the backend it transforms JSON to Airflow compatible DAG. • Provides CRUD APIs to manage workflows • It tracks Airflow Cluster Usage and provides necessary alerts • To achieve high scalability it supports multiple Apache Airflow installation and does all the routing. • Support for custom Airflow config like “dag_dir_list_interval” for different users. • Support for Plugin interface to add and expose Airflow Operators as reusable components. • Contributions to Apache Airflow to make it more resilient and functionality rich
  • 8. As we have seen DAGS are python code comprising of airflow operators and various arguments of the DAGS. Adobe Orchestrator don't expect its users to write a Python code for their DAGs, instead it provide python generation of DAGS using JSON where they can provide all the properties and operators of the DAGs. JSON DSL for workflow Authoring JSON validator Parser and Python code generator passes Property Type Id String Description String Type properties JSON Connections List Retries Number dependsOn JSON Using JSON properties REST API Create DAG using JSON Properties for JSON(Table ) DAG validator Checks all imports, cycles, code syntax of the generated Python code Adobe Orchestrator is like a service environment over Airflow which is used by the user. If any issue occurs then users does not get affected by that and the service takes care of that. e.g. Scheduler goes down if there is any database connection issue, for which Orchestrator provides the AUTO- RESTART which restarts the scheduler
  • 9. Service Data Flow & Storage
  • 10. Running Apache Airflow on Kubernetes
  • 11. Orchestration Service support for Multi-Cluster
  • 12. Key Learnings • Apache Airflow is not exposed to users so its possible to support different or multiple Scheduling engine without Impacting usage. • With JSON based DSL users can construct workflow without knowing Airflow Primitive(s). • HTTP interface between Orchestration Service & Airflow Webserver helped in keeping separation of logic. • Airflow metastore can become a bottleneck so provided a cleanup Api. • Monitoring tools to check Airflow scheduler Health and restart it if required. • Monitoring & Alerts on number of Workflow files on Airflow Scheduler otherwise scheduling latency can increase.