SlideShare a Scribd company logo
1 of 6
Airflow on Kubernetes
An Anant Corporation Story.
Using kubernetes to deploy airflow
What Is Airflow?
Apache Airflow is one realization of the
DevOps philosophy of "Configuration As
Code." Airflow allows users to launch multi-
step pipelines using a simple Python object
DAG (Directed Acyclic Graph). You can
define dependencies, programmatically
construct complex workflows, and monitor
scheduled jobs in an easy to read UI.
Why Airflow on Kubernetes?
Since its inception, Airflow's greatest strength has been its flexibility. Airflow offers a wide range of integrations for services
ranging from Spark and HBase, to services on various cloud providers. Airflow also offers easy extensibility through its plug-
in framework. However, one limitation of the project is that Airflow users are confined to the frameworks and clients that
exist on the Airflow worker at the moment of execution. A single organization can have varied Airflow workflows ranging
from data science pipelines to application deployments. This difference in use-case creates issues in dependency
management as both teams might use vastly different libraries for their workflows.
To address this issue, we've utilized Kubernetes to allow users to launch arbitrary Kubernetes pods and configurations.
Airflow users can now have full power over their run-time environments, resources, and secrets, basically turning Airflow into
an "any job you want" workflow orchestrator.
The Kubernetes Operator
Before we move any further, we should clarify that an Operator in Airflow is a task definition. When a
user creates a DAG, they would use an operator like the "SparkSubmitOperator" or the
"PythonOperator" to submit/monitor a Spark job or a Python function respectively. Airflow comes with
built-in operators for frameworks like Apache Spark, BigQuery, Hive, and EMR. It also offers a Plugins
entrypoint that allows DevOps engineers to develop their own connectors.
Airflow users are always looking for
ways to make deployments and
ETL pipelines simpler to manage.
Any opportunity to decouple
pipeline steps, while increasing
monitoring, can reduce future
outages and fire-fights. The
following is a list of benefits
provided by the Airflow Kubernetes
Operator:
● Increased flexibility for deployments
● Flexibility of configurations and dependencies
● Usage of kubernetes secrets for added
security
Architecture
The Kubernetes Operator uses the
Kubernetes Python Client to generate a
request that is processed by the
APIServer (1). Kubernetes will then
launch your pod with whatever specs
you've defined (2). Images will be loaded
with all the necessary environment
variables, secrets and dependencies,
enacting a single command. Once the job
is launched, the operator only needs to
monitor the health of track logs (3). Users
will have the choice of gathering logs
locally to the scheduler or to any
distributed logging service currently in
their Kubernetes cluster.
Demo

More Related Content

What's hot

5 - Hands-on Kubernetes Workshop:
5 - Hands-on Kubernetes Workshop:5 - Hands-on Kubernetes Workshop:
5 - Hands-on Kubernetes Workshop:Kangaroot
 
Persist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemPersist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemLibbySchulze
 
Helm summit 2019_handling large number of charts_sept 10
Helm summit 2019_handling large number of charts_sept 10Helm summit 2019_handling large number of charts_sept 10
Helm summit 2019_handling large number of charts_sept 10Shikha Srivastava
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierLibbySchulze
 
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...ManageIQ
 
[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to APILakmal Warusawithana
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedShikha Srivastava
 
7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with ElasticKangaroot
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent dataLibbySchulze
 
9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft CloudKangaroot
 
Serverless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps AdoptionServerless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps AdoptionAll Things Open
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...CodeOps Technologies LLP
 

What's hot (18)

5 - Hands-on Kubernetes Workshop:
5 - Hands-on Kubernetes Workshop:5 - Hands-on Kubernetes Workshop:
5 - Hands-on Kubernetes Workshop:
 
Persist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystemPersist your data in an ephemeral k8 ecosystem
Persist your data in an ephemeral k8 ecosystem
 
Helm summit 2019_handling large number of charts_sept 10
Helm summit 2019_handling large number of charts_sept 10Helm summit 2019_handling large number of charts_sept 10
Helm summit 2019_handling large number of charts_sept 10
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrier
 
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...
OpenStack - Tzu-Mainn Chen, Marek Aufart, Petr Blaho - ManageIQ Design Summit...
 
[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API[Lakmal] Automate Microservice to API
[Lakmal] Automate Microservice to API
 
Deploy prometheus on kubernetes
Deploy prometheus on kubernetesDeploy prometheus on kubernetes
Deploy prometheus on kubernetes
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updated
 
Neutron Updates - Kilo Edition
Neutron Updates - Kilo EditionNeutron Updates - Kilo Edition
Neutron Updates - Kilo Edition
 
7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic7 - Monitoring Kubernetes with Elastic
7 - Monitoring Kubernetes with Elastic
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent data
 
9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud9 - Making Sense of Containers in the Microsoft Cloud
9 - Making Sense of Containers in the Microsoft Cloud
 
Monitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift ClustersMonitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift Clusters
 
Crunchy containers
Crunchy containersCrunchy containers
Crunchy containers
 
Serverless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps AdoptionServerless Functions: Accelerating DevOps Adoption
Serverless Functions: Accelerating DevOps Adoption
 
DevOps: Infrastructure as Code
DevOps: Infrastructure as CodeDevOps: Infrastructure as Code
DevOps: Infrastructure as Code
 
Introduction to chef
Introduction to chefIntroduction to chef
Introduction to chef
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
 

Similar to Data Engineer's Lunch #47: Airflow on Kubernetes

Kubernetes Java Operator
Kubernetes Java OperatorKubernetes Java Operator
Kubernetes Java OperatorAnthony Dahanne
 
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuSpring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuVMware Tanzu
 
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB201904_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019Kumton Suttiraksiri
 
Argo Workflows 3.0, a detailed look at what’s new from the Argo Team
Argo Workflows 3.0, a detailed look at what’s new from the Argo TeamArgo Workflows 3.0, a detailed look at what’s new from the Argo Team
Argo Workflows 3.0, a detailed look at what’s new from the Argo TeamLibbySchulze
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Anthony Dahanne
 
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdfMars Devs
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 
Sumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with KubernetesSumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with KubernetesSumo Logic
 
Jenkins_K8s (2).pptx
Jenkins_K8s (2).pptxJenkins_K8s (2).pptx
Jenkins_K8s (2).pptxkhalil Ismail
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019UA DevOps Conference
 
Kubernetes for Java developers
Kubernetes for Java developersKubernetes for Java developers
Kubernetes for Java developersRobert Barr
 
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...NETWAYS
 
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar Neelamegam
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar NeelamegamOpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar Neelamegam
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar NeelamegamOpenNebula Project
 
Manage your kubernetes cluster with cluster api, azure and git ops
Manage your kubernetes cluster with cluster api, azure and git opsManage your kubernetes cluster with cluster api, azure and git ops
Manage your kubernetes cluster with cluster api, azure and git opsJorge Arteiro
 
OpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenOpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenTrinath Somanchi
 
How to Integrate Kubernetes in OpenStack
 How to Integrate Kubernetes in OpenStack  How to Integrate Kubernetes in OpenStack
How to Integrate Kubernetes in OpenStack Meng-Ze Lee
 
Docker OpenStack Cloud Foundry
Docker OpenStack Cloud FoundryDocker OpenStack Cloud Foundry
Docker OpenStack Cloud FoundryAnimesh Singh
 
Evolution of netflix conductor
Evolution of netflix conductorEvolution of netflix conductor
Evolution of netflix conductorvedu12
 
Future of Kubernetes and its Impact on Technology Industry.pdf
Future of Kubernetes and its Impact on Technology Industry.pdfFuture of Kubernetes and its Impact on Technology Industry.pdf
Future of Kubernetes and its Impact on Technology Industry.pdfUrolime Technologies
 

Similar to Data Engineer's Lunch #47: Airflow on Kubernetes (20)

Kubernetes Java Operator
Kubernetes Java OperatorKubernetes Java Operator
Kubernetes Java Operator
 
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan GoksuSpring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
Spring Cloud Services with Pivotal Cloud Foundry- Gokhan Goksu
 
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB201904_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
 
Argo Workflows 3.0, a detailed look at what’s new from the Argo Team
Argo Workflows 3.0, a detailed look at what’s new from the Argo TeamArgo Workflows 3.0, a detailed look at what’s new from the Argo Team
Argo Workflows 3.0, a detailed look at what’s new from the Argo Team
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Sumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with KubernetesSumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with Kubernetes
 
Jenkins_K8s (2).pptx
Jenkins_K8s (2).pptxJenkins_K8s (2).pptx
Jenkins_K8s (2).pptx
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
 
Kubernetes for Java developers
Kubernetes for Java developersKubernetes for Java developers
Kubernetes for Java developers
 
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
 
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar Neelamegam
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar NeelamegamOpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar Neelamegam
OpenNebulaConf 2014 - Cloud Automation for OpenNebula - Kishorekumar Neelamegam
 
Manage your kubernetes cluster with cluster api, azure and git ops
Manage your kubernetes cluster with cluster api, azure and git opsManage your kubernetes cluster with cluster api, azure and git ops
Manage your kubernetes cluster with cluster api, azure and git ops
 
OpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenOpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco Heaven
 
How to Integrate Kubernetes in OpenStack
 How to Integrate Kubernetes in OpenStack  How to Integrate Kubernetes in OpenStack
How to Integrate Kubernetes in OpenStack
 
Docker OpenStack Cloud Foundry
Docker OpenStack Cloud FoundryDocker OpenStack Cloud Foundry
Docker OpenStack Cloud Foundry
 
Evolution of netflix conductor
Evolution of netflix conductorEvolution of netflix conductor
Evolution of netflix conductor
 
Future of Kubernetes and its Impact on Technology Industry.pdf
Future of Kubernetes and its Impact on Technology Industry.pdfFuture of Kubernetes and its Impact on Technology Industry.pdf
Future of Kubernetes and its Impact on Technology Industry.pdf
 

More from Anant Corporation

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137Anant Corporation
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfAnant Corporation
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotAnant Corporation
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...Anant Corporation
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowAnant Corporation
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksAnant Corporation
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionAnant Corporation
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Anant Corporation
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & FutureAnant Corporation
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Anant Corporation
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackAnant Corporation
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsAnant Corporation
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 

More from Anant Corporation (20)

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
 
YugabyteDB Developer Tools
YugabyteDB Developer ToolsYugabyteDB Developer Tools
YugabyteDB Developer Tools
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with Airflow
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward Talks
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
CL 121
CL 121CL 121
CL 121
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Data Engineer's Lunch #47: Airflow on Kubernetes

  • 1. Airflow on Kubernetes An Anant Corporation Story. Using kubernetes to deploy airflow
  • 2. What Is Airflow? Apache Airflow is one realization of the DevOps philosophy of "Configuration As Code." Airflow allows users to launch multi- step pipelines using a simple Python object DAG (Directed Acyclic Graph). You can define dependencies, programmatically construct complex workflows, and monitor scheduled jobs in an easy to read UI.
  • 3. Why Airflow on Kubernetes? Since its inception, Airflow's greatest strength has been its flexibility. Airflow offers a wide range of integrations for services ranging from Spark and HBase, to services on various cloud providers. Airflow also offers easy extensibility through its plug- in framework. However, one limitation of the project is that Airflow users are confined to the frameworks and clients that exist on the Airflow worker at the moment of execution. A single organization can have varied Airflow workflows ranging from data science pipelines to application deployments. This difference in use-case creates issues in dependency management as both teams might use vastly different libraries for their workflows. To address this issue, we've utilized Kubernetes to allow users to launch arbitrary Kubernetes pods and configurations. Airflow users can now have full power over their run-time environments, resources, and secrets, basically turning Airflow into an "any job you want" workflow orchestrator.
  • 4. The Kubernetes Operator Before we move any further, we should clarify that an Operator in Airflow is a task definition. When a user creates a DAG, they would use an operator like the "SparkSubmitOperator" or the "PythonOperator" to submit/monitor a Spark job or a Python function respectively. Airflow comes with built-in operators for frameworks like Apache Spark, BigQuery, Hive, and EMR. It also offers a Plugins entrypoint that allows DevOps engineers to develop their own connectors. Airflow users are always looking for ways to make deployments and ETL pipelines simpler to manage. Any opportunity to decouple pipeline steps, while increasing monitoring, can reduce future outages and fire-fights. The following is a list of benefits provided by the Airflow Kubernetes Operator: ● Increased flexibility for deployments ● Flexibility of configurations and dependencies ● Usage of kubernetes secrets for added security
  • 5. Architecture The Kubernetes Operator uses the Kubernetes Python Client to generate a request that is processed by the APIServer (1). Kubernetes will then launch your pod with whatever specs you've defined (2). Images will be loaded with all the necessary environment variables, secrets and dependencies, enacting a single command. Once the job is launched, the operator only needs to monitor the health of track logs (3). Users will have the choice of gathering logs locally to the scheduler or to any distributed logging service currently in their Kubernetes cluster.