From Airflow to Google Cloud Composer
Bruce Kuo
• Understand the effort involved in hosting Apache Airflow

• See how Google Cloud Composer reduces that hosting effort

• Share and discuss what we care about when using Cloud Composer
KEY TAKEAWAYS
• Apache Incubator Project

• Contributed by Airbnb

• An open-source workflow engine
APACHE AIRFLOW
Dynamic: workflows as code

Extensible: customize your code easily

Elegant: script parameterization is built into the core of Airflow with Jinja

Scalable: easy to scale out across machines
APACHE AIRFLOW
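As a rough illustration of "workflow as code" with Jinja parameters, a minimal DAG file might look like the sketch below. The DAG id, task ids, owner, and commands are hypothetical, and the operator import path assumes the Airflow 1.x layout. The guarded import is only there so the file can be read and imported without Airflow installed.

```python
from datetime import datetime, timedelta

# Templated command: Airflow renders {{ ds }} (the run date, YYYY-MM-DD) with Jinja.
EXPORT_CMD = "echo 'exporting rows for {{ ds }}'"

DEFAULT_ARGS = {
    "owner": "data-team",                 # hypothetical owner
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

try:
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x path
except ImportError:
    DAG = None  # Airflow not installed; skip building the DAG object

if DAG is not None:
    dag = DAG(
        dag_id="example_daily_export",    # hypothetical DAG id
        default_args=DEFAULT_ARGS,
        start_date=datetime(2018, 5, 1),
        schedule_interval="@daily",
    )
    export = BashOperator(task_id="export", bash_command=EXPORT_CMD, dag=dag)
    notify = BashOperator(task_id="notify", bash_command="echo done", dag=dag)
    export >> notify  # notify runs only after export succeeds
```

Because the pipeline is ordinary Python, tasks and dependencies can be generated in loops or from configuration, which is what "dynamic" refers to.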
• ETL Pipeline 

• Batch Data Transformation between DB and BigQuery

• Machine Learning Pipeline

• Training pipeline 

• Deploy model pipeline

• Model predictions: community quality score, applicant quality score
AIRFLOW IN CODEMENTOR
Web Servers
Workers
Scheduler
Task Queues
Airflow Database
AIRFLOW ARCHITECTURE
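To make the roles of these components concrete, here is a toy simulation using only the Python standard library. It is not Airflow code: the in-memory queue stands in for the task queue, the dictionary for the Airflow database, and the function and task names are made up. It also ignores task dependencies, which the real scheduler resolves before queuing.

```python
import queue
import threading


def run_pipeline(task_names, num_workers=2):
    """Simulate scheduler -> task queue -> workers -> database."""
    task_queue = queue.Queue()                           # stand-in for the task queue
    state_db = {name: "queued" for name in task_names}   # stand-in for the Airflow database
    lock = threading.Lock()

    def worker():
        while True:
            name = task_queue.get()
            if name is None:            # sentinel: no more work for this worker
                task_queue.task_done()
                return
            with lock:
                state_db[name] = "success"   # a real worker would run the task here
            task_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Scheduler role: push runnable tasks onto the queue.
    for name in task_names:
        task_queue.put(name)
    for _ in workers:
        task_queue.put(None)

    task_queue.join()
    for w in workers:
        w.join()
    return state_db
```

The web server's role in this picture is simply to read the state stored in the database and display it.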
Web Servers (AWS EC2 × 1)
Workers (AWS EC2 × 1)
Scheduler (AWS EC2 × 1)
Task Queues (AWS EC2 × 1)
Airflow Database (AWS EC2 × 1)
ENVIRONMENT IN CODEMENTOR
• Machine management

• Web Server

• Scheduler

• Worker

• DB

• Task Queue - Celery on Redis
HOST AIRFLOW
• Deployment - deploy to all machines whenever code changes

• DAG changes - hot reload

• Airflow configuration changes (restart service)

• Package dependency changes (restart service)
HOST AIRFLOW
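The rule on this slide (DAG changes hot-reload, while configuration and dependency changes need a restart) can be sketched as a small helper. The file names airflow.cfg and requirements.txt and the action labels are assumptions for illustration, not our actual deploy script.

```python
import posixpath

# Assumed file names: changes to these require a service restart.
RESTART_FILES = {"airflow.cfg", "requirements.txt"}


def plan_deploy(changed_paths):
    """Return the deploy actions needed for a set of changed files."""
    if not changed_paths:
        return []
    actions = ["sync to all machines"]   # every change must reach every machine
    changed_names = {posixpath.basename(path) for path in changed_paths}
    if changed_names & RESTART_FILES:
        actions.append("restart services")   # config or dependency change
    return actions
```

For example, plan_deploy(["dags/etl.py"]) needs only the sync, since Airflow picks up DAG file changes without a restart.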
• Permission management

• Use the command airflow create_user

• Monitor Airflow

• DAG

• Environments
HOST AIRFLOW
• Announced in May 2018

• Airflow as a service

• Core: Airflow 1.9

• Support for other versions planned for the future

• Native integration with GCP components
CLOUD COMPOSER
• How to deploy DAGs

• Put DAGs in the GCS folder: gs://project-bucket/dags

• How to deploy PyPI packages

• Configure them in the GUI

• Or run the command: gcloud beta composer environments update cm-airflow-prod --update-pypi-packages-from-file gcp_requirements.txt --location us-east1

• How to deploy private packages

• Put the packages in the GCS folder: gs://project-bucket/plugins
DEPLOY ON COMPOSER
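These steps could be wrapped in a small deploy script along the following lines. This is a sketch under assumptions: the helper names are made up, using gsutil rsync to copy DAGs is one option rather than something the slides prescribe, and actually running the commands requires the Google Cloud SDK to be installed and authenticated. The gcloud arguments mirror the command shown on the slide.

```python
import subprocess


def dag_sync_command(local_dags_dir, bucket):
    """Command that copies local DAG files into the environment's dags folder."""
    return ["gsutil", "-m", "rsync", "-r", local_dags_dir,
            "gs://{}/dags".format(bucket)]


def pypi_update_command(environment, location, requirements_file):
    """Command that installs PyPI packages from a requirements file."""
    return ["gcloud", "beta", "composer", "environments", "update", environment,
            "--update-pypi-packages-from-file", requirements_file,
            "--location", location]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.check_call(command)


if __name__ == "__main__":
    run(dag_sync_command("dags", "project-bucket"))
    run(pypi_update_command("cm-airflow-prod", "us-east1", "gcp_requirements.txt"))
```

Keeping dry_run as the default makes it easy to review the exact commands in CI logs before letting the script touch the production environment.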
• Permission management

• Just set IAM roles on GCP

• Monitoring Cloud Composer

• Webserver: Uptime Robot

• Scheduler & worker:

• Add a DAG that sends a message to CloudWatch

• https://cloud.google.com/composer/docs/how-to/managing/monitoring-environments

• No machine management effort!
PERMISSION & MONITOR
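The idea behind the monitoring DAG is a heartbeat: if the scheduler and a worker are healthy, the DAG runs on schedule and a fresh message arrives; if messages stop, something is wrong. A minimal sketch of the check on the receiving side is below. The function name and the 15-minute threshold are assumptions, and in practice this check would be expressed as an alarm rule in the monitoring service rather than as our own code.

```python
from datetime import datetime, timedelta


def heartbeat_is_stale(last_heartbeat, now, max_age=timedelta(minutes=15)):
    """Return True when the newest heartbeat message is older than max_age."""
    return now - last_heartbeat > max_age


if __name__ == "__main__":
    last = datetime(2018, 9, 1, 12, 0)
    print(heartbeat_is_stale(last, datetime(2018, 9, 1, 12, 10)))  # within the window
    print(heartbeat_is_stale(last, datetime(2018, 9, 1, 12, 30)))  # past the window
```

The threshold should be comfortably larger than the heartbeat DAG's schedule interval so that a single delayed run does not raise an alert.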
Item | Self-hosted Airflow | Google Cloud Composer
Machine effort | Manage cluster status and scaling | None
Permission management | Set up by ourselves in each environment (web / scheduler / worker) | Only IAM roles to set; worker permissions are closed
Monitoring | Uptime Robot (web); monitoring DAG (scheduler + worker) | Uptime Robot (web); monitoring DAG (scheduler + worker)
Deploy configs | Deploy configs to all machines and restart the service | Set in the GUI or run a command
COMPARISON
Item | Self-hosted Airflow | Google Cloud Composer
DAG update | Deploy DAGs to all machines | Put DAGs in GCS
Deploy Python package | Deploy packages to all machines and restart the service | Set in the GUI or run a command
Deploy private package | Same as deploying a Python package | Put private packages in GCS
Deploy configs | Deploy configs to all machines and restart the service | Set in the GUI or run a command
COMPARISON
Notes from Practice
• Use a local path to share files between workers

• Cloud Composer syncs gs://project-bucket/data to the local path /home/airflow/gcs/data

• The sync is bidirectional

• Details: https://cloud.google.com/composer/docs/concepts/cloud-storage
TRICKS
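Inside a task, the shared folder is reached through its local path rather than the gs:// URI. Per the Cloud Composer storage documentation, the bucket root is mapped under /home/airflow/gcs, so the data folder appears at /home/airflow/gcs/data. The helper below is a hypothetical convenience for translating between the two; the function name and the default bucket name are assumptions.

```python
import posixpath

GCS_MOUNT = "/home/airflow/gcs"   # local mount point of the environment bucket


def gcs_to_local(uri, bucket="project-bucket"):
    """Map a gs:// URI under the bucket's data folder to its local path."""
    prefix = "gs://{}/".format(bucket)
    if not uri.startswith(prefix):
        raise ValueError("URI is not in bucket {}: {}".format(bucket, uri))
    relative = uri[len(prefix):]
    if relative != "data" and not relative.startswith("data/"):
        raise ValueError("only the data folder is shared this way: {}".format(uri))
    return posixpath.join(GCS_MOUNT, relative)
```

A task on one worker can then write to gcs_to_local("gs://project-bucket/data/scores.csv") and a later task on another worker can read the same path.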
• To keep the whole environment stable, these are our Airflow settings after tuning

TRICKS
1. Check the DAG log in the web console

2. Check Stackdriver

3. Trace the Kubernetes workloads

INCIDENT SOP
• Rerunning is hard; this is our SOP:

1. Clear task instances

2. Clear failed jobs

• Unfortunately, the DB may become locked if you don’t follow the rerun SOP.
RERUN SOP
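The slides do not say whether the clearing is done in the web UI or on the command line. As one possible command-line counterpart to step 1, the sketch below builds a gcloud command that runs the Airflow 1.x clear subcommand inside a Composer environment. The helper name and the example DAG id are hypothetical; the flags -s, -e, -c (no confirmation), and -f (only failed) are those of the Airflow 1.x CLI.

```python
def clear_command(environment, location, dag_id, start_date, end_date,
                  only_failed=True):
    """Build a command that clears task instances of one DAG in a date range."""
    airflow_args = [dag_id, "-s", start_date, "-e", end_date, "-c"]
    if only_failed:
        airflow_args.append("-f")   # clear only failed task instances
    return ["gcloud", "composer", "environments", "run", environment,
            "--location", location, "clear", "--"] + airflow_args


if __name__ == "__main__":
    command = clear_command("cm-airflow-prod", "us-east1",
                            "example_daily_export", "2018-09-01", "2018-09-02")
    print(" ".join(command))
```

Restricting the date range and DAG id keeps the clear operation small, which seems prudent given the DB locking noted above.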
• Expensive: about 10-20 USD per day for our usage

• Sometimes there are zombie tasks

• Zombie tasks are caused by worker crashes
CONCERNS
• More focus on

• DAG development

• Data logic

• Performance tuning

• Much less effort spent on maintaining the service
CONCLUSION
• https://www.youtube.com/watch?v=gFQVmsRRY_A

• Discussion:

• https://groups.google.com/forum/#!forum/cloud-composer-discuss

• https://stackoverflow.com/questions/tagged/google-cloud-composer
REFERENCE
Q & A
codementor.io
WE’RE HIRING
CHECK OUT OUR CAREER PAGE
www.codementor.io/careers
