Airflow
Workflow management system
Ilias OKACHA
Index
- Workflow Management Systems
- Architecture
- Building blocks
- More features
- User Interface
- Security
- CLI
- Demo
WTH is a Workflow Management System?
A workflow management system is data-centric software (a framework) for:
- Setting up
- Performing
- Monitoring
of a defined sequence of processes and tasks
Popular Workflow Management Systems
Airflow Architecture
● SequentialExecutor / LocalExecutor
● CeleryExecutor
● HA + CeleryExecutor
● MesosExecutor : already available in contrib package
● KubernetesExecutor ??
Building blocks
DAGs :
- DAG stands for Directed Acyclic Graph
- A DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies
- DAGs describe how to run a workflow
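The way a DAG fixes the order of tasks can be sketched in plain Python. This is a toy model rather than Airflow code: the task names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduler's dependency resolution.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the tasks in `deps`."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True when the dependency graph contains no cycle."""
    try:
        run_order(deps)
        return True
    except CycleError:
        return False


order = run_order(dependencies)
print(order)
```

A graph with a cycle (for example `a` depending on `b` and `b` on `a`) has no valid order, which is why the "acyclic" part of the name matters.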
Operators :
- An operator describes a single task in a workflow
- Operators determine what actually gets done
- Operators generally run independently (atomic)
- The DAG makes sure that operators run in the correct order
- They may run on completely different machines
There are 3 main types of operators:
● Operators that perform an action, or tell another system to perform an action
● Transfer operators, which move data from one system to another
● Sensors, a type of operator that keeps running until a certain criterion is met. Examples include:
○ A specific file landing in HDFS or S3
○ A partition appearing in Hive
○ A specific time of the day
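The "keep running until a criterion is met" behaviour of a sensor can be sketched as a polling loop. This is a plain-Python illustration, not Airflow's sensor code: `wait_for` and `file_has_landed` are hypothetical names, and `poke_interval` and `timeout` are named after the sensor parameters they imitate. A timed-out Airflow sensor fails its task, whereas this sketch simply returns `False`.

```python
import time


def wait_for(criterion, poke_interval=0.01, timeout=1.0):
    """Call `criterion` repeatedly until it returns True or `timeout` elapses.

    Returns True if the criterion was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if criterion():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical criterion: a "file" that only appears on the third check.
checks = {"count": 0}


def file_has_landed():
    checks["count"] += 1
    return checks["count"] >= 3


met = wait_for(file_has_landed)
```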
- Action operators :
- BashOperator
- PythonOperator
- EmailOperator
- SimpleHttpOperator
- MySqlOperator
- SqliteOperator
- PostgresOperator
- MsSqlOperator
- OracleOperator
- JdbcOperator
- DockerOperator
- HiveOperator
- SlackOperator
- Transfers :
- S3FileTransformOperator
- PrestoToMySqlTransfer
- MySqlToHiveTransfer
- S3ToHiveTransfer
- BigQueryToCloudStorageOperator
- GenericTransfer
- HiveToDruidTransfer
- HiveToMySqlTransfer
- Sensors :
- ExternalTaskSensor
- HdfsSensor
- HttpSensor
- MetastorePartitionSensor
- HivePartitionSensor
- S3KeySensor
- S3PrefixSensor
- SqlSensor
- TimeDeltaSensor
- TimeSensor
- WebHdfsSensor
Tasks : a parameterized instance of an operator
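The relationship between an operator (a template) and tasks (parameterized instances of it) can be sketched with ordinary classes. This is a toy model, not Airflow's `BaseOperator`: `Operator`, `GreetOperator`, and the task ids are hypothetical.

```python
class Operator:
    """Toy base class: a template for one unit of work."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self):
        raise NotImplementedError


class GreetOperator(Operator):
    """Hypothetical operator that builds a greeting for `name`."""

    def __init__(self, task_id, name):
        super().__init__(task_id)
        self.name = name

    def execute(self):
        return f"Hello, {self.name}!"


# Two tasks: the same operator class, instantiated with different parameters.
greet_alice = GreetOperator(task_id="greet_alice", name="Alice")
greet_bob = GreetOperator(task_id="greet_bob", name="Bob")

results = {task.task_id: task.execute() for task in (greet_alice, greet_bob)}
```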
Task Instance : DAG + Task + point in time
- A specific run of a task
- A task that has been assigned to a DAG
- Has a state associated with a specific run of the DAG
- Possible states include:
- running
- success
- failed
- skipped
- up for retry
- …
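The "DAG + Task + point in time" identity and its state can be sketched as a small dataclass. This is an illustrative model, not Airflow's `TaskInstance`: the retry logic is a simplifying assumption and the ids and dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

RUNNING, SUCCESS, FAILED, UP_FOR_RETRY = "running", "success", "failed", "up_for_retry"


@dataclass
class TaskInstance:
    """One run of one task, identified by DAG, task, and point in time."""

    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[str] = None
    try_number: int = 0

    def run(self, work, retries=0):
        """Run `work` once and record the resulting state."""
        self.state = RUNNING
        self.try_number += 1
        try:
            work()
            self.state = SUCCESS
        except Exception:
            # Retry budget left: mark for retry; otherwise the run has failed.
            self.state = UP_FOR_RETRY if self.try_number <= retries else FAILED
        return self.state


def always_fails():
    raise RuntimeError("boom")


# The same task on two different dates gives two independent task instances.
first = TaskInstance("example_dag", "flaky_task", datetime(2018, 7, 1))
states = [first.run(always_fails, retries=1), first.run(always_fails, retries=1)]

second = TaskInstance("example_dag", "flaky_task", datetime(2018, 7, 2))
second.run(lambda: None)
```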
Workflows :
● DAG: a description of the order in which work should take place
● Operator: a class that acts as a template for carrying out some work
● Task: a parameterized instance of an operator
● Task Instance: a task that
○ Has been assigned to a DAG
○ Has a state associated with a specific run of the DAG
● By combining DAGs and Operators to create TaskInstances, you can build complex workflows.
More features
- Features :
- Hooks
- Connections
- Variables
- XComs
- SLA
- Pools
- Queues
- Trigger Rules
- Branching
- SubDAGs
Hooks :
- Interfaces to external platforms and databases :
- Hive
- S3
- MySQL
- PostgreSQL
- HDFS
- Pig
- …
- Act as building blocks for operators
- Use Connections to retrieve authentication information
- Keep authentication information out of pipelines
Connections :
Connection information for external systems is stored in the Airflow metadata database and managed in the UI.
Example of a Hook + Connection :
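The original slide showed this as an image. As a stand-in, here is a minimal plain-Python sketch of the idea: the pipeline refers only to a connection id, and the hook resolves the credentials at call time. `Connection`, `DatabaseHook`, the `CONNECTIONS` dictionary, and every value in it are hypothetical placeholders, not Airflow classes or real credentials.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Toy stand-in for a stored connection record."""

    conn_id: str
    host: str
    login: str
    password: str


# Stand-in for the metadata database; all values are placeholders.
CONNECTIONS = {
    "my_postgres": Connection("my_postgres", "db.example.com", "etl_user", "s3cret"),
}


class DatabaseHook:
    """Hypothetical hook: resolves credentials from a conn_id when needed."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def get_connection(self):
        return CONNECTIONS[self.conn_id]

    def get_uri(self):
        conn = self.get_connection()
        return f"postgresql://{conn.login}:{conn.password}@{conn.host}"


# Pipeline code only mentions the conn_id, never the credentials themselves.
hook = DatabaseHook(conn_id="my_postgres")
uri = hook.get_uri()
```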
Variables :
- A generic way to store and retrieve arbitrary content or settings as a simple key-value store within Airflow.
- Variables can be listed, created, updated and deleted from the UI (Admin -> Variables), code or CLI.
- While your pipeline code definition and most of your constants and variables should be defined in code and stored in source control, it can be useful to have some variables or configuration items accessible and modifiable through the UI.
XCom (cross-communication) :
● Lets tasks exchange messages, allowing shared state.
● Defined by a key, value, and timestamp.
● Also tracks attributes like the task/DAG that created the XCom and when it should become visible.
● Any object that can be pickled can be used as an XCom value.
XComs can be :
● Pushed (sent) :
○ By calling xcom_push()
○ Automatically, when a task returns a value (from its operator's execute() method or from a PythonOperator's python_callable)
● Pulled (received) : by calling xcom_pull()
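Push and pull can be sketched with a small in-memory store. This is a toy model of the mechanism, not Airflow's XCom implementation: `XComStore` and `run_task` are hypothetical, and the store lives in memory instead of the metadata database. The `return_value` key mirrors the key Airflow uses for implicitly pushed return values.

```python
class XComStore:
    """Toy key-value store shared between the tasks of one DAG run."""

    RETURN_KEY = "return_value"

    def __init__(self):
        self._values = {}

    def push(self, task_id, key, value):
        self._values[(task_id, key)] = value

    def pull(self, task_id, key=RETURN_KEY):
        return self._values.get((task_id, key))


def run_task(store, task_id, python_callable):
    """Run a callable and push its return value, if any, under RETURN_KEY."""
    result = python_callable()
    if result is not None:
        store.push(task_id, XComStore.RETURN_KEY, result)
    return result


store = XComStore()
# Explicit push under a custom key.
store.push("list_files", "file_count", 3)
# Implicit push: the callable's return value is stored automatically.
run_task(store, "list_files", lambda: ["a.csv", "b.csv", "c.csv"])

# A downstream task pulls what the upstream task shared.
count = store.pull("list_files", key="file_count")
files = store.pull("list_files")
```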
SLA :
- Service Level Agreement: the time by which a task or DAG should have succeeded.
- Can be set at the task level as a timedelta.
- An alert email is sent detailing the list of tasks that missed their SLA.
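The SLA check can be sketched as a comparison between a deadline and a completion time. This is a simplified illustration, not Airflow's scheduler logic: `missed_slas` is a hypothetical helper, the task names and times are made up, and measuring the SLA from `run_start` is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def missed_slas(tasks, run_start, now):
    """Return the ids of tasks that did not succeed within their SLA.

    `tasks` maps task_id -> (sla, finished_at); `finished_at` is None
    while the task has not succeeded yet.
    """
    missed = []
    for task_id, (sla, finished_at) in tasks.items():
        deadline = run_start + sla
        if finished_at is None:
            late = now > deadline  # still not done and already past the deadline
        else:
            late = finished_at > deadline
        if late:
            missed.append(task_id)
    return missed


run_start = datetime(2018, 7, 1, 0, 0)
tasks = {
    "extract": (timedelta(minutes=30), run_start + timedelta(minutes=10)),
    "transform": (timedelta(hours=1), run_start + timedelta(minutes=90)),
    "load": (timedelta(hours=2), None),
}
now = run_start + timedelta(hours=1, minutes=45)

late_tasks = missed_slas(tasks, run_start, now)
```

The resulting list is what an alert email would report.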
Pools :
- Some systems can get overwhelmed when too many processes hit them at the same time.
- Pools limit the execution parallelism on arbitrary sets of tasks.
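The effect of a pool can be sketched with a semaphore that caps how many tasks run at once. This is a thread-based toy, not how Airflow schedules pool slots: `Pool`, `query_database`, and the slot count are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Pool:
    """Toy pool: at most `slots` tasks may run at the same time."""

    def __init__(self, slots):
        self._semaphore = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def run(self, work):
        with self._semaphore:  # blocks while all slots are taken
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return work()
            finally:
                with self._lock:
                    self.running -= 1


def query_database():
    time.sleep(0.01)  # stand-in for a call to a fragile external system
    return "ok"


db_pool = Pool(slots=2)
# Eight workers are available, but the pool lets only two tasks through at once.
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(lambda _: db_pool.run(query_database), range(8)))
```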
Queues (CeleryExecutor only) :
- Every task can be assigned a specific queue name
- By default, both workers and tasks are assigned to the queue named by the default_queue setting
- Workers can be assigned multiple queues
- Very useful when specialized workers are needed (GPU, Spark…)
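Queue-based routing can be sketched as a lookup: a worker may pick up a task only if it listens to that task's queue. The task names, worker names, and queue names below are hypothetical, and the sketch ignores how Celery actually distributes messages.

```python
DEFAULT_QUEUE = "default"

# Hypothetical tasks: task_id -> queue name (None means "use the default").
task_queues = {
    "clean_data": None,
    "train_model": "gpu",
    "aggregate": "spark",
}

# Hypothetical workers and the queues each one listens to.
worker_queues = {
    "worker-1": {DEFAULT_QUEUE},
    "worker-gpu": {"gpu", DEFAULT_QUEUE},
    "worker-spark": {"spark"},
}


def eligible_workers(task_id):
    """Return the sorted names of workers that may pick up `task_id`."""
    queue = task_queues[task_id] or DEFAULT_QUEUE
    return sorted(name for name, queues in worker_queues.items() if queue in queues)


routing = {task_id: eligible_workers(task_id) for task_id in task_queues}
```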
Trigger Rules :
Though the normal workflow behavior is to trigger tasks when all their directly upstream tasks have succeeded, Airflow allows for more complex dependency settings.
All operators have a trigger_rule argument which defines the rule by which the generated task gets triggered. The default value for trigger_rule is all_success, which can be read as “trigger this task when all directly upstream tasks have succeeded”. All other rules described here are based on direct parent tasks and are values that can be passed to any operator while creating tasks:
● all_success: (default) all parents have succeeded
● all_failed: all parents are in a failed or upstream_failed state
● all_done: all parents are done with their execution
● one_failed: fires as soon as at least one parent has failed; it does not wait for all parents to be done
● one_success: fires as soon as at least one parent succeeds; it does not wait for all parents to be done
● dummy: dependencies are just for show, trigger at will
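The rules above can be written down as one small function over the states of the direct parents. This is a sketch of the semantics as listed, not Airflow's dependency-checking code: `should_trigger` is a hypothetical helper, and treating `upstream_failed` as a failure for `one_failed` is an assumption made here.

```python
FINISHED = {"success", "failed", "upstream_failed", "skipped"}
# Assumption: upstream_failed counts as a failure for one_failed as well.
FAILED = {"failed", "upstream_failed"}


def should_trigger(rule, parent_states):
    """Decide whether a task may run, given the states of its direct parents."""
    states = list(parent_states)
    if rule == "dummy":
        return True
    if rule == "all_success":
        return all(s == "success" for s in states)
    if rule == "all_failed":
        return all(s in FAILED for s in states)
    if rule == "all_done":
        return all(s in FINISHED for s in states)
    if rule == "one_failed":
        return any(s in FAILED for s in states)
    if rule == "one_success":
        return any(s == "success" for s in states)
    raise ValueError(f"unknown trigger rule: {rule}")


# One parent succeeded, one failed, one is still running.
parents = ["success", "failed", "running"]
decisions = {
    rule: should_trigger(rule, parents)
    for rule in ("all_success", "all_failed", "all_done", "one_failed", "one_success", "dummy")
}
```

With one parent still running, only the rules that do not wait for every parent (`one_failed`, `one_success`, `dummy`) fire.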
User Interface
- DAGs view
- Tree view
- Graph view
- Gantt view
- Task duration
- Data Profiling : SQL queries
- Data Profiling : Charts
CLI
https://airflow.apache.org/cli.html
airflow variables [-h] [-s KEY VAL] [-g KEY] [-j] [-d VAL] [-i FILEPATH] [-e FILEPATH] [-x KEY]
airflow connections [-h] [-l] [-a] [-d] [--conn_id CONN_ID]
[--conn_uri CONN_URI] [--conn_extra CONN_EXTRA]
[--conn_type CONN_TYPE] [--conn_host CONN_HOST]
[--conn_login CONN_LOGIN] [--conn_password CONN_PASSWORD]
[--conn_schema CONN_SCHEMA] [--conn_port CONN_PORT]
airflow pause [-h] [-sd SUBDIR] dag_id
airflow test [-h] [-sd SUBDIR] [-dr] [-tp TASK_PARAMS] dag_id task_id execution_date
airflow backfill [-t TASK_REGEX] -s START_DATE -e END_DATE dag_id
airflow clear DAG_ID
airflow resetdb [-h] [-y]
...
Security
By default, all access is open.
Supports :
● Web authentication with :
○ Password
○ LDAP
○ Custom auth
○ Kerberos
○ OAuth
■ GitHub Enterprise Authentication
■ Google Authentication
● Impersonation (run as other $USER)
● Secure access via SSL
Demo
1. Facebook Ads insights data pipeline.
2. Run a PySpark script on an ephemeral Dataproc cluster only when the S3 input data is available.
3. Useless workflow : Hook + Connection + Operators + Sensors + XCom + (SLA) :
○ List S3 files (hooks)
○ Share state with the next task (XCom)
○ Write content to S3 (hooks)
○ Resume the workflow when an S3 DONE.FLAG file is ready (sensor)
Resources
https://airflow.apache.org
http://www.clairvoyantsoft.com/assets/whitepapers/GuideToApacheAirflow.pdf
https://speakerdeck.com/artwr/apache-airflow-at-airbnb-introduction-and-lessons-learned
https://www.slideshare.net/sumitmaheshwari007/apache-airflow
Thanks

Airflow presentation

  • 2. Index - Workflows Management Systems - Architecture - Building blocks - More features - User Interface - Security - CLI - Demo
  • 3. WTH is a Workflow Management System ? A worflow Management system is: Is a data-centric software (framework) for : - Setting up - Performing - Monitoring of a defined sequence of processes and tasks
  • 9. Airflow architecture ● MesosExecutor : already available in contrib package ● KubernetesExecutor ??
  • 11. Dags : - Directed Acyclic Graph - Is a collection of all the tasks you want to run - DAGs describe how to run a workflow Building blocks
  • 13. Building blocks Operators : - Describes a single task in a workflow. - Determine what actually gets done - Operators generally run independently (atomic) - The DAG make sure that operators run in the correct certain order - They may run on completely different machines
  • 14. Building blocks Operators : There are 3 main types of operators: ● Operators that performs an action, or tell another system to perform an action ● Transfer operators move data from one system to another ● Sensors are a certain type of operator that will keep running until a certain criterion is met. ○ Examples include a specific file landing in HDFS or S3. ○ A partition appearing in Hive. ○ A specific time of the day.
  • 15. Operators : - Operators : - BashOperator - PythonOperator - EmailOperator - HTTPOperator - MySqlOperator - SqliteOperator - PostgresOperator - MsSqlOperator - OracleOperator - JdbcOperator - DockerOperator - HiveOperator - SlackOperator Building blocks
  • 16. Operators : - Transfers : - S3FileTransferOperator - PrestoToMysqlOperator - MySqlToHiveTransfer - S3ToHiveTransfer - BigQueryToCloudStorageOperator - GenericTransfer - HiveToDruidTransfer - HiveToMySqlTransfer Building blocks
  • 17. Operators : - Sensors : - ExternalTaskSensor - HdfsSensor - HttpSensor - MetastorePartitionSensor - HivePartitionSensor - S3KeySensor - S3PrefixSensor - SqlSensor - TimeDeltaSensor - TimeSensor - WebHdfsSensor Building blocks
  • 19. Tasks : a parameterized instance of an operator Building blocks
  • 20. Building blocks Task Instance : Dag + Task + point in time - Specific run of a Task - A task assigned to a DAG - Has State associated with a specific run of the DAG - States : it could be - running - success, - failed - skipped - up for retry - …
  • 21. Building blocks Workflows : ● DAG: a description of the order in which work should take place ● Operator: a class that acts as a template for carrying out some work ● Task: a parameterized instance of an operator ● Task Instance: a task that ○ Has been assigned to a DAG ○ Has a state associated with a specific run of the DAG ● By combining DAGs and Operators to create TaskInstances, you can build complex workflows.
  • 24. - Features : - Hooks - Connections - Variables - XComs - SLA - Pools - Queues - Trigger Rules - Branchings - SubDags More features
  • 25. Hooks : - Interface to external platforms and databases : - Hive - S3 - MySQL - PostgreSQL - HDFS - Hive - Pig - … - Act as building block for Operators - Use Connection to retrieve authentication informations - Keep authentication infos out of pipelines. More features
  • 26. Connections : Connection informations to external systems are stored in the airflow metadata Database and managed in the UI More features
  • 28. Exemple de Hook + connection : More features
  • 29. More features Variables : - A generic way to store and retrieve arbitrary content or settings as a simple key value store within Airflow. - Variables can be listed, created, updated and deleted from the UI (Admin -> Variables), code or CLI. - While your pipeline code definition and most of your constants and variables should be defined in code and stored in source control, it can be useful to have some variables or configuration items accessible and modifiable through the UI.
  • 31. More features. XCom (cross-communication): ● Lets tasks exchange messages, allowing shared state. ● Defined by a key, value, and timestamp. ● Also tracks attributes like the task/DAG that created the XCom and when it should become visible. ● Any object that can be pickled can be used as an XCom value. XComs can be: ● Pushed (sent): ○ By calling xcom_push() ○ When a task returns a value (from its operator's execute() method or from a PythonOperator's python_callable) ● Pulled (received): by calling xcom_pull()
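The push and pull mechanics described above can be sketched with a small in-memory store. This is a toy model, not Airflow's implementation: `ToyXComStore`, `run_task` and the `RETURN_KEY` constant are names introduced here for illustration, and the sketch ignores the "when it should become visible" attribute.

```python
import pickle
from datetime import datetime, timezone

# Key used in this sketch for values pushed implicitly via a return value.
RETURN_KEY = "return_value"


class ToyXComStore:
    """Hypothetical store: each entry has a key, a value and a timestamp."""

    def __init__(self):
        self._rows = []

    def push(self, dag_id, task_id, key, value):
        self._rows.append({
            "dag_id": dag_id,
            "task_id": task_id,
            "key": key,
            "value": pickle.dumps(value),  # values must be picklable
            "timestamp": datetime.now(timezone.utc),
        })

    def pull(self, dag_id, task_id, key=RETURN_KEY):
        # Return the most recent matching value, or None if nothing matches.
        for row in reversed(self._rows):
            if (row["dag_id"], row["task_id"], row["key"]) == (dag_id, task_id, key):
                return pickle.loads(row["value"])
        return None


def run_task(store, dag_id, task_id, python_callable):
    """Run a callable and push its return value, if it returned one."""
    result = python_callable()
    if result is not None:
        store.push(dag_id, task_id, RETURN_KEY, result)
    return result


store = ToyXComStore()
run_task(store, "toy_dag", "list_files", lambda: ["a.csv", "b.csv"])  # implicit push
store.push("toy_dag", "list_files", "row_count", 42)                   # explicit push
files = store.pull("toy_dag", "list_files")
row_count = store.pull("toy_dag", "list_files", key="row_count")
```

Because every value goes through `pickle`, XComs suit small pieces of state (file names, counts, flags) better than large datasets.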
  • 32. More features. SLA: - Service Level Agreement: the time by which a task or DAG should have succeeded. - Can be set at the task level as a timedelta. - An alert email is sent listing the tasks that missed their SLA.
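A minimal sketch of the SLA check, in plain Python. It assumes the SLA is measured from the start of the run, which is a simplification chosen for this example; `missed_slas` and the task names are hypothetical, and no email is actually sent.

```python
from datetime import datetime, timedelta


def missed_slas(run_start, slas, finished_at, now):
    """Return the sorted task_ids that missed their SLA.

    slas:        task_id -> timedelta allowed after run_start (assumption)
    finished_at: task_id -> datetime the task succeeded (absent if not yet)
    """
    missed = []
    for task_id, sla in slas.items():
        deadline = run_start + sla
        done = finished_at.get(task_id)
        still_running_late = done is None and now > deadline
        finished_late = done is not None and done > deadline
        if still_running_late or finished_late:
            missed.append(task_id)
    return sorted(missed)


run_start = datetime(2018, 1, 1, 0, 0)
slas = {"extract": timedelta(minutes=30), "load": timedelta(hours=1)}
finished_at = {"extract": datetime(2018, 1, 1, 0, 20)}  # "load" has not finished
late = missed_slas(run_start, slas, finished_at, now=datetime(2018, 1, 1, 1, 30))
```

The returned list is the kind of content an alert email would carry.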
  • 33. More features Pools : - Some systems can get overwhelmed when too many processes hit them at the same time. - Limit the execution parallelism on arbitrary sets of tasks.
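The idea behind pools can be sketched with a semaphore: a fixed number of slots, and a task must hold a slot while it runs. This is a toy model using Python threads, not Airflow's scheduler logic; `ToyPool`, `db_pool` and `query` are hypothetical names.

```python
import threading
import time
from functools import partial


class ToyPool:
    """Hypothetical pool: at most `slots` callables run at the same time."""

    def __init__(self, slots):
        self._slots = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def run(self, fn):
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return fn()
            finally:
                with self._lock:
                    self.running -= 1


db_pool = ToyPool(slots=2)
completed = []


def query(i):
    time.sleep(0.01)  # stand-in for work against a fragile external system
    completed.append(i)


# Six tasks compete for two slots.
threads = [threading.Thread(target=db_pool.run, args=(partial(query, i),))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All six tasks complete, but the external system never sees more than two of them at once.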
  • 35. More features. Queues (CeleryExecutor only): - Every task can be assigned a specific queue name - By default, both workers and tasks are assigned to the default_queue queue - Workers can be assigned multiple queues - Very useful when specialized workers are needed (GPU, Spark…)
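Queue-based routing can be sketched as a simple lookup: a task carries a queue name, and only workers subscribed to that queue may pick it up. This is a toy model in plain Python; `eligible_workers` and the worker names are hypothetical, and the default queue name is taken from the slide as an assumption.

```python
# Default queue name as written on the slide; treat it as an assumption.
DEFAULT_QUEUE = "default_queue"


def eligible_workers(task_queue, worker_queues):
    """Return the sorted names of workers listening on the task's queue."""
    queue = task_queue or DEFAULT_QUEUE  # tasks without a queue use the default
    return sorted(name for name, queues in worker_queues.items()
                  if queue in queues)


# Hypothetical fleet: one generic worker and two specialized ones.
workers = {
    "worker-1": {DEFAULT_QUEUE},
    "worker-gpu": {DEFAULT_QUEUE, "gpu"},  # a worker can listen to several queues
    "worker-spark": {"spark"},
}

gpu_targets = eligible_workers("gpu", workers)
default_targets = eligible_workers(None, workers)
```

A task tagged `gpu` can only land on `worker-gpu`, while untagged tasks are shared by every worker that listens on the default queue.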
  • 36. More features. Trigger Rules: Though the normal workflow behavior is to trigger tasks when all their directly upstream tasks have succeeded, Airflow allows for more complex dependency settings. All operators have a trigger_rule argument which defines the rule by which the generated task gets triggered. The default value for trigger_rule is all_success, which means "trigger this task when all directly upstream tasks have succeeded". All other rules described here are based on direct parent tasks and are values that can be passed to any operator while creating tasks: ● all_success: (default) all parents have succeeded ● all_failed: all parents are in a failed or upstream_failed state ● all_done: all parents are done with their execution ● one_failed: fires as soon as at least one parent has failed; it does not wait for all parents to be done ● one_success: fires as soon as at least one parent succeeds; it does not wait for all parents to be done ● dummy: dependencies are just for show, trigger at will.
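The rules listed above can be sketched as one function over the states of a task's direct parents. This is a toy evaluation in plain Python, not Airflow's scheduler code: `should_trigger` and the `DONE_STATES` set are assumptions made for illustration, `one_failed` counts only the `failed` state as the wording above suggests, and the function assumes the task has at least one parent.

```python
# States treated as "done with their execution" in this sketch (an assumption).
DONE_STATES = {"success", "failed", "upstream_failed", "skipped"}


def should_trigger(rule, parent_states):
    """Decide whether a task may fire, given its direct parents' states."""
    states = list(parent_states)
    if rule == "all_success":
        return all(s == "success" for s in states)
    if rule == "all_failed":
        return all(s in ("failed", "upstream_failed") for s in states)
    if rule == "all_done":
        return all(s in DONE_STATES for s in states)
    if rule == "one_failed":
        # Fires without waiting for the remaining parents.
        return any(s == "failed" for s in states)
    if rule == "one_success":
        # Fires without waiting for the remaining parents.
        return any(s == "success" for s in states)
    if rule == "dummy":
        return True
    raise ValueError(f"unknown trigger rule: {rule}")


# A cleanup task that must run whether its parents succeeded or not.
cleanup_ready = should_trigger("all_done", ["success", "failed", "skipped"])
# An alerting task that fires as soon as anything fails.
alert_ready = should_trigger("one_failed", ["running", "failed"])
```

The two usage lines show the common reasons to leave the default: cleanup steps usually want `all_done`, and alerting steps usually want `one_failed`.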
  • 46. CLI
  • 47. CLI https://airflow.apache.org/cli.html
      airflow variables [-h] [-s KEY VAL] [-g KEY] [-j] [-d VAL] [-i FILEPATH] [-e FILEPATH] [-x KEY]
      airflow connections [-h] [-l] [-a] [-d] [--conn_id CONN_ID] [--conn_uri CONN_URI] [--conn_extra CONN_EXTRA] [--conn_type CONN_TYPE] [--conn_host CONN_HOST] [--conn_login CONN_LOGIN] [--conn_password CONN_PASSWORD] [--conn_schema CONN_SCHEMA] [--conn_port CONN_PORT]
      airflow pause [-h] [-sd SUBDIR] dag_id
      airflow test [-h] [-sd SUBDIR] [-dr] [-tp TASK_PARAMS] dag_id task_id execution_date
      airflow backfill dag_id task_id -s START_DATE -e END_DATE
      airflow clear DAG_ID
      airflow resetdb [-h] [-y]
      ...
  • 49. Security. By default, all access is open. Supports: ● Web authentication with: ○ Password ○ LDAP ○ Custom auth ○ Kerberos ○ OAuth ■ GitHub Enterprise Authentication ■ Google Authentication ● Impersonation (run as another $USER) ● Secure access via SSL
  • 50. Demo
  • 51. Demo 1. Facebook Ads insights data pipeline. 2. Run a PySpark script on an ephemeral Dataproc cluster only when the S3 input data is available. 3. Useless workflow: Hook + Connection + Operators + Sensors + XCom + (SLA): ○ List S3 files (hooks) ○ Share state with the next task (XCom) ○ Write content to S3 (hooks) ○ Resume the workflow when an S3 DONE.FLAG file is ready (sensor)