SlideShare a Scribd company logo
Airflow
for beginners
https://github.com/karpenkovarya/airflow_for_beginners
What is Airflow?
It is a tool to BUILD, SCHEDULE and MONITOR
data pipelines
Set of data processing elements connected in series.
The output of one element is the input of the next one.
I
Create
Questions
table
II
Store data
from Stack
Overflow
III
Write filtered
questions to
S3
IV
Render HTML
template
V
Send me an
email
Building blocks
of Airflow
Operator
(Worker)
Knows how to perform a task
and has the tools to do it.
Example:
Python Operator
Postgres Operator
Bash Operator
Email Operator
DAG
(Protocol /
Instructions)
Describes the
order of tasks and
what to do if task is failing.
Example:
Run Task A, when it is finished, run
Task B. If one of the tasks failed, stop
the whole process and send me a
notification.
Task
(Specific job)
Job that is done by an
Operator.
Example:
- Load data from some API using
Python Operator
- Write data to the database using
MySQL Operator
Hooks
Interfaces to the external
platforms and databases.
Implements common interface
(all hooks look very similar) and
use Connections
Example:
S3 Hook
Slack Hook
HDFS Hook
Connection
Credentials to the external
systems that can be securely
stored in the Airflow.
Example:
Postgres Connection = Connection
string to the Postgres database
AWS Connection = AWS access
keys
Variables
Like environment
variables.
Can store arbitrary
information and be used in
the Tasks
Examples:
Stack Overflow base URL
Gmail Client ID and Secret
XComs
Let’s Tasks exchange
small messages.
I
Create
Questions
table
II
Store data
from Stack
Overflow
III
Write filtered
questions to
S3
IV
Render HTML
template
V
Send me an
email
Postgres
Connection
Postgres
Connection
Postgres
Connection
S3
Connection
Python Operator
Python Operator
Python Operator
Postgres Hook
S3
Connection
S3
Hook
Postgres Hook S3
HookPostgres
Operator
XCom
XCom
Variables
Variables
Email
Operator
What have we learned?
- What is Apache Airflow
- What is a data pipeline
- Main Airflow concepts (DAG, Task, Operator, Connection, etc.)
- First pipeline
Thank you!
🌻✨💛
📬 hello@varya.io

More Related Content

What's hot

Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Knoldus Inc.
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
pko89403
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
mutt_data
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
Chris Riccomini
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
Chandler Huang
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
BagustTriCahyo1
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
Liangjun Jiang
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Sumit Maheshwari
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
Gerard Toonstra
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
Ilias Okacha
 
Apache Airflow | What Is An Operator
Apache Airflow | What Is An OperatorApache Airflow | What Is An Operator
Apache Airflow | What Is An Operator
Marc Lamberti
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
Derrick Qin
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Yohei Onishi
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
PyData
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
Kaxil Naik
 
Apache airflow
Apache airflowApache airflow
Apache airflow
Pavel Alexeev
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
Sid Anand
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
Robert Sanders
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 

What's hot (20)

Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Apache Airflow | What Is An Operator
Apache Airflow | What Is An OperatorApache Airflow | What Is An Operator
Apache Airflow | What Is An Operator
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 

Similar to Airflow for Beginners

Flyte kubecon 2019 SanDiego
Flyte kubecon 2019 SanDiegoFlyte kubecon 2019 SanDiego
Flyte kubecon 2019 SanDiego
KetanUmare
 
Introduce Airflow.ppsx
Introduce Airflow.ppsxIntroduce Airflow.ppsx
Introduce Airflow.ppsx
ManKD
 
Managing transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache AirflowManaging transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache Airflow
Michael Ghen
 
ISI work
ISI workISI work
ISI work
dgarijo
 
Exploring SharePoint with F#
Exploring SharePoint with F#Exploring SharePoint with F#
Exploring SharePoint with F#Talbott Crowell
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
Philip Yurchuk
 
Srgoc dotnet
Srgoc dotnetSrgoc dotnet
Srgoc dotnet
Gaurav Singh
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
Dave Bost
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
John J Zhao
 
Chapter_01_Intro to_Airflow.pdf
Chapter_01_Intro to_Airflow.pdfChapter_01_Intro to_Airflow.pdf
Chapter_01_Intro to_Airflow.pdf
jinbebrookfrankyusop
 
LINQ 2 SQL Presentation To Palmchip And Trg, Technology Resource Group
LINQ 2 SQL Presentation To Palmchip  And Trg, Technology Resource GroupLINQ 2 SQL Presentation To Palmchip  And Trg, Technology Resource Group
LINQ 2 SQL Presentation To Palmchip And Trg, Technology Resource Group
Shahzad
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorial
tekslate1
 
JAVA_BASICS.ppt
JAVA_BASICS.pptJAVA_BASICS.ppt
JAVA_BASICS.ppt
VGaneshKarthikeyan
 
Sqllite
SqlliteSqllite
Sqllite
Senthil Kumar
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
Bill Liu
 
Ty bca-sem-v-introduction to vb.net-i-uploaded
Ty bca-sem-v-introduction to vb.net-i-uploadedTy bca-sem-v-introduction to vb.net-i-uploaded
Ty bca-sem-v-introduction to vb.net-i-uploaded
Prachi Sasankar
 
Rifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotRifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotTsai Tsung-Yi
 
PowerPoint
PowerPointPowerPoint
PowerPointVideoguy
 

Similar to Airflow for Beginners (20)

Flyte kubecon 2019 SanDiego
Flyte kubecon 2019 SanDiegoFlyte kubecon 2019 SanDiego
Flyte kubecon 2019 SanDiego
 
Introduce Airflow.ppsx
Introduce Airflow.ppsxIntroduce Airflow.ppsx
Introduce Airflow.ppsx
 
Managing transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache AirflowManaging transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache Airflow
 
ISI work
ISI workISI work
ISI work
 
Exploring SharePoint with F#
Exploring SharePoint with F#Exploring SharePoint with F#
Exploring SharePoint with F#
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
Srgoc dotnet
Srgoc dotnetSrgoc dotnet
Srgoc dotnet
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
 
Chapter_01_Intro to_Airflow.pdf
Chapter_01_Intro to_Airflow.pdfChapter_01_Intro to_Airflow.pdf
Chapter_01_Intro to_Airflow.pdf
 
LINQ 2 SQL Presentation To Palmchip And Trg, Technology Resource Group
LINQ 2 SQL Presentation To Palmchip  And Trg, Technology Resource GroupLINQ 2 SQL Presentation To Palmchip  And Trg, Technology Resource Group
LINQ 2 SQL Presentation To Palmchip And Trg, Technology Resource Group
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorial
 
JAVA_BASICS.ppt
JAVA_BASICS.pptJAVA_BASICS.ppt
JAVA_BASICS.ppt
 
Sqllite
SqlliteSqllite
Sqllite
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Ty bca-sem-v-introduction to vb.net-i-uploaded
Ty bca-sem-v-introduction to vb.net-i-uploadedTy bca-sem-v-introduction to vb.net-i-uploaded
Ty bca-sem-v-introduction to vb.net-i-uploaded
 
GCF
GCFGCF
GCF
 
Rifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotRifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobot
 
PowerPoint
PowerPointPowerPoint
PowerPoint
 

Recently uploaded

一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
zwunae
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
Kamal Acharya
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
manasideore6
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 

Recently uploaded (20)

一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 

Airflow for Beginners

  • 2. What is Airflow? It is a tool to BUILD, SCHEDULE and MONITOR data pipelines Set of data processing elements connected in series. The output of one element is the input of the next one.
  • 3. I Create Questions table II Store data from Stack Overflow III Write filtered questions to S3 IV Render HTML template V Send me an email
  • 4. Building blocks of Airflow Operator (Worker) Knows how to perform a task and has the tools to do it. Example: Python Operator Postgres Operator Bash Operator Email Operator DAG (Protocol / Instructions) Describes the order of tasks and what to do if task is failing. Example: Run Task A, when it is finished, run Task B. If one of the tasks failed, stop the whole process and send me a notification. Task (Specific job) Job that is done by an Operator. Example: - Load data from some API using Python Operator - Write data to the database using MySQL Operator Hooks Interfaces to the external platforms and databases. Implements common interface (all hooks look very similar) and use Connections Example: S3 Hook Slack Hook HDFS Hook Connection Credentials to the external systems that can be securely stored in the Airflow. Example: Postgres Connection = Connection string to the Postgres database AWS Connection = AWS access keys Variables Like environment variables. Can store arbitrary information and be used in the Tasks Examples: Stack Overflow base URL Gmail Client ID and Secret XComs Let’s Tasks exchange small messages.
  • 5.
  • 6. I Create Questions table II Store data from Stack Overflow III Write filtered questions to S3 IV Render HTML template V Send me an email Postgres Connection Postgres Connection Postgres Connection S3 Connection Python Operator Python Operator Python Operator Postgres Hook S3 Connection S3 Hook Postgres Hook S3 HookPostgres Operator XCom XCom Variables Variables Email Operator
  • 7.
  • 8. What have we learned? - What is Apache Airflow - What is a data pipeline - Main Airflow concepts (DAG, Task, Operator, Connection, etc.) - First pipeline