3. WHAT IS AIRFLOW
Airflow is a workflow orchestration tool. It:
Manages the scheduling and running of jobs and data pipelines
Ensures jobs run in the correct order based on their dependencies
Provides mechanisms for tracking and monitoring the state of jobs and for recovering from failures
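The three responsibilities above (running jobs, respecting dependency order, tracking state and recovering from failure) can be illustrated with a small conceptual model. This is a minimal sketch using only the Python standard library, not Airflow's implementation: `run_workflow` is a hypothetical name, and the state strings are only loosely modeled on Airflow task states such as `success`, `failed` and `upstream_failed`.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
    max_retries: int = 1,
) -> dict[str, str]:
    """Run every task after its upstream tasks, retrying failures; return final states."""
    state = {name: "none" for name in tasks}
    # Graph maps each task to the set of tasks it depends on.
    graph = {name: upstream.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # Do not run a task unless all of its upstream tasks succeeded.
        if any(state[up] != "success" for up in graph[name]):
            state[name] = "upstream_failed"
            continue
        for _ in range(max_retries + 1):  # first attempt plus retries
            try:
                tasks[name]()
                state[name] = "success"
                break
            except Exception:
                state[name] = "failed"  # stays "failed" if every attempt fails
    return state


log: list[str] = []
attempts = {"transform": 0}


def flaky_transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:  # first attempt fails, simulating a transient error
        raise RuntimeError("simulated transient failure")
    log.append("transform")


states = run_workflow(
    {
        "extract": lambda: log.append("extract"),
        "transform": flaky_transform,
        "load": lambda: log.append("load"),
    },
    {"transform": {"extract"}, "load": {"transform"}},
)
print(log)     # ['extract', 'transform', 'load']
print(states)  # {'extract': 'success', 'transform': 'success', 'load': 'success'}
```

The transient failure in `transform` is absorbed by one retry, and `load` only runs once its upstream task has succeeded; if a task kept failing, its downstream tasks would be marked `upstream_failed` instead of being run.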
4. BENEFITS OF AIRFLOW
Easy to Use: Users need only a little Python knowledge to get started.
Open Source: Free and open source, with a large community of contributors.
Robust Integrations: Ready to use with many platforms and systems: Bash shell, SFTP,
MySQL, Postgres, Oracle, Python …
Standard Python: Workflows are written in Python, so users can build them very flexibly.
Visualization: Workflows can be monitored and managed through a web interface.
6. CORE CONCEPTS
DAG: Directed Acyclic Graph. A DAG is a collection of tasks that you want to run as part of
your workflow. These might include executing a Bash shell command, running checks with a
Python script, or running a database script. In Airflow, each of these steps is written as an
individual task in a DAG.
Airflow also lets you specify the relationships between the tasks: any dependencies
(e.g. data must be loaded into a table before a task runs) and the order in which the tasks
should run.
7. CORE CONCEPTS
TASK: Represents each node of a defined DAG. Tasks represent the work done at each step of
the workflow (and are shown as nodes in the web interface), while the actual work they
represent is defined by operators.
8. CORE CONCEPTS
Operators: An operator encapsulates the operation performed by each task in a DAG.
Airflow has a wide range of built-in operators for specific kinds of work, some of which are
platform-specific. You can also create your own custom operators.
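The task/operator split can be pictured as "a task is an instance of an operator with a `task_id`, and the operator's `execute` method does the work". In Airflow itself, custom operators are written by subclassing `BaseOperator` and overriding `execute(self, context)`. The sketch below mirrors that pattern with plain Python so it runs without Airflow installed; `ToyBaseOperator`, `ToyPythonOperator`, `RowCountCheckOperator` and the `context` contents are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ToyBaseOperator(ABC):
    """Hypothetical base class: every task is an operator instance with a task_id."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id

    @abstractmethod
    def execute(self, context: dict[str, Any]) -> Any:
        """Perform this task's work; subclasses define what that work is."""


class ToyPythonOperator(ToyBaseOperator):
    """Plays the role of a built-in operator: wraps a Python callable."""

    def __init__(self, task_id: str, python_callable: Callable[[], Any]) -> None:
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context: dict[str, Any]) -> Any:
        return self.python_callable()


class RowCountCheckOperator(ToyBaseOperator):
    """Custom operator: fails the task if a table has too few rows."""

    def __init__(self, task_id: str, table: str, min_rows: int,
                 count_rows: Callable[[str], int]) -> None:
        super().__init__(task_id)
        self.table = table
        self.min_rows = min_rows
        self.count_rows = count_rows  # injected so the sketch needs no database

    def execute(self, context: dict[str, Any]) -> int:
        count = self.count_rows(self.table)
        if count < self.min_rows:
            raise ValueError(f"{self.table}: {count} rows, expected >= {self.min_rows}")
        return count


tasks = [
    ToyPythonOperator("say_hello", lambda: "hello"),
    RowCountCheckOperator("check_orders", "orders", min_rows=1, count_rows=lambda table: 42),
]
context = {"run_date": "2024-01-01"}  # hypothetical run context
results = {t.task_id: t.execute(context) for t in tasks}
print(results)  # {'say_hello': 'hello', 'check_orders': 42}
```

Keeping the work inside `execute` lets the orchestrator treat every task the same way (call `execute`, record success or failure) regardless of whether the operator is built in or custom.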
9. HOW TO CREATE WORKFLOW
A DAG is saved as a .py file in the dags directory. The steps to create a DAG:
DEFINE DAG
•Define the dag_id, the start_date, and how often the tasks should run.
•Airflow can use a CRON expression to define the schedule.
CREATE TASKS
•Airflow provides a range of operators that cover most functions, such as Bash shell, databases (Oracle, Postgres, MySQL …), Python, and FTP, for creating tasks.
ORDER TASKS
•Define the dependencies and the order in which the tasks run.
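The three steps map onto the structure of a DAG file. A real Airflow file would import `DAG` from `airflow`, create tasks from operators such as `BashOperator`, and chain them with `>>`. To keep this sketch runnable without Airflow, it uses hypothetical stand-in classes (`MiniDAG`, `MiniTask`) that only mimic that shape; the `dag_id`, task names, and commands are illustrative. The cron string `"0 2 * * *"` means "every day at 02:00" (fields: minute, hour, day of month, month, day of week).

```python
from datetime import datetime
from graphlib import TopologicalSorter


class MiniDAG:
    """Step 1 (hypothetical stand-in): a DAG with an id, start date, and cron schedule."""

    def __init__(self, dag_id: str, start_date: datetime, schedule: str) -> None:
        self.dag_id = dag_id
        self.start_date = start_date
        self.schedule = schedule  # cron: minute hour day-of-month month day-of-week
        self.upstream: dict[str, set[str]] = {}  # task_id -> ids of tasks it waits for

    def order(self) -> list[str]:
        """One run order that respects every dependency."""
        return list(TopologicalSorter(self.upstream).static_order())


class MiniTask:
    """Step 2 (hypothetical stand-in): a task registered on a DAG."""

    def __init__(self, task_id: str, dag: MiniDAG, command: str) -> None:
        self.task_id, self.dag, self.command = task_id, dag, command  # command is only a label
        dag.upstream.setdefault(task_id, set())

    def __rshift__(self, other):
        """Step 3: `a >> b` (or `a >> [b, c]`) makes the right side run after `a`."""
        targets = other if isinstance(other, list) else [other]
        for task in targets:
            self.dag.upstream[task.task_id].add(self.task_id)
        return other  # returning the right side allows chains like a >> b >> c


dag = MiniDAG("daily_sales", datetime(2024, 1, 1), schedule="0 2 * * *")  # daily at 02:00
extract = MiniTask("extract", dag, "bash: export the sales data")
load = MiniTask("load", dag, "sql: load rows into the sales table")
check = MiniTask("check", dag, "python: verify the row count")

extract >> load >> check
print(dag.order())  # ['extract', 'load', 'check']
```

Unlike Airflow, this toy version does not support a list on the left of `>>` (e.g. `[a, b] >> c`); it only records dependencies and computes an order.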