2. What is Airflow?
• Airflow is an open-source workflow management platform.
• Every workflow we create in Airflow is defined as a Python script.
• At its core it is a scheduler, used to run jobs on a schedule driven by business requirements.
• Airflow is commonly used as a workflow orchestration tool for ETL (Extract, Transform, Load) data transformation pipelines.
• A workflow consists of multiple jobs that together implement a business process.
3. Explain how a workflow is designed in Airflow?
• In Airflow, jobs are organized into a DAG (Directed Acyclic Graph), which means the jobs are
created in the form of a graph whose nodes are connected by directed edges.
• A directed acyclic graph (DAG) is used to design an Airflow workflow. That is to say, when creating
a workflow, consider how it can be divided into tasks that can be completed independently.
• Each DAG is scheduled to run at a particular interval.
• A DAG is a workflow; inside the workflow we have jobs (tasks).
• Tasks run in the order of their dependencies: a downstream task starts only after the tasks it depends on have finished.
4. What are the Airflow Task Level States?
Success: A success state shows that Airflow did not encounter any error while running the task and it
finished successfully.
Running: A running state means the Airflow scheduler has already assigned the DAG or task to the
executor, and a worker is running the actual job.
Failed: A failed state tells the user that something went wrong during execution and Airflow could
not run the task to completion.
Upstream_Failed: An upstream_failed state refers to an error that happened upstream. Airflow marks
the exact task that failed, and all tasks downstream of it are marked as upstream_failed.
Skipped: A skipped state means a task's execution has been deliberately bypassed; the scheduler
skips the task and continues.
Up_For_Retry: An up_for_retry state means the previous attempt failed but the task still has retries
left, so it is marked as ready for retry.
Queued: A queued state occurs when the task is waiting for a slot in an executor. Once a slot is
available, the task is pulled from the queue (by priority) and a worker runs it.
None: This simply describes a newly created task whose state has not yet been set. You will observe
the None state when a new DAG run starts: all of its tasks are None at first. Also, if you rerun
some DAGs or tasks, once you clear their state, all of them return to None.
Scheduled: The scheduled state means the task is ready for Airflow to send to the executor. Once all
the conditions for scheduling have been met, the Airflow scheduler moves the task to the scheduled
state.
7. What are Operators in Airflow?
• An operator encapsulates the operation to be performed by each task in a DAG. Airflow has a wide
range of built-in operators that can perform specific tasks, some of which are platform-specific.
Some of the operators in Airflow are:
• TimeDeltaSensor: Waits for a timedelta after the task’s execution_date + schedule interval
• EmailOperator: Airflow comes with an operator to send emails.
• TriggerDagRunOperator: This operator allows you to have a task in one DAG that triggers another
DAG in the same Airflow environment.
• PythonOperator: A simple but powerful operator that allows you to execute a Python callable
function from your DAG.