Airflow for Beginners

A presentation about Apache Airflow at PyCon & PyData Berlin 2019.
https://github.com/karpenkovarya/airflow_for_beginners

Published in: Engineering

  1. Airflow for beginners: https://github.com/karpenkovarya/airflow_for_beginners
  2. What is Airflow? It is a tool to BUILD, SCHEDULE and MONITOR data pipelines. A data pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one.
  3. The example pipeline:
     I. Create the Questions table
     II. Store data from Stack Overflow
     III. Write filtered questions to S3
     IV. Render an HTML template
     V. Send me an email
  4. Building blocks of Airflow
     - Operator (Worker): knows how to perform a task and has the tools to do it. Examples: Python Operator, Postgres Operator, Bash Operator, Email Operator.
     - DAG (Protocol / Instructions): describes the order of tasks and what to do if a task fails. Example: run Task A; when it is finished, run Task B. If one of the tasks fails, stop the whole process and send me a notification.
     - Task (Specific job): a job that is done by an Operator. Examples: load data from some API using the Python Operator; write data to the database using the MySQL Operator.
     - Hooks: interfaces to external platforms and databases. They implement a common interface (all hooks look very similar) and use Connections. Examples: S3 Hook, Slack Hook, HDFS Hook.
     - Connection: credentials for external systems that can be stored securely in Airflow. Examples: Postgres Connection = connection string to the Postgres database; AWS Connection = AWS access keys.
     - Variables: like environment variables. They can store arbitrary information and be used in Tasks. Examples: Stack Overflow base URL; Gmail Client ID and Secret.
     - XComs: let Tasks exchange small messages.
  5. [Diagram] The pipeline from slide 3 (steps I to V), annotated with the Airflow components it uses: Postgres Operator, Python Operator, Email Operator, Postgres Hook, S3 Hook, Postgres Connection, S3 Connection, XCom, Variables.
  6. What have we learned?
     - What is Apache Airflow
     - What is a data pipeline
     - Main Airflow concepts (DAG, Task, Operator, Connection, etc.)
     - First pipeline
  7. Thank you! 🌻✨💛 📬 hello@varya.io
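Slide 2 defines a data pipeline as processing elements connected in series, where each output becomes the next input. The sketch below illustrates only that idea in plain Python; it is not Airflow code. The helper `run_pipeline` and the step functions `clean`, `keep_questions` and `count` are hypothetical names chosen for illustration.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Step = Callable[[Any], Any]


def run_pipeline(steps: Iterable[Step], data: Any) -> Any:
    """Pass data through the steps in series: each output is the next input."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical processing elements, for illustration only.
def clean(lines):
    # Strip whitespace and drop empty lines.
    return [line.strip() for line in lines if line.strip()]


def keep_questions(items):
    # Keep only the items that look like questions.
    return [item for item in items if item.endswith("?")]


def count(items):
    return len(items)


raw = ["What is Airflow?", "   ", "Airflow is a tool.", "Why DAGs?  "]
print(run_pipeline([clean, keep_questions, count], raw))  # 2
```

Airflow adds what this toy version lacks: scheduling, retries and monitoring of each step.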
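The pipeline on slide 3 is a chain of dependencies, and a DAG (directed acyclic graph) is what turns such dependencies into a run order. In Airflow itself the order is declared between task objects, for example with `create_table >> store_data` (both names hypothetical). The stdlib sketch below only models the ordering with Python's `graphlib` (Python 3.9+); the task ids are hypothetical labels for the five steps, not identifiers from the talk's repository.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task ids are hypothetical labels for steps I to V on slide 3.
dependencies = {
    "store_stackoverflow_data": {"create_questions_table"},
    "write_questions_to_s3": {"store_stackoverflow_data"},
    "render_html_template": {"write_questions_to_s3"},
    "send_email": {"render_html_template"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
# ['create_questions_table', 'store_stackoverflow_data', 'write_questions_to_s3',
#  'render_html_template', 'send_email']

# A cycle (a waits for b, b waits for a) has no valid order, so it is rejected.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    print("cycle detected")
```

The "acyclic" part matters: if two tasks waited on each other, no task could ever start, which is why a cyclic graph is rejected instead of scheduled.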
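Slide 4 describes a DAG as a protocol ("run Task A, then Task B; if a task fails, stop and send me a notification") and XComs as small messages passed between Tasks. The toy model below imitates that protocol in plain Python so the two ideas can be seen together. It is an assumption-laden sketch, not how Airflow's scheduler or XCom backend is implemented: `run_dag`, the dict-based `xcom` store and the tasks `extract`, `filter_questions`, `upload` and `render` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

XComStore = Dict[str, object]           # task id -> small value produced by that task
TaskFn = Callable[[XComStore], object]  # a task reads earlier results, returns its own


def run_dag(tasks: List[Tuple[str, TaskFn]],
            notify: Callable[[str, Exception], None]) -> XComStore:
    """Run tasks in the given order, sharing results through an XCom-like dict.

    If a task raises, stop the whole run and send a notification.
    """
    xcom: XComStore = {}
    for task_id, task in tasks:
        try:
            xcom[task_id] = task(xcom)  # store the return value for later tasks
        except Exception as exc:        # any failure stops the run
            notify(task_id, exc)
            break
    return xcom


# Hypothetical tasks, for illustration only.
def extract(xcom):
    return [{"title": "What is a DAG?", "score": 5},
            {"title": "Old question", "score": 0}]


def filter_questions(xcom):
    return [q for q in xcom["extract"] if q["score"] > 0]


def upload(xcom):
    raise RuntimeError("S3 unavailable")  # simulate a failing task


def render(xcom):
    return "<ul>" + "".join(f"<li>{q['title']}</li>" for q in xcom["filter"]) + "</ul>"


alerts = []
result = run_dag(
    [("extract", extract), ("filter", filter_questions),
     ("upload", upload), ("render", render)],
    notify=lambda task_id, exc: alerts.append(f"{task_id} failed: {exc}"),
)
print(sorted(result))  # ['extract', 'filter']
print(alerts)          # ['upload failed: S3 unavailable']
```

Because `upload` fails, `render` never runs and a single alert is recorded. The values passed through `xcom` are deliberately small, in line with the slide's note that XComs are meant for small messages.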
