Why Airflow? & What's new in Airflow 2.3?

Jun. 16, 2022
Talk: https://odsc.com/speakers/whats-new-in-apache-airflow-2-3/

This session covers why to use Apache Airflow and the new features the community built and recently released in Apache Airflow 2.3.

Highlights:
- Dynamic Task Mapping
- First-class support for DB Downgrades
- Pruning old DB records (no need for Maintenance DAGs anymore)
- Building Connections using JSON
- UI Improvements

The talk also covers the growth of the Airflow community over the years and why Airflow is still the de facto tool for workflow orchestration.

  1. Why Airflow? & What's new in Airflow 2.3? Kaxil Naik, ODSC 2022
  2. Who am I?
     - Committer & PMC Member of Apache Airflow
     - Director of Airflow Engineering @ Astronomer
     - @kaxil
  3. What is Apache Airflow?
  4. A platform to programmatically author, schedule, and monitor workflows
  5. A platform to programmatically author, schedule, and monitor workflows
  6. A platform to programmatically author, schedule, and monitor workflows
  7. A platform to programmatically author, schedule, and monitor workflows
  8. Example DAG
  9. Why Apache Airflow?
  10. The Community: 2k+ Contributors, 26.3k GitHub Stars, 6.8m+ Monthly Downloads, 23k+ Slack Members
  11. Under …
  12. Governed by: 48 Committers, 27 PMC (Project Management Committee) Members
  13. Integrations And ……
  14. 75+ Providers
  15. Docker Image: docker pull apache/airflow
  16. Helm Chart:
      - helm repo add apache-airflow https://airflow.apache.org/
      - helm install my-airflow apache-airflow/airflow
  17. Conference & Meetups: 13 Local Groups; 3 years with a minimum of 6k-10k registrants every year
  18. Managed Airflow Vendors
  19. What's new in Airflow 2.3?
  20. Biggest Airflow Release since 2.0: 700+ commits, with 50 new features!
  21. Dynamic Task Mapping
      - Highlight feature of 2.3
      - First-class support for the common ETL pattern around dynamic tasks
      - Run the same set of tasks for N files in a bucket, DB records, or ML models, where N is unpredictable
  22. Dynamic Task Mapping: Before / After
  23. Dynamic Task Mapping
  24. Grid View replaces Tree View!!
  25. Grid View replaces Tree View!!
      - Better support for Task Groups & Task Mapping
      - Grid lines and hover effects to see which task you are inspecting
      - Show durations of DAG runs to quickly see performance changes
      - Paves the way for DAG Versioning
  26. Create Connection in native JSON format
  27. Create Connection in native JSON format
  28. Create Connection in native JSON format
  29. Create Connection in native JSON format
  30. DB downgrades: First-class support
      - Downgrade to an Airflow version
      - or to a specific Alembic revision id
  31. DB downgrades: First-class support
  32. Generate SQL for DB upgrade & downgrade
      - Allows a DBA to run the DB migrations ("--show-sql-only" flag)
  33. Purge DB history: First-class support
      - Helps reduce the time taken by DB migrations when updating the Airflow version
      - Removes the need for Maintenance DAGs!
      - '--dry-run' option to print the row counts in the tables to be cleaned
      - Back up your DB before running this!
  34. LocalKubernetesExecutor: Speed, Isolation & Simplicity packed in one!
      - Allows users to simultaneously run a LocalExecutor and a KubernetesExecutor
      - An executor is chosen to run a task based on the task's queue
      - Suits a mix of tasks that just call APIs and tasks that require isolation because of their dependencies or heavy computation
      - Slide from Jed's Airflow Summit talk: https://www.crowdcast.io/e/airflowsummit2022/35
  35. DAG Processor separation
      - Standalone process for DAG parsing
      - "airflow dag-processor" CLI command
      - Code parsing and callbacks (SLA + DAG's on_{success,failure}_callbacks)
      - Makes the scheduler not run any user code*
      - First step towards multi-tenancy
      - Disabled by default; can be enabled with AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True
      - Images from AIP-43
  36. Events Timetable
      - Run DAGs at arbitrary dates
      - Built-in Timetable
      - Useful for events which can't be expressed by Cron or Timedelta
  37. Smooth Operator
  38. Other Minor features: Minor but very handy!
      - A new REST API endpoint ('/dags') that lets you bulk-pause/resume DAGs
      - airflow dags reserialize command to delete serialized DAGs & reparse them
      - A new listener plugin API that tracks TaskInstance state changes (used by OpenLineage)
      - New Trigger Rule: all_skipped
      - Doc: Single page to check Changelog & Updating Guide -> 'Release Notes'
      - (Experimental) Support for ARM Docker Images
  39. Upgrade Now to Airflow 2.3!
  40. Thank You @kaxil

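Slides 26 to 29 mention creating connections in native JSON format, but the screenshots are not in this transcript. As a rough illustration, assuming the documented JSON keys (`conn_type`, `login`, `password`, `host`, `port`, `schema`, `extra`) and the `AIRFLOW_CONN_<CONN_ID>` environment variable convention, a connection could be serialized as shown here. The connection id `REPORTING_DB` and all values are made up for the example.

```python
import json
import os

# Hypothetical connection details; every value here is illustrative.
connection = {
    "conn_type": "postgres",
    "login": "analyst",
    "password": "s3cret/with:odd@chars",  # no URL-encoding needed in JSON form
    "host": "db.example.com",
    "port": 5432,
    "schema": "reporting",
    "extra": {"sslmode": "require"},
}

# Serialize the connection to a JSON string.
connection_json = json.dumps(connection)

# Airflow looks up environment variables named AIRFLOW_CONN_<CONN_ID>.
os.environ["AIRFLOW_CONN_REPORTING_DB"] = connection_json

print(connection_json)
```

Compared with the older URI form, the JSON form keeps special characters in passwords and nested `extra` values readable, because nothing has to be percent-encoded.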
