SlideShare a Scribd company logo
1 of 58
Download to read offline
What’s coming
in Apache
Airflow 2.0
Polidea
Polidea
Apache Airflow
Airflow is a platform to programmatically author,
schedule and monitor workflows.
Dynamic/Elegant
Extensible
Scalable
Polidea
What’s on
today ?
Polidea
What is the presentation about ?
● The team @ Polidea
● What the Airflow ?
● Where Apache Airflow is now?
● What’s coming in Apache Airflow 2.0.
Polidea
Team @ Polidea
Polidea
Logo or mockup
Hi!
Jarek Potiuk
Principal Software Engineer @Polidea
Apache Airflow PMC member
Certified GCP Architect
ex-Googler, ex-CTO, ex-choir member
@higrys
Polidea
Apache Airflow Development team@ Polidea
Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół
Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński
Tobiasz Kędzierski Michał Słowikowski
PMC
Past:
Polidea
Apache Airflow Website team @ Polidea
Kamil Breguła Zuzanna Rykowska Kamil Gabryjelski
Magdalena WęgrzyńskaMarta StrzałkowskaTomasz Urbaszek
Polidea
70+
TALENTS
100+
PROJECTS
DELIVERED
3m
USERS OF
OUR APPS
75%
OF BUSINESS
THROUGH
REFERRALS
Team
@Polidea
Polidea
Polidea &
Apache Airflow
Polidea
August 2018
2 people
Timeline December 2019
6 (9) people
Polidea
Our tasks
● 130+ operators
● 18+ GCP services
● Oozie-To-Airflow
● New Apache Airflow Website
Polidea
What we delivered extra
● Documentation improvements
● Breeze - improved dev environment
● Py2 -> Py3
● Pylint compatibility
● Pre-commit framework introduction
● CI environment reimplemented
● Operator scaffolding
● Convert tests to pytests
2 Apache Airflow Committers
Apache Airflow PMC member
Polidea
Open-source friendly company
Polidea
Apache
Airflow
Polidea
Why Apache Airflow and not one of these?
And many, many, many more ....
Polidea
Airflow is an Orchestrator
● Tells others what/when to do
● Synchronizes work between others
● Monitors what’s going on
● Intervenes if needed
● Mostly does not do much
Polidea
Airflow is Python
Polidea
Arbitrary complex workflows as a program
Polidea
Airflow has usable UI
Polidea
Airflow CLI
Polidea
What Airflow shines at ?
● Regular batch ETL jobs (think CRON)
● Processing fixed intervals of data
● Managing complex dependencies
● Backfilling data
● Interfacing to hundreds of different systems
● Platform for others to generate DAG files
Polidea
Apache Airflow 1.10
state of the pinwheel
Polidea
Current versions
● 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6 ….
● 1.10.7 in the making
● Deployed in thousands of companies
● On the rise of usage
● 2.0 - in master
Polidea
How to stay relevant ?
● Cloud Native is coming
● APIs are backbone of modern software
● User Interface matters
● Performance and reliability matter
● Many services, many changes
● Community over code
Polidea
End of 2019 survey: 300 responses(!)
● Started by Tomasz Urbaszek
● Run for the last 2 weeks
● Fresh off-the press
● Some surprises found
● Going in the right direction
Polidea
Polidea
What do you use Airflow for?
Data processing (ETL) 97%
Artificial Intelligence and Machine Learning
Pipelines 29%
Automating DevOps operations 21%
Polidea
What can be improved ?
Scheduler performance 61%
Web UI 58%
Logging, monitoring and alerting 47%
Examples, howtos, onboarding documentation 46%
Technical documentation 44%
Reliability 36%
REST API 31%
Authentication and authorization 29%
Polidea
What would be the most interesting feature for you ?
Production-ready docker image 56%
Declarative way of writing DAGs 50%
Horizontal autoscaling 40%
Examples, howtos, onboarding documentation 46%
Asynchronous Operators 31%
Stateless web server 26%
Knative Executor 16%
I already have all I need 4%
Polidea
Apache Airflow
2.0
Polidea
Cloud Native is coming
Polidea
Polidea
No - we do not plan to use Kubernetes near term 29%
Yes - setup on our own via Helm Chart or similar 21%
Not yet - but we use Kubernetes in our organization and we
could move 20%
Yes - via managed service in the cloud
(Composer/Astronomer etc.) 15%
Not yet - but we plan to deploy Kubernetes in our
organization soon 14%
Other 2%
Either use or can use Kubernetes in foreseeable future 69%
Do not have plans to use Kubernetes 29%
Do you use Kubernetes-based deployments for Airflow?
Polidea
Cloud Native is coming: Scalability
● Knative Executor
● SIG-Knative => SIG Scalability
● Native Airflow Executor (WIP)
● Pub/Sub communication
● Horizontally auto-scalable
Polidea
Cloud Native is coming: Deployability
● Native worker deployable at different providers
● “As a service” and “on-premises” friendly
● Generic Pub/Sub architecture for communication
● No DB communication between components
● Production-optimised docker image
Polidea
Cloud Native is coming: Monitoring
● Integrate with standard monitoring tools
● More metrics exported using stats
● Integration with Prometheus on Kubernetes
● Horizontal Scalability approach based on metrics
Polidea
APIs are backbone of modern software
Polidea
APIs are taking over the world
● Modern API
● HTTP-based API used by CLI, webserver
● Pub/Sub API for communication Scheduler <> Workers
● Generic APIs - not tied to Kubernetes/other deployment options
● Better Authentication/Authorization
● Opens up multi-tenancy capabilities
Polidea
User Interface matters
Polidea
Original Airflow Graphical User Interface 97%
CLI 40%
API (experimental) 20%
Custom Own Created UI 8%
Which interface(s) of Airflow do you use as part of your current role?
Polidea
UIs are getting better
● Make UI refresh like it’s 2020
● Modern design (possibly)
● Use APIs for communication not DB/file access
● Better authentication and authorisation
● Stateless web-server
● Better responsiveness
Polidea
Performance and reliability
matter
Polidea
Performance and reliability is important
● Automated performance testing (CI - targeted)
● Monitoring performance characteristics
● Improve Webserver/Scheduler Performance
● Internal instrumentation and optimisations
Polidea
Many services, fast changes
Polidea
Fast evolving services
● Currently operators bound to releases of Airflow
● Migration to 2.0 will take time
● Introducing new approach
○ move operators to new path/namespaces
○ change import paths
○ backporting to 1.10
○ backportable to 1.10 (!)
○ future: per-provider packaging
Polidea
Community over code
Polidea
Polidea
Community over code: Documentation
● Google Season of Docs - great programme!
● Onboarding, best practices, architecture, deployment options
● Better, clearer structure
● Both user and developer documentation improved
● Worked with technical writers from India and Russia
Polidea
Community over code: New website: airflow.apache.org
Work sponsored by Google Cloud
Polidea
Community over code: Development environment
● It’s a Breeze to develop Apache Airflow
● Get your environment up in 10 minutes
● Integration with IDE
● Well documented
● Team-work enabler
● Allows to run and debug DAGs
● Fully debuggable: DebugExecutor - cooperation with Databand.ai
Friday, December 13, 2019
5:30 PM to 9:30 PM
Polidea Sp. z o.o.
Przeskok 2 IV p. · Warsaw
https://t.co/TmWdWwfemI
First Warsaw Apache Airflow Workshop
Polidea
Thanks!
hello@polidea.com
Polidea
Thanks!
Prototypes Widely Used
UX Design
Personal Growth Individuality Trust
Beter, Fester
Ideation
API Design
Shipping
Building
blocks
Backend VR Android Learn
more
Firmware
Decision
Coffee
CollaborationContactCVSave moneyOpen Source
Management
React
Native
Testing Team
Quality iOS Maintain Security Front End
Flutter
Task
Seamless UXResearchMobile AR

More Related Content

What's hot

What's hot (20)

Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Devops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at GitlabDevops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at Gitlab
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
GitLab for CI/CD process
GitLab for CI/CD processGitLab for CI/CD process
GitLab for CI/CD process
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Kube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul CzarkowskiKube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul Czarkowski
 
Fyber - airflow best practices in production
Fyber - airflow best practices in productionFyber - airflow best practices in production
Fyber - airflow best practices in production
 
GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Importance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCPImportance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCP
 
Java 8
Java 8Java 8
Java 8
 
Introduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetesIntroduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetes
 
Gradle
GradleGradle
Gradle
 
Gradle build capabilities
Gradle build capabilities Gradle build capabilities
Gradle build capabilities
 
Load impact insights webinar
Load impact insights webinarLoad impact insights webinar
Load impact insights webinar
 
Argocd up and running
Argocd up and runningArgocd up and running
Argocd up and running
 
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for KubernetesDocker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
 
5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes
 
So you want to write a cloud function
So you want to write a cloud functionSo you want to write a cloud function
So you want to write a cloud function
 

Similar to What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019

Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Gibran Badrulzaman
 
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
WSO2
 

Similar to What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019 (20)

PCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop SlidesPCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop Slides
 
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
 Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ... Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
 
It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)
 
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
 
London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023
 
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
 
Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)
 
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
 
London MuleSoft Meetup
London MuleSoft Meetup London MuleSoft Meetup
London MuleSoft Meetup
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
 
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
 
How to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCFHow to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCF
 
Mule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_finalMule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_final
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Introduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OKIntroduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OK
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
 
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 

More from Jarek Potiuk

Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Jarek Potiuk
 

More from Jarek Potiuk (8)

Subtle Differences between Python versions
Subtle Differences between Python versionsSubtle Differences between Python versions
Subtle Differences between Python versions
 
Caching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer scienceCaching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer science
 
Off time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social mediaOff time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social media
 
Berlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow WorkshopsBerlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow Workshops
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
 
Ci for android OS
Ci for android OSCi for android OS
Ci for android OS
 
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
 
React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019

  • 1.
  • 4. Polidea Apache Airflow Airflow is a platform to programmatically author, schedule and monitor workflows. Dynamic/Elegant Extensible Scalable
  • 6. Polidea What is the presentation about ? ● The team @ Polidea ● What the Airflow ? ● Where Apache Airflow is now? ● What’s coming in Apache Airflow 2.0.
  • 8. Polidea Logo or mockup Hi! Jarek Potiuk Principal Software Engineer @Polidea Apache Airflow PMC member Certified GCP Architect ex-Googler, ex-CTO, ex-choir member @higrys
  • 9. Polidea Apache Airflow Development team@ Polidea Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński Tobiasz Kędzierski Michał Słowikowski PMC Past:
  • 10. Polidea Apache Airflow Website team @ Polidea Kamil Breguła Zuzanna Rykowska Kamil Gabryjelski Magdalena WęgrzyńskaMarta StrzałkowskaTomasz Urbaszek
  • 13. Polidea August 2018 2 people Timeline December 2019 6 (9) people
  • 14. Polidea Our tasks ● 130+ operators ● 18+ GCP services ● Oozie-To-Airflow ● New Apache Airflow Website
  • 15. Polidea What we delivered extra ● Documentation improvements ● Breeze - improved dev environment ● Py2 -> Py3 ● Pylint compatibility ● Pre-commit framework introduction ● CI environment reimplemented ● Operator scaffolding ● Convert tests to pytests 2 Apache Airflow Committers Apache Airflow PMC member
  • 18. Polidea Why Apache Airflow and not one of these? And many, many, many more ....
  • 19. Polidea Airflow is an Orchestrator ● Tells others what/when to do ● Synchronizes work between others ● Monitors what’s going on ● Intervenes if needed ● Mostly does not do much
  • 24. Polidea What Airflow shines at ? ● Regular batch ETL jobs (think CRON) ● Processing fixed intervals of data ● Managing complex dependencies ● Backfilling data ● Interfacing to hundreds of different systems ● Platform for others to generate DAG files
  • 26. Polidea Current versions ● 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6 …. ● 1.10.7 in the making ● Deployed in thousands of companies ● On the rise of usage ● 2.0 - in master
  • 27. Polidea How to stay relevant ? ● Cloud Native is coming ● APIs are backbone of modern software ● User Interface matters ● Performance and reliability matter ● Many services, many changes ● Community over code
  • 28. Polidea End of 2019 survey: 300 responses(!) ● Started by Tomasz Urbaszek ● Run for the last 2 weeks ● Fresh off-the press ● Some surprises found ● Going in the right direction
  • 30. Polidea What do you use Airflow for? Data processing (ETL) 97% Artificial Intelligence and Machine Learning Pipelines 29% Automating DevOps operations 21%
  • 31. Polidea What can be improved ? Scheduler performance 61% Web UI 58% Logging, monitoring and alerting 47% Examples, howtos, onboarding documentation 46% Technical documentation 44% Reliability 36% REST API 31% Authentication and authorization 29%
  • 32. Polidea What would be the most interesting feature for you ? Production-ready docker image 56% Declarative way of writing DAGs 50% Horizontal autoscaling 40% Examples, howtos, onboarding documentation 46% Asynchronous Operators 31% Stateless web server 26% Knative Executor 16% I already have all I need 4%
  • 36. Polidea No - we do not plan to use Kubernetes near term 29% Yes - setup on our own via Helm Chart or similar 21% Not yet - but we use Kubernetes in our organization and we could move 20% Yes - via managed service in the cloud (Composer/Astronomer etc.) 15% Not yet - but we plan to deploy Kubernetes in our organization soon 14% Other 2% Either use or can use Kubernetes in foreseeable future 69% Do not have plans to use Kubernetes 29% Do you use Kubernetes-based deployments for Airflow?
  • 37. Polidea Cloud Native is coming: Scalability ● Knative Executor ● SIG-Knative => SIG Scalability ● Native Airflow Executor (WIP) ● Pub/Sub communication ● Horizontally auto-scalable
  • 38. Polidea Cloud Native is coming: Deployability ● Native worker deployable at different providers ● “As a service” and “on-premises” friendly ● Generic Pub/Sub architecture for communication ● No DB communication between components ● Production-optimised docker image
  • 39. Polidea Cloud Native is coming: Monitoring ● Integrate with standard monitoring tools ● More metrics exported using stats ● Integration with Prometheus on Kubernetes ● Horizontal Scalability approach based on metrics
  • 40. Polidea APIs are backbone of modern software
  • 41. Polidea APIs are taking over the world ● Modern API ● HTTP-based API used by CLI, webserver ● Pub/Sub API for communication Scheduler <> Workers ● Generic APIs - not tied to Kubernetes/other deployment options ● Better Authentication/Authorization ● Opens up multi-tenancy capabilities
  • 43. Polidea Original Airflow Graphical User Interface 97% CLI 40% API (experimental) 20% Custom Own Created UI 8% Which interface(s) of Airflow do you use as part of your current role?
  • 44. Polidea UIs are getting better ● Make UI refresh like it’s 2020 ● Modern design (possibly) ● Use APIs for communication not DB/file access ● Better authentication and authorisation ● Stateless web-server ● Better responsiveness
  • 46. Polidea Performance and reliability is important ● Automated performance testing (CI - targeted) ● Monitoring performance characteristics ● Improve Webserver/Scheduler Performance ● Internal instrumentation and optimisations
  • 48. Polidea Fast evolving services ● Currently operators bound to releases of Airflow ● Migration to 2.0 will take time ● Introducing new approach ○ move operators to new path/namespaces ○ change import paths ○ backporting to 1.10 ○ backportable to 1.10 (!) ○ future: per-provider packaging
  • 51. Polidea Community over code: Documentation ● Google Season of Docs - great programme! ● Onboarding, best practices, architecture, deployment options ● Better, clearer structure ● Both user and developer documentation improved ● Worked with technical writers from India and Russia
  • 52. Polidea Community over code: New website: airflow.apache.org Work sponsored by Google Cloud
  • 53. Polidea Community over code: Development environment ● It’s a Breeze to develop Apache Airflow ● Get your environment up in 10 minutes ● Integration with IDE ● Well documented ● Team-work enabler ● Allows to run and debug DAGs ● Fully debuggable: DebugExecutor - cooperation with Databand.ai
  • 54.
  • 55. Friday, December 13, 2019 5:30 PM to 9:30 PM Polidea Sp. z o.o. Przeskok 2 IV p. · Warsaw https://t.co/TmWdWwfemI First Warsaw Apache Airflow Workshop
  • 58. Prototypes Widely Used UX Design Personal Growth Individuality Trust Beter, Fester Ideation API Design Shipping Building blocks Backend VR Android Learn more Firmware Decision Coffee CollaborationContactCVSave moneyOpen Source Management React Native Testing Team Quality iOS Maintain Security Front End Flutter Task Seamless UXResearchMobile AR