SlideShare a Scribd company logo
1 of 13
Testing in Airflow
Chandu Kavar
Data Engineer
Grab, Ex-ThoughtWorker
@chandukavar
Traditional Web App Testing
We mostly write -
● Unit Tests
● Integration Tests
● Functional Tests
Client
Database
Server
Stress testing, snapshot testing etc. for
complex and large scale application.
● Typos, cyclicity, mandatory parameters etc. in the DAGs
● No of tasks, nature of tasks in the DAG
● Upstream and downstream dependencies of each task
● Custom Operator, sensor etc.
● Communication between tasks such as xComs
● End-to-End pipeline
What can we test in Airflow?
Five Categories of tests we can write -
1. DAG validation tests
2. DAG/Pipeline definition tests
3. Unit tests for custom operators, sensor etc.
4. Integration tests
5. End-to-End pipeline tests
Sample DAG
dag = DAG('hello_world',
description='Hello world example',
schedule_interval='0 12 * * *',
start_date=datetime(2017, 3, 20))
dummy_operator = DummyOperator(task_id='dummy_task', retries = 3, dag=dag)
hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)
multiplyby5_operator = MultiplyBy5Operator(my_operator_param= 20, task_id='multiplyby5_task',
dag=dag)
dummy_operator >> hello_operator
dummy_operator >> multiplyby5_operator
def print_hello():
return 'Hello'
DAG Validation Tests
class TestDagIntegrity(unittest.TestCase):
def setUp(self):
self.dagbag = DagBag()
def test_import_dags(self):
self.assertFalse(
len(self.dagbag.import_errors),
'DAG import failures. Errors: {}'.format(
self.dagbag.import_errors
)
)
def test_queue_present(self):
for dag_id, dag in self.dagbag.dags.iteritems():
queue = dag.default_args.get('queue', None)
msg = Queue not set for DAG {id}'.format(id=dag_id)
self.assertNotEqual(None, queue, msg)
DAG Definition Tests
class TestHelloWorldDAG(unittest.TestCase):
"""Check HelloWorldDAG expectation"""
def setUp(self):
self.dagbag = DagBag()
def test_task_count(self):
"""Check task count of hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
self.assertEqual(len(dag.tasks), 3)
def test_contain_tasks(self):
"""Check task contains in hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
tasks = dag.tasks
task_ids = list(map(lambda task: task.task_id, tasks))
self.assertListEqual(task_ids, ['dummy_task', 'multiplyby5_task','hello_task'])
DAG Definition Tests
def test_dependencies_of_dummy_task(self):
"""Check the task dependencies of dummy_task in hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
dummy_task = dag.get_task('dummy_task')
upstream_task_ids = list(map(lambda task: task.task_id, dummy_task.upstream_list))
self.assertListEqual(upstream_task_ids, [])
downstream_task_ids = list(map(lambda task: task.task_id, dummy_task.downstream_list))
self.assertListEqual(downstream_task_ids, ['hello_task', 'multiplyby5_task'])
Custom Operator
class MultiplyBy5Operator(BaseOperator):
@apply_defaults
def __init__(self, my_operator_param, *args, **kwargs):
self.operator_param = my_operator_param
super(MultiplyBy5Operator, self).__init__(*args, **kwargs)
def execute(self, context):
log.info('operator_param: %s', self.operator_param)
return (self.operator_param * 5)
class MultiplyBy5Plugin(AirflowPlugin):
name = "multiplyby5_plugin"
Unit tests - Custom Operator
class TestMultiplyBy5Operator(unittest.TestCase):
def test_execute(self):
dag = DAG(dag_id='anydag', start_date=datetime.now())
task = MultiplyBy5Operator(my_operator_param=10, dag=dag, task_id='anytask')
ti = TaskInstance(task=task, execution_date=datetime.now())
result = task.execute(ti.get_template_context())
self.assertEqual(result, 50)
https://github.com/chandulal/airflow-testing
Don’t forget to give a star!
Blog post on medium -
https://medium.com/@chandukavar
Thanks!

More Related Content

What's hot

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Prometheus
PrometheusPrometheus
Prometheuswyukawa
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to PrometheusJulien Pivotto
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
NGINX: Basics & Best Practices - EMEA Broadcast
NGINX: Basics & Best Practices - EMEA BroadcastNGINX: Basics & Best Practices - EMEA Broadcast
NGINX: Basics & Best Practices - EMEA BroadcastNGINX, Inc.
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
Client side performance testing using blazemeter
Client side performance testing using blazemeterClient side performance testing using blazemeter
Client side performance testing using blazemeterGowthamraj Palani
 
RESTful API In Node Js using Express
RESTful API In Node Js using Express RESTful API In Node Js using Express
RESTful API In Node Js using Express Jeetendra singh
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Integrating microservices with apache camel on kubernetes
Integrating microservices with apache camel on kubernetesIntegrating microservices with apache camel on kubernetes
Integrating microservices with apache camel on kubernetesClaus Ibsen
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅NAVER D2
 
Node JS Crash Course
Node JS Crash CourseNode JS Crash Course
Node JS Crash CourseHaim Michael
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopSudhir Tonse
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
How to Avoid the Top 5 NGINX Configuration Mistakes
How to Avoid the Top 5 NGINX Configuration MistakesHow to Avoid the Top 5 NGINX Configuration Mistakes
How to Avoid the Top 5 NGINX Configuration MistakesNGINX, Inc.
 
Docker introduction
Docker introductionDocker introduction
Docker introductionPhuc Nguyen
 
Introduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageIntroduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageejlp12
 

What's hot (20)

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Prometheus
PrometheusPrometheus
Prometheus
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
NGINX: Basics & Best Practices - EMEA Broadcast
NGINX: Basics & Best Practices - EMEA BroadcastNGINX: Basics & Best Practices - EMEA Broadcast
NGINX: Basics & Best Practices - EMEA Broadcast
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
Client side performance testing using blazemeter
Client side performance testing using blazemeterClient side performance testing using blazemeter
Client side performance testing using blazemeter
 
RESTful API In Node Js using Express
RESTful API In Node Js using Express RESTful API In Node Js using Express
RESTful API In Node Js using Express
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
Integrating microservices with apache camel on kubernetes
Integrating microservices with apache camel on kubernetesIntegrating microservices with apache camel on kubernetes
Integrating microservices with apache camel on kubernetes
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
Node JS Crash Course
Node JS Crash CourseNode JS Crash Course
Node JS Crash Course
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash Workshop
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
How to Avoid the Top 5 NGINX Configuration Mistakes
How to Avoid the Top 5 NGINX Configuration MistakesHow to Avoid the Top 5 NGINX Configuration Mistakes
How to Avoid the Top 5 NGINX Configuration Mistakes
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
Introduction to Docker storage, volume and image
Introduction to Docker storage, volume and imageIntroduction to Docker storage, volume and image
Introduction to Docker storage, volume and image
 

Similar to Testing in airflow

Angularjs Test Driven Development (TDD)
Angularjs Test Driven Development (TDD)Angularjs Test Driven Development (TDD)
Angularjs Test Driven Development (TDD)Anis Bouhachem Djer
 
Test-Driven Development of AngularJS Applications
Test-Driven Development of AngularJS ApplicationsTest-Driven Development of AngularJS Applications
Test-Driven Development of AngularJS ApplicationsFITC
 
E2E testing con nightwatch.js - Drupalcamp Spain 2018
E2E testing con nightwatch.js  - Drupalcamp Spain 2018E2E testing con nightwatch.js  - Drupalcamp Spain 2018
E2E testing con nightwatch.js - Drupalcamp Spain 2018Salvador Molina (Slv_)
 
Javascript first-class citizenery
Javascript first-class citizeneryJavascript first-class citizenery
Javascript first-class citizenerytoddbr
 
Browser testing with nightwatch.js - Drupal Europe
Browser testing with nightwatch.js - Drupal EuropeBrowser testing with nightwatch.js - Drupal Europe
Browser testing with nightwatch.js - Drupal EuropeSalvador Molina (Slv_)
 
Slaven tomac unit testing in angular js
Slaven tomac   unit testing in angular jsSlaven tomac   unit testing in angular js
Slaven tomac unit testing in angular jsSlaven Tomac
 
Intro To JavaScript Unit Testing - Ran Mizrahi
Intro To JavaScript Unit Testing - Ran MizrahiIntro To JavaScript Unit Testing - Ran Mizrahi
Intro To JavaScript Unit Testing - Ran MizrahiRan Mizrahi
 
Automated acceptance test
Automated acceptance testAutomated acceptance test
Automated acceptance testBryan Liu
 
Improving your Gradle builds
Improving your Gradle buildsImproving your Gradle builds
Improving your Gradle buildsPeter Ledbrook
 
Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Joachim Baumann
 
Designing REST API automation tests in Kotlin
Designing REST API automation tests in KotlinDesigning REST API automation tests in Kotlin
Designing REST API automation tests in KotlinDmitriy Sobko
 
Using Task Queues and D3.js to build an analytics product on App Engine
Using Task Queues and D3.js to build an analytics product on App EngineUsing Task Queues and D3.js to build an analytics product on App Engine
Using Task Queues and D3.js to build an analytics product on App EngineRiver of Talent
 
Gradle - time for another build
Gradle - time for another buildGradle - time for another build
Gradle - time for another buildIgor Khotin
 

Similar to Testing in airflow (20)

Angularjs Test Driven Development (TDD)
Angularjs Test Driven Development (TDD)Angularjs Test Driven Development (TDD)
Angularjs Test Driven Development (TDD)
 
Test-Driven Development of AngularJS Applications
Test-Driven Development of AngularJS ApplicationsTest-Driven Development of AngularJS Applications
Test-Driven Development of AngularJS Applications
 
E2E testing con nightwatch.js - Drupalcamp Spain 2018
E2E testing con nightwatch.js  - Drupalcamp Spain 2018E2E testing con nightwatch.js  - Drupalcamp Spain 2018
E2E testing con nightwatch.js - Drupalcamp Spain 2018
 
Javascript first-class citizenery
Javascript first-class citizeneryJavascript first-class citizenery
Javascript first-class citizenery
 
Browser testing with nightwatch.js - Drupal Europe
Browser testing with nightwatch.js - Drupal EuropeBrowser testing with nightwatch.js - Drupal Europe
Browser testing with nightwatch.js - Drupal Europe
 
Slaven tomac unit testing in angular js
Slaven tomac   unit testing in angular jsSlaven tomac   unit testing in angular js
Slaven tomac unit testing in angular js
 
Testing in JavaScript
Testing in JavaScriptTesting in JavaScript
Testing in JavaScript
 
Intro To JavaScript Unit Testing - Ran Mizrahi
Intro To JavaScript Unit Testing - Ran MizrahiIntro To JavaScript Unit Testing - Ran Mizrahi
Intro To JavaScript Unit Testing - Ran Mizrahi
 
Browser testing with nightwatch.js
Browser testing with nightwatch.jsBrowser testing with nightwatch.js
Browser testing with nightwatch.js
 
Automated acceptance test
Automated acceptance testAutomated acceptance test
Automated acceptance test
 
Improving your Gradle builds
Improving your Gradle buildsImproving your Gradle builds
Improving your Gradle builds
 
JavaCro'14 - Unit testing in AngularJS – Slaven Tomac
JavaCro'14 - Unit testing in AngularJS – Slaven TomacJavaCro'14 - Unit testing in AngularJS – Slaven Tomac
JavaCro'14 - Unit testing in AngularJS – Slaven Tomac
 
Full Stack Unit Testing
Full Stack Unit TestingFull Stack Unit Testing
Full Stack Unit Testing
 
Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)
 
Grails 101
Grails 101Grails 101
Grails 101
 
Slickdemo
SlickdemoSlickdemo
Slickdemo
 
Designing REST API automation tests in Kotlin
Designing REST API automation tests in KotlinDesigning REST API automation tests in Kotlin
Designing REST API automation tests in Kotlin
 
Using Task Queues and D3.js to build an analytics product on App Engine
Using Task Queues and D3.js to build an analytics product on App EngineUsing Task Queues and D3.js to build an analytics product on App Engine
Using Task Queues and D3.js to build an analytics product on App Engine
 
Gradle - time for another build
Gradle - time for another buildGradle - time for another build
Gradle - time for another build
 
Spring data requery
Spring data requerySpring data requery
Spring data requery
 

Recently uploaded

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 

Recently uploaded (20)

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 

Testing in airflow

  • 1. Testing in Airflow Chandu Kavar Data Engineer Grab, Ex-ThoughtWorker @chandukavar
  • 2. Traditional Web App Testing We mostly write - ● Unit Tests ● Integration Tests ● Functional Tests Client Database Server Stress testing, snapshot testing etc. for complex and large scale application.
  • 3. ● Typos, cyclicity, mandatory parameters etc. in the DAGs ● No of tasks, nature of tasks in the DAG ● Upstream and downstream dependencies of each task ● Custom Operator, sensor etc. ● Communication between tasks such as xComs ● End-to-End pipeline What can we test in Airflow?
  • 4. Five Categories of tests we can write - 1. DAG validation tests 2. DAG/Pipeline definition tests 3. Unit tests for custom operators, sensor etc. 4. Integration tests 5. End-to-End pipeline tests
  • 5. Sample DAG dag = DAG('hello_world', description='Hello world example', schedule_interval='0 12 * * *', start_date=datetime(2017, 3, 20)) dummy_operator = DummyOperator(task_id='dummy_task', retries = 3, dag=dag) hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag) multiplyby5_operator = MultiplyBy5Operator(my_operator_param= 20, task_id='multiplyby5_task', dag=dag) dummy_operator >> hello_operator dummy_operator >> multiplyby5_operator def print_hello(): return 'Hello'
  • 6. DAG Validation Tests class TestDagIntegrity(unittest.TestCase): def setUp(self): self.dagbag = DagBag() def test_import_dags(self): self.assertFalse( len(self.dagbag.import_errors), 'DAG import failures. Errors: {}'.format( self.dagbag.import_errors ) ) def test_queue_present(self): for dag_id, dag in self.dagbag.dags.iteritems(): queue = dag.default_args.get('queue', None) msg = Queue not set for DAG {id}'.format(id=dag_id) self.assertNotEqual(None, queue, msg)
  • 7. DAG Definition Tests class TestHelloWorldDAG(unittest.TestCase): """Check HelloWorldDAG expectation""" def setUp(self): self.dagbag = DagBag() def test_task_count(self): """Check task count of hello_world dag""" dag_id='hello_world' dag = self.dagbag.get_dag(dag_id) self.assertEqual(len(dag.tasks), 3) def test_contain_tasks(self): """Check task contains in hello_world dag""" dag_id='hello_world' dag = self.dagbag.get_dag(dag_id) tasks = dag.tasks task_ids = list(map(lambda task: task.task_id, tasks)) self.assertListEqual(task_ids, ['dummy_task', 'multiplyby5_task','hello_task'])
  • 8. DAG Definition Tests def test_dependencies_of_dummy_task(self): """Check the task dependencies of dummy_task in hello_world dag""" dag_id='hello_world' dag = self.dagbag.get_dag(dag_id) dummy_task = dag.get_task('dummy_task') upstream_task_ids = list(map(lambda task: task.task_id, dummy_task.upstream_list)) self.assertListEqual(upstream_task_ids, []) downstream_task_ids = list(map(lambda task: task.task_id, dummy_task.downstream_list)) self.assertListEqual(downstream_task_ids, ['hello_task', 'multiplyby5_task'])
  • 9. Custom Operator class MultiplyBy5Operator(BaseOperator): @apply_defaults def __init__(self, my_operator_param, *args, **kwargs): self.operator_param = my_operator_param super(MultiplyBy5Operator, self).__init__(*args, **kwargs) def execute(self, context): log.info('operator_param: %s', self.operator_param) return (self.operator_param * 5) class MultiplyBy5Plugin(AirflowPlugin): name = "multiplyby5_plugin"
  • 10. Unit tests - Custom Operator class TestMultiplyBy5Operator(unittest.TestCase): def test_execute(self): dag = DAG(dag_id='anydag', start_date=datetime.now()) task = MultiplyBy5Operator(my_operator_param=10, dag=dag, task_id='anytask') ti = TaskInstance(task=task, execution_date=datetime.now()) result = task.execute(ti.get_template_context()) self.assertEqual(result, 50)
  • 12. Blog post on medium - https://medium.com/@chandukavar