Testing is an integral part of any software system to build confidence and increase the reliability of the system. This deck covered different categories of tests that you can write for airflow. It includes DAG validation tests, pipeline definition tests, unit tests. Also, I will showcase all these tests using some examples.
2. Traditional Web App Testing
We mostly write -
● Unit Tests
● Integration Tests
● Functional Tests
Client
Database
Server
Stress testing, snapshot testing etc. for
complex and large scale application.
3. ● Typos, cyclicity, mandatory parameters etc. in the DAGs
● No of tasks, nature of tasks in the DAG
● Upstream and downstream dependencies of each task
● Custom Operator, sensor etc.
● Communication between tasks such as xComs
● End-to-End pipeline
What can we test in Airflow?
4. Five Categories of tests we can write -
1. DAG validation tests
2. DAG/Pipeline definition tests
3. Unit tests for custom operators, sensor etc.
4. Integration tests
5. End-to-End pipeline tests
6. DAG Validation Tests
class TestDagIntegrity(unittest.TestCase):
def setUp(self):
self.dagbag = DagBag()
def test_import_dags(self):
self.assertFalse(
len(self.dagbag.import_errors),
'DAG import failures. Errors: {}'.format(
self.dagbag.import_errors
)
)
def test_queue_present(self):
for dag_id, dag in self.dagbag.dags.iteritems():
queue = dag.default_args.get('queue', None)
msg = Queue not set for DAG {id}'.format(id=dag_id)
self.assertNotEqual(None, queue, msg)
7. DAG Definition Tests
class TestHelloWorldDAG(unittest.TestCase):
"""Check HelloWorldDAG expectation"""
def setUp(self):
self.dagbag = DagBag()
def test_task_count(self):
"""Check task count of hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
self.assertEqual(len(dag.tasks), 3)
def test_contain_tasks(self):
"""Check task contains in hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
tasks = dag.tasks
task_ids = list(map(lambda task: task.task_id, tasks))
self.assertListEqual(task_ids, ['dummy_task', 'multiplyby5_task','hello_task'])
8. DAG Definition Tests
def test_dependencies_of_dummy_task(self):
"""Check the task dependencies of dummy_task in hello_world dag"""
dag_id='hello_world'
dag = self.dagbag.get_dag(dag_id)
dummy_task = dag.get_task('dummy_task')
upstream_task_ids = list(map(lambda task: task.task_id, dummy_task.upstream_list))
self.assertListEqual(upstream_task_ids, [])
downstream_task_ids = list(map(lambda task: task.task_id, dummy_task.downstream_list))
self.assertListEqual(downstream_task_ids, ['hello_task', 'multiplyby5_task'])