Presented at the 2016 Phoenix Data Conference (phxdataconference.com)
While working with Hadoop, you'll eventually encounter the need to schedule and run workflows to perform various operations like ingesting data or performing ETL. There are a number of tools available to assist you with this type of requirement and one such tool that we at Clairvoyant have been looking to use is Apache Airflow. Apache Airflow is an Apache Incubator project that allows you to programmatically create workflows through a python script. This provides a flexible and effective way to design your workflows with little code and setup. In this talk, we will discuss Apache Airflow and how we at Clairvoyant have utilized it for ETL pipelines on Hadoop.