Learn how to leverage new workflow management tools to simplify complex data pipelines and ETL jobs spanning multiple systems. In this technical deep dive from Treasure Data, the company's founder and chief architect walks through the codebase of Digdag, our recently open-sourced workflow management project. He shows how workflows can break large, error-prone SQL statements into smaller blocks that are easier to maintain and reuse. He also demonstrates how a system using ‘last good’ checkpoints can save hours of computation when restarting failed jobs, and how to use standard version control systems like GitHub to automate data lifecycle management across Amazon S3, Amazon EMR, Amazon Redshift, and Amazon Aurora. Finally, you’ll see a few examples where SQL-as-pipeline-code gives data scientists both the right level of ownership over production processes and a comfortable abstraction from the underlying execution engines. This session is sponsored by Treasure Data.
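As a rough illustration of the ideas above, a Digdag workflow is a plain-text `.dig` file in which each `+task` step references a small SQL file rather than one monolithic statement. The sketch below assumes hypothetical file names, table names, and scripts; it is not from the session itself:

```yaml
# pipeline.dig — illustrative Digdag workflow; SQL files and table names are hypothetical
timezone: UTC

+extract:
  # run a small, reusable SQL file instead of one large statement
  td>: queries/extract_events.sql
  create_table: staging_events

+aggregate:
  td>: queries/daily_aggregates.sql
  create_table: daily_aggregates

+load:
  # hand off to a shell script that copies results downstream (e.g. to Redshift)
  sh>: scripts/load_to_redshift.sh
```

When a run fails partway, re-running the same session skips tasks that already succeeded, which is the ‘last good’ checkpoint behavior described above; and because the workflow and its SQL are ordinary text files, they version cleanly in GitHub.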
AWS Competency Partner