"It is not fast enough!" That is one of the more common responses a data engineer hears when putting a data pipeline into production. It is easy to dig down into the code and try to optimize it. My experience as a data engineer shows that it is often easier and more efficient, both in time spent and in outcome, to take a more holistic view of the pipeline.
In this talk, we will look at a structured process for optimizing batch pipelines. We will introduce steps that make the process data-driven instead of based on gut feeling. Using examples from real-world cases where delivery time was reduced by an order of magnitude, we will look at the actions that were taken.
The intended audience is beginner to intermediate data engineers. After the talk, you will have a better understanding of how to optimize your pipeline and be able to explain the steps taken to a stakeholder. You will know:
* what metrics to look at
* how to visualize the metrics
* how to detect bottlenecks and other time thieves from the metrics
* what actions to take.