This document contains the slides from a presentation by Matt Dowle on working with messy data. It discusses Dowle's experiences over his career cleaning and munging data from various financial institutions to prepare it for machine learning models. It highlights some of the common issues encountered, such as data entry errors, id changes, missing values. It also showcases tools that can be used to help with pre-processing large datasets directly on the command line. Finally, it discusses how the size and scale of data has increased significantly over time, to the point where datasets can now easily exceed RAM sizes on standard machines.