This document provides an overview of data preprocessing techniques. It explains why preprocessing matters: real-world data is often dirty, meaning incomplete, noisy, and inconsistent. It describes the major preprocessing tasks, namely data cleaning, integration, transformation, reduction, and discretization, and summarizes specific techniques for handling missing data, handling noisy data, and reducing redundancy.
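As a rough illustration of the three technique families named above, the following is a minimal Python sketch using only the standard library. The summary does not say which specific methods the document covers, so the choices here are assumptions: mean imputation for missing values, smoothing by bin means (equal-frequency bins) for noisy values, and exact-duplicate removal for redundancy. The function names are hypothetical and chosen only for this example.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Missing data: replace None entries with the mean of the observed values.

    Mean imputation is an assumed choice; the source does not name a method.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Noisy data: sort the values, split them into equal-frequency bins of
    bin_size items, and replace every value with the mean of its bin.

    The result is returned in sorted order, not the original order.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def drop_duplicates(records):
    """Redundancy: keep the first occurrence of each (hashable) record,
    preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique
```

For example, `smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)` groups the sorted values into three bins and replaces each value with its bin mean, giving `[9, 9, 9, 22, 22, 22, 29, 29, 29]`. In practice a library such as pandas would usually handle these steps, but the plain-Python version keeps the logic visible.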