This document discusses data preprocessing: techniques for transforming raw data into an understandable format. It describes data quality measures such as accuracy, completeness, and consistency, and outlines the major preprocessing tasks: data cleaning, integration, reduction, transformation, and discretization. Data cleaning handles missing values, noise, and inconsistencies. Data integration merges data from multiple sources while reducing redundancies and inconsistencies. Data reduction uses techniques such as aggregation, attribute selection, and dimensionality reduction to obtain a smaller representation of the data. Data transformation consolidates data into forms appropriate for mining through techniques such as smoothing, aggregation, generalization, and normalization. Data discretization divides continuous attributes into intervals, which reduces data size and prepares the data for further analysis.
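
As a rough illustration of three of the tasks named above (cleaning, transformation by normalization, and discretization), here is a minimal Python sketch. It is not taken from the document: the function names, the `ages` sample values, the choice of mean imputation for missing values, min-max scaling for normalization, and equal-width binning for discretization are all assumptions made for the example, and other techniques would serve equally well.

```python
from statistics import mean


def fill_missing(values):
    """Data cleaning (assumed strategy): replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: assign each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample attribute with one missing value.
ages = [20, None, 30, 40, 60]

cleaned = fill_missing(ages)                # [20, 37.5, 30, 40, 60]
normalized = min_max_normalize(cleaned)     # [0.0, 0.4375, 0.25, 0.5, 1.0]
bins = equal_width_bins(cleaned, n_bins=4)  # [0, 1, 1, 2, 3]

print(cleaned)
print(normalized)
print(bins)
```

The order matters in this sketch: the missing value is filled first, because `min` and `max` in the later steps cannot compare `None` with numbers. In practice a library such as pandas or scikit-learn would usually replace these hand-written helpers.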