This document discusses various techniques for data preprocessing, including data cleaning, integration, transformation, reduction, and discretization.
Data cleaning involves filling in missing values, smoothing noisy data, identifying outliers, and resolving inconsistencies. Data integration combines data from multiple sources into a coherent whole by reconciling their schemas and resolving conflicts between values that describe the same entity. Data transformation converts data into forms better suited to analysis; techniques include normalization, aggregation, generalization, and smoothing.
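As a minimal sketch, assuming a single numeric attribute held in a Python list with None marking missing entries, the code below shows one cleaning technique (replacing missing values with the attribute mean, which is only one of several imputation strategies) and two normalization techniques (min-max scaling and z-score standardization). The function names are hypothetical and chosen for illustration. Real pipelines often use a library such as pandas or scikit-learn for the same steps.

```python
from statistics import mean, pstdev
from typing import List, Optional


def fill_missing_with_mean(values: List[Optional[float]]) -> List[float]:
    """Cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values: List[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Transformation: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score_normalize(values: List[float]) -> List[float]:
    """Transformation: subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant attribute: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20.0, None, 40.0, 60.0]
cleaned = fill_missing_with_mean(ages)
print(cleaned)                     # [20.0, 40.0, 40.0, 60.0]
print(min_max_normalize(cleaned))  # [0.0, 0.5, 0.5, 1.0]
```

Because both the mean and the range are sensitive to extreme values, it usually makes sense to deal with outliers before imputing or normalizing.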
Data reduction aims to shrink the volume of data while producing the same, or nearly the same, analytical results. Techniques include data cube aggregation, dimensionality reduction (for example, removing irrelevant or redundant attributes), data compression, and discretization, which converts continuous attributes into categorical bins.
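The sketch below, in Python with hypothetical function names and toy inputs, illustrates two of these reduction steps: rolling quarterly records up into annual totals, in the spirit of data cube aggregation, and discretizing a numeric attribute into k equal-width bins. It assumes records arrive as (group, value) pairs and reports bin membership as an integer index; a real data cube would aggregate along several dimensions at once.

```python
from typing import Dict, List, Tuple


def aggregate_sum(records: List[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregation: roll (group, value) pairs up to one total per group,
    e.g. quarterly sales rolled up to annual sales."""
    totals: Dict[str, float] = {}
    for group, value in records:
        totals[group] = totals.get(group, 0.0) + value
    return totals


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Discretization: map each value to one of k equal-width intervals,
    returned as a bin index from 0 to k - 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: everything falls into a single bin.
        return [0 for _ in values]
    # Scale each value's offset into [0, k]; the maximum maps to k,
    # so clamp it into the last bin, which is closed on the right.
    return [min(int((v - lo) * k / (hi - lo)), k - 1) for v in values]


# Hypothetical quarterly sales figures, keyed by year.
quarterly = [("2023", 100.0), ("2023", 200.0), ("2023", 300.0), ("2023", 400.0),
             ("2024", 150.0), ("2024", 250.0)]
print(aggregate_sum(quarterly))  # {'2023': 1000.0, '2024': 400.0}

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(prices, 3))  # [0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Equal-width binning is simple but can leave some bins nearly empty when the data are skewed; equal-frequency binning, which places roughly the same number of values in each bin, is a common alternative.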