The document introduces data preprocessing techniques for data mining. It discusses why data preprocessing is important due to real-world data often being dirty, incomplete, noisy, inconsistent or duplicate. It then describes common data types and quality issues like missing values, noise, outliers and duplicates. The major tasks of data preprocessing are outlined as data cleaning, integration, transformation and reduction. Specific techniques for handling missing values, noise, outliers and duplicates are also summarized.