This document provides an overview of key concepts in data and data preprocessing. It defines data as a collection of objects and their attributes. Attributes can be nominal, ordinal, interval, or ratio. Data can take the form of records, graphs, ordered sequences, or other types. The document discusses attribute values and data quality issues such as noise, outliers, and missing values. It also covers common preprocessing techniques: aggregation, sampling, dimensionality reduction, feature selection, feature creation, and discretization. Finally, it introduces measures of similarity and dissimilarity between data objects.
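
To make the idea of similarity and dissimilarity measures concrete, here is a minimal sketch. It assumes each data object is an equal-length list of numeric attributes, which is only one possible representation. Euclidean distance and cosine similarity are chosen here as common examples and are not necessarily the measures the document emphasizes. The function names are illustrative.

```python
import math


def euclidean_distance(x, y):
    """Dissimilarity: 0 for identical objects, larger as they differ more."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Similarity: 1 for vectors pointing the same way, 0 for orthogonal ones."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: treat the zero vector as an error rather than return 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    # Two hypothetical data objects with two numeric attributes each.
    p, q = [0.0, 0.0], [3.0, 4.0]
    print(euclidean_distance(p, q))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Both measures are only meaningful for interval or ratio attributes. Nominal and ordinal attributes would need a different measure, for example simple matching for nominal values.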