Data mining involves analyzing large data sets to extract useful patterns and summarize data in comprehensible ways, facilitating knowledge discovery for commercial or scientific purposes. It encompasses various models, including predictive, explanatory, and summarization models, using diverse data types such as numeric, categorical, and relational data. The data mining pipeline consists of steps like data collection, preprocessing, mining, and post-processing, emphasizing the importance of data quality and structured storage.