This document discusses clustering and anomaly detection in data science. It introduces the concept of clustering, which is grouping a set of data into clusters so that data within each cluster are more similar to each other than data in other clusters. The k-means clustering algorithm is described in detail, which works by iteratively assigning data to the closest cluster centroid and updating the centroids. Other clustering algorithms like k-medoids and hierarchical clustering are also briefly mentioned. The document then discusses how anomaly detection, which identifies outliers in data that differ from expected patterns, can be performed based on measuring distances between data points. Examples applications of anomaly detection are provided.
This document discusses clustering and anomaly detection in data science. It introduces the concept of clustering, which is grouping a set of data into clusters so that data within each cluster are more similar to each other than data in other clusters. The k-means clustering algorithm is described in detail, which works by iteratively assigning data to the closest cluster centroid and updating the centroids. Other clustering algorithms like k-medoids and hierarchical clustering are also briefly mentioned. The document then discusses how anomaly detection, which identifies outliers in data that differ from expected patterns, can be performed based on measuring distances between data points. Examples applications of anomaly detection are provided.