This document discusses techniques for document clustering and retrieval, including cosine similarity, k-means clustering, hierarchical clustering, and the EM algorithm. Cosine similarity scores two document vectors by the cosine of the angle between them, so vectors pointing in the same direction score 1 regardless of their lengths. K-means clustering partitions documents into k clusters so as to maximize intra-cluster similarity (that is, to minimize the distance from each document to its cluster centroid), while hierarchical clustering repeatedly merges the most similar clusters and records the merges in a dendrogram. The EM algorithm iteratively computes maximum likelihood estimates of the parameters of the distributions assumed to generate the documents. Clustering quality is evaluated by intra-class and inter-class similarity: a good clustering has high similarity within clusters and low similarity between them.
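As a rough illustration of the first two techniques, the sketch below computes cosine similarity between term-count vectors and runs a basic k-means loop. It is a minimal sketch rather than the document's own method: the function names, the toy vectors, the use of squared Euclidean distance in the assignment step, and the choice of the first k vectors as initial centroids are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention assumed here for all-zero (empty) documents
    return dot / (norm_a * norm_b)


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Basic k-means; seeding with the first k vectors is an assumption
    made only to keep the example deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical term-count vectors: two documents per topic.
docs = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]
labels, centroids = kmeans(docs, k=2)
```

On this toy data the two documents that share vocabulary end up in the same cluster, and documents from different topics have a cosine similarity of 0 because they share no terms. In practice the vectors would usually be weighted (for example with tf-idf) and length-normalized before clustering, and the seeds would be chosen randomly or with a smarter initialization.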