This document discusses challenging issues and similarity measures for web document clustering. It begins with an introduction to text mining and document clustering. The key challenges it covers include ambiguity in natural language, efficient measurement of semantic similarity between words, and cluster validity. It then describes string-based, term-based, and corpus-based similarity measures that can be used for document clustering, including Jaro-Winkler distance (string-based), cosine similarity (term-based), and latent semantic analysis and pointwise mutual information (corpus-based). It concludes that accurate clustering requires a precise definition of similarity between document pairs.
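As a rough illustration of what a term-based measure computes, the sketch below scores two documents by the cosine of the angle between their raw term-frequency vectors. It is a minimal example and not the document's own implementation: the function name `cosine_similarity`, the whitespace tokenizer, and the use of unweighted counts (rather than, say, TF-IDF) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts.

    Hypothetical helper for illustration; tokenization is a naive
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Dot product only needs the terms the two documents share.
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[term] * tf_b[term] for term in shared_terms)

    # Euclidean length of each term-frequency vector.
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty document has no direction; treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this sketch, two documents that share no terms score 0, identical documents score approximately 1, and `cosine_similarity("a b", "a c")` comes out at about 0.5, since one of the two terms in each document is shared. A clustering algorithm would typically use such pairwise scores (or one minus the score, as a distance) to decide which documents belong together.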