This document describes how to perform latent semantic analysis (LSA) on Wikipedia data using Apache Spark. It covers parsing the Wikipedia XML dump into documents and terms; cleaning the text by lemmatizing terms and removing stop words; building a tf-idf-weighted term-document matrix; applying singular value decomposition (SVD) to reduce the matrix's rank, yielding the U, S, and V factor matrices; and interpreting those factors to surface latent concepts and to find the terms and documents most strongly related to a given query.