This document discusses using Apache Spark and MLlib for text mining on big data. It outlines common text mining applications, describes how Spark and MLlib enable scalable machine learning on large datasets, and provides examples of text mining workflows and pipelines that can be built with Spark MLlib algorithms and components like tokenization, feature extraction, and modeling. It also discusses customizing ML pipelines and the Zeppelin notebook platform for collaborative data science work.