This document discusses building data pipelines with Spark and StreamSets. It describes how pipelines built with StreamSets Data Collector can run on Spark today, using Kafka RDDs and containers on Spark. It also outlines future directions for deeper Spark integration, including running pipelines on Databricks and developing a standalone Spark processor. It concludes with a demo of StreamSets Data Collector's capabilities.