This document discusses how StreamSets and Spark can be used together for analytics insights in retail. Some key points:
- StreamSets Data Collector (SDC) is used to ingest IoT and sensor data from various sources into a common format and process the data in real-time using Spark evaluators.
- The Spark evaluator allows running Spark transformations on batches of data within SDC pipelines to do tasks like anomaly detection, sentiment analysis, and fraud detection.
- SDC can also be used to move data to and from Spark for end-of-batch processing using a Spark executor, such as running jobs on Databricks after files land in S3.
- Together, SDC and Spark