In mid-2016, we introduced Structured Steaming, a new stream processing engine built on Spark SQL that revolutionized how developers can write stream processing application without having to reason about having to reason about streaming. It allows the user to express their streaming computations the same way you would express a batch computation on static data. The Spark SQL engine takes care of running it incrementally and continuously updating the final result as streaming data continues to arrive. It truly unifies batch, streaming and interactive processing in the same Datasets/DataFrames API and the same optimized Spark SQL processing engine.
The initial alpha release of Structured Streaming in Apache Spark 2.0 introduced the basic aggregation APIs and files as streaming source and sink. Since then, we have put in a lot of work to make it ready for production use. In this talk, Tathagata Das will cover in more detail about the major features we have added, the recipes for using them in production, and the exciting new features we have plans for in future releases. Some of these features are as follows:
- Design and use of the Kafka Source
- Support for watermarks and event-time processing
- Support for more operations and output modes
Speaker: Tathagata Das
This talk was originally presented at Spark Summit East 2017.