Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing the incoming data in real-time. Performing large scale stream processing or analysis requires specialized tools and techniques which have become publicly available in the last couple of years.
This talk will give a deep, technical overview of the top-level Apache stream processing landscape. We compare several frameworks including Spark, Storm, Samza and Flink. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner to help selecting the best tools for the specific applications. We will touch on the topics of API expressivity, runtime architecture, performance, fault-tolerance and strong use-cases for the individual frameworks.