The concept of stream processing has been around for a while, and most software systems continuously transform streams of inputs into streams of outputs. Yet the idea of directly modeling stream processing in infrastructure systems is just coming into its own after a few decades on the periphery.
At its core, stream processing is simple: read data in, process it, and maybe emit some data out. So why are there so many stream processing frameworks that all define their own terminology? And are the components of each even comparable? Why do I need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a framework.
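As a rough illustration of that read, process, maybe-emit loop without any framework, here is a minimal Python sketch. The function names and the in-memory list standing in for the input stream are assumptions made for the example, not part of any particular system.

```python
from typing import Callable, Iterable, Iterator, Optional


def process_stream(
    records: Iterable[str],
    transform: Callable[[str], Optional[str]],
) -> Iterator[str]:
    """Read each record, process it, and maybe emit an output."""
    for record in records:
        result = transform(record)
        if result is not None:  # "maybe emit": a record may produce no output
            yield result


def shout_errors(line: str) -> Optional[str]:
    # Hypothetical processing step: keep only error lines, upper-cased.
    return line.upper() if "error" in line else None


# A plain list stands in for the input stream (an assumption for this sketch).
output = list(process_stream(["ok", "disk error", "ok", "net error"], shout_errors))
```

Because `process_stream` is a generator, it handles one record at a time and never needs the whole input in memory, which is the property that lets the same loop run over an unbounded stream.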
This talk will be delivered by one of the creators of the popular stream data system Apache Kafka and will abstract away the details of individual frameworks while describing the key features they provide. These core features include scalability and parallelism through data partitioning, fault tolerance and event processing order guarantees, support for stateful stream processing, and handy stream processing primitives such as windowing. Drawing on our experience building and scaling Kafka to handle streams that captured hundreds of billions of records per day, this presentation will help you understand how to map practical data problems to stream processing and how to write applications that process streams of data at scale.
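Two of the features named above, data partitioning and windowing, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not how Kafka or any specific framework implements them: `partition_for` uses CRC32 as a stand-in hash, and `tumbling_window_counts` assumes integer timestamps and fixed-size, non-overlapping windows. Both function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash sends every record with the same key to the same
    # partition, so per-key order is preserved while partitions are
    # processed in parallel. CRC32 is an assumption chosen for the sketch.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_size: int
) -> Counter:
    """Count events per (window_start, key) over fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_size)
        counts[(window_start, key)] += 1
    return counts


# Hypothetical (timestamp, key) events for illustration only.
events = [(1, "a"), (4, "b"), (9, "a"), (12, "a"), (19, "b")]
counts = tumbling_window_counts(events, window_size=10)
```

The counter held by `tumbling_window_counts` is a small instance of stateful processing: the result for each window depends on every earlier record that fell into it, which is why frameworks pair such state with fault-tolerance mechanisms.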