Stream processing is becoming more and more important in our data-centric systems. In the world of Big Data, batch processing is no longer enough: everyone needs interactive, real-time analytics to make critical business decisions and to provide great features to customers.
Many stream processing frameworks are available nowadays, but the cost of provisioning infrastructure and maintaining distributed computations is usually very high. Sometimes you also have to satisfy specific infrastructure requirements, such as running HDFS or YARN.
Apache Kafka is the de facto standard for building data pipelines. Kafka Streams is a lightweight library (available since Kafka 0.10) that uses powerful Kafka abstractions internally and doesn't require any complex setup or special infrastructure: you just deploy it like any other regular application.
In this session I want to talk about the goals behind stream processing, basic techniques, and some best practices. Then I'm going to explain the fundamental concepts behind Kafka and explore the Kafka Streams syntax and streaming features. By the end of the session you'll be able to write stream processing applications in your own domain, especially if you already use Kafka as your data pipeline.