We've updated our privacy policy. Click here to review the details. Tap here to review the details.
Activate your 30 day free trial to unlock unlimited reading.
Activate your 30 day free trial to continue reading.
Download to read offline
Due to the increasing interest in real-time processing, many stream processing frameworks were developed. However, no clear guidelines have been established for choosing a framework for a specific use case. In this talk, two different scenarios are taken and the audience is guided through the thought process and questions that one should ask oneself when choosing the right tool. The stream processing frameworks that will be discussed are Spark Streaming, Structured Streaming, Flink and Kafka Streams.
The main questions are:
How much data does it need to process? (throughput)
Does it need to be fast? (latency)
Who will build it? (supported languages, level of API, SQL capabilities, built-in windowing and joining functionalities, etc)
Is accurate ordering important? (event time vs. processing time)
Is there a batch component? (integration of batch API)
How do we want it to run? (deployment options: standalone, YARN, mesos, …)
How much state do we have? (state store options) – What if a message gets lost? (message delivery guarantees, checkpointing).
For each of these questions, we look at how each framework tackles this and what the main differences are. The content is based on the PhD research of Giselle van Dongen in benchmarking stream processing frameworks in several scenarios using latency, throughput and resource utilization.
Due to the increasing interest in real-time processing, many stream processing frameworks were developed. However, no clear guidelines have been established for choosing a framework for a specific use case. In this talk, two different scenarios are taken and the audience is guided through the thought process and questions that one should ask oneself when choosing the right tool. The stream processing frameworks that will be discussed are Spark Streaming, Structured Streaming, Flink and Kafka Streams.
The main questions are:
How much data does it need to process? (throughput)
Does it need to be fast? (latency)
Who will build it? (supported languages, level of API, SQL capabilities, built-in windowing and joining functionalities, etc)
Is accurate ordering important? (event time vs. processing time)
Is there a batch component? (integration of batch API)
How do we want it to run? (deployment options: standalone, YARN, mesos, …)
How much state do we have? (state store options) – What if a message gets lost? (message delivery guarantees, checkpointing).
For each of these questions, we look at how each framework tackles this and what the main differences are. The content is based on the PhD research of Giselle van Dongen in benchmarking stream processing frameworks in several scenarios using latency, throughput and resource utilization.
You just clipped your first slide!
Clipping is a handy way to collect important slides you want to go back to later. Now customize the name of a clipboard to store your clips.The SlideShare family just got bigger. Enjoy access to millions of ebooks, audiobooks, magazines, and more from Scribd.
Cancel anytime.Unlimited Reading
Learn faster and smarter from top experts
Unlimited Downloading
Download to take your learnings offline and on the go
You also get free access to Scribd!
Instant access to millions of ebooks, audiobooks, magazines, podcasts and more.
Read and listen offline with any device.
Free access to premium services like Tuneln, Mubi and more.
We’ve updated our privacy policy so that we are compliant with changing global privacy regulations and to provide you with insight into the limited ways in which we use your data.
You can read the details below. By accepting, you agree to the updated privacy policy.
Thank you!