Stream processing systems comparison
Yangjun Wang
Department of Information and Computer Science
Aalto University, School ...

Stream processing comparison

Comparison among Storm, Flink streaming and Spark streaming

  1. Stream processing systems comparison
     Yangjun Wang
     Department of Information and Computer Science
     Aalto University, School of Science
     yangjun.wang@aalto.fi
     January 20, 2016
  3. Introduction
     • The processing model of many big data applications is shifting from batch processing to stream processing.
     • Batch processing has an advantage in throughput, while stream processing has much lower latency.
     • Stream processing can also achieve very high throughput.
     • Widely used stream processing systems: Storm, Spark Streaming, Flink, Samza.
  4. Comparison
     Processing model
     • Storm and Flink are true stream processors: they process records one at a time.
     • Spark Streaming is micro-batch: it continuously processes very small batches.
     • Storm's Trident also provides a micro-batch API.
     Throughput of WordCount (skewed data)
     • Flink: 300K/s (4 cores, 15 GB RAM)
     • Storm (ack enabled): 5K/s per node
     • Spark Streaming: 250K/s ∼ 2500K/s (depending on batch)
  5. Comparison (cont.)
     Latency of WordCount (skewed data)
     • Flink: around 50 ms (90%)
     • Storm: around 55 ms (90%)
     • Spark: 1 s ∼ ... (depends on interval)
  6. Comparison (cont.)
     Usage of Spark and Flink
     • Flink and Spark provide many high-level operations that are easy to use, for example:
         stream1.flatMap(...)
                .mapToPair(...)
                .reduceByKey(...)
     Usage of Storm
     • In Storm applications, we need to define the stream sources (spouts) and all the processing logic (bolts) ourselves.
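
To make the flatMap, mapToPair, reduceByKey chain concrete, here is a minimal sketch of the same word-count pipeline written with plain Java streams over an in-memory list. It is only an analogue: it does not use the Spark or Flink APIs, and the class and method names (`WordCountSketch`, `wordCount`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of flatMap -> mapToPair -> reduceByKey.
// It runs on a finite list, not on a real stream.
final class WordCountSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // flatMap: one line becomes many words
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // mapToPair + reduceByKey: pair each word with 1, then sum per word
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```

In a streaming engine the same three steps would run continuously and keep the per-word sums as operator state; the sketch only shows how the operations compose.
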
  7. Comparison (cont.)
     Usage
     • Storm applications require more work, but offer more flexibility.
     • Flink provides low-level operators that are similar to Storm bolts, such as OneInputStreamOperator and TwoInputStreamOperator. These operators are not too complex to use.
     • Spark Streaming's low-level operators are a little hard to use. Spark Streaming may also lose some capabilities because of its micro-batch processing model.
  9. Example
     Problem
     • There are two streams: advertisement(advId, shownTime) and click(advId, clickTime). How do we get a stream of all clicked advertisements (advId, shownTime, clickTime) that were clicked within 10 minutes after being shown?
     Solution in Storm
     • Implement a bolt that receives records from two spouts, caches the records, and performs the join.
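
The caching join described for the Storm solution can be sketched without any Storm dependency. The class below is a hypothetical stand-in for the bolt's core logic, not the author's implementation: `AdClickJoiner`, `Joined`, `onAdvertisement`, `onClick`, and `evictBefore` are illustrative names, timestamps are assumed to be event times in milliseconds, and eviction is assumed to be driven by a caller-supplied watermark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical core of a join bolt: cache both inputs, join on advId.
final class AdClickJoiner {
    static final long WINDOW_MS = 10 * 60 * 1000L; // clicked within 10 minutes after shown

    /** One output record: (advId, shownTime, clickTime). */
    static final class Joined {
        final String advId;
        final long shownTime;
        final long clickTime;

        Joined(String advId, long shownTime, long clickTime) {
            this.advId = advId;
            this.shownTime = shownTime;
            this.clickTime = clickTime;
        }
    }

    private final Map<String, List<Long>> shownByAd = new HashMap<>();
    private final Map<String, List<Long>> clicksByAd = new HashMap<>();

    private static boolean matches(long shownTime, long clickTime) {
        return clickTime >= shownTime && clickTime - shownTime <= WINDOW_MS;
    }

    /** Caches an advertisement and joins it with clicks that arrived earlier. */
    List<Joined> onAdvertisement(String advId, long shownTime) {
        shownByAd.computeIfAbsent(advId, k -> new ArrayList<>()).add(shownTime);
        List<Joined> out = new ArrayList<>();
        for (long clickTime : clicksByAd.getOrDefault(advId, new ArrayList<>())) {
            if (matches(shownTime, clickTime)) out.add(new Joined(advId, shownTime, clickTime));
        }
        return out;
    }

    /** Caches a click and joins it with advertisements that arrived earlier. */
    List<Joined> onClick(String advId, long clickTime) {
        clicksByAd.computeIfAbsent(advId, k -> new ArrayList<>()).add(clickTime);
        List<Joined> out = new ArrayList<>();
        for (long shownTime : shownByAd.getOrDefault(advId, new ArrayList<>())) {
            if (matches(shownTime, clickTime)) out.add(new Joined(advId, shownTime, clickTime));
        }
        return out;
    }

    /** Drops cached records that can no longer match anything at or after the watermark. */
    void evictBefore(long watermark) {
        shownByAd.values().forEach(times -> times.removeIf(t -> t + WINDOW_MS < watermark));
        clicksByAd.values().forEach(times -> times.removeIf(t -> t < watermark));
    }
}
```

Because both sides are cached, each matching pair is emitted exactly once regardless of which record arrives first. In a real bolt the two methods would be called from the tuple handler depending on the source stream, and the cache would need a bound on memory.
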
  11. Example (cont.)
     Problems with Flink
     1. Flink only provides a join operation on the same window.
     2. Windows without slides cause missing data.
     3. Windows with slides can introduce duplicate data.
     Solution in Flink
     • Implement a join operator, similar to WindowOperator, that is built on TwoInputStreamOperator. This custom operator resembles the Storm solution in some respects.
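
Problems 2 and 3 can be illustrated with a few lines of arithmetic. The sketch below lists every window that contains both an advertisement and its click; a same-window join would emit the pair once per such window. It assumes windows of the form [start, start + size) whose starts are multiples of the slide, and that shownTime is not later than clickTime. The names `WindowJoinSketch` and `windowsContainingBoth` are hypothetical, and times are in minutes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: which windows would a same-window join see this pair in?
final class WindowJoinSketch {

    /** Returns the start of every window [start, start + size) holding both timestamps. */
    static List<Long> windowsContainingBoth(long shownTime, long clickTime, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Smallest multiple of slide strictly greater than clickTime - size,
        // so that the window still covers the click.
        long first = (Math.floorDiv(clickTime - size, slide) + 1) * slide;
        for (long start = first; start <= shownTime; start += slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Tumbling 10-minute windows: shown at 8, clicked at 12 -> prints []
        System.out.println(windowsContainingBoth(8, 12, 10, 10));
        // 10-minute windows sliding by 5: same pair -> prints [5]
        System.out.println(windowsContainingBoth(8, 12, 10, 5));
        // 10-minute windows sliding by 5: shown at 6, clicked at 7 -> prints [0, 5]
        System.out.println(windowsContainingBoth(6, 7, 10, 5));
    }
}
```

An empty result means the pair straddles a window boundary and is lost, which is the missing-data case. A result with more than one start means the pair is emitted once per window, which is the duplicate case.
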
  13. Example (cont.)
     Problems with Spark
     1. Spark doesn't support event-time joins or watermarks.
     2. It has the same problems as Flink (2 and 3).
     Solution in Spark
         advertisement.window(11 mins, 1 min)
                      .join(click.window(1 min, 1 min))
                      .filter(...)
     Issues
     • Spark only supports joins on processing time.
     • The filter operation is based on event time.
     • Data is missed if advertisement records arrive late (delayed).
  14. Summary
     Comparison summary table:

                   Storm           Spark         Flink
     Model         stream          micro-batch   stream
     Throughput    low             high          high
     Latency       low             high          low
     Usage         complex         easy          easy
     Flexibility   very flexible   flexible      inflexible
  15. Thanks
