Stream Processing: Choosing the Right Tool for the Job

Due to the increasing interest in real-time processing, many stream processing frameworks have been developed. However, there are no clear guidelines for choosing a framework for a specific use case. In this talk, we take two different scenarios and guide the audience through the thought process and the questions to ask when choosing the right tool. The stream processing frameworks discussed are Spark Streaming, Structured Streaming, Flink and Kafka Streams.

The main questions are:

How much data does it need to process? (throughput)
Does it need to be fast? (latency)
Who will build it? (supported languages, API level, SQL capabilities, built-in windowing and join functionality, etc.)
Is accurate ordering important? (event time vs. processing time)
Is there a batch component? (integration of batch API)
How do we want it to run? (deployment options: standalone, YARN, Mesos, …)
How much state do we have? (state store options)
What if a message gets lost? (message delivery guarantees, checkpointing)
For each of these questions, we look at how each framework tackles it and what the main differences are. The content is based on Giselle van Dongen's PhD research, which benchmarks stream processing frameworks in several scenarios in terms of latency, throughput and resource utilization.
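To make the event time vs. processing time question concrete, here is a minimal Python sketch, not tied to any of the four frameworks, that assigns the same out-of-order stream to tumbling windows in both ways. The window size of 5 time units, the `Event` type and the sample timestamps are hypothetical choices for illustration; the values only echo the numbers 4, 7, 10 and 8 that appear on the talk's window slides.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SIZE = 5  # hypothetical tumbling-window length, in time units


@dataclass(frozen=True)
class Event:
    value: str
    event_time: int    # when the event happened at the source
    arrival_time: int  # when the stream processor received it


def window_start(ts: int, size: int = WINDOW_SIZE) -> int:
    """Start of the tumbling window [start, start + size) that contains ts."""
    return ts - ts % size


def assign(events, time_attr):
    """Group event values by window, keyed on either event time or arrival time."""
    windows = defaultdict(list)
    for e in events:
        windows[window_start(getattr(e, time_attr))].append(e.value)
    return dict(windows)


# Events listed in arrival order: "e8" happened at t=8 but only arrives at t=11.
stream = [
    Event("e4", event_time=4, arrival_time=4),
    Event("e7", event_time=7, arrival_time=7),
    Event("e10", event_time=10, arrival_time=10),
    Event("e8", event_time=8, arrival_time=11),
]

by_event_time = assign(stream, "event_time")
by_processing_time = assign(stream, "arrival_time")

print("event time:     ", by_event_time)
print("processing time:", by_processing_time)
```

With event-time windowing, `e8` lands in the [5, 10) window next to `e7`, where it belongs; with processing-time windowing it is counted in [10, 15) simply because that is when it arrived. Frameworks that support event time typically also track a watermark to decide how long to wait for late events, which this sketch leaves out.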

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Giselle van Dongen, Stream processing: choosing the right tool for the job #UnifiedDataAnalytics #SparkAISummit
  4. Context …
  5. Context
  6. Context
  7. Disclaimer, a.k.a. I will not pick a stream processing framework for you
  8. Commonalities
  9. Imagine...
  12. Do we need stream processing? …
  13. Do we need stream processing? …
  17. How much data?
  18. How much data?
  19. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  21. Does it need to be fast?
  22. Does it need to be fast?
  23. Does it need to be fast? …
  24. Event-driven (source: https://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1)
  25. Micro-batching (source: https://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1)
  26. Does it need to be fast?
  27. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  29. Performance | Advanced features | Deployment & Internals
  30. Who will build it?
  31. Who will build it?
  32. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  33. Performance | Advanced features | Deployment & Internals
  34. Is accurate ordering important? (diagram: events 4, 7, 10 and 8; windows 1, 2, 3)
  35. Is accurate ordering important? (diagram: events 4, 7, 10 and 8; windows 1, 2, 3)
  36. Is accurate ordering important? (diagram: events 4, 7, 10 and 8; two rows of windows 1, 2, 3)
  37. Is accurate ordering important?
  38. Is accurate ordering important?
  39. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  40. Performance | Advanced features | Deployment & Internals
  41. What is the ecosystem like?
  42. What is the ecosystem like?
  43. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  46. How do we want it to run? (diagram: nodes labeled W, M and T)
  47. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  49. What if a message gets lost?
  50. What if a message gets lost?
  51. What if a message gets lost?
  52. What if a message gets lost? (diagram: W1 to W4, each with state, and HDFS; ref. Flink Forward 2018, "Best practices for state and time", Tzu-Li Tai)
  53. What if a message gets lost? (same diagram)
  54. What if a message gets lost? (same diagram)
  55. What if a message gets lost?
  56. What if a message gets lost?
  57. What if a message gets lost?
  58. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  59. Framework comparison: Spark Streaming, Structured Streaming, Flink, Kafka Streams
  62. Want to know more?
  63. THANK YOU! Do you want to work with these tools? We are hiring!
  64. DON'T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH SPARK + AI SUMMIT
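The "What if a message gets lost?" slides (workers holding state, backed up to HDFS) rest on one idea: snapshot operator state together with the position in a replayable input, and after a failure restore both and replay from there. The Python sketch below shows that idea in miniature. It is a simplified illustration, not Flink's checkpoint-barrier protocol or any framework's API; the `CountingOperator` class, the checkpoint interval and the plain dict standing in for HDFS are all hypothetical.

```python
import copy

CHECKPOINT_INTERVAL = 3  # hypothetical: take a checkpoint after every 3 records


class CountingOperator:
    """Keyed running count whose state is snapshotted together with the input offset."""

    def __init__(self, storage: dict):
        self.storage = storage  # plain dict standing in for durable storage such as HDFS
        self.counts = {}
        self.offset = 0  # next position to read from the replayable log

    def checkpoint(self):
        # Persist state and input position together, as a single snapshot.
        self.storage["latest"] = (copy.deepcopy(self.counts), self.offset)

    def restore(self):
        # After a failure, roll back to the last consistent snapshot (or start fresh).
        if "latest" in self.storage:
            counts, offset = self.storage["latest"]
            self.counts, self.offset = copy.deepcopy(counts), offset
        else:
            self.counts, self.offset = {}, 0

    def run(self, log, crash_at=None):
        """Process records from self.offset on; raise when about to read offset crash_at."""
        while self.offset < len(log):
            if self.offset == crash_at:
                raise RuntimeError("simulated worker failure")
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % CHECKPOINT_INTERVAL == 0:
                self.checkpoint()
        return self.counts


log = ["a", "b", "a", "c", "a", "b", "c"]  # replayable input, e.g. one topic partition
storage = {}

op = CountingOperator(storage)
try:
    op.run(log, crash_at=5)  # fails before offset 5; last checkpoint covers offsets 0-2
except RuntimeError:
    pass

recovered = CountingOperator(storage)  # fresh worker: in-memory state is gone
recovered.restore()                    # resume from the checkpointed (state, offset)
result = recovered.run(log)

expected = CountingOperator({}).run(log)  # failure-free run for comparison
print(result, expected)
```

Records 3 and 4 are processed twice in total (once before the crash, once on replay), but because the counts are rolled back to the same snapshot as the offset, each record affects the final state exactly once. Output already sent to an external system before the crash would still be duplicated unless that sink is transactional or idempotent, which is why delivery guarantees are a separate question from checkpointing state.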
