
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine


For many businesses, the batch-oriented architecture of Big Data, where data is captured in large, scalable stores and processed later, is simply too slow. A new breed of “Fast Data” architectures has evolved to be stream-oriented: data is processed as it arrives, giving businesses a competitive advantage.
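To make the batch-versus-stream contrast concrete, here is a minimal, engine-agnostic Python sketch that counts records per key two ways. The batch version stores everything first and answers once at the end. The streaming version updates its state as each record arrives and emits a fresh answer every time. The function names and the string-record format are illustrative assumptions, not part of any streaming tool's API.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def batch_counts(records: Iterable[str]) -> Counter:
    """Batch style: capture every record first, then process them together."""
    stored = list(records)   # stand-in for a large, scalable store
    return Counter(stored)   # the answer exists only after all data is in


def streaming_counts(records: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: update state per record and emit a fresh answer each time."""
    state: Counter = Counter()
    for key in records:
        state[key] += 1
        yield dict(state)    # snapshot of the current counts after this event


if __name__ == "__main__":
    events = ["click", "view", "click"]
    print(batch_counts(events))       # Counter({'click': 2, 'view': 1})
    for snapshot in streaming_counts(events):
        print(snapshot)
```

Both versions reach the same final counts. The streaming version makes an up-to-date answer available after every event, which is the property stream-oriented architectures rely on.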

There are many stream processing tools, so which ones should you choose? It helps to consider several factors in the context of your applications:

* Low latency: How low is necessary?
* High volume: How high is required?
* Integration with other tools: Which ones and how?
* Data processing: What kinds? In bulk? As individual events?
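The slides answer the first two questions with rough latency and volume tiers. The Python below is a minimal sketch encoding one reading of those rules of thumb. For latency, it picks the least specialized tier whose upper bound from the slides still fits within the requirement. For volume, it picks the simplest approach suggested for the required event rate. Several details are assumptions: the function names, the 60-second stand-in for "many seconds to minutes", and treating every rate at or above 100,000 events/second as the Flink-or-Spark tier.

```python
from typing import List, Tuple

# (upper-bound latency in seconds, tooling), ordered from least to most
# specialized. Bounds follow the slides; 60 s standing in for "many seconds
# to minutes" is an assumption.
LATENCY_TIERS: List[Tuple[float, str]] = [
    (60.0, "separate batch jobs"),
    (1.0, "window processing (Spark Streaming, explicit windowing in the others)"),
    (10e-3, "fast streaming engines (Akka Streams, Flink, Kafka Streams)"),
    (100e-6, "fast JVM messaging (Akka Actors, LMAX Disruptor)"),
    (1e-6, "custom hardware, kernel-bypass networking, custom C++"),
]


def latency_tier(required_s: float) -> str:
    """Return the least specialized tier whose latency bound fits the requirement."""
    for bound_s, tooling in LATENCY_TIERS:
        if bound_s <= required_s:
            return tooling
    # Tighter than every bound: only the custom-hardware tier remains.
    return LATENCY_TIERS[-1][1]


def volume_tier(events_per_s: float) -> str:
    """Return the simplest approach the slides suggest for an event rate."""
    if events_per_s < 10_000:
        return "REST (fine even for complex event processing)"
    if events_per_s < 100_000:
        return "nonblocking REST (CEP still possible, e.g., with Akka Actors)"
    # Assumption: everything from 100,000/s up is treated as the millions tier.
    return "Flink or Spark, using bulk processing for efficiency"


if __name__ == "__main__":
    print(latency_tier(0.05))      # a 50 ms requirement
    print(volume_tier(250_000))
```

Under this conservative reading, a 5 ms requirement maps to the JVM messaging tier, because the fast streaming engines are only rated at "< 10 milliseconds". The other two questions, integration and kinds of data processing, are not modeled here.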

In this talk by Dean Wampler, Ph.D., VP of Fast Data Engineering at Lightbend, we’ll look at the criteria to consider when selecting technologies, plus specific examples of how four streaming tools (Akka Streams, Kafka Streams, Apache Flink, and Apache Spark) serve particular needs and use cases when working with continuous streams of data.



  1. Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine. Webinar by Dean Wampler, Ph.D. (@deanwampler), VP of Fast Data Engineering
  2. Upgrade your grey matter! Get the free O’Reilly book by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend: bit.ly/lightbend-fast-data
  3. [Figure: Streaming architecture from the book. Components: data sources (Logs, Sockets, REST); a Kafka cluster; a ZooKeeper cluster (ZK); processing engines for mini-batch (Spark Streaming), batch (Spark, …), and low latency (Flink, Kafka Streams, Akka Streams, Beam, …); microservices (RP, Go, Node.js, …); persistence (S3, HDFS, local disks, SQL/NoSQL, search); all running on Mesos, YARN, cloud, …; callouts numbered 1 through 11.]
  4. (Architecture figure) Streams of data are ingested into Kafka
  5. (Architecture figure) Kafka: the data backplane
  6. (Architecture figure) REST requests to microservices
  7. (Architecture figure) Microservices for complex event processing (CEP) and other services
  8. (Architecture figure) Metadata management and master orchestration
  9. (Architecture figure) “Bring your own persistence”
  10. (Architecture figure) Platform agnostic (mostly)
  11. (Architecture figure) Finally, the streaming engines
  12. Requirements
  13. Latency: How low?
  14. Latency: How low? • < 1 microsecond? • Custom hardware • “Kernel bypass” networking • Custom C++ code
  15. Latency: How low? • < 100 microseconds? • Fast JVM messaging systems • Akka Actors • LMAX Disruptor
  16. Latency: How low? • < 10 milliseconds? • Fast streaming engines • Akka Streams • Flink • Kafka Streams
  17. Latency: How low? • ~1 second? • Window processing • Spark Streaming • Explicit windowing in the others…
  18. Latency: How low? • > many seconds to minutes? • Run separate batch jobs!
  19. Volume: How high?
  20. Volume: How high? • < 10,000 events/second • REST is fine • Especially if you want to do complex event processing
  21. Volume: How high? • < 100,000 events/second • Nonblocking REST! • Can still do complex event processing • E.g., using Akka Actors
  22. Volume: How high? • Millions of events/second • Flink or Spark • Use bulk processing for greater efficiency
  23. Kinds of analytics?
  24. Kinds of analytics? • SQL? • ETL? • Aggregations? • Machine learning training/scoring?
  25. Our Favorites • Based on: coverage of requirements; spectrum of features; vibrant, active communities
  26. Apache Beam • Based on Google Dataflow • Sophisticated streaming semantics • Requires a “runner”
  27. [Figure: Events from Server 1 and Server 2 are propagated to Analysis over time (minutes 0, 1, 2, 3, …); Analysis accumulates them: collect data, then process. Key: “n” marks an event at Server n propagated to Analysis.]
  28. Flink • High volume • Low latency • Beam runner • Can run batch jobs, too • Evolving SQL & ML
  29. Akka Streams • Low latency • Complex event processing • Might implement Beam • Efficient, per-event processing • Mid-volume, complex pipelines
  30. Kafka Streams • Low overhead • Reads/writes Kafka topics • Ideal for ETL (“KStreams”) and aggregations (“KTables”)
  31. Spark Streaming • Mini-batch model • > ~0.5 s latency • Evolving to low latency • Rich SQL, ML options • Beam support under development
  32. lightbend.com/fast-data-platform
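Several slides hinge on windowing: Spark Streaming's mini-batch model, “explicit windowing in the others”, and the figure in which events from Server 1 and Server 2 are accumulated before analysis. Below is a minimal, engine-agnostic Python sketch of tumbling windows keyed by event time. Because of this keying, an event that arrives late is still counted in the window in which it occurred. The `Event` type, the 60-second window size, and counting per server are illustrative assumptions. Real engines such as Flink, Beam, and Spark also decide *when* a window is complete (watermarks, triggers), which this sketch omits.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    server: str        # which server produced the event (hypothetical field)
    event_time: float  # when the event happened, in seconds


def tumbling_window_counts(
    events: Iterable[Event], window_s: float = 60.0
) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, server), assigning windows by event time.

    Arrival order does not matter: a late event still lands in the window
    that contains its event_time.
    """
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for e in events:
        window_start = (e.event_time // window_s) * window_s
        counts[(window_start, e.server)] += 1
    return dict(counts)


if __name__ == "__main__":
    arrivals = [
        Event("server-1", 5.0),
        Event("server-2", 70.0),
        Event("server-1", 50.0),  # arrives after a minute-1 event, belongs to minute 0
    ]
    for key, n in sorted(tumbling_window_counts(arrivals).items()):
        print(key, n)
```

Processing each completed window as one unit is what makes this style efficient at high volume. The cost is latency roughly equal to the window size.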

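The Kafka Streams slide distinguishes “KStreams” for ETL from “KTables” for aggregations. The sketch below imitates those two semantics in plain Python; it does not use the Kafka Streams Java API. A stream is treated as a sequence of independent records transformed one at a time. A table keeps one continuously updated row per key, so each new record upserts that row. The `(key, value)` record shape and the helper names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Record = Tuple[str, int]  # (key, value); a hypothetical record shape


def etl_stream(records: Iterable[Record], fn: Callable[[int], int]) -> Iterator[Record]:
    """KStream-like: every record is an independent fact; transform each one."""
    for key, value in records:
        yield key, fn(value)


def aggregate_table(records: Iterable[Record]) -> Dict[str, int]:
    """KTable-like: fold records into one continuously updated row per key."""
    table: Dict[str, int] = {}
    for key, value in records:
        table[key] = table.get(key, 0) + value  # upsert the running sum
    return table


if __name__ == "__main__":
    topic = [("alice", 3), ("bob", 5), ("alice", 4)]
    print(list(etl_stream(topic, lambda v: v * 10)))  # every record kept
    print(aggregate_table(topic))                     # one row per key
```

In Kafka Streams itself, both styles read from and write to Kafka topics, as the slide notes.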