Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Executive Briefing: What Is Fast Data And Why Is It Important

3,577 views

Published on

[About This Webinar]
Streaming data systems, so called Fast Data, promise accelerated access to information, leading to new innovations and competitive advantages. These systems, however, aren’t just faster versions of Big Data; they force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices.

This means new challenges for your organization. Whereas a batch job might run for hours, a stream processing application might run for weeks or months. This raises the bar for making these systems resilient against traffic spikes, hardware and network failures, and so forth. The good news is that there is a strong history of facing these demands in the world of microservices.

In this webinar by Dr. Dean Wampler, VP of Fast Data Architecture at Lightbend, Inc., we will cut through the buzz around Fast Data and explore how to successfully exploit this new opportunity for innovation in how your organization leverages data. Specifically, Dean will review:

* The business justification for transitioning from batch-oriented big data to stream-oriented fast data
* The architectural and organizational changes that streaming systems require to meet their higher demands for reliability, resiliency, dynamic scalability, etc.
* How some of these requirements can be met by leveraging what your organization already knows about microservice architectures

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Executive Briefing: What Is Fast Data And Why Is It Important

  1. 1. © Dean Wampler & Lightbend, 2018
  2. 2. lightbend.com/fast-data-platform 2nd edition coming in October! Based on this report
  3. 3. What We’ll Discuss
  4. 4. • Why streaming? Why now? • How should I choose technologies? • How will this impact the rest of my organization? What We’ll Discuss
  5. 5. Why Streaming?
  6. 6. Why Streaming? • New opportunities that require streaming • Upgrading batch applications for competitive advantage
  7. 7. Real-time marketing based on behavior, location, inventory levels, product promotions, etc. Real-time Personalization Drive better business outcomes through real- time risk, fraud detection, compliance, audit, governance, etc. Real-time Financial Processes Predictive Analytics Apply ML models to large volumes of device data to pre-empt failures / outages More at: https://www.lightbend.com/customers Fast Data Use Cases Real-time consumer and industrial Device and Supply Chain management at scale IoT Similar IoT Architectures
  8. 8. • ML models applied to device telemetry to detect anomalies • Preemptive maintenance prevents potential failures that would impact users Predictive Analytics
  9. 9. Predictive Analytics - Core Idea Anomaly Detection: Model Anomaly Handler Telemetry Records Probable Anomalies Corrective Actions Ingest telemetry from edge devices. Train models to look for anomalies… and score incoming telemetry. Handle anomaly: move activity off component, schedule maintenance window to replace it.
  10. 10. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Example Architecture
  11. 11. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Example Architecture Session management, RESTful microservices Model Scoring Model Training Three groups of functionality
  12. 12. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Ingest device telemetry to Kafka Example Architecture
  13. 13. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Example Architecture Read telemetry into a periodic Spark job for model training to detect anomalies Large data volume, Long latency (seconds-days)
  14. 14. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Updated model parameters are written back to Kafka in a new topic Example Architecture
  15. 15. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Updated model parameters are also written to longer-term storage (e.g., for system restarts) Example Architecture
  16. 16. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices a) Ingest model parameters to update model in the low-latency microservices for model serving Example Architecture
  17. 17. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices b) Ingest the telemetry data to score it, looking for anomalies Small data volume, Low latency (milliseconds-…) Example Architecture
  18. 18. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Write the detected anomalies back to Kafka in a new topic Example Architecture
  19. 19. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Read anomaly information into microservices that manage the devices remotely Example Architecture
  20. 20. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Take corrective action, e.g., upload more information, download patches, control- alt-delete, … Example Architecture
  21. 21. • Network overhead for telemetry ingestion too high? • Model serving latency too long? • Datacenter unavailable? Challenges Mini-batch, Batch Spark Low Latency Microservices Ka9a Streams Akka Streams … Persistence Ka9a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices • Idea: Serve models on the device!
  22. 22. Internet of Things • Real-time consumer and industrial device and supply chain management at scale
  23. 23. Datacenter Mini-batch, Batch Spark Low Latency Microservices Ka:a Streams Akka Streams … Persistence Ka:a Sessions Streams Storage Telemetry1 Model Training 2 Model Storage 3 Data Pipeline: Model Serving 4 Ingest Scores5 Corrective Action 6 Spark Microservice Microservice Microservice Device Session Microservices Example Architecture What we just discussed…
  24. 24. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture Alternative: model serving on the edge device
  25. 25. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture Scoring microservices now on the edge devices, so the device session microservices (in the datacenter) now ingest model updates…
  26. 26. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture Model updates are pushed to the devices
  27. 27. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture Serving is now done locally
  28. 28. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture When anomalies are detected, corrective actions are coordinated with the session microservices In this scenario, there will be other uses of ML, such as serving coupons to customers, making recommendations, etc.
  29. 29. Datacenter Mini-batch, Batch Spark Persistence Ka6a Telemetry1 Model Training 2 Model Storage 3 Ingest Model4 Push Model 5 Spark Microservice Microservice Microservice Device Session MicroservicesCorrective Action 6 Sessions Streams Storage Scoring MicroserviceScoring MicroserviceScoring Microservice Edge Scoring Example Architecture Latency of telemetry ingest can be much longer now, as it’s only used for model training
  30. 30. Technology Choices
  31. 31. Technology Choices • More than “faster” Hadoop… • New architectures that merge data processing with microservices
  32. 32. Recall Hadoop…
  33. 33. • Data warehouse replacement • Historical analysis • Interactive exploration • Offline training of machine learning models • …
  34. 34. YARN HDFS MapReduce jobs Spark jobs … Flume Sqoop DBs Worker #1 DiskDiskDiskDiskDisk Node Manager Data Node Master Resource Manager Name Node …#2 Logs
  35. 35. YARN HDFS MapReduce jobs Spark jobs … Flume Sqoop DBs Worker #1 DiskDiskDiskDiskDisk Node Manager Data Node Master Resource Manager Name Node …#2 Logs Storage Compute
  36. 36. YARN HDFS MapReduce jobs Spark jobs … Flume Sqoop DBs Worker #1 DiskDiskDiskDiskDisk Node Manager Data Node Master Resource Manager Name Node …#2 Logs • Hadoop is ideal for batch and interactive apps • … but also constrained by that model
  37. 37. New Fast Data Architecture
  38. 38. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Flesh out earlier example architectures
  39. 39. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage While YARN can be used, it’s not flexible enough for today’s dynamic workloads Kubernetes and Mesos provide the job and resource management needed for dynamic, heterogenous work loads Deploy in the cloud or on premise
  40. 40. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage “Events” - e.g., REST messages, sessions, alerts, …
  41. 41. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage “Events” - e.g., REST messages, sessions, alerts, … “Streams” - one-way data flows, e.g., sockets or files, including logs, metrics, other telemetry, click streams, etc.
  42. 42. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Each has different volumes, velocities, latency characteristics, protocols, etc. “Events” - e.g., REST messages, sessions, alerts, … “Streams” - one-way data flows, e.g., server logs, telemetry, click streams, … “Storage” - JDBC, async reads/writes to storage
  43. 43. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Kafka deployed as a cluster of “Brokers” for scalability, resiliency.
  44. 44. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Data backplane - like Enterprise Service Bus (ESB), but without the flaws…
  45. 45. Why Kafka? Organized into topics Ka#a Partition 1 Partition 2 Topic A Partition 1Topic B Topics are partitioned, replicated, and distributed
  46. 46. Why Kafka?Logs, not queues Unlike queues, consumers don’t delete entries; Kafka manages their lifecycles M Producers N Consumers, who start reading where they want Consumer 1 (at offset 14) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Partition 1Topic B Producer 1 Producer 2 Consumer 2 (at offset 10) writes reads Consumer 3 (at offset 6) earliest latest
  47. 47. Using Kafka Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N * M links ConsumersProducers Before: Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N + M links ConsumersProducers After: Messy and fragile; what if “Service 1” goes down? Simpler and more robust! Loss of Service 1 means no data loss. X X
  48. 48. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Lots of streaming engine options… too many.
  49. 49. • Latency? How low? • Volume: How high? • Which kinds of data processing? • How do you want to build, deploy, and manage these services? How do you choose?
  50. 50. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Run as distributed services You submit jobs, they are partitioned into tasks The streaming engines form two groups:
  51. 51. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Libraries you embed in your microservices The streaming engines form two groups:
  52. 52. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage New, low-latency Structured Streaming Older mini-batch streaming Full batch support (replaced MapReduce)Rich SQL and Machine Learning options.
  53. 53. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Richer streaming semantics, more mature low-latency support Low latency Spark alternative. Also uses services to partition the data and tasks across a cluster
  54. 54. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Part of the rich Akka ecosystem for general microservices
  55. 55. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Kafka-centric library for common data processing tasks
  56. 56. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage f Standard APIs allow almost any storage you want
  57. 57. Kubernetes, Mesos, YARN, … Cloud or on-premise Files Sockets REST Mini-batch Spark Batch Spark … Low Latency Flink Ka4a Streams Akka Streams Persistence S3, … HDFS DiskDiskDisk SQL/ NoSQL Search KaEa Cluster Ka4a Microservices Reac@ve PlaCorm Go Node.js … Spark Events Streams Storage Use your regular microservice tools… … but why are microservices in this diagram??
  58. 58. 1.The trend is to run everything in big clusters using Kubernetes or Mesos • In the cloud or on-premise Why Microservices in Fast Data?
  59. 59. 2.If streaming gives you information faster… • … you’ll want quick access to it in your other services! Why Microservices in Fast Data?
  60. 60. 3.Streaming raises the bar on data services • Compared to batch services, long-running streaming services must be more: • Scalable • Resilient • Flexible Why Microservices in Fast Data? https://www.reactivemanifesto.org/ FORM MEANS VALUE Responsive Resilient Message Driven Elastic
  61. 61. 4.This leads to our last major point… Why Microservices in Fast Data?
  62. 62. Organizational Impact
  63. 63. Organizational Impact • Data scientists have to understand production issues • Data engineers have to become good at highly-available microservices • Microservice engineers have to become good at data
  64. 64. Some overlap in concerns, architecture Big DataServices The Past
  65. 65. Much more overlap The Present Microservices & Fast Data
  66. 66. Much more microservice focused? The Future? Microservices for Fast Data Why? Since streams process data incrementally, there is less need for large-scale tools like Spark, Flink … and using microservices for everything simplifies development and operations Whatever tool choices we make, they have to integrate with the data science workflows we love.
  67. 67. Lightbend Fast Data Platform lightbend.com/fast-data-platform
  68. 68. lightbend.com/fast-data-platform
  69. 69. lightbend.com/fast-data-platform What we discussed
  70. 70. lightbend.com/fast-data-platform Plus management & monitoring tools
  71. 71. Dean Wampler, Ph.D. dean@lightbend.com @deanwampler polyglotprogramming.com/talks lightbend.com/fast-data-platform © Dean Wampler & Lightbend, 2018

×