Design Streaming Fast Data Applications with Spark, Akka, Kafka and Cassandra on Mesos & DC/OS

Data today is online, in motion, and generated constantly. For architects, developers and their businesses, this means that there is a critical need for tools and applications that can process this data in real-time, generating actionable insights that can drive business value. How? By moving away from batch-centric processing to streaming, Fast Data pipelines that are fully Reactive.

In this webinar with Craig Pottinger, Senior Consultant at Lightbend, we examine the design choices around building streaming systems with technologies like Akka Streams, Apache Kafka, Apache Spark, Apache Flink, Mesosphere DC/OS and Lightbend Reactive Platform, all of which come integrated with Lightbend Fast Data Platform.

Craig will discuss how Fast Data Platform brings together a curated set of technologies (including streaming engines, a data backplane, reactive microservices, persistence, Machine Learning, purpose-built monitoring and more) plus on-demand guidance from experts to help you design, and then help your developers build, streaming fast data applications that are successful in the long run.

Published in: Software

  1. Understanding Data Streaming
     • To understand Fast Data we must first understand traditional data streaming:
       • The processing of large or unbounded sequences of data
       • Datasets that are too large to fit in memory or on disk
       • Can be used to provide insights for data that never ends
       • Typical use cases are to provide aggregations and predictions
       • Usually synchronous and single threaded
       • Very low latency and close to real time
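As a minimal sketch of the aggregation use case on this slide, the Python generator below keeps a running mean over a sequence that never ends while holding only two numbers in memory. The name `running_mean` and the sample input are illustrative assumptions, not something taken from the deck.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of everything seen so far, one result per input value."""
    total = 0.0
    seen = 0
    for value in values:
        total += value
        seen += 1
        yield total / seen


# count(1) is an unbounded source (1, 2, 3, ...), so only the first
# few results are pulled from the stream here.
first_means = list(islice(running_mean(count(1)), 4))
```

The loop is synchronous and single threaded, which matches the traditional streaming model described on the slide; state stays constant in size no matter how long the stream runs.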
  2. What is Fast Data?
     • The processing of high volumes of a continuous stream of data
     • Fast Data combines properties from traditional stream processing, big data infrastructure, and reactive applications
     • Low to medium latency data processing in near real time
     • Scales horizontally to handle a high volume of data
     • Stream processing is parallelized across many CPU cores and machines by partitioning the data stream
     • Fault Tolerance provides resilience and the ability to recover from failures
     • High Availability provides responsiveness and ensures uptime guarantees
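The partitioning idea on this slide can be sketched roughly as follows: each event carries a key, and a stable hash of that key decides which partition (and therefore which worker) receives it. The helper names and the use of CRC32 are assumptions made for illustration; a real system would use its own partitioner.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (key, payload); a hypothetical event shape


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(events: Iterable[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Split one stream into per-partition streams, keeping arrival order."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return dict(partitions)


events = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3)]
by_partition = partition_stream(events, num_partitions=4)
```

Because the hash is deterministic, all events for one key land in the same partition and keep their relative order, so each partition can be processed on a separate core or machine.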
  3. Fast Data Sources
     • Sensor data: Processing discrete data from many Internet of Things (IoT) devices
     • Network traffic: Telecommunication network optimization using SDNs
     • Web/mobile user activity: Up-to-date trends of user behaviour from web or mobile apps
     • Database event logs: Data pumps and streaming ETL to create new views of source data
  4. Applications for Fast Data
     • Monitoring
     • Better Products & Services
  5. Application: Monitoring
     • Monitoring data for anomalies has many finance applications: Fraud Detection, Risk Management, Compliance
     • Credit card companies have multiple levels of Fraud Detection
       • Transaction-time fraud detection occurs at time of purchase
       • Secondary fraud detection occurs after transaction time
     • Requirements
       • Reliable data capture is important for monitoring compliance
       • Model training & scoring for fraud detection
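One hedged way to picture transaction-time scoring is a per-card running profile that each new amount is checked against before it is learned from. The `RunningStats` class, the three-standard-deviation threshold, and the sample amounts below are all hypothetical; production fraud models are considerably richer than this.

```python
class RunningStats:
    """Per-card running mean and variance (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, amount: float) -> None:
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (amount - self.mean)

    def stddev(self) -> float:
        return (self._m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0


def is_suspicious(stats: RunningStats, amount: float, threshold: float = 3.0) -> bool:
    """Score: flag amounts far from the card's usual spending."""
    spread = stats.stddev()
    if spread == 0.0:
        return False  # not enough history to judge
    return abs(amount - stats.mean) / spread > threshold


def score_then_learn(stats: RunningStats, amount: float, threshold: float = 3.0) -> bool:
    """Score the transaction first, then train on it only if it looked normal."""
    flagged = is_suspicious(stats, amount, threshold)
    if not flagged:
        stats.update(amount)
    return flagged


card = RunningStats()
flags = [score_then_learn(card, amount) for amount in [10, 12, 11, 9, 10, 500, 11]]
```

Scoring and training share the same small state object, so both can run inline at purchase time; a secondary, heavier check could replay the same events later.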
  6. Application: Better Products & Services
     • Recommendation Engines
       • Media companies suggest new songs & TV shows to users (Netflix, Spotify)
       • eCommerce companies recommend new products (Amazon)
     • Requirements
       • Joining historical data with real-time data
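The requirement of joining historical data with real-time data can be sketched as a lookup join: each live event is enriched with whatever is already known about that user. The record shapes, field names, and the `enrich` function are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def enrich(
    live_events: Iterable[Tuple[str, str]],
    history: Dict[str, List[str]],
) -> Iterator[dict]:
    """Join each (user_id, item) event with that user's historical items."""
    for user_id, item in live_events:
        past_items = history.get(user_id, [])  # users without history still pass through
        yield {
            "user": user_id,
            "viewing": item,
            "seen_before": item in past_items,
            "history_size": len(past_items),
        }


history = {"u1": ["song-1", "song-9"]}
live = [("u1", "song-9"), ("u2", "song-1")]
enriched = list(enrich(live, history))
```

Here the historical side is a plain in-memory dictionary; in a real pipeline it would more likely be a table or cache kept up to date by a separate job.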
  7. So how should we design Fast Data systems?
     • To implement Fast Data systems we need to review the evolution of two subsets of software development
       • Building Application Services
       • Building Data Systems
  8. Why?
     • The worlds of Data Systems (aka Streaming Applications) and Applications (Microservices) are converging.
  9. Building Application Services
  10. The Software Spectrum
      • Monoliths and Microservices exist on a spectrum
      • Monolith on one end, Microservices on the other
      • Most applications live somewhere in the middle
  11. Characteristics of a Monolith
      • Deployed as a single unit
      • Single shared database
      • Communicate with synchronous method calls
      • Deep coupling between libraries and components (often through the DB)
      • “Big Bang” style releases
      • Long cycle times (weeks to months)
      • Teams carefully synchronize features and releases
  12. The Monolithic Ball of Mud
      • The ball of mud represents the worst case scenario for a Monolith
      • No clear isolation in the application
      • Complex dependencies
      • Hard to understand and harder to modify
  13. The Microservice Architecture
      • Microservices are a subset of SOA
      • Logical components are separated into isolated microservices
      • Microservices can be physically separated and independently deployed
      • Each component/microservice has its own independent data store
  14. Scaling a Microservice Application
      • Each microservice is scaled independently
      • Could be one or more copies of a service per machine
      • Each machine hosts a subset of the entire system
  15. Characteristics of Microservices
      • Each service is deployed independently
      • Multiple independent databases
      • Communication is synchronous or asynchronous (Reactive Microservices)
      • Loose coupling between components
      • Rapid deployments (possibly continuous)
      • Teams release features when they are ready
  16. Building Data Services
  17. Fast Data Pipeline
  18. Hadoop
      • Hadoop is a system for collecting and processing massive amounts of data
      • Focus on batch processing and analytics
      • Divided into three projects: MapReduce, HDFS, YARN
      • Linear scalability with inexpensive commodity servers
      • Open Source
  19. Disadvantages of Hadoop
      • Batch semantics delay gaining insight from your data
      • Discovering insights faster is a competitive advantage
      • Customers today expect up-to-date and accurate information
      • It’s difficult to implement business processes in the MapReduce programming model
      • A poor choice for iterative operations such as Machine Learning
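To give a feel for the shape of the MapReduce programming model, here is a small in-memory sketch of its three phases. It is not Hadoop's API; `map_reduce`, the order records, and the lambdas are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def map_reduce(
    records: Iterable,
    mapper: Callable[[object], Iterable[Tuple[Hashable, object]]],
    reducer: Callable[[Hashable, List], object],
) -> Dict[Hashable, object]:
    # Map: turn every record into zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]

    # Shuffle: group all values that share a key.
    groups: Dict[Hashable, List] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: collapse each group into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [("east", 10), ("west", 5), ("east", 7)]
revenue_by_region = map_reduce(
    orders,
    mapper=lambda order: [order],                 # (region, amount) is already a pair
    reducer=lambda region, amounts: sum(amounts),
)
```

A single aggregation fits the model well, but a business process with several dependent steps has to be chained as a series of such passes over the complete dataset, and no result is available until a pass has finished.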
  20. Distributed Stream Processors
      • There are lots of distributed stream processors to choose from: Spark Streaming, Storm, Samza, Flink, Apex, Gearpump
      • They fill in the gap of streaming requirements that exists in Hadoop
      • Target YARN, Mesos, and standalone cluster resource managers
  21. Complexity of Distributed Stream Processors
      • Distributed stream processors address complexities not found in batch semantics
        • Handling out of order messages
        • Message delivery & processing semantics
        • Event-time vs processing-time
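The event-time and out-of-order concerns above can be illustrated with a simplified tumbling-window counter. It assigns each event to a window by the timestamp it carries, not by when it arrives, and uses a watermark (the largest event time seen so far minus an allowed lateness) to decide when a window is final. The function name, the parameters, and the policy of dropping late events are assumptions for this sketch; real engines offer more options.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def count_by_event_time(
    event_times: Iterable[int],
    window_size: int,
    allowed_lateness: int,
) -> Tuple[Dict[int, int], List[int]]:
    """Count events per tumbling window, keyed by window start time.

    event_times are listed in arrival (processing) order, which may
    differ from event-time order.
    """
    open_windows: Dict[int, int] = defaultdict(int)
    emitted: Dict[int, int] = {}
    late: List[int] = []
    watermark = float("-inf")

    for event_time in event_times:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            late.append(event_time)  # its window was already finalized
            continue
        open_windows[start] += 1
        watermark = max(watermark, event_time - allowed_lateness)
        # Finalize every window that ends at or before the watermark.
        for window_start in sorted(open_windows):
            if window_start + window_size <= watermark:
                emitted[window_start] = open_windows.pop(window_start)

    emitted.update(open_windows)  # end of input: flush what is still open
    return emitted, late


# Event 8 arrives after 12 but is still counted; event 3 arrives too late.
counts, dropped = count_by_event_time([1, 4, 12, 8, 25, 3], window_size=10, allowed_lateness=5)
```

Raising `allowed_lateness` accepts more stragglers at the cost of holding windows open longer, which is the latency versus completeness trade-off that batch jobs never have to make.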
  22. Reactive Principles
      • Responsive: A Reactive System consistently responds in a timely fashion
      • Resilient: A Reactive System remains responsive, even when failures occur
      • Elastic: A Reactive System remains responsive, despite changes in system load
      • Message Driven: A Reactive System is built on a foundation of async, non-blocking messages
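As a rough illustration of the Message Driven principle, the sketch below connects a producer and a consumer through a bounded `asyncio.Queue`. Neither side blocks a thread while waiting, and the bounded queue makes a fast producer pause until the consumer catches up. The function names, the queue capacity, and the `None` end-of-stream marker are assumptions for the example; this is plain Python, not Akka.

```python
import asyncio
from typing import List, Optional


async def producer(queue: "asyncio.Queue[Optional[int]]", items: List[int]) -> None:
    for item in items:
        # Suspends this coroutine (without blocking a thread) while the queue is full.
        await queue.put(item)
    await queue.put(None)  # hypothetical end-of-stream marker


async def consumer(queue: "asyncio.Queue[Optional[int]]", results: List[int]) -> None:
    while True:
        item = await queue.get()  # suspends while the queue is empty
        if item is None:
            break
        results.append(item * 2)


async def run_pipeline(items: List[int], capacity: int = 2) -> List[int]:
    queue: "asyncio.Queue[Optional[int]]" = asyncio.Queue(maxsize=capacity)
    results: List[int] = []
    await asyncio.gather(producer(queue, items), consumer(queue, results))
    return results


doubled = asyncio.run(run_pipeline([1, 2, 3, 4, 5]))
```

The two coroutines share no state other than the messages on the queue, which is what keeps the components loosely coupled.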
  23. Introducing Lightbend Fast Data Platform
  24. What is Lightbend Fast Data Platform?
      • Lightbend Fast Data Platform is a curated, integrated and fully supported platform that provides you with an easy on-ramp for designing, building and running streaming Fast Data applications.
  25. Why Lightbend Fast Data Platform?
      • For architects: Design capabilities and guided tool choices so you can demystify complexity and reduce risk.
      • For developers: An easy on-ramp that accelerates developer velocity so you can build & launch performant apps on time.
      • For ops teams: Run-time capabilities so you can serve users reliably at scale, along with one-stop support for all components to ensure peace of mind.
  26. Introducing Lightbend Fast Data Platform
      • Stream Processing
        1. Streaming Engines
      • Machine Learning
        2. Pluggable ML Libraries
      • Microservices
        3. Reactive Platform
      • Operational Tooling
        4. Intelligent Management and Monitoring
        5. Cluster Analysis (FUTURE)
      • Infrastructure
        6. Durable Messaging Backplane
        7. Persistence
        8. Infrastructure (On-Prem, Cloud, Hybrid)
  27. Lightbend Fast Data Platform Components
      • Stream Processing
        1. Streaming Engines
      • Machine Learning
        2. Pluggable ML Libraries
      • Microservices
        3. Reactive Platform
      • Operational Tooling
        4. Intelligent Management and Monitoring
        5. Cluster Analysis (FUTURE)
      • Infrastructure
        6. Durable Messaging Backplane
        7. Persistence
        8. Infrastructure (On-Prem, Cloud, Hybrid)
  28. When Choosing Streaming Engines…
      • Low latency? How low?
      • High volume? How high?
      • Kinds of data processing and analytics? Which ones?
      • Process data:
        • Individually (e.g., complex event processing)?
        • In bulk (e.g., like SQL joins)?
      • Required integrations with other tools? Which ones?