Apache Kafka Streams + Machine Learning / Deep Learning

Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams...

Big Data and Machine Learning are key for innovation in many industries today. Large amounts of historical data are stored and analyzed in Hadoop, Spark or other clusters to find patterns and insights, e.g. for predictive maintenance, fraud detection or cross-selling.

The first part of the session explains how to build analytic models with R, Python and Scala, leveraging open source machine learning / deep learning frameworks like Apache Spark, TensorFlow or H2O.ai. The second part discusses how to use these models in your own streaming applications or microservices, relying on the Apache Kafka cluster and Kafka Streams instead of building your own stream processing cluster. The session focuses on live demos and shares lessons learned for executing analytic models in a highly scalable and performant way.
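
As a rough illustration of the second part, the sketch below applies a model that was trained elsewhere to a stream of events with a filter step and a map step. It is not the Kafka Streams API: plain Python lists and generators stand in for Kafka topics, and the feature names, coefficients and the `process` function are hypothetical values chosen only for the example.

```python
import math
from typing import Dict, Iterable, Iterator

# Hypothetical coefficients standing in for a model trained offline
# (e.g. exported from R, H2O.ai or Spark); the values are made up.
WEIGHTS = {"departure_delay_min": 0.08, "distance_km": -0.0005}
BIAS = -1.0


def predict_delay_probability(event: Dict[str, float]) -> float:
    """Score one event with the pre-trained logistic regression."""
    z = BIAS + sum(w * event[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def process(events: Iterable[Dict[str, float]]) -> Iterator[Dict[str, float]]:
    """Filter out incomplete events, then map each one to a scored record."""
    for event in events:
        # Filter step: skip events that lack a required feature.
        if any(name not in event for name in WEIGHTS):
            continue
        # Map step: enrich the event with the model's prediction.
        yield {**event, "delay_probability": predict_delay_probability(event)}


# An in-memory list stands in for the input topic. In a real deployment the
# events would be consumed from Kafka and the results written to an output topic.
input_topic = [
    {"departure_delay_min": 45.0, "distance_km": 800.0},
    {"distance_km": 1200.0},  # incomplete, dropped by the filter
    {"departure_delay_min": 0.0, "distance_km": 2000.0},
]
output_topic = list(process(input_topic))
```

The model itself is only a function call inside the map step, which is why the same trained model can be reused in a streaming application without being redeveloped.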

The last part explains how Apache Kafka can help to move from a manual build and deployment of analytic models to continuous online model improvement in real time.
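
To make "continuous online model improvement" more concrete, here is a minimal sketch of a model that is updated with every new labeled event instead of being rebuilt in a batch job. It assumes a logistic regression trained by stochastic gradient descent; the class name, the learning rate and the toy events are hypothetical, and the talk itself does not prescribe this algorithm.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        """Return the predicted probability of the positive class."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Sequence[float], label: int) -> None:
        """Take one gradient step on the log loss for a single labeled event."""
        error = self.predict(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)
# Made-up labeled events (features, label): the label is 1 when the feature is positive.
labeled_events = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)] * 50
for features, label in labeled_events:
    model.update(features, label)
```

In a streaming setup the loop over `labeled_events` would be replaced by a consumer reading events from a topic, and the updated model would have to be validated before it is used for production scoring.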

Published in: Technology

Apache Kafka Streams + Machine Learning / Deep Learning

  1. Apache Kafka + Machine Learning: Analytic Models Applied to Real Time Stream Processing. Kai Waehner, Technology Evangelist. kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.kai-waehner.de
  2. Apache Kafka and Machine Learning. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  3. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  4. Machine Learning ... allows computers to find hidden insights without being explicitly programmed where to look.
  5. Real World Examples of Machine Learning: Spam Detection; Search Results + Product Recommendation; Picture Detection (Friends, Locations, Products); Your Company; The Next Disruption: Google Beats Go Champion
  6. Leverage Machine Learning to Analyze and Act on Critical Business Moments. Windows of Opportunity (Seconds, Minutes, Hours): Price Optimization, Predictive Maintenance, Fraud Detection, Cross Selling, Transportation Rerouting, Customer Service, Inventory Management
  7. How to realize these use cases?
  8. Big Data Analytics: Volume (terabytes, petabytes); Variety (social networks, blog posts, logs, sensors, etc.); Velocity ("real time"); Value
  9. Big Data Analytics for Actionable Insights: From Insight to Action (continuously closed loop)
  10. Architecture diagram (Streaming Platform, Big Data Analytics). Stages: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer. Labels: Database, IoT Device, Streaming Producer, ...; DWH, Data Integration, Data Lake, Model Building; Batch, Real Time; Connect, Stream Processing, REST Interface, Model, Schema Registry / Governance; IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application
  11. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  12. Architecture diagram (same as slide 10). Stages: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  13. Hidden Technical Debt in Machine Learning Systems (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf). Writing source code is not the time-consuming task!
  14. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
  15. Data Access: Find insights to create added business value by correlating various data sources!
  16. Data Preparation (http://www.slideshare.net/odsc/feature-engineering)
  17. Exploratory Data Analysis: Scripting, Visual Analytics, Machine Learning (© Copyright 2000-2017 TIBCO Software Inc.)
  18. Model Building: A model is a simplification of the truth that helps you with decision making.
  19. Model Execution (Coding): Apply Model to New Data
  20. Model Execution (Tooling): Apply Model to New Data
  21. Model Validation: Cross-Validation Procedure (https://genome.tugraz.at/proclassify/help/pages/XV.html)
  22. Frameworks and Tooling?
  23. Languages, Frameworks and Tools: Portable Format for Analytics (PFA) and many more ...
  24. Live Demos with Open Source Technologies: Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
  25. Live Demo. Use Case: Customer Churn Prediction. Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression. Technology: Open Source R
  26. Live Demo. Use Case: Airline Flight Delay Prediction. Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees. Technology: H2O.ai
  27. Live Demo. Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks). Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders. Technology: TensorFlow + Python API
  28. Live Demo. Use Case: Classification (Prediction of Titanic Survivors). Deep Learning Algorithm: Recurrent Neural Networks (RNN). Technology: RapidMiner
  29. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  30. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
  31. Architecture diagram (same as slide 10). Stages: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  32. Definition of Stream Processing: Data at Rest, Data in Motion
  33. Key Concepts
  34. Key Concepts
  35. Key Concepts
  36. Stream Processing Use Cases: Real Time Applications; Stateful Streaming Analytics; Stateless "Real Time ETL"
  37. Event Processing Windows: Various Options for Windowing (Fixed, Sliding, Session, ...)
  38. How to apply analytic models to real time processing without redevelopment?
  39. Application of Analytic Models to Real Time without Redevelopment. Stream Processing: H2O.ai, R, Python, Spark ML, MATLAB, SAS, PMML
  40. Streaming Analytics - Processing Pipeline. Stream Ingest: APIs, Adapters / Channels, Integration, Messaging. Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering. Stream Analytics: Contextual Rules, Windowing, Patterns, Analytics, Machine Learning, ... Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting. Further labels: Index / Search, Normalization. Applying an Analytic Model is just a piece of the puzzle!
  41. Frameworks and Tooling?
  42. Frameworks and Products. Axes: Open Source, Closed Source; Product, Framework. Label: Azure Microsoft Stream Analytics
  43. When to use Kafka Streams for Stream Processing?
  44. When to use Kafka Streams for Stream Processing? No need for a Big Data cluster; Deploy in your existing infrastructure; Kafka manages scalability / fail-over; Focus on development of business logic in your department
  45. Kafka Streams. Input Stream (Kafka Topic, Kafka Cluster); Stream Processing Microservice (Kafka Streams): map, filter, aggregate, apply analytic model, "any business logic"; Output Stream (Kafka Topic, Kafka Cluster). Deployed anywhere: Docker, Kubernetes, Mesos, Java App, ...
  46. A complete streaming microservice, ready for production at large scale. Word Count: App configuration; Define processing (here: WordCount); Start processing
  47. Confluent Platform: the Free, Open-Source Streaming Platform. Legend: Open Source, External, Commercial. Diagram labels: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, ...; CRM, Data Warehouse, Database, Hadoop, Data Integration, ...; Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support; Supported Connectors, Clients, Schema Registry, REST Proxy; Apache Kafka: Kafka Connect, Kafka Streams, Kafka Core; Database Changes, Log Events, IoT Data, Web Events, ...
  48. Architecture diagram (same as slide 10). Stages: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
  49. Architecture diagram with concrete technologies (Streaming Platform, Big Data Analytics). Stages: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer. Labels: Oracle DB, CoAP IoT, Kafka Java Client, ...; HP Vertica, Data Integration, Flume, H2O.ai, Spark, TensorFlow; Batch, Real Time; Confluent REST Proxy, MQTT IoT, iPhone App, Kafka Go Client; Kafka Connect, Hive, Grafana, Kafka, Java EE Web App, Hadoop; Confluent Schema Registry; Kafka Streams, H2O.ai, Mesos; Kafka Streams, TensorFlow, Kubernetes; Avro
  50. Live Demos with Open Source Technologies: Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
  51. Live Demo. Use Case: Airline Flight Delay Prediction. Machine Learning Algorithm: Any! (in our example, H2O.ai GBM). Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
  52. H2O.ai Model + Kafka Streams (Filter, Map): 1) Create H2O ML model 2) Configure Kafka Streams Application 3) Apply H2O ML model to Streaming Data 4) Start Kafka Streams App
  53. End-to-End Stream Monitoring and Alerting. Confluent Control Center: Data Stream Monitoring and Alerting; Multi-cluster monitoring and management; Kafka Connect Configuration. Questions answered: Message delivery? Delays? Where did it get stuck? Lost messages? Broker issues? Performance? http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html
  54. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
  55. Let's improve the analytic model continuously ...
  56. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment. Online Training: Continuously train and improve the model with every new event
  57. Online Model Training of Analytic Models. How to improve models? 1. Manual Update 2. Automated Batch 3. Real Time
  58. Architecture diagram for online training (Streaming Platform, Big Data Analytics). Labels: Flume, H2O.ai, Spark, TensorFlow, Hive, Kafka, Hadoop; Confluent Schema Registry; Kafka Streams, H2O.ai, Mesos; Kafka Streams, TensorFlow, Kubernetes; Avro. Steps: 1) Get new Input Event via Kafka Topic 2) Improve Model in Big Data Cluster 3) Update deployed Model via Kafka Topic 4) Leverage Improved Model for new Events
  59. Caveats for Online Model Training: • Processes and infrastructure not ready • Validation needed before production • Slows down the system • Only a few ML implementations supported • Many use cases do not need it
  60. Key Take-Aways: • Insights are hidden in Historical Data on Big Data Platforms • Machine Learning and Big Data Analytics find these Insights by building Analytics Models • Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time
  61. Kai Waehner, Technology Evangelist. kontakt@kai-waehner.de, @KaiWaehner, www.kai-waehner.de, LinkedIn. Questions? Feedback? Please contact me!
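
Slide 37 of the transcript names fixed, sliding and session windows without showing how they differ. The sketch below illustrates two of them on plain integer timestamps, under the assumption that a fixed window is a non-overlapping interval of constant size and a session window closes after a pause longer than a given gap. The function names and sample timestamps are hypothetical, sliding windows are omitted for brevity, and this is not the Kafka Streams windowing API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fixed_windows(timestamps: Sequence[int], size: int) -> Dict[int, List[int]]:
    """Group timestamps into non-overlapping windows of `size` time units,
    keyed by the start of each window."""
    windows: Dict[int, List[int]] = defaultdict(list)
    for ts in timestamps:
        windows[ts - ts % size].append(ts)
    return dict(windows)


def session_windows(timestamps: Sequence[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions; a new session starts when the pause
    since the previous event is longer than `gap`."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


event_times = [1, 2, 8, 11, 12, 30]  # made-up event timestamps
fixed = fixed_windows(event_times, size=10)
sessions = session_windows(event_times, gap=5)
```

With these inputs the fixed windows split the events at multiples of 10, while the session windows split them wherever more than 5 time units pass without an event, so the same events end up grouped differently.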
