IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow

Use cases and architectures for IoT projects leveraging Apache Kafka, ksqlDB, machine learning / deep learning frameworks like TensorFlow, and cloud infrastructure.

Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
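To make the "ingests, stores ... and forwards" idea concrete, here is a minimal sketch of an append-only log with per-consumer offsets, the model the slides later call a distributed commit log. It is a toy: single partition, in memory, no replication. The `CommitLog` class, the `car-42` key and the group names are hypothetical and not part of any Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class CommitLog:
    """Toy single-partition, in-memory append-only log (illustration only)."""
    records: List[Tuple[str, Any]] = field(default_factory=list)
    # Each consumer group tracks its own read position (offset) in the log.
    offsets: Dict[str, int] = field(default_factory=dict)

    def append(self, key: str, value: Any) -> int:
        """Append a record and return the offset it was written at."""
        self.records.append((key, value))
        return len(self.records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[str, Any]]:
        """Return the next unread records for a group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's offset, for example back to 0 to reprocess events."""
        if not 0 <= offset <= len(self.records):
            raise ValueError("offset out of range")
        self.offsets[group] = offset


log = CommitLog()
for temp in (70.1, 70.4, 98.6):
    log.append("car-42", {"temperature": temp})

first_read = log.poll("monitoring")
log.seek("monitoring", 0)                    # rewind and replay from the start
replayed = log.poll("monitoring")
training_read = log.poll("model-training")   # independent consumer, own offset
```

Reading does not delete anything, so a second consumer (here a hypothetical model-training job) can process the same sensor events later without affecting the first.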

The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is the streaming SQL engine on top of Apache Kafka that simplifies all this and makes stream processing available to everyone without the need to write source code.
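As an illustration of two of the concepts named above, event-time semantics and windowed aggregates, the following sketch computes a tumbling-window average per sensor in plain Python. It is not KSQL and not how KSQL is implemented; it processes a finished list rather than an unbounded stream, and the event layout and the name `tumbling_avg` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, float]  # (sensor_id, event_time_ms, value)


def tumbling_avg(events: Iterable[Event], window_ms: int) -> Dict[Tuple[str, int], float]:
    """Average value per (sensor_id, window_start), bucketed by event time."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for sensor_id, ts, value in events:
        # Tumbling windows are fixed-size and non-overlapping: each event
        # belongs to exactly one window, chosen by its own timestamp.
        window_start = ts - (ts % window_ms)
        buckets[(sensor_id, window_start)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


events = [
    ("car-1", 1_000, 10.0),
    ("car-1", 4_000, 20.0),
    ("car-1", 6_000, 30.0),
    ("car-2", 2_000, 5.0),
    ("car-1", 3_000, 30.0),  # arrives out of order, still lands in window [0, 5000)
]
result = tumbling_avg(events, window_ms=5_000)
```

Because the window is derived from the timestamp carried in the event rather than from arrival order, the late `car-1` reading is still counted in the first window.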

This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
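The slides use an autoencoder for anomaly detection. The sketch below shows only the scoring step that typically follows such a model: compare each reading with its reconstruction and flag it when the error exceeds a threshold. The trained model is replaced by a hypothetical stand-in (`fake_reconstruct`), and the threshold and sensor values are made up for illustration; none of this is taken from the demo code.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("length mismatch")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(
    readings: Sequence[Vector],
    reconstruct: Callable[[Vector], Vector],
    threshold: float,
) -> List[bool]:
    """True where the model reconstructs a reading poorly (likely anomaly)."""
    return [reconstruction_error(x, reconstruct(x)) > threshold for x in readings]


# Hypothetical stand-in for a trained autoencoder: it always returns the
# "normal" profile, so unusual readings get a large reconstruction error.
NORMAL_PROFILE = [0.5, 0.5, 0.5]


def fake_reconstruct(x: Vector) -> Vector:
    return NORMAL_PROFILE


readings = [[0.5, 0.6, 0.4], [0.5, 0.5, 0.5], [3.0, 0.1, 2.5]]
flags = flag_anomalies(readings, fake_reconstruct, threshold=0.1)
```

In a real deployment `reconstruct` would call the trained model, and the threshold would be chosen from the error distribution on known-good data.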

Published in: Software


  1. 1. IoT Sensor Analytics with Apache Kafka, KSQL, TensorFlow and MQTT: Kafka-Native End-to-End IoT Data Integration and Processing. Kai Waehner, Technology Evangelist, Confluent (kai.waehner@confluent.io, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de)
  2. 2. POLL: What are your current Use Cases for Apache Kafka?
  3. 3. Agenda: 01 IoT Use Cases, 02 MQTT Standard, 03 Apache Kafka Ecosystem, 04 TensorFlow for IoT Scenarios, 05 End-to-End IoT Integration Architecture(s), 06 IoT Data Processing, 07 Live Demo: End-to-End Sensor Analytics
  4. 4. IoT Use Cases
  5. 5. The Future of the Automotive Industry is a Real Time Data Cluster. Front, rear and top view cameras: parking assistant, environment pointer. Ultrasonic sensors: parking assistant with front and rear camera plus environment indicator. Crash sensors: front protection adaptivity, side protection, tail impact protection. Front camera: Audi active lane assistant, speed limit indicator, adaptive light. Infrared camera: rearview assistance with pedestrian recognition. Front and rear radar sensors: ACC with stop and go function, side assist.
  6. 6. The Future of the Automotive Industry is a Real Time Data Cluster. Sensors, each labeled MQTT: front, rear and top view cameras; ultrasonic sensors; crash sensors; front camera; infrared camera; front and rear radar sensors. Use cases: traffic alerts, hazard alerts, personalization, anomaly detection.
  14. 14. Architecture (High-Level)
  15. 15. POLL: Which IoT scenarios do you see in your company?
  16. 16. MQTT Standard
  17. 17. MQTT - Publish / subscribe messaging protocol
      • Built on top of TCP/IP for constrained devices and unreliable networks
      • Many (open source) broker implementations
      • Many client libraries
      • IoT-specific features for bad network / connectivity
      • Widely used (mostly IoT, but also web and mobile apps via MQTT over WebSockets)
  18. 18. MQTT Architecture (large-scale)
  19. 19. MQTT Trade-Offs
      PROs
      • Lightweight
      • Simple API
      • Built for poor connectivity / high latency scenarios
      • Many client connections (tens of thousands per MQTT server)
      CONs
      • "Just" queuing, not stream processing
      • Can't handle usage surges (no buffering)
      • Most MQTT brokers don't support high scalability
      • Very asynchronous processing (devices are often offline for a long time)
      • No good integration with the rest of the enterprise
      • No reprocessing of events
  20. 20. Apache Kafka Ecosystem
  21. 21. Apache Kafka - The Rise of an Event Streaming Platform
  22. 22. Log & Pub/Sub
  23. 23. Apache Kafka == Distributed Commit Log with Replication
  24. 24. Apache Kafka at Scale at Tech Giants: > 7 trillion messages / day, > 6 Petabytes / day ...you name it!
  25. 25. Kafka Trade-Offs (IoT Perspective)
      PROs
      • Stream processing, not just queuing
      • High throughput
      • Large scale
      • High availability
      • Long term storage and buffering
      • Reprocessing of events
      • Good integration with the rest of the enterprise
      CONs
      • Not built for tens of thousands of connections
      • Requires stable network and good infrastructure
      • No IoT-specific features like keep alive or last will and testament
  26. 26. (De facto) Standards for Processing IoT Data ...a match made in Heaven
  27. 27. TensorFlow for IoT Scenarios
  28. 28. TensorFlow: TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.
  29. 29. The first analytic models: How to deploy the models in production? …real-time processing? …at scale? …24/7 zero downtime?
  30. 30. Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  31. 31. Apache Kafka’s Open Source Ecosystem as Infrastructure for ML
  32. 32. Apache Kafka’s Open Ecosystem as Infrastructure for ML
  33. 33. Data Ingestion
  34. 34. Model Training
  35. 35. Model Training without additional Data Store (https://github.com/tensorflow/io/tree/master/tensorflow_io/kafka)
      • Native integration between Kafka and TensorFlow
      • KafkaDataSet and KafkaOutputSequence for TensorFlow
      • Written in C++ (linked with librdkafka)
      • Part of the graph in TensorFlow
      • Direct training and inference from streaming data
      • No data storage like S3 or HDFS needed
  36. 36. Replayability: a log never forgets!
  37. 37. Analytic Model (Autoencoder for Anomaly Detection)
  38. 38. Model Deployment #1: RPC Communication to do Model Inference
  39. 39. Model Deployment #2: Model inference natively in the App
  40. 40. End-to-End IoT Integration Architecture(s)
  41. 41. Architecture (High-Level)
  42. 42. Architecture (High-Level) – Machine Learning Perspective
  43. 43. Kafka-Native Integration Options between MQTT and Apache Kafka: Kafka Connect, MQTT Proxy, REST Proxy
  44. 44. Kafka-Native Integration Options between MQTT and Apache Kafka: Kafka Connect, MQTT Proxy, REST Proxy
  45. 45. Integration with Kafka Connect (Source and Sink)
  46. 46. Kafka Connect – Don't reinvent the wheel!
  47. 47. Kafka-Native Integration Options between MQTT and Apache Kafka: Kafka Connect, MQTT Proxy, REST Proxy
  48. 48. MQTT Proxy
  49. 49. Kafka-Native Integration Options between MQTT and Apache Kafka: Kafka Connect, MQTT Proxy, REST Proxy
  50. 50. REST Proxy
  51. 51. Confluent REST Proxy - Produce and Consume Messages
  52. 52. IoT Data Processing
  53. 53. IoT Data Processing
  54. 54. Processing Options for MQTT Data with Kafka
  55. 55. The 3 stream processing modalities with Confluent
  56. 56. The 3 stream processing modalities differ in flexibility and ease of use
  57. 57. Using external processing systems leads to complicated architectures
  58. 58. We can put it back together in a simpler way
  59. 59. Connect integration and pull queries enable end-to-end streaming in just a few SQL statements
  60. 60. Live Demo: End-to-End Sensor Analytics
  61. 61. KSQL and Deep Learning (Auto Encoder) for Anomaly Detection
  62. 62. Model Training with Python, KSQL, TensorFlow, Keras and Jupyter https://github.com/kaiwaehner/python-jupyter-apache-kafka-ksql-tensorflow-keras
  63. 63. Model Deployment with Apache Kafka, KSQL and TensorFlow
  64. 64. Live Demo: End-to-End Sensor Analytics…
  65. 65. Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot
  66. 66. Demo: Predictive Analytics for 100000 Connected Cars (Kafka + KSQL + MQTT + TensorFlow + Kubernetes) https://www.kai-waehner.de/blog/2019/11/08/live-demo-iot-100-000-connected-cars-kubernetes-kafka-mqtt-tensorflow/
  67. 67. POLL: What is the best choice for your IoT integration between MQTT and Kafka?
  68. 68. Stay in touch! Try Confluent cnfl.io/download Confluent Blog cnfl.io/blog Community cnfl.io/meetups
  69. 69. Thank You! Kai Waehner kai-waehner.de @KaiWaehner LinkedIn
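
As a follow-up to the REST Proxy slides (50 and 51), here is a minimal sketch of what producing a sensor reading through Confluent REST Proxy could look like, using only the Python standard library. It builds the HTTP request for the v2 produce endpoint without sending it. The base URL `http://localhost:8082`, the topic `car-sensor-data` and the record contents are assumptions for illustration and do not come from the talk.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request


def build_produce_request(base_url: str, topic: str, records: List[Dict[str, Any]]) -> Request:
    """Build (but do not send) a REST Proxy v2 produce request with JSON records."""
    body = json.dumps({"records": records}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # The embedded format (json) tells the proxy how to serialize records.
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


req = build_produce_request(
    "http://localhost:8082",          # assumed local REST Proxy address
    "car-sensor-data",                # hypothetical topic name
    [{"key": "car-42", "value": {"temperature": 98.6}}],
)

# To send it against a running REST Proxy:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Plain HTTP like this suits devices or gateways that cannot run a Kafka client, at the cost of higher per-message overhead than the native protocol.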
