Use cases and architectures for IoT projects leveraging Apache Kafka, ksqlDB, machine learning / deep learning frameworks like TensorFlow, and cloud infrastructure.
Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is the streaming SQL engine on top of Apache Kafka which simplifies all this and makes stream processing available to everyone without the need to write source code.
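To make the term "windowed aggregate" concrete, here is a minimal sketch in plain Python of a tumbling-window average over sensor readings. It is not KSQL and does not use Kafka; the function name `tumbling_window_avg`, the event layout `(sensor_id, timestamp_ms, value)` and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_ms):
    """Average the readings per (sensor_id, window_start).

    Windows are fixed-size and non-overlapping (tumbling), aligned to
    multiples of window_ms. Event time is taken from the event itself.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for sensor_id, ts_ms, value in events:
        window_start = ts_ms - (ts_ms % window_ms)
        acc = sums[(sensor_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sample data: (sensor_id, timestamp_ms, value)
events = [
    ("car-1", 1_000, 20.0),
    ("car-1", 4_000, 22.0),
    ("car-1", 11_000, 30.0),
    ("car-2", 2_000, 50.0),
]
result = tumbling_window_avg(events, window_ms=10_000)
```

A streaming SQL engine expresses the same idea declaratively (group by key and window) and keeps the aggregate up to date as new events arrive, instead of computing it once over a finished list.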
This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
5. The Future of the Automotive Industry is a Real Time Data Cluster
• Front, rear and top view cameras: Parking assistant, environment pointer
• Ultrasonic sensors: Parking assistant with front and rear camera plus environment indicator
• Crash sensors: Front protection adaptivity, side protection, tail impact protection
• Front camera: Audi Active lane assistant, speed limit indicator, adaptive light
• Infrared camera: Rearview assistance with pedestrian recognition
• Front and rear radar sensors: ACC with stop and go function, side assist
6. The Future of the Automotive Industry is a Real Time Data Cluster
• Sensors (labeled MQTT in the diagram): Front, rear and top view cameras, ultrasonic sensors, crash sensors, front camera, infrared camera, front and rear radar sensors
• Also shown: Traffic alerts, hazard alerts, personalization, anomaly detection
17. MQTT - Publish / subscribe messaging protocol
• Built on top of TCP/IP for constrained devices and unreliable networks
• Many (open source) broker implementations
• Many client libraries
• IoT-specific features for bad network / connectivity
• Widely used (mostly IoT, but also web and mobile apps via MQTT over WebSockets)
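As a rough illustration of the publish / subscribe model, the sketch below shows how a broker could decide which subscriptions receive a published message, using MQTT's two topic wildcards (`+` for exactly one level, `#` for all remaining levels). The function name `topic_matches` and the example topics are made up for this sketch; it deliberately ignores special cases such as topics beginning with `$`, and it is not taken from any particular broker implementation.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic.

    '+' matches exactly one topic level.
    '#' matches the parent level and any number of child levels,
    and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Valid only in last position; then everything below matches.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


# Hypothetical subscriptions: client name -> topic filter
subscriptions = {
    "dashboard": "car/+/temperature",
    "archiver": "car/#",
    "alerts": "car/42/crash",
}

published_topic = "car/42/temperature"
receivers = sorted(
    client for client, flt in subscriptions.items()
    if topic_matches(flt, published_topic)
)
```

Publishers and subscribers never address each other directly; they only agree on topic names, which is what keeps client libraries small enough for constrained devices.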
19. MQTT Trade-Offs
PROs
● Lightweight
● Simple API
● Built for poor connectivity / high latency scenarios
● Many client connections (tens of thousands per MQTT server)
CONs
● “Just” queuing, not stream processing
● Can’t handle usage surges (no buffering)
● Most MQTT brokers don’t support high scalability
● Very asynchronous processing (devices are often offline for a long time)
● No good integration with the rest of the enterprise
● No reprocessing of events
24. Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day > 6 Petabytes / day
...you name it!
25. Kafka Trade-Offs (IoT Perspective)
PROs
● Stream processing, not just queuing
● High throughput
● Large scale
● High availability
● Long term storage and buffering
● Reprocessing of events
● Good integration with the rest of the enterprise
CONs
● Not built for tens of thousands of connections
● Requires stable network and good infrastructure
● No IoT-specific features like keep alive or last will and testament
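The difference between "just queuing" and a log that supports buffering and reprocessing can be sketched with two tiny in-memory classes. This is a deliberately simplified model, not Kafka's or any MQTT broker's implementation: the class names `SimpleQueue` and `SimpleLog` are hypothetical, and real Kafka adds partitions, replication, retention and consumer groups on top of the offset idea shown here.

```python
from collections import deque


class SimpleQueue:
    """Queue semantics: a message is gone once it has been consumed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


class SimpleLog:
    """Log semantics: records are appended and kept; readers track offsets."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int):
        return list(self._records[offset:])


readings = ["temp=70", "temp=71", "temp=95"]

queue = SimpleQueue()
log = SimpleLog()
for r in readings:
    queue.publish(r)
    log.append(r)

# First consumer drains both.
drained = [queue.consume() for _ in readings]
first_pass = log.read_from(0)

# A second consumer (for example, a retrained model) starts later.
queue_second_pass = queue.consume()   # nothing left to read
log_second_pass = log.read_from(0)    # full history is still available
log_tail = log.read_from(2)           # or resume from a stored offset
```

Because the log keeps records independently of who has read them, a slow or newly added consumer can catch up at its own pace, which is the buffering and reprocessing behavior listed among the Kafka PROs above.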
28. TensorFlow
TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.
29. The first analytic models
How to deploy the models in production?
…real-time processing?
…at scale?
…24/7 zero downtime?
30. Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
35. Model Training without additional Data Store
https://github.com/tensorflow/io/tree/master/tensorflow_io/kafka
● Native integration between Kafka and TensorFlow
● KafkaDataset and KafkaOutputSequence for TensorFlow
● Written in C++ (linked with librdkafka)
● Part of the graph in TensorFlow
● Direct training and inference from streaming data
● No data storage like S3 or HDFS needed
65. Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data
https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot