Lots of organizations are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create architecture challenges. Dropped connections, managing large numbers of device connections, and implementing end-to-end quality of service for message delivery are issues that can impact the overall performance, scalability, and resilience of an IoT system.
This presentation will identify some best practices for implementing a large-scale IoT system that streams IoT data to Apache Kafka. The best practices are based on our experience implementing large-scale IoT solutions such as connected cars, connected industrial equipment, and consumer products. We will look at the different approaches for using the MQTT standard to move data from the device to Kafka, and give recommendations on the overall system architecture to ensure seamless and secure delivery of IoT data.
This presentation was given at BMW in Munich in June 2019
6. Introduction
Dominik Obermaier (@dobermai)
• HiveMQ CTO
• Strong background in distributed and large-scale systems architecture
• OASIS MQTT TC Member
• Author of “The Technical Foundations of IoT”
• Conference Speaker
• Program committee member for German and international IoT conferences
7. IOT CHALLENGES
➤ Unreliable communication channels (e.g. mobile)
➤ Constrained Devices
➤ Low Bandwidth and High Latency environments
➤ Bi-directional communication required
➤ Security
➤ Instantaneous data exchange
14. KAFKA STRENGTHS
➤ Optimized to stream data between systems and applications in a scalable manner
➤ Scale-out with multiple topics, partitions, and nodes
➤ Perfect for system communication inside trusted networks with a limited number of producers and consumers
15. KAFKA ALONE IS NOT OPTIMAL
➤ For IoT use cases where devices connect to the data center or cloud over the public Internet as the first point of contact
➤ If you attempt to stream data from thousands or even millions of devices using Kafka over the Internet
16. KAFKA CHALLENGES
➤ Kafka clients need to address individual Kafka brokers directly, which is not possible when a load balancer sits in front of them
IOT REALITY
➤ Clients are connected over the Internet
➤ Load balancers are used as the first line of defense
➤ IP addresses of infrastructure (e.g. Kafka nodes) are not exposed to the public Internet
➤ Load balancers effectively act as a proxy
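To make the load-balancer problem concrete, here is a toy model, not the real Kafka client: a producer bootstraps through one address, receives cluster metadata listing the address each partition leader advertises, and must then connect to each leader directly. The broker addresses, the `PartitionLeader` type, and the `reachable_leaders` helper are hypothetical; real clients get this information from the Kafka metadata response, which reflects each broker's `advertised.listeners` setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionLeader:
    topic: str
    partition: int
    advertised: str  # "host:port" the leader broker advertises in cluster metadata


def reachable_leaders(metadata: list[PartitionLeader], public_addresses: set[str]):
    """Split partitions into those whose leader a device can reach and those it cannot.

    A device on the public Internet can only reach addresses exposed to it
    (e.g. a single load balancer). A Kafka client, however, connects to whatever
    address each partition leader advertises in the metadata.
    """
    reachable, unreachable = [], []
    for leader in metadata:
        target = reachable if leader.advertised in public_addresses else unreachable
        target.append((leader.topic, leader.partition))
    return reachable, unreachable


# Hypothetical cluster: three brokers advertising private addresses.
metadata = [
    PartitionLeader("telemetry", 0, "10.0.0.11:9092"),
    PartitionLeader("telemetry", 1, "10.0.0.12:9092"),
    PartitionLeader("telemetry", 2, "10.0.0.13:9092"),
]
# Only the load balancer address is exposed to the Internet.
ok, blocked = reachable_leaders(metadata, {"lb.example.com:9092"})
print("reachable:", ok)
print("unreachable:", blocked)
```

In this model the bootstrap request can succeed through the load balancer, yet every produce request targets a leader address the device cannot reach. Making it work means giving each broker its own externally reachable address, which exposes exactly the infrastructure the IoT reality column wants to hide.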
17. KAFKA CHALLENGES
➤ Kafka is hard to scale to hundreds of thousands or millions of topics
IOT REALITY
➤ IoT devices are typically segmented to use individual topics
➤ Individual topics very often contain data like a unique device identifier
➤ Multiple millions of topics can be used in a single IoT scenario
➤ Ideal for security, as it’s possible to restrict devices to producing and consuming only on specific topics
➤ Topics are usually dynamic
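Per-device topics combine naturally with MQTT topic filters. The sketch below implements MQTT's wildcard matching rules (`+` matches exactly one topic level, `#` matches the parent level and everything below it, and filters starting with a wildcard do not match topics beginning with `$`) and uses them for a per-device permission check. The `devices/<client_id>/...` layout and the `may_publish` helper are hypothetical, not a HiveMQ or Kafka API.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter (+ and # wildcards)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Topics beginning with '$' (e.g. $SYS/...) are not matched by a leading wildcard.
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level; it matches the parent and all children.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def may_publish(client_id: str, topic: str) -> bool:
    """Hypothetical ACL: a device may only publish below its own topic branch."""
    # Reject IDs containing wildcards or separators, so they cannot widen the filter.
    if any(c in client_id for c in "+#/"):
        return False
    return topic_matches(f"devices/{client_id}/#", topic)


print(may_publish("car-42", "devices/car-42/telemetry/speed"))
print(may_publish("car-42", "devices/car-7/telemetry/speed"))
```

Because the permission is derived from the client identifier, millions of dynamic topics need no per-topic configuration.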
18. KAFKA CHALLENGES
➤ Kafka clients are reasonably complex by design (e.g. they use multiple TCP connections)
➤ Libraries are optimized for throughput
➤ APIs for Kafka libraries are simple to use, but their behavior sometimes isn’t easily configurable (e.g. the async send() method can block)
IOT REALITY
➤ IoT devices are typically very constrained (computing power and memory)
➤ Device programmers need very simple APIs AND full flexibility when it comes to library behavior
➤ Single IoT devices typically don’t require a lot of throughput
➤ It is important to limit and understand the number of TCP connections, especially over the Internet. Very often only one TCP connection to the backend is desired
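The TCP connection point can be put into numbers. This back-of-the-envelope sketch assumes, for illustration only, that a Kafka producer on a device ends up with one connection per broker leading a partition it writes to (a real client may additionally keep a bootstrap connection), while an MQTT client uses exactly one connection. The fleet size and broker layout are made-up inputs, not measurements.

```python
def kafka_connections_per_device(partition_leaders: list[str]) -> int:
    """Assumed model: one connection per distinct broker leading a partition the device writes to."""
    return len(set(partition_leaders))


# An MQTT client keeps a single connection to the broker (or its load balancer).
MQTT_CONNECTIONS_PER_DEVICE = 1

devices = 1_000_000  # hypothetical fleet size
# Hypothetical: the device writes to 6 partitions whose leaders are spread over 3 brokers.
leaders = ["broker-1", "broker-2", "broker-3", "broker-1", "broker-2", "broker-3"]

kafka_total = devices * kafka_connections_per_device(leaders)
mqtt_total = devices * MQTT_CONNECTIONS_PER_DEVICE
print(f"Kafka producers: {kafka_total:,} connections")  # Kafka producers: 3,000,000 connections
print(f"MQTT clients: {mqtt_total:,} connections")  # MQTT clients: 1,000,000 connections
```

Every extra connection per device multiplies across the fleet, both for the backend and for the device's own memory and battery budget.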
19. KAFKA CHALLENGES
➤ No on/off notification mechanism
➤ No keep-alive mechanism for individual producer TCP connections
➤ Kafka protocol for producers is rather heavyweight over the Internet (lots of communication)
IOT REALITY
➤ Features like on/off notifications are often required
➤ Unreliable networks require lightweight keep-alive mechanisms for producers and consumers (to detect half-open connections)
➤ Device communication over the Internet requires minimal communication overhead
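MQTT covers both requirements in the protocol itself: a client declares a keep-alive interval in CONNECT, and the broker treats the connection as dead if no packet arrives within one and a half times that interval. On such an ungraceful disconnect it publishes the client's Last Will and Testament (LWT), which provides the on/off notification. The class below is a simplified, single-threaded sketch of that broker-side bookkeeping with time passed in explicitly; `KeepAliveTracker` and its methods are hypothetical names, not a broker API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    keep_alive_s: float  # keep-alive interval the client sent in CONNECT
    last_seen: float  # time of the last control packet from the client
    will_topic: Optional[str] = None
    will_payload: bytes = b""


class KeepAliveTracker:
    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}
        self.published: list[tuple[str, bytes]] = []  # will messages "sent" by the broker

    def connect(self, client_id: str, keep_alive_s: float, now: float,
                will_topic: Optional[str] = None, will_payload: bytes = b"") -> None:
        self.sessions[client_id] = Session(keep_alive_s, now, will_topic, will_payload)

    def packet_received(self, client_id: str, now: float) -> None:
        # Any control packet (PUBLISH, PINGREQ, ...) resets the keep-alive timer.
        self.sessions[client_id].last_seen = now

    def disconnect(self, client_id: str) -> None:
        # A graceful DISCONNECT discards the will message.
        self.sessions.pop(client_id, None)

    def expire(self, now: float) -> None:
        """Drop half-open connections and publish their will messages."""
        for client_id, s in list(self.sessions.items()):
            if s.keep_alive_s > 0 and now - s.last_seen > 1.5 * s.keep_alive_s:
                del self.sessions[client_id]
                if s.will_topic is not None:
                    self.published.append((s.will_topic, s.will_payload))


tracker = KeepAliveTracker()
tracker.connect("car-42", keep_alive_s=60, now=0,
                will_topic="devices/car-42/status", will_payload=b"offline")
tracker.packet_received("car-42", now=50)
tracker.expire(now=130)  # 80 s of silence, below the 90 s limit: still connected
tracker.expire(now=141)  # 91 s of silence: treated as half-open, will message published
print(tracker.published)
```

The only traffic a silent device needs is a tiny PINGREQ per interval, which keeps the mechanism lightweight on constrained links.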
21. WHAT IS MQTT?
➤ Most popular IoT Messaging Protocol
➤ Minimal Overhead
➤ Publish / Subscribe
➤ Easy
➤ Binary
➤ Data Agnostic
➤ Designed for reliable communication over unreliable channels
➤ ISO Standard
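To show what "minimal overhead" and "binary" mean on the wire, the sketch below encodes an MQTT 3.1.1 PUBLISH packet at QoS 0: a one-byte fixed header (packet type 3 in the upper four bits, RETAIN in the lowest bit), the remaining length as a variable-length integer, the topic name as a length-prefixed UTF-8 string, and the raw payload. QoS 1 and 2 would add a two-byte packet identifier, which this sketch omits; the function names are illustrative.

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT variable byte integer: 7 bits per byte, high bit set if more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def encode_publish_qos0(topic: str, payload: bytes, retain: bool = False) -> bytes:
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length + topic (no packet id at QoS 0).
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    fixed_header = bytes([0x30 | (0x01 if retain else 0x00)])
    body = variable_header + payload
    return fixed_header + encode_remaining_length(len(body)) + body


packet = encode_publish_qos0("a/b", b"hi")
print(len(packet), packet.hex())  # 9 30070003612f626869
```

For this 2-byte payload the protocol adds only 4 bytes of framing on top of the topic name itself.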
22. USE CASES
➤ Push Communication
➤ Unreliable communication channels (e.g. mobile)
➤ Constrained Devices
➤ Low Bandwidth and High Latency environments
➤ Communication from backend to IoT device
➤ Lightweight backend communication
29. HiveMQ
MQTT broker built for enterprise applications
Extensive plugin system
Scales to > 10 million concurrent connections
OSS Community Edition available
Built for High Availability and used by 150+ of the largest IoT deployments in the world
31. HiveMQ MQTT Client
Java-based MQTT library
Developed by HiveMQ and BMW Car-IT
Built for devices and backends
Open Source (Apache 2)
Extremely fast and low overhead
36. CON
➤ Doesn’t scale well, as Kafka acts as an MQTT client
➤ MQTT clients are not designed for extremely large amounts of MQTT messages
➤ Centralizes business and message transformation logic
PRO
➤ Ideal if you don’t control the MQTT broker or use a third-party MQTT broker
38. CON
➤ Does not implement the full MQTT ISO Standard
➤ Does not support features unique to MQTT which were designed for IoT use cases (LWT, retained messages, …)
➤ Client must ensure that no MQTT features besides plain publishing are used
➤ Tightly coupled with Kafka, so it does not allow the Pub/Sub features of MQTT
PRO
➤ Does not require an MQTT broker
➤ Usually stateless, which makes scaling easier
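Because a component that only forwards plain publishes can decide per packet, it needs no session state, which is why it scales easily and also why retained messages or subscriptions are unavailable. The sketch below shows such a per-packet check on the MQTT fixed-header byte (packet type in the upper four bits; for PUBLISH, QoS in bits 1-2 and RETAIN in bit 0). The accept/reject policy and the `proxy_accepts` name are assumptions for illustration, not the behavior of any specific product; the will flag of CONNECT lives in the variable header and is not checked here.

```python
# MQTT control packet types (upper four bits of the first fixed-header byte).
CONNECT, PUBLISH, SUBSCRIBE, PINGREQ, DISCONNECT = 1, 3, 8, 12, 14

ALLOWED_TYPES = {CONNECT, PUBLISH, PINGREQ, DISCONNECT}


def proxy_accepts(first_byte: int) -> bool:
    """Hypothetical stateless policy: allow connecting, pinging and plain publishing only."""
    packet_type = first_byte >> 4
    if packet_type not in ALLOWED_TYPES:
        return False  # e.g. SUBSCRIBE: there is no broker to subscribe to
    if packet_type == PUBLISH:
        retain = first_byte & 0x01
        qos = (first_byte >> 1) & 0x03
        # Retained messages need broker-side storage; QoS 2 needs per-session state.
        return retain == 0 and qos in (0, 1)
    return True


print(proxy_accepts(0x30))  # True: PUBLISH, QoS 0, not retained
print(proxy_accepts(0x31))  # False: retained PUBLISH
print(proxy_accepts(0x82))  # False: SUBSCRIBE
```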
40. CON
➤ Extremely hard to avoid message loss
➤ Developers must take care of fault tolerance themselves
➤ Another system which adds complexity and adds no value
PRO
➤ The application that transposes MQTT to Kafka and vice versa is built by the developers themselves, so there is no external component
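The message-loss risk of a self-built bridge often comes down to ordering: if the bridge acknowledges an MQTT message (e.g. sends PUBACK for QoS 1) before Kafka has confirmed the write, a failure in between loses the message for good. The sketch below uses a hypothetical in-memory stand-in for the Kafka producer to contrast that order with the safer one, acknowledging only after the write is confirmed. That gives at-least-once delivery; in a real system a retried write can produce duplicates if the first attempt actually succeeded, which this fake does not model. None of these names correspond to a real client library.

```python
class FlakyKafka:
    """Hypothetical stand-in for a Kafka producer whose first writes fail."""

    def __init__(self, fail_first: int) -> None:
        self.fail_first = fail_first
        self.log: list[bytes] = []

    def write(self, record: bytes) -> bool:
        if self.fail_first > 0:
            self.fail_first -= 1
            return False  # e.g. timeout or leader unavailable
        self.log.append(record)
        return True


def bridge_ack_first(messages: list[bytes], kafka: FlakyKafka) -> list[bytes]:
    """Unsafe order: acknowledge first, then write. A failed write is never retried."""
    acked = []
    for msg in messages:
        acked.append(msg)  # PUBACK sent: the sender discards its copy
        kafka.write(msg)  # if this fails, the message is gone
    return acked


def bridge(messages: list[bytes], kafka: FlakyKafka, max_attempts: int = 5) -> list[bytes]:
    """Safer order: write to Kafka, and acknowledge only after the write is confirmed."""
    acked = []
    for msg in messages:
        for _ in range(max_attempts):
            if kafka.write(msg):
                acked.append(msg)  # PUBACK only now
                break
        # If all attempts fail, the message stays unacknowledged, so a QoS 1
        # sender can redeliver it (after reconnecting) instead of losing it.
    return acked


unsafe = FlakyKafka(fail_first=1)
print(bridge_ack_first([b"m1", b"m2"], unsafe), unsafe.log)  # [b'm1', b'm2'] [b'm2']
safe = FlakyKafka(fail_first=1)
print(bridge([b"m1", b"m2"], safe), safe.log)  # [b'm1', b'm2'] [b'm1', b'm2']
```

Getting this right under crashes, restarts, and rebalances is the fault-tolerance work the slide says developers must take on themselves.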
42. CON
➤ Extension only available for HiveMQ
PRO
➤ Broker uses the native Kafka protocol
➤ Full MQTT features can be used
➤ Bi-directional producing and consumption possible
➤ High scalability and resilience
➤ Extreme throughput: can write hundreds of thousands of MQTT messages per second to Kafka
➤ Ideal for aggregating MQTT topics to Kafka firehose topics
➤ Can write to multiple Kafka deployments
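Aggregating MQTT topics into Kafka firehose topics typically means mapping many fine-grained MQTT topics (one branch per device) onto a few Kafka topics, while keeping the device identifier as the record key so that all records of one device land in the same partition and stay ordered. The sketch below shows such a mapping; the `devices/<device_id>/<kind>/...` layout, the firehose topic names, and the CRC32 stand-in partitioner are assumptions for illustration (Kafka's default partitioner hashes keys with murmur2), and this is not the HiveMQ extension's configuration format.

```python
import zlib
from typing import Optional

# Hypothetical mapping: third-level "kind" of an MQTT topic -> Kafka firehose topic.
FIREHOSE_TOPICS = {"telemetry": "telemetry-firehose", "status": "status-firehose"}


def to_kafka_record(mqtt_topic: str, payload: bytes) -> Optional[tuple[str, bytes, bytes]]:
    """Map 'devices/<device_id>/<kind>/...' to (kafka_topic, key, value), or None if unmapped."""
    levels = mqtt_topic.split("/")
    if len(levels) < 3 or levels[0] != "devices":
        return None
    device_id, kind = levels[1], levels[2]
    kafka_topic = FIREHOSE_TOPICS.get(kind)
    if kafka_topic is None:
        return None
    # Device ID as key keeps each device's records in one partition (ordered per device).
    return kafka_topic, device_id.encode("utf-8"), payload


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2 instead.
    return zlib.crc32(key) % num_partitions


for topic in ["devices/car-42/telemetry/speed", "devices/car-42/telemetry/rpm", "devices/car-7/status"]:
    kafka_topic, key, _ = to_kafka_record(topic, b"...")
    print(topic, "->", kafka_topic, key, partition_for(key, 12))
```

Millions of MQTT topics thus collapse into a handful of Kafka topics, which sidesteps Kafka's difficulty with very large topic counts while preserving per-device ordering.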