BEST PRACTICES FOR STREAMING
IOT DATA TO APACHE KAFKA®
Dominik Obermaier | CTO @ HiveMQ
[Chart axis: number of users in millions]
PEOPLE ON THE INTERNET
Source IHS © 2016 IHS
DEVICES ON THE INTERNET
Key Industry Trend:
IoT & Connectivity
Introduction
• HiveMQ CTO
• Strong background in distributed
and large scale systems
architecture
• OASIS MQTT TC Member
• Author of “The Technical
Foundations of IoT”
• Conference Speaker
• Program committee member for
German and international IoT
conferences
Dominik
Obermaier
@dobermai
IOT CHALLENGES
➤ Unreliable communication channels (e.g.
mobile)
➤ Constrained Devices
➤ Low Bandwidth and High Latency
environments
➤ Bi-directional communication required
➤ Security
➤ Instantaneous data exchange
Millions of Devices
Meanwhile…
KAFKA
FOR IOT?
KAFKA STRENGTHS
➤ Optimized to stream data between
systems and applications in a scalable
manner
➤ Scale-out with multiple topics and
partitions and multiple nodes
➤ Perfect for system communication inside
a trusted network with a limited number of
producers and consumers
KAFKA ALONE IS NOT OPTIMAL
➤  For IoT use cases where devices are
connected to the data center or cloud
over the public Internet as first point of
contact
➤ If you attempt to stream data from
thousands or even millions of devices
using Kafka over the Internet
KAFKA CHALLENGES
➤ Kafka Clients need to address
Kafka brokers directly, which is
not possible with load
balancers
IOT REALITY
➤ Clients are connected over the
Internet
➤ Load Balancers are used as first
line of defense
➤ IP addresses of infrastructure
(e.g. Kafka nodes) not exposed
to the public Internet
➤ Load Balancers effectively act
as proxy
KAFKA CHALLENGES
➤ Kafka is hard to scale to
multiple hundreds of
thousands or millions of topics
IOT REALITY
➤ IoT devices typically are
segmented to use individual
topics
➤ Individual topics very often
contain data like unique device
identifier
➤ Multiple millions of topics can
be used in a single IoT scenario
➤ Ideal for security as it’s possible
to restrict devices to only
produce and consume for
specific topics
➤ Topics are usually dynamic
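The per-device topic restriction described above can be sketched with MQTT topic filters. This is a minimal illustration, not HiveMQ's implementation: the `devices/<id>/...` layout and the helper names `allowed_filters` and `may_publish` are assumptions made for the example. Only the wildcard semantics (`+` for one level, `#` for all remaining levels) come from the MQTT specification, and the sketch ignores the special handling of topics that start with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter.

    '+' matches exactly one level; '#' matches all remaining levels
    (including the parent level) and must be the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard is only valid as the final level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def allowed_filters(device_id: str) -> list:
    # Hypothetical convention: each device owns the subtree devices/<id>/...
    return [f"devices/{device_id}/#"]


def may_publish(device_id: str, topic: str) -> bool:
    """Check whether a device is allowed to publish to the given topic."""
    return any(topic_matches(f, topic) for f in allowed_filters(device_id))
```

With this convention, `may_publish("a1", "devices/a1/telemetry/temp")` is allowed, while the same device publishing to `devices/a2/telemetry/temp` is rejected.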
KAFKA CHALLENGES
➤ Kafka Clients are reasonably
complex by design (e.g. use
multiple TCP connections)
➤ Libraries optimized for
throughput
➤ APIs for Kafka libraries are
simple to use but the behavior
sometimes isn’t configurable
easily (e.g. async send()
method can block)
IOT REALITY
➤ IoT devices are typically very
constrained (computing power
and memory)
➤ Device programmers need very
simple APIs AND full flexibility
when it comes to library behavior
➤ Single IoT devices typically don’t
require a lot of throughput
➤ Important to limit and
understand the number of TCP
connections, especially over the
Internet. Very often only one TCP
connection to the backend
desired
KAFKA CHALLENGES
➤ No on/off notification
mechanism
➤ No Keep-Alive mechanism for
individual TCP connections of
producers
➤ Kafka Protocol for producers
rather heavyweight over the
Internet (lots of
communication)
IOT REALITY
➤ Features like on/off
notifications are often required
➤ Unreliable networks require
lightweight keep-alive
mechanisms for producers and
consumers (half-open
connections)
➤ Device communication over
the Internet requires minimal
communication overhead
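To make the keep-alive and on/off notification points concrete, here is a rough sketch of how an MQTT 3.1.1 CONNECT packet carries both a keep-alive interval and a Last Will message, built with the Python standard library only. The client ID, topic, and payload used in the usage note are made-up values, and the function names are illustrative rather than taken from any MQTT library.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT variable-length 'remaining length' field."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def mqtt_string(s: str) -> bytes:
    """Encode a UTF-8 string with the two-byte length prefix MQTT uses."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def connect_packet(client_id: str, keep_alive_s: int,
                   will_topic: str, will_payload: bytes) -> bytes:
    """Build an MQTT 3.1.1 CONNECT packet with a retained QoS 0 will."""
    flags = 0x02 | 0x04 | 0x20  # clean session, will flag, will retain
    variable_header = (mqtt_string("MQTT") + bytes([4, flags])
                       + struct.pack("!H", keep_alive_s))
    payload = (mqtt_string(client_id) + mqtt_string(will_topic)
               + struct.pack("!H", len(will_payload)) + will_payload)
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


# The keep-alive ping itself is a fixed two-byte packet.
PINGREQ = bytes([0xC0, 0x00])
```

For example, `connect_packet("sensor-42", 60, "devices/sensor-42/status", b"offline")` asks the broker to publish `offline` on the device's status topic if the client disappears without a clean disconnect. During idle periods the client only needs to send the two-byte `PINGREQ` within the keep-alive interval.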
WHAT IS MQTT?
➤ Most popular IoT Messaging Protocol
➤ Minimal Overhead
➤ Publish / Subscribe
➤ Easy
➤ Binary
➤ Data Agnostic
➤ Designed for reliable communication
over unreliable channels
➤ ISO Standard
USE CASES
➤ Push Communication
➤ Unreliable communication
channels (e.g. mobile)
➤ Constrained Devices
➤ Low Bandwidth and High Latency
environments
➤ Communication from backend to
IoT device
➤ Lightweight backend communication
Source: https://iot.eclipse.org/resources/iot-developer-survey/iot-developer-survey-2019.pdf; 1717 participants
PUBLISH / SUBSCRIBE
Scalable communication paradigm for the IoT
PUBLISH / SUBSCRIBE EXPLAINED
HiveMQ
MQTT broker built for enterprise applications
Extensive plugin system
Scales to > 10 million concurrent connections
OSS Community Edition available
Built for High Availability and used by 150+
of the largest IoT deployments in the world
Confidential and Proprietary. Copyright © by dc-square GmbH. All Rights Reserved.
Enterprise MQTT
Devices HiveMQ Enterprise
unreliable
network
Protocol
Integration
Enterprise Systems
• Kafka
• OAuth Server
• …
Kubernetes, Docker, OpenShift
Public or private cloud (AWS, MS Azure…) or on-premise
Backend
HiveMQ MQTT Client
Java based MQTT library
Developed by HiveMQ and BMW Car-IT
Built for devices and backends
Open Source (Apache 2)
Extremely fast and low overhead
➤ https://www.hivemq.com/benchmark-10-million/
HOW TO
USE KAFKA
FOR IOT?
KAFKA CONNECT
https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
CON
➤ Doesn’t scale well as Kafka acts
as MQTT client.
➤ MQTT Clients are not designed
for extremely large amounts of
MQTT messages
➤ Centralizes business and
message transformation logic
PRO
➤ Ideal if you don’t control MQTT
Broker or use third party MQTT
broker
MQTT PROXY
CON
➤ Does not implement the full
MQTT ISO Standard
➤ Does not support features
unique to MQTT which were
designed for IoT use cases
(LWT, retained messages, …)
➤ Client must ensure that no
MQTT features besides plain
publishing are used
➤ Tightly coupled with Kafka,
so it does not allow the
Pub/Sub features of MQTT
PRO
➤ Does not require an MQTT
broker
➤ Usually stateless, which makes
scaling easier
MQTT CUSTOM BRIDGE
CON
➤ Extremely hard to avoid
message loss
➤ Developers must take care of
fault tolerance themselves
➤ Another system which adds
complexity and adds no
value
PRO
➤ The application that translates
MQTT to Kafka and vice versa
is built by the developers
themselves, so no external
component is needed
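The message-loss concern for a custom bridge can be illustrated with a small simulation. The idea sketched here is to acknowledge a message towards the MQTT side only after the Kafka write has succeeded, so that anything unacknowledged can be redelivered (at-least-once). `FlakyKafka` is a stand-in written for this example, not a real Kafka client, and the tuple layout of the messages is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FlakyKafka:
    """Stand-in for a Kafka producer; fails the first `failures` sends."""
    failures: int
    log: list = field(default_factory=list)

    def send(self, key: str, value: bytes) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("broker unavailable")
        self.log.append((key, value))


def forward(kafka, messages, max_attempts=3):
    """Forward (packet_id, topic, payload) tuples to Kafka.

    Returns the packet ids that are safe to acknowledge on the MQTT side.
    A message whose write never succeeded is NOT acknowledged, so the
    MQTT broker can redeliver it later.
    """
    acked = []
    for packet_id, topic, payload in messages:
        for _ in range(max_attempts):
            try:
                kafka.send(topic, payload)
            except ConnectionError:
                continue  # retry the same message
            acked.append(packet_id)
            break
    return acked
```

Even this toy version hints at what a real bridge has to handle on top: retries can produce duplicates, a skipped message breaks ordering, and in-flight state is lost if the bridge process itself crashes.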
MQTT BROKER EXTENSION
CON
➤ Extension only available for
HiveMQ
PRO
➤ Broker uses native Kafka
protocol
➤ Full MQTT Features can be used
➤ Bi-directional producing and
consumption possible
➤ High Scalability and resilience.
➤ Extreme throughput. Can write
hundreds of thousands of MQTT
messages per second to Kafka
➤ Ideal for aggregating MQTT
topics to Kafka Firehose Topics
➤ Can write to multiple Kafka
Deployments
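Aggregating many per-device MQTT topics into one Kafka "firehose" topic could look roughly like the mapping below. This is a sketch under assumptions, not the HiveMQ extension's actual configuration or behavior: the `devices/<device_id>/...` topic layout, the `iot-firehose` topic name, and the `mqtt-topic` header name are all invented for illustration.

```python
def to_kafka_record(mqtt_topic: str, payload: bytes,
                    kafka_topic: str = "iot-firehose") -> dict:
    """Map an MQTT publish on devices/<device_id>/... to a Kafka record."""
    levels = mqtt_topic.split("/")
    if len(levels) < 3 or levels[0] != "devices" or not levels[1]:
        raise ValueError(f"unexpected MQTT topic layout: {mqtt_topic!r}")
    device_id = levels[1]
    return {
        "topic": kafka_topic,
        # Using the device id as the record key lets a key-hashing
        # partitioner route all records of one device to one partition.
        "key": device_id.encode("utf-8"),
        "value": payload,
        # Keep the original MQTT topic so consumers can still tell
        # the individual streams apart.
        "headers": [("mqtt-topic", mqtt_topic.encode("utf-8"))],
    }
```

Millions of MQTT topics then collapse into a single Kafka topic, which avoids the topic-count scaling problem on the Kafka side. Keying by device ID is one way to keep per-device ordering as long as the partition count stays fixed.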
+ = ❤
Dominik
Obermaier
dominik.obermaier@dc-square.de
@dobermai
Get in touch
HiveMQ -
Enterprise MQTT Broker
www.hivemq.com
@hivemq
