The document discusses ingesting and processing IoT data using Kafka, MQTT, Kafka Connect, and KSQL. It begins with an introduction and overview of reference architectures. It then demonstrates streaming IoT logistics data from devices to Kafka using MQTT, the MQTT Connector, and MQTT Proxy. It shows how to analyze streaming data with KSQL, including creating streams and tables, running queries, and creating new streams with SELECT statements. The goal is to provide a complete solution for ingesting, routing, and analyzing IoT data in real-time and at scale.
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
1. Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Guido Schmutz
Kafka Summit 2018 – 16.10.2018
@gschmutz guidoschmutz.wordpress.com
2. Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer and Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
3. Agenda
1. Introduction
2. IoT Logistics use case – Kafka Ecosystem "in Action"
3. Stream Data Integration – IoT Device to Kafka over MQTT
4. Stream Analytics with KSQL
5. Summary
5. Reference Architecture for Data Analytics Solutions
[Diagram: bulk sources (DB, file, extract) and event sources (IoT data, mobile apps, social, telemetry) feed an event hub; data flows into a Big Data / Hadoop cluster with raw and refined storage and parallel processing; results are served through SQL and search services, stream analytics (stream processor with state, edge node with rules), microservices with state and APIs, BI tools and an enterprise data warehouse via file/SQL import, change data capture and SQL export]
6. Reference Architecture for Data Analytics Solutions
[Same diagram as slide 5]
7. Two Types of Stream Processing (from Gartner)

Stream Data Integration
• Primarily covers streaming ETL
• Integration of data sources and data sinks
• Filter and transform data
• (Enrich data)
• Route data

Stream Analytics
• Covers analytics use cases
• Calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events => used to be CEP)
• Complex events may signify threats or opportunities that require a response
8. Stream Integration and Stream Analytics with Kafka
[Diagram: source connector → Kafka broker (trucking_driver topic) → sink connector, with a stream-processing application reading from and writing back to the broker]

9. Stream Data Integration and Stream Analytics with Kafka
[Same diagram as slide 8]
10. Unified Architecture for Modern Data Analytics Solutions
[Diagram: the same reference architecture as slide 5, unifying bulk and event sources around the event hub, with the Big Data / Hadoop cluster, stream analytics, microservices, SQL/search services, BI tools and the enterprise data warehouse]
11. Various IoT Data Protocols
• MQTT (Message Queue Telemetry Transport)
• CoAP (Constrained Application Protocol)
• AMQP
• DDS (Data Distribution Service)
• STOMP
• REST
• WebSockets
• …
13. Demo - IoT Logistics Use Case
Trucks are sending driving info and geo-position data in one single message (Position & Driving Info)
Testdata-Generator originally by Hortonworks

{
  "timestamp":1537343400827,
  "truckId":87,
  "driverId":13,
  "routeId":987179512,
  "eventType":"Normal",
  "latitude":38.65,
  "longitude":-90.21,
  "correlationId":"-3208700263746910537"
}
16. (I) IoT Device sends data via MQTT

Message Queue Telemetry Transport (MQTT)
• Pub/Sub architecture with Message Broker
• Built-in retry / QoS mechanism
• Last Will and Testament (LWT)
• Not all MQTT brokers are scalable / highly available
• Does not provide state (history)

[Diagram: truck publishes the Position & Driving Info JSON message to the MQTT topic truck/nn/position]
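A device-side publish can be simulated with the Mosquitto command-line client; a sketch assuming a local broker on the default port (host, topic and payload are illustrative):

mosquitto_pub -h localhost -p 1883 -t truck/87/position \
  -m '{"timestamp":1537343400827,"truckId":87,"eventType":"Normal","latitude":38.65,"longitude":-90.21}'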
18. IoT Device sends data via MQTT – how to get the data into Kafka?
[Diagram: the Position & Driving Info JSON messages on MQTT topic truck/nn/position need to reach the Kafka topic truck position, but how?]
19. 2 Ways for MQTT with Confluent Streaming Platform

Confluent MQTT Connector (Preview)
• Pull-based
• Integrates with (existing) MQTT servers
• Can be used both as a Source and a Sink
• Output is an envelope with all of the properties of the incoming message
  • Value: body of the MQTT message
  • Key: the MQTT topic the message was written to
• Can consume multiple MQTT topics and write to one single Kafka topic
• RegexRouter SMT can be used to change topic names (see the sketch below)

Confluent MQTT Proxy
• Push-based
• Enables MQTT clients to use the MQTT protocol to publish data directly to Kafka
• MQTT Proxy is stateless and independent of other instances
• Simple mapping scheme of MQTT topics to Kafka topics based on regular expressions
• Reduced lag in message publishing compared to traditional MQTT brokers
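A sketch of the RegexRouter SMT mentioned above (RegexRouter and its regex/replacement settings are standard Kafka Connect; the pattern and replacement values are illustrative):

transforms=renameTopic
transforms.renameTopic.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.renameTopic.regex=truck/([0-9]+)/position
transforms.renameTopic.replacement=truck_position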
20. (II) MQTT to Kafka using Confluent MQTT Connector
[Diagram: Position & Driving Info JSON on MQTT topic truck/nn/position → mqtt-to-kafka connector → Kafka topic truck_position → kafkacat, as verified below]
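The resulting Kafka topic can be inspected with kafkacat as a simple console consumer; a sketch assuming a broker on localhost:9092:

kafkacat -C -b localhost:9092 -t truck_position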
21. Confluent MQTT Connector
Currently available as a Preview on Confluent Hub
Set up plugin.path to specify the additional plugin folder
confluent-hub install confluentinc/kafka-connect-mqtt:1.0.0-preview
plugin.path=/usr/share/java,/etc/kafka-connect/custom-plugins,
/usr/share/confluent-hub-components
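A minimal source-connector configuration might then look like the following sketch (the mqtt.* property names follow the preview connector's documentation; broker URI and topic names are illustrative):

name=mqtt-to-kafka
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
mqtt.server.uri=tcp://mosquitto:1883
mqtt.topics=truck/+/position
kafka.topic=truck_position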
25. MQTT Connector vs. MQTT Proxy

MQTT Connector
• Pull-based
• Use existing MQTT infrastructures
• Bi-directional

MQTT Proxy
• Push-based
• Does not provide all MQTT functionality
• Only uni-directional

[Diagrams: deployment variants in which trucks in REGION-1 DC and REGION-2 DC publish Position messages to truck/nn/position and truck/nn/driving info, forwarded via mqtt-to-kafka into the truck position and truck driving info topics, either within each region or centrally in the Headquarter DC]
26. (IV) MQTT to Kafka using StreamSets Data Collector
[Diagram: Position & Driving Info JSON on MQTT topic truck/nn/position → mqtt-to-kafka pipeline → Kafka topic truck_position → console consumer]
28. Wait … there is more …
[Diagram: via MQTT Proxy and mqtt-to-kafka, the Position & Driving Info messages from truck/nn/position land in the Kafka topics truck_driving_info and truck_position, each read by a console consumer. But what about some analytics?]
31. KSQL - Terminology

Enables stream processing with zero coding required. The simplest way to process streams of data in real-time.

Stream
• "History"
• An unbounded sequence of structured data ("facts")
• Facts in a stream are immutable: new facts can be inserted to a stream, existing facts can never be updated or deleted
• Streams can be created from a Kafka topic or derived from an existing stream

Table
• "State"
• A view of a stream, or another table, and represents a collection of evolving facts
• Facts in a table are mutable: new facts can be inserted to the table, existing facts can be updated or deleted
• Tables can be created from a Kafka topic or derived from existing streams and tables
32. (V) Create STREAM on truck_position and use it in KSQL CLI
[Diagram: Position & Driving Info JSON on truck/nn/position → mqtt-to-kafka → truck-position topic → KSQL Stream, queried from the KSQL CLI]
33. Create a STREAM on truck_driving_info
ksql> CREATE STREAM truck_driving_info_s
(ts VARCHAR,
truckId VARCHAR,
driverId BIGINT,
routeId BIGINT,
eventType VARCHAR,
latitude DOUBLE,
longitude DOUBLE,
correlationId VARCHAR)
WITH (kafka_topic='truck_driving_info',
value_format='JSON');
Message
----------------
Stream created
34. Describe the STREAM on truck_driving_info
ksql> describe truck_driving_info_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
LATITUDE | DOUBLE
LONGITUDE | DOUBLE
CORRELATIONID | VARCHAR(STRING)
35. KSQL - SELECT

Selects rows from a KSQL stream or table. The result of this statement will not be persisted in a Kafka topic and will only be printed out in the console. from_item is one of the following: stream_name, table_name.

SELECT select_expr [, ...]
  FROM from_item
  [ LEFT JOIN join_table ON join_criteria ]
  [ WINDOW window_expression ]
  [ WHERE condition ]
  [ GROUP BY grouping_expression ]
  [ HAVING having_expression ]
  [ LIMIT count ];
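For example, a transient query against the stream created on slide 33 (a sketch; the result is only printed to the CLI):

ksql> SELECT truckId, eventType, latitude, longitude
  FROM truck_driving_info_s
  WHERE eventType != 'Normal'
  LIMIT 5;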
37. (VI) – CREATE AS … SELECT …
[Diagram: truck-position Stream → detect_dangerous_driving KSQL query → Dangerous-driving Stream]
38. CREATE STREAM … AS SELECT …

Create a new KSQL stream along with the corresponding Kafka topic and stream the result of the SELECT query into the topic. The WINDOW clause can only be used if the from_item is a stream.

CREATE STREAM stream_name
  [WITH ( property_name = expression [, ...] )]
  AS SELECT select_expr [, ...]
  FROM from_stream
  [ LEFT | FULL | INNER ] JOIN [join_table | join_stream]
    [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria
  [ WHERE condition ]
  [ PARTITION BY column_name ];
39. INSERT INTO … SELECT …

Stream the result of the SELECT query into an existing stream and its underlying topic. The schema and partitioning column produced by the query must match the stream's schema and key. If the schema and partitioning column are incompatible with the stream, then the statement will return an error. stream_name and from_stream must both refer to a Stream; Tables are not supported!

CREATE STREAM stream_name ...;

INSERT INTO stream_name
  SELECT select_expr [, ...]
  FROM from_stream
  [ WHERE condition ]
  [ PARTITION BY column_name ];
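A sketch, assuming dangerous_driving_s already exists with the same schema as the source stream (stream names as used elsewhere in this deck):

ksql> INSERT INTO dangerous_driving_s
  SELECT * FROM truck_driving_info_s
  WHERE eventType != 'Normal';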
40. CREATE AS … SELECT …

ksql> CREATE STREAM dangerous_driving_s
  WITH (kafka_topic='dangerous_driving_s',
        value_format='JSON')
  AS SELECT * FROM truck_position_s
  WHERE eventtype != 'Normal';

Message
----------------------------
Stream created and running

ksql> select * from dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 | Lane Departure | 35.1 | -90.07 | -6187001306629414077
41. Windowing

Streams are unbounded, so we need meaningful time frames to do computations (i.e. aggregations). Computations over events are done using windows of data. Windows are tracked per unique key.

Window types: Fixed Window, Sliding Window, Session Window
[Diagram: a window of data moving over a stream of data along the time axis]
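In KSQL these correspond to the TUMBLING (fixed), HOPPING (sliding) and SESSION window expressions; a sketch with illustrative sizes:

WINDOW TUMBLING (SIZE 30 SECONDS) -- fixed window
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) -- sliding window
WINDOW SESSION (60 SECONDS) -- session window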
42. (VII) Aggregate and Window
[Diagram: truck-position Stream → detect_dangerous_driving → Dangerous-driving Stream → count_by_eventType → Dangerous-driving-count Table]
43. SELECT COUNT … GROUP BY
ksql> CREATE TABLE dangerous_driving_count AS
SELECT eventType, count(*) nof
FROM dangerous_driving_s
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY eventType;
Message
----------------------------
Table created and running
ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'),
  eventType, nof
FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:38.926 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
44. Joining

• Stream to Static (Table) Join
• Stream to Stream Join (one window join)
• Stream to Stream Join (two window join)

[Diagram: Stream-to-Static Join, and Stream-to-Stream Joins with one and two windows, along the time axis]
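Slide 48 shows the stream-to-table case; a stream-to-stream join in KSQL additionally takes a WITHIN clause bounding the join window. A sketch with illustrative stream names and a 10-second window:

ksql> CREATE STREAM position_and_driving_s AS
  SELECT p.truckId, p.latitude, p.longitude, d.eventType
  FROM truck_position_s p
  INNER JOIN truck_driving_info_s d
    WITHIN 10 SECONDS
  ON p.truckId = d.truckId;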
45. (VIII) – Join Table to enrich with Driver data
[Diagram: a jdbc-to-kafka connector streams the Truck Driver table (e.g. 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00, arriving as {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}) into the truck-driver topic backing a KSQL Table; the Dangerous-driving Stream is joined with it to produce the Dangerous-driving & driver Stream, alongside the existing count_by_eventType Table]
47. Create Table with Driver State
ksql> CREATE TABLE driver_t
(id BIGINT,
first_name VARCHAR,
last_name VARCHAR,
available VARCHAR)
WITH (kafka_topic='truck_driver',
value_format='JSON',
key='id');
Message
----------------
Table created
48. Join the Stream with the Driver Table

ksql> CREATE STREAM dangerous_driving_and_driver_s
  WITH (kafka_topic='dangerous_driving_and_driver_s',
        value_format='JSON', partitions=8)
  AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype, latitude, longitude
  FROM truck_position_s
  LEFT JOIN driver_t
  ON truck_position_s.driverId = driver_t.id;

Message
----------------------------
Stream created and running

ksql> select * from dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe following distance | 39.0 | -93.65
49. (IX) – Custom UDF for calculating Geohash
[Diagram: the pipeline from slide 45, extended with a dangerous_driving_by_geo KSQL query producing the dangerous-driving-geohash Stream]
50. Custom UDF for calculating Geohashes

Geohash is a geocoding scheme which encodes a geographic location into a short string of letters and digits: a hierarchical spatial data structure which subdivides space into buckets of grid shape.

Length | Area (width x height)
1 | 5,009.4km x 4,992.6km
2 | 1,252.3km x 624.1km
3 | 156.5km x 156km
4 | 39.1km x 19.5km
5 | 4.9km x 4.9km
12 | 3.7cm x 1.9cm

ksql> SELECT latitude, longitude,
  geohash(latitude, longitude, 4)
FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8
...

http://geohash.gofreerange.com/
51. Add a UDF - sample

Geohash UDF, usable e.g. to join dangerous-driving events to location-relevant messages for drivers.

// KSQL annotation-based UDF (KSQL 5.x); GeoHash is the encoder from the
// com.github.davidmoten geo library
import com.github.davidmoten.geo.GeoHash;
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
      final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
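Deployment note (assuming KSQL 5.x): the compiled UDF JAR is placed in the folder configured via ksql.extension.dir on the KSQL server, which loads it at start-up. The path below is illustrative:

ksql.extension.dir=/etc/ksql/ext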
53. Summary

Two ways to bring in MQTT data => MQTT Connector or MQTT Proxy
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge
• Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (there is the "S" in KSQL)

There is more:
• Stream to Stream Join
• REST API for executing KSQL
• Avro Format & Schema Registry
• Using Kafka Connect to write results to data stores
• …
54. Choosing the Right API (Flexibility → Simplicity)

Consumer, Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Streams
• Fluent Java API
• mapValues(), filter(), flush()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM … JOIN … WHERE … GROUP BY
• Stream Analytics

Kafka Connect
• Declarative
• Configuration, REST API
• Out-of-the-box connectors
• Stream Integration

Source: adapted from Confluent
55. Technology on its own won't help you.
You need to know how to use it properly.