
Processing IoT Data with Apache Kafka


Apache Kafka is a distributed streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix and LinkedIn. In this talk, Matt gave a technical overview of Apache Kafka, discussed practical use cases of Kafka for IoT data and demonstrated how to ingest data from an MQTT server using Kafka Connect.



  1. Processing IoT Data with Apache Kafka (Matt Howlett, Confluent Inc.)
  2. From pub/sub messaging protocol (MQTT), to pub/sub messaging system rethought as a distributed commit log, to distributed streaming platform (Kafka):
     ● Pub/sub messaging
     ● Event storage
     ● Processing framework
  3. OBD-II Adapters
  4. Problem Statement. Let's build a system to:
     • Transport OBD-II data over unreliable links from cars to the data center
     • Handle millions of devices* (* also fewer)
     • Extract information from and respond to this data in (near) real time, at scale
     • Handle surges in usage
     • Allow for ad-hoc historical processing
     The architecture / technology / methods are applicable to many scenarios.
  5. MQTT Introduction. A publish/subscribe messaging protocol:
     • Built on top of TCP/IP
     • Features that make it well suited to poor-connectivity / high-latency scenarios
     • Lightweight: efficient client implementations, low network overhead
     • MQTT-SN for non-IP networks ('virtual connections')
     • Many (open source) broker implementations: Mosquitto, RabbitMQ, HiveMQ, VerneMQ
     • Many client libraries: C, C++, Java, C#, Python, JavaScript, websockets, Arduino ...
     • Widely used (incl. phone apps!): oil pipeline sensors via satellite link, Facebook Messenger, AWS IoT
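     To make the protocol concrete, here is a minimal publish sketch using the Eclipse Paho Java client. The broker URI, topic and payload are placeholders, not from the talk.

        import org.eclipse.paho.client.mqttv3.MqttClient;
        import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
        import org.eclipse.paho.client.mqttv3.MqttMessage;

        public class ObdPublisher {
            public static void main(String[] args) throws Exception {
                MqttClient client = new MqttClient("tcp://localhost:1883", MqttClient.generateClientId());
                MqttConnectOptions opts = new MqttConnectOptions();
                opts.setCleanSession(false);  // persistent session (matters for QoS 1 / QoS 2)
                client.connect(opts);

                MqttMessage msg = new MqttMessage("{\"speed\":57}".getBytes());
                msg.setQos(1);  // at-least-once delivery
                client.publish("car/1234/obd", msg);
                client.disconnect();
            }
        }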
  6. MQTT Features:
     • Simple API
     • Hierarchical topics, e.g. myhome/kitchen/door/front/battery/level
     • Wildcard subscriptions, e.g. myhome/+/door/+/battery/level (+ matches a single topic level)
     • 3 qualities of service (on both produce and consume): at most once (QoS 0), at least once (QoS 1), exactly once (QoS 2) [not universally supported]
     • Persistent consumer sessions (important for QoS 1, QoS 2)
     • Last will and testament
     • Last known good value (retained messages)
     • Authorization, SSL/TLS
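     A matching subscribe sketch, again with Paho and placeholder broker details, showing the single-level wildcard and a QoS 1 subscription:

        import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
        import org.eclipse.paho.client.mqttv3.MqttCallback;
        import org.eclipse.paho.client.mqttv3.MqttClient;
        import org.eclipse.paho.client.mqttv3.MqttMessage;

        public class BatterySubscriber {
            public static void main(String[] args) throws Exception {
                MqttClient client = new MqttClient("tcp://localhost:1883", MqttClient.generateClientId());
                client.setCallback(new MqttCallback() {
                    public void messageArrived(String topic, MqttMessage message) {
                        // The topic tells us which door's battery this reading belongs to.
                        System.out.println(topic + ": " + new String(message.getPayload()));
                    }
                    public void connectionLost(Throwable cause) { /* reconnect logic here */ }
                    public void deliveryComplete(IMqttDeliveryToken token) { }
                });
                client.connect();
                // '+' matches exactly one topic level; QoS 1 requests at-least-once delivery.
                client.subscribe("myhome/+/door/+/battery/level", 1);
            }
        }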
  7. OBD-II Data. Each record includes:
     • Device id
     • GPS location [lon, lat]
     • Ignition on / off
     • Speedometer reading
     • Timestamp
     • ...plus a lot more
     Assume: data sent via a 3G wireless connection at ~30 second intervals.
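     The consumer sketches later in the deck need a concrete shape for this record; the following class is an illustrative subset of the fields above (names and types are assumptions, not from the talk):

        // Illustrative record type for the OBD-II payload; field names are assumptions.
        public class ObdData {
            public String deviceId;
            public double lon;
            public double lat;
            public boolean ignitionOn;
            public double speedometer;   // e.g. km/h
            public long timestamp;       // epoch millis
        }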
  8. Ingest Architecture V1. Processors subscribe directly to a single MQTT server on topic [deviceid]/obd. Deficiencies:
     • A single MQTT server can handle maybe ~100K connections
     • Can't handle usage surges (no buffering)
     • No storage of events, no reprocessing capability
  9. Ingest Architecture V2. An MQTT server coordinator (reached over HTTP / REST) assigns each device to one of many MQTT servers, all using topic [deviceid]/obd:
     • Easily shardable
     • Treat the MQTT server as a commodity service
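     The talk doesn't show the coordinator's logic; one simple possibility is a deterministic hash of the device id, sketched below (the class and server list are hypothetical):

        import java.util.List;

        // Hypothetical coordinator logic: deterministically map a device to an MQTT server.
        public class MqttServerCoordinator {
            private final List<String> serverUris;  // e.g. ["tcp://mqtt1:1883", "tcp://mqtt2:1883", ...]

            public MqttServerCoordinator(List<String> serverUris) {
                this.serverUris = serverUris;
            }

            // Served over HTTP/REST to devices asking where to connect.
            public String serverFor(String deviceId) {
                return serverUris.get(Math.floorMod(deviceId.hashCode(), serverUris.size()));
            }
        }

     Naive modulo hashing reshuffles most devices when the server list changes; a production coordinator would likely use consistent hashing or an explicit assignment table.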
  10. OBD -> MQTT -> Kafka. The MQTT servers (behind the coordinator, topic [deviceid]/obd) feed a Kafka Connect cluster, which writes to the OBD_Data topic in Kafka for stream processing.
  11. Apache Kafka. Distributed streaming platform:
     • Pub/sub messaging (typically clients are within the data center)
     • Data store: messages are not deleted after delivery
     • Stream processing: low- or high-level libraries, data re-processing
  12. Apache Kafka adoption spans companies across industries.
  13. The log:
     ● Persisted
     ● Append only
     ● Immutable
     ● Earliest data deleted based on time / size / never
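     Retention is configured per topic. A minimal sketch using the Kafka AdminClient, with placeholder broker address, topic name, partition count and retention period:

        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.NewTopic;

        public class CreateObdTopic {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                try (AdminClient admin = AdminClient.create(props)) {
                    NewTopic topic = new NewTopic("OBD_Data", 12, (short) 3)  // 12 partitions, 3 replicas
                        .configs(Collections.singletonMap("retention.ms", "604800000"));  // keep 7 days
                    admin.createTopics(Collections.singleton(topic)).all().get();
                }
            }
        }

     Setting cleanup.policy=compact instead of a retention time gives the log-compacted behavior used for the changelog topics later in the deck.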
  14. Partitioning:
     • Allows topics to scale past the constraints of a single server
     • Message → partition_id is deterministic; the partition is relevant to the application
     • Ordering guarantees per partition, but not across partitions
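     To make the deterministic message → partition mapping concrete, a minimal Java producer sketch: keying each record by device id means the default partitioner (a hash of the key) sends all of a device's records to the same partition. Broker address and payload are placeholders; acks=all anticipates the replication slide that follows.

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.Producer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class ObdProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("acks", "all");  // wait for all in-sync replicas to confirm (next slide)
                props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

                try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                    String deviceId = "1234";
                    // Same key => same partition => per-device ordering is preserved.
                    producer.send(new ProducerRecord<>("OBD_Data", deviceId, "{\"speed\":57,\"ignition_on\":true}"));
                }
            }
        }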
  15. Apache Kafka Replication:
     • Cheap durability!
     • Choose how many replica acknowledgements are required before a produce is confirmed (the acks setting in the producer sketch above)
  16. Apache Kafka Consumer Groups. The partitions of a topic (possibly spread across different brokers) are divided among the consumers in a group.
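     A minimal consumer-group sketch: every consumer started with the same group.id shares the topic's partitions between them. Broker address, group name and the plain-string deserialization are placeholders.

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class ObdConsumer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "obd-processors");  // consumers sharing this id share the partitions
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("OBD_Data"));
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> r : records)
                            System.out.printf("device %s (partition %d): %s%n", r.key(), r.partition(), r.value());
                    }
                }
            }
        }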
  17. Kafka Connect:
     • Use the client library producers / consumers in custom applications
     • But you often want to bulk-transfer data between standard systems: don't re-invent the wheel, configure Kafka Connect
     • Narrow scope: move data into and out of Kafka
     • Off-the-shelf connectors
     • Fault tolerant; auto-balances load
     • Pluggable serialization
     • Standalone and distributed modes of operation
     • Configuration / management via REST API
  19. MQTT Connector (https://github.com/evokly/kafka-connect-mqtt): single task, single MQTT broker, source only. Either:
     • Start a bunch of these connectors (in one Connect cluster), one per MQTT server, or
     • Implement a new multi-task connector, one task per MQTT broker, that communicates with the MQTT server coordinator
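     For the first option, each connector instance is just a small config file (standalone mode) or REST payload. A sketch of its shape: the name, connector.class and tasks.max keys are standard Kafka Connect; the mqtt.* and kafka.* keys and all values are illustrative and should be checked against the project's README.

        # mqtt-source-1.properties (illustrative; verify keys against the connector's README)
        name=mqtt-source-1
        connector.class=com.evokly.kafka.connect.mqtt.MqttSourceConnector
        tasks.max=1
        mqtt.server_uris=tcp://mqtt-server-1:1883
        # single-level wildcard over [deviceid]/obd
        mqtt.topic=+/obd
        kafka.topic=OBD_Data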
  20. User Data. A User_Info table in a SQL database:
     • user_id
     • device_id
     • name
     • address
     • phone_number
     • speed_alert_level
     • ...
  21. Example: Car Towed Alert. Detect movement of a car while the ignition is off and send an SMS alert. Consumers read the OBD_Data partitions (spread across brokers), keep each device's last location in an in-memory KV store, look up User_Info, and call out to an SMS gateway.
  22. Consumer Implementation:
     on_message(message m) {
         var device_id = m.key;
         var obd_data = m.value;
         if (obd_data.ignition_on) return;
         if (!kv_store.contains(device_id)) {
             kv_store.add(device_id, obd_data.lon_lat);
             return;
         }
         var prev_lon_lat = kv_store.get(device_id);
         var dist = calc_dist(obd_data.lon_lat, prev_lon_lat);
         kv_store.set(device_id, obd_data.lon_lat);
         if (dist > alert_max_dist) {
             // infrequent
             send_alert(SQL.get_phone_number(device_id));
         }
     }
     • A message can be from any partition assigned to this consumer
     • Ordering is guaranteed per partition, but not predictable across partitions
     • All messages from a particular device are guaranteed to arrive at the same consumer instance
  23. Example: Speed Alert
     • Scenario: a parent wants to monitor their son's/daughter's driving and be alerted if they exceed a specified speed.
     • In the Tow Alert example, User_Info only needs to be queried in the event of an alert.
     • In this example, the table would need to be queried for every OBD data record in every (high-frequency) partition, and it can update at any time. Not scalable! Cache?
  24. A table can be represented as a stream of updates (log compaction!). Table columns: device_id, speed_limit.
     Time = 0: {device_id=1, speed_limit=60}  →  table: 1→60
     Time = 1: {device_id=2, speed_limit=80}  →  table: 1→60, 2→80
     Time = 2: {device_id=3, speed_limit=70}  →  table: 1→60, 2→80, 3→70
     Time = 3: {device_id=1, speed_limit=80}  →  table: 1→80, 2→80, 3→70
     Time = 4: {device_id=1, speed_limit=65}  →  table: 1→65, 2→80, 3→70
  25. Debezium. A Kafka Connect connector that turns database tables into streams of update records: the MySQL User_Info table (keyed by user_id) becomes a User_Info changelog topic in Kafka, partitioned by device_id.
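     A Debezium MySQL source is configured like any other connector. A minimal sketch, where the host, credentials and database/table names are placeholders (key names follow the Debezium MySQL connector docs of that era):

        name=user-info-connector
        connector.class=io.debezium.connector.mysql.MySqlConnector
        database.hostname=mysql
        database.port=3306
        database.user=debezium
        database.password=******
        database.server.id=184054
        database.server.name=carapp
        table.whitelist=carapp.User_Info
        database.history.kafka.bootstrap.servers=localhost:9092
        database.history.kafka.topic=schema-changes.carapp

     Re-keying the changelog by device_id, as the slide shows, would be an additional step, e.g. a single message transform or a small stream processor.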
  26. Stream / Table Join. The OBD_Data record stream and the User_Info changelog (compacted, produced by Debezium) are both keyed by device_id, so each consumer can materialize the relevant subset of User_Info for the partitions it owns (e.g. device_id 1 → speed_limit 80, device_id 3 → speed_limit 70).
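     Kafka's Streams API expresses this pattern directly: a KTable materializes the changelog and a KStream-KTable join enriches each OBD record with that device's user info. A minimal sketch, assuming string-serialized values and a placeholder enrich helper; both topics must be co-partitioned on device_id for the join to work.

        import java.util.Properties;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;
        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KTable;

        public class SpeedAlertJoin {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "speed-alert");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                StreamsBuilder builder = new StreamsBuilder();
                KTable<String, String> userInfo = builder.table("User_Info");  // compacted changelog, key: device_id
                KStream<String, String> obd = builder.stream("OBD_Data");      // record stream, key: device_id

                obd.join(userInfo, (obdRecord, user) -> enrich(obdRecord, user))  // per-key lookup in local state
                   .to("OBD_Data_Enriched");

                new KafkaStreams(builder.build(), props).start();
            }

            static String enrich(String obdRecord, String user) {
                return obdRecord + "|" + user;  // placeholder enrichment
            }
        }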
  27. Speed Alert: Message Handler
     on_message(message m) {
         var device_id = m.key;
         var obd_data = m.value;
         var user_info = user_info_local.get(device_id);
         if (obd_data.speedometer > user_info.max_speed) {
             alert_user(device_id, user_info);
         }
     }
  28. MQTT Phone Client Connectivity. Alerts flow back out the same way data came in: a consumer publishes to [deviceid]/alert via the MQTT server coordinator and MQTT servers, alongside the inbound [deviceid]/obd traffic.
  29. Speed Limit Alert: Rate Limiting
     • Prefer to rate limit on the server to minimize network overhead
     • Create a new Kafka topic app_state, partitioned on device_id
     • When an alert is triggered, store the alert time in this topic
     • (This topic can also be used as a general store for other per-device state)
     • Materialize this changelog stream on consumers as necessary (sketch below)
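     A minimal sketch of that idea; the class, the local map (materialized from the app_state topic), the interval and the alert helper are all illustrative:

        import java.util.HashMap;
        import java.util.Map;
        import org.apache.kafka.clients.producer.Producer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class RateLimitedAlerter {
            static final long MIN_ALERT_INTERVAL_MS = 15 * 60 * 1000;  // illustrative: one alert per 15 min

            final Map<String, Long> lastAlertAt = new HashMap<>();  // materialized from app_state
            final Producer<String, String> producer;

            RateLimitedAlerter(Producer<String, String> producer) { this.producer = producer; }

            void maybeAlert(String deviceId) {
                long now = System.currentTimeMillis();
                Long last = lastAlertAt.get(deviceId);
                if (last != null && now - last < MIN_ALERT_INTERVAL_MS) return;  // rate limited

                sendAlert(deviceId);                // e.g. publish to [deviceid]/alert over MQTT
                lastAlertAt.put(deviceId, now);     // update the local view...
                producer.send(new ProducerRecord<>("app_state", deviceId, Long.toString(now)));  // ...and the durable state
            }

            void sendAlert(String deviceId) { /* placeholder */ }
        }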
  30. As before, each consumer reads its OBD_Data partitions and materializes the relevant subsets of both the User_Info changelog (compacted) and the App_State topic (compacted), all partitioned by device_id.
  31. Example: Location Based Special Offers. When a car enters a specific region, send available special offers to the user's phone. Required:
     • User_Info: address, so we know whether they are local to their current location or not
     • App_State: used to persist offers already sent
     • Special_Offer_Info: table that stores the list of all special offers
  32. Regions. [Diagram: a map divided into a numbered grid of regions]
     • Regions may be simple (a grid, as depicted here) or complex
     • F(lon, lat) → locationId
     • Note: ride-share surge pricing could also be implemented using similar partitioning
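     For simple rectangular regions, F(lon, lat) → locationId is just a few lines; a sketch assuming a fixed cell size (the constants are arbitrary choices for illustration):

        public class Regions {
            static final double CELL_DEGREES = 0.5;              // illustrative grid cell size
            static final int COLS = (int) (360 / CELL_DEGREES);  // cells per row of the grid

            // Deterministically map a coordinate to a grid cell id.
            static int locationId(double lon, double lat) {
                int col = (int) Math.floor((lon + 180.0) / CELL_DEGREES);
                int row = (int) Math.floor((lat + 90.0) / CELL_DEGREES);
                return row * COLS + col;
            }
        }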
  33. Special Offer Changelog Stream. As with User_Info, Debezium turns the MySQL Special_Offer_Info table into a compacted Special_Offers changelog topic, this time partitioned by location_id.
  34. Multi-stage Data Pipeline. Consume OBD_Data (key: device_id, value: OBD record) and enrich it twice: first with the user's address from User_Info, then with offers_sent from App_State (both keyed by device_id).
  35. Multi-stage Data Pipeline (continued). Repartition the enriched records (key: device_id, value: OBD record + address + offers_sent) by location_id into a new OBD_Data_By_Location topic. Data from a given device will still all be on the same partition (except when the region changes).
  36. Multi-stage Data Pipeline (continued). After the repartition, join against Special_Offers (also keyed by location_id) to enrich each record with the available_offers for its region.
  37. Multi-stage Data Pipeline (continued). Finally, filter: is a special offer available in this location? has it not already been sent? is the user's address near this location? Surviving records are published to [deviceId]/alert via the MQTT servers. (A sketch of the whole topology follows.)
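     Pulling slides 34-37 together, here is how the pipeline might look in the Kafka Streams DSL. This is a sketch under heavy assumptions: values are plain strings and the enrichment/filter helpers are trivial placeholders standing in for real JSON handling.

        import java.util.Properties;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;
        import org.apache.kafka.streams.kstream.KStream;
        import org.apache.kafka.streams.kstream.KTable;

        public class SpecialOffersPipeline {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "special-offers");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                StreamsBuilder builder = new StreamsBuilder();
                KTable<String, String> users  = builder.table("User_Info");       // key: device_id
                KTable<String, String> state  = builder.table("App_State");       // key: device_id
                KTable<String, String> offers = builder.table("Special_Offers");  // key: location_id

                KStream<String, String> obd = builder.stream("OBD_Data");         // key: device_id
                obd.join(users, SpecialOffersPipeline::addAddress)                // slide 34: enrich
                   .join(state, SpecialOffersPipeline::addOffersSent)
                   .selectKey((deviceId, v) -> locationIdOf(v))                   // slide 35: re-key...
                   .through("OBD_Data_By_Location")                               // ...and repartition
                   .join(offers, SpecialOffersPipeline::addAvailableOffers)       // slide 36: enrich
                   .filter((loc, v) -> hasUnsentLocalOffer(v))                    // slide 37: filters
                   .to("Offer_Alerts");  // a consumer forwards these to [deviceId]/alert over MQTT

                new KafkaStreams(builder.build(), props).start();
            }

            // Placeholder helpers; real ones would parse and emit structured records.
            static String addAddress(String obd, String user) { return obd + "|" + user; }
            static String addOffersSent(String v, String s) { return v + "|" + s; }
            static String addAvailableOffers(String v, String o) { return v + "|" + o; }
            static String locationIdOf(String v) { return "1"; /* F(lon, lat) from slide 32 */ }
            static boolean hasUnsentLocalOffer(String v) { return true; }
        }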
  40. Kafka Summit New York: May 8. Kafka Summit San Francisco: August 28. Use the Apache Kafka community discount code kafcom17 to get $50 off at www.kafka-summit.org. Presented by Confluent.
  41. Thank You! @matt_howlett @confluentinc
