IoT meets Big Data

รัฐศิลป์ รานอกภานุวัชร์, D.ENG

Keywords
• Big Data
• Internet of Things
• Streaming data processing
• IoT Big Data analytics
• Advanced machine learning

Big Data technology
Credit: https://www.xenonstack.com/blog/big-data-engineering/ingestion-processing-big-data-iot-stream/

Internet of Things (IoT)
Credit: https://orzota.com/industrial-iot/
Figure: IoT building blocks (Sensors & Actuators, Things, Software and platform, Visualization)

IoT data characteristics
IoT data:
• Large-scale streaming data
• Heterogeneity
• Time and space correlation
• High-noise data
• IoT applications support high-speed data streams and require real-time or near-real-time actions.
Analytics requirement: fast computing and advanced machine learning techniques are required for IoT streaming data processing and IoT big data analytics.
Reference: M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big data: related technologies, challenges and future prospects. Springer, 2014

Things are Producing Streaming Data

“6V” for IoT Big Data
• Volume: size of data
• Velocity: speed at which data is generated
• Variety: different types of data
• Veracity: data accuracy
• Variability: dynamic behavior in data sources due to changing data-flow rates
• Value: useful data

New class of analytics: “Fast and streaming data analytics”
• IoT data (‘6V’)
• Streaming processing
• Advanced machine learning
• Fast distributed computing

IoT Big Data Architecture
Figure: pipeline stages (data ingestion, filtering, analytics)
Source: https://mapr.com/blog/ml-iot-connected-medical-devices/

How to design a Streaming Analytics System?
It usually starts out very simple: just one data pipeline.

New Event Stream sources are added…

New processors become interested in the events…

… and the solution becomes the problem

Decouple event streams from consumers through a shared data pipeline

Apache Kafka
A distributed streaming platform

Messaging Systems: Publish/Subscribe
• Producers call publish(topic, msg) to send each msg to a topic (Topic 1, Topic 2, Topic 3) in the publish/subscribe system.
• Consumers subscribe to the topics they need and receive each msg published there.
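To make the publish/subscribe idea concrete, here is a minimal in-memory sketch. The PubSub class and its methods are hypothetical illustrations, not Kafka's API: a real broker also persists messages, lets each consumer read at its own pace, and runs across many machines. What the sketch does show is the decoupling: the producer only names a topic and never needs to know who consumes it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # (topic, msg) -> None


class PubSub:
    """Toy in-memory publish/subscribe broker (illustration only, not Kafka)."""

    def __init__(self) -> None:
        # topic name -> callbacks of every subscriber to that topic
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, msg: str) -> int:
        """Deliver msg to every subscriber of topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, msg)
        return len(handlers)


if __name__ == "__main__":
    bus = PubSub()
    received = []
    # Two independent consumers of the same topic, plus one of another topic
    bus.subscribe("topic1", lambda t, m: received.append(("dashboard", m)))
    bus.subscribe("topic1", lambda t, m: received.append(("storage", m)))
    bus.subscribe("topic3", lambda t, m: received.append(("alerts", m)))
    bus.publish("topic1", "temp=21.5")
    bus.publish("topic2", "nobody listens")  # no subscribers: not delivered
    print(received)
```

Adding a new processor is one more subscribe call; neither the producer nor the existing consumers change.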

Before: How can this variety of data be integrated and made available to all products?
▪ LinkedIn grew to have dozens of data systems and data repositories.
▪ LinkedIn described its point-to-point data pipelines like this:
Source: The first presentation for Kafka Meetup @ LinkedIn (Bangalore), held on 2015/12/5

After
▪ Kafka was created to serve as a centralized online data-pipelining system that is:
  ◦ Elastically scalable
  ◦ Durable
  ◦ High-throughput
  ◦ Fast

Why it matters
▪ Over 1,300,000,000,000 (1.3 trillion) messages are transported via Kafka every day at LinkedIn
▪ 300 terabytes of inbound and 900 terabytes of outbound traffic
▪ 4.5 million messages per second on a single cluster
▪ Kafka runs on around 1,300 servers at LinkedIn
Use cases: newsfeed, recommendation, metrics and monitoring

A few important characteristics
Fast
◦ Kafka can handle hundreds of megabytes of reads and writes per second from a large number of clients.
◦ It is designed for real-time activity streaming.
Distributed and highly scalable
◦ Kafka's cluster-centric design offers strong durability and fault-tolerance guarantees.
◦ Messages are partitioned and spread over a cluster of machines.
Durable
◦ Messages are persisted to disk and replicated within the cluster to prevent data loss.
◦ Each broker can handle terabytes of messages without performance impact.
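Partitioning by key is what lets one topic be spread over many machines while related messages stay together. The sketch below is a simplified model, assuming every record carries a key (for example a sensor ID) and the partition is chosen as hash(key) mod the number of partitions. Kafka's default partitioner hashes keyed records with murmur2; zlib.crc32 is swapped in here only because it ships with Python, so actual partition numbers will differ. The names partition_for and distribute are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Choose a partition as hash(key) mod num_partitions.

    crc32 stands in for Kafka's murmur2 hash; the idea, not the numbers, carries over.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Append every record to the log of its partition, preserving send order."""
    logs: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return dict(logs)


if __name__ == "__main__":
    readings = [("sensor-a", "21.0 C"), ("sensor-b", "55 %"),
                ("sensor-a", "21.4 C"), ("sensor-c", "300 lux"),
                ("sensor-b", "56 %")]
    for partition, log in sorted(distribute(readings, 3).items()):
        # all readings of one sensor land in the same partition, in send order
        print(partition, log)
```

Because the same key always maps to the same partition, per-key ordering survives even though the topic as a whole is split across brokers.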

Kafka architecture: Broker, Topics, Producers, and Consumers
A Kafka cluster is made up of multiple Kafka brokers.

Kafka ZooKeeper Coordination
Figure: producers and consumers connect to a cluster of brokers, which is coordinated by ZooKeeper (ZK)

Apache Kafka - Architecture
Figure: producers write to, and consumers read from, the Kafka cluster

Kafka Single Node Example
Download the latest version from https://kafka.apache.org/downloads

Run ZooKeeper
Wait about 30 seconds for ZooKeeper to start up.

Run Kafka Server (Broker)
Wait about 30 seconds for Kafka to start up.

Create Kafka Topic
• We create a topic called my-topic with a replication factor of 1, since we have only one server.
• We use 13 partitions for my-topic, which means up to 13 consumers in the same consumer group can read it in parallel.
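The link between partition count and consumer count can be sketched as follows. This is a simplified round-robin assignment, assuming all consumers belong to one consumer group; Kafka's built-in assignors (range, round-robin, sticky) differ in detail, but they share the rule modeled here: each partition is read by exactly one consumer of the group. The function name assign_partitions is hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer of the group, round-robin.

    Consumers beyond num_partitions receive no partition and stay idle.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    for n in (1, 4, 13, 15):
        group = [f"consumer-{i}" for i in range(n)]
        busy = sum(1 for parts in assign_partitions(13, group).values() if parts)
        print(f"{n:2d} consumers: {busy} active, {n - busy} idle")
```

With 13 partitions, a 14th or 15th consumer in the same group adds no parallelism; it only waits in reserve.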

Run Kafka Producer
• Notice that we specify the Kafka node, which is running at localhost:9092.
• Next, run start-producer-console.sh and send at least four messages.

Run Kafka Consumer
Notice that we specify the Kafka node running at localhost:9092, as we did before, but we also ask to read all of the messages in my-topic from the beginning (--from-beginning).
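The terminal screenshots for these steps did not survive extraction, and the walkthrough uses wrapper scripts such as start-producer-console.sh. As a hedged reference only, the sketch below assembles equivalent commands for Kafka's stock scripts as Python argument lists (ready for subprocess.run). It assumes a ZooKeeper-based Kafka release of 2.5 or later, where kafka-topics.sh and the console producer accept --bootstrap-server; older releases used --zookeeper localhost:2181 and --broker-list, and recent releases can run without ZooKeeper (KRaft mode). The helper single_node_commands is hypothetical.

```python
import shlex
from typing import List

BOOTSTRAP = "localhost:9092"  # the single broker used in this example
TOPIC = "my-topic"


def single_node_commands(bootstrap: str = BOOTSTRAP, topic: str = TOPIC,
                         partitions: int = 13) -> List[List[str]]:
    """Argument lists for the stock Kafka scripts, run from the unpacked Kafka directory."""
    return [
        # 1. ZooKeeper (ZooKeeper-based releases only); keeps running
        ["bin/zookeeper-server-start.sh", "config/zookeeper.properties"],
        # 2. One Kafka broker; keeps running
        ["bin/kafka-server-start.sh", "config/server.properties"],
        # 3. Topic with replication factor 1 (one server) and 13 partitions
        ["bin/kafka-topics.sh", "--create", "--bootstrap-server", bootstrap,
         "--replication-factor", "1", "--partitions", str(partitions),
         "--topic", topic],
        # 4. Console producer: each line typed becomes one message
        ["bin/kafka-console-producer.sh", "--bootstrap-server", bootstrap,
         "--topic", topic],
        # 5. Console consumer that reads the topic from the earliest offset
        ["bin/kafka-console-consumer.sh", "--bootstrap-server", bootstrap,
         "--topic", topic, "--from-beginning"],
    ]


if __name__ == "__main__":
    # Only prints the commands; run each long-running one in its own terminal
    for argv in single_node_commands():
        print(shlex.join(argv))
```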

Running Kafka Producer and Consumer
• Notice that the messages do not arrive in order.
• This is because we have only one consumer, so it reads the messages from all 13 partitions.
• Order is guaranteed only within a partition.
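The ordering effect can be reproduced with a toy model. It assumes keyless messages are spread round-robin over partitions and that the single consumer drains the partitions in a fixed fetch order; real clients may batch several messages into one partition (newer producers use a sticky partitioner) and interleave fetches, but the outcome is the same: the global send order is lost while each partition's order is kept. Both function names are hypothetical.

```python
from typing import Dict, List


def produce(messages: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Spread keyless messages over partitions round-robin (simplified)."""
    logs: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for i, msg in enumerate(messages):
        logs[i % num_partitions].append(msg)
    return logs


def consume(logs: Dict[int, List[str]], fetch_order: List[int]) -> List[str]:
    """One consumer reading every partition, one partition at a time.

    Inside a partition, offsets are read in order; across partitions the
    result depends on which partition is fetched first.
    """
    received: List[str] = []
    for p in fetch_order:
        received.extend(logs[p])
    return received


if __name__ == "__main__":
    sent = [f"msg-{i}" for i in range(6)]
    logs = produce(sent, num_partitions=3)
    print(consume(logs, fetch_order=[2, 0, 1]))
    # ['msg-2', 'msg-5', 'msg-0', 'msg-3', 'msg-1', 'msg-4']
```

msg-0 still precedes msg-3 (same partition), but msg-2 now arrives before msg-0. When order matters per device, the usual fix is to key messages so they share a partition.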

IoT Big Data Streaming processing patterns
Figure: events flow through the streaming pipeline to real-time applications, long-term storage, and real-time dashboards
Source: Streaming Big Data on Azure with HDInsight Kafka, Storm and Spark by Raghav Mohan, Program Manager, Azure HDInsight

Example
Source: https://www.scnsoft.com/blog/salesforce-iot-cloud-benefits-and-limitations

Source: https://cybrml.com/2017/01/23/ml-in-cs-4-machine-learning-technical-review/

Machine Learning in IoT Applications
Source: https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d

Dataset
Reference: Deep Learning for IoT Big Data and Streaming Analytics: A Survey

Disadvantages of Pure Cloud Service Model
o Unpredictable response time from the cloud server to endpoints
o Unreliable cloud connections can bring down the service
o Excessive data can overburden the infrastructure
o Privacy issues when sensitive customer data are stored in the cloud
o Difficulty scaling to an ever-increasing number of sensors and actuators

Fog computing for IoT
• Fog computing brings computing and analytics closer to the end users and devices, removing unnecessary and prohibitive communication delays and saving on transmission costs.
• It can receive, process, and react to incoming data in real time.
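A minimal sketch of why processing at the fog layer saves transmission, assuming a fog node that sees every raw reading but forwards to the cloud only window summaries plus immediate alerts. The function name fog_filter, the window size, and the alert threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Union

Message = Dict[str, Union[str, float]]


def fog_filter(readings: Iterable[float], window: int, alert_above: float) -> List[Message]:
    """Process raw sensor readings at the fog node; return only what goes to the cloud.

    - every reading above `alert_above` is forwarded at once as an alert;
    - every full window of `window` readings is forwarded as one summary.
    A trailing partial window is held back (not sent) in this sketch.
    """
    outbound: List[Message] = []
    buffer: List[float] = []
    for value in readings:
        if value > alert_above:
            outbound.append({"type": "alert", "value": value})  # react locally, in real time
        buffer.append(value)
        if len(buffer) == window:
            outbound.append({"type": "summary", "min": min(buffer),
                             "max": max(buffer), "mean": mean(buffer)})
            buffer.clear()
    return outbound


if __name__ == "__main__":
    raw = [25.0] * 59 + [35.0]  # e.g. one minute of 1 Hz temperature readings
    sent = fog_filter(raw, window=10, alert_above=30.0)
    print(f"{len(raw)} readings in, {len(sent)} messages to the cloud")
```

Here 60 raw readings shrink to 6 summaries and 1 alert; the alert is raised at the fog node without waiting for a cloud round trip.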

Example: Fog computing + Kafka
https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_and_Apache_Spark.html

Case study #1
Reference: https://mapr.com/blog/ml-iot-connected-medical-devices/

Streaming machine-learning application to detect anomalies in data from a heart monitor
◦ Cheaper sensors that can monitor vital signs, combined with machine learning, are making it possible for doctors to rapidly apply smart medicine to their patients' cases.
Figure: electrocardiogram (ECG)

Building the Model with Clustering
Heartbeat activity: normal EKG pattern
We use this repeating pattern to train a model on previous heartbeat activity and then compare subsequent observations to this model in order to evaluate anomalous behavior.
To build a model of typical heartbeat activity, we process an EKG (based on a specific patient or a group of many patients), break it into overlapping pieces that are about 1/3 s long, and then apply a clustering algorithm (k-means) to group similar shapes.
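The model-building step can be sketched on a single machine in plain Python. Assumptions not taken from the source: a 200 Hz sampling rate (so 1/3 s is about 66 samples), 50% overlap between windows, k = 5, and a synthetic periodic signal standing in for a real EKG. The case study runs k-means with Apache Spark at scale; this is only Lloyd's k-means algorithm in miniature, with the centroids playing the role of the catalog of shapes. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def overlapping_windows(signal: Sequence[float], width: int, step: int) -> List[Vector]:
    """Cut the signal into windows of `width` samples, starting every `step` samples."""
    return [list(signal[i:i + width]) for i in range(0, len(signal) - width + 1, step)]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means; returns the k centroids (the catalog of typical shapes)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # needs k <= len(points)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):  # update step: mean of members
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


if __name__ == "__main__":
    fs = 200              # assumed sampling rate (Hz)
    width = fs // 3       # about 1/3 s per window
    step = width // 2     # 50% overlap (assumption)
    # Synthetic "heartbeat": one bump per second, NOT a real EKG trace
    signal = [math.exp(-(((i / fs) % 1.0) - 0.5) ** 2 / 0.002) for i in range(fs * 10)]
    windows = overlapping_windows(signal, width, step)
    catalog = kmeans(windows, k=5)
    print(len(windows), "windows ->", len(catalog), "shapes")
```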

Apache Spark processing with k-means

Result: a catalog of shapes
It can be used to reconstruct what an EKG should look like.

Using the Model of Normal with Streaming Data

Detecting Anomalies
The difference between the observed and expected EKG (the green minus the red) is the reconstruction error, or residual (shown in yellow). If the residual is high, there could be an anomaly.
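The residual test can be sketched as follows, as a simplification: each window is reconstructed by its nearest catalog shape, and the RMS of observed minus expected is compared with a threshold. The case study reconstructs the whole trace from overlapping windows; the per-window scoring, the function names, and the threshold value are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def reconstruct(window: Sequence[float], catalog: List[Vector]) -> Vector:
    """Expected shape: the catalog entry closest to the observed window."""
    return min(catalog, key=lambda c: sum((x - y) ** 2 for x, y in zip(window, c)))


def residual_error(window: Sequence[float], catalog: List[Vector]) -> float:
    """RMS of the residual (observed minus expected) over one window."""
    expected = reconstruct(window, catalog)
    return (sum((x - y) ** 2 for x, y in zip(window, expected)) / len(window)) ** 0.5


def flag_anomalies(windows: List[Vector], catalog: List[Vector],
                   threshold: float) -> List[Tuple[int, float]]:
    """Return (window index, error) for every window whose residual exceeds threshold."""
    flagged: List[Tuple[int, float]] = []
    for i, w in enumerate(windows):
        err = residual_error(w, catalog)
        if err > threshold:
            flagged.append((i, err))
    return flagged


if __name__ == "__main__":
    catalog = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 1.0]]  # toy "normal" shapes
    stream = [[0.0, 1.0, 2.0, 1.0],   # matches a known shape
              [0.0, 0.1, 0.0, 0.0],   # small noise
              [5.0, -3.0, 4.0, 0.0]]  # unlike anything in the catalog
    print(flag_anomalies(stream, catalog, threshold=0.5))
```

In a streaming setting, the same scoring runs on each new window as it arrives, so only the catalog (trained offline) has to be shipped to the scoring job.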

Case study #2
Reference: The 10th National Conference on Information Technology (NCIT), 24-25 October 2018 (B.E. 2561)

Automated hydroponic vegetable greenhouse using IoT technology and machine learning
Figure: system overview (Camera, Internet, Amazon S3; Small, Medium, and Large classes)

Vegetable growth analysis in 3 classes: Small, Medium, Large
✓ To build the model, we train on a dataset of 300 images per class.
✓ The model is developed with the Caffe framework, CNNs, and the Intel Deep Learning Training Tool SDK, installed on AWS Cloud.

Workflow
Figure: Camera Module, dataset of 300 images per class, CNNs, Predict Class
CNNs = Convolutional Neural Networks

Model test results

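Vegetable profiles for automatic control (3 classes)
Variable | Value | Description
Temp | °C | Temperature inside the greenhouse
Hum | % | Air humidity inside the greenhouse
Lux | Lux | Light intensity inside the greenhouse
Fan | On/Off | Turns the fan on/off
Silent | On/Off | Opens/closes the shade curtain
Water | On/Off | Turns the water pump on/off
Cool | On/Off | Turns on/off the pump that circulates water through the honeycomb cooling pad
Foggy | On/Off | Turns the misting system on/off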
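The source does not state how a profile turns sensor readings into actuator commands. As a minimal sketch, assuming a simple per-class threshold rule, the code below maps the CNN's predicted class plus the Temp, Hum, and Lux readings to On/Off commands for Fan, Cool, Foggy, and Silent. All numeric limits are HYPOTHETICAL placeholders, not the paper's values; the Water pump is not modeled, and the Profile class and actuator_commands function are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Profile:
    """Target limits for one growth class (hypothetical numbers)."""
    temp_max_c: float
    hum_min_pct: float
    lux_max: float


# HYPOTHETICAL limits for illustration only; not taken from the case study.
PROFILES: Dict[str, Profile] = {
    "Small": Profile(temp_max_c=28.0, hum_min_pct=70.0, lux_max=20000.0),
    "Medium": Profile(temp_max_c=30.0, hum_min_pct=65.0, lux_max=30000.0),
    "Large": Profile(temp_max_c=32.0, hum_min_pct=60.0, lux_max=40000.0),
}


def actuator_commands(predicted_class: str, temp_c: float,
                      hum_pct: float, lux: float) -> Dict[str, bool]:
    """Turn sensor readings plus the predicted class into On/Off commands (True = On)."""
    profile = PROFILES[predicted_class]
    too_hot = temp_c > profile.temp_max_c
    return {
        "Fan": too_hot,                          # ventilate when too hot
        "Cool": too_hot,                         # run the honeycomb-pad cooling pump
        "Foggy": hum_pct < profile.hum_min_pct,  # misting raises humidity
        "Silent": lux > profile.lux_max,         # close the shade curtain in strong light
    }


if __name__ == "__main__":
    print(actuator_commands("Small", temp_c=31.5, hum_pct=62.0, lux=25000.0))
```

Keying the limits by predicted class is what lets the image classifier drive the control loop: as the vegetables grow from Small to Large, the same sensor readings can yield different commands.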

Challenges and Future Directions
o Lack of large IoT datasets
  o More data is needed to achieve higher accuracy.
o Preprocessing
  o Preprocessing is more complex because the system deals with data from different sources that may have various formats.
o Secure and privacy-preserving machine learning
  o Further techniques to defend models against attacks and to prevent their effects are necessary for reliable IoT applications.
o Machine learning for IoT devices
  o The requirements of running machine learning on resource-constrained devices must be considered.
