Analytics & IoT
Dr. Selvaraj Kesavan
selvarajkesavan@gmail.com
Agenda
 IoT and IIoT
 Data collection, storage, processing and visualization
 Data Analytics
 Cloud infrastructure and platform services for Analytics
 Architecture - Example
IoT & IIoT
 IoT: billions of devices connect to servers over a network to deliver connected
industry solutions. Connectivity is only an enabler; the real value of
IoT lies in the data (business insight, data-driven economy).
 IIoT: smart sensors and actuators are used to enhance manufacturing and
industrial processes.
 Industry 4.0 focuses on the interconnectedness of machines and systems
to improve operational efficiency and productivity.
IoT - Key Technology Enablers
(1) Cloud Computing
(2) Big Data and Analytics
(3) Web 2.0 and 3.0
(4) Evolution of high-speed communication technologies
Technology Landscape
[Figure: IoT technology landscape, shown as layers with a security column]
Layers:
• Industry verticals - Dashboard
• Platform and services
• Protocols and communication
• Sensors, devices and gateway: light sensor, voltage sensor, temperature/humidity sensor, vibration sensor, ultrasonic sensor, gas sensor, BLE sensor, GPS, Pi 3 gateway, gateway, PLC
Security:
• Username/password
• API security
• Data in transit
• Data at rest
• Firewall
• DoS prevention
• Certificates/encryption
• Policies
• SSO/MFA
IoT - Sensors to Application
[Figure: beacons, industrial plants and sensors supply sensor/machine parameters to the device gateway (edge analytics, platform agent, sensor data agent), which feeds the IoT & cloud platform (store, analyze), which in turn feeds the application (alerts, data visualization)]
 Edge devices/sensors are the source of the real-time data.
 The device gateway collects data from multiple edge devices, then filters, aggregates and ingests it into the cloud
platform for further processing and analysis.
 The IoT platform enables device onboarding, data ingestion, and device-to-cloud and cloud-to-device communication.
 The cloud platform receives, stores and processes data and generates insights.
 The application visualizes the data on a dashboard and lets users monitor and control the devices.
[Figure labels: monitor and control; edge devices/sensors; device provisioning and onboarding; data ingestion; rules; device management; compute; hosting; security; delivery]
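The gateway role described on this slide (collect from several edge devices, filter, aggregate, ingest) can be sketched as follows. This is a minimal illustration, not any particular platform's agent: the `Reading` record, the valid range and the JSON payload shape are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """One sample from an edge device (hypothetical record layout)."""
    device_id: str
    parameter: str   # e.g. "temperature"
    value: float


def filter_readings(readings, low, high):
    """Drop samples outside a plausible range (simple gateway-side filtering)."""
    return [r for r in readings if low <= r.value <= high]


def aggregate(readings):
    """Group by (device, parameter) and summarize each group."""
    groups = {}
    for r in readings:
        groups.setdefault((r.device_id, r.parameter), []).append(r.value)
    return [
        {
            "device_id": device_id,
            "parameter": parameter,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for (device_id, parameter), values in sorted(groups.items())
    ]


def build_payload(readings, low=-40.0, high=125.0):
    """Filter, aggregate and serialize a batch for ingestion by the cloud platform."""
    return json.dumps(aggregate(filter_readings(readings, low, high)))


batch = [
    Reading("sensor-1", "temperature", 21.5),
    Reading("sensor-1", "temperature", 22.5),
    Reading("sensor-1", "temperature", 999.0),  # faulty sample, filtered out
    Reading("sensor-2", "temperature", 30.0),
]
payload = build_payload(batch)
print(payload)
```

Sending one summary per device instead of every raw sample is what keeps the gateway-to-cloud link light; the transport itself (MQTT, HTTPS and so on) is deliberately left out of the sketch.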
IoT Sensors & Gateway
[Figure: three stages. Acquire and transmit: beacon, sensor, device, thing. Gateway: monitor, transmit, aggregate, analyze, send to cloud. Aggregate, analyze and act.]
Gateway Provides
 Authentication
 Data Filtering
 Edge Analytics
 Control and management
Communication between sensors and gateway, and between gateway and cloud platform, uses:
 Zigbee
 BLE
 Wi-Fi
 RF
 LoRa
 MQTT
 AMQP
 CoAP
 HTTP/HTTPS
 NFC, TCP/UDP
 UART, SPI
Different field sensors/devices
 Sensors: temperature, pressure, accelerometer, vibration, RPM, beacons, etc.
 Devices: camera, activity tracker, smart glasses, etc.
Data from Sensors/Devices
• Data may be structured, semi-structured, unstructured, or any combination of these.
• Characterized by velocity, variety and volume
• What can we do with this vast amount of data?
  • Real-time streaming analysis and insights
  • Derive meaningful KPIs to help the business
  • Detect anomalies in operation and device behavior, and raise alerts
  • Predictive analysis
  • Visualization, control/operation and reporting
  • Quality inspection
  • Process automation
  • Improve quality of life
  • Research and improvements
• How?
  • Analyze the data
• How to analyze terabytes of streaming data?
  • Large repositories
  • Complex data analysis techniques
  • Distributed/parallel processing
  • Data lake/warehousing/business intelligence
Data Collection, Storage and Processing
[Figure: pipeline from data collection through event streaming/data storage and data processing to real-time streaming analytics, descriptive tasks and predictive tasks]
 Data volumes are growing while data storage is affordable and computational
processing is cheaper.
 Analyzing larger and more complex data can help deliver faster and more accurate
results.
 Process real-time data such as video, audio, application logs, website
clickstreams, and IoT telemetry data.
 What happened and why?
 Real-time streaming analytics
 Descriptive analytics
   Find human-interpretable patterns that describe the data
   Correlations, trends, clusters, trajectories, and anomalies
   Summarize the underlying relationships in data
 Predictive analytics
   Use some variables to predict unknown or future values of other variables
Step 1: Data Collection
 How do we collect continuous data from multiple sources/industry machines?
 Choose the important parameters and monitor them
 Select the frequency at which the data needs to be collected
 Choose an appropriate cloud gateway/broker and streaming platform
 Choose a suitable data lake to store the different types of data
Step 2: Data Storage
Requirements:
 Relational/non-relational
 Store large amounts of data
 Real-time streaming data
 Scalable, reliable and highly available
 Multi-tenancy
 Cost
Relational (SQL): structured data with defined attributes
 MS SQL Server, PostgreSQL, Oracle Database
 Can be queried using SQL
Non-relational (NoSQL): flexible schema
 Uses a variety of data models, including document, graph, key-value and columnar
 Each store has its own way of querying the data
 MongoDB (document), Redis (key-value), Amazon Redshift (columnar storage, though queried with SQL), Cassandra
(wide-column), HBase (wide-column), DynamoDB (key-value and document; stores JSON documents), GraphDB (graph)
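The contrast between the two storage styles can be sketched with the Python standard library alone. This is an illustration under stated assumptions: `sqlite3` stands in for a relational database, plain dictionaries serialized as JSON stand in for a document store, and the table name, field names and values are made up.

```python
import json
import sqlite3

# Relational: a fixed schema with defined attributes, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry (device_id TEXT, ts INTEGER, temperature REAL)"
)
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?)",
    [("pump-1", 1, 40.0), ("pump-1", 2, 44.0), ("pump-2", 1, 35.0)],
)
avg_by_device = dict(
    conn.execute(
        "SELECT device_id, AVG(temperature) FROM telemetry GROUP BY device_id"
    ).fetchall()
)
conn.close()

# Document style: each record is self-describing and fields may differ per record.
documents = [
    {"device_id": "pump-1", "ts": 1, "temperature": 40.0},
    {"device_id": "cam-7", "ts": 1, "frame": {"width": 640, "height": 480}},
]
serialized = [json.dumps(doc) for doc in documents]
with_temperature = [d["device_id"] for d in documents if "temperature" in d]

print(avg_by_device)
print(with_temperature)
```

The relational side rejects nothing here only because every row fits the schema; the camera record would need a schema change, whereas the document side absorbs it without one.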
Step 3: Data Processing
 Histogram
 Distribution
 Mean, variance, standard deviation
 Univariate analysis
 Bivariate analysis
 Missing values
 Outlier detection
 Variable transformation
 PCA
 Correlation analysis
 Variable clustering
 Logarithm
 Square root
 Square
 Cube root
 Reciprocal
 Integrate multiple databases/files
 Data redundancy check
 Transformation
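A few of the processing steps listed on this slide (missing values, summary statistics, outlier detection, logarithmic transformation) can be sketched with the Python standard library. The sample values, the choice of median imputation and the 1.5 × IQR rule are assumptions made for the example, not methods prescribed by the deck.

```python
import math
import statistics

raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]  # None marks a missing value

# Missing values: impute with the median of the observed values.
observed = [x for x in raw if x is not None]
median = statistics.median(observed)
filled = [median if x is None else x for x in raw]

# Summary statistics.
summary = {
    "mean": statistics.mean(filled),
    "variance": statistics.variance(filled),  # sample variance
    "std": statistics.stdev(filled),          # sample standard deviation
}

# Outlier detection with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
outliers = [x for x in filled if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Variable transformation: the logarithm compresses the long right tail.
log_values = [math.log(x) for x in filled]

print(summary)
print(outliers)
```

The median is used for imputation because, unlike the mean, it is not pulled upward by the single extreme reading.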
Processing - Tools/Frameworks
 Processing big data
   Apache Hadoop
   Spark - distributed batch and stream processing
   Storm - distributed stream processing
   MongoDB
   Cassandra
   Talend - ETL
   Kafka - event streaming
   Splunk - log analysis platform
   Hive - data warehouse
   HBase - NoSQL
   Pig - scripting
   ZooKeeper - centralized configuration and coordination
 Streams and complex event processing
   Kafka, AWS Kinesis, JMS, Azure Event Hubs, Google Cloud Pub/Sub
 Languages/tools
   Python
   R (statistical computing)
   MATLAB
   Java
   TensorFlow
   Amazon Machine Learning
   Spark MLlib
   H2O
   Azure ML Studio
 In-memory
 Distributed/parallel processing
 Automated infrastructure/configuration
Where do we install big data tools? Who provides the processing, memory, storage and
network capabilities?
Streaming/Descriptive Analytics
 Streaming analytics
   Analyzes and visualizes data in real time
   Example: a production floor manager wants real-time insights and patterns from the
sensor data in order to act on them
   Equipment performance
   Location-based services (LBS), in-context
 Descriptive analytics
   Uses historical data
   Clustering
   Association rule analysis
   Anomaly detection
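As a hedged illustration of streaming analytics combined with anomaly detection, the sketch below flags samples that sit far from the mean of a short sliding window. The window size, the threshold and the vibration values are assumptions chosen for the example; production systems typically run this kind of logic inside a stream-processing framework.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for samples far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Guard against a zero spread when the window is constant.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


vibration = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 5.0, 1.0, 0.9]
alerts = list(detect_anomalies(vibration))
print(alerts)
```

Because the function is a generator, it handles one sample at a time and never needs the full history. Note that a flagged value still enters the window, which temporarily widens the spread and makes the detector less sensitive for the next few samples.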
Clustering
 Finding groups of objects/points such that the objects/points in a group
are similar (or related) to one another and different from (or unrelated
to) the objects in other groups
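One common way to find such groups is k-means, which is also listed later in the deck. The version below is a minimal sketch: it initializes the centroids naively from the first k points, and the sample points are made-up (temperature, vibration) readings.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Two visibly separate groups of (temperature, vibration) readings (made-up values).
points = [(20.0, 0.1), (80.0, 0.9), (21.0, 0.2), (79.0, 1.0), (19.0, 0.1), (81.0, 1.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In this sketch the temperature axis dominates the distance because its scale is much larger than that of vibration; real use would normally standardize the features first.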
Predictive Analytics
• What else is most likely to happen?
• Data/text mining, forecasting and statistical analysis
• Intelligent/scientific estimates of future values (e.g. customer demand, interest rates, stock
market movements)
• Deployed to support business decisions
• Example: predictive shipping
[Figure: input parameters 1-3 feed a workflow of build predictive model, model validation, model tuning/optimization and model deployment]
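The build, validate and deploy workflow can be sketched with a one-variable linear regression. Everything specific here is an assumption for illustration: the load and temperature figures are made up, the last two records serve as the validation set, and "deployment" is reduced to calling the fitted model on a new input.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to one input value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up history: machine load (%) versus bearing temperature (deg C).
history = [(10, 32.0), (20, 41.0), (30, 52.0), (40, 61.0), (50, 72.0), (60, 81.0)]
train, holdout = history[:4], history[4:]

# Build the model on the training portion.
model = fit_line([x for x, _ in train], [y for _, y in train])

# Validate on data the model has not seen: mean absolute error.
mae = mean(abs(predict(model, x) - y) for x, y in holdout)

# "Deploy": use the fitted model on a new input.
forecast = predict(model, 70)
print(model, mae, forecast)
```

If the validation error were too high, the tuning step in the figure would come next, for example by adding input parameters or trying a different algorithm.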
Model Creation and Training with Historical Data
Deployment and Production
Classification and Regression
 A predictive model is the output of an algorithm that has been trained on a historical dataset; it is applied to
new data to estimate the likelihood of a particular outcome.
 Classification: a set of algorithms and methods for predicting categorical values, i.e. the task of assigning
objects to one of several predefined categories (classes).
 Regression: the corresponding task when the value to predict is numeric rather than categorical.
Training set (Learn Model):
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test set (Apply Model):
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
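The learn-then-apply idea can be tried on the rows tabulated above. The deck does not say which algorithm produced its model, so the sketch below swaps in a simple k-nearest-neighbours vote; the distance function (category mismatches plus a range-scaled numeric gap) and k = 3 are assumptions made for the example.

```python
from collections import Counter

# Training rows 1-10: (Attrib1, Attrib2, Attrib3 in thousands, Class).
train = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"), ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"), ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"), ("No", "Small", 90, "Yes"),
]
# Unlabeled rows 11-15.
test = [
    ("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110),
    ("No", "Small", 95), ("No", "Large", 67),
]

values = [row[2] for row in train]
SPAN = max(values) - min(values)  # used to scale the numeric attribute


def distance(a, b):
    """Mismatch count on the categorical attributes plus the scaled numeric gap."""
    return (a[0] != b[0]) + (a[1] != b[1]) + abs(a[2] - b[2]) / SPAN


def classify(row, k=3):
    """Majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda t: distance(row, t))[:k]
    return Counter(t[3] for t in nearest).most_common(1)[0][0]


predictions = [classify(row) for row in test]
print(predictions)
```

With only ten training rows the predictions are sensitive to the distance definition and to k, so the output should be read as a demonstration of the workflow rather than as the correct labels for rows 11-15.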
Models/Algorithms
• Algorithm selection depends on
  • Accuracy
  • Training time
  • Number of parameters
  • Feature count
  • Memory footprint
  • Linear/non-linear data
• Algorithms
   Linear Regression
   Logistic Regression
   Naive Bayes
   K-Means
   Random Forest
   Support Vector Machine
   Neural Network
Data Visualization
 To capture and communicate insights from big data analytics, move from standard
reporting to more sophisticated visualization.
 Visualization means presenting information in such a way that people can consume it
effectively.
 The most impactful visualizations are often the most interactive.
 Explore and have a conversation with the data.
 Visualization capitalizes on our visual ability to recognize and understand patterns, represents
a large amount of data in one place, and gives users access to actionable insights.
[Figure: example visualizations: heat map, tag cloud, history flow]
How to Visualize Raw/Processed/Analytics Output?
Applications/Visualization Tools
Web applications:
• Applications accessed via a web browser over a
network
• Built with JavaScript, CSS, and HTML5
• Web apps became popular when HTML5 arrived
and developers realized they could obtain
native-like functionality in the browser.
Native applications:
• Native apps are written in languages that the platform
supports
• Swift or Objective-C for iOS
• Java for Android
• C# for Windows Mobile
Hybrid applications:
• Combine native and web components
• Xamarin - Slack, Pinterest
• React Native - Facebook, Walmart, Tesla, and Airbnb
• Titanium - eBay, ZipCar, PayPal
• AngularJS - PubNub Chat, YouTube on PS3
Advanced BI tools:
• Power BI, QlikView, Tableau
IoT Platforms
AWS IoT, Azure IoT, GE Predix, IBM Watson IoT, ThingWorx, Google Cloud IoT, Bosch IoT, MindSphere, Alibaba IoT, C3 IoT, Jasper, Leonardo, Telit IoT
 Ability to centrally manage multiple devices at scale, with
remote configuration, monitoring and
decommissioning.
 Seamless device-to-platform and platform-to-device
connections, and direct connectivity
between sensors and the platform.
 Infrastructure and tools to manage, store and process
streaming data and analyze it in real time.
 Hosting in public, private or
hybrid environments.
 A friendly environment, with programming and
framework options to develop,
integrate, connect, host and run
applications.
 Fine-grained security and data privacy.
Cloud Platforms
AWS, Azure, GCP, Alibaba, OpenShift, Rackspace, Heroku
 PaaS and IaaS
 Scale resources up or down as needed
 Compute-, memory-, storage- and GPU-optimized instances
 Route the load to different instances
 Virtual network environment
 Environment to host and run applications
 Auto scaling and load balancing
 Disaster recovery
 Automated deployment with zero downtime
 Scaling and elasticity
 Failure fallback
 Managed machine learning, deep learning,
notification/alert engine
 API management
 Identity and access management
 Compute, storage, network
 Serverless, microservices
IoT asset management and Predictive Maintenance
Thank You
