Big Data Application
Architectures - IoT
Nishant Thacker
Technical Product Manager – Big Data
Microsoft
@nishantthacker
“Information is the oil of the 21st century,
and analytics is the combustion engine.”
- Peter Sondergaard - Gartner
Today: More “Connected Things” Than Toothbrushes In The World…
Category             2013    2014    2015    2020
Automotive             96     190     372   3,511
Consumer            1,842   2,245   2,875  13,173
Generic Business      395     479     624   5,159
Vertical Business     699     837   1,009   3,164
Grand Total         3,032   3,750   4,881  25,007
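The growth implied by the table can be worked out with a little arithmetic. The sketch below uses only the figures printed in the table (reading the 2015 Consumer value as 2,875); the names `installed_base`, `column_sum` and `cagr` are illustrative, and the compound annual growth rate is the standard formula, not something the slides state.

```python
# Figures copied from the table (same units as the table).
# Assumption: the 2015 Consumer value printed as "2.875" is read as 2,875.
installed_base = {
    "Automotive":        {2013: 96,   2014: 190,  2015: 372,  2020: 3511},
    "Consumer":          {2013: 1842, 2014: 2245, 2015: 2875, 2020: 13173},
    "Generic Business":  {2013: 395,  2014: 479,  2015: 624,  2020: 5159},
    "Vertical Business": {2013: 699,  2014: 837,  2015: 1009, 2020: 3164},
}
grand_total = {2013: 3032, 2014: 3750, 2015: 4881, 2020: 25007}


def column_sum(year: int) -> int:
    """Sum of the four categories for one year."""
    return sum(row[year] for row in installed_base.values())


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    for year, total in grand_total.items():
        # Category sums agree with the stated totals to within rounding.
        print(year, column_sum(year), total)
    for name, row in installed_base.items():
        print(f"{name}: {cagr(row[2015], row[2020], 5):.0%} per year, 2015 to 2020")
```

The category sums differ from the stated grand totals by at most one, which is consistent with the source figures having been rounded.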
The Internet of Things Story
3
Customer Examples *
From the $ 1 WiFi Module…
1 x CPU
160 MHz
80 kByte usable RAM
… to the $ 1000+ Automotive Supercomputer
2 x CPU
2 x GPU
8 Teraflops
Get To Know Your Things!
Device | Supplier | Processor | Memory | IOs | Network | OS | Price*
ESP8266 modules | Espressif | 1 x 160 MHz | 128 kB RAM, 1 MB flash | 12 GPIO, 1 ADC, I2C | WiFi 2.4 GHz | n/a | $ 2.5
Photon | Particle.io | 1 x 120 MHz | 128 kB RAM, 1 MB flash | 18 GPIO, 2 SPI, I2S, I2C, CAN, USB, 9 PWM, ADC, DAC | WiFi 2.4 GHz | n/a | $ 19
Electron | Particle.io | 1 x 120 MHz | 128 kB RAM, 1 MB flash | 28 GPIO | 3G UMTS | n/a | $ 39
WiLink 8 family | Texas Instruments | n/a | n/a | n/a | WiFi 2.4/5 GHz, Bluetooth 4.1 LE | n/a | $ 10 – 25 (industrial grade)
Arduino Leonardo | Arduino LLC, Arduino Sarl | 1 x 16 MHz | 2.5 kB RAM, 32 kB flash | 20 GPIO, 7 PWM, 10 ADC, USB | - | n/a | $ 10
Raspberry Pi Zero | Raspberry Pi Foundation | 1 x 1 GHz | 512 MB RAM, micro-SD | 10 GPIO, Mini HDMI, USB | - | Linux | $ 5
Raspberry Pi 2 | Raspberry Pi Foundation | 4 x 900 MHz, GPU | 1 GB RAM, micro-SD | 40 GPIO, 1 PWM, 1 ADC, HDMI, 4 USB | Ethernet | Windows 10, Linux, RISC OS | $ 35
BeagleBone Black | BeagleBoard.org | 1 x 1 GHz | 512 MB RAM, 4 GB flash | 69 GPIO, 2 CAN, 10 ADC, 8 PWM, HDMI, USB | Ethernet | Linux | $ 55
Drive PX 2 (H2/CY16) | NVIDIA | 2 x CPU, 2 x GPU, 8 Tflops | tbd | 12 cameras, LIDAR, RADAR, Ultrasonic, … | tbd | tbd | $ 1000+ ?
IoT Reference Architecture
[Diagram: IoT reference architecture]
Zones: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity
Devices and clients: IP capable devices, existing IoT devices, low power devices, personal mobile devices, IoT Client
Components: Gateways, Provisioning API, Identity and Registry Stores, Device State Store, Stream Processors, Data Lake, Analytics & Machine Learning, App Backend, Solution UX, Business Integration Connectors and Gateway(s), Business systems
Legend: Data Path; Optional solution component; IoT solution component
Reference Architecture & Azure Services
[Diagram: the IoT reference architecture again, with the same zones and components. The registry is labeled Device Registry. Legend: Data Path; Optional solution component; Azure IoT solution component]
Device Connectivity Options
[Diagram: devices, some running an IoT Client, connecting to the cloud. Elements and protocol labels:]
- Field Gateway: CoAP, AllJoyn, OPC
- Custom Cloud Gateway (Cloud Service, VM): Custom Protocols; VPN/ExpressRoute; OPC, HTTP, CoAP
- IoT Hub: AMQP, MQTT, HTTPS
Legend: Data Path; Optional solution component; Azure IoT solution component
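One job of a field gateway is to translate what local devices report into messages the cloud side understands. The following is a minimal sketch of that translation step only, under stated assumptions: the `LocalReading` shape, the JSON envelope and its field names (`deviceId`, `type`, `value`, `ts`) are hypothetical and are not an IoT Hub message format, and actually sending the message over AMQP, MQTT or HTTPS is left out.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalReading:
    """A reading as a hypothetical local-network device might report it."""
    node: int   # short local address of the device behind the gateway
    kind: str   # measurement type, e.g. "temp"
    raw: int    # sensor value in tenths of a unit


def to_cloud_message(gateway_id: str, reading: LocalReading,
                     now: Optional[float] = None) -> str:
    """Wrap a local reading in a JSON envelope (hypothetical field names)."""
    envelope = {
        "deviceId": f"{gateway_id}/node-{reading.node}",
        "type": reading.kind,
        "value": reading.raw / 10,  # convert tenths to whole units
        "ts": now if now is not None else time.time(),
    }
    return json.dumps(envelope, sort_keys=True)


if __name__ == "__main__":
    print(to_cloud_message("gw1", LocalReading(node=7, kind="temp", raw=215)))
```

Keeping the translation in one pure function makes it easy to test independently of whichever transport the gateway uses.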
Device Stores
[Diagram: the IoT reference architecture, repeated. Labels that differ from the first diagram: Device Identity Store, Device Registry Store, Gateway (Kafka, IoT Hub, Event Hubs). Legend: Data Path; Optional solution component; Azure IoT solution component]
Device Identity, Registry and State Stores
Identity Store
Authority for all registered devices
Stores identity information and authentication secrets
Registry Store
Index in addition to the identity store
Contains discovery and reference data related to devices
Can define a schema model or use a vertical industry standard schema for metadata
Can contain structured metadata and links to externally stored operational data
Device State Store
Contains operational data related to the devices:
- “Last known values” for each device
- Aggregated or computed values
- Stream of device data events
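The three stores can be pictured as three record types keyed by device ID. This is a sketch only: the class and field names are assumptions made for illustration, and in-memory Python objects stand in for whatever databases a real solution would use.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class IdentityRecord:
    """Identity store: who the device is and how it authenticates."""
    device_id: str
    auth_secret: str
    enabled: bool = True


@dataclass
class RegistryRecord:
    """Registry store: discovery and reference metadata, plus an optional
    link to operational data kept elsewhere."""
    device_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    operational_data_link: Optional[str] = None


@dataclass
class DeviceState:
    """State store: last known values, aggregates and the event stream."""
    device_id: str
    last_known: Dict[str, float] = field(default_factory=dict)
    totals: Dict[str, float] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)
    events: List[Dict[str, float]] = field(default_factory=list)

    def record(self, event: Dict[str, float]) -> None:
        """Append an event and refresh last-known and aggregate values."""
        self.events.append(event)
        for name, value in event.items():
            self.last_known[name] = value
            self.totals[name] = self.totals.get(name, 0.0) + value
            self.counts[name] = self.counts.get(name, 0) + 1

    def mean(self, name: str) -> float:
        """A computed value: running mean of one measurement."""
        return self.totals[name] / self.counts[name]


if __name__ == "__main__":
    state = DeviceState("device-1")
    state.record({"temp": 20.0})
    state.record({"temp": 22.0})
    print(state.last_known, state.mean("temp"))
```

Splitting identity from registry keeps authentication secrets out of the store that other components query for metadata.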
Device Provisioning
Provisioning API is the common external interface for
changes on device identity and device registry stores.
Workflow for processing individual and bulk requests:
Registering new devices
Updating or removing existing devices
Activation or access control
May also include interactions with external systems:
Billing systems
Business support systems
Connectivity management systems
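The provisioning workflow above can be sketched as a single facade in front of the identity and registry stores. Everything here is hypothetical: `ProvisioningApi`, its method names and the dictionary-backed stores are illustrative assumptions, and the interactions with billing, business support and connectivity management systems are omitted.

```python
import secrets
from typing import Any, Dict


class ProvisioningApi:
    """Hypothetical facade over the identity and registry stores."""

    def __init__(self) -> None:
        self.identity: Dict[str, Dict[str, Any]] = {}  # secrets and activation
        self.registry: Dict[str, Dict[str, Any]] = {}  # device metadata

    def register(self, device_id: str, metadata: Dict[str, Any]) -> str:
        """Register one device and return its generated secret."""
        if device_id in self.identity:
            raise ValueError(f"{device_id} is already registered")
        secret = secrets.token_hex(16)
        self.identity[device_id] = {"secret": secret, "active": False}
        self.registry[device_id] = dict(metadata)
        return secret

    def register_bulk(self, devices: Dict[str, Dict[str, Any]]) -> Dict[str, bool]:
        """Register many devices; report success per device."""
        results: Dict[str, bool] = {}
        for device_id, metadata in devices.items():
            try:
                self.register(device_id, metadata)
                results[device_id] = True
            except ValueError:
                results[device_id] = False
        return results

    def update(self, device_id: str, metadata: Dict[str, Any]) -> None:
        self.registry[device_id].update(metadata)

    def set_active(self, device_id: str, active: bool) -> None:
        """Activation / access control switch."""
        self.identity[device_id]["active"] = active

    def remove(self, device_id: str) -> None:
        del self.identity[device_id]
        del self.registry[device_id]


if __name__ == "__main__":
    api = ProvisioningApi()
    api.register("device-1", {"model": "sensor-a"})
    api.set_active("device-1", True)
    print(api.register_bulk({"device-1": {}, "device-2": {}}))
```

Routing every change through one interface means both stores are always updated together, for single and bulk requests alike.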
Stream Processors
[Diagram: the IoT reference architecture, repeated; the gateway next to the data stores is labeled Cloud Gateway]
Stream Processing: Data Flow
After ingress through the IoT Hub, data pumps and analytics tasks move data through the system.
Data flow can be driven by:
• Apache Storm on Azure HDInsight
• Apache Spark on Azure HDInsight
• Azure Stream Analytics
• Custom Event Processors
Each can perform tasks in flight:
• Data aggregation
• Data enrichment
• Complex event processing
… and can output data to:
• Azure Data Lake
• Azure Blobs/Tables
• HDInsight / HBase
• Azure SQL DB
• Time Series Databases
• Event Hub
• Service Bus Queues
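The in-flight tasks listed above can be illustrated with a chain of Python generators. This is a sketch under assumptions, not how Storm, Spark or Stream Analytics are programmed: the event fields (`deviceId`, `value`, `site`) are hypothetical, the aggregation uses fixed-size count windows, and a simple threshold rule stands in for complex event processing.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def enrich(events: Iterable[Event], reference: Dict[str, Event]) -> Iterator[Event]:
    """Data enrichment: merge per-device reference data into each event."""
    for event in events:
        yield {**event, **reference.get(event["deviceId"], {})}


def aggregate(events: Iterable[Event], window_size: int) -> Iterator[Event]:
    """Data aggregation: average 'value' per device over count windows."""
    buffers: Dict[str, List[Event]] = {}
    for event in events:
        buffer = buffers.setdefault(event["deviceId"], [])
        buffer.append(event)
        if len(buffer) == window_size:
            yield {
                "deviceId": event["deviceId"],
                "site": event.get("site"),
                "avg": sum(e["value"] for e in buffer) / window_size,
            }
            buffer.clear()


def detect(aggregates: Iterable[Event], limit: float) -> Iterator[Event]:
    """Stand-in for complex event processing: flag averages over a limit."""
    for item in aggregates:
        if item["avg"] > limit:
            yield {**item, "alert": "over limit"}


if __name__ == "__main__":
    readings = [{"deviceId": "a", "value": v} for v in (10, 20, 30, 50)]
    sites = {"a": {"site": "plant-1"}}
    for alert in detect(aggregate(enrich(readings, sites), 2), limit=25):
        print(alert)
```

Because each stage consumes and yields one event at a time, the stages compose like the processors in the data flow, and the final output could be written to any of the sinks listed above.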
Stream Processor Examples
[Diagram: example stream processors between the Cloud Gateway and the stores]
Processors: Raw Telemetry Processor, Device Metadata Processor, Device State Processor, Rules Processor, Notification Processor, Stream Transformation Processor, Secondary Stream Processor
Stores and channels: Device Registry Store, Device State Store, Data Lake, Queue, Event Hub
Other elements: Cloud Gateway, Gateway, App Backend, IP capable devices, existing IoT devices, low power devices, IoT Client
Legend: Data Path; Optional solution component; Azure IoT solution component
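Several of the processors named on this slide can be sketched as handlers that each see every incoming event. The wiring below is an assumption made for illustration (the slide text does not spell out which processor feeds which), as are the class name `Processors`, the `temp` field and the alert text; a standard-library queue stands in for the Queue element.

```python
import queue
from typing import Any, Dict, List

Event = Dict[str, Any]


class Processors:
    """Hypothetical fan-out of one telemetry stream to several processors."""

    def __init__(self, temp_limit: float) -> None:
        self.temp_limit = temp_limit
        self.raw_log: List[Event] = []             # stand-in for the data lake
        self.device_state: Dict[str, Event] = {}   # stand-in for the state store
        self.notifications: "queue.Queue[Event]" = queue.Queue()

    def raw_telemetry(self, event: Event) -> None:
        """Raw telemetry processor: keep every event as received."""
        self.raw_log.append(event)

    def device_state_update(self, event: Event) -> None:
        """Device state processor: keep the last known event per device."""
        self.device_state[event["deviceId"]] = event

    def rules(self, event: Event) -> None:
        """Rules processor: queue events that break a threshold rule."""
        if event["temp"] > self.temp_limit:
            self.notifications.put(event)

    def publish(self, event: Event) -> None:
        """Hand one event to every processor."""
        for handler in (self.raw_telemetry, self.device_state_update, self.rules):
            handler(event)

    def drain_notifications(self) -> List[str]:
        """Notification processor: turn queued events into alert texts."""
        sent: List[str] = []
        while True:
            try:
                event = self.notifications.get_nowait()
            except queue.Empty:
                return sent
            sent.append(f"ALERT {event['deviceId']}: temp {event['temp']}")


if __name__ == "__main__":
    processors = Processors(temp_limit=80.0)
    processors.publish({"deviceId": "a", "temp": 70.0})
    processors.publish({"deviceId": "a", "temp": 95.0})
    print(processors.drain_notifications())
```

Decoupling the rules processor from the notification processor with a queue lets alerts be delivered at their own pace without slowing telemetry ingestion.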
App Backend
[Diagram: the IoT reference architecture, repeated; the gateway next to the data stores is labeled Cloud Gateway and the data store is labeled Storage]
High-Scale Compute Models
Scale-appropriate compute models:
• Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors.
• Service Fabric Reliable Collections: highly available, with replicated and local state management.
• Azure Batch: job scheduling and compute management for highly parallelizable compute workloads.
Simple programming logic in vastly scalable compute nodes
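The idea of "device actors" can be illustrated with plain `asyncio`. To be clear about the assumption: this is not the Service Fabric Reliable Actors API and has none of its distribution or persistence; it only shows the pattern of one actor per device, each with private state and a mailbox processed one message at a time. `DeviceActor` and `simulate` are hypothetical names.

```python
import asyncio
from typing import Dict, Iterable, List, Optional, Tuple


class DeviceActor:
    """One actor per device: private state, messages handled sequentially."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.mailbox: "asyncio.Queue[Optional[float]]" = asyncio.Queue()
        self.count = 0
        self.last_value: Optional[float] = None

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message is None:  # None is the stop signal in this sketch
                break
            self.count += 1
            self.last_value = message


async def simulate(readings: Iterable[Tuple[str, float]]) -> Dict[str, DeviceActor]:
    """Route each (device_id, value) reading to that device's actor."""
    actors: Dict[str, DeviceActor] = {}
    tasks: List["asyncio.Task[None]"] = []
    for device_id, value in readings:
        if device_id not in actors:
            actors[device_id] = DeviceActor(device_id)
            tasks.append(asyncio.create_task(actors[device_id].run()))
        await actors[device_id].mailbox.put(value)
    for actor in actors.values():
        await actor.mailbox.put(None)
    await asyncio.gather(*tasks)
    return actors


if __name__ == "__main__":
    result = asyncio.run(simulate([("a", 1.0), ("b", 2.0), ("a", 3.0)]))
    for actor in result.values():
        print(actor.device_id, actor.count, actor.last_value)
```

Because each actor owns its state and handles one message at a time, the per-device logic stays simple and needs no locks, which is the property that lets the model scale out across many nodes.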
Data Analytics
[Diagram: the IoT reference architecture, repeated; the gateway next to the data stores is labeled Cloud Gateway]
Data Analytics
[Diagram: data analytics flow. Labels:]
Ingestion and streaming: Ingestion Gateway; Stream Processing (ASA, Storm or Spark); Interceptor (Rules); NRT Events; Alerts
Batch: Batch Events / Logs; Spark, Hive/Pig, U-SQL; Azure Data Lake Store; Azure Data Lake Analytics
Reference and transactional data: Fetching & Updating Reference Data; SQL DB; Transactional Data
Machine learning: R, Azure ML and/or Spark; Training and Scoring ML Models; Real Time Scoring
Serving: Azure SQL DW; Federated Query; Reports and Dashboards
Data Analytics
Real-Time Analysis
Aggregation/Reduction, Temporal Queries, State Correlation, Threshold Detection, Alerting
Data-At-Rest Analysis
Time-Series, Map/Reduce, Correlation
Machine Learning
Pattern Detection, Behavior Prediction, Plausibility Analysis, Anomaly and Fraud Detection
Services shown: Power BI, HDInsight, Stream Analytics, Data Factory, Machine Learning
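Two of the techniques named on this slide, temporal queries and anomaly detection, can be shown in a few lines. This is a sketch under assumptions: events are `(timestamp_seconds, value)` pairs, the temporal query is a tumbling-window average, and a simple z-score test stands in for the machine-learning based anomaly detection the slide refers to.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def tumbling_window_avg(events: Iterable[Tuple[float, float]],
                        width_s: int) -> Dict[int, float]:
    """Average value per non-overlapping window, keyed by window start."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in events:
        buckets[int(timestamp // width_s)].append(value)
    return {index * width_s: mean(values)
            for index, values in sorted(buckets.items())}


def anomalies(values: List[float], z_limit: float = 2.0) -> List[float]:
    """Values more than z_limit standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_limit]


if __name__ == "__main__":
    print(tumbling_window_avg([(0, 1.0), (5, 3.0), (12, 10.0)], width_s=10))
    print(anomalies([10, 10, 10, 10, 10, 10, 10, 10, 10, 50]))
```

The window function is the kind of computation that runs in the real-time path, while the anomaly check could equally run over data at rest.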
Azure Data Lake
[Diagram labels: Store, WebHDFS, YARN, Analytics, U-SQL, Analytics Service, HDInsight (managed Hadoop clusters)]
Cortana Intelligence Suite
[Diagram: Cortana Intelligence Suite, from Data to Intelligence to Action]
Data Sources: Apps, Sensors and devices
Information Management: Event Hubs, Data Catalog, Data Factory
Big Data Stores: SQL Data Warehouse, Data Lake Store
Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
Intelligence: Cortana, Bot Framework, Cognitive Services
Dashboards & Visualizations: Power BI
Action: People, Automated Systems, Apps (Web, Mobile, Bots)
Presentation and Business Connectivity
[Diagram: the IoT reference architecture, repeated; the gateway next to the data stores is labeled Cloud Gateway]
Reference architecture with component services
[Diagram: the IoT reference architecture with the same zones and components as before. The registry is labeled Device Registry. Legend: Data Path; Optional solution component; Azure IoT solution component]
Reference Architecture Guiding Principles
Heterogeneity
Accommodates a wide variety of scenarios, environments, devices, and processing patterns
Security
Considers security and privacy measures across all areas
Hyper-scale
Supports millions of connected devices
Flexibility
Allows composability and extensibility so that various first-party or third-party technologies can be used
nishant.thacker@microsoft.com
© 2016 Microsoft Corporation. All rights reserved.

Editor's Notes

  • #35 In the world we live in, many countries have more mobile phones than humans, and it is predicted that by 2020 we will have more than 7 networked devices per human, all throwing out data such as location and bio data. We need to understand how to do more to leverage this data. Your phone and your connected apps know you are here today, and so do intelligent agents like Cortana. Organisations that start to leverage this data will be ahead of the competition.
  • #36 And in this world, we are connecting more and more sensors to things. We are connecting things to ourselves and adding sensors to the things around us. The health band I wear, which tracks my heart rate, steps and so on, has 11 sensors alone. Phones have another 5 or 6. We have one customer that collects, washes and delivers over 1 million bedding sheets a day. This hotel will be one of its customers. They wash, deliver and manage these sheets for around 20p each, so very low margin, high volume. They have embedded sensors into every sheet and can now understand where all their stock is, where they are seeing loss among their customers, and where they can dynamically price based on usage, loss and other factors, down to each customer. They can also understand the lifetime of a sheet (number of washes) and then engage with their manufacturers to see if they can improve it. Laundry as a service is here, at scale. And this data is making a huge difference. If you now walk home with one of those fluffy robes, the system will likely know…