From Device to Data Center to Insights
Architectural Considerations for the Internet of Anything
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator, Apache Arrow, Apache
Kylin, Apache Apex
• Mentor/PPMC, Apache Eagle (Incubating), Apache
Mynewt (Incubating), Apache Metron (Incubating),
Apache Gossip (Incubating)
26 billion IoT devices by 2020
-Gartner
http://www.gartner.com/newsroom/id/2636073
IPv4 Address Space: ~4.3 billion
IoT Growth
• Everyone here should know IoT is huge
• Sensors, Phones, Connected Cars, Wearables, Software-as-a-
Sensor, ...
• Cuts across virtually all industries
IoT Architecture
Key Architectural Tiers
• Origin: Devices and Data Sources
• Transport: Orchestrating Bi-Directional Data Flow Between Sources
• Analytics: Analysis of Unbounded (Streaming) and Bounded (Batch)
Data, and Acting in Response
Origin Tier
Birthplace of IoT Data
Origin Tier
• Where data is born, but also a destination
• Sensors and Devices
• Constrained Hubs/Gateways
Origin Tier
Devices are getting smaller, cheaper, and increasingly network
enabled.
Examples:
• RaspberryPi ($35, Full OS)
• ESP8266 (<$5 WiFi-enabled microcontroller)
Origin Tier
Devices in the Origin Tier both transmit and receive data.
• Command and Control
• Actuators (interaction with the physical environment)
• End user alerts and notifications
IoT Protocol Considerations
IoT Protocol Considerations
• Device-Device / Device-Gateway Communication
• Radio Frequency Protocols
• IP-based Protocols
IoT Protocol Considerations
Radio Frequency Protocols
• Typically for very resource-constrained devices (Ex: Wireless
sensors in a home security system)
• Usually involve an intermediary hub/gateway as a protocol bridge
(Ex: Main panel in a home security system)
• Short range
• Low Power
Radio Frequency Protocols
ZigBee
• Intended for low power applications (~2 yr. battery life)
• Low data rates
• Simpler and less expensive than other WPANs like Bluetooth
Radio Frequency Protocols
ZigBee
• Range: 10–100 meters LOS (between nodes, but messages can
hop in a mesh network)
• Data Rate: 250 kbit/s
• Supports Star, Tree, and Mesh network topologies
• Requires a coordinator device for every network (usually the
hub/gateway)
Radio Frequency Protocols
Z-Wave
• Targets home automation
• Low power/Low data rate
• Proprietary
• Sole chip vendor
Radio Frequency Protocols
Z-Wave
• Range: ~30 meters LOS (between nodes, but messages can hop)
• Data Rate: 100 kbit/s
• Forms source-routed mesh networks (can route around failures/obstacles)
• Devices must be paired
• Requires a primary controller (e.g. the hub/gateway)
• Max 232 devices per network (but networks can be bridged)
Radio Frequency Protocols
Bluetooth/Bluetooth LE
• Targets wireless computer and device accessories
• High data rates
• Does not form routed networks the way ZigBee and Z-Wave do
• Usually one host to many device pairing
• Range: 0.5m (Class 4) - 100m (Class 1)
• Data Rate: 1 Mbit/s - 24 Mbit/s
Radio Frequency Protocols
Thread
• New wireless protocol introduced by Nest (Google/Alphabet), Samsung, ARM, Qualcomm
• Built on top of the same (IEEE 802.15.4) specification as ZigBee
• IPv6-based
• Mesh network with hops supported
• ~250 devices per network
• Very low power (purported years of operation on a single AA with deep sleep modes)
• Very new/unsure future — WiFi, Bluetooth, etc. already ubiquitous
IoT Protocol Considerations
IP-Based Protocols
• Require a full IP stack
• Higher power consumption
• Longer range (e.g. WiFi)
IP-Based Protocols
CoAP - Constrained Application Protocol
• Designed to be used on microcontrollers with as little as 10 KB of
RAM.
• Simple request/response protocol
• Much like HTTP but based on UDP
• Based on the REST model (GET, PUT, POST, DELETE)
• Strong security via DTLS (Datagram Transport Layer Security)
IP-Based Protocols
CoAP - Constrained Application Protocol
• Simple 4-byte header
• Subset of MIME types and HTTP response codes
• Data model agnostic
• one-to-one
• Transport (UDP) <- Base Messaging (Simple Confirmable/Non-Confirmable
message transfer) <- REST Semantics
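To make the 4-byte header concrete, here is a minimal sketch of packing and unpacking it with Python's `struct` module. The bit layout (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID) follows RFC 7252; the function names `encode_header` and `decode_header` are illustrative and not taken from any CoAP library.

```python
import struct

# Message types defined by RFC 7252.
CON, NON, ACK, RST = 0, 1, 2, 3
GET = 0x01  # request code 0.01


def encode_header(msg_type: int, code: int, message_id: int, token: bytes = b"") -> bytes:
    """Pack the fixed 4-byte CoAP header, followed by the optional token."""
    if len(token) > 8:
        raise ValueError("token must be at most 8 bytes")
    version = 1
    # First byte: version (2 bits) | type (2 bits) | token length (4 bits).
    first = (version << 6) | (msg_type << 4) | len(token)
    # "!BBH" = network byte order: uint8, uint8, uint16.
    return struct.pack("!BBH", first, code, message_id) + token


def decode_header(data: bytes) -> dict:
    """Unpack the fixed header and token from the start of a CoAP message."""
    first, code, message_id = struct.unpack("!BBH", data[:4])
    token_length = first & 0x0F
    return {
        "version": first >> 6,
        "type": (first >> 4) & 0x03,
        "token_length": token_length,
        # Codes are written class.detail, e.g. 0.01 for GET.
        "code": f"{code >> 5}.{code & 0x1F:02d}",
        "message_id": message_id,
        "token": data[4:4 + token_length],
    }


if __name__ == "__main__":
    packet = encode_header(CON, GET, 0x1234, token=b"\xab")
    print(packet.hex())
    print(decode_header(packet))
```

Options and payload follow the token in a real message; they are left out here to keep the focus on the fixed header.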
IP-Based Protocols
MQTT - MQ Telemetry Transport
• Pub/Sub messaging protocol
• Requires a broker (though brokers can be lightweight)
• many-to-many broadcast
IP-Based Protocols
MQTT - MQ Telemetry Transport
• Message == Topic + Payload
• Topics: users/ptgoetz/office/thermostat
• Topic wildcards:
• Single level (+): users/ptgoetz/+/thermostat
• Multi-level (#): users/ptgoetz/office/#
• Payload: Just a bunch of bytes (you define the schema)
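The wildcard rules above can be sketched as a small matching function. This is an illustrative helper (the name `topic_matches` is made up here, not part of any MQTT client library), and it ignores the special handling of topics beginning with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT subscription filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level and everything beneath it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        # "+" matches exactly one level, whatever its value.
        if level != "+" and level != topic_levels[i]:
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/thermostat"))
    print(topic_matches("users/ptgoetz/office/#", "users/ptgoetz/office/thermostat/setpoint"))
    print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/hall/thermostat"))
```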
IP-Based Protocols
MQTT - MQ Telemetry Transport
• Delivery guarantees (QoS):
• 0: At-most-once
• 1: At-least-once
• 2: Exactly-once
• Last will and testament (when a device goes offline)
• Security via SSL/TLS
Apache Mynewt (incubating)
• Real-time, modular OS for IoT devices
• Designed for use in devices with power, memory and
storage constraints
• Support for many ARM Cortex-M based boards
(including Arduino)
• HAL for unified access to MCU features
• Connectivity with Bluetooth LE
• WiFi, CoAP, and Thread support (roadmap)
• Remote Firmware Upgrades
• Command-line tools for package management
Transport Tier
Data Flow From Device to Data Center
Transport Tier
• Connecting Edge Devices:
• To and from the Analytics Tier (data center)
• To and from one another (inter-device communication)
• Bridging Protocols:
• e.g. WPAN to IP
• Collecting/Transforming/Enriching Data in Motion
Apache NiFi
Apache NiFi
• Data flow orchestration tool
• Guaranteed Delivery
• Data provenance (important in the Analytics
Tier)
• Backpressure with release
• Flow-specific QoS
• Web-based UI for editing data flows
• Data flows modifiable at runtime
• Supports bi-directional data flows
• Integrates with just about any system
Apache NiFi
Basic Concepts
• Flow File: Unit of user data with associated
key-value metadata
• Processor: Component that creates, sends,
receives, transforms, routes, etc. Flow Files
• Connection: Acts as the link between
processors.
• Flow Controller: Brokers the exchange of data
between processors
• Process Group: Set of Processors and
Connections with Input/Output ports. New
components can be created by composition.
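As a rough mental model of how these concepts fit together, the sketch below mimics them in plain Python. It is a toy, not NiFi's API: every class and function name (`FlowFile`, `Connection`, `process_group`, `drain`, and the two example processors) is invented for illustration, and a bounded queue stands in for backpressure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class FlowFile:
    """Unit of user data plus key-value metadata."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Bounded queue between two processors; a full queue models backpressure."""

    def __init__(self, max_queued: int) -> None:
        self.max_queued = max_queued
        self._queue: Deque[FlowFile] = deque()

    def offer(self, flow_file: FlowFile) -> bool:
        """Enqueue if there is room; False tells the upstream side to back off."""
        if len(self._queue) >= self.max_queued:
            return False
        self._queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        return self._queue.popleft() if self._queue else None


Processor = Callable[[FlowFile], FlowFile]


def tag_gateway(flow_file: FlowFile) -> FlowFile:
    """Enrichment processor: adds an attribute, leaves the content untouched."""
    return FlowFile(flow_file.content, {**flow_file.attributes, "gateway": "hub-1"})


def uppercase(flow_file: FlowFile) -> FlowFile:
    """Transform processor: rewrites the content, keeps the attributes."""
    return FlowFile(flow_file.content.upper(), dict(flow_file.attributes))


def process_group(*processors: Processor) -> Processor:
    """Compose processors into one reusable unit, loosely like a Process Group."""
    def run(flow_file: FlowFile) -> FlowFile:
        for processor in processors:
            flow_file = processor(flow_file)
        return flow_file
    return run


def drain(connection: Connection, processor: Processor) -> List[FlowFile]:
    """Stand-in for the Flow Controller: hands queued Flow Files to a processor."""
    results = []
    while (flow_file := connection.poll()) is not None:
        results.append(processor(flow_file))
    return results
```

With `Connection(max_queued=2)`, a third `offer` returns `False` until `drain` empties the queue, which is the "backpressure with release" behaviour in miniature.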
Apache NiFi MiNiFi
• Supplement to NiFi for constrained
devices/environments
• More suitable for edge devices
• Small footprint
• Designed to collect data near where it
originates and integrate with NiFi
Apache NiFi
For more information:
• https://nifi.apache.org
Some of the best technical
documentation I’ve ever seen:
• https://nifi.apache.org/docs.html
Analytics Tier
Acting on Insights
Analytics Tier
• Where IoT data often (but not always) intersects with Big Data
platforms and Cloud Computing
• Vertical scaling may suffice
Analytics Tier
• Many, many options…
• [insert your definition of Hadoop here]
Analytics Tier
Key Platform Considerations:
• Unbounded (Stream) data processing frequently necessary
• Apache Storm, Apache Flink, etc.
• Bounded (Batch) data processing frequently necessary
• e.g. Training machine learning models, etc.
• Apache Hadoop M/R, Apache Flink, Apache Spark
• Time Series DB a common requirement
• Apache HBase, Apache Cassandra, etc.
Analytics Tier
Key Platform Considerations:
• Latency matters for many use cases
• Latency can add up quickly, depending on the number of “hops”
• Windowing semantics and flexibility
When?
The importance of event time(s).
What is Event Time and why is it so
important?
• Event Times: Origin Time vs. Processing Time
• Ex: Airplane Mode
• Other types of Event Time:
• Enrichment Time
• Ingest Time
• Processing Time 1, 2, n…
• Exit Time (e.g. “return” events, C2, bi-directional communication)
Choose a platform/API that gives you
the most flexibility with respect to
dealing with various event times.
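A small sketch of why the choice of timestamp matters, using the airplane-mode example: a device records readings while offline and uploads them all at once later. The `Event` fields and the `tumbling_windows` helper are hypothetical names for this illustration, and times are plain integer seconds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    origin_time: int      # seconds, stamped on the device when the reading was taken
    processing_time: int  # seconds, stamped when the platform received it
    value: float


def tumbling_windows(events: Iterable[Event], size: int, time_field: str) -> Dict[int, List[float]]:
    """Group values into fixed, non-overlapping windows keyed by window start."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for event in events:
        timestamp = getattr(event, time_field)
        window_start = timestamp - timestamp % size
        windows[window_start].append(event.value)
    return dict(windows)


if __name__ == "__main__":
    # Readings taken at t=10, 20 and 70 while offline, all uploaded at t=300.
    events = [Event(10, 300, 1.0), Event(20, 300, 2.0), Event(70, 300, 3.0)]
    print(tumbling_windows(events, 60, "origin_time"))      # {0: [1.0, 2.0], 60: [3.0]}
    print(tumbling_windows(events, 60, "processing_time"))  # {300: [1.0, 2.0, 3.0]}
```

Windowing by processing time collapses everything into the upload minute, while windowing by origin time keeps the readings where they actually happened.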
Future-Proofing and Scaling
Small to Medium Scale:
• Not Big Data
• Investment in large-scale distributed system infrastructure wouldn’t
make sense.
• YAGNI (Yet…)
• Vertical scaling may suffice
Future-Proofing and Scaling
Medium to Large Scale:
• A single server is no longer cutting it
• “V”s are starting to pile up
• Need to move to a distributed architecture to scale with increasing
demand
• Your data is now Big
Apache Beam (incubating)
• Unified API for dealing with
bounded/unbounded data sources
(i.e. batch/streaming)
• One API. Multiple implementations
(execution engines). Called
“Runners” in Beamspeak.
Apache Beam (incubating)
• Major focus on Windowing and
properly dealing with Event Time(s)
• Sliding Windows, Tumbling Windows,
Session Windows, etc.
• Watermark capabilities for dealing
with late data
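To illustrate the idea of a watermark, the sketch below uses a simple heuristic: the watermark trails the largest event time seen so far by a fixed delay, and anything that arrives with an event time behind the watermark is treated as late. This is plain Python and not Beam's API; the function name and the fixed-delay heuristic are assumptions made for the example, and real runners estimate watermarks in more sophisticated ways.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, str]  # (event time in seconds, payload)


def split_on_time_and_late(records: Iterable[Record], max_delay: int) -> Tuple[List[Record], List[Record]]:
    """Classify records, given in arrival order, as on-time or late."""
    watermark = float("-inf")
    on_time: List[Record] = []
    late: List[Record] = []
    for event_time, payload in records:
        if event_time < watermark:
            # The watermark already claimed "nothing older than this is coming".
            late.append((event_time, payload))
        else:
            on_time.append((event_time, payload))
        # Watermarks only move forward.
        watermark = max(watermark, event_time - max_delay)
    return on_time, late


if __name__ == "__main__":
    arrivals = [(10, "a"), (12, "b"), (30, "c"), (11, "d"), (28, "e")]
    on_time, late = split_on_time_and_late(arrivals, max_delay=5)
    print(on_time)  # [(10, 'a'), (12, 'b'), (30, 'c'), (28, 'e')]
    print(late)     # [(11, 'd')]
```

What to do with late records (drop them, update an earlier result, or route them elsewhere) is a policy decision the platform should let you make.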
Apache Beam (incubating)
• Runner/Execution Engine Availability
• Local runner (single machine)
• Runners for Google Cloud
Dataflow, Flink and Spark
• Others underway: Apache Storm,
Apache Apex and others
Apache Beam (incubating)
• Choose the right runner for your
current scaling and organizational
needs (you can switch later as
necessary)
• Understand the limits of different
runner implementations
• Outside of Google Cloud Dataflow, the
Flink runner is currently the most
feature-complete (this will change)
Apache Beam (incubating)
For a technical deep dive into Apache
Beam:
Apache Beam: A Unified Model for
Batch and Streaming Data
Processing
- Davor Bonaci, Google Inc.
Thursday 4:10PM, Ballroom A
Firmware, Parsers, and
Schemas
(Oh my!)
Problem: Data Formats
• Many IoT devices transmit data as a raw array of bytes
• The format of that data may be proprietary
• To be of any use it must be parsed into a machine-readable format
(i.e. Schema)
• Once parsed, you need to know the schema
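As an example of the problem, suppose a sensor sends a 5-byte little-endian payload: a 16-bit device ID, a signed 16-bit temperature in hundredths of a degree Celsius, and an 8-bit battery percentage. That layout is invented for illustration; the sketch shows how the bytes are meaningless without it.

```python
import struct

# Hypothetical wire layout: uint16 device id, int16 centi-degrees C, uint8 battery %.
READING_FORMAT = "<HhB"
READING_SIZE = struct.calcsize(READING_FORMAT)  # 5 bytes, no padding with "<"


def parse_reading(payload: bytes) -> dict:
    """Turn the raw byte array into named, typed fields."""
    if len(payload) != READING_SIZE:
        raise ValueError(f"expected {READING_SIZE} bytes, got {len(payload)}")
    device_id, centi_celsius, battery_pct = struct.unpack(READING_FORMAT, payload)
    return {
        "device_id": device_id,
        "temperature_c": centi_celsius / 100,
        "battery_pct": battery_pct,
    }


if __name__ == "__main__":
    raw = bytes.fromhex("070066085d")
    print(parse_reading(raw))  # {'device_id': 7, 'temperature_c': 21.5, 'battery_pct': 93}
```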
Problem: Firmware Versions
• Deployed IoT devices may be running any number of versions
• Data formats may differ between firmware versions
• Multiple parsers may be necessary to accommodate different device
types and firmware versions
Solution: Parser Registry
• Allow manufacturers to supply proprietary parsers, loaded at runtime
• Parser API includes a way to discover the schema
• Tag data with device type + firmware version at the hub/gateway
• Look up associated parser when data arrives
• (This can be done in either the Transport or Analytics tier)
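A parser registry along these lines could be sketched as follows. Everything here is hypothetical: the `Parser` and `ParserRegistry` names, the `thermo` device type, and the two firmware payload layouts are invented to show the lookup by device type plus firmware version, with each parser exposing its schema.

```python
import struct
from typing import Callable, Dict, NamedTuple, Tuple


class Parser(NamedTuple):
    schema: Dict[str, str]          # field name -> type, discoverable by callers
    parse: Callable[[bytes], dict]  # raw payload -> parsed record


class ParserRegistry:
    """Maps (device type, firmware version) to the parser for that payload format."""

    def __init__(self) -> None:
        self._parsers: Dict[Tuple[str, str], Parser] = {}

    def register(self, device_type: str, firmware: str, parser: Parser) -> None:
        self._parsers[(device_type, firmware)] = parser

    def lookup(self, device_type: str, firmware: str) -> Parser:
        try:
            return self._parsers[(device_type, firmware)]
        except KeyError:
            raise LookupError(f"no parser for {device_type} firmware {firmware}") from None

    def parse(self, tagged: dict) -> dict:
        """Parse a message that the hub/gateway tagged with device type and firmware."""
        parser = self.lookup(tagged["device_type"], tagged["firmware"])
        return parser.parse(tagged["payload"])


def parse_v1(payload: bytes) -> dict:
    # Firmware 1.0 (hypothetical): int16 temperature in centi-degrees C.
    (centi_c,) = struct.unpack("<h", payload)
    return {"temperature_c": centi_c / 100}


def parse_v2(payload: bytes) -> dict:
    # Firmware 2.0 (hypothetical): adds a uint8 relative humidity field.
    centi_c, humidity = struct.unpack("<hB", payload)
    return {"temperature_c": centi_c / 100, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermo", "1.0", Parser({"temperature_c": "float"}, parse_v1))
registry.register("thermo", "2.0", Parser({"temperature_c": "float", "humidity_pct": "int"}, parse_v2))
```

A caller then passes a tagged message such as `{"device_type": "thermo", "firmware": "2.0", "payload": ...}` to `registry.parse`, and can read `registry.lookup("thermo", "2.0").schema` to find out what fields to expect.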
Solution: Schema Registry
• When parsers are registered, also register the associated schema
• Downstream components (Transport/Analytics Tier) discover schema
based on metadata
Who owns your IoT data?
Hint: It may not be you.
Who owns your data?
• Beware of 3rd-party device manufacturers
• Data is valuable, and everyone wants it
• Frequently exclusive access
Who owns your data?
• Device manufacturers may hoard data.
• Retention policies limit how long you can store the data.
• Aggregate/Derivative data okay, but what’s the definition?
Thank you!
Questions?
P. Taylor Goetz, Hortonworks
@ptgoetz

What is Augmented Reality Image Tracking
pavan998932
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 

Recently uploaded (20)

openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 

From Device to Data Center to Insights: Architectural Considerations for the Internet of Anything

  • 1. From Device to Data Center to Insights Architectural Considerations for the Internet of Anything P. Taylor Goetz, Hortonworks @ptgoetz
  • 2. About Me • Tech Staff @ Hortonworks • PMC Chair, Apache Storm • ASF Member • PMC, Apache Incubator, Apache Arrow, Apache Kylin, Apache Apex • Mentor/PPMC, Apache Eagle (Incubating), Apache Mynewt (Incubating), Apache Metron (Incubating), Apache Gossip (Incubating)
  • 3. 26 billion IoT devices by 2020 -Gartner http://www.gartner.com/newsroom/id/2636073
  • 4. IPv4 Address Space: ~4.3 billion (2^32)
  • 5. IoT Growth • Everyone here should know IoT is huge • Sensors, Phones, Connected Cars, Wearables, Software-as-a-Sensor, ... • Cuts across virtually all industries
  • 7. Key Architectural Tiers • Origin: Devices and Data Sources • Transport: Orchestrating Bi-Directional Data Flow Between Sources • Analytics: Analysis of Unbounded (Streaming) and Bounded (Batch) Data, and Acting in Response
  • 9. Origin Tier • Where data is born, but also a destination • Sensors and Devices • Constrained Hubs/Gateways
  • 10. Origin Tier Devices are getting smaller, cheaper, and increasingly network enabled. Examples: • RaspberryPi ($35, Full OS) • ESP8266 (<$5 WiFi-enabled microcontroller)
  • 11. Origin Tier Devices in the Origin Tier both transmit and receive data. • Command and Control • Actuators (interaction with the physical environment) • End user alerts and notifications
  • 13. IoT Protocol Considerations • Device-Device / Device-Gateway Communication • Radio Frequency Protocols • IP-based Protocols
  • 14. IoT Protocol Considerations Radio Frequency Protocols • Typically for very resource-constrained devices (Ex: Wireless sensors in a home security system) • Usually involve an intermediary hub/gateway as a protocol bridge (Ex: Main panel in a home security system) • Short range • Low Power
  • 15. Radio Frequency Protocols ZigBee • Intended for low power applications (~2 yr. battery life) • Low data rates • Simpler and less expensive than other WPANs like Bluetooth
  • 16. Radio Frequency Protocols ZigBee • Range: 10–100 meters LOS (between nodes, but messages can hop in a mesh network) • Data Rate: 250 kbit/s • Supports Star, Tree, and Mesh network topologies • Requires a coordinator device for every network (usually the hub/gateway)
  • 17. Radio Frequency Protocols Z-Wave • Targets home automation • Low power/Low data rate • Proprietary • Sole chip vendor
  • 18. Radio Frequency Protocols Z-Wave • Range: ~30 meters LOS (between nodes, but messages can hop) • Data Rate: 100 kbit/s • Forms source-routed mesh networks (can route around failures/obstacles) • Devices must be paired • Requires a primary controller (e.g. the hub/gateway) • Max 232 devices per network (but networks can be bridged)
  • 19. Radio Frequency Protocols Bluetooth/Bluetooth LE • Targets wireless computer and device accessories • High data rates • Does not form routed networks like ZigBee and Z-Wave • Usually one host to many device pairing • Range: 0.5m (Class 4) - 100m (Class 1) • Data Rate: 1 Mbit/s - 24 Mbit/s
  • 20. Radio Frequency Protocols Thread • New wireless protocol introduced by Nest (Google/Alphabet), Samsung, ARM, Qualcomm • Built on top of the same (IEEE 802.15.4) specification as ZigBee • IPv6-based • Mesh network with hops supported • ~250 devices per network • Very low power (purported years of operation on a single AA with deep sleep modes) • Very new/unsure future — WiFi, Bluetooth, etc. already ubiquitous
  • 21. IoT Protocol Considerations IP-Based Protocols • Require a full IP stack • Higher power consumption • Longer range (e.g. WiFi)
  • 22. IP-Based Protocols CoAP - Constrained Application Protocol • Designed to be used on microcontrollers with as little as 10 KB of memory. • Simple request/response protocol • Much like HTTP but based on UDP • Based on the REST model (GET, PUT, POST, DELETE) • Strong security via DTLS (Datagram Transport Layer Security)
  • 23. IP-Based Protocols CoAP - Constrained Application Protocol • Simple 4-byte header • Subset of MIME types and HTTP response codes • Data model agnostic • One-to-one • Transport (UDP) <- Base Messaging (Simple Confirmable/Non-Confirmable message transfer) <- REST Semantics
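The "simple 4-byte header" on slide 23 can be decoded in a few lines. The sketch below follows the fixed header layout in RFC 7252 (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID). The names `CoapHeader` and `parse_coap_header` are illustrative and do not come from the talk or from any particular CoAP library.

```python
import struct
from typing import NamedTuple

# Message types defined by RFC 7252: Confirmable, Non-confirmable,
# Acknowledgement, Reset.
COAP_TYPES = {0: "CON", 1: "NON", 2: "ACK", 3: "RST"}


class CoapHeader(NamedTuple):
    version: int
    msg_type: str
    token_length: int
    code: str         # "class.detail", e.g. "0.01" for GET
    message_id: int


def parse_coap_header(datagram: bytes) -> CoapHeader:
    """Decode the fixed 4-byte header at the start of a CoAP message."""
    if len(datagram) < 4:
        raise ValueError("a CoAP header needs at least 4 bytes")
    first, code, message_id = struct.unpack("!BBH", datagram[:4])
    return CoapHeader(
        version=first >> 6,                        # top 2 bits
        msg_type=COAP_TYPES[(first >> 4) & 0x3],   # next 2 bits
        token_length=first & 0x0F,                 # low 4 bits
        code=f"{code >> 5}.{code & 0x1F:02d}",     # 3-bit class, 5-bit detail
        message_id=message_id,
    )


# A confirmable GET with no token and message ID 0x1234.
header = parse_coap_header(bytes([0x40, 0x01, 0x12, 0x34]))
```

Anything after these four bytes (token, options, payload) is left alone here; a real device would use a full CoAP implementation rather than hand-rolled parsing.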
  • 24. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Pub/Sub messaging protocol • Requires a broker (though brokers can be lightweight) • many-to-many broadcast
  • 25. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Message == Topic + Payload • Topics: users/ptgoetz/office/thermostat • Topic wildcards: • Single level (+): users/ptgoetz/+/thermostat • Multi-level (#): users/ptgoetz/office/# • Payload: Just a bunch of bytes (you define the schema)
  • 26. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Delivery guarantees (QoS): • 0: At-most-once • 1: At-least-once • 2: Exactly-once • Last will and testament (when a device goes offline) • Security via SSL/TLS
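The topic wildcards from slide 25 are easy to misread, so here is a minimal matcher for them. It is a sketch of the matching rules only, not a broker or client API: the function name `topic_matches` is made up for illustration, and the sketch ignores the special handling that the MQTT specification gives to topics beginning with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level, and it
            # matches everything that remains, including nothing at all.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch ("+" matches any one level)
    return len(filter_levels) == len(topic_levels)


single = topic_matches("users/ptgoetz/+/thermostat",
                       "users/ptgoetz/office/thermostat")
multi = topic_matches("users/ptgoetz/office/#",
                      "users/ptgoetz/office/thermostat")
too_deep = topic_matches("users/ptgoetz/+/thermostat",
                         "users/ptgoetz/office/floor2/thermostat")
```

Here `single` and `multi` are true, while `too_deep` is false because `+` stands for exactly one level.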
  • 27. Apache Mynewt (incubating) • Real-time, modular OS for IoT devices • Designed for use in devices with power, memory and storage constraints • Support for many ARM Cortex-M based boards (including Arduino) • HAL for unified access to MCU features • Connectivity with Bluetooth LE • WiFi, CoAP, and Thread support (roadmap) • Remote Firmware Upgrades • Command-line tools for package management
  • 28. Transport Tier Data Flow From Device to Data Center
  • 29. Transport Tier • Connecting Edge Devices: • To and from the Analytics Tier (data center) • To and from one another (inter-device communication) • Bridging Protocols: • e.g. WPAN to IP • Collecting/Transforming/Enriching Data in Motion
  • 31. Apache NiFi • Data flow orchestration tool • Guaranteed Delivery • Data provenance (important in the Analytics Tier) • Backpressure with release • Flow-specific QoS • Web-based UI for editing data flows • Data flows modifiable at runtime • Supports bi-directional data flows • Integrates with just about any system
  • 32. Apache NiFi Basic Concepts • Flow File: Unit of user data with associated key-value metadata • Processor: Components for creating, sending, receiving, transforming, routing, etc. Flow Files • Connection: Acts as the link between processors. • Flow Controller: Brokers the exchange of data between processors • Process Group: Set of Processors and Connections with Input/Output ports. New components can be created by composition.
  • 33. Apache NiFi MiNiFi • Supplement to NiFi for constrained devices/environments • More suitable for edge devices • Small footprint • Designed to collect data near where it originates and integrate with NiFi
  • 34. Apache NiFi For more information: • https://nifi.apache.org Some of the best technical documentation I’ve ever seen: • https://nifi.apache.org/docs.html
  • 36. Analytics Tier • Where IoT data often (but not always) intersects with Big Data platforms and Cloud Computing • Vertical scaling may suffice
  • 37. Analytics Tier • Many, many options… • [insert your definition of Hadoop here]
  • 38. Analytics Tier Key Platform Considerations: • Unbounded (Stream) data processing frequently necessary • Apache Storm, Apache Flink, etc. • Bounded (Batch) data processing frequently necessary • e.g. Training machine learning models, etc. • Apache Hadoop M/R, Apache Flink, Apache Spark • Time Series DB a common requirement • Apache HBase, Apache Cassandra, etc.
  • 39. Analytics Tier Key Platform Considerations: • Latency matters for many use cases • Latency can add up quickly, depending on the number of “hops” • Windowing semantics and flexibility
  • 40. When? The importance of event time(s).
  • 41. What is Event Time and why is it so important? • Event Times: Origin Time vs. Processing Time • Ex: Airplane Mode • Other types of Event Time: • Enrichment Time • Ingest Time • Processing Time 1, 2, n… • Exit Time (e.g. “return” events, C2, bi-directional communication)
  • 42. Choose a platform/API that gives you the most flexibility with respect to dealing with various event times.
  • 43. Future-Proofing and Scaling Small to Medium Scale: • Not Big Data • Investment in large-scale distributed system infrastructure wouldn’t make sense. • YAGNI (Yet…) • Vertical scaling may suffice
  • 44. Future-Proofing and Scaling Medium to Large Scale: • A single server is no longer cutting it • “V”s are starting to pile up • Need to move to a distributed architecture to scale with increasing demand • Your data is now Big
  • 45. Apache Beam (incubating) • Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming) • One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.
  • 46. Apache Beam (incubating) • Major focus on Windowing and properly dealing with Event Time(s) • Sliding Windows, Tumbling Windows, Session Windows, etc. • Watermark capabilities for dealing with late data
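To make the event-time and watermark ideas from slides 41 to 46 concrete, the sketch below groups readings into tumbling windows by event time and drops readings that arrive after their window has closed. It is plain Python, not the Apache Beam API. The watermark here is a simplifying assumption (the largest event time seen so far minus a fixed allowed lateness), and the names `Event` and `tumbling_window_sums` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    device_id: str
    event_time: int   # seconds; when the reading was taken on the device
    value: float


def tumbling_window_sums(events, size=60, allowed_lateness=30):
    """Sum values per event-time window. `events` is in arrival order."""
    sums = defaultdict(float)
    dropped = []
    watermark = float("-inf")
    for event in events:
        # Start of the fixed-size window this event time falls into.
        start = event.event_time - (event.event_time % size)
        if start + size <= watermark:
            # The window closed before this event arrived: late data.
            dropped.append(event)
            continue
        sums[start] += event.value
        # Assumed heuristic: trail the newest event time by the lateness bound.
        watermark = max(watermark, event.event_time - allowed_lateness)
    return dict(sums), dropped


# Arrival order differs from event-time order, as with a phone that was
# in airplane mode and uploads its readings later.
arrivals = [
    Event("t1", 5, 1.0),
    Event("t1", 65, 2.0),
    Event("t1", 50, 3.0),    # out of order, but its window is still open
    Event("t1", 130, 4.0),
    Event("t1", 10, 5.0),    # too late: window [0, 60) has closed
]
sums, dropped = tumbling_window_sums(arrivals)
```

With these inputs the reading taken at second 50 still counts toward window 0, while the one taken at second 10 is dropped because it arrives after the watermark has passed the end of that window.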
  • 47. Apache Beam (incubating) • Runner/Execution Engine Availability • Local runner (single machine) • Runners for Google Cloud Dataflow, Flink and Spark • Others underway: Apache Storm, Apache Apex and others
  • 48. Apache Beam (incubating) • Choose the right runner for your current scaling and organizational needs (you can switch later as necessary) • Understand the limits of different runner implementations • Outside of Google Cloud Dataflow, the Flink runner is currently the most feature-complete (this will change)
  • 49. Apache Beam (incubating) For a technical deep dive into Apache Beam: Apache Beam: A Unified Model for Batch and Streaming Data Processing - Davor Bonaci, Google Inc. Thursday 4:10PM, Ballroom A
  • 51. Problem: Data Formats • Many IoT devices transmit data as a raw array of bytes • The format of that data may be proprietary • To be of any use it must be parsed into a machine-readable format (i.e. Schema) • Once parsed, you need to know the schema
  • 52. Problem: Firmware Versions • Deployed IoT devices may be running any number of versions • Data formats may differ between firmware versions • Multiple parsers may be necessary to accommodate different device types and firmware versions
  • 53. Solution: Parser Registry • Allow manufacturers to supply proprietary parsers, load at runtime • Parser API to include a way to discover the schema • Tag data with device type + firmware version at the hub/gateway • Look up the associated parser when data arrives • (This can be done in either the Transport or Analytics tier)
  • 54. Solution: Schema Registry • When parsers are registered, also register the associated schema • Downstream components (Transport/Analytics Tier) discover schema based on metadata
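The parser and schema registries from slides 53 and 54 could be combined roughly as follows. This is a minimal in-memory sketch under stated assumptions: the `ParserRegistry` class, the thermostat device, its firmware versions, and both byte layouts are invented for illustration and are not taken from the talk.

```python
import struct
from typing import Callable, Dict, NamedTuple, Tuple

Key = Tuple[str, str]  # (device type, firmware version)


class ParserEntry(NamedTuple):
    parse: Callable[[bytes], dict]
    schema: Dict[str, str]   # field name -> type name


class ParserRegistry:
    """Maps device type + firmware version to a parser and its schema."""

    def __init__(self) -> None:
        self._entries: Dict[Key, ParserEntry] = {}

    def register(self, device_type: str, firmware: str,
                 parse: Callable[[bytes], dict],
                 schema: Dict[str, str]) -> None:
        # Registering a parser also registers its schema (slide 54).
        self._entries[(device_type, firmware)] = ParserEntry(parse, schema)

    def schema_for(self, device_type: str, firmware: str) -> Dict[str, str]:
        return self._entries[(device_type, firmware)].schema

    def parse(self, device_type: str, firmware: str, payload: bytes) -> dict:
        # Look up the parser using the tags added at the hub/gateway.
        return self._entries[(device_type, firmware)].parse(payload)


# Hypothetical firmware 1.0: temperature in tenths of a degree, uint16.
def parse_v1(payload: bytes) -> dict:
    (raw,) = struct.unpack("!H", payload)
    return {"temperature_c": raw / 10}


# Hypothetical firmware 2.0: same field plus humidity percent, uint8.
def parse_v2(payload: bytes) -> dict:
    raw, humidity = struct.unpack("!HB", payload)
    return {"temperature_c": raw / 10, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermostat", "1.0", parse_v1, {"temperature_c": "float"})
registry.register("thermostat", "2.0", parse_v2,
                  {"temperature_c": "float", "humidity_pct": "int"})

old_reading = registry.parse("thermostat", "1.0", bytes([0x00, 0xD7]))
new_reading = registry.parse("thermostat", "2.0", bytes([0x00, 0xD7, 0x2D]))
```

Keying on the pair of device type and firmware version lets both firmware generations coexist in the field, and downstream components can call `schema_for` with the same metadata instead of hard-coding field names.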
  • 55. Who owns your IoT data? Hint: It may not be you.
  • 56. Who owns your data? • Beware of 3rd-party device manufacturers • Data is valuable, and everyone wants it • Frequently exclusive access
  • 57. Who owns your data? • Device manufacturers may hoard data. • Retention policies limit how long you can store the data. • Aggregate/Derivative data okay, but what’s the definition?
  • 58. Thank you! Questions? P. Taylor Goetz, Hortonworks @ptgoetz

Editor's Notes

  1. That’s a lot of devices, generating a lot of data.
  2. To put that in perspective, that’s over 5 1/2 times the size of the entire IPv4 address space.
  3. Devices, Phones, Gateways and Hubs typically act as a bridge between devices and the cloud.
  4. communication is frequently bi-directional.
  5. Most IoT devices are wireless, and there are a number of protocols that need to be considered.
  6. fall loosely into two categories
  7. Compare to Arduino: with Arduino you write C++ code but don’t necessarily know it.
  8. And there’s actually one Apache project that can handle all this very well…
  9. It’s impossible to do NiFi justice in three slides.
  10. time-based aggregations
  11. The Google Cloud Dataflow API was recently open-sourced and donated to Apache.