How to Build Continuous Ingestion for the Internet of Things
© Cloudera, Inc. All rights reserved.
1. Beyond ETL: How to Build Continuous Ingestion for IoT
Sean Anderson | Product Marketing, Cloudera
Kirit Basu | Director of Product Management, StreamSets
2. Agenda
The Internet of Things
I. Driving Data Growth
II. Real-time Capabilities
III. An IoT Data Platform
IV. Cloudera Enterprise for IoT
The IoT Use Case
I. Packaged Goods
II. Sensor Data
III. Real-time Processing
StreamSets Platform
I. Data Collector
II. Data KPIs
III. Containerized Architecture
IV. Real-time Analytics with Cloudera
Demo
3. Poll
Are you currently collecting sensor data?
• Yes
• No
• Plan to in the future
4. Internet of Things (IoT) – A Revolution In The Making
IoT Markets - 2020
• $1.7 Trillion in value
• 20% annual growth
• 30 Billion things
• 250 Million connected vehicles
Source: IDC & Gartner estimates
5. IoT Will Drive An Explosion of Data…
• Data expected to explode to 44 ZB by 2020: 44 Trillion GB! (Source: IDC)
• 80% of data will be unstructured
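As a quick check of the headline figure on this slide, the zettabyte-to-gigabyte conversion can be worked through directly. This is a minimal sketch assuming decimal (SI) units, where 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes; the constant and function names are illustrative and do not come from the presentation.

```python
# Assumption: decimal (SI) prefixes, not binary (ZiB/GiB) ones.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


projected_gb = zettabytes_to_gigabytes(44)
print(f"{projected_gb:,} GB")  # 44,000,000,000,000 GB, i.e. 44 trillion GB
```

Under that assumption, 44 ZB and "44 trillion GB" are the same quantity expressed in different units.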
6. Value is Maximized when Data is combined from other sources
The value of data is multiplied when you combine and correlate it with other data from relevant sources.
40%: improvement in value that can be unlocked by combining data from multiple IoT applications and sources
SOURCE: McKinsey Global Institute analysis
“Interoperability would significantly improve performance by combining sensor data from different machines and systems to provide decision makers with an integrated view of performance across an entire factory or oil rig.”
7. The IoT Ecosystem
Consumer
Industrial
IoT Gateway
Cloud
Data Center
Data Analytics
Sensors/ Things
8. The IoT Ecosystem
Consumer
Industrial
IoT Gateway
Cloud
Data Center
Data Analytics
Sensors/Things
• To grow by 50X
• Drop in prices by 70% in last 5 years
Data Characteristics
• Unstructured
• Intermittent
• Volume & Variety
Gateway
• Data Routing
• Edge-Processing
• Edge-Storage
Data Storage, Processing & Analytics
IoT Data Characteristics
• More processing in the cloud
• Analytics on the cloud
IoT Data Characteristics
• Distributed Data Processing
• Cloud & On-Premise
IoT Data Analytics
• Key to Value Creation
• Combine data from multiple sources & types
• Drive business insights
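The gateway roles listed on this slide (data routing, edge-storage) together with the "intermittent" nature of sensor data suggest a store-and-forward pattern. The following is a minimal sketch of that idea, not a description of any Cloudera or StreamSets component: the `EdgeBuffer` class, the `uplink` callback, and the record fields are all hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Reading = Dict[str, object]


class EdgeBuffer:
    """Hypothetical gateway-side buffer: hold readings locally while the
    uplink is down, then forward them in arrival order once it is back."""

    def __init__(self, send: Callable[[Reading], bool], capacity: int = 1000) -> None:
        self._send = send  # returns True when the upstream accepted the reading
        # Bounded edge storage: when full, the oldest reading is dropped.
        self._pending: Deque[Reading] = deque(maxlen=capacity)

    def submit(self, reading: Reading) -> None:
        """Store the reading at the edge, then try to drain the backlog."""
        self._pending.append(reading)
        self.flush()

    def flush(self) -> int:
        """Forward buffered readings until one fails; return how many were sent."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # uplink unavailable; keep the rest for later
            self._pending.popleft()
            sent += 1
        return sent

    def backlog(self) -> int:
        """Number of readings still waiting at the edge."""
        return len(self._pending)


# Simulated uplink whose availability we can toggle.
link = {"up": False}
delivered: List[Reading] = []


def uplink(reading: Reading) -> bool:
    if link["up"]:
        delivered.append(reading)
    return link["up"]


buffer = EdgeBuffer(uplink, capacity=3)
buffer.submit({"sensor": "s1", "temp_c": 21.5})  # link down: stays buffered
buffer.submit({"sensor": "s1", "temp_c": 21.7})
link["up"] = True
buffer.submit({"sensor": "s1", "temp_c": 21.9})  # link up: backlog drains in order
print(len(delivered), buffer.backlog())
```

The bounded `deque` is one possible design choice: it caps edge storage at the cost of dropping the oldest readings during a long outage. A gateway that cannot tolerate loss would need durable storage instead.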
9. Key Attributes For Next Gen IoT Data Platform
• Scale efficiently based on your data growth
• Effectively handle multiple data types and structures
• Manage the complexity of real-time IoT data ingest
• Fundamentally secure
• Real-time analytics: combine and analyze data from multiple sources
• Flexible deployment options: cloud and distributed data processing
10. Cloudera Enterprise – Making Hadoop Fast, Easy, and Secure
CLOUDERA ENTERPRISE
OPERATIONS
• Cloudera Manager
• Cloudera Director
DATA MANAGEMENT
• Cloudera Navigator
• Encrypt and KeyTrustee
• Optimizer
INTEGRATE
• BATCH: Sqoop
• REAL-TIME: Kafka, Flume
PROCESS, ANALYZE, SERVE
• BATCH: Spark, Hive, Pig, MapReduce
• STREAM: Spark
• SQL: Impala
• SEARCH: Solr
• SDK: Partners
UNIFIED SERVICES
• RESOURCE MANAGEMENT: YARN
• SECURITY: Sentry, RecordService
STORE
• FILESYSTEM: HDFS
• RELATIONAL: Kudu
• NoSQL: HBase
11. Cloudera Enterprise – The Data & Analytics Platform for IoT
Data Sources: Sensor/IoT Data, Internal Systems, External Sources
IoT Gateway
Data Center
Cloud
• Data Storage
• Data Processing
• Machine Learning
• Real-time Analytics
BI Solutions, Real-Time Apps, Search, EDW, Discover, Machine Learning
12. Cloudera Enterprise – Real Time Analytics for IoT
Sensor/IoT Data, Internal Systems, External Sources
• Streaming Ingest: Kafka & Flume - Real-Time Data Ingest for streaming, high volume data
• Real-Time Data Processing: Spark Streaming. Leadership in Spark. Integrated with EDH
• Flexible Storage: Store any and all Data. Kudu - Fast Analytics on Fast Data
• Data Security: Four pillars of security: Perimeter, Access, Visibility, and Data + RecordService
• Centralized Mgmt.: Cloudera Manager for centralized cluster management
• Deployment Flexibility: Manage Multiple Clusters – On Premise or Cloud environment
BI Solutions, Real-Time Apps, Search, EDW, Discover, Machine Learning
13. The Cloudera Difference
Easy to Manage
• Powerful Cluster Ops: Trusted by the pros
• Cloud & Hybrid deployment: Integrated with AWS & Azure
• Expert Support: Dedicated prescriptive help, just a click away
Fast for Business
• Real-Time IoT Analytics: The most experience with Spark
• The Fastest Analytic SQL: Lowest latency, best concurrency
• Fast, Updateable Analytic Storage: High throughput, low latency, and updates
Security without Compromise
• Enterprise Encryption: Protects everything transparently
• Access Policy Enforcement: Full-stack row/column-based RBAC & dynamic masking
• Automated Data Management: Full-stack audit, lineage, discovery, and lifecycle
• Secure Operations: Separation of duties, log data redaction
14. Continuous Data Ingestion with Cloudera & StreamSets
StreamSets enables easy onboarding and effortless data ingest into all components of CDH
Reliable, Scalable, Always-on Data Ingest
20. Poll
Where are you in your development effort for bringing IoT data into Hadoop?
• In Production
• Test and Development
• Planning (Already decided on the architecture)
• Not there yet (Need to decide on an architecture)
• Current Architecture doesn’t work, need a better way to do things
21. Challenges with IoT Data
• Multitude of Sensors
• Real-Time Streaming
• Multiple Firmware versions
• Bad data from damaged sensors
• Regulatory Constraints
• Data Quality
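Two of the challenges above, multiple firmware versions and bad data from damaged sensors, can be illustrated with a small normalize-and-route step. This is a sketch under stated assumptions, not StreamSets Data Collector code: the field names (`fw`, `temp_f`, `temperature_c`), the two firmware schemas, and the plausible temperature range are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]

# Hypothetical plausible range for a temperature sensor, in degrees Celsius.
MIN_C, MAX_C = -40.0, 125.0


def normalize(record: Record) -> Optional[Record]:
    """Map a raw record to one schema, or return None if it is unusable."""
    version = record.get("fw")
    if version == 1:
        # Assumed firmware v1: Fahrenheit in "temp_f".
        raw = record.get("temp_f")
        celsius = (raw - 32.0) * 5.0 / 9.0 if isinstance(raw, (int, float)) else None
    elif version == 2:
        # Assumed firmware v2: Celsius in "temperature_c".
        raw = record.get("temperature_c")
        celsius = float(raw) if isinstance(raw, (int, float)) else None
    else:
        return None  # unknown firmware version
    if celsius is None or not (MIN_C <= celsius <= MAX_C):
        return None  # missing, non-numeric, or implausible reading
    return {"sensor_id": record.get("id"), "temperature_c": round(celsius, 2)}


def route(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into normalized clean rows and rejected raw rows."""
    clean: List[Record] = []
    errors: List[Record] = []
    for record in records:
        normalized = normalize(record)
        if normalized is None:
            errors.append(record)  # keep the raw record for later inspection
        else:
            clean.append(normalized)
    return clean, errors


readings = [
    {"id": "a1", "fw": 1, "temp_f": 72.5},
    {"id": "b2", "fw": 2, "temperature_c": 21.0},
    {"id": "c3", "fw": 2, "temperature_c": 999.0},  # e.g. a damaged sensor
    {"id": "d4", "fw": 3, "temp": 20},              # unrecognized firmware
]
clean, errors = route(readings)
print(len(clean), "clean;", len(errors), "rejected")
```

Keeping rejected records in their raw form, rather than discarding them, is a deliberate choice here: it leaves room to diagnose failing sensors or to add support for a new firmware version later.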
23. Getting Started is Easy
(1) Watch the Beyond ETL Series
(2) Download the StreamSets Data Collector
(3) Contact Us to start a POC