Wikibon #IoT #HyperConvergence Presentation via @theCUBE

The Journey To IoT Systems Of Intelligence:
Determined By Combination of Tech and Enterprise Capabilities
Smart Grid
Adjunct Data Warehouse
Customer 360
Real-time loyalty
omni-channel
multi-touchpoint
Predictive model learns from and anticipates consumer in near real-time
Continuously updated predictive models of energy supply and demand tune end-point consumption
Autonomic Systems Management: System learns “normal” behavior of apps and infrastructure and flags or fixes anomalies
Data Lake with some production analytics offload from Data Warehouse
Enough internal and external customer data in a pipeline to start predictive modeling
Applications
Foundation Capabilities: Speed, Richness of Analytics

Vendor New Services
Telco: Manage capacity of towers, cells, switches, connections, devices. Performance dashboards and reports on customer consumption for billing and infrastructure utilization for capacity planning.
Intelligent Service Provider: Real-time updates/integration between individual plans, consumption, and promotions; real-time integration of individual consumer SLAs and connection / bandwidth allocation in order to support tiered pricing.
Use Case
Systems of Record Transition to IoT Systems of Intelligence:
From Telco OSS/BSS to Intelligent Service Provider

Use Case: Bridging Carrier App Billing and Network Operations
Customer- and developer-facing services
Billing and settlement
• App store and in-app billing via carrier billing
• Provisioning app install order on credit verification
• Settle developer royalties based on splits
Offers
• Offer discount on monthly top-up of bandwidth if user is heavy consumer over time and approaching monthly limit
• Serve app install ads based on user profile
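The top-up offer rule in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `UsageSnapshot` fields and the two thresholds (`heavy_ratio`, `near_limit_ratio`) are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UsageSnapshot:
    """Hypothetical per-subscriber usage record (field names are assumptions)."""
    monthly_limit_gb: float
    used_this_month_gb: float
    prior_months_used_gb: List[float]


def should_offer_topup_discount(
    usage: UsageSnapshot,
    heavy_ratio: float = 0.8,       # assumed: "heavy" = averaging >= 80% of the limit
    near_limit_ratio: float = 0.9,  # assumed: "approaching" = >= 90% of the limit used
) -> bool:
    """Return True when the user is a heavy consumer over time AND near the monthly limit."""
    if usage.monthly_limit_gb <= 0 or not usage.prior_months_used_gb:
        return False
    avg_prior = sum(usage.prior_months_used_gb) / len(usage.prior_months_used_gb)
    heavy_over_time = avg_prior >= heavy_ratio * usage.monthly_limit_gb
    approaching_limit = usage.used_this_month_gb >= near_limit_ratio * usage.monthly_limit_gb
    return heavy_over_time and approaching_limit
```

In a real deployment the thresholds would presumably be tuned per plan, and the rule would run against live consumption data rather than a static snapshot.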
Network operations-facing services
Network performance and configuration management
• Real-time ingestion of CDRs to create heat map of network performance. Ingest must be fast enough that, in the absence of an in-memory DBMS, it would likely be handled by streaming products. (This is an example of an IoT machine-data app.)
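One way to picture the heat map is a per-cell counter that is updated as each CDR arrives, so nothing has to be stored before the aggregate is available. The sketch below assumes a hypothetical CDR shape (`cell_id`, `dropped`) and uses dropped-call rate as the performance measure; neither detail is specified in the presentation.

```python
from collections import defaultdict
from typing import Dict


class CellHeatMap:
    """Per-cell dropped-call rate, updated one CDR at a time."""

    def __init__(self) -> None:
        self._calls: Dict[str, int] = defaultdict(int)
        self._dropped: Dict[str, int] = defaultdict(int)

    def ingest(self, cdr: dict) -> None:
        # Assumed CDR fields: "cell_id" (str) and "dropped" (bool).
        cell = cdr["cell_id"]
        self._calls[cell] += 1
        if cdr["dropped"]:
            self._dropped[cell] += 1

    def drop_rates(self) -> Dict[str, float]:
        # Fraction of calls dropped per cell; higher values are "hotter" cells.
        return {cell: self._dropped.get(cell, 0) / n for cell, n in self._calls.items()}
```

A streaming product would maintain the same kind of running state, typically partitioned by cell and bounded by a time window.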
Bridging customer-facing and network-facing services
• Enrich CDR data with information about customer profitability
• Real-time prioritization of bandwidth on a per customer basis when there is high congestion
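The two bridging steps above, enrichment and prioritization, might be combined along these lines. This is a sketch only: the record fields (`customer_id`, `requested_mbps`), the profitability lookup, and the greedy most-profitable-first policy are all assumptions, and it presumes one record per customer per allocation interval.

```python
from typing import Dict, List


def enrich_cdrs(cdrs: List[dict], profitability: Dict[str, float]) -> List[dict]:
    """Attach a profitability score to each CDR (unknown customers default to 0.0)."""
    return [
        {**cdr, "profitability": profitability.get(cdr["customer_id"], 0.0)}
        for cdr in cdrs
    ]


def allocate_bandwidth(enriched: List[dict], capacity_mbps: float) -> Dict[str, float]:
    """Grant every request when there is room; under congestion, serve the most profitable first."""
    demand = sum(c["requested_mbps"] for c in enriched)
    if demand <= capacity_mbps:
        return {c["customer_id"]: c["requested_mbps"] for c in enriched}

    remaining = capacity_mbps
    allocation: Dict[str, float] = {}
    for c in sorted(enriched, key=lambda c: c["profitability"], reverse=True):
        grant = min(c["requested_mbps"], remaining)
        allocation[c["customer_id"]] = grant
        remaining -= grant
    return allocation
```

A strict greedy policy can starve low-profitability customers entirely; a production system would more likely enforce SLA minimums first and prioritize only the remainder.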

Spectrum of Applications: Fast Data vs. Big Data
Fast Data Big Data

Range of “Real-Time” Interactions
• REAL RT: high frequency algorithmic securities trading on one end of the spectrum
• Updates every couple of hours: inventory levels accessed by ecommerce, mobile apps at other end of spectrum
Modern SoR makes it easier to get to fastest part of spectrum
Real-Time is a Matter of Degree: Choices Depend on Usage Scenario, Accessibility of
Applications That Need to be Integrated – Including Legacy and Modern Systems of Record

[Chart: Data Volume (GB, TB, PB) vs. Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS). Labels: Advanced Analytics; Data Warehouse; OLTP; OLTP, Operational Intelligence; Business Intelligence, Production Reporting; Big Data: Machine Learning, Predictive Analytics; Fast Data: Streaming Data, Per Event Decisions]
*TRADITIONAL* Analytic Trade-Off:
Speed vs. Richness

Traditional Data Warehouse Pipeline
Time-to-analysis bottlenecked by
• Design time: Need to decide questions before building the analytic pipeline
• Runtime: Batch ETL
[Diagram: OLTP Applications feed the Data Warehouse via Batch ETL. Ingest: Slow. Analysis: Rich But Slow]
Analytic Trade-Off:
Speed vs. Richness

Hadoop Data Pipeline
Time-to-analysis bottlenecked by
• Design time: Iterative, incremental analysis and enrichment
• Runtime: Inherent batch design center
[Diagram: OLTP Applications feed Hadoop/HDFS via data provisioning; Hadoop/HDFS supports iterative self-service and incremental database design. Ingest: Slow. Analysis: Rich But Slow]
Analytic Trade-Off:
Speed vs. Richness

Hadoop Data Pipeline with Streaming Ingest
Time-to-analysis bottlenecked by
• Design time: Still need iterative, incremental analysis and enrichment
• Runtime: Real-time ingest, but data still needs to be stored before rich analytics
[Diagram: OLTP Applications feed a Stream Processor (Streaming Ingest: Fast Analysis but Limited), which feeds Hadoop/HDFS on a Hadoop Cluster (iterative self-service and incremental database design; Analysis: Rich but Slow)]
BOTTLENECK: DBMS Storage *Before* Rich Analysis
Analytic Trade-Off:
Speed vs. Richness

Integrated Streaming and Persistence: Real-Time, Rich Analysis
[Diagram: Hadoop Cluster with Stream Processor and Store. Inputs: Customer “Breadcrumbs” (E-Mail, Social Media, Customer interactions); Operational Data (Operational apps); Machine Data (IoT – Devices, Machines). Outputs: Predictions, Recommendations; Improving Predictions (Machine Learning)]
Better Integration of Real-Time and Batch:
Analytic Trade-Off Between Speed vs. Richness Diminishes

[Chart: Data Volume (GB, TB, PB) vs. Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS). Labels: Advanced Analytics; OLTP]
Big *AND* Fast Data: Machine Learning on Historical AND Recent Data Drives Per Event Decisions
Better Integration of Real-Time and Batch:
Analytic Trade-Off Between Speed vs. Richness Diminishes

[Chart: Batch Processing volume (GB, TB, PB) vs. Streaming velocity (Min, Sec, MS, µS)]
Big Data
• Maximum throughput of data
• Exploratory analysis of historical data
Fast Data
• Fastest speed to make a decision on each event
Streaming is Newest Religious War: Use It For *All* Analytic Workloads?
Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict
“Streams can do it all” school: Big Data Apps are Just Fast Data Apps Scaled-Out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed
Wikibon recommendation (elaborated on next slide): Streaming and batch *will always* coexist
• Even batch programs on streaming platform will still have different application logic…
• High volume machine learning vs. incremental update
• Historical performance analysis vs. looking up a profile
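The contrast between batch recomputation and incremental update can be illustrated with a deliberately tiny stand-in for a "model": a running mean. The sketch below is an assumption-laden toy, not anything from the presentation. `batch_mean` rescans the full history, while `IncrementalMean` folds in one event at a time; both reach the same value, but the application logic differs.

```python
from typing import List


def batch_mean(history: List[float]) -> float:
    """Batch style: recompute from the entire stored history."""
    return sum(history) / len(history)


class IncrementalMean:
    """Per-event style: update the estimate as each event arrives, no rescan."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        # Standard running-mean update: shift the mean toward x by 1/count of the gap.
        self.mean += (x - self.mean) / self.count
        return self.mean
```

The batch function needs all the data in storage and favors throughput; the incremental class needs only its current state and favors latency per event.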

Even When Streaming Engines Support More Sophisticated Analytic Workloads, the Applications Are Likely to Differ Between Event-at-a-Time vs. Batch
[Chart: Latency (Higher is Slower) vs. Analytic Sophistication. Sophistication levels: Basic Streaming (What Happened: Counting); SQL (What Happened: Exploration, OLAP or Dashboard); Machine Learning (Anticipate or Act Automatically: Prediction or Prescription). Per-Event-Oriented: Profile lookup, Incremental model update. Batch-oriented: Historical analysis, Explore large, new data]
Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent; (DB): VoltDB / MemSQL
IMPLICATION: Converging on one application engine not critical

[Diagram: Hadoop stack. Streaming: Storm, Flink, Samza, DataTorrent. SQL: Impala, Drill, Hive, HAWQ… Machine Learning: Mahout… All on YARN – Cluster Resource Management, over HDFS or operational database]
Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine
Doesn’t Have to Rule All - Spark and Hadoop Can Live Together
Pro: Mix and match pipeline composed of specialized processing *optimized* for each workload
Con: Batch-only - hand-off between processing engines via storage is slow. Each processing engine is standalone and can’t leverage the others’ functionality
Pro: Fast and simple - pipeline composed of one in-memory engine with streaming, SQL, machine learning, graph personalities (libraries)
Con: Still immature; performance an issue; haven’t fully delivered integration. But Tungsten perf boost and IBM projects could add huge new value
[Diagram: Spark stack. Spark Streaming (Streaming Ingest), Spark SQL (Join, filter, aggregate), and Spark MLlib (Machine Learning) on Spark Core, over HDFS or operational database, managed by YARN or Mesos or other Workload Mgr]
