The Journey To IoT Systems Of Intelligence:
Determined By a Combination of Tech and Enterprise Capabilities
Applications, plotted against foundation capabilities (speed, richness of analytics):
• Adjunct Data Warehouse: Data Lake with some production analytics offloaded from the Data Warehouse
• Customer 360: Enough internal and external customer data in a pipeline to start predictive modeling
• Autonomic Systems Management: System learns “normal” behavior of apps and infrastructure and flags or fixes anomalies
• Real-time loyalty (omni-channel, multi-touchpoint): Predictive model learns from and anticipates the consumer in near real-time
• Smart Grid: Continuously updated predictive models of energy supply and demand tune end-point consumption
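The Autonomic Systems Management stage rests on a system that learns “normal” behavior and flags deviations from it. The following is a minimal sketch of one common way to do that for a single metric, assuming a running mean and standard deviation (Welford's online algorithm) and a hypothetical 3-sigma threshold with a 10-reading warm-up; the deck does not say which technique such systems use.

```python
import math


class NormalBehaviorModel:
    """Hypothetical detector: learns running mean/variance of one metric, flags outliers."""

    def __init__(self, threshold_sigmas: float = 3.0, warmup: int = 10):
        self.threshold_sigmas = threshold_sigmas  # hypothetical default
        self.warmup = warmup  # readings to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def std(self) -> float:
        """Sample standard deviation of the readings seen so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def observe(self, value: float) -> bool:
        """Return True if value looks anomalous, then fold it into the model."""
        anomalous = (
            self.n >= self.warmup
            and self.std() > 0
            and abs(value - self.mean) > self.threshold_sigmas * self.std()
        )
        # Welford's online update of the mean and m2.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return anomalous
```

Folding every reading into the model lets the baseline drift toward a new normal; skipping flagged readings instead keeps outliers from inflating the variance. Which is preferable depends on whether a flagged reading is noise or the start of a new regime.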
Systems of Record Transition to IoT Systems of Intelligence:
From Telco OSS/BSS to Intelligent Service Provider
Use case by vendor type:
• Telco OSS/BSS (systems of record): Manage capacity of towers, cells, switches, connections, and devices. Performance dashboards and reports on customer consumption for billing, and on infrastructure utilization for capacity planning.
• Intelligent Service Provider (new services): Real-time updates and integration between individual plans, consumption, and promotions; real-time integration of individual consumer SLAs with connection/bandwidth allocation to support tiered pricing.
Use Case: Bridging Carrier App Billing and Network Operations
Customer- and developer-facing services
Billing and settlement
• App store and in-app billing via carrier billing
• Provision app install orders upon credit verification
• Settle developer royalties based on revenue splits
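Settling royalties based on splits amounts to dividing each purchase between the developer and the carrier. A minimal sketch, assuming a single hypothetical 70% developer share and rounding the developer cut to the cent per purchase; the carrier keeps the exact remainder so that payouts always reconcile with gross revenue. Real settlement terms vary and are not given in the deck.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
DEVELOPER_SHARE = Decimal("0.70")  # hypothetical split, not from the deck


def settle_royalties(purchases, developer_share=DEVELOPER_SHARE):
    """purchases: iterable of (developer_id, amount_as_string).

    Returns (payouts per developer, carrier total). Each developer cut is rounded
    to the cent per purchase; the carrier keeps the exact remainder, so
    sum(payouts) + carrier total always equals gross revenue.
    """
    payouts = defaultdict(Decimal)  # Decimal() == Decimal("0")
    carrier_total = Decimal("0")
    for developer_id, amount in purchases:
        amount = Decimal(amount)
        developer_cut = (amount * developer_share).quantize(CENT, rounding=ROUND_HALF_UP)
        payouts[developer_id] += developer_cut
        carrier_total += amount - developer_cut
    return dict(payouts), carrier_total
```

Using `Decimal` rather than floats keeps cent-level amounts exact, which matters when thousands of small in-app purchases are summed.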
Offers
• Offer a discount on a monthly bandwidth top-up if the user has been a heavy consumer over time and is approaching the monthly limit
• Serve app install ads based on user profile
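The top-up offer combines a long-run condition (heavy consumption over time) with a current one (approaching this month's limit). A minimal sketch, assuming hypothetical thresholds of 20 GB average monthly usage for “heavy” and 80% of the cap for “approaching”; in the real-time setting the deck describes, the check would run on each usage update.

```python
from statistics import mean

HEAVY_USER_GB = 20.0       # hypothetical: average monthly usage that counts as "heavy"
NEAR_LIMIT_FRACTION = 0.8  # hypothetical: "approaching" = at least 80% of the cap used


def should_offer_topup_discount(past_monthly_gb, current_month_gb, monthly_limit_gb):
    """True if the user has been a heavy consumer over time AND is near this month's limit."""
    if not past_monthly_gb:
        return False  # no history, so "heavy over time" cannot be established
    heavy_over_time = mean(past_monthly_gb) >= HEAVY_USER_GB
    approaching_limit = current_month_gb >= NEAR_LIMIT_FRACTION * monthly_limit_gb
    return heavy_over_time and approaching_limit
```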
Network operations-facing services
Network performance and configuration management
• Real-time ingestion of CDRs (call detail records) to create a heat map of network performance. Ingest must be so fast that, absent an in-memory DBMS, it would likely be done with streaming products. (This is an example of an IoT machine-data app.)
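A network heat map of this kind is essentially a windowed aggregation keyed by cell. A minimal sketch, assuming a hypothetical three-field CDR and using the dropped-call rate per cell per one-minute tumbling window as the performance measure; the deck does not specify which metric its heat map shows. A streaming product would keep the same counters incrementally and emit each window as it closes.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class CDR(NamedTuple):
    """Hypothetical minimal call detail record; real CDR schemas are far richer."""
    timestamp_s: int
    cell_id: str
    dropped: bool


def drop_rate_heat_map(cdrs: Iterable[CDR], window_s: int = 60) -> Dict[Tuple[int, str], float]:
    """One pass over the CDR stream: (window_start_s, cell_id) -> dropped-call rate."""
    calls = defaultdict(int)
    drops = defaultdict(int)
    for cdr in cdrs:
        window_start = cdr.timestamp_s - cdr.timestamp_s % window_s  # tumbling window
        key = (window_start, cdr.cell_id)
        calls[key] += 1
        drops[key] += int(cdr.dropped)
    return {key: drops[key] / calls[key] for key in calls}
```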
Bridging customer-facing and network-facing services
• Enrich CDR data with information about customer profitability
• Real-time prioritization of bandwidth on a per-customer basis during periods of high congestion
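The two bridging bullets fit together: enriching demand with customer profitability supplies the ranking needed to prioritize bandwidth under congestion. A minimal sketch, assuming hypothetical per-customer demand and profitability inputs and a strict-priority policy (most profitable first); a weighted fair share would be an equally plausible policy.

```python
def allocate_bandwidth(demands_mbps, profitability, capacity_mbps):
    """Grant bandwidth per customer, prioritizing profitable customers under congestion.

    demands_mbps:  customer_id -> requested Mbps (hypothetical input)
    profitability: customer_id -> profitability score from the enrichment step
                   (hypothetical; customers missing from it score 0)
    """
    if sum(demands_mbps.values()) <= capacity_mbps:
        return dict(demands_mbps)  # no congestion: everyone gets what they asked for
    grants = {}
    remaining = capacity_mbps
    # Strict priority: the most profitable customers are served first.
    ranked = sorted(demands_mbps, key=lambda c: profitability.get(c, 0.0), reverse=True)
    for customer_id in ranked:
        grant = min(demands_mbps[customer_id], remaining)
        grants[customer_id] = grant
        remaining -= grant
    return grants
```

Strict priority can starve low-ranked customers entirely, which is why a real deployment might reserve a minimum share per customer.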
Spectrum of Applications: Fast Data vs. Big Data
Range of “Real-Time” Interactions
• REAL RT: high-frequency algorithmic securities trading at one end of the spectrum
• Updates every couple of hours: inventory levels accessed by e-commerce and mobile apps at the other end of the spectrum
Modern systems of record (SoR) make it easier to reach the fastest part of the spectrum.
Real-Time is a Matter of Degree: Choices Depend on the Usage Scenario and on the Accessibility of Applications That Need to be Integrated, Including Legacy and Modern Systems of Record
*TRADITIONAL* Analytic Trade-Off: Speed vs. Richness
[Figure: data volume (GB, TB, PB) vs. data velocity (Yr, Mo, Day, Hr, Min, Sec, ms, µs). Regions: Data Warehouse (Business Intelligence, Production Reporting); OLTP, Operational Intelligence; Big Data (Advanced Analytics: Machine Learning, Predictive Analytics); Fast Data (Streaming Data, Per Event Decisions)]
Traditional Data Warehouse Pipeline
Time-to-analysis bottlenecked by:
• Design time: Need to decide questions before building the analytic pipeline
• Runtime: Batch ETL
[Diagram: OLTP Applications → Batch ETL → Data Warehouse. Ingest: Slow. Analysis: Rich But Slow.]
Analytic Trade-Off: Speed vs. Richness
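The runtime bottleneck can be made concrete: with batch ETL, a record becomes queryable only after the next batch cutoff and that batch's ETL run. A minimal sketch of the worst case, assuming runs do not overlap (each run finishes before the next cutoff); the nightly schedule and 3-hour runtime in the usage line are hypothetical.

```python
from datetime import timedelta


def worst_case_staleness(batch_interval: timedelta, etl_runtime: timedelta) -> timedelta:
    """Longest wait before a record is queryable in the warehouse.

    A record written just after a batch cutoff waits a full interval for the next
    cutoff, then for that batch's ETL run to finish. Assumes runs don't overlap.
    """
    if etl_runtime > batch_interval:
        raise ValueError("ETL runs would overlap; this simple model does not apply")
    return batch_interval + etl_runtime
```

For a nightly batch with a 3-hour ETL run, `worst_case_staleness(timedelta(hours=24), timedelta(hours=3))` gives 27 hours. Shortening the interval helps, but time-to-analysis never drops below one ETL run.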
Hadoop Data Pipeline
Time-to-analysis bottlenecked by:
• Design time: Iterative, incremental analysis and enrichment
• Runtime: Inherent batch design center
[Diagram: OLTP Applications → Data provisioning → Hadoop/HDFS (iterative self-service and incremental database design). Ingest: Slow. Analysis: Rich But Slow.]
Analytic Trade-Off: Speed vs. Richness
Hadoop Data Pipeline with Streaming Ingest
Time-to-analysis bottlenecked by:
• Design time: Still need iterative, incremental analysis and enrichment
• Runtime: Real-time ingest, but data still needs to be stored before rich analytics
[Diagram: OLTP Applications → Stream Processor (Streaming Ingest: Fast, but Analysis Limited) → Hadoop/HDFS on a Hadoop Cluster (iterative self-service and incremental database design; Analysis: Rich but Slow)]
BOTTLENECK: DBMS Storage *Before* Rich Analysis
Analytic Trade-Off: Speed vs. Richness
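The bottleneck on this slide is that the stream processor sees each event immediately but can do only limited work on it, while rich analysis waits until data has been persisted. A minimal sketch under stated assumptions: a hypothetical in-memory list as a stand-in for HDFS, a running count as the “limited” per-event analysis, a per-key mean over stored history as a stand-in for “rich” analysis, and a flush to storage every N events.

```python
from collections import Counter, defaultdict


class StreamThenStorePipeline:
    """Hypothetical model of streaming ingest in front of batch storage."""

    def __init__(self, flush_every: int):
        self.flush_every = flush_every
        self.buffer = []                 # ingested but not yet persisted
        self.storage = []                # stand-in for HDFS / the DBMS
        self.running_counts = Counter()  # limited analysis, current after every event

    def ingest(self, key: str, value: float) -> None:
        self.running_counts[key] += 1    # fast: updated immediately
        self.buffer.append((key, value))
        if len(self.buffer) >= self.flush_every:
            self.storage.extend(self.buffer)  # persist before rich analysis can see it
            self.buffer.clear()

    def rich_query(self) -> dict:
        """Per-key mean over *stored* data only; unflushed events are invisible."""
        sums, counts = defaultdict(float), Counter()
        for key, value in self.storage:
            sums[key] += value
            counts[key] += 1
        return {key: sums[key] / counts[key] for key in sums}
```

With `flush_every=3`, a fourth event is already counted by the stream processor, but the stored-data mean does not reflect it until the next flush; that gap is the bottleneck the slide labels.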
Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed and Richness Diminishes
[Diagram: data sources (E-Mail, Social Media, Operational apps, Customer interactions, i.e. customer “breadcrumbs”; IoT devices and machines producing machine data; operational data) → Stream Processor → Hadoop Cluster with an integrated streaming and persistence store (Real-Time, Rich Analysis) → Predictions, Recommendations; Improving Predictions (Machine Learning)]
Better Integration of Real-Time and Batch: Analytic Trade-Off Between Speed and Richness Diminishes
[Figure: data volume (GB, TB, PB) vs. data velocity (Yr, Mo, Day, Hr, Min, Sec, ms, µs). Regions: OLTP; Advanced Analytics; Big *AND* Fast Data: Machine Learning on Historical AND Recent Data Drives Per Event Decisions]
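“Machine learning on historical AND recent data drives per-event decisions” can be read as a two-step loop: train on history in batch, then score each arriving event and fold it into the model. A minimal sketch, assuming a one-variable linear model, a closed-form least-squares batch fit, and one stochastic-gradient step per event with a hypothetical learning rate; production systems would use richer models, but the division of labor is the same.

```python
from statistics import mean


class HistoricalPlusRecentModel:
    """Hypothetical 1-D linear model y ~ w*x + b.

    fit_batch(): least-squares fit over historical data (the "big data" step).
    score_and_update(): per-event prediction, then one SGD step on that event
    (the "fast data" step), so recent data keeps refining the historical model.
    """

    def __init__(self, learning_rate: float = 0.01):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate  # hypothetical; tune for real data

    def fit_batch(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.w = sxy / sxx
        self.b = y_bar - self.w * x_bar

    def predict(self, x):
        return self.w * x + self.b

    def score_and_update(self, x, y_observed):
        prediction = self.predict(x)  # the value a per-event decision would use
        error = prediction - y_observed
        # Gradient of 0.5 * error**2 with respect to w and b.
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return prediction
```

After a batch fit on history where y = 2x + 1, feeding recent events where y = 2x + 3 pulls predictions toward the new behavior without retraining from scratch.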
Streaming is the Newest Religious War: Use It For *All* Analytic Workloads?
Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict
[Figure: batch processing, by data volume (GB, TB, PB), vs. streaming, by velocity (Min, Sec, ms, µs)]
• Big Data: maximum throughput of data; exploratory analysis of historical data
• Fast Data: fastest speed to make a decision on each event
“Streams can do it all” school: Big Data apps are just Fast Data apps scaled out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed
Wikibon recommendation (elaborated on next slide): Streaming and batch *will always* coexist
• Even batch programs on a streaming platform will still have different application logic, for example:
  • High-volume machine learning vs. incremental update
  • Historical performance analysis vs. looking up a profile
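The last two contrasts can be seen in miniature by computing the same customer profile both ways. A minimal sketch, assuming a hypothetical profile (mean purchase amount per customer): the batch version makes one sequential pass over the full history for maximum throughput, while the incremental version does constant work per event, so a current profile can be looked up and acted on after every event.

```python
from collections import defaultdict


def batch_profiles(history):
    """Batch: one sequential pass over the full history -> customer_id -> mean amount."""
    totals, counts = defaultdict(float), defaultdict(int)
    for customer_id, amount in history:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {c: totals[c] / counts[c] for c in totals}


class IncrementalProfiles:
    """Streaming: constant work per event; the profile is current after every event."""

    def __init__(self):
        self.means = {}
        self.counts = defaultdict(int)

    def update(self, customer_id, amount):
        self.counts[customer_id] += 1
        old = self.means.get(customer_id, 0.0)
        # Incremental mean: new = old + (x - old) / n
        self.means[customer_id] = old + (amount - old) / self.counts[customer_id]

    def lookup(self, customer_id):
        """Profile lookup: None if the customer has not been seen."""
        return self.means.get(customer_id)
```

Both arrive at the same answer; the difference is that the incremental version has a usable partial result after the first event, while the batch version has the answer only when the whole job finishes.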
Even When Streaming Engines Support More Sophisticated Analytic Workloads, The Applications Are Likely to Differ Between Event-at-a-Time and Batch
[Figure: latency (higher is slower) vs. analytic sophistication]
• Basic streaming: counting (what happened)
• SQL: exploration, OLAP, or dashboard (what happened)
• Machine learning: prediction or prescription (anticipate or act automatically)
Per-event-oriented: profile lookup, incremental model update
Batch-oriented: historical analysis, exploring large, new data
Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent; (DB): VoltDB / MemSQL
IMPLICATION: Converging on one application engine is not critical
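The gap between the first two rungs of sophistication shows up even on a handful of events: a streaming counter answers one question fixed in advance, while the same events stored in a SQL table can be sliced in ways nobody planned. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for a SQL engine; the call events and table layout are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical (cell, status) call events.
events = [("cell_a", "dropped"), ("cell_b", "ok"), ("cell_a", "ok"), ("cell_a", "dropped")]

# Basic streaming: a running count of dropped calls, updated as each event arrives.
dropped_so_far = Counter()
for cell, status in events:
    if status == "dropped":
        dropped_so_far[cell] += 1

# SQL exploration: store the same events, then ask a question not fixed in advance
# (drop rate and call volume per cell). In SQLite a comparison yields 0 or 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (cell TEXT, status TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", events)
rows = conn.execute(
    "SELECT cell, AVG(status = 'dropped'), COUNT(*) FROM calls GROUP BY cell ORDER BY cell"
).fetchall()
conn.close()
```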
Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine Doesn’t Have to Rule All; Spark and Hadoop Can Live Together
Hadoop-style pipeline
[Diagram: YARN (cluster resource management) over HDFS or operational database; Streaming: Storm, Flink, Samza, DataTorrent; SQL: Impala, Drill, Hive, HAWQ…; Machine Learning: Mahout…]
• Pro: Mix-and-match pipeline composed of specialized processing *optimized* for each workload
• Con: Batch-only; hand-off between processing engines via storage is slow. Each processing engine is standalone and can’t leverage the others’ functionality
Spark pipeline
[Diagram: YARN, Mesos, or another workload manager over HDFS or operational database; Spark Core with Spark Streaming (streaming ingest), Spark SQL (join, filter, aggregate), and Spark MLlib (machine learning)]
• Pro: Fast and simple; pipeline composed of one in-memory engine with streaming, SQL, machine learning, and graph personalities (libraries)
• Con: Still immature; performance is an issue, and integration hasn’t been fully delivered. But the Tungsten performance boost and IBM projects could add huge new value
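The Hadoop-side Con is about where stage outputs go between engines. A minimal sketch contrasting the two hand-offs, assuming two hypothetical stages (parse, then filter-and-aggregate) and a JSON file as a stand-in for intermediate storage; it illustrates the data flow only and does not use Spark or Hadoop APIs.

```python
import json
import os
import tempfile


def parse_stage(raw_lines):
    """Hypothetical ingest stage: parse 'customer,amount' lines into records."""
    return [{"customer": c, "amount": float(a)} for c, a in (line.split(",") for line in raw_lines)]


def aggregate_stage(records):
    """Hypothetical SQL-style stage: filter out non-positive amounts, sum per customer."""
    totals = {}
    for r in records:
        if r["amount"] > 0:
            totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def via_storage(raw_lines, workdir):
    """Separate engines: stage 1 persists its output; stage 2 reads it back."""
    path = os.path.join(workdir, "stage1.json")
    with open(path, "w") as f:
        json.dump(parse_stage(raw_lines), f)
    with open(path) as f:
        return aggregate_stage(json.load(f))


def in_memory(raw_lines):
    """Single engine: stage outputs are handed off in memory."""
    return aggregate_stage(parse_stage(raw_lines))
```

Both paths produce the same result; the storage path adds a serialize, write, read, and deserialize round trip at every stage boundary, which is the cost the slide attributes to hand-offs between standalone engines.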

Wikibon #IoT #HyperConvergence Presentation via @theCUBE


Editor's Notes

  • #2 Adjunct DW: put some of the DL into production and offload from the DW. Customer 360: based on skill pulling together the new “customer master”. Autonomic Systems Management: RT machine learning on a well-bounded problem: operation of apps and their sw and how infrastructure. Real-time loyalty, multi-channel, multi-touchpoint = Harrah’s, and it means the model is integrated with the operational apps so it can be scored in RT, updated offline, or updated in RT. Smart Grid is Autonomic Systems Management, but for any large-scale, distributed IoT app. Where we are in the customer journey (DW + DL + Adjunct DW + modeling + streaming + ML) = part working with data and part data pipeline.
  • #4 Key point: show something concrete up front that pulls the reader in. This will show how RT user engagement and network operations both need fast data as a complement to big data.
  • #5 Continuing from the last slide, this slide will lay out the technical requirements (most likely in an appendix) that distinguish different products (without names, just categories) by the usage scenarios they can support.
  • #6 Snapping modern SoR to SoI is easy. Legacy SoR and SoI: how RT depends on use case and cost.
  • #7, #12, #13 Once prospects have seen how apps need both big and fast data, this section will start to explain how the data platforms that support them are very different. *Key*: it’s not about choosing big vs. fast data. It’s about getting the best of both, since they are complementary!
  • #14 High-volume machine learning vs. incremental update; historical performance analysis vs. looking up an order. Streaming vs. batch is about latency vs. throughput.
    Batch goes through data sequentially instead of randomly; by doing that, you can maximize speed. In memory, you bring in (page in) all related items; once they’re in a page, you stream through the batch collection.
    RT: take an order and get payment, but the processing to the factory or vendor got pushed into a batch for efficiency (hour, day, or week). In memory, here every item has to come into memory on its own: calculate availability, calculate shipment length…
    Batch: higher throughput; faster in elapsed time for the total job. Streaming: can act on partial results; more dynamic in that you can react in real time.
    Over time, more will shift to streaming. When deciding on the use of new streams, batch will be used to try them out.
    Spectrum of analytics: BI: exploration, what happened. Data mining: why did it happen. Machine learning / predictive: what is likely to happen. Optimization / prescription: learn and act automatically in a closed loop. Stream, SQL/BI, ML, Optimization/prescription.
  • #15 Analytic data pipeline, Hadoop style