Stream Processing as Game Changer for Big Data and Internet of Things by Kai Wahner

Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
Big Data Spain @ Madrid (November 2016)
Comparison of Streaming Analytics Frameworks

© Copyright 2000-2016 TIBCO Software Inc.
Key Take-Aways
• Streaming Analytics processes Data while it is in Motion!
• Automation and Proactive Human Interaction are BOTH needed!
• Streaming Analytics is Complementary to Hadoop and Machine Learning!

Agenda
• Real World Use Cases
• Introduction to Streaming Analytics
• Market Overview
• Relation to other Big Data Components
• Live Demo

Analyze and Act on Critical Business Moments

Success Story
Predictive Fault Management

“An outage on one well can cost $10M per hour. We have 20-100 outages per year.”
- Drilling operations VP, major oil company

Predictive Analytics - Fault Management
[Figure: Electric Submersible Pump (ESP) with electrical power cable, pump, intake, protector, ESP motor, and pump monitoring unit]
Data Monitoring
• Motor temperature
• Motor vibration
• Current
• Intake pressure
• Intake temperature
• Flow

Predictive Analytics - Fault Management
[Figure: voltage, temperature, and vibration signals alongside device history]
Temporal analytic: “If vibration spike is followed by temp spike then voltage spike [within 4 hours] then flag high severity alert.”
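The temporal rule above can be sketched as a small per-device state machine. This is a minimal Python illustration, not how any particular CEP engine implements it. The `SpikeSequenceDetector` class, the event kind names, and the choice to measure the 4 hours from the vibration spike are assumptions. Spike detection itself (deciding what counts as a spike) is assumed to happen upstream.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Assumption: the 4-hour window is measured from the vibration spike to the voltage spike.
WINDOW = timedelta(hours=4)


@dataclass
class DeviceState:
    last_vibration: Optional[datetime] = None  # most recent vibration spike
    partial_start: Optional[datetime] = None   # vibration spike already followed by a temperature spike


class SpikeSequenceDetector:
    """Flags vibration -> temperature -> voltage spikes on one device within WINDOW.

    Assumes spikes are detected upstream and arrive in timestamp order per device.
    """

    def __init__(self, window: timedelta = WINDOW) -> None:
        self.window = window
        self.states: Dict[str, DeviceState] = {}

    def on_spike(self, device_id: str, kind: str, ts: datetime) -> Optional[str]:
        st = self.states.setdefault(device_id, DeviceState())
        if kind == "vibration":
            st.last_vibration = ts
        elif kind == "temperature":
            # Step 2 of the pattern: only counts if a recent vibration spike exists.
            if st.last_vibration is not None and ts - st.last_vibration <= self.window:
                st.partial_start = st.last_vibration
        elif kind == "voltage":
            # Step 3: complete the sequence if it still fits in the window.
            if st.partial_start is not None and ts - st.partial_start <= self.window:
                self.states[device_id] = DeviceState()  # reset after alerting
                return f"HIGH SEVERITY: {device_id} vibration -> temperature -> voltage at {ts.isoformat()}"
        return None
```

Feeding a vibration spike at 08:00, a temperature spike at 09:00, and a voltage spike at 11:00 for the same well returns an alert; a voltage spike at 13:00 would not, because it falls outside the 4-hour window.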

Live Surveillance of Equipment
• Continuous, live geospatial display of pump health and predictive signal breaches
• Alerts based on predictive signals
• Compare live readings and signals to historical averages and means
• Continuous, live visualization of stats for hundreds of wells
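Comparing live readings to historical averages does not require storing the full history. A minimal sketch, assuming per-well statistics kept with Welford's online algorithm and a hypothetical 3-standard-deviation threshold. `check_reading` and the well IDs are illustrative names, and "history" here simply means all readings seen so far.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass
class RunningStats:
    """Welford's online algorithm: running mean and variance without storing past readings."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


# One set of historical statistics per well (hypothetical well IDs as keys).
history: DefaultDict[str, RunningStats] = defaultdict(RunningStats)


def check_reading(well_id: str, reading: float, k: float = 3.0) -> bool:
    """Return True if the reading deviates more than k standard deviations from the well's history."""
    stats = history[well_id]
    alert = stats.n >= 2 and stats.std > 0 and abs(reading - stats.mean) > k * stats.std
    stats.update(reading)  # the live reading becomes part of the history afterwards
    return alert
```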

Success Story
Crowd Management

“Turn the customer into a fan and increase revenue significantly.”

World’s Smartest Building

All Customers are different… Treat them that way…
Capture – Engage – Expand – Monetize
Patterns – Real time
More personal, more context
[Figure: customer data sources: social, CRM, POS, mobile, web, e-mails]

Success Story
Smart Manufacturing

“For every 1% increase in shipped product, we make $11MM in profit. The demand is there, we just need to fulfill it.”
- Head of Quality, Solar Panel Manufacturer

Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
Cost before Station 1: 9€ → Station 1: 7€ → Station 2: 13€ → Total cost: 29€ (or more)
Decision point at each station: Scrap?
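The cost figures on the slide add up as cumulative investment per part. A small worked example, assuming the three amounts are incurred in sequence (9€ before Station 1, 7€ at Station 1, 13€ at Station 2). The `should_scrap` rule and its 0.8 threshold are hypothetical stand-ins for the output of the machine learning model.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# Costs from the slide, in euros, in the order they are incurred.
STAGES: List[Tuple[str, int]] = [("before Station 1", 9), ("Station 1", 7), ("Station 2", 13)]


def cumulative_costs(stages: List[Tuple[str, int]]) -> Dict[str, int]:
    """Euros invested in a part once each stage is complete."""
    names = [name for name, _ in stages]
    totals = accumulate(cost for _, cost in stages)
    return dict(zip(names, totals))


def should_scrap(p_fail: float, threshold: float = 0.8) -> bool:
    """Hypothetical rule: scrap at Station 1 if the predicted failure probability is high enough."""
    return p_fail > threshold


costs = cumulative_costs(STAGES)  # {'before Station 1': 9, 'Station 1': 16, 'Station 2': 29}
saved_per_early_scrap = costs["Station 2"] - costs["Station 1"]  # 13€ not spent on a doomed part
```

Scrapping a part that will fail anyway right after Station 1 caps the loss at 16€ instead of 29€ (or more, if re-work is attempted); the threshold trades that saving against the cost of scrapping parts that would have been fine.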

Machine Learning Applied to Sensor Events in Real Time
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

Great success stories, but… how do we realize these use cases?

Traditional Data Processing: “Request – Response”
Store → Analyze → Act

Traditional Data Processing: “Request – Response”
• Data is collected from a variety of sources and placed in a persistent store:
– Relational database.
– NoSQL store.
– Hadoop environment.
• Analytical processes are executed against the stored data to detect opportunities or threats.
• Actions are identified, delivered, and executed across various business channels.
(Store → Analyze → Act)

Traditional Data Processing: Challenges
(Store → Analyze → Act)
• Introduces too much “decision latency” into the business.
• Responses are delivered “after the fact”.
• Maximum value of the identified situation is lost.
– Cross-sell / up-sell opportunities are lost, impending equipment failure is missed, business processes are slow to respond and lack timely context.
• Decisions are made on stale data.

Event Value Decreases Over Time
[Figure: event value (y-axis) decreasing over time (x-axis)]

Event Value Decreases Over Time
• Events are often most valuable “close to” the point of collection.
• As time passes, events tend to lose their value.
• The ability to proactively identify “threats” or “opportunities” typically decreases.
• Real-time capability is needed to maximize event value.

The New Era: Streaming Analytics
[Figure: Act & Monitor / Analyze / Store]

The New Era: Streaming Analytics
• Events are analyzed and processed in real time as they arrive.
• Decisions are timely, contextual, and based on fresh data.
• Decision latency is minimized, resulting in:
✓ Superior Customer Experience
✓ Operational Excellence
✓ Instant Awareness and Timely Decisions
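The difference in decision latency can be made concrete by counting how many events must be seen before a decision is possible. A minimal Python sketch, assuming a hypothetical rule (act once the running total exceeds 100); it counts events rather than measuring wall-clock time.

```python
from typing import Iterable, Optional

THRESHOLD = 100.0  # hypothetical business rule: act once the running total exceeds this


def store_then_analyze(events: Iterable[float]) -> Optional[int]:
    """Request-response style: store everything, analyze, then act.
    Returns how many events had to be stored before a decision was possible."""
    stored = list(events)                               # Store
    total = sum(stored)                                 # Analyze
    return len(stored) if total > THRESHOLD else None   # Act (after the fact)


def analyze_in_motion(events: Iterable[float]) -> Optional[int]:
    """Streaming style: update state per event and act as soon as the rule fires.
    Returns the position of the event that triggered the decision."""
    total = 0.0
    for position, value in enumerate(events, start=1):
        total += value                                  # Analyze on arrival
        if total > THRESHOLD:
            return position                             # Act immediately
    return None
```

For the events `[30, 40, 50, 10, 20]`, the streaming version decides at the third event, while the store-first version can only decide after all five are stored. `analyze_in_motion` also works on an unbounded generator, whereas `store_then_analyze` would never finish storing it.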

Streaming Analytics: What Is A “Stream”?
[Figure: example stream sources: clickstream, sensors, social data, logs]
• Consists of pieces of data typically generated due to a change of state.
– One or more identifiers
– Timestamp & payload
– Immutable
• Typically unbounded; there is no end to the data.
– Batch dataset: “bounded”.
• Can be raw or derived.
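The properties listed above map naturally onto an immutable record type and an unbounded generator. A Python sketch under assumptions: the field names, the one-reading-per-second sensor, and the Fahrenheit conversion used as a "derived" event are all illustrative.

```python
import itertools
import random
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from types import MappingProxyType
from typing import Iterator, Mapping


@dataclass(frozen=True)
class Event:
    """One element of a stream: identifiers, a timestamp, and a payload."""
    source_id: str                # identifier of the producer (e.g. a sensor)
    event_id: int                 # identifier of this event
    timestamp: datetime           # when the state change happened
    payload: Mapping[str, float]  # read-only view, so the event stays immutable


def sensor_stream(source_id: str, start: datetime) -> Iterator[Event]:
    """Unbounded stream: yields one reading per second, forever."""
    for n in itertools.count():
        reading = {"temperature": 20.0 + random.random()}
        yield Event(source_id, n, start + timedelta(seconds=n), MappingProxyType(reading))


def derive_fahrenheit(e: Event) -> Event:
    """Derived event: a new immutable event computed from a raw one (the raw event is unchanged)."""
    celsius = e.payload["temperature"]
    return Event(e.source_id, e.event_id, e.timestamp, MappingProxyType({"temperature_f": celsius * 9 / 5 + 32}))


# A bounded (batch) dataset is just a finite slice of the unbounded stream.
batch = list(itertools.islice(sensor_stream("sensor-1", datetime(2016, 11, 17, tzinfo=timezone.utc)), 3))
```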

Streaming Analytics Processing Pipeline
[Figure: pipeline from stream ingest through preprocessing and analytics to outcomes]
• Stream Ingest: APIs, adapters / channels, integration, messaging
• Stream Preprocessing: normalization, transformation, aggregation, enrichment, filtering
• Stream Analytics & Processing: contextual rules, windowing, patterns, deep ML, analytics, …
• Stream Outcomes: process management, analytics (real time), applications & APIs, analytics / DW reporting, index / search

Streaming Analytics Processing Pipeline
Separation of concerns makes it easy to adjust one part in response to changing business requirements without rewriting other parts!
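Separation of concerns can be illustrated by chaining independent stages that each consume and produce an iterator of events. A minimal sketch in Python generators; the `'sensor,value'` input format, the stage functions, and the 50.0 limit are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, object]
Stage = Callable[[Iterable[Event]], Iterator[Event]]


def ingest(lines: Iterable[str]) -> Iterator[Event]:
    """Ingest: parse hypothetical 'sensor,value' text lines into events."""
    for line in lines:
        sensor, value = line.split(",")
        yield {"sensor": sensor, "value": float(value)}


def drop_negative(events: Iterable[Event]) -> Iterator[Event]:
    """Preprocessing: filter out physically impossible readings."""
    return (e for e in events if e["value"] >= 0)


def flag_high(events: Iterable[Event], limit: float = 50.0) -> Iterator[Event]:
    """Analytics: attach an alert flag without knowing where events came from."""
    return ({**e, "alert": e["value"] > limit} for e in events)


def pipeline(source: Iterable[Event], *stages: Stage) -> Iterator[Event]:
    """Chain independent stages; each one only sees an iterator of events."""
    return iter(reduce(lambda stream, stage: stage(stream), stages, source))
```

Replacing `flag_high` with a different analytics stage, for example `lambda s: flag_high(s, limit=5.0)`, requires no change to `ingest` or `drop_negative`.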

Streaming Analytics: Ingest
(APIs, adapters / channels, integration, messaging)
• Stream data may come from a number of sources, either at the edge, in the data center, or via the cloud.
• Need to handle a variety of data formats and protocols, all at global scale.
• Pay attention to “event time” vs. “processing time”!
– Event Time: Time the event was created.
– Processing Time: Time the event was received or processed.
• Event time is typically more relevant and leads to more predictable results.
– Using event time eliminates time skew associated with clock synchronization, system outages, processing latency, network issues, etc.
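The event time vs. processing time distinction shows up as soon as events are grouped into time windows. A sketch with three hypothetical readings, one of which is delayed across a minute boundary; the `Reading` type and the one-minute window size are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, NamedTuple


class Reading(NamedTuple):
    event_time: datetime       # when the sensor created the event
    processing_time: datetime  # when the stream processor received it


def window_start(ts: datetime, size: timedelta) -> datetime:
    """Align a timestamp to the start of its fixed (tumbling) window."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def count_per_window(readings: Iterable[Reading], key: Callable[[Reading], datetime],
                     size: timedelta = timedelta(minutes=1)) -> Dict[datetime, int]:
    """Count readings per window, using whichever timestamp `key` selects."""
    return Counter(window_start(key(r), size) for r in readings)


# Hypothetical readings: the second one is delayed by the network and crosses a minute boundary.
readings = [
    Reading(datetime(2016, 11, 17, 12, 0, 10), datetime(2016, 11, 17, 12, 0, 12)),
    Reading(datetime(2016, 11, 17, 12, 0, 50), datetime(2016, 11, 17, 12, 1, 5)),
    Reading(datetime(2016, 11, 17, 12, 1, 20), datetime(2016, 11, 17, 12, 1, 21)),
]

by_event_time = count_per_window(readings, key=lambda r: r.event_time)
by_processing_time = count_per_window(readings, key=lambda r: r.processing_time)
```

By event time, the 12:00 window holds two readings. By processing time, it holds only one, and the result would change again if the same data were replayed later with different delays.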

Streaming Analytics: Preprocessing
(Normalization, transformation, aggregation, enrichment, filtering)
• Stream data often needs to be manipulated before it is processed by downstream components.
– Normalization
– Transformation
• May filter unwanted events close to the source to eliminate “noise”.
• Events may also be enriched with context that provides additional data for further processing.
– Customer details, equipment details, location information, etc.
– Data may be stored in a high-speed cache or other store for rapid access.
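The three preprocessing steps above (normalize, filter, enrich) can be sketched as one generator. A minimal example under assumptions: the Fahrenheit-to-Celsius normalization, the "drop readings that changed by less than 0.5" noise filter, and the dictionary standing in for a high-speed cache are all hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data standing in for a high-speed cache keyed by device ID.
EQUIPMENT_CACHE: Dict[str, Dict[str, str]] = {
    "pump-1": {"site": "Well 7", "model": "ESP-A"},
    "pump-2": {"site": "Well 9", "model": "ESP-B"},
}


def normalize(e: dict) -> dict:
    """Normalization: convert Fahrenheit readings to Celsius so downstream sees one unit."""
    if e.get("unit") == "F":
        return {**e, "value": round((e["value"] - 32) * 5 / 9, 2), "unit": "C"}
    return e


def preprocess(events: Iterable[dict], min_change: float = 0.5) -> Iterator[dict]:
    last: Dict[str, float] = {}
    for raw in events:
        e = normalize(raw)
        prev = last.get(e["device"])
        if prev is not None and abs(e["value"] - prev) < min_change:
            continue  # filtering: drop near-duplicate readings ("noise") close to the source
        last[e["device"]] = e["value"]
        yield {**e, **EQUIPMENT_CACHE.get(e["device"], {})}  # enrichment with cached context
```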

Streaming Analytics: Processing
[Figure: Stream Analytics & Processing split into Batch (transform, deep ML, analytics, data lake, …) and Real-Time (RT analytics, contextual rules, windowing, patterns, …)]
• Streams may be immediately pushed to a data lake.
– May be raw or preprocessed.
– Used for subsequent analysis as part of an immutable data layer.
– Typically processed in batch in this part of the architecture.
• In parallel, streams may be processed in real time against a number of constructs.
– Real-time analytics.
– Graph analysis / geo analysis.
– Rules.
• Results from the real-time processing may be fed into the batch component.
• The results of batch processing may also be pushed into the real-time layer.

Dataflow Streaming Pipeline – Extract, Transform, Load in Real Time
https://www.linkedin.com/pulse/data-pipeline-hadoop-part-1-2-birender-saini

Streaming Analytics: “Windows”
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
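The linked article (“Streaming 101”) describes window types such as fixed (tumbling), sliding, and session windows. Below is a minimal sketch of the first two in Python. It assumes integer-second timestamps that arrive in order and deliberately ignores late or out-of-order data, which that article discusses in depth.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, float]  # (timestamp in seconds, value)


def tumbling_sum(events: Iterable[Event], size: int) -> Iterator[Tuple[int, float]]:
    """Fixed, non-overlapping windows [k*size, (k+1)*size); assumes events arrive in time order."""
    current: Optional[int] = None
    total = 0.0
    for ts, value in events:
        start = ts - ts % size
        if current is not None and start != current:
            yield current, total  # window closed: emit its result
            total = 0.0
        current = start
        total += value
    if current is not None:
        yield current, total  # flush the last open window


def sliding_sum(events: Iterable[Event], size: int) -> Iterator[Tuple[int, float]]:
    """Sliding window: after each event at time t, the sum over events with timestamps in (t - size, t]."""
    window: Deque[Event] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - size:  # evict events that fell out of the window
            total -= window.popleft()[1]
        yield ts, total
```

With 60-second windows, the events `(0, 1), (30, 2), (61, 3), (90, 4), (125, 5)` give tumbling sums of 3, 7, and 5. The sliding sums are 1, 3, 5, 7, and 9, one result per event.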

Automation and Augmented Intelligence for Humans
• Actions by Operations: human decisions in real time, informed by up-to-date information
• Machine-to-Machine Automation: automated action based on models of history combined with live context and business rules

Big Data Reference Architecture
[Figure: Big Data Reference Architecture]
• Data sources: sensor data, transactions, message bus, machine data, social data
• Streaming Analytics: stream processing (aggregate, correlate, rules, analytics), continuous query processing, event server, H2O.ai
• Action: alerts, manual action, and escalation by operations via a live UI (augmented intelligence)
• Integration: integration bus, API, SOA / microservices, internal data (ERP, MDM, DB, WMS)
• Big Data: data warehouse, Hadoop (history, cleansed data)
• Data Discovery: data scientists using Python, R, Spark, and visual analytics

Streaming Analytics Market Growing Significantly
“Everything Flows:
The value of stream processing
and streaming integration”
(September 2016)
http://hortonworks.com/info/value-streaming-integration/

Alternatives for Stream Processing
[Figure: time to market from slow to fast: Streaming Concepts → Streaming Frameworks → Streaming Products; each option includes the one before it]

Concepts (Continuous Queries, Sliding Windows)
Patterns (Counting, Sequencing, Tracking, Trends)
Build everything by yourself! ☹
[Figure: time to market (slow to fast): Streaming Concepts, Streaming Frameworks, Streaming Products]

Usually not an option, as there are a lot of frameworks and products available!

Library (Java, .NET, Python)
Query Language (often similar to SQL)
Scalability (horizontal and vertical, fail over)
Connectivity (technologies, markets, products)
Operators (Filter, Sort, Aggregate)
[Figure: time to market (slow to fast): Streaming Concepts, Streaming Frameworks, Streaming Products]
Different frameworks (ingest, preprocess, analytics) combined!

Example for an Open Source Streaming Pipeline
http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm
“Realtime Event Processing in Hadoop with Apache NiFi, Kafka and Storm”

Dataflow Streaming Pipeline (Ingest, Preprocess)
[Figure: Big Data Reference Architecture]

Open Source Dataflow Streaming Pipelines

Streaming Analytics
[Figure: Big Data Reference Architecture]

Frameworks and Products (no complete list!)
[Figure: open source vs. closed source and framework vs. product; examples include Microsoft Azure Stream Analytics and Google Cloud Dataflow]

Apache Storm
[Figure: Storm topology with spouts and bolts]

Apache Storm – Hello World
http://wpcertification.blogspot.ch/2014/02/helloworld-apache-storm-word-counter.html
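The linked Hello World builds a word counter from a spout and bolts. The following is not the Storm API (Storm topologies are usually written in Java). It is a plain-Python sketch of the same dataflow idea with hypothetical class names: a spout emits sentences, one bolt splits them into words, and another bolt keeps running counts.

```python
from collections import Counter
from typing import Iterator, List, Tuple


class SentenceSpout:
    """Stand-in for a spout: the source of the stream (here a fixed list of sentences)."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = sentences

    def emit(self) -> Iterator[str]:
        yield from self.sentences


class SplitBolt:
    """Stand-in for a bolt that splits each sentence tuple into word tuples."""

    def process(self, sentence: str) -> Iterator[str]:
        yield from sentence.lower().split()


class CountBolt:
    """Stand-in for a bolt that keeps a running count per word."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, word: str) -> Tuple[str, int]:
        self.counts[word] += 1
        return word, self.counts[word]  # what a real bolt would emit downstream


def run_topology(spout: SentenceSpout, split: SplitBolt, count: CountBolt) -> Counter:
    """Wire spout -> split bolt -> count bolt, processing one tuple at a time."""
    for sentence in spout.emit():
        for word in split.process(sentence):
            count.process(word)
    return count.counts
```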

AWS Kinesis – Integration with other AWS Components
https://aws.amazon.com/kinesis/
[Figure: Kinesis integrating with AWS S3, Redshift, and DynamoDB]

AWS Kinesis – Hello World

AWS Kinesis – Public Cloud Trade-Off
… is easy to set up and scale.
But you do not have full control! ☹
• Any data that is older than 24 hours is automatically deleted
• Every Kinesis application consists of just one procedure, so you can’t use Kinesis
to perform complex stream processing unless you connect multiple applications
• Kinesis can only support a maximum size of 50KB for each data item
http://diamondstream.com/amazon-kinesis-big-real-time-data-processing-solution/
(blog post from 2014, might be outdated, but shows that you do not have full control over a cloud service)

Apache Spark
General Data-processing Framework
→ However, focus is especially on Analytics (these days)

Apache Spark – Focus on Analytics
http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
“[IBM’s initiatives] include:
• deepening the integration between Apache
Spark and existing IBM products like the
Watson Health Cloud;
• open sourcing IBM’s existing SystemML
machine learning technology; …”

Spark Streaming
Spark Streaming
• is not a true streaming solution
• uses micro-batches
• cannot process data in real time with ultra-low latency
• allows easy combination with other Spark components (SQL, Machine Learning, etc.)
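To see why micro-batching adds latency, consider when results become available. A minimal sketch that is not Spark's API. It assumes a hypothetical 1-second batch interval and ignores processing time, so it shows only the waiting time that batching introduces.

```python
import math
from collections import defaultdict
from typing import Dict, List


def result_time(arrival: float, interval: float) -> float:
    """Micro-batching: an event's result is available only when its batch interval closes."""
    return (math.floor(arrival / interval) + 1) * interval


def group_into_batches(arrivals: List[float], interval: float) -> Dict[float, List[float]]:
    """Group arrival times (seconds) by the end of the batch they belong to."""
    batches: Dict[float, List[float]] = defaultdict(list)
    for a in arrivals:
        batches[result_time(a, interval)].append(a)
    return dict(batches)


arrivals = [0.1, 0.4, 0.95, 1.2]  # hypothetical arrival times in seconds
INTERVAL = 1.0                    # hypothetical batch interval
waits = [result_time(a, INTERVAL) - a for a in arrivals]
# With event-at-a-time processing the wait would be ~0 for every event (ignoring compute time).
```

An event arriving just after a batch starts waits almost a full interval, so shrinking the interval lowers latency but increases per-batch overhead.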

Apache Spark – Hello World
Spark Streaming API
Spark Core API

Apache Spark – as a Cloud Service

Apache Flink
Spark Streaming
• “Newcomer”
• Looks very similar to Spark
• But “Streaming First” concept

Apache Beam
Generic API with unified programming model for stream processing frameworks
http://www.slideshare.net/DataTorrent/apache-beam-incubating-67428372

Library (Java, .NET, Python)
Query Language (often similar to SQL)
Scalability (horizontal and vertical, fail over)
Connectivity (technologies, markets, products)
Operators (Filter, Sort, Aggregate)
[Figure: time to market (slow to fast): Streaming Concepts, Streaming Frameworks, Streaming Products]
Single Tool (Complete Processing Pipeline)
Visual IDE (Dev, Test, Debug)
Simulation (Feed Testing, Test Generation)
Live UI (monitoring, proactive interaction)
Maturity (24/7 support, consulting)
Integration (out-of-the-box: ESB, MDM, Analytics, etc.)

Dataflow Streaming Pipeline + Streaming Analytics
[Figure: Big Data Reference Architecture]

IBM Streams

TIBCO StreamBase
• Performance: Latency, Throughput, Scalability
• Multi-threaded and clustered server from version 1
• High throughput: Millions of messages, 100,000s of quotes, 10,000s of orders
• Low-latency: microsecond latency for algo trading, pre-trade risk, market data
• Take Advantage of High Performance Hardware
• Multicore (12, 24, 32 core) large memory (10s of gigabytes)
• 64-bit Linux, Windows, Solaris deployment
• Hardware acceleration (GPU, Solace, Tervela)
• Enterprise Deployment
• High availability and fault tolerance
• Distributed state management for large data sets
• Management and monitoring tools
• Security and entitlements Integration
• Continuous deployment and QA Process Support
StreamBase Server Innovations
• StreamSQL compiler and static optimizer
• In-process, in-thread adapter architecture
• Visual parallelism and scaling
• In-memory data grid integration for distributed shared state
• Data parallelism and dispatch

TIBCO StreamBase - Visual Programming
[Figure: visual application: Aggregate (capture card activations per location) → Sales too high? → Fraud / No Fraud; log to any database]
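The visual fraud-detection flow can be approximated in code. This sketch rests on assumptions: a one-hour sliding window, a hypothetical limit of three card activations per location, and an in-memory list standing in for "log to any database". StreamBase itself would express this visually, not like this.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple


class ActivationMonitor:
    """Counts card activations per location in a sliding window and flags 'sales too high'."""

    def __init__(self, window_s: int = 3600, max_activations: int = 3) -> None:
        self.window_s = window_s                  # hypothetical: one-hour window
        self.max_activations = max_activations    # hypothetical fraud threshold
        self.recent: DefaultDict[str, Deque[int]] = defaultdict(deque)
        self.log: List[Tuple[int, str, str]] = []  # stand-in for "log to any database"

    def on_activation(self, ts: int, location: str) -> str:
        q = self.recent[location]
        q.append(ts)
        while q[0] <= ts - self.window_s:  # aggregate only activations inside the window
            q.popleft()
        verdict = "Fraud" if len(q) > self.max_activations else "No Fraud"
        self.log.append((ts, location, verdict))
        return verdict
```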

TIBCO StreamBase - Visual Programming
StreamBase Development Studio: Visual Debugger, Feed Simulation, Unit Testing

Live UI for Augmented Intelligence
[Figure: Big Data Reference Architecture]

Live User Interface
[Figure: Live UI architecture. Inputs: market data, sensor data, social media data, historical data, enterprise data; connectivity: integration, MQTT, JMS; processing: CEP, in-memory data grid, continuous query processor, alerts; clients: IoT, mobile, social, browser / app; command & control, action]

Live UI in Desktop / Web Browser / Mobile App
Dynamic aggregation
Live visualization
Ad-hoc continuous query
Alerts
Action
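An ad-hoc continuous query differs from an ordinary query because it keeps running. The client first receives a snapshot of matching rows, then receives pushed changes. The following is a minimal in-memory sketch of that idea. `LiveTable` and its methods are hypothetical and do not represent the Live Datamart API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Callback = Callable[[str, Row], None]


class LiveTable:
    """Hypothetical in-memory table that pushes changes to registered continuous queries."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: Dict[object, Row] = {}
        self.queries: List[Tuple[Callable[[Row], bool], Callback]] = []

    def subscribe(self, predicate: Callable[[Row], bool], callback: Callback) -> None:
        """Register a continuous query: send a snapshot now, then push later changes."""
        for row in self.rows.values():
            if predicate(row):
                callback("snapshot", row)
        self.queries.append((predicate, callback))

    def upsert(self, row: Row) -> None:
        old = self.rows.get(row[self.key])
        self.rows[row[self.key]] = row
        for predicate, callback in self.queries:
            was_match = old is not None and predicate(old)
            is_match = predicate(row)
            if is_match:
                callback("update" if was_match else "insert", row)
            elif was_match:
                callback("remove", row)  # the row left the query's result set
```

A client watching "temperature above 90" first gets the wells already above 90. It then gets an insert when another well crosses the threshold, updates while it stays above, and a remove when it drops back.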

Live UI - Products
Characteristics to Check
• Alternative clients (rich client, browser, mobile app)
• Maturity for enterprise use cases
• Performance and scalability
• “Big data native” deployment (YARN, Mesos)
• Monitoring and proactive actions
• Streaming engine under the hood (not just a visualization layer)
• New ad-hoc queries by the business user (without the help of the IT department)
• Various visual components
• Extensibility (graphical designer vs. coding)
… or build your own solution using WebSockets, AngularJS, etc.

Spoilt for Choice
Does it make sense to combine frameworks and products?

Customer Example: Apache Storm + TIBCO Live Datamart
[Figure: Apache Storm with a StreamBase Spout and a StreamBase Bolt; TIBCO Live Datamart (continuous query processor, active tables) providing queries, snapshot results, continuous updates, and continuous alerting to clients; inputs: public data, customer data, external data, and operational data via a message bus]
StreamBase Bolt and Spout connect Apache Storm to StreamBase to provide real-time analytics on operational data.

Closed Loop: Understand – Anticipate – Act
[Figure: closed loop of insights and actions: monitor, predict, act, decide, model, organize, analyze, wrangle]

Data Discovery via Visual Analytics, Big Data and Machine Learning
[Figure: Big Data Reference Architecture]

Find Insights and Patterns in Historical Data
Visual Analytics + Machine Learning

Apply Insights and Analytic Models to Proactive Actions
[Figure: analytic models from H2O.ai, open source R, TERR, Spark ML, MATLAB, SAS, and PMML applied in streaming analytics]
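Applying a model in the stream usually means scoring each event with a model trained offline on historical data. A minimal sketch, assuming a logistic regression whose coefficients are made up for illustration. In practice they would come from one of the tools listed, for example exported via PMML. The field names and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, Tuple

# Made-up coefficients standing in for a model trained offline on historical data
# and exported to the streaming engine; they are NOT from any real model.
COEFFICIENTS: Dict[str, float] = {"vibration": 0.8, "temperature": 0.05, "voltage": 0.02}
INTERCEPT = -9.0


def failure_probability(event: Dict[str, float]) -> float:
    """Logistic-regression scoring of a single event."""
    z = INTERCEPT + sum(weight * event[name] for name, weight in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(event: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float]:
    """Turn the score into a proactive action (hypothetical threshold)."""
    p = failure_probability(event)
    return ("ALERT" if p >= threshold else "OK"), p
```

Keeping training offline and scoring online means a retrained model only changes the coefficients, not the streaming logic.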

Today, 80% of betting happens AFTER the game begins

Case Study: Streaming Analytics for Betting
• Situation: Today, 80% of Betting is Done After the Game Starts
– It’s not your father’s bookie anymore!
• Problem: How to Analyze Big Betting Data?
– Thousands of concurrent games, constantly adjusting odds, dozens of betting networks – firms must correlate millions of events a day to find the best betting opportunities in real time
• Solution: TIBCO for Fast Data Architecture
– TXOdds uses TIBCO to correlate, aggregate, and analyze large volumes of streaming betting data in real time and publish innovative predictive betting analytics to their customers
• Result: TXOdds First to Market with Innovative Zero Latency Betting Analytics
– Innovative real-time analytics give players who can process electronic data in real time the edge
“With StreamBase, in two months we had our first betting analytics feed live, and we continually deploy new ideas and evolve our old ones.”
- Alex Kozlenkov, VP of technology, TXOdds

Big Data Architecture for Streaming Betting Analytics
[Figure: global, distributed infrastructure: betting lines, scores, news, and social data on a bus with caches; event processing (monitor, aggregate, correlate, real-time analytics, historical comparison) with Hadoop context (historical betting data, odds, outcomes); outputs: predictive odds analytics, historical odds deviations, and zero latency betting analytics via Live Datamart]

Real-Time Social Media Analytics
[Figure: Twitter (#TomBradyBrokenLeg), Twitter (#Boston), Twitter (#NFL), and Brady’s stats combined into actionable insights]
Something relevant happening?
Every second counts!
Change odds (automated or manually triggered):
Stop live-betting for the currently running game?
• Who will win the game?
• How many interceptions will the Quarterback throw?
• Will the Patriots win the Super Bowl?
• …
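Answering “Something relevant happening?” in a stream often comes down to spotting a sudden jump in mention counts. This sketch assumes per-minute hashtag counts are already computed upstream. It uses a hypothetical rule: alert when a count exceeds three times the average of the previous five windows and reaches at least 10 mentions.

```python
from collections import deque
from typing import Deque


class MentionSpikeDetector:
    """Flags a window whose mention count jumps far above the recent baseline."""

    def __init__(self, history: int = 5, factor: float = 3.0, min_count: int = 10) -> None:
        self.past: Deque[int] = deque(maxlen=history)  # counts of the previous windows
        self.factor = factor                           # hypothetical jump factor
        self.min_count = min_count                     # ignore jumps on tiny volumes

    def observe(self, count: int) -> bool:
        if self.past:
            baseline = sum(self.past) / len(self.past)
            spike = count >= self.min_count and count > self.factor * max(baseline, 1.0)
        else:
            spike = False  # no baseline yet
        self.past.append(count)
        return spike
```

Whether a detected spike changes the odds automatically or only alerts a trader is a separate decision; the detector only supplies the signal.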


Big Data Architecture for Predictive Maintenance
[Figure: TIBCO Fast Data Platform: CSV (batch), JSON and XML (real time) inputs; operational analytics with StreamBase (aggregate, rules, analytics, correlate) and Live Datamart (continuous query processing, alerts, manual action, escalation) for operations via a live UI; historical analysis by data scientists with Flume, HDFS, Hadoop (Cloudera), Spotfire, R / TERR, and H2O; Oracle RDBMS and internal data; formats: Avro, Parquet, …, PMML]
Internal Data

Find Patterns → TIBCO Spotfire with H2O Integration
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

Apply Patterns → TIBCO StreamBase Connector for H2O.ai

Monitor Patterns → TIBCO Live Datamart
Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client

Monitor Patterns → TIBCO Live Datamart
Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Web API

TIBCO Spotfire + StreamBase + Live Datamart + H2O.ai
Live Demo

Questions? Please contact me!
Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn
