7 Predictive Analytics, Spark, Streaming Use Cases
1. Mark Palmer, SVP of Analytics, TIBCO
https://about.me/mark.palmer
2. 7 Predictive Analytics, Spark, Streaming Use Cases
1. Live Train Time Tables: 40% Reduction in Spread (Dutch Railways)
2. Intelligent Equipment: Saving $40M / year (Oil & Gas – Many)
3. Algorithmic Loyalty: Finding the Jacket You Didn’t Know You Needed (North Face)
4. Predictive Risk & Compliance: Avoiding $440M in 40 Minutes of Loss (ConvergEx)
5. Live Flight Optimization: Get You Home on Time (United Airlines)
6. Continuous Transaction Optimization: Watch 20,000 Systems at Once (Morgan Stanley)
7. IoT Parcel Tracking: From 20% to 100% Real-Time (Royal Mail, UK)
3. Hadoop + Analytics: Discover, Automate, Act
- DISCOVER: Data scientists use interactive analytics to discover, score, and model based on Hadoop / Spark data lakes
- AUTOMATE: Inject predictive model into stream
- ACT: Automation, alerting, and refinement (BPM)
- Diagram labels: Automation, Case Management, Operational Intelligence
4. Real-Time Spark Accelerator Pattern
[Diagram; recoverable labels, grouped by numbered step]
- Event sources: POS, Mobile, Web, Operations
- 1: Capture streams, normalize, persist in Spark. Kafka and JMS feed StreamBase streaming data prep (cleanse, normalize, bin); storage in HDFS, Parquet, HBase
- 2: Discover model. Data scientists, analytics
- 3: Load predictive model. Model, stream scoring (StreamBase streaming analytics)
- 4: Continuous algorithmic action. Real-time action; Upsell Recommendation History
- 5: Auto-retraining. Model tracking, real-time training (StreamBase)
- 6: LiveView. Live monitoring & analytics (Live Datamart, LiveView)
- Also shown: Impala
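The "cleanse, normalize, bin" data prep step can be sketched in a few lines of plain Python. This is only an illustration, not StreamBase or Spark code: the field names (`amount`, `channel`), the bin edges, and every function name below are assumptions made for the example.

```python
from typing import Iterable, Iterator

# Hypothetical bin edges for a purchase amount; not taken from the slides.
BIN_EDGES = [0.0, 10.0, 50.0, 200.0]


def cleanse(events: Iterable[dict]) -> Iterator[dict]:
    """Drop events that lack a usable, non-negative numeric 'amount'."""
    for event in events:
        try:
            amount = float(event["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed event: skip it
        if amount >= 0:
            yield {**event, "amount": amount}


def normalize(events: Iterable[dict]) -> Iterator[dict]:
    """Bring the 'channel' label into one canonical lowercase form."""
    for event in events:
        channel = str(event.get("channel", "unknown")).strip().lower()
        yield {**event, "channel": channel}


def bin_amount(amount: float) -> int:
    """Return the index of the last bin edge that is <= amount."""
    index = 0
    for i, edge in enumerate(BIN_EDGES):
        if amount >= edge:
            index = i
    return index


def bin_events(events: Iterable[dict]) -> Iterator[dict]:
    """Attach the bin index for 'amount' to each event."""
    for event in events:
        yield {**event, "amount_bin": bin_amount(event["amount"])}


def prepare(events: Iterable[dict]) -> Iterator[dict]:
    """Chain the three steps lazily, one event at a time."""
    return bin_events(normalize(cleanse(events)))
```

Because each stage is a generator, events flow through one at a time, which mirrors how a streaming prep stage would sit in front of the persistence and scoring steps.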
5. The Spark Accelerator Pattern
- Connectivity
  - Messaging layer: Kafka, HiveMQ, JMS, ActiveMQ, RabbitMQ, FTL, …
  - Direct access: WebSocket, TCP/UDP, MQTT, HTTP, ...
  - Public/private APIs: Twitter, Facebook, ..., Google Finance, ...
- Streaming data prep: cleanse, normalize, bin
- Big data: HDFS, HBase, Parquet, Avro, SQL
- Analytics (data scientists): data discovery, model discovery
- Streaming analytics: load model, model execution, stream processing
- Model tracking: real-time model training
- Live monitoring (operations)
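The "load model, model execution, model tracking" loop can be illustrated with a minimal sketch. It assumes a logistic model with made-up weights and a rolling-accuracy rule for deciding when to retrain; neither is described in the slides, and the class names are hypothetical.

```python
import math
from collections import deque


class LoadedModel:
    """Stand-in for a model discovered offline and loaded into the stream."""

    def __init__(self, weights: dict, bias: float):
        self.weights = weights
        self.bias = bias

    def score(self, features: dict) -> float:
        """Logistic score in (0, 1); missing features count as 0."""
        z = self.bias + sum(
            w * features.get(name, 0.0) for name, w in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))


class ModelTracker:
    """Rolling accuracy over the last `window` labelled outcomes."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.7):
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted: bool, actual: bool) -> None:
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """Only judge the model once the window is full."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy
```

In use, each scored event would later be matched with its real outcome and passed to `record`; when `needs_retraining()` turns true, the stream would hand recent data back to the training step and swap in the new model.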
6. Simplifying Spark
TIBCO Big Data Accelerator: prebuilt building blocks to speed up Spark implementations
- Data capture
- Data analysis
- Model scoring
- Model training
15. IoT Connected Vehicle Architecture
[Architecture diagram; recoverable labels]
- Connected vehicle data: location stream, billions of events
- External data: traffic, Twitter, weather
- Enterprise data (BusinessWorks): scheduling, maintenance, MDM, CRM
- Event-driven rules & predictive analytics (BusinessEvents, TERR, StreamBase): trip optimization rules, predictive maintenance rules, alerts, vehicle clustering rules, real-time geo-fencing rules, predictive route optimization, journey disruption rules
- TIBCO Live Datamart: operational command & control app (LiveView)
- Analytics: Hadoop / Spark, Spotfire
- Alerts: case management (AMX BPM)
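As one illustration of the "real-time geo-fencing rules" label, here is a minimal sketch of a circular geofence applied to a location stream. The haversine distance is standard; the tuple layout of the position events, the function names, and the coordinates used in any example are assumptions, not details of the CVA.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofence_events(positions, center, radius_km):
    """Yield ('enter' | 'exit', vehicle_id) when a vehicle crosses the fence.

    `positions` is an iterable of (vehicle_id, lat, lon) tuples; a vehicle
    is assumed to start outside the fence until a position says otherwise.
    """
    inside = {}
    for vehicle_id, lat, lon in positions:
        now_inside = haversine_km(lat, lon, *center) <= radius_km
        was_inside = inside.get(vehicle_id, False)
        if now_inside != was_inside:
            yield ("enter" if now_inside else "exit", vehicle_id)
        inside[vehicle_id] = now_inside
```

Only the crossings are emitted, not every position, so downstream rules and alerts see one event per entry or exit.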
16. A Moment in the Life of a Connected Vehicle
- 1,000 trains simultaneously transmit location, capacity, and “blocking.” Alerts on status are sent to customers.
- Bad weather delays a trip; streaming analytics continuously recalculates the impact based on real-time state.
- Operators analyze the impact in real time, recalculate train “blocking,” and take action.
- Operations return to normal; customers are alerted.
23. Alerts can be sent via Kafka, BPM, a signage update, or a tweet to the public…
24. The train now resumes normal speed. Although still delayed, trip 2202 no longer impacts the next trip.
33. Industrial Equipment & Spark
[Architecture diagram; recoverable labels]
- Streaming & batch analytics (TERR, PMML, StreamBase, BusinessEvents): continuous predictive maintenance, risk management, geo-aware analytics, facility management, alert targeting, industrial equipment monitoring
- Integration (BusinessWorks, EMS, TIBCO Mashery, eFTL): OSI PI, engineering documents, financial, WITSML, Open Spirit, MDM, mobile, weather
- TIBCO Live Datamart and TIBCO LiveView: digital operations (e.g., drilling operations)
- Analytics (Spotfire): data scientists
- Alerts: case management (TIBCO BPM)
- Also shown: Spark, in-memory data grid, Cloud Foundry
36. Algorithmic Loyalty
[Architecture diagram; recoverable labels]
- Streaming data (generic events): supply chain, partners, mobile, vehicles, mobile loyalty, wearables
- Continuous digital loyalty: IoT streaming analytics, social analytics, segment & target, offers & points
- Live Datamart, in-memory data grid, enterprise data integration, API management
- Digital operations: call centers, mobile rewards, operations
- Analytics (report & analyze): data scientists
- Alerts: case management
38. “In December, 2012, Knight Capital lost $460M in under
40 minutes. That changed everything. Now, it’s no longer
acceptable to run our business based on end-of-day reports.”
- Head of Risk Management, top 3 bank
39. Wall Street Continuous Compliance Architecture
[Architecture diagram; recoverable labels]
- Order stream: 100M+ orders a day, 90% cancel rate, 500,000 EPS peak
- Market data stream
- Streaming analytics (continuous compliance, with sensitivity adjustments): large orders, marking the tape, layering, ramping on close, ramping on open, spiking, spoofing (1), spoofing (2), spoofing (3), wash trades
- Outputs: alerts, compliance alerting, audit trail logging, Live Datamart, contextual case management for compliance staff
- In aggregate, peak event rates of 600,000 events a second, or a rate of 51 billion events a day
- Continuous compliance analytics answer every interesting surveillance question (at the peak rate) 51 billion times a day
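The daily figure follows from the peak rate: 600,000 events a second times 86,400 seconds is about 51.8 billion events a day. The sketch below checks that arithmetic and adds a deliberately simplified surveillance rule, a rolling cancel-ratio alert per trader. It is not the layering, spoofing, or wash-trade logic named on the slide; the class name, window size, and threshold are assumptions, with the threshold set above the 90% cancel rate that the slide presents as normal.

```python
from collections import defaultdict, deque

PEAK_EVENTS_PER_SECOND = 600_000
SECONDS_PER_DAY = 24 * 60 * 60
# 600,000 * 86,400 = 51,840,000,000, the "51 billion events a day" figure.
EVENTS_PER_DAY_AT_PEAK = PEAK_EVENTS_PER_SECOND * SECONDS_PER_DAY


class CancelRatioMonitor:
    """Flag traders whose most recent orders are almost all cancelled."""

    def __init__(self, window: int = 50, threshold: float = 0.95):
        self.window = window
        self.threshold = threshold
        # One bounded history of True/False (cancelled or not) per trader.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def on_order_outcome(self, trader: str, cancelled: bool):
        """Record one finished order; return an alert dict or None."""
        history = self.recent[trader]
        history.append(cancelled)
        if len(history) < self.window:
            return None  # not enough evidence yet
        ratio = sum(history) / len(history)
        if ratio >= self.threshold:
            return {"trader": trader, "cancel_ratio": ratio}
        return None
```

The rule is evaluated on every event instead of in an end-of-day batch, which is the point the slide makes about continuous compliance.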
40. The Birth of the Live Datamart
[Architecture diagram; recoverable labels]
- Data sources: application data, social media data, market data, sensor data, enterprise data, IoT, mobile, social
- Messaging and storage: FTL, EMS, ActiveSpaces, in-memory data grid, Spark
- Live Datamart: continuous query processor, rules, alerts
- Clients: continuous query, command & control, action
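The idea behind a continuous query, as opposed to a query that is run once, can be sketched as follows: a client registers a predicate, receives the rows that match now, and is then pushed inserts, updates, and deletes as the data changes. This is a toy in-memory model, not the Live Datamart API; `LiveTable` and its methods are hypothetical names.

```python
class LiveTable:
    """Tiny in-memory table that pushes changes to registered queries."""

    def __init__(self):
        self.rows = {}
        self.queries = []

    def register_query(self, predicate, callback):
        """Deliver current matches as a snapshot, then stay subscribed."""
        self.queries.append((predicate, callback))
        for key, row in self.rows.items():
            if predicate(row):
                callback("insert", key, row)

    def upsert(self, key, row):
        """Store the row and notify every query whose result set changed."""
        old = self.rows.get(key)
        self.rows[key] = row
        for predicate, callback in self.queries:
            matched_before = old is not None and predicate(old)
            matches_now = predicate(row)
            if matches_now and not matched_before:
                callback("insert", key, row)
            elif matches_now and matched_before:
                callback("update", key, row)
            elif matched_before and not matches_now:
                callback("delete", key, row)
```

A dashboard that registers "trips delayed more than five minutes" would see a trip appear when it becomes late, change as the delay changes, and drop out when it recovers, without polling.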
46. The Postal Service Internet of Things
[Architecture diagram; recoverable labels]
- Sensor data; location automation
- Rules: TIBCO BusinessEvents
- Enterprise integration bus: TIBCO Enterprise Message Bus, TIBCO BusinessWorks; enterprise applications
- Analytics event aggregator: Hadoop
- In-memory operational data store: TIBCO BusinessWorks, ActiveSpaces
- Operational control: TIBCO Live Datamart & LiveView
- API management: TIBCO API Exchange; mobile apps, partners, web
- Notifications: SMS, email, PDA
Editor's Notes
3:30
LET’S LOOK AT ONE OF OUR FIRST ACCELERATORS
VEHICLES TODAY ARE MOBILE DEVICES. CARS HAVE SENSORS. TRAINS HAVE SENSORS. DELIVERY VEHICLES HAVE SENSORS.
FEDEX, AIRLINES: CREDIBILITY
[CLICK]
A LOT OF COMPANIES STILL HAVE A REAR-VIEW-MIRROR APPROACH.
[CLICK]
AT THE SAME TIME, CUSTOMERS, ESPECIALLY MILLENNIALS, WERE PRACTICALLY BORN WITH MOBILE PHONES. THEY EXPECT REAL-TIME INSIGHT INTO EVERYTHING.
ACCELERATORS AREN’T JUST DEMOS. THEY ARE ARCHITECTED, DOCUMENTED, AND TESTED INFRASTRUCTURE UPON WHICH YOU CAN GROW AND INNOVATE
HERE’S THE CVA ARCHITECTURE THAT SHOWS THE TIBCO COMPONENTS IN PLACE.
BY PROVIDING AN OUT-OF-THE-BOX ARCHITECTURE, YOUR TECHNICAL TEAMS CAN GET UP AND RUNNING WITH THE RIGHT ARCHITECTURE, MORE QUICKLY.
HERE’S A DEMONSTRATION OF THE CVA IN ACTION: WE LOOK AT A MOMENT IN THE LIFE OF A CONNECTED VEHICLE: A TRAIN.
HERE’S WHAT I’M GOING TO SHOW YOU:
THE CVA SIMULATES THOUSANDS OF TRAINS TRANSMITTING THEIR LOCATION, CAPACITY, ETC.
BAD WEATHER WILL INTRODUCE A DELAY
THE CVA WILL ALERT OPERATORS THAT AN EXTERNAL EVENT HAS CAUSED A SYSTEMIC PROBLEM, AND WILL HELP THEM PINPOINT WHERE THE PROBLEM IS, AND REMEDIATE THE PROBLEM.
THE DELAY WILL BE ISOLATED AND ADDRESSED IN REAL-TIME, AND THE SYSTEM WILL RETURN TO NORMAL.
1 – The CVA LiveView UI, with queries executing against the Live Datamart, shows trip 2202 operating from Dordrecht to Amsterdam.
The slightly off-grey box with the blue stripe and the orange stripe is the current trip. The orange stripe means it’s currently operating on time. After it completes 2202, the same trainset will operate trip 2211 from Amsterdam back to Dordrecht; that is the next box down, with only the blue stripe because the trip has not yet started. The grey box is the scheduled time, the blue stripe is the BusinessEvents estimated time, and the orange stripe is the actual time.
AND THE CVA ARCHITECTURE SHOWS AN EFFECTIVE EXAMPLE OF HOW TO CONNECT, FOR EXAMPLE, BE AND LIVE DATAMART.
2 – In CVA, simulators are built in to help you create an alert for technical problems that may cause delays. Here, we simulate a “new technical problem alert” in the system.
3 – The technical problem delay creates alerts in BE rules: the technical alert itself (last one down), plus all the block delay alerts that BE generates because of it. Each of these represents a trip that will be delayed by the technical problem, not directly, but only as a consequence of resource scheduling.
4 – Now the blue stripe for trip 2202 has turned red because the trip is delayed, and it extends beyond the schedule box into the time period when trip 2211 would normally start. In other words, the trainset is going to arrive in Amsterdam at 0845 instead of 0817, so it won’t be able to depart on time for trip 2211 at 0842. Instead it will depart 8 minutes late, at 0850.
CVA has now discovered an alert that could be sent to customers via BW, start a case in AMX BPM, send a message via EMS, tweet to the public on Twitter….
And here, we show the operator clearing the alert: action has been taken to deal with the problem.
8 – Trip 2202 is still delayed, but much less. The train had already entered the speed-restricted section, but once the alert cleared it was able to speed up for the rest of it. So now it’ll arrive late at 0828, which still leaves enough time for the trainset to operate the subsequent trip 2211 on time at 0842. So that block delay has disappeared.
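The block-delay arithmetic in these notes can be reproduced with a small sketch. The 5-minute minimum turnaround is an assumption inferred from the 0845 arrival and 0850 departure quoted above; the function names are hypothetical and this is not the BusinessEvents rule itself.

```python
from datetime import datetime, timedelta

# Assumption: inferred from "arrive 0845, depart 0850" in the notes.
MIN_TURNAROUND = timedelta(minutes=5)


def hhmm(text: str) -> datetime:
    """Parse a time written as HHMM, e.g. '0845'."""
    return datetime.strptime(text, "%H%M")


def next_departure(estimated_arrival, scheduled_departure,
                   turnaround=MIN_TURNAROUND):
    """Return (actual departure, block delay) for the trainset's next trip.

    The trainset cannot leave before it has arrived and turned around,
    and it never leaves ahead of schedule.
    """
    earliest = estimated_arrival + turnaround
    departure = max(earliest, scheduled_departure)
    return departure, departure - scheduled_departure


# Trip 2202 delayed to 0845: trip 2211 (scheduled 0842) leaves 8 minutes late.
late_departure, late_delay = next_departure(hhmm("0845"), hhmm("0842"))

# After the alert is cleared, 2202 arrives at 0828 and 2211 leaves on time.
ok_departure, ok_delay = next_departure(hhmm("0828"), hhmm("0842"))
```

Re-running this calculation whenever the estimated arrival changes is what makes the block delay for trip 2211 appear and then disappear in the demo.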