Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
Findability Day 2016 (Stockholm, Sweden)
How to Leverage Machine Learning to Find Insights in Historical Data
© Copyright 2000-2016 TIBCO Software Inc.
Apply Big Data Analytics to Real Time Processing
Analyze and Act on Critical Business Moments
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Real Time Processing
4) Real World Scenario
1) Machine Learning and Big Data Analytics
Machine Learning
... allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
Spam Detection
Search Results +
Product Recommendation
Picture Detection
(Friends, Locations, Products)
Machine Learning is already present in daily life…
Now, every enterprise is beginning to leverage it!
The Next Disruption:
Google Beats Go Champion
Analytics Maturity Model
[Figure: Value to the Organization (axis labels: Immediate, Long-Term, Competitive Advantage) plotted against Analytics Maturity (Measure, Diagnose, Predict, Optimize, Alert, Automate). Stages shown: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing.]
A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases.
The first task in a new analytics project
is to define a Business Case!
2) Building an Analytic Model
Analytical Pipeline
Data Acquisition
Data Munging - Transformations
Purchases (one row per purchase):
     cust_id  dept  sku    dollar  gift   date
1    104      C     12003    2.40  FALSE  2016-10-17
2    105      A     12005   62.85  FALSE  2016-10-17
3    102      C     12007   69.23  TRUE   2016-10-17
4    104      B     12004    9.33  FALSE  2016-10-18
5    105      C     12010   14.16  TRUE   2016-10-18
6    101      B     12003   90.43  FALSE  2016-10-19
7    103      C     12005   90.97  FALSE  2016-10-19
n    …        …     …       …      …      …
Customer summary (one row per customer):
     cust_id  A      B      C      total   # orders  first_date  last_date
1    100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2    101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3    102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4    103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
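The transformation above, from one row per purchase to one row per customer, can be sketched in plain Python. This is only a minimal illustration and not the tooling used in the talk: the function name `summarize_by_customer` and the dictionary keys are assumptions, each purchase row is counted as one order, and only the seven purchases visible on the slide are used, so the result differs from the slide's customer summary (which was built from the full data set).

```python
from datetime import date

# The seven purchases visible on the slide:
# (cust_id, dept, sku, dollar, gift, date)
PURCHASES = [
    (104, "C", 12003, 2.40, False, date(2016, 10, 17)),
    (105, "A", 12005, 62.85, False, date(2016, 10, 17)),
    (102, "C", 12007, 69.23, True, date(2016, 10, 17)),
    (104, "B", 12004, 9.33, False, date(2016, 10, 18)),
    (105, "C", 12010, 14.16, True, date(2016, 10, 18)),
    (101, "B", 12003, 90.43, False, date(2016, 10, 19)),
    (103, "C", 12005, 90.97, False, date(2016, 10, 19)),
]


def summarize_by_customer(purchases, depts=("A", "B", "C")):
    """Collapse purchase rows into one summary row per customer."""
    summary = {}
    for cust_id, dept, _sku, dollar, _gift, day in purchases:
        # Create an empty summary row the first time a customer is seen.
        row = summary.setdefault(cust_id, {
            **{d: 0.0 for d in depts},
            "total": 0.0,
            "orders": 0,
            "first_date": day,
            "last_date": day,
        })
        row[dept] += dollar          # spend per department
        row["total"] += dollar       # spend across all departments
        row["orders"] += 1           # assumption: one row = one order
        row["first_date"] = min(row["first_date"], day)
        row["last_date"] = max(row["last_date"], day)
    return summary


if __name__ == "__main__":
    for cust_id, row in sorted(summarize_by_customer(PURCHASES).items()):
        print(cust_id, row)
```

The per-customer layout is what most modeling algorithms expect: one observation per entity, with numeric features derived from the raw event log.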
“The greatest value of a picture
is when it forces us to notice
what we never expected to see”
John W. Tukey, 1977
Exploratory Data Analysis
Visual Analytics - Interactive Brush-Linked
Which picture represents a model?
A model is a simplification of the truth that helps you with decision making.
Model Building
Employees who write longer emails earn higher salaries!
Model Improvement
Managers
Staff
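The Model Building and Model Improvement slides appear to show that the email/salary relationship looks different once employees are split into managers and staff. A small sketch of that effect follows, using ordinary least squares on made-up numbers. The data points, the units, and the helpers `fit_line` and `slope_of` are purely illustrative assumptions, not figures from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (email length in words, salary in thousand EUR) pairs,
# invented for this illustration only.
STAFF = [(50, 40), (100, 42), (150, 40)]
MANAGERS = [(200, 90), (250, 92), (300, 90)]


def slope_of(points):
    """Slope of the least squares line through a list of (x, y) points."""
    xs, ys = zip(*points)
    return fit_line(xs, ys)[0]


pooled_slope = slope_of(STAFF + MANAGERS)   # one model for everyone
staff_slope = slope_of(STAFF)               # model for staff only
manager_slope = slope_of(MANAGERS)          # model for managers only

print("pooled:", pooled_slope)
print("staff:", staff_slope, "managers:", manager_slope)
```

In this toy data the pooled model shows a clear positive slope, while within each group the slope vanishes: the apparent effect of email length is carried entirely by the group variable. Adding such a variable is one way a model can be improved.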
Model Validation
How is the IQ of a kid related to the IQ of his / her mum?
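Validating a model means checking it on data it was not fitted to. The sketch below holds out part of a data set, fits a line on the rest, and reports the error on both parts. The mother/child IQ pairs are synthetic numbers generated for this illustration (the slope of 0.7 and the noise level are arbitrary choices, not empirical results), and the helper names are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the line on the given points."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic data: arbitrary linear relationship plus noise (demo only).
rng = random.Random(0)
pairs = []
for _ in range(200):
    mum_iq = rng.gauss(100, 15)
    kid_iq = 30 + 0.7 * mum_iq + rng.gauss(0, 5)
    pairs.append((mum_iq, kid_iq))

# Hold out 20% of the observations for validation.
rng.shuffle(pairs)
train, test = pairs[:160], pairs[160:]

slope, intercept = fit_line(*zip(*train))
train_error = rmse(*zip(*train), slope, intercept)
test_error = rmse(*zip(*test), slope, intercept)

print("train RMSE:", train_error)
print("test RMSE:", test_error)
```

A test error far above the training error would suggest that the model has memorized the training data rather than learned a relationship that generalizes.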
Frameworks and Tooling
“…as a next-generation data discovery capability that automatically finds and explains
insights from advanced analytics to business users or citizen data scientists”
Smart Data Discovery (for the Business User)
Leverage Machine Learning
without the help of a Data Scientist
Advanced Analytics and Big Data Tools (for Data Scientists)
Many more ….
TIBCO Spotfire with R / TERR Integration
Let the business user leverage Analytic Models (created by the Data Scientist) to find insights!
Example: Customer Churn with Random Forest Algorithm
• Behind the ‘refresh model’ button lives a ‘random forest algorithm’
• It requires few a priori assumptions and usually works well out of the box
• The business user doesn’t need to know what random forest is to be empowered by it
Select variables for the model
3) Real Time Processing
Operational Intelligence and Human Interaction
Actions by Operations: Human decisions in real time informed by up-to-date information
Machine-to-Machine Automation: Automated action based on models of history combined with live context and business rules
Visual Coding for Streaming Analytics with TIBCO StreamBase
• Streaming Operators
• Connectivity
• Visual Development
• Testing & Simulation
• Mature Tooling / Support
• Middleware Integration
Live Visual Analytics UI with TIBCO Live Datamart
Dynamic aggregation
Live visualization
Ad-hoc continuous query
Alerts
Action
How to apply analytic models to real time processing without redevelopment?
[Diagram: TIBCO StreamBase connected to H2O.ai, Open Source R, TERR, Spark ML, MATLAB, SAS, PMML]
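One way to read "without redevelopment" is that the model is trained offline, exported in a portable form, and then only loaded and evaluated by the streaming layer. The sketch below imitates that split in plain Python, with a JSON document standing in for a portable format such as PMML. It does not use the StreamBase, H2O, or PMML APIs; the field names, the coefficients, and the functions `load_model` and `score_stream` are hypothetical.

```python
import json
import math

# Stand-in for a model exported by the offline analytics tool.
# Type, field names, and coefficients are invented for illustration.
EXPORTED_MODEL = json.dumps({
    "type": "logistic_regression",
    "intercept": -4.0,
    "coefficients": {"temperature": 0.05, "vibration": 1.5},
})


def load_model(document):
    """Turn the exported document into a scoring function."""
    spec = json.loads(document)
    if spec["type"] != "logistic_regression":
        raise ValueError("unsupported model type: " + spec["type"])
    intercept = spec["intercept"]
    coefficients = spec["coefficients"]

    def score(event):
        # Linear combination of the event's fields, squashed to (0, 1).
        z = intercept + sum(w * event[name] for name, w in coefficients.items())
        return 1.0 / (1.0 + math.exp(-z))

    return score


def score_stream(events, score, threshold=0.5):
    """Yield (event, probability, alert) for each incoming event."""
    for event in events:
        probability = score(event)
        yield event, probability, probability >= threshold


# Two example events; in a real deployment these would arrive continuously.
events = [
    {"temperature": 20.0, "vibration": 0.2},
    {"temperature": 60.0, "vibration": 1.8},
]
results = list(score_stream(events, load_model(EXPORTED_MODEL)))
for event, probability, alert in results:
    print(event, round(probability, 3), "ALERT" if alert else "ok")
```

Because the streaming code only interprets the exported document, retraining the model offline changes the document but not the streaming application.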
TIBCO StreamBase Connector for R and TERR
4) Real World Scenario
Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
[Diagram: assembly line with Station 1 and Station 2, each with a "Scrap?" decision. Cost before: 9€; Station 1: 7€; Station 2: 13€; total cost: 29€ (or more).]
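The cost figures make the trade-off easy to state. Reading them as 9€ spent before Station 1, 7€ for Station 1, and 13€ for Station 2, a part scrapped at the very end has cost 29€, while one scrapped before Station 1 has cost only 9€. The sketch below turns this into a simple decision rule based on a predicted scrap probability. The rule, the function names, and the simplifications (a good part recovers its full cost, rework is ignored) are assumptions for illustration; the talk does not specify them.

```python
# Cost figures from the slide, in EUR (assignment to stations is our reading).
COST_BEFORE = 9.0
COST_STATION_1 = 7.0
COST_STATION_2 = 13.0


def should_scrap(sunk_cost, remaining_cost, p_scrap):
    """Scrap now if continuing has a higher expected loss.

    Scrapping now loses the sunk cost for certain. Continuing loses the
    full cost (sunk + remaining) with probability p_scrap, and nothing
    otherwise (assumption: a good part recovers its cost).
    """
    expected_loss_if_continue = p_scrap * (sunk_cost + remaining_cost)
    return expected_loss_if_continue > sunk_cost


def decide_before_station_1(p_scrap):
    """Decision point before any station cost has been spent."""
    return should_scrap(COST_BEFORE, COST_STATION_1 + COST_STATION_2, p_scrap)


def decide_before_station_2(p_scrap):
    """Decision point after Station 1, before Station 2."""
    return should_scrap(COST_BEFORE + COST_STATION_1, COST_STATION_2, p_scrap)


if __name__ == "__main__":
    for p in (0.2, 0.4, 0.6):
        print(p, decide_before_station_1(p), decide_before_station_2(p))
```

Under these assumptions the break-even probability rises as more cost is sunk (9/29 before Station 1, 16/29 before Station 2), which matches the stated goal of scrapping as early as possible.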
TIBCO Spotfire with H2O Integration
Data Discovery / Data Mining (“Are parts that repeat a station more likely scrap parts?”)
TIBCO Live Datamart
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client
Live Datamart Web API
TIBCO Accelerator for Apache Spark
The TIBCO Accelerator for Spark is a TIBCO-engineered, light-weight open-source fast-start for systems to stream data into Spark, discover patterns in Spark with Spotfire, and operationalize the insights on Big Data.
FUNCTIONAL COMPONENTS
1. Fast Data Preparation for IoT
Dozens of enterprise and IoT data preparation adapters: MQTT, Databases; inbound creation of HDFS, Parquet, HBase, Avro…
2. Spotfire Model Discovery Template
Use Spotfire to explore the Spark data lake, create a predictive model, train it in H2O, and deploy it to Streaming Analytics.
3. Operationalize Predictive Models
Zookeeper deployment to StreamBase nodes living in the Spark cluster via H2O, PMML, TERR models
4. Streaming Analytics for Automation
Automate action based on predictive models: make offers to customers, stop fraudulent transactions, alert.
5. Monitor & Retrain Model
Monitor behavior of the model, retrain when necessary.
6. Drag & Drop for Business Solution Developers
Code-free development environment for work with H2O, HDFS, Avro, TERR
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• Event Processing uses these Models (without Redevelopment) to take Action in Real Time
Questions? Please contact me!
Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn
