Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
jDays - Gothenburg, Sweden (March 2017)
Advanced Analytics and Machine Learning
with R, Spark, H2O and TensorFlow for Real Time Processing
© Copyright 2000-2017 TIBCO Software Inc.
Apply Big Data Analytics to Real Time Processing
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
1) Machine Learning and Big Data Analytics
Machine Learning
... allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
Spam Detection
Search Results + Product Recommendation
Picture Detection (Friends, Locations, Products)
Machine Learning is already present in daily life…
Now, every enterprise is beginning to leverage it!
The Next Disruption:
Google Beats Go Champion
From Insight to Action - Closed Loop for Big Data Analytics
Insight / Action / EVENTS
2) Building an Analytic Model
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Validation
6. Model Execution
7. Deployment
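The seven steps above can be sketched end to end in a few lines. This is a minimal illustration using only the Python standard library, not the tooling shown in the talk; the data set, the column meaning (temperature vs. defect rate), and the choice of a one-variable linear model are all assumptions made for the example.

```python
import json
from statistics import mean

# 1. Data access: a hypothetical in-memory table of (temperature, defect_rate) rows.
raw = [(20.0, 1.0), (25.0, 1.5), (None, 2.0), (30.0, 2.0), (35.0, 2.5), (40.0, 3.0)]

# 2. Data preparation: drop rows with missing values.
rows = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Exploratory data analysis: basic summary statistics.
summary = {"n": len(rows), "mean_x": mean(x for x, _ in rows), "mean_y": mean(y for _, y in rows)}

# 4. Model building: least-squares fit of y = intercept + slope * x on a training split.
train, test = rows[:4], rows[4:]

def fit(data):
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return {"intercept": my - slope * mx, "slope": slope}

def predict(model, x):
    return model["intercept"] + model["slope"] * x

model = fit(train)

# 5. Model validation: mean squared error on the held-out rows.
mse = mean((predict(model, x) - y) ** 2 for x, y in test)

# 6. Model execution: score a new, unseen record.
new_prediction = predict(model, 50.0)

# 7. Deployment: serialize the model so another process can load and apply it.
serialized = json.dumps(model)
deployed = json.loads(serialized)

print(summary, round(mse, 6), round(predict(deployed, 50.0), 3))
```

The serialized model in step 7 is the piece that the third part of the agenda picks up: the same coefficients are applied elsewhere without rebuilding the model.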
Variety of Data in Enterprises
• Custom GUI-driven data access via SDK: Siebel eBusiness
• Local data sources (drag-and-drop): Access, Excel, STDF
• Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use): databases via JDBC/ODBC (MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Netezza), Hadoop, SFDC, etc.
• Data Fabric: XML, RDBMS, flat files, spreadsheets, web services, Oracle E-Business, SAP BW, SAP R/3, Salesforce
• Direct connection (ODBC, OLE DB, SqlClient): Oracle, Teradata
• Direct Query (dynamically query and retrieve data for visualization and analysis): databases (Oracle, Teradata, Teradata Aster, MS SSAS, MySQL, Netezza, etc.), OBIEE, Hadoop
Data Preparation
http://www.slideshare.net/odsc/feature-engineering
Visual Analytics - Interactive Brush-Linked
Model Building
A model is a simplification of the truth
that helps you with decision making.
Cross-Validation Procedure
https://genome.tugraz.at/proclassify/help/pages/XV.html
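A cross-validation procedure of this kind can be sketched as follows. This is a minimal k-fold example in plain Python; the contiguous (unshuffled) fold layout, the toy "predict the training mean" model, and the data values are assumptions chosen to keep the example short. Real workflows usually shuffle the data first.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def cross_validate(xs, ys, k, fit, predict):
    """Train on k-1 folds, test on the remaining fold; return one MSE per fold."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.append(mean((predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx))
    return errors

# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return mean(ys)

def predict_mean(model, x):
    return model

xs = [0, 1, 2, 3, 4, 5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_errors = cross_validate(xs, ys, k=3, fit=fit_mean, predict=predict_mean)
print(fold_errors, mean(fold_errors))
```

Every sample is used for testing exactly once, so the averaged fold error is a less optimistic estimate than the error on the training data itself.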
Execution via Code / Scripting
Execution within the Visual Analytics Tooling
Customer Churn with Random Forest Algorithm:
Select variables for the model
Frameworks and Tooling
Advanced Analytics and Big Data Tools for Data Scientists
Many more ...
Portable Format for Analytics (PFA)
Demystify Data Science for the Business Analyst
Leverage Machine Learning
without the help of a Data Scientist
Development of Analytic Models
with R, TensorFlow, Apache Spark, RapidMiner, TIBCO Spotfire
Live Demo
3) Applying an Analytic Model in Real Time
Streaming Analytics - Processing Pipeline
• Stream Ingest: APIs, Adapters / Channels, Integration, Messaging
• Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering
• Stream Analytics & Processing: Contextual Rules, Windowing, Patterns, Analytics, Deep ML, …
• Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting
• Also shown: Index / Search, Normalization
Applying an Analytic Model
is just a piece of the puzzle!
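To make the point concrete, here is a minimal sketch of such a pipeline built from Python generators. The event fields (`sensor`, `temp_f`), the location lookup, and the `toy_score` function are hypothetical stand-ins; in a real deployment the scoring function would be a trained model and the ingest stage an adapter or message queue.

```python
def ingest(events):
    """Stream ingest: yield raw events one at a time (stand-in for an adapter or queue)."""
    for event in events:
        yield event

def preprocess(stream, sensor_locations):
    """Stream preprocessing: filter incomplete events, then transform and enrich."""
    for event in stream:
        if event.get("temp_f") is None:
            continue  # Filtering: drop events without a reading
        yield {
            "sensor": event["sensor"],
            "temp_c": round((event["temp_f"] - 32) * 5 / 9, 1),  # Transformation
            "location": sensor_locations.get(event["sensor"], "unknown"),  # Enrichment
        }

def analyze(stream, score, threshold):
    """Stream analytics: apply a scoring function and a simple rule to each event."""
    for event in stream:
        event["score"] = score(event)
        event["alert"] = event["score"] >= threshold
        yield event

def toy_score(event):
    # Hypothetical stand-in for an analytic model: maps 50..100 degrees C to 0..1.
    return min(1.0, max(0.0, (event["temp_c"] - 50) / 50))

raw_events = [
    {"sensor": "s1", "temp_f": 212.0},
    {"sensor": "s2", "temp_f": None},
    {"sensor": "s3", "temp_f": 122.0},
]
locations = {"s1": "line A"}

# Stream outcomes: here simply collected into a list.
outcomes = list(analyze(preprocess(ingest(raw_events), locations), toy_score, threshold=0.8))
for outcome in outcomes:
    print(outcome)
```

Only the single `score(event)` call is the analytic model; everything around it is ingest, preprocessing, and outcome handling.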
Frameworks and Products
(not a complete list!)
Axes: OPEN SOURCE vs. CLOSED SOURCE, PRODUCT vs. FRAMEWORK
Example shown: Microsoft Azure Stream Analytics
How to apply analytic models to real time processing without redevelopment?
Analytic models from: H2O.ai, Open Source R, TERR, Spark ML, MATLAB, SAS, PMML
Applied in: Stream Processing
Apache Spark ML and Spark Streaming with PMML Models
https://github.com/jpmml/jpmml-spark
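The idea of applying a PMML model without redevelopment can be illustrated with a toy evaluator. The XML below is a simplified fragment in the shape of a PMML regression model, written by hand for this sketch: it omits the namespace, `DataDictionary`, and `MiningSchema` that a complete PMML document carries, and the field names and coefficients are invented. A production system would use a full PMML evaluator, such as the JPMML project linked above, rather than this hand-rolled parser.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment (assumption: not a complete, valid PMML file).
PMML_DOC = """
<PMML version="4.3">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="temperature" coefficient="0.2"/>
      <NumericPredictor name="vibration" coefficient="3.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def load_regression(pmml_text):
    """Read the intercept and per-field coefficients from the model document."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("./RegressionModel/RegressionTable")
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return float(table.get("intercept")), coefficients

def score(model, event):
    """Apply the linear model to one event (a dict of field name to value)."""
    intercept, coefficients = model
    return intercept + sum(c * event[name] for name, c in coefficients.items())

model = load_regression(PMML_DOC)

# The stream side only needs the document, not the code that trained the model.
stream = [
    {"temperature": 10.0, "vibration": 0.5},
    {"temperature": 20.0, "vibration": 1.0},
]
scores = [score(model, event) for event in stream]
print(scores)
```

Swapping in a retrained model means replacing the document; the scoring code in the stream stays unchanged.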
TIBCO StreamBase Connector for R and TERR
TIBCO StreamBase Connector for H2O.ai
TIBCO StreamBase Connector for PMML
Scenario: Predictive Scrapping of Parts in an Assembly Line
Station 1 (Scrap?)   Station 2 (Scrap?)
Cost before: 9€, 7€, 13€
Total cost: 29€ (or more)
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)
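The arithmetic behind "scrap parts as early as possible" can be worked through in a few lines. This sketch assumes the three figures on the slide (9€, 7€, 13€) are the costs added at three successive stages, in that order, which is consistent with the stated total of 29€; the defect probability is a hypothetical model output, not a number from the talk.

```python
from itertools import accumulate

# Assumption: costs added at three successive stages, in slide order.
stage_costs_eur = [9, 7, 13]
sunk_cost_eur = list(accumulate(stage_costs_eur))  # cost invested after each stage
total_cost_eur = sunk_cost_eur[-1]

def avoided_cost(last_completed_stage):
    """Cost not spent when a part is scrapped right after the given stage (0-based)."""
    return total_cost_eur - sunk_cost_eur[last_completed_stage]

def expected_saving(p_defect, last_completed_stage):
    """Expected saving per part if parts predicted defective are scrapped early.

    p_defect is a hypothetical model output: the probability that the part
    would end up being scrapped at the end of the line anyway.
    """
    return p_defect * avoided_cost(last_completed_stage)

for stage in range(len(stage_costs_eur)):
    print(f"scrap after stage {stage + 1}: "
          f"{sunk_cost_eur[stage]} EUR lost, {avoided_cost(stage)} EUR avoided")
```

Under these assumptions, scrapping a defective part after the first stage loses 9€ instead of the full 29€, which is why the prediction is most valuable at the earliest station.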
Fast Data Architecture for Predictive Maintenance
• Data sources: CSV (batch), JSON (real time), XML (real time), internal data, Oracle RDBMS
• Historical analysis (data scientists): Flume, HDFS, Hadoop (Cloudera), Spotfire, R / TERR, H2O
• Data and model formats: Avro, Parquet, …, PMML
• Streaming analytics (StreamBase): Aggregate, Rules, Analytics, Correlate
• Operational analytics (Live Datamart): continuous query processing, alerts, Live UI for operations
• Action: manual action, escalation
• Platform: TIBCO Fast Data Platform
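The "aggregate, apply rules, raise alerts" part of such an architecture can be sketched with a sliding window. This is a generic illustration in plain Python, not StreamBase or Live Datamart code; the readings, window size, and limit are invented for the example.

```python
from collections import deque

def sliding_mean_alerts(readings, window_size, limit):
    """Aggregate the last `window_size` values and yield an alert when their mean exceeds `limit`."""
    window = deque(maxlen=window_size)  # oldest value is dropped automatically
    for timestamp, value in readings:
        window.append(value)
        if len(window) == window_size:
            window_mean = sum(window) / window_size
            if window_mean > limit:  # Rule: mean above limit triggers an alert
                yield {"time": timestamp, "window_mean": window_mean}

# Hypothetical (timestamp, sensor value) readings arriving in order.
readings = [(1, 70.0), (2, 72.0), (3, 80.0), (4, 91.0), (5, 93.0)]
alerts = list(sliding_mean_alerts(readings, window_size=3, limit=80.0))
for alert in alerts:
    print(alert)
```

Because the function is a generator, alerts are produced as events arrive rather than after the whole data set has been read, which is the behavior a continuous query provides.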
TIBCO Spotfire with H2O Integration
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)
TIBCO StreamBase / Live Datamart + H2O.ai
Live Demo
From Insight to Action - Closed Loop for Big Data Analytics
Insight: ACCESS, WRANGLE, ANALYZE, MODEL
Action: MONITOR, PREDICT, DECIDE, ACT
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• Event Processing uses these Models (without Redevelopment) to take Action in Real Time
Questions? Please contact me!
Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn

R, Spark, TensorFlow, H2O.ai Applied to Streaming Analytics
