Apply Machine Learning to Microservices

Kai Wähner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
O’Reilly Software Architecture Conference 2016 (London, UK)
How to apply big data analytics and machine learning
to real-time processing of microservice events

© Copyright 2000-2016 TIBCO Software Inc.
Digital Transformation - Physical and Digital Worlds are Merging

Apply Big Data Analytics to Real Time Processing

Analyze and Act on Critical Business Moments

Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytics Models
• Event Processing uses these Models (without Redevelopment) to take Action in Real Time

Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Real Time Processing
4) Live Demo
5) Intelligent Microservices

Machine Learning
… allows computers to find hidden insights without being
explicitly programmed where to look.

Real World Examples of Machine Learning
Spam Detection
Search Results +
Product Recommendation
Picture Detection
(Friends, Locations, Products)
Machine Learning is already present in daily life…
Now, every enterprise is beginning to leverage it!
The Next Disruption:
Google Beats Go Champion

Example: Decision Tree – Titanic Survival Rate
[Figure: decision tree (node label: family size). Source: Wikipedia]

Decision Tree – Product Pass / Fail by Equipment Sensor Readings
[Figure: decision tree for a TV color display problem. Splits: Step 8 Temperature (< 122 C / >= 122 C), Step 2 Recipe (A / B), Step 11 Pressure. Leaves: Bad Product, Good Product.]

Decision Tree – Training and Test Data Sets

Ensemble Tree Algorithms
• Random Forest, Gradient Boosting Machine (GBM)
• Method – Average many simple trees
• Sample the data: fit a simple tree
• Re-sample the data, up-weighting the observations that weren’t fitted well in the previous model
• Continue adding trees until fit is good
• Save all the trees and average them
• Better fit + prediction than single trees
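
The averaging method above can be sketched in a few lines of plain Python. This is a minimal illustration of the resampling flavor (bagging, as used by Random Forest) on a single input variable. It is not the implementation of any particular library, it leaves out the up-weighting step that boosting methods such as GBM add, and the helper names (`fit_stump`, `fit_ensemble`, `predict`) and the toy data are assumptions made for this example.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: choose the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant prediction
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def fit_ensemble(xs, ys, n_trees=25, seed=0):
    """Sample the data with replacement, fit a simple tree, and save all the trees."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Average the predictions of all saved trees."""
    votes = [left if x < threshold else right for threshold, left, right in trees]
    return sum(votes) / len(votes)


# Toy data (hypothetical): the target flips from 0 to 1 at x = 10.
xs = list(range(20))
ys = [0.0 if x < 10 else 1.0 for x in xs]
trees = fit_ensemble(xs, ys)
print(predict(trees, 0), predict(trees, 19))
```

Each stump sees a slightly different sample, so individual thresholds vary; averaging them smooths out that variation, which is the intuition behind the "better fit + prediction than single trees" bullet.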

Closed Loop for Big Data Analytics

Analytics Maturity Model
A good Big Data Analytics platform can provide value to the organization
across the full spectrum of use cases.
[Figure: Analytics Maturity (Measure, Diagnose, Predict, Optimize, Alert, Automate) plotted against Value to the Organization / Competitive Advantage, from immediate to long-term. Labels: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing.]

The first task in a new analytics project
is to define a Business Case!

From a Business Case to Proactive Actions
[Figure: pipeline from business case to action.
Business Case / Value Theses: Increase Productivity, Grow Revenue, Reduce Risk.
Assemble Data: SAP, Historian, Production, Well, Completions, G&G, Equipment.
Data Wrangling: Filter, Enrich, Merge, Shape, Explore, Clean.
Develop Model / Present: Visualize, GeoLocation, Signals, Dashboards.
Prediction: Pressure, Temperature, Production Interrupt, Drill Bit Movement, Equipment Failure.
Decision, Action.]
Analytical Pipeline

Variety of Data in Enterprises
[Figure: data fabric connecting enterprise data sources to analytics]
• Local data sources (drag-and-drop): Access, Excel, STDF, XML, flat files, spreadsheets
• Databases (JDBC/ODBC, OLE DB, SqlClient, direct connection): MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Teradata Aster, Netezza, Hadoop, MS SSAS, other RDBMS, etc.
• Enterprise applications: SAP BW, SAP R/3, Siebel eBusiness, Oracle E-Business, Salesforce (SFDC), OBIEE, web services
• Custom GUI-driven data access via SDK
• Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use)
• Direct Query (dynamically query and retrieve data for visualization and analysis)

Data Acquisition
“Smart Recommendation Engine”

Data Munging / Wrangling / Mash-up

#  cust_id  dept  sku    dollar  gift   date
1  104      C     12003   2.40   FALSE  2016-10-17
2  105      A     12005  62.85   FALSE  2016-10-17
3  102      C     12007  69.23   TRUE   2016-10-17
4  104      B     12004   9.33   FALSE  2016-10-18
5  105      C     12010  14.16   TRUE   2016-10-18
6  101      B     12003  90.43   FALSE  2016-10-19
7  103      C     12005  90.97   FALSE  2016-10-19
n  …        …     …      …       …      …

#  cust_id  A      B      C      total   # orders  first_date  last_date
1  100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2  101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3  102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4  103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
Data Munging - Transformations
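
The transformation between the two tables above, from one row per order to one row per customer, can be sketched as follows. This is an illustrative sketch using only the Python standard library, not the tooling used in the talk; the function name `summarize` and the output keys are assumptions, and the input rows are the seven sample orders from the first table (the second table on the slide comes from a larger data set, so its numbers will not match).

```python
# One tuple per order: (cust_id, dept, sku, dollar, gift, date)
orders = [
    (104, "C", 12003, 2.40, False, "2016-10-17"),
    (105, "A", 12005, 62.85, False, "2016-10-17"),
    (102, "C", 12007, 69.23, True, "2016-10-17"),
    (104, "B", 12004, 9.33, False, "2016-10-18"),
    (105, "C", 12010, 14.16, True, "2016-10-18"),
    (101, "B", 12003, 90.43, False, "2016-10-19"),
    (103, "C", 12005, 90.97, False, "2016-10-19"),
]


def summarize(orders):
    """Pivot order rows into one summary row per customer."""
    summary = {}
    for cust_id, dept, _sku, dollar, _gift, date in orders:
        row = summary.setdefault(cust_id, {
            "A": 0.0, "B": 0.0, "C": 0.0,
            "total": 0.0, "orders": 0,
            "first_date": date, "last_date": date,
        })
        row[dept] += dollar          # spend per department becomes a column
        row["total"] += dollar
        row["orders"] += 1
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings
        row["first_date"] = min(row["first_date"], date)
        row["last_date"] = max(row["last_date"], date)
    return summary


for cust_id, row in sorted(summarize(orders).items()):
    print(cust_id, row)
```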

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
1. maximize insight into a data set
2. uncover underlying structure
3. extract important variables
4. detect outliers and anomalies
5. test underlying assumptions
6. develop parsimonious models
7. determine optimal factor settings

“The greatest value of a picture
is when it forces us to notice
what we never expected to see”
John W. Tukey, 1977

Visual Analytics - Interactive Brush-Linked
… and “Inline Data Wrangling” → Ad-hoc data preparation instead of just ETL

Which picture represents a model?
A model is a simplification of the truth that helps you with decision making.

Model Building

Employees who write longer emails earn higher salaries!
Model Building

Model Improvement

Managers
Staff
Model Improvement
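
One plausible reading of the email-length example is that the first model pools managers and staff, and the improved model accounts for the group. The sketch below illustrates that effect with a hand-written least-squares slope. All numbers are invented for illustration (email length in words, salary in thousands); they are not data from the talk.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Hypothetical data: (email length, salary)
staff = [(50, 41), (60, 39), (70, 40), (80, 40)]
managers = [(150, 90), (160, 89), (170, 91), (180, 90)]

print("pooled  :", round(slope(staff + managers), 3))  # clearly positive
print("staff   :", round(slope(staff), 3))             # close to zero
print("managers:", round(slope(managers), 3))          # close to zero
```

In this toy data the pooled fit suggests that longer emails go with higher salaries, while within each group the relationship nearly vanishes: the group, not the email length, carries the signal.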

Model Validation
How is the IQ of a kid related to the IQ of his / her mum?

Frameworks and Tooling

“…as a next-generation data discovery capability that automatically finds and explains
insights from advanced analytics to business users or citizen data scientists”
Smart Data Discovery (for the Business User)
Leverage Machine Learning
without the help of a Data Scientist

Advanced Analytics and Big Data Tools (for Data Scientists)
Many more ….

R Language
• Built for data scientists
• Very active community

R with Revolution Analytics (now Microsoft)
Open Source GPL License (including its restrictions)
http://www.revolutionanalytics.com/webinars/introducing-revolution-r-open-enhanced-open-source-r-distribution-revolution-analytics

TIBCO has rewritten R as a Commercial Compute Engine
• Latest statistics scripting engine: S → S-PLUS® → R → TERR
• Runs R code including CRAN packages
Engine internals rebuilt from scratch at low-level
• Redesigned data objects, memory management
• High performance + Big Data
TERR is licensed from TIBCO
• TERR Installs (free) with Spotfire Analyst / Desktop + other TIBCO products
• Spotfire Server can manage all TERR / R scripts, artifacts for reuse
• Standalone Developer Edition
• Supported by TIBCO
• No GPL license issues
TERR - TIBCO’s Enterprise Runtime for R

Which R to use?
http://www.forbes.com/sites/danwoods/2016/01/27/microsofts-revolution-analytics-acquisition-is-the-wrong-way-to-embrace-r/

Apache Spark
General Data-processing Framework
→ However, focus is especially on Analytics (at least these days)

Apache Spark MLlib
Spark ML is Spark’s machine learning library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering and collaborative filtering.

H2O.ai
An Extensible Open Source Platform for Analytics
• Best of Breed Open Source Technology
• Easy-to-use Web UI and Familiar Interfaces
• Data Agnostic Support for all Common Database and File Types
• Massively Scalable Big Data Analysis
• Real-time Data Scoring (“Nanofast Scoring Engine”)
http://www.h2o.ai/

TIBCO Spotfire with R / TERR Integration
Let the business user leverage Analytic Models (created by the Data Scientist) to find insights!
Example: Customer Churn with Random Forest Algorithm
• Behind the ‘refresh model’ button lives a ‘random forest algorithm’
• It requires no a priori assumptions at all; it just always works
• The business user doesn’t need to know what random forest is to be empowered by it
Select variables
for the model

TIBCO Spotfire with H2O Integration
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

SaaS Machine Learning
• Managed SaaS service for building ML models and generating predictions
• Integrated into the corresponding cloud ecosystem
• Easy to use, but limited feature set and potential latency issues if combined with external data or applications
http://docs.aws.amazon.com/machine-learning/latest/dg/tutoria

PMML (Predictive Model Markup Language)
• XML-based de facto standard to represent predictive analytic models
• Developed by the Data Mining Group (DMG)
• Easily share models between PMML compliant applications
(e.g. between model creation and deployment for operations)
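
To make the idea of an XML-based model exchange concrete, here is a small sketch that reads a PMML-style `TreeModel` and scores a record with it. The fragment is heavily simplified: a schema-valid PMML file also carries a namespace, `Header`, `DataDictionary` and `MiningSchema`, which are left out here. The tiny evaluator is an assumption written for this example, not a compliant PMML engine, and the model itself (field name `temperature`, which branch is `good` or `bad`) is hypothetical; only the 122 C threshold is borrowed from the decision tree slide.

```python
import operator
import xml.etree.ElementTree as ET

# Simplified, hypothetical PMML fragment (header, data dictionary and namespace omitted).
PMML_FRAGMENT = """
<PMML version="4.2">
  <TreeModel modelName="scrap_demo" functionName="classification">
    <Node score="good">
      <True/>
      <Node score="good">
        <SimplePredicate field="temperature" operator="lessThan" value="122"/>
      </Node>
      <Node score="bad">
        <SimplePredicate field="temperature" operator="greaterOrEqual" value="122"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# Subset of the PMML SimplePredicate operators, mapped to Python comparisons.
OPERATORS = {
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "equal": operator.eq,
}


def node_matches(node, record):
    """Return True if the node's predicate holds for the record."""
    predicate = node.find("SimplePredicate")
    if predicate is not None:
        compare = OPERATORS[predicate.get("operator")]
        return compare(record[predicate.get("field")], float(predicate.get("value")))
    return node.find("True") is not None


def score(node, record):
    """Walk down to the first matching child; return the score of the last node reached."""
    for child in node.findall("Node"):
        if node_matches(child, record):
            return score(child, record)
    return node.get("score")


root_node = ET.fromstring(PMML_FRAGMENT.strip()).find("TreeModel/Node")
print(score(root_node, {"temperature": 118.0}))
print(score(root_node, {"temperature": 130.0}))
```

Because the model is plain data, the system that trains it and the system that scores events with it only need to agree on the file format, not on a programming language.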

What is Prescriptive Analytics?

Real Time Streaming Analytics
time
1 2 3 4 5 6 7 8 9
Event Streams
• Continuous Queries
• Sliding Windows
• Filter
• Aggregation
• Correlation
• …
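
As a rough sketch of how a continuous query combines these operators, the generator below filters an event stream and emits an aggregate over a sliding window each time an event arrives. It is plain Python for illustration only, not the API of any streaming product; `sliding_average` and its parameters are names chosen for this example, and the events are the numbers 1 to 9 from the timeline above.

```python
from collections import deque


def sliding_average(events, window_size, keep=lambda event: True):
    """Continuously emit the average of the last `window_size` events that pass the filter."""
    window = deque(maxlen=window_size)  # the oldest event drops out automatically
    for event in events:
        if not keep(event):
            continue                    # filter
        window.append(event)            # slide the window
        yield sum(window) / len(window) # aggregate


stream = range(1, 10)
print(list(sliding_average(stream, window_size=3)))
print(list(sliding_average(stream, window_size=2, keep=lambda e: e % 2 == 0)))
```

The query never sees the whole data set; it keeps only the current window in memory, which is what lets the same logic run on an unbounded stream.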

Operational Intelligence and Human Interaction
Actions by Operations
Human decisions in real time informed
by up to date information
Automated action based on models of history
combined with live context and business rules
Machine-to-Machine Automation

Alternatives for Streaming Analytics (no complete list!)
[Figure: logos of streaming analytics alternatives, arranged by open source vs. closed source and framework vs. product, including Microsoft Azure Stream Analytics]

What Kind of Streaming Analytics do you need?
Visual IDE (Dev, Test, Debug)
Simulation (Feed Testing, Test Generation)
Live UI (monitoring, proactive interaction)
Maturity (24/7 support, consulting)
Integration (out-of-the-box: ESB, MDM, etc.)
Library (Java, .NET, Python)
Query Language (often similar to SQL)
Scalability (horizontal and vertical, fail over)
Connectivity (technologies, markets, products)
Operators (Filter, Sort, Aggregate)
[Figure: time to market (slow vs. fast) for streaming concepts, streaming frameworks, and streaming products]

Comparison of Stream Processing Frameworks and Products
Slide Deck from JavaOne 2015:
http://www.kai-waehner.de/blog/2015/10/25/comparison-of-stream-processing-frameworks-and-products/
Updated slide deck coming in November 2016 (Big Data Spain, Madrid)

Visual Coding for Streaming Analytics
• Streaming Operators
• Connectivity
• Visual Development
• Testing & Simulation
• Mature Tooling / Support
• Middleware Integration

Live Visual Analytics UI
Dynamic aggregation
Live visualization
Ad-hoc continuous query
Alerts
Action

How to apply analytic models to real time processing without redevelopment?
[Figure: Stream Processing surrounded by model sources: H2O.ai, Open Source R, TERR, Spark ML, MATLAB, SAS, PMML]

TIBCO StreamBase Connector for R and TERR

TIBCO StreamBase Connector for H2O.ai

TIBCO StreamBase Connector for PMML

Real World Streaming Application for Customer Churn

Closed Loop → Automatically Re-Compute (and Improve) the Analytic Model
Compute your performance metric → Spot not good enough performance → Re-compute model
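
The loop above can be sketched as a small function: score the current model on each new batch, and re-compute the model when the metric drops below a threshold. This is a minimal sketch under stated assumptions; the "model" is a deliberately trivial majority-label predictor, and `closed_loop`, `majority_label`, `accuracy` and the 0.7 threshold are hypothetical names and values chosen for illustration.

```python
def majority_label(batch):
    """'Train' a trivial model: always predict the most frequent label."""
    return max(set(batch), key=batch.count)


def accuracy(model, batch):
    """Performance metric: share of labels the model predicts correctly."""
    return sum(1 for label in batch if label == model) / len(batch)


def closed_loop(batches, train, evaluate, min_score):
    """Monitor the model on each new batch and re-compute it when performance is not good enough."""
    model = train(batches[0])
    retrained = 0
    for batch in batches[1:]:
        if evaluate(model, batch) < min_score:  # spot not good enough performance
            model = train(batch)                # re-compute model
            retrained += 1
    return model, retrained


# Hypothetical label batches; the third batch reflects a change in behavior.
batches = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 1]]
model, retrained = closed_loop(batches, majority_label, accuracy, min_score=0.7)
print(model, retrained)
```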

• Reactive – Run to failure
• Preventive – Scheduled service (reliability)
• Condition-based – Monitor condition (sensors)
• Predictive – Predict failures
• Proactive – Deploy automatic actions
Evolution of Equipment Maintenance Strategies

Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
[Figure: assembly line with Station 1 and Station 2. Cost before: 9€; Station 1: 7€; Station 2: 13€; total cost: 29€ (or more). A "Scrap?" decision follows each station.]
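
One way to turn the question above into a rule is to compare expected losses, as in the sketch below. It uses the costs from the slide (9€ before, 7€ at Station 1, 13€ at Station 2) and makes two simplifying assumptions that are not from the talk: the analytic model supplies a probability `p_fail` that the part ends up as scrap anyway, and a part that passes recovers its full cost. Rework is not modeled.

```python
COST_BEFORE = 9.0      # € spent before Station 1
COST_STATION_1 = 7.0   # € added at Station 1
COST_STATION_2 = 13.0  # € added at Station 2


def should_scrap_after_station_1(p_fail):
    """Scrap now if the expected loss of continuing exceeds the cost already sunk."""
    sunk = COST_BEFORE + COST_STATION_1          # 16 € lost if we scrap now
    total = sunk + COST_STATION_2                # 29 € lost if the part fails later
    expected_loss_if_continue = p_fail * total   # a passing part loses nothing (assumption)
    return expected_loss_if_continue > sunk


for p in (0.5, 0.6):
    print(p, should_scrap_after_station_1(p))
```

Under these assumptions the break-even point is a predicted failure probability of 16/29, roughly 55%; a real deployment would tune this with rework costs and the value of a good part.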

Fast Data Architecture for Predictive Maintenance
[Figure: TIBCO Fast Data Platform architecture.
Inputs: CSV (batch), JSON (real time), XML (real time), internal data, Oracle RDBMS.
Historical analysis (data scientists): Flume, HDFS, Hadoop (Cloudera), Avro, Parquet, Spotfire, R / TERR, H2O, PMML.
Streaming analytics (StreamBase): aggregate, rules, analytics, correlate, action.
Operational analytics (operations): Live Datamart, continuous query processing, alerts, Live UI, manual action, escalation.]

TIBCO Spotfire with H2O Integration
Data Discovery / Data Mining (“Are parts that repeat a station more likely scrap parts?”)

TIBCO Live Datamart
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client

TIBCO Live Datamart
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Web API

TIBCO Spotfire + StreamBase + H2O.ai + Live Datamart
Live Demo

TIBCO Accelerator for Apache Spark
The TIBCO Accelerator for Spark is a TIBCO-engineered, lightweight, open-source fast-start for systems to stream data into Spark, discover patterns in Spark with Spotfire, and operationalize the insights on Big Data.
FUNCTIONAL COMPONENTS
1. Fast Data Preparation for IoT
Dozens of enterprise and IoT data preparation adapters: MQTT, databases; inbound creation of HDFS, Parquet, HBase, Avro…
2. Spotfire Model Discovery Template
Use Spotfire to explore the Spark data lake, create a predictive model, train it in H2O, and deploy it to Streaming Analytics.
3. Operationalize Predictive Models
ZooKeeper deployment to StreamBase nodes living in the Spark cluster via H2O, PMML, TERR models
4. Streaming Analytics for Automation
Automate action based on predictive models: make offers to customers, stop fraudulent transactions, alert.
5. Monitor & Retrain Model
Monitor behavior of model, retrain when necessary.
6. Drag & Drop for Business Solution Developers
Code-free development environment for work with H2O, HDFS, Avro, TERR

Evolving Demands from the Business
• AGILITY & SPEED
• REDUCED CYCLE TIMES
• WEB SCALE
• LOWER COST
• FAIL FAST

Development of
Intelligent Microservices

12 Factor Apps for Cloud Native Microservices
• Codebase: One codebase tracked in revision control, many deploys.
• Dependencies: Explicitly declare and isolate dependencies.
• Config: Store config in the environment.
• Backing Services: Treat backing services as attached resources.
• Build, Release, Run: Strictly separate build and run stages.
• Processes: Execute the app as one or more stateless processes.
• Port Binding: Export services via port binding.
• Concurrency: Scale out via the process model.
• Disposability: Maximize robustness with fast startup and graceful shutdown.
• Dev / Prod Parity: Keep dev, staging, and prod as similar as possible.
• Logs: Treat logs as event streams.
• Admin Processes: Run admin/mgmt tasks as one-off processes.
https://12factor.net/
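
For an intelligent microservice, the "Config" factor could look like the sketch below: the location of the analytic model and the scoring threshold come from environment variables, so the same build can run unchanged in dev, staging and prod. The variable names (`MODEL_PATH`, `SCORE_THRESHOLD`, `PORT`) and their defaults are assumptions made for this example.

```python
import os


def load_config(env=os.environ):
    """Read service settings from the environment, with defaults for local development."""
    return {
        "model_path": env.get("MODEL_PATH", "model.pmml"),              # hypothetical name
        "score_threshold": float(env.get("SCORE_THRESHOLD", "0.5")),    # hypothetical name
        "port": int(env.get("PORT", "8080")),
    }


config = load_config()
print(config)
```

Passing `env` as a parameter keeps the function easy to test, for example `load_config({"PORT": "9000"})`, without touching the real process environment.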

Why Containers?
http://www.slideshare.net/andersjanmyr/docker-the-future-of-devops
Containers enable:
• Lightweight deployment
• Automation
• Better resource utilization
• Scaling up and down quickly
• Platform agnostic deployment
• Innovation and Fail Fast Concepts
• Standardization?
  • The Open Container Initiative (OCI)
  • Docker Fork Discussions (!!!)

DevOps Elements – Culture and Technology!
Process
Tools
Automation
Culture
Continuous Integration/
Continuous Development
APIs
Microservices
Frequent releases
Collaboration

Develop fast. Fail fast. Change fast.
Visual Analytics + Visual Coding + DevOps
= Agile Intelligent Microservices

Application of Analytic Models
to other Microservices

Real Time Streaming Analytics
time
1 2 3 4 5 6 7 8 9
Event Streams
Apply your intelligent (micro)service to any event.
Microservice event. Application event. Legacy event. IoT event. You name it.

Questions? Please contact me!
Kai Wähner
Technology Evangelist at TIBCO
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn
