Copyright © 2012, SAS Institute Inc. All rights reserved.
MACHINE LEARNING WITH SAS WORKSHOP
GETTING THE MOST OUT OF YOUR DATA
Longhow Lam
AGENDA AND SOME READING MATERIAL
• Intro & positioning of machine learning
• SAS platform for machine learning
• Overview of specific methods
• Some examples
Further reading
An experimental comparison of classification techniques for imbalanced credit scoring data sets using SAS® Enterprise Miner
http://support.sas.com/resources/papers/proceedings12/129-2012.pdf
Benchmarking state-of-the-art classification algorithms for credit scoring: A ten-year update
http://www.business-school.ed.ac.uk/waf/crc_archive/2013/42.pdf
Highly recommended for more detail:
The Elements of Statistical Learning, Hastie, Tibshirani & Friedman
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
LONGHOW LAM SHORT BIO
• MSc Mathematics (1995), Vrije Universiteit Amsterdam (Dutch "drs." degree in mathematics)
• MTD Applied Statistics (1997), Technical University Delft (two-year postgraduate traineeship in applied statistics)
• 10+ years SAS experience (Base / Stat / Guide / Miner / VA / VS)
• 10+ years R experience (An introduction to R)
• 10+ years predictive modeling experience
  • ABN AMRO – Risk modeler
    Basel, credit risk, ALM models
  • Business & Decision – Quantitative consultant
    ING Belgium, Fortis
    Leaseplan, Belgium Post
  • Experian – Data miner
    Collection Score, Delphi credit score, consulting
Follow me: @longhowlam
INTRO MACHINE LEARNING
Wikipedia:
“Machine learning is a scientific discipline that deals with the construction
and study of algorithms that can learn from data. Such algorithms operate by
building a model based on inputs and using that to make predictions or
decisions, rather than following only explicitly programmed instructions.”
MACHINE LEARNING AND SOME OTHER TERMS YOU OFTEN HEAR
Statistical modeling, supervised learning, unsupervised learning, clustering, data mining, machine learning, dimension reduction, association rules, recommender, autoencoders, self-organizing maps
SAS SOFTWARE
FOR MACHINE LEARNING (AND DATA MINING)
THE ANALYTICS LIFECYCLE
Steps: identify / formulate problem → data preparation → data exploration → transform & select → build model → validate model → deploy model → evaluate / monitor results
Roles: business manager, business analyst, data miner / data scientist, IT systems / management
SAS tools: SAS Enterprise Guide, Enterprise Miner / Text Miner, SAS IMSTAT / Recommender, SAS Visual Analytics, SAS Visual Statistics, SAS Model Manager, SAS Decision Manager, SAS In-Database Scoring
EASY TO USE GUI FOR MACHINE LEARNING COMBINED WITH CODE LIBRARIES
PROC hpbnet data = creditdata
     structure = markovblanket;
  model default = x1 LTV income age;
  selection = Y;
RUN;
HIGH PERFORMANCE MACHINE LEARNING
Machine learning algorithms designed to run on single-blade or multi-blade distributed-memory environments
MACHINE LEARNING WITH SAS: EASILY DEPLOYABLE
• Manage rules + data + models
• Deployment flexibility: batch, real time, stored process, in database
• Drive reuse and consistency
PREDICT SOMEONE'S INCOME
Predict someone's income from his/her age
• Collect some data
• Plot the data
• Analytical base table
Income = 15.2 + 1.102 × Age
[Figure: Income plotted against Age]
IS THIS MACHINE LEARNING?
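A line such as Income = 15.2 + 1.102 × Age can be obtained by ordinary least squares. The following is a minimal Python sketch, not SAS code from the workshop. The function name `fit_line` and the five data points are hypothetical; the points were generated from the slide's own line so that the fit can be checked.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


# Hypothetical analytical base table (assumption): ages with incomes that lie
# exactly on the slide's line, so the fit should recover 15.2 and 1.102.
ages = [20, 30, 40, 50, 60]
incomes = [15.2 + 1.102 * age for age in ages]

a, b = fit_line(ages, incomes)
print(f"Income = {a:.1f} + {b:.3f} x Age")
```

With real, noisy data the recovered coefficients would of course only approximate the underlying relation.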
MACHINE LEARNING ADDRESSING SOME MODELING ISSUES
• The problem may not be linear: X^2, X^3, log(X), sqrt(X), 1/X, …?
• You do not have just one input variable: X1, X2, X3, …, X567
• Interactions and correlations between input variables
[Figure: income against age for males and females; analytical base table with derived inputs]
MACHINE LEARNING WHY IT CAN MATTER € € €
Suppose we have an untargeted direct mailing of 100,000 'letters' to randomly sampled prospects:
• Conversion rate is around 1%, profit per conversion is €80, cost per mailing is €0.70
• Total profit = 100,000 × 1% × €80 − 100,000 × €0.70 = €10,000
Now suppose we have a targeted mailing driven by a machine learning predictive model that uses prospect input data to distinguish between high and low responders.
MACHINE LEARNING WHY IT CAN MATTER € € €
Decile        N   Conversion   Profit   Cumulative
1        10,000        2.00%    9,000        9,000
2        10,000        1.50%    5,000       14,000
3        10,000        1.00%    1,000       15,000
4        10,000        1.00%    1,000       16,000
5        10,000        1.00%    1,000       17,000
6        10,000        1.00%    1,000       18,000
7        10,000        1.00%    1,000       19,000
8        10,000        0.80%     -600       18,400
9        10,000        0.50%   -3,000       15,400
10       10,000        0.20%   -5,400       10,000
The profit from using a model to send letters only to the first 7 deciles is now €19,000 (instead of €10,000).
If you have 100 such campaigns a year, that means an increase of €0.9 mln!
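The decile table can be recomputed from the stated assumptions (10,000 letters per decile, €80 profit per conversion, €0.70 cost per letter). A small Python sketch; the function and variable names are illustrative only:

```python
PROFIT_PER_CONVERSION = 80.0   # euro
COST_PER_LETTER = 0.70         # euro
LETTERS_PER_DECILE = 10_000

# Conversion rates per decile, best-scored prospects first (from the table).
conversion_rates = [0.02, 0.015, 0.01, 0.01, 0.01, 0.01, 0.01, 0.008, 0.005, 0.002]


def decile_profits(rates):
    """Profit per decile: revenue from conversions minus mailing cost."""
    return [
        round(LETTERS_PER_DECILE * (rate * PROFIT_PER_CONVERSION - COST_PER_LETTER))
        for rate in rates
    ]


def cumulative(values):
    """Running total of a list of numbers."""
    total, out = 0, []
    for value in values:
        total += value
        out.append(total)
    return out


profits = decile_profits(conversion_rates)
cumulative_profit = cumulative(profits)

# Mail up to the decile where the cumulative profit peaks.
best_depth = max(range(len(cumulative_profit)), key=lambda i: cumulative_profit[i]) + 1
print(best_depth, cumulative_profit[best_depth - 1])
```

The same calculation applies to any other set of decile conversion rates: mailing stops being worthwhile as soon as a decile's conversion rate drops below cost / profit per conversion = 0.70 / 80 ≈ 0.875%.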
MACHINE LEARNING WHY IT CAN MATTER € € €
Decile        N   Conversion   Profit   Cumulative
1        10,000        3.00%   17,000       17,000
2        10,000        2.00%    9,000       26,000
3        10,000        1.40%    4,200       30,200
4        10,000        1.15%    2,200       32,400
5        10,000        1.00%    1,000       33,400
6        10,000        0.60%   -2,200       31,200
7        10,000        0.40%   -3,800       27,400
8        10,000        0.30%   -4,600       22,800
9        10,000        0.10%   -6,200       16,600
10       10,000        0.05%   -6,600       10,000
The profit from using a much better model to send letters only to the first 5 deciles is now €33,400 (instead of €10,000).
If you have 100 such campaigns a year, that means an increase of €2.34 mln!
MACHINE LEARNING WHY IT CAN MATTER? € € €
Decile        N   Conversion   Profit   Cumulative
1        10,000        3.35%   19,800       19,800
2        10,000        2.23%   10,840       30,640
3        10,000        1.30%    3,400       34,040
4        10,000        1.10%    1,800       35,840
5        10,000        1.00%    1,000       36,840
6        10,000        0.55%   -2,600       34,240
7        10,000        0.28%   -4,760       29,480
8        10,000        0.25%   -5,000       24,480
9        10,000        0.05%   -6,600       17,880
10       10,000        0.02%   -6,840       11,040
Now let's suppose we have a slightly better model than the last one: mailing the first 5 deciles now gives a profit of €36,840.
If you have 100 such campaigns a year, that means an increase of €2.68 mln!
OVERVIEW OF SPECIFIC MACHINE LEARNING METHODS
• Classical regression
• Decision trees
• Dimension reduction
• Bagging & boosting
• Support vector machines
• K-nearest neighbour
• Neural networks / deep learning
• Bayesian networks
• Text mining
• Recommendation engine
“CLASSICAL” REGRESSION
LINEAR & LOGISTIC REGRESSION
Numeric target variable: Income = a + b × Age
Binary target variable: $P(\text{Churn}) = \frac{1}{1 + \exp(-(a + b \times \text{Age}))}$
[Figures: Income against Age; P(Churn) against Age on a scale from 0 to 1]
SPLINE REGRESSION MODELING NON LINEARITIES
Often there is a nonlinear relation
• Transformation of inputs: X^2, X^3, log(X) etc.
• Buckets / binning of variables
• Smoothing splines
[Figure: Y / logit(y) against X with a smoothing spline]
SPLINE REGRESSION MODELING NON LINEARITIES
Smoothing splines: piecewise polynomials that are glued together at knots
Two special cases for the smoothing parameter λ:
λ = 0: any function that interpolates the data
λ = ∞: simple least-squares line fit
Choose λ by cross-validation
OPEL ASTRA CAR EXAMPLE: SPLINE REGRESSION
Data extracted from a car sales site. For many cars we have the kilometres driven and the car price. For the Opel Astra we have 2360 cars:
• What is the relation between km driven and car sales price?
[Figures: spline fits with too much smoothing and with too little smoothing]
0.2 is the optimal smoothing parameter
Some other car makes/models with spline estimates of car depreciation versus kilometres driven.
Hmmm.. my Renault Clio looks nice, but after 50,000 km I only have 46% of the original value left… :-(
SPLINE REGRESSION MODELING NON LINEARITIES
In SAS we have the TPSPLINE, LOESS and ADAPTIVEREG procedures to fit multivariate regression splines
Supports:
• More than one input
• Linear, logistic, Poisson, GLM regressions
• Combines both regression splines and model selection methods
• Supports partitioning of data into training, validation, and testing roles
DECISION TREES
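DECISION TREES
How does it work? A simple example
Suppose we have the following group of people
• 50% Response
• 50% No Response
We know Age and Marital Status
[Tree diagram: root node 50% / 50%; split on Age (≤ 45, > 45) with node percentages 30% / 70% and 60% / 40%; split on Marital Status (Married, Divorced / Unmarried) with node percentages 20% / 80% and 60% / 40%]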
DECISION TREES REGRESSION & CLASSIFICATION
Target  X1  X2  X3   X4   X5
Y       12  A   456  1.2  X
N       21  B   456  1.5  X
Y       32  A   545  1.3  U
Y       34  C   443  1.1  U
N       23  A   345  1.7  U
N       13  B   567  1.2  X
N       45  A   654  1.9  X
…       …   …   …    …    …
…       …   …   …    …    …
Y       46  A   657  2.1  X
A recursive splitting algorithm:
1. Loop through all inputs
2. Determine per input how to split
3. Take the best input to split on
4. On the two new data sets apply steps 1, 2 and 3 again
5. Stop somewhere
• How to split X1 or X2?
• When to stop?
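Steps 1 to 3 of the recursive splitting algorithm can be sketched in Python for a numeric target, using the sum of squared errors around the mean of each side as the split criterion. This is a simplified illustration, not how any particular SAS procedure implements it; the toy inputs `x1`, `x2` and all function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_input(xs, ys):
    """Step 2: try every threshold halfway between consecutive distinct values."""
    best_threshold, best_error = None, float("inf")
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


def best_split(inputs, ys):
    """Steps 1 and 3: loop through all inputs and keep the best one."""
    best = (None, None, float("inf"))
    for name, xs in inputs.items():
        threshold, error = best_split_for_input(xs, ys)
        if error < best[2]:
            best = (name, threshold, error)
    return best


# Hypothetical toy data: the target depends only on x1.
inputs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [5, 1, 4, 2, 6, 3],
}
target = [10, 10, 10, 20, 20, 20]

name, threshold, error = best_split(inputs, target)
print(name, threshold, error)
```

Step 4 would call `best_split` again on the rows left and right of the chosen threshold, and step 5 needs a stopping rule such as a minimum number of rows per node.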
DECISION TREES REGRESSION & CLASSIFICATION
How to split?
The number of branches per split is usually 2 or 3.
More splits will exhaust the data too fast
Why is the split X1 < t1 better than X1 < s1?
• Regression: mean squared error
• Classification:
  • Misclassification rate
  • Cross-entropy, chi-squared
Regression tree: mean squared error
[Figure: two scatter plots of Y against x, comparing split s1 with split t1]
Classification tree: misclassification rate
[Figure: split s1 compared with split t1 along x]
DECISION TREES REGRESSION & CLASSIFICATION
When to stop?
• Not too early, not too late!
Pruning
Remove parts of the tree
DECISION TREES SOME COMMON TYPES
CHAID (chi-squared automatic interaction detection)
C4.5 / C5.0
CART (Classification and Regression Trees)
The difference is mainly in the different splitting options
DECISION TREES PROS AND CONS
Pros
• Interactions between variables
• Interpretable rules
• Missing values are easy to incorporate
Cons
• Unstable
• "Lack of smoothness"
• Poor fit of obvious (non)linear relations
[Figures: example tree with splits on male / female, Income < 45K and Age < 33, showing the response rate; tree fit on the Opel Astra data]
DIMENSION REDUCTION
PRINCIPAL COMPONENTS ANALYSIS
Linear transformation of the data into uncorrelated data
The transformation W is such that
• The largest variance is in the first coordinate
• The second largest variance is in the second coordinate
• Etc.
[Figure: scatter plot of data points against the axes X1 and X2]
PRINCIPAL COMPONENTS ANALYSIS
The math behind it
In general: P = X W
With two dimensions:
$$\begin{pmatrix} p_{11} & p_{21} \\ \vdots & \vdots \\ p_{1n} & p_{2n} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{21} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix} \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{pmatrix}$$
$w_{11}$ and $w_{12}$ are the loadings corresponding to the first principal component.
$w_{21}$ and $w_{22}$ are the loadings corresponding to the second principal component.
It turns out that the columns of W are the eigenvectors of the matrix $X^T X$
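For two inputs the eigenvectors of X^T X can be written down in closed form, which makes the relation P = X W easy to check by hand. A minimal Python sketch under the assumption that the data are centered first; the function name `pca_2d` and the five data points are hypothetical.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, loadings, scores) for two-dimensional data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the symmetric 2x2 matrix X^T X (X = centered data).
    a = sum(x * x for x, _ in centered)
    b = sum(x * y for x, y in centered)
    c = sum(y * y for _, y in centered)

    # Closed-form eigenvalues of [[a, b], [b, c]], largest first.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]

    if abs(b) < 1e-12:
        # Inputs are already uncorrelated: the axes are the eigenvectors.
        loadings = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        loadings = []
        for lam in eigenvalues:
            vx, vy = b, lam - a          # solves (a - lam) * vx + b * vy = 0
            norm = math.hypot(vx, vy)
            loadings.append((vx / norm, vy / norm))

    # P = X W: loadings[j] is column j of W, i.e. the loadings of component j.
    scores = [tuple(x * wx + y * wy for wx, wy in loadings) for x, y in centered]
    return eigenvalues, loadings, scores


# Hypothetical, strongly correlated inputs.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
eigenvalues, loadings, scores = pca_2d(points)
print(eigenvalues, loadings)
```

The resulting score columns are uncorrelated, and the sum of squares of the first score column equals the largest eigenvalue. For more than two inputs a numerical eigen or SVD routine would be used instead of the closed form.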
PRINCIPAL COMPONENTS ANALYSIS
Scaling the inputs is important here
Applications of PCA
• Dimension reduction
• Visualisation
• Outlier / anomaly detection
• PCA regression
  • Use the principal components instead of the original inputs
PRINCIPAL COMPONENTS DIMENSION REDUCTION
P = X W, with sizes (10000 by 100) = (10000 by 100)(100 by 100)
Now only take the first L columns of W
PL = X WL, with sizes (10000 by 2) = (10000 by 100)(100 by 2)
For example, for visualization only use the first 2 or 3 columns, so that PL has only 2 or 3 columns that can be visualized in scatter or contour plots
SINGULAR VALUE DECOMPOSITION
Matrix SVD decomposition: $A = U \Sigma V^T$, where Σ is diagonal with r singular values [r could be a large number]
Take only k << r singular values: $A_k = U_k \Sigma_k V_k^T$
A data point d can now be represented by a k-dimensional point
SVD EXAMPLE USING MY SON AS AN EXPERIMENT
[Figures: the original picture and two reconstructions]
Original: 2448 × 3264 ≈ 8 mln numbers
SVD: 15 largest SVs, 1% of the data
SVD: 75 largest SVs, 5% of the data
VARIABLE CLUSTERING TO REDUCE THE DIMENSION
Variable selection
• I have 500 inputs, but maybe there are only ten clusters of inputs
• Within one cluster the variables are (strongly) correlated
• Then use only one input per cluster for predictive modeling
X1, X2, X3, …, X500
X1, X21, X35, X430, … → X35
X17, X29, X353, X490, … → X29
X37, X95, X251, X393, … → X251
BAGGING & BOOSTING
COMBINE MODELS BAGGING & BOOSTING
If one model is not good enough: let multiple models vote for a prediction
Bootstrap aggregation (bagging)
This only makes sense if the underlying models are different enough and have some predictive power
[Diagram: random samples are drawn from the data, a model is fit on each sample, and the models are combined into the final model]
Bagging & Boosting: Random Forests
Random forests ≈ bagging with trees
Apply the steps below repeatedly
1. Generate a bootstrap sample
2. Randomly choose m inputs, with m << P (the total number of inputs)
3. Fit a tree on the bootstrap sample with the m inputs (do not prune)
In case of a classification tree:
• The random forest prediction is the majority vote of all trees
In case of a regression tree:
• The random forest prediction is the average of all trees
FOREST VS TREE EXAMPLE ON SIMULATED DATA
Decision tree and random forest (100 sub-trees) fitted on the simulated data
It is clear to see that the forest can produce much smoother predictions.
GRADIENT BOOSTING DON’T LET THE FORMULAS INTIMIDATE YOU
GRADIENT BOOSTING SCHEMATIC OVERVIEW
Gradient boosting, M iterations m = 1, 2, …, M
At each successive iteration a base learner h_m (which is a decision tree) is fit on the pseudo-residuals, using inputs x, to "correct" the previous learner.
Pseudo-residuals r_im at each step: r_1, r_2, …, r_M
Final model: F_M
[Diagram: at each iteration 1, …, M the inputs x and the pseudo-residuals feed the next base learner]
F_m = F_{m-1} + γ·h_m
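The update F_m = F_{m-1} + γ·h_m can be sketched in Python for squared-error loss, where the pseudo-residuals are simply the actual values minus the current predictions. The base learner here is a one-split tree ("stump") on a single numeric input, which is an assumption made to keep the example short; the names `fit_stump`, `gradient_boost` and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Base learner h_m: a one-split regression tree on one numeric input."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_iter=200, gamma=0.1):
    """F_0 = mean(y); then F_m = F_{m-1} + gamma * h_m for m = 1..n_iter."""
    f0 = sum(ys) / len(ys)
    learners = []
    predictions = [f0] * len(xs)
    for _ in range(n_iter):
        # Pseudo-residuals for squared-error loss: actual minus prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        predictions = [p + gamma * h(x) for p, x in zip(predictions, xs)]
    return lambda x: f0 + gamma * sum(h(x) for h in learners)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical nonlinear toy data: y = x^2.
xs = [float(x) for x in range(10)]
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted = gradient_boost(xs, ys)
boosted_mse = mse(boosted, xs, ys)
print(baseline_mse, boosted_mse)
```

Each stump on its own is a weak model, but the sum of many small corrections follows the curve far better than the starting constant F_0.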
SUPPORT VECTOR MACHINES
Support vector machines (SVM)
• Suppose we have a separable classification problem.
• Find a linear decision boundary between the two groups with maximum margin M. So the green line would be better than the blue line.
• If the groups are not separable, you have to allow some points to be on the wrong side. These points are penalized. SVM still maximizes the margin M, but with the constraint that the total penalty is smaller than C.
• The boundary might not be linear in the inputs. We could apply nonlinear mappings to the inputs, e.g. x^2, x^3, or spline(x).
• The beauty of SVM is that in the calculation of the decision boundary we do not need to use these transformations explicitly → "the kernel trick"
SVM UNDERLYING MATHEMATICAL OPTIMIZATION PROBLEMS
Separable classification
Non Separable classification
Non Separable classification rewritten using
Lagrange Dual problem
Kernels to model nonlinear behaviour
https://www.youtube.com/watch?v=3liCbRZPrZA
Not linearly separable in the original space, but in 3D space they are!
K – NEAREST NEIGHBOUR
K-NN METHOD
• No model is fitted. Given a query point x0, find the k points x1, x2, ..., xk that are closest in distance to x0.
• Classify x0 using the majority vote among the k neighbours
Example: the 5 nearest neighbours of x0
• 3 of them are red
• 2 of them are green
• so we predict x0 to be red
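The majority vote can be written in a few lines of Python. The coordinates below are hypothetical and chosen so that, as in the example, the 5 nearest neighbours of x0 are 3 red and 2 green points; Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k):
    """Majority vote among the k points closest (Euclidean) to the query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training points (assumption, not the data behind the slide).
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0),      # red, close to x0
          (0.0, -1.5), (1.5, 1.5),                  # green, fairly close
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]       # green, far away
labels = ["red", "red", "red", "green", "green", "green", "green", "green"]

x0 = (0.0, 0.0)
print(knn_classify(x0, points, labels, k=5))   # 3 red against 2 green
print(knn_classify(x0, points, labels, k=8))   # all points: 3 red against 5 green
```

The second call shows why the choice of k matters: with k equal to the full data set the prediction is just the overall majority class.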
K-NN METHOD
[Figures: classification with 1 nearest neighbour and with 15 nearest neighbours]
[Figure: test and training errors for different numbers k of nearest neighbours]
Despite its simplicity, k-nearest-neighbours has been successfully used in problems like
• Handwritten digits
• Satellite image scenes
• EKG patterns
K-NN EXAMPLE DUTCH HOUSE PRICES
Extract house-for-sale prices from a Dutch housing site
• For 108K Dutch postal codes (out of 463K) there are one or more houses for sale.
• How can we estimate the house value for the postal codes without a house price?
For a postal code with no price, estimate the price by taking the k closest house-for-sale prices.
Comparing different nearest neighbours in SAS Enterprise Miner
• 30% of the data was used as validation set
• In Enterprise Miner different values for k were used
• k = 5 nearest neighbours has the lowest average squared error
NEURAL NETWORKS
DEEP LEARNING
NEURAL NETWORK LINEAR REGRESSION
Y = f(X, w) = w1 + w2 X2 + w3 X3 + w4 X4
[Diagram: a neural network compute node f with inputs 1, X2, X3, X4 and weights w1, w2, w3, w4]
f is the so-called activation function. This could be the logistic function, but other choices are possible
There are four weights w that have to be determined
NEURAL NETWORKS MATHEMATICAL FORMULATION
[Diagram: X inputs X1 to X4 (Age, Income, Region, Gender), a hidden layer Z1 to Z3, and outputs Y and N; weights α between inputs and hidden layer, weights β between hidden layer and outputs]
The prediction formula for a neural network is given by
$$P(Y \mid X) = g(T_Y), \qquad T_Y = \beta_{0Y} + \beta_Y^T Z, \qquad Z_m = \sigma(\alpha_{0m} + \alpha_m^T X)$$
The functions g and σ are defined as
$$g(T_Y) = \frac{e^{T_Y}}{e^{T_N} + e^{T_Y}}, \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$
In case of a binary classifier, $P(N \mid X) = 1 - P(Y \mid X)$
The model weights α and β have to be estimated from the data
NEURAL NETWORKS ESTIMATING THE WEIGHTS
Back propagation algorithm
• Randomly choose small values for all w_i
• For each data point (observation)
  1. Calculate the neural net prediction
  2. Calculate the error E (for example: E = (actual − prediction)^2)
  3. Adjust the weights w according to: $w_i^{new} = w_i + \Delta w_i$, with $\Delta w_i = -\alpha \frac{\partial E}{\partial w_i}$
  4. Stop if the error E is small enough.
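For a single linear node the update rule Δw_i = −α ∂E/∂w_i can be written out explicitly. The Python sketch below is an illustration under simplifying assumptions: one input, no hidden layer (so no actual back propagation through layers is needed), fixed instead of random starting weights for reproducibility, and a hypothetical data set lying exactly on y = 1 + 2x.

```python
def train_linear_neuron(data, alpha=0.05, max_epochs=1000, tolerance=1e-18):
    """Per-observation gradient descent for prediction = w[0] + w[1] * x."""
    w = [0.01, -0.01]   # small starting values (fixed here; the slide uses random ones)
    for _ in range(max_epochs):
        total_error = 0.0
        for x, actual in data:
            prediction = w[0] + w[1] * x
            diff = actual - prediction
            total_error += diff ** 2          # E = (actual - prediction)^2
            # dE/dw0 = -2 * diff and dE/dw1 = -2 * diff * x, so
            # w_new = w - alpha * dE/dw becomes:
            w[0] += alpha * 2 * diff
            w[1] += alpha * 2 * diff * x
        if total_error < tolerance:           # stop if the error is small enough
            break
    return w


# Hypothetical observations on the line y = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weights = train_linear_neuron(data)
print(weights)
```

In a network with hidden layers the same update is applied to every weight, and back propagation is the bookkeeping that computes each ∂E/∂w_i via the chain rule. The learning rate α must be small enough, otherwise the updates overshoot and diverge.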
DEEP LEARNING NEURAL NETWORK WITH MORE THAN 2 HIDDEN LAYERS
NEURAL NETS AUTOENCODERS
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
Neural networks that use inputs to predict the inputs
[Diagram: inputs X1 to X4 are encoded into a 2-dimensional middle layer and decoded back to X1 to X4]
Linear activation function → corresponds to 2-dimensional principal components analysis
The 2-dimensional middle layer can be used for visualisation
NEURAL NETS AUTOENCODERS
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
Often more hidden layers with many nodes
ENCODE DECODE
INPUT OUTPUT = INPUT
NEURAL NET CARS EXAMPLE
2 dimensional PCA
Autoencoder network
25 – 15 – 2 – 15 – 25
NEURAL NETS AUTOENCODER EXAMPLE
• 1000 images of digits
• Each image has 400 pixels
• So a 400 dimensional input vector X = (x1,…,x400)
• Compare two-dimensional PCA with a neural net autoencoder
NEURAL NETS AUTOENCODER EXAMPLE
proc neural
data= autoencoderTraining
dmdbcat= work.autoencoderTrainingCat;
performance compile details cpucount= 12 threads= yes;
/* DEFAULTS: ACT= TANH COMBINE= LINEAR */
/* IDS ARE USED AS LAYER INDICATORS – SEE FIGURE 6 */
/* INPUTS AND TARGETS SHOULD BE STANDARDIZED */
archi MLP hidden= 5;
hidden 300 / id= h1;
hidden 100 / id= h2;
hidden 2 / id= h3 act= linear;
hidden 100 / id= h4;
hidden 300 / id= h5;
input corruptedPixel1 - corruptedPixel400 / id= i level= int std= std;
target pixel1-pixel400 / act= identity id= t level= int std= std;
/* BEFORE PRELIMINARY TRAINING WEIGHTS WILL BE RANDOM */
initial random= 123;
prelim 10 preiter= 10;
run;
Two-dimensional representation of 400-dimensional 'digit' data
BAYESIAN NETWORKS
BAYESIAN NETWORKS -- ACYCLIC GRAPHICAL MODELS
• Nodes represent random variables
• Links between nodes represent conditional dependencies
• Conditional probability tables are derived from training data for each node
• Random variables are typically binary or discrete
• The graph structure can be learned from the data
TEXT MINING
TEXT MINING BASICS
"Advanced" word counting
• Parse & filter
• Part of speech
• Entity detection
• Mixed / numeric / abbreviations
• Stemming
• Spell checks, stop list, synonym list
• Multi-term words
• Apply traditional data mining
  • Clustering
  • Prediction / machine learning
TEXT MINING BASICS
Document 1: "Ik loop over straat in Amsterdam, 1057DK, met mijn fiets" (I walk down the street in Amsterdam, 1057DK, with my bike)
Document 2: "Zij liep niet maar fietste met haar blauwe fieets, //bitly.com/sdrtw" (She did not walk but cycled with her blue bike [misspelled], //bitly.com/sdrtw)
Document 3: "Mijn tweewieler is kapot, wat een slecht stuk ijzer, @#$%$@!" (My two-wheeler is broken, what a bad piece of iron, @#$%$@!)
TERM DOCUMENT MATRIX: A
Terms                           Doc 1  Doc 2  Doc 3
+Fiets (noun: bike)               1      1      1
Fietsen (verb: to cycle)          0      1      0
Blauwe (adjective: blue)          0      1      0
Amsterdam (location)              1      0      0
+Lopen (verb: to walk)            1      1      0
Straat (noun: street)             1      0      0
Kapot (adverb: broken)            0      0      1
Slecht (bad)                      0      0      1
Stuk ijzer (piece of iron)        0      0      1
1057DK (postal code)              1      0      0
//bitly.com/sdrtw (Internet)      0      1      0
• Each text document is a (very) long vector of word counts (often with many zeros!)
• Apply further mining on this matrix A.
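A bare-bones term document matrix is plain word counting. The Python sketch below deliberately leaves out everything that makes the word counting "advanced" (stemming, spell checks, synonyms, entity detection), so unlike the matrix above it would not map a misspelling or a synonym to the same term. The three English documents and the function name are hypothetical.

```python
import re
from collections import Counter


def term_document_matrix(documents):
    """Map each term to its list of counts, one count per document."""
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in documents]
    terms = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in terms}


# Hypothetical documents (assumption), loosely following the example above.
documents = [
    "I walk down the street with my bike",
    "She did not walk but rode her blue bike",
    "My bike is broken",
]

matrix = term_document_matrix(documents)
for term, row in matrix.items():
    print(f"{term:10s} {row}")
```

Even for three short sentences most entries are zero, which is the sparsity that the bullets above refer to.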
TEXT MINING TERM DOCUMENT MATRIX A
It is not useful to apply data mining techniques directly on the term document
matrix
• Often more terms than documents
• Rows could be strongly correlated
• Matrix is often very sparse
Apply Singular value decomposition first.
TEXT MINING SVD ON THE TERM DOCUMENT MATRIX A
Matrix SVD decomposition: $A = U \Sigma V^T$, where Σ is diagonal with r singular values [r could be many thousands]
Take only the first k << r singular values: $A_k = U_k \Sigma_k V_k^T$
A document d is then no longer a long vector of m word counts but a much shorter vector $\hat{d}$, say of length 300.
TEXT MINING APPLICATIONS
Combine customer structured data and unstructured data to better predict behaviour (churn / fraud)
Apply machine learning to create
a model f to predict the target
Automatically generate topics within large document collections
Apply clustering techniques to classify
documents into clusters (topics)
Topic 1 Topic 2 Topic 3
RECOMMENDATION ENGINE
Which product should I recommend to my customers?
RECOMMENDATION ENGINE USER – ITEM MATRIX EXPLICIT RECOMMENDATIONS
• Users rated items (products) explicitly
• Matrix is often very sparse
• 1 mln users, 100K items → ~0.01%??
User - Item Matrix – Data
          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      3       2       5       4       5
User 2      -       -       -       1       1
User 3      1       -       2       5       -
User 4      -       -       1       2       5
User 5      2       1       4       2       3
User 6      2       3       -       5       1
User 7      5       1       -       3       4
User 8      -       1       -       4       1
User 9      2       3       2       4       2
User 10     -       1       3       -       1
User 4's item ratings
User 4      -       -       1       2       5
After some math… recommendations are:
User 4      3.21    4.82    1       2       5
Recommend item 2!
RECOMMENDATION ENGINE ALGORITHMS IN PROC RECOMMEND
Memory-based algorithms
• Slope one (slope1)
• K nearest neighbors (knn)
Model-based algorithms
• Matrix factorization (SVD - LBFGS)
Market basket analysis
• Association rules mining (arm)
Mixture of different methods
• Clustering (cluster)
• Ensemble
RE METHODS SLOPE ONE
• Y = x + b, with slope equal to 1
• See notes
• Item-item based
$$r_{ui} = \frac{\sum_j w_{ij} \, r_{uj}}{\sum_j w_{ij}}$$
• Weight $w_{ij}$: the number of users having rated both items i and j
• Rating $r_{uj}$: the average rating computed from item j
Sample rating database
Customer  Item A  Item B  Item C
John        5       3       2
Mark        3       4       ??
Lucy        ??      2       5
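The two missing ratings in the sample database can be filled in with a weighted slope one scheme. The Python sketch below is one common reading of the formula and an assumption about the details: for each item j the user has rated, the contribution is the user's rating of j plus the average difference between item i and item j over all users who rated both, weighted by the number of such users. It is not a description of how PROC RECOMMEND implements slope one.

```python
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}


def slope_one_predict(ratings, user, item):
    """Weighted slope one prediction of `item` for `user` (None if impossible)."""
    numerator, denominator = 0.0, 0
    for other_item, user_rating in ratings[user].items():
        if other_item == item:
            continue
        # Rating differences (item minus other_item) over users who rated both.
        diffs = [r[item] - r[other_item]
                 for r in ratings.values()
                 if item in r and other_item in r]
        if not diffs:
            continue
        weight = len(diffs)                      # w_ij
        average_diff = sum(diffs) / weight       # the "b" in Y = x + b
        numerator += weight * (user_rating + average_diff)
        denominator += weight
    return numerator / denominator if denominator else None


mark_c = slope_one_predict(ratings, "Mark", "C")
lucy_a = slope_one_predict(ratings, "Lucy", "A")
print(round(mark_c, 2), round(lucy_a, 2))
```

For Mark and item C, item A contributes 3 + (−3) = 0 with weight 1 and item B contributes 4 + 1 = 5 with weight 2, giving 10 / 3.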
RE METHODS K NEAREST NEIGHBORS
The rating $r_{ui}$ is determined by the ratings "in the neighborhood"
$$r_{ui} = \frac{\sum_{j \in N(i;u)} \mathrm{sim}_{ij} \, r_{uj}}{\sum_{j \in N(i;u)} \mathrm{sim}_{ij}}$$
with neighbors $N(i;u)$ and similarity $\mathrm{sim}_{ij}$
How to determine the neighbors and how many (k) to use?
How to compute the similarity/distance measure $\mathrm{sim}_{ij}$?
• Pearson's correlation coefficient
• Cosine distance
• Other adjustments
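RE METHODS PEARSON CORRELATION
$a, b$: users
$r_{a,p}$: rating of user $a$ for item $p$
$\bar{r}_a$: mean rating of user $a$
$P$: set of items rated by both $a$ and $b$
• Possible similarity values lie between −1 and 1
$$\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$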
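The Pearson similarity between two users can be computed directly from the definition. In this Python sketch the mean ratings are taken over the co-rated items P only, which is one of several conventions and an assumption here; the three small rating dictionaries are hypothetical.

```python
import math


def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation between two users over the items both have rated."""
    common = [p for p in ratings_a if p in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    sum_sq_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    sum_sq_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    denominator = math.sqrt(sum_sq_a * sum_sq_b)
    # A user who gives every co-rated item the same rating has no variance.
    return numerator / denominator if denominator else 0.0


# Hypothetical users: b rates like a (scaled), c rates opposite to a.
user_a = {"item1": 1, "item2": 2, "item3": 3}
user_b = {"item1": 2, "item2": 4, "item3": 6, "item4": 5}
user_c = {"item1": 3, "item2": 2, "item3": 1}

print(pearson_sim(user_a, user_b), pearson_sim(user_a, user_c))
```

Because the ratings are centered per user, a user who rates everything one point higher than another still gets similarity 1 with that user.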
RE METHODS K NEAREST NEIGHBORS METHOD
RE METHODS MATRIX FACTORIZATION
How do we fill in the missing data?
R = U V, with R of size m × n (users × items), U of size m × k and V of size k × n
• Select loss function (squared error)
• Select the number of hidden factors k
• Optimization problem
  • L-BFGS
  • ALS
Predict new rating: $R_{ij} = U_i^T V_j$
Minimize prediction error:
$$\min_{U,V} \sum_{i,j} \left[ (R_{ij} - U_i^T V_j)^2 + \lambda \left( \lVert U_i \rVert^2 + \lVert V_j \rVert^2 \right) \right]$$
RE METHODS CLUSTER
[Diagram: user/item profile and user/item rating → clustering → kNN within one subgroup → predictions]
RE METHOD ASSOCIATION RULE MINING (MARKET BASKET ANALYSIS)
Basic steps for association rule mining
• Identify frequent itemsets (rules) in the transaction data:
• IF item A and B THEN item C
• IF item X THEN item Y
• Not all rules are interesting; use 'support' and 'lift' to judge the importance of a rule
Support(X,Y) = (# trxs. containing both X and Y) / (total # trxs.)
Lift = Support(X,Y) / (Support(X) * Support(Y))
Support & Lift: Diapers → Beer 0.8%; Diapers → Candles 0.018%
For example, a lift of 2.5 means: people who have X are 2.5 times as likely to buy Y as customers in general
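Support and lift follow directly from counting transactions. A minimal Python sketch; the eight baskets are hypothetical and only serve to make the arithmetic visible.

```python
def support(transactions, items):
    """Fraction of transactions that contain all the given items."""
    wanted = set(items)
    return sum(1 for basket in transactions if wanted <= set(basket)) / len(transactions)


def lift(transactions, x, y):
    """Support(X,Y) / (Support(X) * Support(Y))."""
    return support(transactions, [x, y]) / (
        support(transactions, [x]) * support(transactions, [y])
    )


# Hypothetical transaction data (assumption).
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "candles"},
    {"diapers"},
    {"beer"},
    {"candles"},
    {"bread"},
    {"bread", "candles"},
    {"diapers", "beer"},
]

print(support(transactions, ["diapers", "beer"]))   # 3 of 8 baskets
print(lift(transactions, "diapers", "beer"))        # 0.375 / (0.5 * 0.5)
```

A lift of 1 means X and Y occur together exactly as often as expected if they were independent; values above 1 indicate a positive association.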
RE METHODS ENSEMBLE
• Linear combination of previous methods
• Achieve better performance
PROC RECOMMEND recom = rs.IENS;
* Add a recommendation system;
ADD rs.IENS /item = item user = user rating = rating;
* Add tables;
ADDTABLE LHL1209.IENS_UIR / recom = rs.IENS type = rating vars=(item user rating);
* Method SVD LBFGS with 20 factors ;
METHOD svd /
factors = 20
label = "svd" fconv = 1e-3
gconv = 1e-3 maxiter = 100
MAXFEVAL = 5000 function = L2
lamda = 0.2
technique = lbfgs;
RUN;
METHOD ARM /
label = "ARM" ;
RUN;
/* information on the recommender system */
INFO;
QUIT;
/** prediction with the SVD method ***/
PROC RECOMMEND recom = rs.IENS;
PREDICT /
method = svd
label = "svd"
Num = 3
users = ("Longhow Lam");
run;
QUIT;
LAST SLIDE :-)
PROS AND CONS OF MORE MODERN MACHINE LEARNING
(compared to traditional linear / logistic regression)
CONS
• Unfamiliar to a broader audience, (more) difficult to explain
• Black box approach (you are rejected: "The computer says NO")
• Often relations can already be modeled with classical regression models
• It allows you to not think about the business problem
PROS
• Often less data prep (manual tuning) necessary (just throw it in the algorithm…)
• Interactions often "automatically" taken into account
• Superior for text mining, image & speech recognition
• Better lift possible (a few percent "for free")
• It allows you to not think about the business problem
WHY SAS FOR MACHINE LEARNING
• Many different techniques
• Easy-to-use GUIs combined with flexible coding
• High-performance scalability
• Easily deployable
SOME MACHINE LEARNING EXAMPLES
• Text mining
• Image recognition
• Sound recognition
• Strange faces
So can a machine read, see and hear?
PREDICTING SENTIMENT FROM RESTAURANT REVIEWS
IENS REVIEWS COLLECTED AROUND 16,000 REVIEWS AND THEIR SCORES
• Used Text Miner to parse and filter the reviews,
• and to transform the reviews into data points in SVD space.
USE MACHINE LEARNING TO PREDICT TARGET WITH THE 300 INPUTS
Predicted review score vs. given review score
R² linear regression = 0.5
R² neural net = 0.6
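For reference, R² compares the squared prediction errors with the variance of the given scores. The Python sketch below shows that calculation on made-up review scores; the numbers are hypothetical and unrelated to the 0.5 and 0.6 reported above.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical given and predicted review scores.
given = [6, 7, 8, 9]
predicted = [6.5, 6.5, 8.5, 8.5]

score = r_squared(given, predicted)
```

Here the residual sum of squares is 1 and the total sum of squares is 5, so R² = 0.8. A model that always predicts the mean score gets 0, and a perfect model gets 1.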
IENS REVIEWS APPLY MODEL ON ‘NEW REVIEWS’
MNIST DATA IN SAS
MODIFIED NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
MNIST TRAINING DATA
• 42,000 pictures of handwritten digits
• Each digit is a picture of 28 by 28 pixels
• So a 784-dimensional vector
First 100 digits of the MNIST data and their KNOWN labels in red
MNIST DATA TRYING DIFFERENT LEARNING TECHNIQUES
70/30 training/validation split
• PCA regression on the 50 largest PCs
• Seven single-layer neural nets: 3, 6, 12, 24, 48, 100, 200 neurons
• Seven multi-layer neural nets
• Three random forests: 100, 500 and 1000 trees
• 8, 16 and 24 nearest neighbors
8-nearest neighbors has the lowest misclassification rate: 3.6% of the digits in the validation set are misclassified.
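The evaluation setup above (a 70/30 split, a nearest-neighbor classifier and a misclassification rate on the validation part) can be sketched in a few lines of Python. This is a toy under stated assumptions: two well-separated 2-dimensional clusters stand in for the 784-dimensional digit vectors, k is 3 rather than 8, 16 or 24, and all names are illustrative.

```python
import random
from collections import Counter


def split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and cut them into a training and a validation part."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def knn_predict(train, x, k):
    """Majority label among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def misclassification_rate(train, valid, k):
    """Share of validation rows whose predicted label differs from the known one."""
    wrong = sum(1 for x, y in valid if knn_predict(train, x, k) != y)
    return wrong / len(valid)


# Hypothetical data: class 0 near (0, 0), class 1 near (10, 10).
rows = [((i % 3, i % 2), 0) for i in range(10)]
rows += [((10 + i % 3, 10 + i % 2), 1) for i in range(10)]

train, valid = split(rows)
rate = misclassification_rate(train, valid, k=3)
```

Because the two toy clusters do not overlap, the rate is 0 here. On real digit images the classes overlap, which is why a rate such as 3.6% is a realistic outcome.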
MNIST DATA APPLY MODEL ON TEST SET
28,000 digits without known labels. Our best model predicted the label for these digits.
The first 100 predicted digits are displayed here, together with the handwritten digits.
Red numbers are predicted labels. We obviously see some mistakes…
SPEECH RECOGNITION
DIGITS RECORDED WITH IPHONE
SPEECH RECOGNITION
• WAV files consist of ~30,000 points → too much redundancy
• Use spectral analysis to convert the signal to the frequency domain
• Still too much → apply principal components
• TRAIN DATA
  • 8 spoken ‘ones’ in WAV files
  • 8 spoken ‘twos’ in WAV files
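The frequency-domain step can be illustrated with a naive discrete Fourier transform in Python. This is only a sketch: the slides do not say which spectral method was used, the signal below is a synthetic tone rather than a recorded digit, and for a real recording of ~30,000 samples an FFT routine would be used instead of this O(n²) loop.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT: returns |X[k]| for the frequency bins k = 0 .. n/2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def dominant_bin(signal):
    """Index of the strongest non-constant frequency component."""
    spectrum = magnitude_spectrum(signal)
    return max(range(1, len(spectrum)), key=spectrum.__getitem__)


# Synthetic tone (assumption): 5 full cycles over 64 samples.
tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]

spectrum = magnitude_spectrum(tone)
peak = dominant_bin(tone)
```

The 64 samples collapse to 33 magnitudes with a single peak at bin 5. That is the kind of redundancy reduction the spectral step provides before principal components are applied.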
SPEECH RECOGNITION
In Enterprise Miner: neural network with 9 neurons in one hidden layer
Zero errors on training data
Zero errors on test data (also 8 ‘ones’ and 8 ‘twos’)
STRANGE FACE DETECTION
COMBO OF OPEN API / R & SAS
Little joke on my colleagues…
STRANGE FACE DETECTION
COMBO OF OPEN API / R & SAS
• Get a free API key for Face++
• Their API returns 83 facial landmarks (in JSON format)
• Create an R script to
  • Retrieve the SAS faces from our site
  • Put them through the Face++ API
  • Collect the JSON results and store them in an ABT (analytical base table)
Apply advanced analytics on the ABT
• Which faces are look-alikes? → proc cluster (hierarchical clustering)
• Sales faces? → Predictive modeling / machine learning
• Who is the Brad Pitt? → Nearest neighbor
• Strange faces? → proc neural / auto-encoder
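The pipeline above (landmark JSON per face, one row per face in an ABT, then a nearest-neighbor lookup) might look roughly like this in Python. The original work used an R script and SAS procedures. The JSON layout, the landmark names and the face names below are hypothetical stand-ins and not the documented Face++ response, which returns 83 landmarks rather than 2.

```python
import json
import math


def landmarks_to_row(payload):
    """Flatten {"landmark": {name: {"x": .., "y": ..}}} into [x1, y1, x2, y2, ...]."""
    points = json.loads(payload)["landmark"]
    row = []
    for name in sorted(points):  # fixed column order across faces
        row.extend([float(points[name]["x"]), float(points[name]["y"])])
    return row


def nearest_face(abt, target_name):
    """Name of the face whose landmark vector is closest to the target's."""
    target = abt[target_name]
    others = (name for name in abt if name != target_name)
    return min(others, key=lambda name: math.dist(abt[name], target))


# Hypothetical API responses, one JSON string per face.
responses = {
    "colleague_a": '{"landmark": {"nose_tip": {"x": 50, "y": 60}, "left_eye": {"x": 30, "y": 40}}}',
    "colleague_b": '{"landmark": {"nose_tip": {"x": 52, "y": 61}, "left_eye": {"x": 31, "y": 41}}}',
    "actor": '{"landmark": {"nose_tip": {"x": 80, "y": 90}, "left_eye": {"x": 60, "y": 70}}}',
}

abt = {name: landmarks_to_row(text) for name, text in responses.items()}
look_alike = nearest_face(abt, "colleague_a")
```

In a real setting the landmark coordinates would first be normalized for face size and position. Otherwise the distance mostly reflects how the photo was cropped.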
STRANGE FACE DETECTION
LOOK-ALIKE FACES
STRANGE FACE DETECTION
BRAD PITT LOOK-ALIKES
STRANGE FACE DETECTION
STRANGE FACES
SAS faces, actors' faces
Read more on my blog

Machine learning overview (with SAS software)

  • 1. Copyright © 2012, SAS Institute Inc. All rights reserved. MACHINE LEARNING WITH SAS WORKSHOP: GETTING THE MOST OUT OF YOUR DATA. Longhow Lam
  • 2. AGENDA AND SOME READING MATERIAL. Agenda: intro and positioning of machine learning; SAS platform for machine learning; overview of specific methods; some examples. Further reading: "An experimental comparison of classification techniques for imbalanced credit scoring data sets using SAS® Enterprise Miner", http://support.sas.com/resources/papers/proceedings12/129-2012.pdf; "Benchmarking state-of-the-art classification algorithms for credit scoring: A ten-year update", http://www.business-school.ed.ac.uk/waf/crc_archive/2013/42.pdf. Highly recommended for more detail: The Elements of Statistical Learning, Hastie, Tibshirani & Friedman, http://www-stat.stanford.edu/~tibs/ElemStatLearn/
  • 3. LONGHOW LAM SHORT BIO. MSc Mathematics (1995), Vrije Universiteit Amsterdam (Dutch "drs." degree in mathematics); MTD Applied Statistics (1997), Technical University Delft (two-year AIO programme in applied statistics); 10+ years of SAS experience (Base / Stat / Guide / Miner / VA / VS); 10+ years of R experience (An introduction to R); 10+ years of predictive modeling experience. ABNAMRO, risk modeler: Basel, credit risk, ALM models. Business&Decision, quantitative consultant: ING Belgium, Fortis, Leaseplan, Belgium Post. Experian, data miner: Collection Score, Delphi credit score, consulting. Follow me: @longhowlam
  • 4. INTRO MACHINE LEARNING. Wikipedia: “Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.”
  • 5. MACHINE LEARNING AND SOME OTHER TERMS YOU OFTEN HEAR: statistical modeling; supervised learning; clustering; unsupervised learning; data mining; machine learning; dimension reduction; association rules; recommender; auto encoders; self organizing maps.
  • 6. SAS SOFTWARE FOR MACHINE LEARNING (AND DATA MINING)
  • 7. THE ANALYTICS LIFECYCLE. Stages: identify / formulate problem; data preparation; data exploration; transform & select; build model; validate model; deploy model; evaluate / monitor results. Roles: business manager; IT systems / management; business analyst; data miner / data scientist. SAS products shown: SAS In-Database Scoring, SAS Decision Manager, SAS Model Manager, SAS Enterprise Guide, Enterprise Miner / Text Miner, SAS IMSTAT / Recommender, SAS Visual Analytics, SAS Visual Statistics.
  • 8. EASY TO USE GUI FOR MACHINE LEARNING COMBINED WITH CODE LIBRARIES. PROC hpbnet data = creditdata structure = markovblanket; model default = x1 LTV income age; selection = Y; RUN;
  • 9. HIGH PERFORMANCE MACHINE LEARNING. Machine learning algorithms designed to run on single-blade or multi-blade distributed-memory environments.
  • 10. MACHINE LEARNING WITH SAS: EASILY DEPLOYABLE. Manage rules + data + models. Deployment flexibility: batch, real time, stored process, in database. Drive reuse and consistency.
  • 11. PREDICT SOMEONE’S INCOME. Predict someone's income from his/her age: collect some data (analytical base table with Age and Income); plot the data; fit a line, for example Income = 15.2 + 1.102 × Age. IS THIS MACHINE LEARNING?
  • 12. MACHINE LEARNING: ADDRESSING SOME MODELING ISSUES. The problem may not be linear: X², X³, Log(X), Sqrt(X), 1/X, …? You do not have just one input variable: X1, X2, X3, …, X567. There are interactions and correlations between input variables (for example age, income, male / female). The analytical base table is extended with derived inputs.
  • 13. MACHINE LEARNING: WHY IT CAN MATTER. Suppose we have an untargeted direct mailing of 100,000 ‘letters’ to randomly sampled prospects. The conversion rate is around 1%, the profit per conversion is €80 and the cost per mailing is €0.70. Total profit = 100,000 × 1% × €80 − 100,000 × €0.70 = €10,000. Now suppose we have a targeted mailing driven by a machine learning predictive model that uses prospect input data to distinguish between high and low responders.
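The baseline arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `campaign_profit` and its parameter names are illustrative and do not come from the slides.

```python
def campaign_profit(n_letters, conversion_rate, profit_per_conversion, cost_per_letter):
    """Net profit of a mailing: revenue from conversions minus total mailing cost."""
    revenue = n_letters * conversion_rate * profit_per_conversion
    cost = n_letters * cost_per_letter
    return round(revenue - cost, 2)  # round to cents to avoid floating-point noise


# Untargeted mailing from the slide: 100,000 letters, 1% conversion, EUR 80 profit, EUR 0.70 cost
untargeted = campaign_profit(100_000, 0.01, 80.0, 0.70)
print(untargeted)
```

The same function can be reused per decile once a model ranks prospects by expected response.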
  • 14. MACHINE LEARNING: WHY IT CAN MATTER
      Decile        N   Conversion   Profit (€)   Cumulative (€)
      1        10,000        2.00%        9,000            9,000
      2        10,000        1.50%        5,000           14,000
      3        10,000        1.00%        1,000           15,000
      4        10,000        1.00%        1,000           16,000
      5        10,000        1.00%        1,000           17,000
      6        10,000        1.00%        1,000           18,000
      7        10,000        1.00%        1,000           19,000
      8        10,000        0.80%         -600           18,400
      9        10,000        0.50%       -3,000           15,400
      10       10,000        0.20%       -5,400           10,000
      Using a model to send letters only to the first 7 deciles, the profit is now €19,000 (instead of €10,000). With 100 such campaigns a year that means an increase of €0.9 mln!
  • 15. MACHINE LEARNING: WHY IT CAN MATTER
      Decile        N   Conversion   Profit (€)   Cumulative (€)
      1        10,000        3.00%       17,000           17,000
      2        10,000        2.00%        9,000           26,000
      3        10,000        1.40%        4,200           30,200
      4        10,000        1.15%        2,200           32,400
      5        10,000        1.00%        1,000           33,400
      6        10,000        0.60%       -2,200           31,200
      7        10,000        0.40%       -3,800           27,400
      8        10,000        0.30%       -4,600           22,800
      9        10,000        0.10%       -6,200           16,600
      10       10,000        0.05%       -6,600           10,000
      Using a much better model to send letters only to the first 5 deciles, the profit is now €33,400 (instead of €10,000). With 100 such campaigns a year that means an increase of €2.34 mln!
  • 16. MACHINE LEARNING: WHY IT CAN MATTER?
      Decile        N   Conversion   Profit (€)   Cumulative (€)
      1        10,000        3.35%       19,800           19,800
      2        10,000        2.23%       10,840           30,640
      3        10,000        1.30%        3,400           34,040
      4        10,000        1.10%        1,800           35,840
      5        10,000        1.00%        1,000           36,840
      6        10,000        0.55%       -2,600           34,240
      7        10,000        0.28%       -4,760           29,480
      8        10,000        0.25%       -5,000           24,480
      9        10,000        0.05%       -6,600           17,880
      10       10,000        0.02%       -6,840           11,040
      Now let's suppose we have a model that is even slightly better than the last one: mailing only the first 5 deciles yields €36,840. With 100 such campaigns a year that means an increase of €2.68 mln!
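The decile tables follow from the same profit formula applied per decile, plus a running sum. The sketch below recomputes the first two tables and picks the cutoff with the highest cumulative profit. The names `decile_profits`, `best_cutoff`, `model_a` and `model_b` are illustrative; only the conversion rates, the €80 profit and the €0.70 cost come from the slides.

```python
def decile_profits(conversion_pct, n_per_decile=10_000,
                   profit_per_conversion=80.0, cost_per_letter=0.70):
    """Profit per decile, given conversion rates in percent."""
    return [
        round(n_per_decile * pct / 100 * profit_per_conversion
              - n_per_decile * cost_per_letter, 2)
        for pct in conversion_pct
    ]


def best_cutoff(profits):
    """Return (number of deciles to mail, cumulative profit) with maximal cumulative profit."""
    best_k, best_total, running = 0, 0.0, 0.0
    for k, profit in enumerate(profits, start=1):
        running += profit
        if running > best_total:  # strict '>' keeps the smallest mailing on ties
            best_k, best_total = k, running
    return best_k, round(best_total, 2)


# Conversion rates (in %) per decile, copied from the first two decile tables
model_a = [2.00, 1.50, 1.00, 1.00, 1.00, 1.00, 1.00, 0.80, 0.50, 0.20]
model_b = [3.00, 2.00, 1.40, 1.15, 1.00, 0.60, 0.40, 0.30, 0.10, 0.05]

print(best_cutoff(decile_profits(model_a)))
print(best_cutoff(decile_profits(model_b)))
```

Mailing stops being worthwhile once a decile's conversion rate drops below the break-even rate of 0.70 / 80 = 0.875%, which is why the cumulative profit peaks before the last decile.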
  • 17. OVERVIEW OF SPECIFIC MACHINE LEARNING METHODS: classical regression; decision trees; dimension reduction; bagging & boosting; support vector machines; k-nearest neighbour; neural networks / deep learning; Bayesian networks; text mining; recommendation engine.
  • 18. “CLASSICAL” REGRESSION
  • 19. LINEAR & LOGISTIC REGRESSION. Numeric target variable (linear regression): Income = a + b × Age. Binary target variable (logistic regression): $P(\text{Churn}) = \frac{1}{1 + \exp(a + b \times \text{Age})}$, a probability between 0 and 1.
  • 20. SPLINE REGRESSION: MODELING NON-LINEARITIES. Often there is a non-linear relation between the input X and the target Y (or logit(Y)). Options: transformation of inputs (X², X³, log(X), etc.); buckets / binning of variables; smoothing splines.
  • 21. SPLINE REGRESSION: MODELING NON-LINEARITIES. Smoothing splines are piecewise polynomials that are glued together at knots. Two special cases for the smoothing parameter λ: λ = 0 gives any function that interpolates the data; λ = ∞ gives the simple least-squares line fit. Choose λ by cross-validation.
  • 22. SPLINE REGRESSION: OPEL ASTRA CAR EXAMPLE. Data extracted from a car sales site. For many cars we have the kilometres driven and the car price. For the Opel Astra we have 2360 cars. What is the relation between kilometres driven and car sales price? [Figures: fits with too much smoothing and too little smoothing.]
  • 23. SPLINE REGRESSION: OPEL ASTRA CAR EXAMPLE. 0.2 is the optimal smoothing parameter.
  • 24. Some other car makes/models with spline estimates of car depreciation versus kilometres driven. Hmmm.. my Renault Clio looks nice, but after 50,000 km I only have 46% of the original value left…
  • 25. SPLINE REGRESSION: MODELING NON-LINEARITIES. In SAS we have the TPSPLINE, LOESS and ADAPTIVEREG procedures to fit multivariate regression splines. Supports: more than one input; linear, logistic, Poisson and GLM regressions; a combination of regression splines and model selection methods; partitioning of data into training, validation and testing roles.
  • 26. DECISION TREES
  • 27. DECISION TREES: HOW DOES IT WORK? A simple example. Suppose we have the following group of people: 50% response, 50% no response. We know Age and Marital Status. [Tree figure: root node 50% / 50%; split on Age ≤ 45 versus Age > 45, with node values 30% / 70% and 60% / 40%; further split on marital status (Married, Divorced, Unmarried), with node values 20% / 80% and 60% / 40%.]
  • 28. DECISION TREES: REGRESSION & CLASSIFICATION
      Target   X1   X2   X3    X4   X5
      Y        12   A    456   1.2  X
      N        21   B    456   1.5  X
      Y        32   A    545   1.3  U
      Y        34   C    443   1.1  U
      N        23   A    345   1.7  U
      N        13   B    567   1.2  X
      N        45   A    654   1.9  X
      …        …    …    …     …    …
      Y        46   A    657   2.1  X
      A recursive splitting algorithm: 1. Loop through all inputs. 2. Determine per input how to split. 3. Take the best input to split on. 4. Apply steps 1, 2 and 3 again on the two new data sets. 5. Stop somewhere. Open questions: how to split, on X1 or X2? When to stop?
  • 29. DECISION TREES: REGRESSION & CLASSIFICATION. How to split? The number of branches per split is usually 2 or 3; more branches exhaust the data too fast. Why is the split X1 < t1 better than X1 < s1? Regression: mean squared error. Classification: misclassification rate, cross-entropy, chi-squared. [Figure: regression tree, mean squared error; scatter of Y against x with candidate splits s1 and t1.]
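Step 2 of the recursive algorithm (how to split one numeric input for a regression tree) can be sketched as an exhaustive search over candidate thresholds. This is a minimal illustration, not the procedure used by any SAS product. The names `sse` and `best_split` and the toy data are made up, and it minimises the summed squared error of both child nodes, which ranks splits the same way as the mean squared error over all observations.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the best binary split 'x < threshold'."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (hypothetical): two clearly separated groups of target values
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
print(best_split(x, y))
```

A full tree would call `best_split` for every input, keep the best one, and recurse on the two resulting subsets until a stopping rule applies.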
  • 30. DECISION TREES: REGRESSION & CLASSIFICATION. How to split? The number of branches per split is usually 2 or 3; more branches exhaust the data too fast. Why is the split X1 < t1 better than X1 < s1? Regression: mean squared error. Classification: misclassification rate, cross-entropy, chi-squared. [Figure: classification tree, misclassification rate; candidate splits s1 and t1 along x.]
  • 31. DECISION TREES (REGRESSION & CLASSIFICATION). When to stop? Not too early, not too late! Pruning: remove parts of the tree.
  • 32. DECISION TREES: SOME COMMON TYPES. CHAID (chi-squared automatic interaction detection); C4.5 / C5.0; CART (classification and regression trees). They differ mainly in their splitting options.
  • 33. DECISION TREES: PROS AND CONS. Pros: interactions between variables; interpretable rules; missing values are easy to incorporate. Cons: unstable; “lack of smoothness”; poor fit of obvious (non)linear relations. [Figure labels: male, female, Income < 45K, Age < 33, response rate; Opel Astras.]
  • 34. DIMENSION REDUCTION
  • 35. PRINCIPAL COMPONENTS ANALYSIS. A linear transformation of the data into uncorrelated data. The transformation W is such that the largest variance is in the first coordinate, the second largest variance is in the second coordinate, etc.
  • 36. PRINCIPAL COMPONENTS ANALYSIS. [Figure: scatter plot of data points against X1 and X2.]
  • 37. PRINCIPAL COMPONENTS ANALYSIS
  • 38. PRINCIPAL COMPONENTS ANALYSIS: THE MATH BEHIND. In general: $P = XW$. With two dimensions: $\begin{pmatrix} p_{11} & p_{21} \\ \vdots & \vdots \\ p_{1n} & p_{2n} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{21} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix} \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{pmatrix}$. Here $w_{11}$ and $w_{12}$ are the loadings corresponding to the first principal component, and $w_{21}$ and $w_{22}$ are the loadings corresponding to the second principal component. It turns out that the columns of $W$ are the eigenvectors of the matrix $X^T X$.
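For two inputs the eigenvectors can be written in closed form, which makes the statement about the columns of W easy to try out. The sketch below is a hedged illustration in plain Python. It centres the data and uses the sample covariance matrix, which equals $X^T X / (n-1)$ for centred X and so has the same eigenvectors. The function name `pca_2d` and the toy points are assumptions, not part of the slides.

```python
import math


def pca_2d(points):
    """Return ((v1, v2), (lam1, lam2)): principal directions and their variances."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]] of the centred data
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector belonging to the largest eigenvalue
    if b != 0:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # the second direction is orthogonal to the first
    return (v1, v2), (lam1, lam2)


# Toy data (hypothetical): points lying exactly on the line x2 = 2 * x1
points = [(0, 0), (1, 2), (2, 4), (3, 6)]
directions, variances = pca_2d(points)
print(directions[0], variances)
```

For these points the first direction is proportional to (1, 2) and the second component carries essentially no variance, which is the situation where dropping it loses nothing.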
  • 39. PRINCIPAL COMPONENTS ANALYSIS. Scaling the inputs is important here. Applications of PCA: dimension reduction; visualisation; outlier / anomaly detection; PCA regression (use the principal components instead of the original inputs).
  • 40. PRINCIPAL COMPONENTS: DIMENSION REDUCTION. $P = XW$. Now take only the first L columns of W: $P_L = X W_L$. For example, for visualization use only the first 2 or 3 columns, so that $P_L$ has only 2 or 3 columns that can be shown in scatter or contour plots. Dimensions for $P = XW$: X (10000 by 100), W (100 by 100), P (10000 by 100). Dimensions for $P_L = X W_L$: X (10000 by 100), $W_L$ (100 by 2), $P_L$ (10000 by 2).
  • 41. SINGULAR VALUE DECOMPOSITION. Matrix SVD decomposition: $A = U \Sigma V^T$, where $\Sigma$ is diagonal with r singular values (r could be a large number).
  • 42. SINGULAR VALUE DECOMPOSITION. Matrix SVD decomposition: $A = U \Sigma V^T$, where $\Sigma$ is diagonal with r singular values (r could be a large number). Take only k << r singular values: $A_k = U_k \Sigma_k V_k^T$. A data point d can now be represented by a k-dimensional point.
  • 43. SVD EXAMPLE: USING MY SON AS AN EXPERIMENT. Original image: 2448 × 3264 pixels, about 8 mln numbers.
  • 44. SVD EXAMPLE: USING MY SON AS AN EXPERIMENT. SVD with the 15 largest singular values: 1% of the data.
  • 45. SVD EXAMPLE: USING MY SON AS AN EXPERIMENT. SVD with the 75 largest singular values: 5% of the data.
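The percentages on these slides are consistent with a simple storage count. As an assumption (the slides do not say how the figures were derived), a rank-k approximation needs k left singular vectors, k right singular vectors and k singular values, so k × (rows + cols + 1) numbers in total. The function name below is illustrative.

```python
def svd_storage_fraction(rows, cols, k):
    """Fraction of the original rows*cols numbers needed to store a rank-k SVD."""
    stored = k * (rows + cols + 1)  # k left vectors, k right vectors, k singular values
    return stored / (rows * cols)


for k in (15, 75):
    print(k, f"{svd_storage_fraction(2448, 3264, k):.2%}")
```

Under this count, 15 singular values need roughly 1% and 75 need roughly 5% of the 2448 × 3264 original numbers, matching the slide captions.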
  • 46. VARIABLE CLUSTERING TO REDUCE THE DIMENSION. Variable selection: I have 500 inputs (X1, X2, X3, …, X500), but maybe there are only ten clusters of inputs. Within one cluster the variables are (strongly) correlated, so use only one input per cluster for predictive modeling. For example: cluster {X1, X21, X35, X430, …} → X35; cluster {X17, X29, X353, X490, …} → X29; cluster {X37, X95, X251, X393, …} → X251.
  • 47. VARIABLE CLUSTERING TO REDUCE THE DIMENSION
  • 48. VARIABLE CLUSTERING TO REDUCE THE DIMENSION
  • 49. BAGGING & BOOSTING
  • 50. BAGGING & BOOSTING: COMBINE MODELS. If one model is not good enough, let multiple models vote for a prediction. Bootstrap aggregation (bagging): fit a model on each of several random samples of the data and combine them into a final model. This only makes sense if the underlying models are different enough and have some predictive power.
  • 51. BAGGING & BOOSTING: RANDOM FORESTS. Random forests ≈ bagging with trees. Apply the following steps repeatedly: 1. Generate a bootstrap sample. 2. Randomly choose m inputs, with m << P. 3. Fit a tree on the bootstrap sample with the m inputs (do not prune). For classification trees, the random forest prediction is the majority vote of all trees. For regression trees, the random forest prediction is the average of all trees.
  • 52. FOREST VS TREE: EXAMPLE ON SIMULATED DATA. Decision tree and random forest (100 sub-trees) fitted on the simulated data.
  • 53. FOREST VS TREE: EXAMPLE ON SIMULATED DATA. The forest clearly produces much smoother predictions.
  • 54. GRADIENT BOOSTING: DON’T LET THE FORMULAS INTIMIDATE YOU
  • 55. GRADIENT BOOSTING: SCHEMATIC OVERVIEW. Gradient boosting with M iterations, m = 1, 2, …, M. At each successive iteration a base learner $h_m$ (a decision tree) is fit on the pseudo-residuals $r_{im}$, using inputs x, to “correct” the previous learner: $F_m = F_{m-1} + \gamma \cdot h_m$. The final model is $F_M$.
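The update $F_m = F_{m-1} + \gamma \cdot h_m$ can be sketched with one-split trees (stumps) as base learners. This is a minimal illustration under stated assumptions: squared-error loss (so the pseudo-residuals are simply y minus the current prediction), a single numeric input with at least two distinct values, and a fixed γ. The names `fit_stump` and `gradient_boost` and the toy data are made up.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi < threshold]
        right = [r for xi, r in zip(x, residuals) if xi >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi < threshold else right_mean


def gradient_boost(x, y, n_iter=50, gamma=0.1):
    """Return (training MSE before each iteration, final prediction function)."""
    f0 = sum(y) / len(y)  # F_0: the constant model
    learners = []

    def predict(xi):
        return f0 + gamma * sum(h(xi) for h in learners)

    history = []
    for _ in range(n_iter):
        # Pseudo-residuals for squared-error loss: actual minus current prediction
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        history.append(sum(r * r for r in residuals) / len(y))
        learners.append(fit_stump(x, residuals))  # h_m corrects F_{m-1}
    return history, predict


# Toy data (hypothetical): a smooth non-linear relation
x = [i / 10 for i in range(20)]
y = [xi ** 2 for xi in x]
history, predict = gradient_boost(x, y)
print(history[0], history[-1])
```

Each stump is a weak learner on its own; the training error shrinks because every new stump targets what the current ensemble still gets wrong.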
  • 56. SUPPORT VECTOR MACHINES
  • 57. SUPPORT VECTOR MACHINES (SVM). Suppose we have a separable classification problem. Find a linear decision boundary between the two groups with maximum margin M, so the green line would be better than the blue line. If the groups are not separable, you have to allow some points to be on the wrong side. These points are penalized: the SVM still maximizes the margin M, but under the constraint that the total penalty is smaller than C. The relation may not be linear in the input space. We could apply non-linear mappings to the inputs, e.g. x², x³ or spline(x). The beauty of SVM is that the calculation of the decision boundary does not need these transformations explicitly: “the kernel trick”.
  • 60. SVM: UNDERLYING MATHEMATICAL OPTIMIZATION PROBLEMS. Separable classification; non-separable classification; non-separable classification rewritten using the Lagrange dual problem; kernels to model nonlinear behaviour.
  • 61. https://www.youtube.com/watch?v=3liCbRZPrZA Not linearly separable, but in 3D space they are!
  • 62. K-NEAREST NEIGHBOUR
  • 63. K-NN METHOD. No model is fitted. Given a query point x0, find the k points x1, x2, ..., xk that are closest in distance to x0. Classify x0 using the majority vote among the k neighbours. Example: of the 5 nearest neighbours of x0, 3 are red and 2 are green, so we predict x0 to be red.
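The majority vote can be written in a few lines. The sketch below assumes Euclidean distance (the slide only says "closest in distance") and uses made-up points arranged so that 3 of the 5 nearest neighbours are red, as in the slide's example. `knn_classify` is an illustrative name.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three red points close to the origin, green points further away
points = [(1, 0), (0, 1), (-1, 0), (0, -1.5), (1.5, 1.5), (5, 5), (6, 5), (5, 6)]
labels = ["red", "red", "red", "green", "green", "green", "green", "green"]

print(knn_classify((0, 0), points, labels, k=5))
print(knn_classify((0, 0), points, labels, k=8))
```

With k = 5 the vote is 3 red against 2 green; with k = 8 every point votes and the more numerous green class wins, which shows how strongly the prediction can depend on k.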
  • 64. K-NN METHOD. [Figures: decision boundaries for 1 nearest neighbour and for 15 nearest neighbours.]
  • 65. K-NN METHOD. Use different numbers k of nearest neighbours and compare test and training errors. Despite its simplicity, k-nearest-neighbours has been used successfully in problems like handwritten digits, satellite image scenes and EKG patterns.
  • 66. K-NN EXAMPLE: DUTCH HOUSE PRICES. Extract for-sale prices from a Dutch housing site. For 108K Dutch postal codes (out of 463K) there are one or more houses for sale. How can we estimate the house value for the postal codes without a house price? For a postal code with no price, estimate the price from the k closest for-sale prices.
  • 67. Comparing different nearest neighbours in SAS Enterprise Miner
  • 68. K-NN EXAMPLE: DUTCH HOUSE PRICES. 30% of the data was used as validation set. In Enterprise Miner different values for k were tried. k = 5 nearest neighbours has the lowest average squared error.
  • 70. NEURAL NETWORKS / DEEP LEARNING
  • 71. NEURAL NETWORK: LINEAR REGRESSION. $Y = f(X, w) = w_1 + w_2 X_2 + w_3 X_3 + w_4 X_4$. [Figure: one neural network compute node f with inputs 1, X2, X3, X4 and weights w1, w2, w3, w4.] f is the so-called activation function. This could be the logit function, but other choices are possible. There are four weights w that have to be determined.
  • 72. NEURAL NETWORKS: MATHEMATICAL FORMULATION. [Figure: inputs X1 to X4 (Age, Income, Region, Gender), hidden layer Z1 to Z3, outputs Y and N, with weights α and β.] The prediction formula for a neural network is given by $P(Y \mid X) = g(T_Y)$, $T_Y = \beta_{0Y} + \beta_Y^T Z$, $Z_m = \sigma(\alpha_{0m} + \alpha_m^T X)$. The functions g and σ are defined as $g(T_Y) = \frac{e^{T_Y}}{e^{T_N} + e^{T_Y}}$ and $\sigma(x) = \frac{1}{1 + e^{-x}}$. In the case of a binary classifier, $P(N \mid X) = 1 - P(Y \mid X)$. The model weights α and β have to be estimated from the data.
  • 73. NEURAL NETWORKS: ESTIMATING THE WEIGHTS. Back propagation algorithm: randomly choose small values for all $w_i$. Then, for each data point (observation): 1. Calculate the neural net prediction. 2. Calculate the error E (for example $E = (\text{actual} - \text{prediction})^2$). 3. Adjust the weights w according to $w_i^{new} = w_i + \Delta w_i$ with $\Delta w_i = -\alpha \frac{\partial E}{\partial w_i}$, where α is the learning rate. 4. Stop if the error E is small enough.
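The update rule $\Delta w_i = -\alpha \, \partial E / \partial w_i$ is easiest to see on the single linear node of slide 71, where the gradient has a closed form. The sketch below is not full back propagation (there is no hidden layer to propagate through), and it deviates from the slide in two labeled ways: the starting weights are fixed small values rather than random ones, and it runs a fixed number of passes instead of stopping on a small error. All names and the toy data are illustrative.

```python
def train_linear_neuron(data, alpha=0.02, epochs=2000):
    """Per-observation gradient descent for prediction = w[0] + w[1] * x."""
    w = [0.01, -0.01]  # small starting values (fixed here so the run is reproducible)
    for _ in range(epochs):
        for x, actual in data:
            prediction = w[0] + w[1] * x
            # E = (actual - prediction)^2, so dE/dprediction = -2 * (actual - prediction)
            grad_pred = -2 * (actual - prediction)
            w[0] += -alpha * grad_pred * 1.0  # dE/dw0 = grad_pred * 1 (bias input)
            w[1] += -alpha * grad_pred * x    # dE/dw1 = grad_pred * x
    return w


# Toy data (hypothetical): generated exactly from actual = 1 + 2 * x
data = [(x, 1 + 2 * x) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
weights = train_linear_neuron(data)
print(weights)
```

The learning rate matters: with inputs on a much larger scale (such as raw ages) the same α would make the updates diverge, which is one reason inputs are usually standardized first.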
  • 74. DEEP LEARNING: NEURAL NETWORK WITH MORE THAN 2 HIDDEN LAYERS
  • 75. NEURAL NETS: AUTOENCODERS. http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf Neural networks that use the inputs to predict the inputs: X1, X2, X3, X4 are encoded into a 2-dimensional middle layer (useful for visualisation) and decoded back to X1, X2, X3, X4. With a linear activation function this corresponds to 2-dimensional principal components analysis.
  • 76. NEURAL NETS: AUTOENCODERS. http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf Often there are more hidden layers with many nodes. Input → encode → decode → output = input.
  • 77. NEURAL NET: CARS EXAMPLE. 2-dimensional PCA versus an autoencoder network with layers 25, 15, 2, 15, 25.
  • 78. NEURAL NETS: AUTOENCODER EXAMPLE. 1000 images of digits. Each image has 400 pixels, so a 400-dimensional input vector X = (x1, …, x400). Compare 2-dimensional PCA with a neural net autoencoder.
  • 79. NEURAL NETS: AUTOENCODER EXAMPLE
      proc neural data= autoencoderTraining dmdbcat= work.autoencoderTrainingCat;
        performance compile details cpucount= 12 threads= yes;
        /* DEFAULTS: ACT= TANH COMBINE= LINEAR */
        /* IDS ARE USED AS LAYER INDICATORS - SEE FIGURE 6 */
        /* INPUTS AND TARGETS SHOULD BE STANDARDIZED */
        archi MLP hidden= 5;
        hidden 300 / id= h1;
        hidden 100 / id= h2;
        hidden 2 / id= h3 act= linear;
        hidden 100 / id= h4;
        hidden 300 / id= h5;
        input corruptedPixel1 - corruptedPixel400 / id= i level= int std= std;
        target pixel1-pixel400 / act= identity id= t level= int std= std;
        /* BEFORE PRELIMINARY TRAINING WEIGHTS WILL BE RANDOM */
        initial random= 123;
        prelim 10 preiter= 10;
      run;
  • 80. Two-dimensional representation of the 400-dimensional ‘digit’ data
  • 81. BAYESIAN NETWORKS
  • 82. BAYESIAN NETWORKS: ACYCLIC GRAPHICAL MODELS. Nodes represent random variables. Links between nodes represent conditional dependencies. Conditional probability tables are derived from training data for each node. Random variables are typically binary or discrete. The graph structure can be learned from the data.
  • 84. TEXT MINING
  • 85. TEXT MINING BASICS: “ADVANCED” WORD COUNTING. Parse & filter: part of speech; entity detection; mixed / numeric / abbreviations; stemming; spell checks, stop list, synonym list; multi-term words. Then apply traditional data mining: clustering; prediction / machine learning.
  • 86. TEXT MINING BASICS (documents translated from Dutch; the misspelling in document 2 is deliberate)
      Document 1: “I walk down the street in Amsterdam, 1057DK, with my bike”
      Document 2: “She did not walk but cycled on her blue biike, //bitly.com/sdrtw”
      Document 3: “My two-wheeler is broken, what a bad piece of iron, @#$%$@!”
      TERM DOCUMENT MATRIX A:
      Terms                          Doc 1   Doc 2   Doc 3
      +bike (noun)                     1       1       1
      cycle (verb)                     0       1       0
      blue (adjective)                 0       1       0
      Amsterdam (location)             1       0       0
      +walk (verb)                     1       1       0
      street (noun)                    1       0       0
      broken (adverb)                  0       0       1
      bad                              0       0       1
      piece of iron                    0       0       1
      1057DK (postal code)             1       0       0
      //bitly.com/sdrtw (Internet)     0       1       0
      Each text document is a (very) long vector of word counts, often with many zeros. Apply further mining on this matrix A.
  • 87. TEXT MINING: TERM DOCUMENT MATRIX A. It is not useful to apply data mining techniques directly to the term document matrix: there are often more terms than documents; rows can be strongly correlated; the matrix is often very sparse. Apply singular value decomposition first.
  • 88. TEXT MINING: SVD ON THE TERM DOCUMENT MATRIX A. Matrix SVD decomposition: $A = U \Sigma V^T$, where $\Sigma$ is diagonal with r singular values (r could be many thousands). Take only the first k << r singular values: $A_k = U_k \Sigma_k V_k^T$. A document d is then no longer a long vector of m word counts but a much shorter vector, say of length 300.
  • 89. TEXT MINING: APPLICATIONS. Combine structured and unstructured customer data to better predict behaviour (churn / fraud): apply machine learning to create a model f that predicts the target. Automatically generate topics within large document collections: apply clustering techniques to group documents into clusters (Topic 1, Topic 2, Topic 3).
  • 90. RECOMMENDATION ENGINE. Which product should I recommend to my customers?
  • 91. RECOMMENDATION ENGINE: USER-ITEM MATRIX, EXPLICIT RECOMMENDATIONS. Users rated items (products) explicitly. The matrix is often very sparse: with 1 mln users and 100K items, maybe ~0.01% is filled?
      User-item matrix (data):
      User      Item 1   Item 2   Item 3   Item 4   Item 5
      User 1      3        2        5        4        5
      User 2      -        -        -        1        1
      User 3      1        -        2        5        -
      User 4      -        -        1        2        5
      User 5      2        1        4        2        3
      User 6      2        3        -        5        1
      User 7      5        1        -        3        4
      User 8      -        1        -        4        1
      User 9      2        3        2        4        2
      User 10     -        1        3        -        1
      User 4's item ratings:               -   -   1   2   5
      After some math, the predictions are: 3.21   4.82   1   2   5
      Recommend item 2!
  • 92. RECOMMENDATION ENGINE: ALGORITHMS IN PROC RECOMMEND. Memory-based algorithms: slope one (slope1); k nearest neighbours (knn). Model-based algorithms: matrix factorization (SVD, L-BFGS). Market basket analysis: association rules mining (arm). Mixture of different methods: clustering (cluster); ensemble.
  • 93. RE METHODS: SLOPE ONE. Y = x + b, with slope equal to 1 (see notes). Item-item based: $r_{ui} = \frac{\sum_j w_{ij} \, r_{uj}}{\sum_j w_{ij}}$. Weight $w_{ij}$: the number of users who rated both items i and j. Rating $r_{uj}$: the rating for item i predicted from item j (the user's rating of j plus the average difference between i and j).
      Sample rating database:
      Customer   Item A   Item B   Item C
      John         5        3        2
      Mark         3        4        ??
      Lucy         ??       2        5
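The two missing ratings in the sample database can be filled in with a short weighted Slope One sketch. The interpretation used here is an assumption drawn from the standard description of the method rather than from the slide notes: the prediction from item j is the user's own rating of j plus the average difference between the target item and j over users who rated both, weighted by the number of such users. The function name is illustrative.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for item `target`."""
    numerator, denominator = 0.0, 0
    for other_item, user_rating in ratings[user].items():
        if other_item == target:
            continue
        # Rating differences over users who rated both the target and the other item
        diffs = [r[target] - r[other_item] for r in ratings.values()
                 if target in r and other_item in r]
        if not diffs:
            continue
        average_deviation = sum(diffs) / len(diffs)
        numerator += (average_deviation + user_rating) * len(diffs)  # weight = co-rating count
        denominator += len(diffs)
    return numerator / denominator if denominator else None


# Sample rating database from the slide (missing ratings are simply absent)
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}

print(slope_one_predict(ratings, "Lucy", "A"))
print(slope_one_predict(ratings, "Mark", "C"))
```

Under these assumptions Lucy's predicted rating for item A is 13/3 ≈ 4.33 and Mark's predicted rating for item C is 10/3 ≈ 3.33.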
  • 94. RE METHODS: K NEAREST NEIGHBORS. The rating $r_{ui}$ is determined by the ratings “in the neighborhood”: $r_{ui} = \frac{\sum_{j \in N(i;u)} sim_{ij} \, r_{uj}}{\sum_{j \in N(i;u)} sim_{ij}}$. Neighbors N: how to determine the neighbors and how many (k) to use? Similarity: how to compute the similarity/distance measure $sim_{ij}$? Options: Pearson’s correlation coefficient; cosine distance; other adjustments.
  • 95. RE METHODS: PEARSON CORRELATION. $a, b$: users. $r_{a,p}$: rating of user $a$ for item $p$. $P$: set of items rated by both $a$ and $b$. Possible similarity values lie between −1 and 1. $sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$
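A direct transcription of the Pearson similarity into Python might look as follows. One assumption is labeled in the code: the mean ratings are taken over the co-rated items P only (some variants use each user's mean over all rated items). The ratings of User 1 and User 5 are taken from the user-item matrix on slide 91; the function name is illustrative.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over the items both have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # not enough overlap to compute a correlation
    # Assumption: user means are computed over the co-rated items only
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    covariance = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    spread_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    spread_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if spread_a == 0 or spread_b == 0:
        return None  # a user with constant ratings has no defined correlation
    return covariance / math.sqrt(spread_a * spread_b)


# Users 1 and 5 from the user-item matrix (item number -> rating)
user_1 = {1: 3, 2: 2, 3: 5, 4: 4, 5: 5}
user_5 = {1: 2, 2: 1, 3: 4, 4: 2, 5: 3}
print(pearson_similarity(user_1, user_5))
```

The two users rate on different levels but move together across items, so the similarity is high even though their raw ratings differ.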
  • 96. RE METHODS: K NEAREST NEIGHBORS METHOD
  • 97. RE METHODS: MATRIX FACTORIZATION. How do we fill in the missing data? Factorize the ratings matrix R (m × n, users by items) as $R \approx U V$ with U of size m × k and V of size k × n. Steps: select a loss function (squared error); select the number of hidden factors k; solve the optimization problem (L-BFGS or ALS). Predict a new rating: $R_{ij} = U_i^T V_j$. Minimize the prediction error: $\min_{U,V} \sum_{i,j} (R_{ij} - U_i^T V_j)^2 + \lambda (\lVert U_i \rVert^2 + \lVert V_j \rVert^2)$
  • 98. RE METHODS: CLUSTER. k-NN within one subgroup. User/item profile and user/item ratings → clustering → predictions.
  • 99. RE METHOD: ASSOCIATION RULE MINING (MARKET BASKET ANALYSIS). Basic steps for association rule mining: identify frequent itemsets (rules) in the transaction data, e.g. IF item A and B THEN item C, or IF item X THEN item Y. Not all rules are interesting; use ‘support’ and ‘lift’ to judge the importance of a rule. Support(X, Y) = (# transactions containing both X and Y) / (total # transactions). Lift = Support(X, Y) / (Support(X) × Support(Y)). Support & lift examples: Diapers → Beer 0.8%; Diapers → Candles 0.018%. For example, a lift of 2.5 means: people who have X are 2.5 times as likely to buy Y as customers in general.
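Support and lift are simple counting exercises, which the sketch below reproduces on a tiny made-up transaction list (the baskets are hypothetical and do not reproduce the percentages on the slide). The function names are illustrative.

```python
def support(transactions, items):
    """Share of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(1 for basket in transactions if wanted <= set(basket)) / len(transactions)


def lift(transactions, x, y):
    """Support(X, Y) divided by Support(X) * Support(Y)."""
    return support(transactions, [x, y]) / (
        support(transactions, [x]) * support(transactions, [y])
    )


# Hypothetical baskets
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "candles"},
    {"diapers"},
    {"beer"},
    {"candles"},
    {"milk"},
    {"milk", "candles"},
    {"diapers", "beer"},
]

print(support(transactions, ["diapers", "beer"]))
print(lift(transactions, "diapers", "beer"))
```

Here diapers and beer each appear in half of the baskets and together in 3 of 8, so the lift is 0.375 / 0.25 = 1.5: the pair occurs 1.5 times as often as it would if the two items were bought independently.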
  • 100. RE METHOD: ENSEMBLE. A linear combination of the previous methods, to achieve better performance.
  • 101. PROC RECOMMEND EXAMPLE
      PROC RECOMMEND recom = rs.IENS;
        * Add a recommendation system;
        ADD rs.IENS /item = item user = user rating = rating;
        * Add tables;
        ADDTABLE LHL1209.IENS_UIR / recom = rs.IENS type = rating vars=(item user rating);
        * SVD method (L-BFGS) with 20 factors;
        METHOD svd / factors = 20 label = "svd" fconv = 1e-3 gconv = 1e-3 maxiter = 100 MAXFEVAL = 5000 function = L2 lamda = 0.2 technique = lbfgs;
        RUN;
        METHOD ARM / label = "ARM";
        RUN;
        /* information on the recommender system */
        INFO;
      QUIT;
  • 102. PREDICTION WITH THE SVD METHOD
      /** prediction with the SVD method ***/
      PROC RECOMMEND recom = rs.IENS;
        PREDICT / method = svd label = "svd" Num = 3 users = ("Longhow Lam");
        run;
      QUIT;
  • 103. Copyright © 2012, SAS Institute Inc. All rights reserv ed. LAST SLIDE 
  • 104. Copyright © 2012, SAS Institute Inc. All rights reserv ed. OF MORE MODERN MACHINE LEARNING CONS  Unfamilar with broader audiance, (more) difficult to explain  Black box approach (you are rejected: The computer says NO)  Often relations can already be modeled with classical regression models  It allows you to not think about the business problem PROS  Often less data prep (manual tuning) neccesary (just throw it in the algorithm…)  Interactions often “automatically” taken into account  Superior for Text mining, Image & Speech recognition  Better lift possible (paar procent “gratis”)  It allows you to not think about the business problem (compared to traditional linear /logistic regression) PROS AND CONS
  • 105. WHY SAS FOR MACHINE LEARNING: Many different techniques; easy-to-use GUIs combined with flexible coding; high-performance scalability; easily deployable.
  • 106. SOME MACHINE LEARNING EXAMPLES: Text mining; image recognition; sound recognition; strange faces. So can a machine read, see and hear?
  • 107. PREDICTING SENTIMENT FROM RESTAURANT REVIEWS
  • 108. IENS REVIEWS: COLLECTED AROUND 16,000 REVIEWS AND THEIR SCORES. Used Text Miner to parse and filter the reviews, and to transform the reviews into data points in SVD space.
  • 109. USE MACHINE LEARNING TO PREDICT TARGET WITH THE 300 INPUTS. Plot: predicted review score vs. given review score. R² linear regression = 0.5; R² neural net = 0.6.
  • 111. IENS REVIEWS: APPLY MODEL ON 'NEW REVIEWS'
  • 112. MNIST DATA IN SAS: MODIFIED NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
  • 113. MNIST TRAINING DATA: 42,000 pictures of handwritten digits. Each digit is a picture of 28 by 28 pixels, so a 784-dimensional vector. Figure: the first 100 digits of the MNIST data and their KNOWN labels in red.
  • 114. MNIST DATA: TRYING DIFFERENT LEARNING TECHNIQUES. 70/30 training/validation split. Models tried: PCA regression on the 50 largest PCs; seven single-layer neural nets with 3, 6, 12, 24, 48, 100 and 200 neurons; seven multi-layer neural nets; three random forests with 100, 500 and 1000 trees; 8, 16 and 24 nearest neighbors. 8-nearest-neighbour has the lowest misclassification rate: 3.6% of the digits in the validation set are misclassified.
  • 115. MNIST DATA: APPLY MODEL ON TEST SET. 28,000 digits without known labels. Our best model predicted the label for these digits. The first 100 predicted digits are displayed here together with the handwritten digits; red numbers are the predicted labels. We see some obvious mistakes...
  • 116. SPEECH RECOGNITION: DIGITS RECORDED WITH IPHONE. Figure: recordings of the spoken digits 1 and 2.
  • 117. SPEECH RECOGNITION: WAV files consist of ~30,000 points → too much redundancy. Use spectral analysis to convert the signal to the frequency domain. Still too much → apply principal components. TRAIN DATA: 8 spoken 'ones' in WAV files; 8 spoken 'twos' in WAV files.
  • 118. SPEECH RECOGNITION
  • 119. SPEECH RECOGNITION: In Enterprise Miner: neural network with 9 neurons in one hidden layer. Zero errors on training data. Zero errors on test data (also 8 'ones' and 8 'twos').
  • 120. STRANGE FACE DETECTION: COMBO OF OPEN API / R & SAS. Little joke on my colleagues...
  • 121. STRANGE FACE DETECTION: COMBO OF OPEN API / R & SAS. Get a free API key for Face++; their API returns 83 facial landmarks (in JSON format). Create an R script to: retrieve the SAS faces from our site; put them through the Face++ API; collect the JSON results and store them in an ABT (analytical base table). Apply advanced analytics on the ABT: Which faces are look-alikes? → proc cluster (hierarchical clustering). Sales faces? → predictive modeling / machine learning. Who is the Brad Pitt? → nearest neighbor. Strange faces? → proc neural / auto-encoder.
  • 122. STRANGE FACE DETECTION: LOOK-ALIKE FACES
  • 123. STRANGE FACE DETECTION: BRAD PITT LOOK-ALIKES
  • 124. STRANGE FACE DETECTION: STRANGE FACES. SAS faces, actors' faces. Read more on my blog.
  • 125. STRANGE FACE DETECTION: COMBO OF OPEN API / R & SAS. SAS faces, actors' faces. Read more on my blog.

Editor's Notes

  1. Linear and logistic regression have been in use for ages. With creative construction of the regression variables, fairly good models can already be built.
  3. Decision tree 20:25 – 20:35
  4. Known since the late 1980s through the work of Leo Breiman.
  5. When do we stop? Too early: we miss certain structure in the data. Too late: the tree becomes too large and we overfit. Possible stopping strategies: (1) Stop splitting when there is no real decrease in MSE or Gini. This is too short-sighted, because the decrease may still come in a later split. (2) First grow a large tree, and stop splitting only when a minimum number of data points remains. Then apply pruning to the large tree: cut parts of the tree off again, but only if that does not lead to too large an increase in MSE or Gini.
  6. CHAID (chi-squared automatic interaction detection): categorical or continuous target; multiple splits; criterion = chi-square; stops before the tree gets too large; uses missing values as an additional category. CART (Classification and Regression Trees): categorical or continuous target; binary splits; criterion = Gini; grows large trees, then prunes; uses surrogate fields for missing values. C4.5 / C5.0: only categorical target; multiple splits; criterion = entropy; grows large trees, then prunes; imputes missing values.
  7. Articles on this appeared around the mid-1990s, e.g. Leo Breiman, "Bagging Predictors". Bootstrap aggregation (bagging): take multiple (independent) random samples from the data, say K samples. Fit a model on each sample, resulting in models M1, M2, ..., MK. The final prediction is a majority vote or the average of the K models. Boosting: start with a simple model (M1); this model makes correct and incorrect decisions. In a second iteration, give the misclassified cases more weight and fit a model again (M2). Continue until you have the models M1, M2, ..., MK, and take the (weighted) majority vote of these K models as the final model.
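The bagging recipe in this note (K bootstrap samples, one model per sample, majority vote) can be sketched in a few lines of plain Python. Everything named here is an assumption for illustration: the one-dimensional toy data, the one-split "stump" used as the base model, K = 25 and the random seed.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-split classifier on (x, label) pairs: pick the threshold with the fewest errors."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging(data, k, seed=0):
    """Fit k models, each on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Final prediction: majority vote over all models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: small x values belong to class 0, large ones to class 1.
data = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
models = bagging(data, k=25)
print(predict(models, 0), predict(models, 10))
```

Boosting would differ in the sampling step: instead of independent bootstrap samples, each round reweights the cases based on the errors of the previous model.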
  8. Apply the steps below a large number of times: draw N cases at random from the data (with replacement; this is the bootstrap sample). If there are P inputs, randomly draw m << P inputs for the bootstrap sample. Fit a tree (without pruning) on the bootstrap sample with the m inputs. For a classification tree: the random forest prediction is the majority vote of all trees. For a regression tree: the random forest prediction is the average of all trees.
  9. Introduced around 1995 by Vladimir Vapnik. The points on the maximum-distance lines are the support vectors; there are only a few of them. Find a linear decision boundary between two linearly separable groups with maximal margin M. So the green line is better than the blue line. If the groups are not linearly separable, you must allow some points to lie on the wrong side. These points receive a penalty. SVM still maximizes the margin M, but under the constraint that the total penalty is smaller than a constant. The world is not linear. We apply a mapping to a non-linear world. We can transform the inputs, for example $x^2$, $x^3$, or spline(x). The nice thing about SVM is that these transformations do not have to be computed explicitly when calculating the decision boundary.
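The last point of this note, that the transformations never have to be computed explicitly, is usually called the kernel trick. A minimal plain-Python sketch, assuming a degree-2 polynomial kernel on two-dimensional inputs (the kernel choice and the example points are illustrative, not taken from the slides):

```python
import math


def dot(u, v):
    """Ordinary inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def explicit_features(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def polynomial_kernel(x, z):
    """Same inner product, computed without building the transformed features."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
via_features = dot(explicit_features(x), explicit_features(z))
via_kernel = polynomial_kernel(x, z)
print(via_features, via_kernel)
```

Both routes give the same number; the kernel route only needs the original inputs, which is why an SVM can work in a large transformed space at little extra cost.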
  12. Binary classification → 2 output nodes: Y and N. 4 input variables → 4 input nodes: X = (X1,..,X4). 1 hidden layer with 3 hidden nodes: Z = (Z1, Z2, Z3).
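A forward pass through this 4-3-2 network can be sketched in plain Python. The weights below are arbitrary made-up numbers, and the tanh hidden activation and softmax output are assumptions; the note only fixes the layer sizes.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 4 inputs -> 3 tanh hidden nodes -> 2 softmax outputs (Y, N)."""
    z = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    scores = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(w_out, b_out)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return z, [e / total for e in exps]


# Hypothetical weights: 3 hidden nodes x 4 inputs, then 2 outputs x 3 hidden nodes.
w_hidden = [
    [0.5, -0.2, 0.1, 0.0],
    [-0.3, 0.8, 0.0, 0.4],
    [0.2, 0.1, -0.6, 0.3],
]
b_hidden = [0.0, 0.1, -0.1]
w_out = [
    [1.0, -1.0, 0.5],
    [-1.0, 1.0, -0.5],
]
b_out = [0.0, 0.0]

x = [1.0, 0.5, -1.5, 2.0]  # X = (X1, ..., X4)
z, probabilities = forward(x, w_hidden, b_hidden, w_out, b_out)
print(z, probabilities)
```

Training would adjust the weights so that the two output probabilities match the observed Y/N labels; only the prediction step is shown here.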
  13. With Explicit rating
  14. First compute the average difference between items A and B: [2 + (-1)] / 2 = 0.5, so the estimate of r_{LucyA} based on B is 2 + 0.5 = 2.5. Then compute the difference between items A and C: 3, so the estimate of r_{LucyA} based on C is 5 + 3 = 8. Weighted sum (2 users rated both A and B, 1 user rated both A and C): r_{LucyA} = (2.5 * 2 + 8) / 3 = 4.33.
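The arithmetic in this note matches the weighted Slope One scheme. The plain-Python sketch below reproduces it under an assumption: the rating table is not given in the note, so the one used here (John, Mark, Lucy) is a reconstruction that is consistent with the numbers above.

```python
# Assumed rating table, chosen to match the differences in the note:
# A - B is 2 for John and -1 for Mark; A - C is 3 for John.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}


def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for item `target`."""
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        # Differences target - item over all users who rated both items.
        diffs = [r[target] - r[item] for r in ratings.values() if target in r and item in r]
        if not diffs:
            continue
        average_diff = sum(diffs) / len(diffs)
        # Each per-item estimate is weighted by the number of co-rating users.
        numerator += (user_rating + average_diff) * len(diffs)
        denominator += len(diffs)
    if denominator == 0:
        raise ValueError("no overlapping ratings to base a prediction on")
    return numerator / denominator


lucy_a = slope_one_predict(ratings, "Lucy", "A")
print(round(lucy_a, 2))
```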