Machine learning 101 sit hvr

Machine Learning 101
Fred Verheul

What we won’t cover…
• Deep learning / Neural Networks
• Specifics of ML algorithms
• Tools / Libraries / Code
• SAP Products, like HANA / Predictive Analytics / Vora / …
• Ethics, algorithmic transparency & fairness
• Hardware

Examples: Recommender systems

Examples, continued…
• Spam filtering
• Handwriting recognition

ML in the news: DeepMind’s AlphaGo

Machine Learning
“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

What is Machine Learning?
• Traditional Programming: Data + Program -> Computer -> Output
• Machine Learning: Data + Output -> Computer -> Program
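A minimal sketch of that contrast, in Python. Everything here is made up for illustration: the feature (number of exclamation marks in a message), the example data and the function names are assumptions, not something taken from the slides. In the first function a person writes the rule; in the second, data plus the desired outputs go in and the rule comes out.

```python
# Traditional programming: a person writes the rule (the "program") by hand.
def handwritten_rule(exclamation_marks):
    return exclamation_marks >= 3  # threshold picked by the programmer


# Machine learning: examples plus desired outputs go in, the rule comes out.
def learn_threshold(examples, labels):
    """Return the threshold that misclassifies the fewest training examples."""
    best_threshold, fewest_errors = None, None
    for candidate in sorted(set(examples)):
        errors = sum((x >= candidate) != y for x, y in zip(examples, labels))
        if fewest_errors is None or errors < fewest_errors:
            best_threshold, fewest_errors = candidate, errors
    return best_threshold


# Hypothetical data: exclamation marks per message, and whether it was spam.
counts = [0, 1, 1, 2, 5, 6, 7, 9]
is_spam = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(counts, is_spam)


def learned_rule(exclamation_marks):
    return exclamation_marks >= threshold
```

Both rules have the same shape; the difference is who chose the threshold. With other training data, `learn_threshold` would return a different rule without any code change.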

Sweet spot for Machine Learning
• It’s impossible to write down the rules in code:
  • Too many rules
  • Too many factors influencing the rules
  • Too finely tuned
  • We just don’t know the rules (image recognition)
• Lots of labeled data (examples) available (e.g. historical data)

Basic Machine Learning ‘workflow’
• Training Phase: Training data -> Feature Vectors; Feature Vectors + Labels -> Machine Learning Algorithm -> Predictive Model
• Operational Phase: New data -> Feature Vectors -> Predictive Model -> Prediction
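The two phases of this workflow can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the presenter's method: the feature extraction (`to_feature_vector`), the example messages and the choice of a nearest-centroid classifier as the "machine learning algorithm" are all hypothetical.

```python
import math
from collections import defaultdict


def to_feature_vector(message):
    """Hypothetical feature extraction: [number of '!' characters, number of words]."""
    return [float(message.count("!")), float(len(message.split()))]


def train(feature_vectors, labels):
    """Training phase: the model is the mean feature vector (centroid) per label."""
    grouped = defaultdict(list)
    for vector, label in zip(feature_vectors, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(model, feature_vector):
    """Operational phase: return the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], feature_vector))


# Training phase: training data + labels -> feature vectors -> model.
training_data = [
    "WIN money now!!!",
    "Free prize!!! Click!!!",
    "Meeting moved to 3pm",
    "Lunch tomorrow with the team?",
]
labels = ["spam", "spam", "ham", "ham"]
model = train([to_feature_vector(m) for m in training_data], labels)

# Operational phase: new data -> feature vector -> model -> prediction.
prediction = predict(model, to_feature_vector("Claim your prize now!!!"))
```

The same `to_feature_vector` function is used in both phases: new data has to be turned into feature vectors in exactly the way the training data was.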

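Training Phase in more detail
• Raw data -> Data preparation -> Feature Vectors
  • Data preparation: data cleansing, data transformation, normalization, feature extraction
• Feature Vectors are split into Training Data and Test data
• Training Data -> Model Building (by ML algorithm), aka ‘learning’ -> Predictive Model
• Predictive Model + Test data -> Model Evaluation
• Feedback loop: from Model Evaluation back to the earlier steps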
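A rough sketch of these training-phase steps in plain Python, assuming made-up raw data (two numeric columns per message) and a 1-nearest-neighbour stand-in for the model-building step. All function names and values are hypothetical. The sketch follows the slide's order and normalizes before splitting; in practice the scaling parameters are often computed from the training data only.

```python
import math
import random


def min_max_normalize(rows):
    """Data preparation step: rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(vectors, labels, test_fraction=0.25, seed=0):
    """Shuffle the examples and hold out a fraction of them as test data."""
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([vectors[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([vectors[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def build_model(train_x, train_y):
    """Model building stand-in: 1-nearest-neighbour over the training data."""
    def model(x):
        nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
        return train_y[nearest]
    return model


def evaluate(model, test_x, test_y):
    """Model evaluation: fraction of test examples predicted correctly."""
    return sum(model(x) == y for x, y in zip(test_x, test_y)) / len(test_x)


# Hypothetical raw data: [exclamation marks, message length in characters].
raw_data = [[0, 120], [1, 200], [0, 90], [1, 150], [8, 30], [9, 45], [7, 25], [10, 40]]
labels = ["ham"] * 4 + ["spam"] * 4

feature_vectors = min_max_normalize(raw_data)
(train_x, train_y), (test_x, test_y) = train_test_split(feature_vectors, labels)
model = build_model(train_x, train_y)
test_accuracy = evaluate(model, test_x, test_y)
```

The feedback loop is what happens next: if `test_accuracy` is disappointing, one goes back to data preparation or model building and tries again.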

CRISP-DM: data mining process
(Diagram of the CRISP-DM cycle; two of its phases are annotated “ML important”.)

Examples of ML tasks
Supervised learning
• Regression: target is numeric
• Classification: target is categorical
Unsupervised learning
• Clustering
• Dimensionality reduction

Modeling: so many algorithms…

ML Algorithms: by Representation
Collection of candidate models/programs, aka hypothesis space
Decision trees
Instance-based
Neural networks
Model ensembles

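ML Algorithms: by Evaluation
Evaluation: Quality measure for a model
Regression
Example metric: Root Mean Squared Error
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
where y_i is the actual target value, \hat{y}_i the predicted value, and n the number of examples.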
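As a small worked example, Root Mean Squared Error can be computed in a few lines of Python. The function name and the numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical target values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

# Squared errors are 1, 0, 1 and 4, so the result is sqrt(6 / 4) = sqrt(1.5).
error = rmse(actual, predicted)
```

Because the errors are squared before averaging, one large miss (the last prediction here) weighs more than several small ones.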
Binary classification: confusion matrix
Example: medical test for a disease
                 Predicted Positive         Predicted Negative
True class P     True positives (TP): 8     False negatives (FN): 2
True class N     False positives (FP): 19   True negatives (TN): 971
Accuracy: (TP + TN) / total = (8 + 971) / 1000 -> 97.9%
Better evaluation metrics:
• Precision: TP / (TP + FP) = 8 / (8 + 19)
• Recall: TP / (TP + FN) = 8 / (8 + 2)
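The arithmetic for this medical-test example can be checked with a short Python sketch. The counts (8, 2, 19, 971) are the ones used in the slide's formulas; the function name is an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # of all predicted positives, how many are real
        "recall": tp / (tp + fn),     # of all real positives, how many were found
    }


metrics = classification_metrics(tp=8, fn=2, fp=19, tn=971)
# accuracy is 979 / 1000, precision is 8 / 27 (about 0.30), recall is 8 / 10
```

This shows why accuracy alone can mislead on imbalanced data: only 10 of the 1000 cases are truly positive, so a test that always answers "negative" would reach 99% accuracy while its recall would be 0.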

ML Algorithms: by Optimization
Optimization: how the algorithm ‘learns’; depends on representation and evaluation
• Greedy Search, an example of combinatorial optimization
• Gradient Descent (or in general: Convex Optimization)
• Linear Programming (or in general: Constrained/Nonlinear Optimization)
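To make one of these concrete, here is a minimal sketch of gradient descent in Python. It assumes a linear model y = w * x + b as the representation and mean squared error as the evaluation; the data, learning rate and step count are made up for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2 / n * sum(errors)                             # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
```

The algorithm never sees the rule "y = 2x + 1"; it only sees how wrong the current `w` and `b` are and nudges them in the direction that reduces the error. Too large a learning rate would make the updates overshoot and diverge.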

Training error vs test error
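The figure for this slide is not preserved in the text, so the following Python sketch only illustrates the general idea and is not a reconstruction of it. It assumes made-up one-dimensional data in which two training labels are noise, and compares a very flexible model (1-nearest-neighbour, which memorizes the training data) with a deliberately simple, hand-set threshold.

```python
def error_rate(predict, xs, ys):
    """Fraction of examples for which the prediction differs from the label."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data. The underlying rule is "label 1 if x >= 5",
# but the training labels at x = 3.0 and x = 8.0 are noise.
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]
test_x = [1.5, 2.9, 3.2, 4.5, 5.5, 7.8, 8.3, 9.5]
test_y = [0, 0, 0, 0, 1, 1, 1, 1]


def memorizer(x):
    """Flexible model: copy the label of the closest training example."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def simple_threshold(x):
    """Simple model: a single threshold at 5 (set by hand for this illustration)."""
    return int(x >= 5.0)


memorizer_errors = (
    error_rate(memorizer, train_x, train_y),
    error_rate(memorizer, test_x, test_y),
)
threshold_errors = (
    error_rate(simple_threshold, train_x, train_y),
    error_rate(simple_threshold, test_x, test_y),
)
```

The memorizer has zero training error because it reproduces the noise, and pays for that on the test data. The simple threshold has a higher training error but does better on data it has not seen, which is the error that matters.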

Data Science for Business
• Focuses more on general principles than specific algorithms
• Not math-heavy, does contain some math
• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do
• Book website: http://data-science-for-biz.com/DSB/Home.html

Take-aways
• Goal of ML: generalize from training data (not optimization!!)
• Part of ‘Data Mining Process’, not a goal in and of itself
• No magic! Just some clever algorithms…
• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency

Thank You
www.soapeople.com
info@soapeople.com
@SOAPEOPLE
Fred Verheul
Big Data Consultant
+31 6 3919 2986
fred.verheul@soapeople.com
@fredverheul
