Introduction to Data Science
Frank Kienle
Machine Learning
Machine Learning vs. Statistical Modeling

Artificial intelligence is …
the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".

Machine Learning is …
an algorithm that can learn from data without relying on rules-based programming.

Statistical Modeling is …
the formalization of relationships between variables in the form of mathematical equations.
Machine Learning

A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.

Learning = improving with experience at some task
•  Improve over task T
•  With respect to performance measure P
•  Based on experience E

Example, spam filtering: spam is all email the user does not want to receive and has not asked to receive.
•  T: identify spam emails
•  P: % of spam emails that were filtered out - % of ham (non-spam) emails that were incorrectly filtered out
•  E: a database of emails that were labelled by users
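As a minimal sketch of the performance measure P above (the label convention 1 = spam, 0 = ham and the toy data are assumptions for illustration, not from the slides):

```python
def spam_filter_performance(y_true, y_pred):
    """P = % of spam emails filtered out minus % of ham emails wrongly filtered out.

    y_true, y_pred: sequences of labels, 1 = spam, 0 = ham.
    """
    spam_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    ham_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    spam_caught = sum(spam_preds) / len(spam_preds)   # fraction of spam filtered
    ham_filtered = sum(ham_preds) / len(ham_preds)    # fraction of ham wrongly filtered
    return spam_caught - ham_filtered

# E: a small labelled "database" of emails (toy data)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(spam_filter_performance(y_true, y_pred))  # 2/3 - 1/3 ≈ 0.33
```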
Examples of Machine Learning

Optical character recognition:
•  categorize images of handwritten characters by the letters represented
Face detection:
•  find faces in images (or indicate if a face is present)
Customer segmentation:
•  predict, for instance, which customers will respond to a particular promotion
Fraud detection:
•  identify credit card transactions (for instance) which may be fraudulent in nature
Demand prediction:
•  predict demand for individual products
Batch Processing vs Stream Processing

Batch processing:
Most machine learning algorithms assume that we are mining a database; that is, all our data is available when and if we want it.

Stream processing (e.g. for machinery sensors):
Data arrives in one or more streams, and if it is not processed immediately or stored, it is lost forever.

Both can be embedded in fault-tolerant architectures: see, for example, the Lambda architecture (http://lambda-architecture.net) or the Kappa architecture (kappa-architecture.com) for further discussion (covered in a separate lecture).
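A minimal sketch of the difference (the running-mean task and sensor values are hypothetical): a batch job sees the whole dataset at once, while a stream processor must update a small state one record at a time, without storing the records:

```python
# Batch: all data is available when we want it.
data = [3.1, 2.7, 5.0, 4.2]
batch_mean = sum(data) / len(data)

# Stream: records arrive one by one; keep only a small running state.
count, total = 0, 0.0

def on_sensor_reading(value):
    global count, total
    count += 1
    total += value
    return total / count  # running mean so far

for reading in data:              # stand-in for a live sensor feed
    stream_mean = on_sensor_reading(reading)

assert abs(batch_mean - stream_mean) < 1e-9
```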
Machine Learning Overview

•  Supervised learning: Regression, Classification
•  Unsupervised learning: Clustering, Dimension Reduction

What is the difference between supervised and unsupervised learning?
What is the difference between a regression problem and a classification problem?
Machine Learning Algorithms (small excerpt)

Unsupervised
•  Clustering & Dimensionality Reduction
   •  SVD
   •  PCA
   •  K-means
•  Association Analysis
   •  Apriori
   •  FP-Growth
•  Hidden Markov Model

Supervised
•  Regression (continuous target)
   •  Linear
   •  Polynomial
   •  Decision Trees
   •  Random Forests
•  Classification (categorical target)
   •  KNN
   •  Trees
   •  Logistic Regression
   •  Naïve Bayes
   •  SVM
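As a small illustration of the two branches (the toy data is an assumption, not from the slides), scikit-learn exposes algorithms from both columns behind the same fit/predict interface:

```python
import numpy as np
from sklearn.cluster import KMeans                    # unsupervised
from sklearn.linear_model import LogisticRegression   # supervised

X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1]])
y = np.array([0, 0, 1, 1])   # labels, used only by the supervised model

# Unsupervised: K-means groups the rows without ever seeing y.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Supervised: logistic regression learns the mapping X -> y.
clf = LogisticRegression().fit(X, y)
print(clusters, clf.predict([[7.9, 9.0]]))
```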
Machine Learning

It is all about the assumptions of the underlying model.
Input to output example

input: x
output: y

What is the best relation (function) between x and y that can be used to map new examples of x to an inferred output y?
Input to output example

[Diagram: input x → model hypothesis h(x) → output y]

By making an initial hypothesis about the model structure h(x), we can infer the model parameters w.

The process of inferring the model parameters is denoted as learning in the following.
Input to output example

[Diagram: input x → model hypothesis h(x) → output ŷ]

By applying the model to a new input variable we obtain a new estimate ŷ.

The process of inferring the model parameters is denoted as learning in the following. Applying the learned model to new input data leads to an inferred result ŷ; this process is denoted as prediction. The terms inference and prediction are used as synonyms in the following.
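A minimal sketch of learning and prediction for a linear hypothesis h(x) = w0 + w1·x (the training pairs are toy data assumed for illustration):

```python
import numpy as np

# Training data: paired examples (x, y).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Learning: infer the parameters w of the hypothesis h(x) = w0 + w1*x
# by least squares over the training examples.
A = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction: apply the learned model to a new input.
x_new = 5.0
y_hat = w[0] + w[1] * x_new
print(w, y_hat)
```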
Supervised learning

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples.

How can we derive the 'best' model parameters? Choose the model parameters so that all training samples x result in h(x) close to y; y supervises the learning process.
Supervised learning: cost function and MSE

The mean squared error (MSE) is the average of the squares of the errors or deviations.

Cost function:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

Finding the parameters w which minimize this cost function results in the estimator with the smallest possible MSE.
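The cost function translates directly into code (the observed targets and estimates here are toy values assumed for illustration):

```python
import numpy as np

y = np.array([3.0, 5.0, 5.0, 100.0])      # observed targets
y_hat = np.array([4.0, 6.0, 7.0, 90.0])   # model estimates

mse = np.mean((y_hat - y) ** 2)            # (1/n) * sum of squared errors
print(mse)
```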
Typical regression scenario with more input variables

x0    x1 (Rpm)    x2 (Gas)    x3 (Valve)    x4 (Temp)    y (Watt)
1     500         5.8         5             200          3
1     900         4.5         9             400          5
1     2500        13          15            400          5
1     3000        95          90            400          100

$$X = \begin{pmatrix} 1 & 500 & 5.8 & 5 & 200 \\ 1 & 900 & 4.5 & 9 & 400 \\ 1 & 2500 & 13 & 15 & 400 \\ 1 & 3000 & 95 & 90 & 400 \end{pmatrix}, \qquad y = \begin{pmatrix} 3 \\ 5 \\ 5 \\ 100 \end{pmatrix}$$
Typical classification scenario with more input variables

x0    x1 (Rpm)    x2 (Gas)    x3 (Valve)    x4 (Temp)    x5 (Watt)    y (Status)
1     500         5.8         5             200          3            0
1     900         4.5         9             400          5            0
1     2500        13          15            400          5            0
1     3000        95          90            400          100          1

$$X = \begin{pmatrix} 1 & 500 & 5.8 & 5 & 200 & 3 \\ 1 & 900 & 4.5 & 9 & 400 & 5 \\ 1 & 2500 & 13 & 15 & 400 & 5 \\ 1 & 3000 & 95 & 90 & 400 & 100 \end{pmatrix}$$
Supervised Learning: terminology

m: training samples (rows)
n: features (columns)
X: design matrix, feature matrix
y: target vector (sometimes denoted t)
$$\lVert y - h(X, w) \rVert^2 = \left\lVert \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} - \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \right\rVert^2 = \sum_{i=1}^{m} \Bigl( y_i - \sum_{j=1}^{n} x_{i,j} w_j \Bigr)^2 .$$
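A minimal sketch tying this together, using the design matrix and target vector from the regression table above (np.linalg.lstsq is one of several ways to minimize this cost; with only 4 samples it returns the minimum-norm solution):

```python
import numpy as np

# Design matrix X (m = 4 samples, n = 5 features incl. the constant column)
# and target vector y, taken from the regression table above.
X = np.array([
    [1, 500, 5.8, 5, 200],
    [1, 900, 4.5, 9, 400],
    [1, 2500, 13, 15, 400],
    [1, 3000, 95, 90, 400],
])
y = np.array([3, 5, 5, 100])

# Learning: choose w to minimize ||y - X w||^2.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Prediction on a new measurement (hypothetical sensor reading).
x_new = np.array([1, 1200, 6.0, 10, 400])
print(w, x_new @ w)
```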
Machine Learning

A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.

Learning = improving with experience at some task
•  Improve over task T → model the target
•  With respect to performance measure P → define the cost function
•  Based on experience E → by using historic data
Machine Learning (technical steps)

Training phase:
•  Data
•  Pre-processing: prepare cleaned/correct information and provide the correct data format
•  Learning: develop a new, or decide on an appropriate, mathematical model
•  Validation: control the quality and correctness of the model
•  Output: the (trained) model

Prediction phase:
•  Apply the (trained) model to new data to obtain a prediction
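Since the following slides reference scikit-learn, here is a hedged sketch of these steps with its API (the synthetic data, the StandardScaler pre-processing, and the linear model are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler    # pre-processing
from sklearn.linear_model import LinearRegression   # learning

# Synthetic training data (stand-in for the prepared, cleaned dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Training phase: pre-processing + learning, validated on held-out data.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)                          # learning
print("validation R^2:", model.score(X_val, y_val))  # validation

# Prediction phase: apply the trained model to new data.
X_new = rng.normal(size=(2, 3))
print(model.predict(X_new))
```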
[Figures omitted; source: scikit-learn]