Basics of
Machine Learning
Dmitry Ryabokon, github.com/dryabokon

Content
1.1 Approaches for ML
1.2 Bayesian approach
1.3 Classifying the independent binary features
2.1 Naive Bayesian classifier
2.2 Gaussian classifier
2.3 Linear discrimination
2.4 Logistic regression
2.5 Ensemble learning
2.6 Decision Tree
2.7 Random Forest
2.8 Adaboost
3.1 ROC-Curve
3.2 Precision, recall, Accuracy, F1 score
3.3 Benchmarking the Classifiers
4.1 Unsupervised learning
4.2 Non-Bayesian approach
4.3 Time Series
5.1 Bias vs Variance
5.2 Regularization
5.3 Multi-collinearity
5.4 Variable importance chart
5.5 Variance inflation factor VIF
5.6 Reducing data dimensions
5.7 Imbalanced data set
5.8 Stratified sampling
5.9 Correlation: continuous & categorical
5.10 Missing values: the approach
5.11 Back propagation
Approaches for ML
Maximum Likelihood

Input: features $x_1, x_2, \dots, x_N$ and states $k_1, k_2, \dots, k_N$, where $k \in \{1, 2, \dots, K\}$ is the domain for $k$.
The model $p(x \mid k) = f(x, \theta_k)$ is parametrized by $\theta_1, \theta_2, \dots, \theta_K$.
Output: the maximum-likelihood parameters
$(\theta_1^*, \dots, \theta_K^*) = \arg\max_{\theta_1, \dots, \theta_K} \prod_{i=1}^{N} p(x_i \mid k_i, \theta_1, \dots, \theta_K)$
Approaches for ML
Maximum Likelihood

$\arg\max_{\theta_1,\dots,\theta_K} \prod_{i=1}^{N} p(x_i, k_i \mid \theta_1,\dots,\theta_K) = \arg\max_{\theta_1,\dots,\theta_K} \prod_{i=1}^{N} p(k_i)\, p(x_i \mid k_i, \theta_1,\dots,\theta_K) =$
(the priors $p(k_i)$ do not depend on $\theta$, so they drop out of the $\arg\max$)
$= \arg\max_{\theta_1,\dots,\theta_K} \prod_{i=1}^{N} p(x_i \mid k_i, \theta_1,\dots,\theta_K) = \arg\max_{\theta_1,\dots,\theta_K} \left[\prod_{x \in X_1} p(x \mid 1, \theta_1)\right] \cdot \left[\prod_{x \in X_2} p(x \mid 2, \theta_2)\right] \cdots \left[\prod_{x \in X_K} p(x \mid K, \theta_K)\right]$

Each class can therefore be fitted independently: $\theta_k^* = \arg\max_{\theta} \prod_{x \in X_k} f(x, \theta)$
Approaches for ML
Maximum Likelihood: example

Observations: class 1 produced $x = 46, 46, 45$; class 2 produced $x = 37, 37, 36, 37$.
Models: $p(x \mid k=1) = f(x, \theta_1) \sim \exp\left(-(x - \theta_1)^2\right)$, $\quad p(x \mid k=2) = f(x, \theta_2) \sim \exp\left(-(x - \theta_2)^2\right)$
$(\theta_1^*, \theta_2^*) = \arg\max_{\theta_1, \theta_2} \prod_{i=1}^{7} p(x_i \mid k_i, \theta_1, \theta_2)$
$\theta_1^* = \arg\max_{\theta_1} \exp\left(-(46-\theta_1)^2 - (46-\theta_1)^2 - (45-\theta_1)^2\right)$
Approaches for ML
Maximum Likelihood: example

$\theta_1^* = \arg\max_{\theta_1} \exp\left(-(46-\theta_1)^2 - (46-\theta_1)^2 - (45-\theta_1)^2\right) = \arg\min_{\theta_1} \left[(46-\theta_1)^2 + (46-\theta_1)^2 + (45-\theta_1)^2\right]$
$\theta_1^* = \dfrac{46 + 46 + 45}{3} = \dfrac{1}{|X_1|}\sum_{x \in X_1} x$
The maximum-likelihood estimate here is simply the sample mean of the class.
Approaches for ML
Bias in training data

Input: features $x_1, \dots, x_N$ and states $k_1, \dots, k_N$, with $k \in \{1, \dots, K\}$ the domain for $k$; the model $p(x \mid k) = f(x, \theta_k)$ is parametrized by $\theta_1, \dots, \theta_K$.
Output: instead of the likelihood product, maximize the worst-case likelihood over the class:
$\theta_1^* = \arg\max_{\theta} \min_{x \in X_1} p(x \mid 1, \theta)$
Approaches for ML
Bias in training data: example

$\theta_1^* = \arg\max_{\theta} \min_{x \in X_1} p(x \mid 1, \theta)$
Unlike the maximum-likelihood product, this maximin criterion does not depend on how many times each region of the feature space is represented in the training sample: for a Gaussian-shaped $f$, $\theta_1^*$ is the center of the circle that holds all the points from $X_1$.
Approaches for ML
ERM: Empirical risk minimization

features $x_1, \dots, x_N$ and states $k_1, \dots, k_N$, with $k \in \{1, 2, \dots, K\}$ the domain for $k$
$W(k, k')$ - penalty function
decision strategy $q(x, \alpha): X \to K$ is parametrized by $\alpha$
Approaches for ML
ERM: Empirical risk minimization

$Risk(q) = \sum_{x \in X} \sum_{k \in K} p(x, k) \cdot W(k, q(x, \alpha)) \approx \sum_{i=1}^{N} p(x_i, k_i) \cdot W(k_i, q(x_i, \alpha)) \sim \sum_{i=1}^{N} W(k_i, q(x_i, \alpha))$
$\alpha^* = \arg\min_{\alpha} Risk(q)$, where the decision strategy $q(x, \alpha): X \to K$ is parametrized by $\alpha$
Approaches for ML
ERM: Empirical risk minimization

$W(k, k') = \begin{cases} 0 & k = k' \\ 1 & k \neq k' \end{cases}$ $\;\Rightarrow\;$ $Risk(q, \alpha)$ = number of errors
$q(\bar{x}, \bar{\alpha}, \theta) = \begin{cases} -1 & \langle \bar{x}, \bar{\alpha} \rangle < \theta \\ +1 & \langle \bar{x}, \bar{\alpha} \rangle > \theta \end{cases}$
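A minimal sketch of this setup (NumPy assumed; the toy dataset is hypothetical): with the 0/1 penalty, the empirical risk of the linear threshold strategy is simply its error count on the sample.

import numpy as np

def q(x, alpha, theta):
    # linear threshold strategy: -1 if <x, alpha> < theta, else +1
    return np.where(x @ alpha < theta, -1, +1)

def empirical_risk(x, k, alpha, theta):
    # 0/1 penalty: Risk = number of errors on the training sample
    return int(np.sum(q(x, alpha, theta) != k))

x = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 2.5]])  # hypothetical features
k = np.array([-1, -1, +1, +1])                                  # hypothetical states
print(empirical_risk(x, k, alpha=np.array([1.0, 1.0]), theta=3.0))  # 0 errors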
Bayesian approach
Definitions

$x \in X$ - feature
$k \in K$ - state (hidden)
$p(x, k)$ - joint probability
$W: K \times K \to \mathbb{R}$ - penalty function
$q: X \to K$ - decision strategy
$Risk(q) = \sum_{k \in K} \sum_{x \in X} p(x, k) \cdot W(k, q(x))$
Bayesian approach
The problem

Input: $x \in X$ - feature; $k$ - state (hidden); $p(x, k)$ - joint probability; $W: K \times K \to \mathbb{R}$ - penalty function.
Output: the decision strategy $q: X \to K$ minimizing the risk:
$q^*(x) = \arg\min_{q(x)} Risk(q(x)) = \arg\min_{q(x)} \sum_{k \in K} \sum_{x \in X} p(x, k) \cdot W(k, q(x))$
Bayesian approach
Some basics

Bayes' theorem: $p(k \mid x) = \dfrac{p(k)\, p(x \mid k)}{\sum_{k'} p(k')\, p(x \mid k')}$
Law of total probability: $p(x) = \sum_{k} p(k)\, p(x \mid k)$
Joint probability: $p(x, k) = p(k)\, p(x \mid k) = p(x)\, p(k \mid x)$
$p(k)$ - prior probability
$p(k \mid x)$ - posterior probability
$p(x \mid k)$ - conditional probability of x, assuming k
$p(x)$ - probability of x
Bayesian approach
Example

$X = \{8, 9, 10, 11, 12\}$ - time of day; $K$ = {1 - revision (ticket inspection), 2 - free ride}
$p(k=1 \mid x=8) = 10\%$, $\quad p(k=2 \mid x=8) = 90\%$
Penalties $W(k', k)$ for decision $k'$ and true state $k$: $W(1, 1) = 5$, $W(1, 2) = 5$, $W(2, 1) = 100$, $W(2, 2) = 0$
At $x = 8$ the expected penalty of riding free is $0.10 \cdot 100 + 0.90 \cdot 0 = 10$, while buying a ticket always costs 5, so the Bayesian decision is to buy the ticket.
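A minimal sketch of this decision (plain Python), using the numbers above; the dictionary keys are the two possible decisions:

posterior = {1: 0.10, 2: 0.90}      # p(k | x=8): 1 = inspection, 2 = no inspection
W = {(1, 1): 5, (1, 2): 5,          # W[(decision, state)]; decision 1 = buy a ticket
     (2, 1): 100, (2, 2): 0}        # decision 2 = ride free

# expected penalty of each decision: sum_k p(k|x) * W(decision, k)
risks = {d: sum(posterior[k] * W[(d, k)] for k in posterior) for d in (1, 2)}
print(risks, min(risks, key=risks.get))  # {1: 5.0, 2: 10.0} -> decision 1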
Bayesian approach
Flexibility to not make a decision

$W(k, k') = \begin{cases} 0 & k = k' \\ 1 & k \neq k' \\ \varepsilon & k' = \mathrm{reject} \end{cases}$
$Risk(\mathrm{reject}) = \sum_{k \in K} p(k \mid x) \cdot W(k, \mathrm{reject}) = \sum_{k \in K} p(k \mid x) \cdot \varepsilon = \varepsilon$
Bayesian approach
Flexibility to not make a decision

if $\min_{k} Risk(k) < \varepsilon$:
    $k^* = \arg\min_{k} \sum_{k'} p(k' \mid x) \cdot W(k', k)$
else: reject
Bayesian approach
Decision strategies for different penalty functions

$W(k, k') = |k - k'|$ → median
$W(k, k') = (k - k')^2$ → average
$W(k, k') = \begin{cases} 0 & k = k' \\ 1 & k \neq k' \end{cases}$ → the most probable sample
$W(k, k') = \begin{cases} 0 & |k - k'| < \Delta \\ 1 & |k - k'| \geq \Delta \end{cases}$ → center of the most probable section $\Delta$
Classifying the
independent
binary features
Classifying the independent binary features
The problem

$x = (x_1, x_2, \dots, x_N)$ - feature: a set of independent binary values
$p(x \mid k) = p(x_1 \mid k) \cdot p(x_2 \mid k) \cdots p(x_N \mid k)$
$k \in \{1, 2\}$ - few states are possible
Decision strategy: the likelihood ratio test $\dfrac{p(x \mid 1)}{p(x \mid 2)} \gtrless \theta$ decides between $k = 1$ and $k = 2$
Classifying the independent binary features
The problem

$\log \dfrac{p(x \mid 1)}{p(x \mid 2)} = \sum_{n=1}^{N} \log \dfrac{p(x_n \mid 1)}{p(x_n \mid 2)}$
For a single binary component $x_n$:
$\log \dfrac{p(x_n \mid 1)}{p(x_n \mid 2)} = x_n \cdot \log \dfrac{p(x_n = 1 \mid 1)\, p(x_n = 0 \mid 2)}{p(x_n = 1 \mid 2)\, p(x_n = 0 \mid 1)} + \log \dfrac{p(x_n = 0 \mid 1)}{p(x_n = 0 \mid 2)}$
(substituting $x_n = 1$ or $x_n = 0$ recovers the two cases of the ratio)
Classifying the independent binary features
Decision rule

$\log \dfrac{p(x \mid 1)}{p(x \mid 2)} = \sum_{i=1}^{N} x_i \cdot \log \dfrac{p(x_i = 1 \mid 1)\, p(x_i = 0 \mid 2)}{p(x_i = 1 \mid 2)\, p(x_i = 0 \mid 1)} + \sum_{i=1}^{N} \log \dfrac{p(x_i = 0 \mid 1)}{p(x_i = 0 \mid 2)}$
So the decision rule is linear in the features: $\sum_{i=1}^{N} a_i x_i \gtrless \theta$ decides between $k = 1$ and $k = 2$
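A minimal sketch of this linear rule (NumPy assumed), using the class-conditional probabilities of the example that follows; the weights $a_i$ and the constant offset come straight from the formula above.

import numpy as np

p1 = np.array([0.25, 0.25])  # p(x_i = 1 | k = 1)
p2 = np.array([0.75, 0.50])  # p(x_i = 1 | k = 2)

# coefficients of the linear decision rule from the log-likelihood ratio
a = np.log(p1 * (1 - p2) / (p2 * (1 - p1)))
offset = np.sum(np.log((1 - p1) / (1 - p2)))

def classify(x, theta=0.0):
    # decide k = 1 when log p(x|1)/p(x|2) = <a, x> + offset exceeds theta
    return 1 if a @ x + offset > theta else 2

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, classify(np.array(x)))  # k = 1, 1, 2, 2 - matching the example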
Classifying the independent binary features
Example

Two binary features with class-conditional probabilities:

            p(x1=1|k)  p(x1=0|k)  p(x2=1|k)  p(x2=0|k)
k = 1:        1/4        3/4        1/4        3/4
k = 2:        3/4        1/4        2/4        2/4

Comparing $p(x \mid 1)$ with $p(x \mid 2)$ at each corner of the feature space (threshold $\theta = 1$):
(x1=1, x2=1): (1/4)·(1/4) < (3/4)·(2/4) → k = 2
(x1=0, x2=1): (3/4)·(1/4) > (1/4)·(2/4) → k = 1
(x1=1, x2=0): (1/4)·(3/4) < (3/4)·(2/4) → k = 2
(x1=0, x2=0): (3/4)·(3/4) > (1/4)·(2/4) → k = 1
Classifying the independent binary features
Another example

            p(x1=1|k)  p(x1=0|k)  p(x2=1|k)  p(x2=0|k)
k = 1:        5/9        4/9        5/9        4/9
k = 2:        4/8        4/8        4/8        4/8

E.g. at (x1=0, x2=0): (4/9)·(4/9) < (4/8)·(4/8) → k = 2; the remaining corners are decided by the same product comparison.
Naive Bayesian
classifier
Naive Bayesian classifier
The problem

$x = (x_1, x_2, \dots, x_N)$ - feature
$p(x \mid k) = p(x_1 \mid k) \cdot p(x_2 \mid k) \cdots p(x_N \mid k)$ - components are assumed independent within each class
$k \in \{1, 2\}$ - few states are possible
Decision strategy: $\dfrac{p(x \mid 1)}{p(x \mid 2)} \gtrless \theta$ decides between $k = 1$ and $k = 2$
Naive Bayesian classifier
Example

Two features taking values in {0, 1, 2}; class 1 has 6 training points, class 2 has 9. Estimated per-feature probabilities:

             k = 1    k = 2
p(x1=0|k):    1/6      4/9
p(x1=1|k):    4/6      2/9
p(x1=2|k):    1/6      3/9
p(x2=0|k):    1/6      3/9
p(x2=1|k):    4/6      3/9
p(x2=2|k):    1/6      3/9

Each cell of the 3x3 feature grid is then classified by comparing $p(x_1 \mid 1)\, p(x_2 \mid 1)$ with $p(x_1 \mid 2)\, p(x_2 \mid 2)$.
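A minimal sketch of this classifier (NumPy assumed) with the table above; the same likelihood-ratio scores reappear later in the ROC-curve example.

import numpy as np

# p[k][i][v] = p(x_i = v | k), taken from the table above
p = {1: np.array([[1/6, 4/6, 1/6],    # feature x1, values 0/1/2
                  [1/6, 4/6, 1/6]]),  # feature x2, values 0/1/2
     2: np.array([[4/9, 2/9, 3/9],
                  [3/9, 3/9, 3/9]])}

def likelihood_ratio(x):
    # naive Bayes score p(x|1)/p(x|2), features treated as independent
    num = np.prod([p[1][i][v] for i, v in enumerate(x)])
    den = np.prod([p[2][i][v] for i, v in enumerate(x)])
    return num / den

for x in [(0, 0), (1, 1), (2, 1)]:
    print(x, round(likelihood_ratio(x), 2))  # 0.19, 6.0, 1.0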
Gaussian classifier
Gaussian classifier
Definitions

$x = (x_1, x_2, \dots, x_N)$ - feature
$p(x \mid k) \sim \exp\left(-0.5 \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{[k]} (x_i - \mu_i^{[k]})(x_j - \mu_j^{[k]})\right)$
$\mu_i^{[k]} = E(x_i \mid k)$ - class means
$b_{ij}^{[k]} = E\left((x_i - \mu_i^{[k]})(x_j - \mu_j^{[k]})\right)$ - class covariance matrix $B^{[k]}$
$A^{[k]} = \left(B^{[k]}\right)^{-1}$ - the coefficient matrix is the inverse covariance
Gaussian classifier
Example

$p(x) \sim \exp\left(-0.5 \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} (x_i - \mu_i)(x_j - \mu_j)\right)$
[Figure: a 2D point cloud on axes $x_1, x_2 \in [2, 5]$ used to estimate the parameters below.]
Gaussian classifier
Example

$\mu_1 = E(x_1) = 3$, $\quad \mu_2 = E(x_2) = 3.5$
$b_{11} = E\left((x_1 - \mu_1)^2\right) = (1 + 1 + 1 + 0 + 0 + 0 + 1 + 4)/8 = 1$
$b_{22} = E\left((x_2 - \mu_2)^2\right) = 3/4$
$b_{12} = b_{21} = E\left((x_1 - \mu_1)(x_2 - \mu_2)\right) = 1/8$
$A = B^{-1} = \begin{pmatrix} 1 & 1/8 \\ 1/8 & 3/4 \end{pmatrix}^{-1} = \dfrac{64}{47} \begin{pmatrix} 3/4 & -1/8 \\ -1/8 & 1 \end{pmatrix}$
$p(x) \sim \exp\left(-\dfrac{1}{2} \cdot \dfrac{64}{47} \left[\dfrac{3}{4}(x_1 - 3)^2 - \dfrac{1}{4}(x_1 - 3)(x_2 - 3.5) + (x_2 - 3.5)^2\right]\right)$
Gaussian classifier
Decision strategy

$\log \dfrac{p(x \mid 1)}{p(x \mid 2)} = \dfrac{1}{2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{[2]} (x_i - \mu_i^{[2]})(x_j - \mu_j^{[2]}) - \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{[1]} (x_i - \mu_i^{[1]})(x_j - \mu_j^{[1]}) \right] \gtrless \theta$
Expanding the quadratic forms gives a second-order decision surface: $\sum_{i} \sum_{j} c_{ij} x_i x_j + \sum_{i} c_i x_i + c \gtrless \theta$
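A minimal sketch (NumPy assumed, hypothetical data): fit a mean and covariance per class and compare the two quadratic forms, as in the decision strategy above; normalization constants are dropped, matching the slide's simplified rule.

import numpy as np

def fit_gaussian(X):
    # per-class parameters: mean vector and inverse covariance A = B^-1
    mu = X.mean(axis=0)
    B = np.cov(X, rowvar=False, bias=True)
    return mu, np.linalg.inv(B)

def log_ratio(x, params1, params2):
    # 0.5 * [quadratic form of class 2 minus quadratic form of class 1]
    (mu1, A1), (mu2, A2) = params1, params2
    q1 = (x - mu1) @ A1 @ (x - mu1)
    q2 = (x - mu2) @ A2 @ (x - mu2)
    return 0.5 * (q2 - q1)

rng = np.random.default_rng(0)
X1 = rng.normal([3.0, 3.5], 0.8, size=(50, 2))   # hypothetical class-1 sample
X2 = rng.normal([5.0, 2.0], 0.8, size=(50, 2))   # hypothetical class-2 sample
p1, p2 = fit_gaussian(X1), fit_gaussian(X2)
print(log_ratio(np.array([3.0, 3.5]), p1, p2) > 0)  # True -> class 1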
Linear
discrimination
Vapnik, Chervonenkis
Linear discrimination
Perceptron (Rosenblatt)

Input: two finite point sets $X^{+} = \{x_1, \dots, x_s\}$ and $X^{-} = \{x_1, \dots, x_r\}$ in $\mathbb{R}^n$.
Output: a weight vector $\alpha$ separating them:
$\sum_{i=1}^{n} \alpha_i x_i = \langle \alpha, x \rangle > 0 \;\; \forall x \in X^{+}$
$\langle \alpha, x \rangle < 0 \;\; \forall x \in X^{-}$
Linear discrimination
Perceptron

The threshold can be absorbed by extending each $x$ with a constant component equal to 1:
$\sum_{i=1}^{n} \alpha_i x_i + \alpha_{n+1} \cdot 1 > 0 \;\; \forall x \in X^{+}$
$\sum_{i=1}^{n} \alpha_i x_i + \alpha_{n+1} \cdot 1 < 0 \;\; \forall x \in X^{-}$
which has the same homogeneous form $\langle \alpha, x \rangle \gtrless 0$ in $\mathbb{R}^{n+1}$.
Linear discrimination
Perceptron: tuning

$t = 0$, $\alpha_t = 0$
while (sets are not separated by the hyperplane):
    if $\exists\, x \in X^{+}: \langle \alpha_t, x \rangle \le 0$, then $\alpha_{t+1} = \alpha_t + x$
    if $\exists\, x \in X^{-}: \langle \alpha_t, x \rangle \ge 0$, then $\alpha_{t+1} = \alpha_t - x$
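A minimal runnable sketch of this tuning loop (NumPy assumed, hypothetical toy data); the constant component absorbs the threshold as described above.

import numpy as np

def train_perceptron(X_pos, X_neg, max_iters=1000):
    # append the constant 1 so the threshold is part of alpha
    X_pos = np.hstack([X_pos, np.ones((len(X_pos), 1))])
    X_neg = np.hstack([X_neg, np.ones((len(X_neg), 1))])
    alpha = np.zeros(X_pos.shape[1])
    for _ in range(max_iters):
        updated = False
        for x in X_pos:                 # misclassified positive: add it
            if alpha @ x <= 0:
                alpha, updated = alpha + x, True
        for x in X_neg:                 # misclassified negative: subtract it
            if alpha @ x >= 0:
                alpha, updated = alpha - x, True
        if not updated:                 # all points separated
            return alpha
    return alpha

X_pos = np.array([[2.0, 2.0], [3.0, 1.5]])   # hypothetical positives
X_neg = np.array([[0.0, 0.0], [0.5, 1.0]])   # hypothetical negatives
print(train_perceptron(X_pos, X_neg))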
Linear discrimination
Perceptron: tuning

[Figures: eight snapshots of the tuning loop on a 2D example, showing the separating line after corrections 1, 2 and 3.]
Linear discrimination
Perceptron: convergence

Assume all points lie in a ball of radius $D$ ($|x| \le D \;\; \forall x \in X^{+} \cup X^{-}$) and a unit separating vector exists, $|\alpha^*| = 1$, with margin $\rho > 0$:
$\langle \alpha^*, x \rangle > \rho \;\; \forall x \in X^{+}$, $\quad \langle \alpha^*, x \rangle < -\rho \;\; \forall x \in X^{-}$
Then the number of corrections is bounded: $t \le \left(D / \rho\right)^2$
Linear discrimination
Transformation of the feature space

$Ax + By = C$ - a line in $\mathbb{R}^2$
Linear discrimination
Transformation of the feature space

$Ax^2 + Bxy + Cy^2 + Dx + Ey = F$ - a second-order curve (e.g. an ellipse) in $\mathbb{R}^2$, but a hyperplane in $\mathbb{R}^5$ with coordinates $(x^2, xy, y^2, x, y)$
Linear discrimination
SVM: support vector machine

Separate by the widest band:
$\langle \alpha, x \rangle \ge +1 \;\; \forall x \in X^{+}$
$\langle \alpha, x \rangle \le -1 \;\; \forall x \in X^{-}$
$\langle \alpha, \alpha \rangle \to \min$
Linear discrimination
SVM: support vector machine

Separate by the widest band, allowing violations with a slack penalty $\xi$:
$\langle \alpha, x \rangle \ge +1 - \xi(x) \;\; \forall x \in X^{+}$
$\langle \alpha, x \rangle \le -1 + \xi(x) \;\; \forall x \in X^{-}$, $\quad \xi(x) \ge 0$
$\langle \alpha, \alpha \rangle + const \cdot \sum_{x \in X^{+} \cup X^{-}} \xi(x) \to \min$
Logistic
regression
Logistic regression
Target function: empirical risk = number of errors

$\#errors = \sum_{i=1}^{n} \left[ margin(x_i, y_i) < 0 \right] = \sum_{i=1}^{n} \left[ y_i \langle a, x_i \rangle < 0 \right] \to \min$
(a point is misclassified exactly when its margin $y_i \langle a, x_i \rangle$ is negative)
Logistic regression
Target function: upper bound

The step function $[margin < 0]$ is bounded from above by the smooth loss $\log(1 + \exp(-margin))$, so minimize the bound instead:
$\sum_{i=1}^{n} \left[ y_i \langle a, x_i \rangle < 0 \right] \le \sum_{i=1}^{n} \log\left(1 + \exp(-y_i \langle a, x_i \rangle)\right) \to \min$
Logistic regression
Relationship with P(y|x)

$\sum_{i=1}^{n} \log\left(1 + \exp(-y_i \langle a, x_i \rangle)\right) \to \min \iff \sum_{i=1}^{n} \log \dfrac{1}{1 + \exp(-y_i \langle a, x_i \rangle)} \to \max$
$\iff \sum_{i=1}^{n} \log\, sigmoid\left(y_i \langle a, x_i \rangle\right) \to \max \iff \sum_{i=1}^{n} \log p(y_i \mid x_i) \to \max$
with the model $p(y_i \mid x_i) = sigmoid\left(y_i \langle a, x_i \rangle\right)$
Logistic regression
Logistic function: the sigmoid

$sigmoid(x) = \dfrac{1}{1 + e^{-x}}$
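A minimal gradient-descent sketch for this loss (NumPy assumed, hypothetical toy data): the gradient of $\log(1 + \exp(-y \langle a, x \rangle))$ with respect to $a$ is $-y \cdot x \cdot sigmoid(-y \langle a, x \rangle)$.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, steps=500):
    # minimize sum_i log(1 + exp(-y_i <a, x_i>)) by gradient descent
    a = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = -(y * sigmoid(-y * (X @ a))) @ X
        a -= lr * grad / len(y)
    return a

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])  # hypothetical
y = np.array([+1, +1, -1, -1])
a = fit_logreg(X, y)
print(a, sigmoid(X @ a))  # p(y = +1 | x) for each point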
Bias vs Variance
Bias vs Variance

Bias is an error from wrong assumptions in the learning algorithm.
High bias can cause underfitting: the algorithm misses relevant relations between the features and the target.
Variance is an error from sensitivity to small fluctuations in the training set.
High variance can cause overfitting: the algorithm models the random noise in the training data rather than the intended outputs.
Bias vs Variance
How to avoid overfitting (high variance)

o Keep the model simpler
o Do cross-validation (e.g. k-fold); see the sketch below
o Use regularization techniques (e.g. LASSO, Ridge)
o Use the top n features from the variable importance chart
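A minimal sketch combining two of these remedies, assuming scikit-learn is available (the document's own code already uses sklearn): k-fold cross-validation of a Ridge-regularized linear model.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # hypothetical features
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# 5-fold cross-validation scores for the regularized model
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(scores.mean(), scores.std())               # high mean, low spread -> low variance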
Regularization

Multicollinearity
Ensemble Learning
Ensemble Learning
Bootstrapping: random sampling with replacement
bootstrap set 1
bootstrap set 2
bootstrap set 3
Pick up, clone, put back
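A minimal sketch of bootstrapping (NumPy assumed): sampling with replacement, i.e. "pick up, clone, put back".

import numpy as np

rng = np.random.default_rng(0)
data = np.array([10, 20, 30, 40, 50])

# three bootstrap sets: same size as the original, sampled with replacement
for b in range(3):
    idx = rng.integers(0, len(data), size=len(data))
    print(f"bootstrap set {b + 1}:", data[idx])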
Ensemble Learning
Bagging: Bootstrap Aggregation
Classifier 1
Classifier 2
Classifier 3
Voting / Aggregation
decision
Ensemble Learning
Boosting: weighted average of multiple weak classifiers

[Figures: four snapshots of boosting rounds, each reweighting the points the previous weak classifier got wrong.]
Ensemble Learning

Bagging: aims to decrease variance; aims to solve the over-fitting problem.
Boosting: aims to decrease bias.
Decision Tree
Decision tree

F1 F2 F3 Target
A  A  A  0
A  A  B  0
B  B  A  1
A  B  B  1

H(0,0,1,1) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1
Decision tree

F1 F2 F3 Target  Attempt split on F1
A  A  A  0       0
A  A  B  0       0
B  B  A  1       1
A  B  B  1       0
Information Gain = H(0011) - 3/4 H(001) - 1/4 H(1) = 1 - 3/4 x 0.918 - 0 ≈ 0.31
Decision tree

F1 F2 F3 Target  Attempt split on F2
A  A  A  0       0
A  A  B  0       0
B  B  A  1       1
A  B  B  1       1
Information Gain = H(0011) - 1/2 H(00) - 1/2 H(11) = 1
Decision tree

F1 F2 F3 Target  Attempt split on F3
A  A  A  0
A  A  B  0
B  B  A  1
A  B  B  1
Information Gain = H(0011) - 1/2 H(01) - 1/2 H(01) = 0
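A minimal sketch of these computations (NumPy assumed): entropy of a label multiset and the information gain of a candidate split.

import numpy as np

def entropy(labels):
    # H = -sum p_c * log2(p_c) over the label proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, target):
    # H(target) minus the size-weighted entropy of each branch
    gain = entropy(target)
    for v in np.unique(feature):
        branch = target[feature == v]
        gain -= len(branch) / len(target) * entropy(branch)
    return gain

F1 = np.array(list("AABA"))
F2 = np.array(list("AABB"))
F3 = np.array(list("ABAB"))
target = np.array([0, 0, 1, 1])
print(information_gain(F1, target),  # ~0.31
      information_gain(F2, target),  # 1.0
      information_gain(F3, target))  # 0.0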
Decision tree

F1 F2 F3 Target
A  A  A  0
A  A  B  0
B  B  A  1
A  B  B  1
The tree therefore splits on F2, which has the highest information gain.
Decision tree: pruning

Root purity: 50%. Leaf purities after the first split: 58% = (7x7 + 3x3)/100 and 58% = (3x3 + 7x7)/100.
Expected purity after the split: 50% x 58% + 50% x 58% = 58%
58% > 50%, so the split is kept (OK)
Decision tree: pruning

Splitting one leaf further gives sub-leaf purities 55.5% and 62.5%:
50% x 58% + 50% x (60% x 55.5% + 40% x 62.5%) = 50% x 58% + 50% x 58.3% = 58.16%
58.16% > 58%, but this is only a small improvement and a candidate for pruning
Decision tree: pruning

[Figure: the same tree with the 55.5% / 62.5% split pruned away, leaving two 58% leaves.]
Decision tree: pruning

Another candidate split gives sub-leaf purities 62.5% and 50%:
50% x 58% + 50% x (80% x 62.5% + 20% x 50%) = 50% x 58% + 50% x 60% = 59%
59% > 58%, so this split is kept (OK)
Random Forest
https://www.youtube.com/watch?v=J4Wdy0Wc_xQ
Random forest
Original dataset
F1 F2 F3 Target
A A A 0
A A B 0
B B A 1
A B B 1
Bootstrapped dataset
F1 F2 F3 Target
A B B 1
A A B 0
A A A 0
A B B 1
Random forest
Bootstrapped dataset
F1 F2 F3 Target
A B B 1
A A B 0
A A A 0
A B B 1
Determine the best root node out of randomly selected sub-set of candidates
candidate candidate
Random forest
Bootstrapped dataset
F1 F2 F3 Target
A B B 1
A A B 0
A A A 0
A B B 1
root
Random forest
Bootstrapped dataset
F1 F2 F3 Target
A B B 1
A A B 0
A A A 0
A B B 1
candidate
Build tree as usual, but considering a random subset of features at each step
candidate
Random forest
Bagging: aggregate the decisions of the trees built on the bootstrapped datasets
Random forest
Validation: run the aggregated decisions against the out-of-bag dataset
Original dataset
F1 F2 F3 Target
A A A 0
A A B 0
B B A 1
A B B 1
Bootstrapped dataset
F1 F2 F3 Target
A B B 1
A A B 0
A A A 0
A B B 1
out-of-bag
Random forest
The process

Build random forest → Estimate train / validation error → Adjust RF parameters (# of variables per step, regularization params, etc.) → repeat
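A minimal sketch of this loop with scikit-learn (assumed available, as in the document's own code): fit a forest, read the out-of-bag validation estimate, and adjust the number of variables considered per split.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # hypothetical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical target

for max_features in (1, 2, "sqrt"):       # variables considered per split
    rf = RandomForestClassifier(n_estimators=100, max_features=max_features,
                                oob_score=True, random_state=0).fit(X, y)
    print(max_features, rf.oob_score_)    # out-of-bag validation accuracy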
Adaboost
Yoav Freund
Robert Schapire
Adaboost
Definitions

$x_1, x_2, \dots, x_i, \dots, x_N$ - training features
$k_1, k_2, \dots, k_i, \dots, k_N \in \{-1, +1\}$ - training targets
$t = 1, 2, 3, \dots$ - iteration number
$W_t(x_i)$ - weight of the element on iteration $t$
$h_t(x)$ - weak classifier on iteration $t$, e.g. a threshold stump $h(x) = (x > \theta)\; ?\; {+1} : {-1}$
$\alpha_t$ - weight of a weak classifier on iteration $t$
$k(x) = sign\left(\sum_t \alpha_t h_t(x)\right)$ - the resulting strong classifier
Adaboost
Algorithm

Pick the weak classifier that minimizes the weighted error:
$h_t = \arg\min_{h} \sum_{i} W_t(i) \cdot \left[ h(x_i) \neq k_i \right]$
Calculate the weight of the classifier:
$r_t = \dfrac{\sum_i W_t(i)\, h_t(x_i)\, k_i}{\sum_i W_t(i)}$, $\quad \alpha_t = \dfrac{1}{2} \log \dfrac{1 + r_t}{1 - r_t}$
Update the weights:
$W_{t+1}(i) = W_t(i) \cdot \exp\left(-\alpha_t\, k_i\, h_t(x_i)\right) = W_t(i) \cdot \begin{cases} \sqrt{(1 + r_t)/(1 - r_t)} & h_t(x_i) \neq k_i \\ \sqrt{(1 - r_t)/(1 + r_t)} & h_t(x_i) = k_i \end{cases}$
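A minimal sketch of this loop with threshold stumps (NumPy assumed, hypothetical 1D data); it assumes no stump is perfect, so $r_t < 1$.

import numpy as np

def stump_candidates(X):
    # thresholds midway between sorted points, so no sample sits on a threshold
    xs = np.sort(np.unique(X))
    return [(th, s) for th in (xs[:-1] + xs[1:]) / 2.0 for s in (+1, -1)]

def adaboost(X, k, T=3):
    W = np.ones(len(X))
    model = []
    for _ in range(T):
        # 1) weak classifier minimizing the weighted error
        th, s = min(stump_candidates(X),
                    key=lambda c: np.sum(W * (np.sign(c[1] * (X - c[0])) != k)))
        h = np.sign(s * (X - th))
        # 2) weighted score r_t and classifier weight alpha_t
        r = np.sum(W * h * k) / np.sum(W)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        # 3) reweight: misclassified points grow, correct ones shrink
        W = W * np.exp(-alpha * k * h)
        model.append((alpha, th, s))
    return model

def predict(model, X):
    return np.sign(sum(a * np.sign(s * (X - th)) for a, th, s in model))

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([+1, +1, -1, -1, +1, -1])
model = adaboost(X, k)
print(predict(model, X) == k)   # all True after three rounds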
Adaboost
Example

Ten points on the binary grid; three boosting rounds.

Step 1: a stump $h_1$ thresholds $x_1 + x_2$ at 0.5; weighted score $r_{t=1} = 0.33$, classifier weight $\alpha_{t=1} = 0.34$. Weights are updated by $W_2(i) = W_1(i) \cdot \exp(-\alpha_1 k_i h_1(x_i))$: misclassified points rise to 1.5, correctly classified ones drop to 0.75.

Step 2: a second stump $h_2$ is fitted to the reweighted sample; $r_{t=2} = 0.33$, $\alpha_{t=2} = 0.34$, and the weights are updated the same way (reaching values between 0.5 and 2).

Step 3: a third stump $h_3$; $r_{t=3} = 0.38$, $\alpha_{t=3} = 0.39$.

Result: $k(x) = sign\left(\sum_t \alpha_t h_t(x)\right)$ with $\alpha_1 = 0.35$, $\alpha_2 = 0.35$, $\alpha_3 = 0.39$ and the weak-classifier votes:

x:      (0,0) (1,0) (1,0) (1,0) (0,1) (0,1) (0,1) (0,1) (0,1) (1,1)
k:        1    -1    -1    -1    -1    -1    -1    -1    -1    -1
h1(x):    1     1     1     1    -1    -1    -1    -1    -1     1
h2(x):    1    -1    -1    -1     1     1     1     1     1     1
h3(x):    1    -1    -1    -1    -1    -1    -1    -1    -1     1
Adaboost
Another example

One-dimensional data on $x \in [0, 6]$, classified in three boosting rounds of the algorithm above, with threshold stumps at $x = 2.5$ and $x = 3.5$ and one constant weak classifier $h(x) = -1$; the resulting classifier weights are $\alpha_1 = 0.25$, $\alpha_2 = 0.27$, $\alpha_3 = 0.33$.

[Figures: the per-point weights after each round and the weighted votes of the three weak classifiers.]
ROC curve
ROC curve

Predictions are sorted by score (confidence level); sliding a threshold over the sorted scores splits them into TP, FP, FN and TN counts, and each threshold position yields one point of the curve.
ROC curve
Error types

Actual: 3 positives and 7 negatives. At the current threshold:
true positives TP: 2/3; false negatives FN: 1/3; false positives FP: 2/7
ROC curve

Sweeping the threshold from strict to permissive traces the curve (TP rate vs FP rate):

TP: 0/3   FN: 3/3   FP: 0/7
TP: 1/3   FN: 2/3   FP: 0/7
TP: 2/3   FN: 1/3   FP: 1/7
TP: 2/3   FN: 1/3   FP: 2/7
TP: 2/3   FN: 1/3   FP: 4/7
TP: 3/3   FN: 0/3   FP: 7/7
ROC curve
AUC: area under ROC curve

[Figure: the ROC curve plotted as TP rate vs FP rate, with the area under it shaded; "target" marks the ideal classifier.]
ROC curve
Example: Naive Bayesian classifier

Reusing the Naive Bayesian example (features $x_1, x_2 \in \{0, 1, 2\}$ with the per-feature probability table from that section), each point $x$ is scored with the likelihood ratio $\dfrac{p(x \mid 1)}{p(x \mid 2)}$:

x:     (0,0) (1,0) (2,0) (2,0) (0,1) (0,1) (1,1) (1,1) (1,1) (1,1) (2,1) (0,2) (0,2) (1,2) (2,2)
score:  0.19  1.50  0.25  0.25  0.75  0.75  6.00  6.00  6.00  6.00  1.00  0.19  0.19  1.50  0.25

Sorting the scores in descending order gives the thresholds for the ROC sweep:
6.00 6.00 6.00 6.00 1.50 1.50 1.00 0.75 0.75 0.25 0.25 0.25 0.19 0.19 0.19
ROC curve
The task: compare accuracy of two strategies

Strategy 1 scores: 63 81 23 54 10 17 29 5 34 21
Strategy 2 scores: 86 98 18 42 21 23 37 18 74 27
ROC curve

import numpy
from sklearn import metrics
from sklearn.metrics import auc

def display_roc_curve_from_file(plt, fig, path_scores, caption=''):
    # load_mat is the project's own loader: column 0 holds labels, the rest scores
    data = load_mat(path_scores, dtype=numpy.chararray, delim='\t')
    labels = data[:, 0].astype('float32')
    scores = data[:, 1:].astype('float32')
    fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr)
    plot_tp_fp(plt, fig, tpr, fpr, roc_auc, caption)
    return

def plot_tp_fp(plt, fig, tpr, fpr, roc_auc, caption=''):
    lw = 2
    plt.plot(fpr, tpr, color='darkgreen', lw=lw, label='AUC = %0.2f' % roc_auc)
    plt.plot([0, 1.05], [0, 1.05], color='lightgray', lw=lw, linestyle='--')
    plt.grid(which='major', color='lightgray', linestyle='--')
    fig.canvas.set_window_title(caption + ('AUC = %0.4f' % roc_auc))
    return
Precision-Recall
Precision - Recall
Precision: TP / (TP+FP)
Recall: TP / (TP+FN)
Accuracy: (TP+TN) /Total
F1-score: 2 x Precision x Recall / (Precision + Recall)
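A minimal sketch of these four metrics (plain Python):

def metrics_from_counts(TP, FP, FN, TN):
    # definitions from the lines above; total = all four counts
    precision = TP / (TP + FP) if TP + FP else 0.0
    recall = TP / (TP + FN) if TP + FN else 0.0
    accuracy = (TP + TN) / (TP + FP + FN + TN)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, accuracy, f1

print(metrics_from_counts(TP=2, FP=1, FN=1, TN=6))  # (0.667, 0.667, 0.8, 0.667)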
Precision - Recall

Sweeping the threshold (3 positives, 7 negatives):

TP: 0   FN: 3   FP: 0   Precision: 0/0 (undefined)   Recall: 0/3
TP: 1   FN: 2   FP: 0   Precision: 1                 Recall: 1/3
TP: 2   FN: 1   FP: 1   Precision: 2/3               Recall: 2/3
TP: 2   FN: 1   FP: 2   Precision: 1/2               Recall: 2/3
TP: 2   FN: 1   FP: 4   Precision: 2/6               Recall: 2/3
TP: 2   FN: 1   FP: 5   Precision: 2/7               Recall: 2/3
TP: 3   FN: 0   FP: 7   Precision: 3/7               Recall: 1
Precision - Recall

A second sweep over the same data:

TP: 0   FN: 3   FP: 0   Precision: 0     Recall: 0
TP: 1   FN: 2   FP: 0   Precision: 1     Recall: 1/3
TP: 2   FN: 1   FP: 0   Precision: 1     Recall: 2/3
TP: 2   FN: 1   FP: 1   Precision: 2/3   Recall: 2/3
TP: 2   FN: 1   FP: 2   Precision: 2/4   Recall: 2/3
TP: 3   FN: 0   FP: 2   Precision: 3/5   Recall: 1
TP: 3   FN: 0   FP: 4   Precision: 3/7   Recall: 1
Benchmarking the
classifiers
Benchmarking the Classifiers
o Regression
o Naive Bayes
o Gaussian
o SVM
o Nearest Neighbors
o Decision Tree
o Random Forest
o AdaBoost
o xgboost
o Neural Net
Benchmarking the Classifiers

[Figures: decision boundaries, score distributions and ROC curves for each classifier above, first on a Gaussian-distributed dataset, then on a non-Gaussian one.]
Benchmarking the Classifiers
def E2E_on_features_files_2_classes(Classifier,folder_out, filename_data_pos,filename_data_neg,filename_scrs_pos=None,filename_scrs_neg=None,fig=None):
filename_data_grid = folder_out+'data_grid.txt'
filename_scores_grid = folder_out+'scores_grid.txt'
Pos = (IO.load_mat(filename_data_pos, numpy.chararray, 't')).shape[0]
Neg = (IO.load_mat(filename_data_neg, numpy.chararray, 't')).shape[0]
numpy.random.seed(125)
idx_pos_train = numpy.random.choice(Pos, int(Pos/2),replace=False)
idx_neg_train = numpy.random.choice(Neg, int(Neg/2),replace=False)
idx_pos_test = [x for x in range(0, Pos) if x not in idx_pos_train]
idx_neg_test = [x for x in range(0, Neg) if x not in idx_neg_train]
generate_data_grid(filename_data_pos, filename_data_neg, filename_data_grid)
model = learn_on_pos_neg_files(Classifier,filename_data_pos,filename_data_neg, 't', idx_pos_train,idx_neg_train)
score_feature_file(Classifier, filename_data_pos, filename_scrs =filename_scrs_pos,delimeter='t', append=0, rand_sel=idx_pos_test)
score_feature_file(Classifier, filename_data_neg, filename_scrs =filename_scrs_neg,delimeter='t', append=0, rand_sel=idx_neg_test)
score_feature_file(Classifier, filename_data_grid, filename_scrs =filename_scores_grid)
model = learn_on_pos_neg_files(Classifier,filename_data_pos,filename_data_neg, 't', idx_pos_test,idx_neg_test)
score_feature_file(Classifier,filename_data_pos, filename_scrs = filename_scrs_pos,delimeter='t',append= 1,rand_sel=idx_pos_train)
score_feature_file(Classifier,filename_data_neg, filename_scrs = filename_scrs_neg,delimeter='t',append= 1,rand_sel=idx_neg_train)
tpr, fpr, auc = IO.get_roc_data_from_scores_file_v2(filename_scrs_pos,filename_scrs_neg)
if(fig!=None):
th = get_th_v2(filename_scrs_pos, filename_scrs_neg)
IO.display_roc_curve_from_descriptions(plt.subplot(1,3,3),fig,filename_scrs_pos,filename_scrs_neg,delim='t')
IO.display_distributions(plt.subplot(1,3,2),fig, filename_scrs_pos,filename_scrs_neg,delim='t')
IO.plot_2D_scores(plt.subplot(1,3,1),fig,filename_data_pos,filename_data_neg,filename_data_grid, filename_scores_grid, th, noice_needed=0)
plt.tight_layout()
plt.show()
return tpr, fpr, auc
Unsupervised
learning
Unsupervised learning
EM algorithm

Supervised learning: pairs $(x_1, k_1), \dots, (x_n, k_n)$ are given by a supervisor; learning fits $p(x \mid k) = f(x, \theta_k)$.
Unsupervised learning: only $x_1, \dots, x_n$ are given; the hidden states $k$, the priors $p(k)$ and the posteriors $p(k \mid x)$ must be recovered together with the model $p(x \mid k) = f(x, \theta_k)$.
Unsupervised learning
EM algorithm

One-dimensional data $x = 0, 0, 0, 3, 3, 3, 5, 6$; fit a mixture of two components.

Initial split {0, 0, 3, 5} and {0, 3, 3, 6}:
$\mu_1 = \frac{1}{4}(0 + 0 + 3 + 5) = 2$, $\quad \sigma_1^2 = \frac{1}{4}\left(2^2 + 2^2 + 1^2 + 3^2\right) = 4.5$
$\mu_2 = \frac{1}{4}(0 + 3 + 3 + 6) = 3$, $\quad \sigma_2^2 = \frac{1}{4}\left(3^2 + 0^2 + 0^2 + 3^2\right) = 4.5$

Reassigning each point to its more probable component gives {0, 0, 0} and {3, 3, 3, 5, 6}; the parameters are re-estimated:
$\mu_1 = \frac{1}{3}(0 + 0 + 0) = 0$, $\quad \sigma_1^2 = \frac{1}{3}\left(0^2 + 0^2 + 0^2\right) = 0$
$\mu_2 = \frac{1}{5}(3 + 3 + 3 + 5 + 6) = 4$, $\quad \sigma_2^2 = \frac{1}{5}\left(1^2 + 1^2 + 1^2 + 1^2 + 2^2\right) = 1.6$
The assignment and estimation steps alternate until convergence.
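A minimal hard-assignment sketch of these two alternating steps (NumPy assumed), reproducing the numbers above from the same initial split; for this data no component ever becomes empty.

import numpy as np

x = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 5.0, 6.0])
assign = np.array([0, 0, 1, 0, 1, 1, 0, 1])   # initial split {0,0,3,5} / {0,3,3,6}

for _ in range(10):
    # M step: re-estimate the mean and variance of each component
    mu = np.array([x[assign == c].mean() for c in (0, 1)])
    var = np.array([x[assign == c].var() for c in (0, 1)])
    v = np.maximum(var, 1e-6)                 # guard against zero variance
    # E step (hard assignment): each point goes to its more probable component
    ll = -0.5 * ((x[:, None] - mu) ** 2 / v + np.log(v))
    assign = ll.argmax(axis=1)

print(mu, var)   # approximately [0, 4] and [0, 1.6]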
Unsupervised learning
K-means

https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html

[Figures: step-by-step snapshots of K-means alternating between assigning points to the nearest center and moving each center to the mean of its points.]
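A minimal K-means sketch (NumPy assumed), mirroring the two alternating steps in the snapshots above; empty clusters are not handled in this sketch.

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest center for every point
        labels = np.linalg.norm(X[:, None] - centers[None, :], axis=2).argmin(axis=1)
        # update step: move each center to the mean of its points
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.3, (30, 2)),
               rng.normal([3, 3], 0.3, (30, 2))])   # hypothetical two-blob data
centers, labels = kmeans(X, k=2)
print(centers)   # close to (0, 0) and (3, 3)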
Non-Bayesian
approach
Neyman
Neyman-Pearson approach
Limitations of the Bayesian approach

$x \in X$ - feature; $k$ - hidden state
$p(k)$ - prior probability of the state
$p(x \mid k)$ - conditional probability to observe $x$ in state $k$
$W: K \times K \to \mathbb{R}$ - penalty function
The Bayesian strategy needs $p(k)$ and $W$; in practice they are often unknown.
Neyman-Pearson approach
Limitations of the Bayesian approach: example

$x \in X$ - heartbeat; $k$ - state: normal or sick
$p(x \mid k)$ - distribution of the heartbeat for normal and sick: measurable
$p(k)$ - ??? (unknown prior)
$W(\mathrm{normal}, \mathrm{sick})$, $W(\mathrm{sick}, \mathrm{normal})$ - ??? (unknown penalties)
Neyman-Pearson approach
The problem

Input: $x \in X$, $k \in K = \{\mathrm{normal}, \mathrm{risky}\}$, with known $p(x \mid \mathrm{norm})$ and $p(x \mid \mathrm{risk})$.
Output: a partition $X_{\mathrm{norm}} \cup X_{\mathrm{risk}} = X$, $X_{\mathrm{norm}} \cap X_{\mathrm{risk}} = \emptyset$ such that
error type 1 = $P(\text{false alarm}) = \sum_{x \in X_{\mathrm{risk}}} p(x \mid \mathrm{norm}) \to \min$
error type 2 = $P(\text{miss the risk}) = \sum_{x \in X_{\mathrm{norm}}} p(x \mid \mathrm{risk}) \le \varepsilon$
Neyman-Pearson approach
The problem

Indicator variables for the partition:
$\alpha_{\mathrm{norm}}(x) = \begin{cases} 1 & x \in X_{\mathrm{norm}} \\ 0 & x \notin X_{\mathrm{norm}} \end{cases}$, $\quad \alpha_{\mathrm{risk}}(x) = \begin{cases} 1 & x \in X_{\mathrm{risk}} \\ 0 & x \notin X_{\mathrm{risk}} \end{cases}$
Neyman-Pearson approach
The problem

In terms of the indicators the task is a linear program:
$\sum_{x \in X} \alpha_{\mathrm{risk}}(x) \cdot p(x \mid \mathrm{norm}) \to \min$
$-\sum_{x \in X} \alpha_{\mathrm{norm}}(x) \cdot p(x \mid \mathrm{risk}) \ge -\varepsilon$
$\alpha_{\mathrm{norm}}(x) + \alpha_{\mathrm{risk}}(x) = 1$, $\quad \alpha_{\mathrm{norm}}(x) \ge 0$, $\quad \alpha_{\mathrm{risk}}(x) \ge 0 \quad \forall x \in X$
Neyman-Pearson approach

The general primal-dual pair of linear programs ("free var" = unconstrained variable):

Primal: $p_1 x_1 + \dots + p_n x_n \to \max$
$a_{i1} x_1 + \dots + a_{in} x_n \le b_i$ for $i = 1, \dots, k$ (dual variable $y_i \ge 0$)
$a_{i1} x_1 + \dots + a_{in} x_n = b_i$ for $i = k+1, \dots, m$ (dual variable $y_i$ free)
$x_1, \dots, x_s \ge 0$; $\quad x_{s+1}, \dots, x_n$ free

Dual: $b_1 y_1 + \dots + b_m y_m \to \min$
$a_{1j} y_1 + \dots + a_{mj} y_m \ge p_j$ for $j = 1, \dots, s$
$a_{1j} y_1 + \dots + a_{mj} y_m = p_j$ for $j = s+1, \dots, n$

Complementary slackness at the optimum:
$(a_{i1} x_1 + \dots + a_{in} x_n - b_i) \cdot y_i = 0$ and $(a_{1j} y_1 + \dots + a_{mj} y_m - p_j) \cdot x_j = 0$
Neyman-Pearson approach
Solution

Primal (from above):
$\sum_{x \in X} \alpha_{\mathrm{risk}}(x) \cdot p(x \mid \mathrm{norm}) \to \min$, $\quad -\sum_{x \in X} \alpha_{\mathrm{norm}}(x) \cdot p(x \mid \mathrm{risk}) \ge -\varepsilon$,
$\alpha_{\mathrm{norm}}(x) + \alpha_{\mathrm{risk}}(x) = 1$, $\quad \alpha_{\mathrm{norm}}(x) \ge 0$, $\quad \alpha_{\mathrm{risk}}(x) \ge 0$

Dual (variable $\tau$ for the $\varepsilon$-constraint, $t(x)$ for each equality):
$-\varepsilon \cdot \tau + 1 \cdot t(x_1) + \dots + 1 \cdot t(x_i) \to \max$, $\quad \tau \ge 0$, $t(x)$ free,
$-p(x \mid \mathrm{risk}) \cdot \tau + 1 \cdot t(x) \le 0$, $\quad 1 \cdot t(x) \le p(x \mid \mathrm{norm})$
Neyman-Pearson approach
Solution

By complementary slackness:
$x \in X_{\mathrm{norm}} \Rightarrow \alpha_{\mathrm{norm}}(x) = 1,\; \alpha_{\mathrm{risk}}(x) = 0 \Rightarrow \begin{cases} -p(x \mid \mathrm{risk}) \cdot \tau + t(x) = 0 \\ t(x) < p(x \mid \mathrm{norm}) \end{cases} \Rightarrow p(x \mid \mathrm{risk}) \cdot \tau \le p(x \mid \mathrm{norm})$
$x \in X_{\mathrm{risk}} \Rightarrow \alpha_{\mathrm{norm}}(x) = 0,\; \alpha_{\mathrm{risk}}(x) = 1 \Rightarrow \begin{cases} -p(x \mid \mathrm{risk}) \cdot \tau + t(x) \le 0 \\ t(x) = p(x \mid \mathrm{norm}) \end{cases} \Rightarrow p(x \mid \mathrm{risk}) \cdot \tau \ge p(x \mid \mathrm{norm})$
Neyman-Pearson approach
Solution

The optimal strategy is a likelihood-ratio test:
$\dfrac{p(x \mid \mathrm{norm})}{p(x \mid \mathrm{risk})} > \tau \Rightarrow \mathrm{norm}$, $\quad \dfrac{p(x \mid \mathrm{norm})}{p(x \mid \mathrm{risk})} < \tau \Rightarrow \mathrm{risk}$
Neyman-Pearson approach
Some variations: two risky states

$\sum_{x \in X_{\mathrm{risk}}} p(x \mid \mathrm{norm}) \to \min$ subject to $\sum_{x \in X_{\mathrm{norm}}} p(x \mid \mathrm{risk}_1) \le \varepsilon$ and $\sum_{x \in X_{\mathrm{norm}}} p(x \mid \mathrm{risk}_2) \le \varepsilon$
Neyman-Pearson approach
Some variations: two normal states

$\sum_{x \in X_{\mathrm{risk}}} p(x \mid \mathrm{norm}_1) + \sum_{x \in X_{\mathrm{risk}}} p(x \mid \mathrm{norm}_2) \to \min$ subject to $\sum_{x \in X_{\mathrm{norm}}} p(x \mid \mathrm{risk}) \le \varepsilon$
Neyman-Pearson approach
Minimax problem

$\max\left\{ \sum_{x \notin X_1} p(x \mid 1),\; \sum_{x \notin X_2} p(x \mid 2),\; \dots,\; \sum_{x \notin X_K} p(x \mid K) \right\} \to \min$
Missing values
Missing values

o Delete missing rows
o Replace with mean/median
o Encode as another level of a categorical variable
o Create a new engineered feature
o Hot deck
(a short sketch of the first three options follows)
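A minimal sketch of the first three options, assuming pandas is available:

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "city": ["Kyiv", None, "Lviv", "Kyiv"]})   # hypothetical data

dropped = df.dropna()                                          # delete missing rows
imputed = df.assign(age=df["age"].fillna(df["age"].median()))  # replace with median
encoded = df.assign(city=df["city"].fillna("MISSING"))         # missing as its own category
print(dropped, imputed, encoded, sep="\n\n")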

More Related Content

Similar to Machine Learning

2012 mdsp pr02 1004
2012 mdsp pr02 10042012 mdsp pr02 1004
2012 mdsp pr02 1004nozomuhamada
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussiannozomuhamada
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
2012 mdsp pr07 bayes decision
2012 mdsp pr07 bayes decision2012 mdsp pr07 bayes decision
2012 mdsp pr07 bayes decisionnozomuhamada
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca ldanozomuhamada
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningVahid Mirjalili
 
Solving Optimization Problems using the Matlab Optimization.docx
Solving Optimization Problems using the Matlab Optimization.docxSolving Optimization Problems using the Matlab Optimization.docx
Solving Optimization Problems using the Matlab Optimization.docxwhitneyleman54422
 
5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdfRahul926331
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4arogozhnikov
 
Shriram Nandakumar & Deepa Naik
Shriram Nandakumar & Deepa NaikShriram Nandakumar & Deepa Naik
Shriram Nandakumar & Deepa NaikShriram Nandakumar
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxSivam Chinna
 
Interval programming
Interval programming Interval programming
Interval programming Zahra Sadeghi
 
Statistics and Fitting: overview and thoughts
Statistics and Fitting: overview and thoughtsStatistics and Fitting: overview and thoughts
Statistics and Fitting: overview and thoughtsMichael Marino
 

Similar to Machine Learning (20)

2012 mdsp pr02 1004
2012 mdsp pr02 10042012 mdsp pr02 1004
2012 mdsp pr02 1004
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
2012 mdsp pr07 bayes decision
2012 mdsp pr07 bayes decision2012 mdsp pr07 bayes decision
2012 mdsp pr07 bayes decision
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
Cookbook en
Cookbook enCookbook en
Cookbook en
 
Solving Optimization Problems using the Matlab Optimization.docx
Solving Optimization Problems using the Matlab Optimization.docxSolving Optimization Problems using the Matlab Optimization.docx
Solving Optimization Problems using the Matlab Optimization.docx
 
ML ALL in one (1).pdf
ML ALL in one (1).pdfML ALL in one (1).pdf
ML ALL in one (1).pdf
 
5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4
 
Shriram Nandakumar & Deepa Naik
Shriram Nandakumar & Deepa NaikShriram Nandakumar & Deepa Naik
Shriram Nandakumar & Deepa Naik
 
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
Sandia Fast Matmul
Sandia Fast MatmulSandia Fast Matmul
Sandia Fast Matmul
 
M.Sc thesis
M.Sc thesisM.Sc thesis
M.Sc thesis
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptx
 
Interval programming
Interval programming Interval programming
Interval programming
 
Statistics and Fitting: overview and thoughts
Statistics and Fitting: overview and thoughtsStatistics and Fitting: overview and thoughts
Statistics and Fitting: overview and thoughts
 

Recently uploaded

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Machine Learning

  • 1. Basics of Machine Learning Dmitry Ryabokon, github.com/dryabokon  ξ ξ
  • 2. Content 1.1 Approaches for ML 1.2 Bayesian approach 1.3 Classifying the independent binary features 2.1 Naive Bayesian classifier 2.2 Gaussian classifier 2.3 Linear discrimination 2.4 Logistic regression 2.5 Ensemble learning 2.6 Decision Tree 2.7 Random Forest 2.7 Adaboost 3.1 ROC-Curve 3.2 Precision, recall, Accuracy, F1 score 3.3 Benchmarking the Classifiers 4.1 Unsupervised learning 4.2 Non-Bayesian approach 4.3 Time Series 5.1 Bias vs Variance 5.2 Regularization 5.3 Multi-collinearity 5.4 Variable importance chart 5.5 Variance inflation factor VIF 5.6 Reducing data dimensions 5.7 Imbalanced data set 5.8 Stratified sampling 5.9 Correlation: continuous & categorical 5.10 Missing values: the approach 5.11 Back propagation
  • 4. 4 Nxxx ,...,, 21 Nkkk ,...,, 21 features and states    kxfkxp ,  Kk ,...,2,1 domain for k The model is parametrized by K ,...,, 21      N i iiK kxp K 1,...,, * 21 ,maxarg,...,, 21   input output Approaches for ML Maximum Likelihood
  • 5. 5   N i ii kxp K 1,...,, ,maxarg 21             N i iii kxpkp K 1,...,, 21 maxarg                   KK XxXxXx Kxpxpxp 2121 21maxarg ,...,,           N i ii kxp K 1,...,, 21 maxarg     kXx k xf   ,maxarg * Approaches for ML Maximum Likelihood
  • 6. 6 46 46 45 37 37 36 37       2 11 exp,1   xxfkxp       2 22 exp,2   xxfkxp      7 1 21 ,maxarg, i ii kxp       222 1 454645expmaxarg    Approaches for ML Maximum Likelihood: example
  • 7. 7 46 46 45 37 37 36 37       222 1 454645expmaxarg          222 1 454645minarg    1 3 454645 1 X x   Approaches for ML Maximum Likelihood: example
  • 8. 8     ,1minmaxarg 1 * 1   kxp Xx Approaches for ML Bias in training data Nxxx ,...,, 21 Nkkk ,...,, 21    kxfkxp ,  Kk ,...,2,1 The model is parametrized by K ,...,, 21 input output features and states domain for k
  • 9. 9     ,1minmaxarg 1 * 1   kxp Xx        1,2 1,1 2 1   Nkxp Nkxp   1 2 Approaches for ML Bias in training data: example
  • 10. 10     ,1minmaxarg 1 * 1   kxp Xx        1,2 1,1 2 1   Nkxp Nkxp   1 2 center of the circle that holds all the points from X1 Approaches for ML Bias in training data: example
  • 11. 11 Nxxx ,...,, 21 Nkkk ,...,, 21  Kk ,...,2,1  kkW , penalty function decision strategy is parametrized by  Kxq ,  Approaches for ML ERM: Empirical risk minimization features and states domain for k
  • 12. 12           Xx Kk kxqWkxpqRisk ,,,            N i ii N i iiii kxqWkxqWkxp 11 ,,,,,     qRiskminarg*    Kxq , Approaches for ML ERM: Empirical risk minimization decision strategy is parametrized by 
  • 13. 13           kk kk kkW 1 0 ,    ,qRisk Number of errors Approaches for ML ERM: Empirical risk minimization 𝑞( ҧ𝑥, ത𝛼, 𝜃) = ቊ −1 𝑤ℎ𝑒𝑛 ҧ𝑥, ത𝛼 < 𝜃 +1 𝑤ℎ𝑒𝑛 ҧ𝑥, ത𝛼 > 𝜃
  • 15. 15 Definitions Xx k feature state (hidden)  kxp , joint probability RW : penalty function          k Xx xqkWkxpqRisk ,, Xq: decision strategy Bayesian approach
  • 16. 16          k Xxxqxq xqkWkxpxqRiskxq ,,minarg)(minarg)( )()( * input output Bayesian approach The problem Xx k feature state (hidden)  kxp , joint probability RW : penalty function Xq: decision strategy
  • 17. 17 Bayes' theorem law of total probability            k kxpkp kxpkp xkp )(        k kxpkpxp joint probability         xkpxpkxpkpkxp , p(k) prior probability p(k|x) posterior probability p(x|k) conditional probability of x, assuming k p(x) probability of x Bayesian approach Some basics
  • 18. 18  12,11,10,9,8X time   %1081  xkp   %9082  xkp 5)1,1(  kkW 100)1,2(  kkW 5)2,1(  kkW 0)2,2(  kkW К = {1-revision, 2-free_ride} Bayesian approach Example
  • 19. 19 Bayesian approach Flexibility to not make a decision 𝑊(𝑘, 𝑘′) = ቐ 0 𝑤ℎ𝑒𝑛 𝑘 = 𝑘′ 1 𝑤ℎ𝑒𝑛 𝑘 ≠ 𝑘′ 𝜀 𝑤ℎ𝑒𝑛 𝑘 = 𝑟𝑒𝑗𝑒𝑐𝑡 𝑅𝑖𝑠𝑘 𝑟𝑒𝑗𝑒𝑐𝑡 = ෍ 𝑘∈𝐾 𝑝(𝑘ȁ 𝑥) ∙ 𝑊 𝑘, 𝑟𝑒𝑗𝑒𝑐𝑡 = = ෍ 𝑘∈𝐾 𝑝(𝑘ȁ 𝑥) ∙ 𝜀 = 𝜀
  • 20. 20 )(min kRisk k if then      kk kkWxkpk ,minarg* else 0  Bayesian approach Flexibility to not make a decision reject )(min kRisk k if then      kk kkWxkpk ,minarg* else reject
  • 21. 21 kkkkw ),(  2 ),( kkkkw        kk kk kkw 1 0 ),(       kk kk kkw 1 0 ),( median average the most probable sample center of the most probable section Δ Bayesian approach Decision strategies for different penalty functions
  • 23. 23  Nxxxx ,...,, 21 feature – a set of independent binary values        kxpkxpkxpkxp N 21  2,1k Few states are possible      2 1 2 1     kxp kxp Decision strategy Classifying the independent binary features The problem
  • 24. 24                   22 11 log 2 1 log 1 1 kxpkxp kxpkxp kxp kxp n n                 2 1 log 2 1 log 1 1 kxp kxp kxp kxp n n                     20 10 log 1021 2011 log 1 1 11 11 1 kxp kxp kxpkxp kxpkxp x            20 10 log 1021 2011 log       kxp kxp kxpkxp kxpkxp x n n nn nn N  Classifying the independent binary features The problem
  • 25. 25             log 2 1 20 10 log 1021 2011 log 1         N i i i ii ii i kxp kxp kxpkxp kxpkxp x     2 1 1 N i ii ax Classifying the independent binary features Decision rule
  • 27. 27 1x 2x 1 1 0  1xp  1xp  2xp  2xp 1/4 3/4 3/4 1/4 1/4 2/4 3/4 2/4 Classifying the independent binary features Example
  • 28. 28 1x 2x  1xp  1xp  2xp  2xp 1/4 3/4 3/4 1/4 1/4 2/4 3/4 2/4 (1/4)x(1/4) < (2/4)x(3/4) Classifying the independent binary features Example
  • 29. 29 1x 2x  1xp  1xp  2xp  2xp 1/4 3/4 3/4 1/4 1/4 2/4 3/4 2/4 (1/4)x(3/4) > (2/4)x(1/4) Classifying the independent binary features Example
  • 30. 30 1x 2x  1xp  1xp  2xp  2xp 1/4 3/4 3/4 1/4 1/4 2/4 3/4 2/4 (3/4)x(1/4) < (2/4)x(3/4) Classifying the independent binary features Example
  • 31. 31 1x 2x  1xp  1xp  2xp  2xp 1/4 3/4 3/4 1/4 1/4 2/4 3/4 2/4 (3/4)x(3/4) > (2/4)x(1/4) Classifying the independent binary features Example
  • 32. 32 1x 2x Classifying the independent binary features Example
  • 33. 33 1x 2x Classifying the independent binary features Example
  • 34. 34 1x 2x 1 1 0  1xp  1xp  2xp  2xp 5/9 4/8 4/8 4/8 5/9 4/8 4/9 4/8 Classifying the independent binary features Another example
  • 35. 35 1x 2x  1xp  1xp  2xp  2xp (4/9)x(4/9) < (4/8)x(4/8) Classifying the independent binary features Another example
  • 36. 36 1x 2x Classifying the independent binary features Another example
  • 37. 37 1x 2x Classifying the independent binary features Another example
  • 39. 39  Nxxxx ,...,, 21 feature        kxpkxpkxpkxp N 21  2,1k few states are possible      2 1 2 1     kxp kxp decision strategy Naive Bayesian classifier The problem
  • 40. 40 1x 2x 2 2 0 1 1  1xp  1xp  2xp  2xp 4/6 1/6 2/9 3/9 1/6 3/9 4/6 3/9 1/6 4/9 1/6 3/9 Naive Bayesian classifier Example
  • 41. 41 1x 2x 2 2 0 1 1  1xp  1xp  2xp  2xp 4/6 1/6 2/9 3/9 1/6 3/9 4/6 3/9 1/6 4/9 1/6 3/9 Naive Bayesian classifier Example
  • 43. 43  Nxxxx ,...,, 21 feature                n i n j k jj k ii k ji xxakxp 1 1 ][][][ ,5.0exp   kxOM i k i ..][   1][][   kk BA    ][][][ .. k jj k ii k ij xxOMb   Gaussian classifier Definitions
  • 44. 44                n i n j jjiiji xxaxp 1 1 ,5.0exp  2 3 4 5 2 3 4 5 Gaussian classifier Example
  • 45. 45   311  xMO   5,322  xMO     1111111   xxMOb 8/)41000111(      4/3222222   xxMOb     8/122112112   xxMObb AB                 18/1 8/14/3 47 64 4/38/1 8/11 1                2 221 2 1 )5,3( 1 1 )5,3)(3( 4 1 )3( 4 3 47 64 exp xxxxxp 2 3 4 5 2 3 4 5 Gaussian classifier
  • 46. 46                     n i n j jjiiji n i n j jjiiji xxa xxa kxp kxp 1 1 ]2[]2[]2[ , 1 1 ]1[]1[]1[ , 2 1 log    2 1    i i ii j jiij xxx Gaussian classifier Decision strategy
  • 48. 48  sxxxX  ,,, 21   rxxxX ,,, 21  n R                     n i ii n i ii xxXx xxXx 1 1 , , input output Linear discrimination Perceptron Rosenblatt
  • 51. 51         ;0, ;0, 1 1 xaxXxif xaxXxif ttt ttt       Linear discrimination Perceptron: tuning 𝑡 = 0 𝛼 𝑡 = 0 while (sets are not separated by hyperplane)
  • 60. 60 Xx Xx Dx   max 1*           0, 0, * *   xXx xXx  2 D t  Linear discrimination Perceptron: convergence
  • 61. 61 CByAx   a line in R2 x y Linear discrimination Transformation of the feature space
  • 62. 62 FEyDxCyBxyAx  22  x y hyperplane in R 5 elliptic curve in R 2 Linear discrimination Transformation of the feature space
  • 63. 63   min,                     n i ii n i ii xxXx xxXx 1 1 , , Separate by the widest band Linear discrimination SVM: support vector machine
  • 64. 64                      xxxXx xxxXx n i ii n i ii   1 1 , ,     min, ,  XX xconst  penalty  ξ ξ  Linear discrimination SVM: support vector machine Separate by the widest band
  • 66. 66       n i n i iiii xayyxmrginerrors 1 1 min0,]0),([# Logistic regression Target function: empirical risk = number of errors  margin<0  margin<0
  • 67. 67 Logistic regression Target function: upper bound 0 1 margin [margin<0] log(1+exp(-margin))       min),exp(1log0, 11    n i ii n i ii xayxay
  • 68. 68 Logistic regression Relationship with P(y|x) max ),exp(1 1 log 1           n i ii xay     min),exp(1log0, 11    n i ii n i ii xayxay    max,log 1  n i ii xaysigmoid    maxlog 1  n i ii xyp    iiii xaysigmoidxyp , model
  • 69. 69 Logistic regression Logistic function: the sigmoid  xsigmoid
  • 71. 71 Bias vs Variance Bias is an error from wrong assumptions in the learning algorithm High bias can cause underfit: the algorithm can miss relations between features and target Variance is an error from sensitivity to small fluctuations in the training set. High variance can cause overfit: the algorithm can model the random noise in the training data, rather than the intended outputs
  • 72. 72 Bias vs Variance How to avoid overfit (high variance) o Keep the model simpler o Do cross-validation (e.g. k-folds) o Use regularization techniques (e.g. LASSO, Ridge) o Use top n features from variable importance chart
  • 78. 78 Ensemble Learning Bootstrapping: random sampling with replacement bootstrap set 1 bootstrap set 2 bootstrap set 3 Pick up, clone, put back
  • 79. 79 Ensemble Learning Bagging: Bootstrap Aggregation Classifier 1 Classifier 2 Classifier 3 Voting / Aggregation decision
  • 80. 80 Ensemble Learning Boosting: weighted average of multiple weak classifiers
  • 81. 81 Ensemble Learning Boosting: weighted average of multiple weak classifiers
  • 82. 82 Ensemble Learning Boosting: weighted average of multiple weak classifiers
  • 83. 83 Ensemble Learning Boosting: weighted average of multiple weak classifiers
  • 84. 84 Ensemble Learning Bagging Boosting - Aims to decrease variance - Aims to decrease bias - Aims to solve over-fitting problem
  • 86. 86 Decision tree F1 F2 F3 Target A A A 0 A A B 0 B B A 1 A B B 1 H(0,0,1,1) = ½ log(½) + ½ log(½) = 1
  • 87. 87 Decision tree F1 F2 F3 Target Attempt split on F1 A A A 0 0 A A B 0 0 B B A 1 1 A B B 1 0 Information Gain = H(0011) - 3/4 H(001) - 1/4 H(1) = 0.91
  • 88. 88 Decision tree F1 F2 F3 Target Attempt split on F2 A A A 0 0 A A B 0 0 B B A 1 1 A B B 1 1 Information Gain = H(0011) - 1/2 H(00) - 1/2 H(11) = 1
  • 89. 89 Decision tree F1 F2 F3 Target Attempt split on F3 A A A 0 0 A A B 0 0 B B A 1 1 A B B 1 1 Information Gain = H(0011) - 1/2 H(01) - 1/2 H(01) = 0
  • 90. 90 Decision tree F1 F2 F3 Target A A A 0 A A B 0 B B A 1 A B B 1
  • 92. 92 Decision tree: pruning 50%58% = (7x7 + 3x3)/100 58% = (3x3 + 7x7)/100 50% x 58% + 50% x 58% = 58% 58% > 50% OK
  • 93. 93 Decision tree: pruning 55.5% 62.5% 50% x 58% + 50% (60% x 55.5% + 40% x 62.5%) = 50% x 58% + 50% x 58.3 = 58.16% 58.16% > 58% this is small improvement 58%58%
  • 95. 95 Decision tree: pruning 58% 50% x 58% + 50% (80% x 62.5% + 20% x 50%) = 50% x 58% + 50% x 60% = 59% 59% > 58% OK 62.5% 50% 58%
  • 97. 97 Random forest Original dataset F1 F2 F3 Target A A A 0 A A B 0 B B A 1 A B B 1 Bootstrapped dataset F1 F2 F3 Target A B B 1 A A B 0 A A A 0 A B B 1
  • 98. 98 Random forest Bootstrapped dataset F1 F2 F3 Target A B B 1 A A B 0 A A A 0 A B B 1 Determine the best root node out of randomly selected sub-set of candidates candidate candidate
  • 99. 99 Random forest Bootstrapped dataset F1 F2 F3 Target A B B 1 A A B 0 A A A 0 A B B 1 root
  • 100. 100 Random forest Bootstrapped dataset F1 F2 F3 Target A B B 1 A A B 0 A A A 0 A B B 1 candidate Build tree as usual, but considering a random subset of features at each step candidate
  • 101. 101 Random forest Bagging: do aggregate decisions against the bootstrapped data
  • 102. 102 Random forest Validation: run aggregated decisions against out-of-bag dataset Original dataset F1 F2 F3 Target A A A 0 A A B 0 B B A 1 A B B 1 Bootstrapped dataset F1 F2 F3 Target A B B 1 A A B 0 A A A 0 A B B 1 out-of-bag
  • 103. 103 Random forest The process Build random forest Estimate Train / Validation error Adjust RF parameters # of variables per step, regularization params, etc
  • 105. 105 Ni xxxx ,...,,, 21 ...3,2,1t  it xW Ni kkkk ,...,,, 21  xht t      t tt xhxk  Training features Iteration number Weight of the element on iteration t Weak classifier on iteration t Weight if a weak classifier on iteration t    1:0? xxh = {-1 ,1} Training targets Adaboost Definitions
  • 106. 106       ii i t h t xhkiWxh  minarg      )(exp1 itittt xhkiWiW   Update weights Weak classifier to minimize the weighted error                  kxhif kxhif r r r r iW t t t t t t t )( )( 1 1 1 1        i tii i tt iWkxhiWr         t t t r r 1 1 log 2 1  Calculate the weight od classifier Adaboost Algorithm
  • 107. 107 Adaboost Example, step 1: 2-D training points, all with weight 1; the first weak classifier $h_1$ is a threshold rule on $x_1 + x_2$ at $0.5$.
  • 108. 108 (Figure: weighted errors of the candidate thresholds; the best rule misclassifies 3 of the 9 points.)
  • 109. 109 $r_1 = (6 - 3)/9 = 0.33$, $\alpha_1 = 0.34$.
  • 110. 110 Weight update: $W_2(i) = W_1(i)\exp(-\alpha_1 k_i h_1(x_i))$.
  • 111. 111 (Figure: after the normalized update the three misclassified points carry weight 1.5 and the correctly classified ones 0.75.)
  • 112. 112 Step 2: the second weak classifier $h_2$ is another threshold rule on $x_1 + x_2$. (Figure: weighted candidate errors 3.75 … 4.5 under the new weights.)
  • 113. 113 $r_2 = 0.33$, $\alpha_2 = 0.34$.
  • 114. 114 (Figure: weights after the second update; points misclassified by $h_2$ rise to 2, the rest fall to 1 or 0.5.)
  • 115. 115 Step 3: the third weak classifier $h_3$. (Figure: weighted candidate errors under the latest weights.)
  • 116. 116 $r_3 = 0.38$, $\alpha_3 = 0.39$.
  • 117. 117 Result: with $\alpha_1 \approx 0.35$, $\alpha_2 \approx 0.35$, $\alpha_3 \approx 0.39$, the combined classifier $k(\bar{x}) = \operatorname{sign}\big(\sum_t \alpha_t h_t(\bar{x})\big)$ assigns the correct label to every training point (0,0), (1,0), …, (1,1).
  • 118. 118 Adaboost Another example: eight labelled points on the axis $0 \dots 6$, all starting with weight 1.
  • 119. 119 (Figure: number of errors for each candidate threshold; the best candidates make 3 errors out of 8.)
  • 120. 120 Step 1: $h_1(x)$ is a threshold rule at $x = 2.5$; $r_1 = \sum_i W_1(i)\,k_i\,h_1(x_i) \big/ \sum_i W_1(i) = (5 - 3)/8 = 2/8$, so $\alpha_1 = \frac12\log\frac{1+r_1}{1-r_1} \approx 0.25$.
  • 121. 121 Weight update $W_2(i) = W_1(i)\exp(-\alpha_1 k_i h_1(x_i))$: multiply by $\sqrt{5/3} \approx 1.3$ where $h_1(x_i) \ne k_i$ and by $\sqrt{3/5} \approx 0.8$ where $h_1(x_i) = k_i$.
  • 122. 122 (Figure: the updated point weights, 1.3 and 0.8, and the re-weighted errors 2.9 … 4 of the candidate thresholds.)
  • 123. 123 Step 2: the threshold with the smallest weighted error becomes $h_2(x)$; the same formulas give $\alpha_2 = 0.27$.
  • 125. 125 (Figure: point weights after the second update, 0.6 … 1.7, and the new weighted candidate errors.)
  • 126. 126 Step 3: $h_3(x)$ is a threshold rule at $x = 3.5$. The combined classifier $k(x) = \operatorname{sign}(\alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x))$ with $\alpha_1 = 0.25$, $\alpha_2 = 0.27$, $\alpha_3 = 0.33$ classifies the training points correctly.
  • 130. 130 ROC curve Error types. Of the 3 actually positive examples the classifier catches TP = 2/3 and misses FN = 1/3 (false negatives); of the 7 negatives it wrongly flags FP = 2/7 (false positives).
  • 131. 131 ROC curve: TP = 0/3, FN = 3/3, FP = 0/7 → the point (FP, TP) = (0, 0).
  • 132. 132 ROC curve: TP = 1/3, FN = 2/3, FP = 0/7.
  • 133. 133 ROC curve: TP = 2/3, FN = 1/3, FP = 1/7.
  • 134. 134 ROC curve: TP = 2/3, FN = 1/3, FP = 2/7.
  • 135. 135 ROC curve: TP = 2/3, FN = 1/3, FP = 4/7.
  • 136. 136 ROC curve: TP = 3/3, FN = 0/3, FP = 7/7 → the point (1, 1). Relaxing the threshold step by step traces the whole (FP, TP) curve.
  • 137. 137 ROC curve AUC: area under the ROC curve; the closer the curve gets to the top-left target corner, the larger the AUC.
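  A small sketch, not from the deck, that builds such (FP, TP) points by sweeping a threshold over classifier scores; the labels and score values are assumptions.

    import numpy as np

    labels = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])            # 3 pos, 7 neg (assumed)
    scores = np.array([.9, .8, .4, .7, .5, .3, .2, .2, .1, .1])  # assumed scores

    points = []
    for th in sorted(set(scores), reverse=True):
        pred = scores >= th
        tp = (pred & (labels == 1)).sum() / (labels == 1).sum()
        fp = (pred & (labels == 0)).sum() / (labels == 0).sum()
        points.append((fp, tp))
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]

    fp, tp = zip(*sorted(points))
    auc = np.trapz(tp, fp)                # trapezoidal area under the curve
    print(points, "AUC =", round(auc, 3))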
  • 138. 138 ROC curve Example: Naive Bayesian classifier. (Figure: 2-D points of two classes with per-coordinate likelihood estimates, e.g. 4/6, 1/6 for one class and 2/9, 3/9, 4/9 for the other.)
  • 139. 139 For every point $\bar{x} = (x_1, x_2)$ the classifier's score is the likelihood ratio $p(\bar{x} \mid 1)/p(\bar{x} \mid 2)$: (0,0) → 0.19, (1,0) → 1.50, (2,0) → 0.25, (2,0) → 0.25, (0,1) → 0.75, (0,1) → 0.75, (1,1) → 6.00 (four points), (2,1) → 1.00, (0,2) → 0.19, (0,2) → 0.19, (1,2) → 1.50, (2,2) → 0.25.
  • 140. 140 Sort the points by score, 6.00, 6.00, 6.00, 6.00, 1.50, 1.50, 1.00, 0.75, 0.75, 0.25, 0.25, 0.25, 0.19, 0.19, 0.19, and sweep a threshold on the ratio to obtain the (FP, TP) points of the ROC curve.
  • 141. 141 ROC curve The task: compare the accuracy of two strategies by their scores on the same examples. Strategy 1 scores: 63, 81, 23, 54, 10, 17, 29, 5, 34, 21. Strategy 2 scores: 86, 98, 18, 42, 21, 23, 37, 18, 74, 27.
  • 142. 142 ROC curve
    import numpy
    from sklearn import metrics
    from sklearn.metrics import auc

    def display_roc_curve_from_file(plt, fig, path_scores, caption=''):
        # load_mat is the deck's own helper for reading a delimited matrix
        data = load_mat(path_scores, dtype=numpy.chararray, delim='\t')
        labels = data[:, 0].astype('float32')
        scores = data[:, 1:].astype('float32')
        fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
        roc_auc = auc(fpr, tpr)
        plot_tp_fp(plt, fig, tpr, fpr, roc_auc, caption)
        return

    def plot_tp_fp(plt, fig, tpr, fpr, roc_auc, caption=''):
        lw = 2
        plt.plot(fpr, tpr, color='darkgreen', lw=lw, label='AUC = %0.2f' % roc_auc)
        plt.plot([0, 1.05], [0, 1.05], color='lightgray', lw=lw, linestyle='--')
        plt.grid(which='major', color='lightgray', linestyle='--')
        fig.canvas.set_window_title(caption + ('AUC = %0.4f' % roc_auc))
        return
  • 144. 144 Precision - Recall
  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)
  Accuracy = (TP + TN) / Total
  F1-score = 2 × Precision × Recall / (Precision + Recall)
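  A tiny sketch of these four metrics from raw counts; the example counts come from the walk-through below, with TN = 6 assumed (7 negatives, 1 false positive).

    def prf(tp, fn, fp, tn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / (tp + fn + fp + tn)
        return precision, recall, f1, accuracy

    print(prf(tp=2, fn=1, fp=1, tn=6))   # (0.667, 0.667, 0.667, 0.8)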
  • 145. 145 Precision - Recall: TP = 0, FN = 3, FP = 0 → Precision = 0/0 (undefined at this point), Recall = 0/3.
  • 146. 146 TP = 1, FN = 2, FP = 0 → Precision = 1, Recall = 1/3.
  • 147. 147 TP = 2, FN = 1, FP = 1 → Precision = 2/3, Recall = 2/3.
  • 148. 148 TP = 2, FN = 1, FP = 2 → Precision = 1/2, Recall = 2/3.
  • 149. 149 TP = 2, FN = 1, FP = 4 → Precision = 2/6, Recall = 2/3.
  • 150. 150 TP = 2, FN = 1, FP = 5 → Precision = 2/7, Recall = 2/3.
  • 151. 151 TP = 3, FN = 0, FP = 7 → Precision = 3/10, Recall = 1.
  • 153. 153 Precision - Recall (a second classifier): TP = 0, FN = 3, FP = 0 → Precision = 0, Recall = 0.
  • 154. 154 TP = 1, FN = 2, FP = 0 → Precision = 1, Recall = 1/3.
  • 155. 155 TP = 2, FN = 1, FP = 0 → Precision = 1, Recall = 2/3.
  • 156. 156 TP = 2, FN = 1, FP = 1 → Precision = 2/3, Recall = 2/3.
  • 157. 157 TP = 2, FN = 1, FP = 2 → Precision = 2/4, Recall = 2/3.
  • 158. 158 TP = 3, FN = 0, FP = 2 → Precision = 3/5, Recall = 1.
  • 159. 159 TP = 3, FN = 0, FP = 4 → Precision = 3/7, Recall = 1.
  • 162. 162 Benchmarking the Classifiers o Regression o Naive Bayes o Gaussian o SVM o Nearest Neighbors o Decision Tree o Random Forest o AdaBoost o xgboost o Neural Net
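  An illustrative benchmarking loop over several of the listed classifiers on synthetic data; everything here (the dataset, the split, the model list) is an assumed sketch, not the deck's pipeline, whose actual code follows on the next slide. SVM, xgboost, and the neural net are omitted for brevity.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = [LogisticRegression(max_iter=1000), GaussianNB(),
              KNeighborsClassifier(), DecisionTreeClassifier(max_depth=5),
              RandomForestClassifier(), AdaBoostClassifier()]
    for m in models:
        m.fit(X_tr, y_tr)
        score = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])  # compare by AUC
        print(type(m).__name__, round(score, 3))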
  • 176. 176 Benchmarking the Classifiers
    def E2E_on_features_files_2_classes(Classifier, folder_out, filename_data_pos, filename_data_neg,
                                        filename_scrs_pos=None, filename_scrs_neg=None, fig=None):
        filename_data_grid = folder_out + 'data_grid.txt'
        filename_scores_grid = folder_out + 'scores_grid.txt'
        Pos = (IO.load_mat(filename_data_pos, numpy.chararray, '\t')).shape[0]
        Neg = (IO.load_mat(filename_data_neg, numpy.chararray, '\t')).shape[0]
        numpy.random.seed(125)
        idx_pos_train = numpy.random.choice(Pos, int(Pos / 2), replace=False)
        idx_neg_train = numpy.random.choice(Neg, int(Neg / 2), replace=False)
        idx_pos_test = [x for x in range(0, Pos) if x not in idx_pos_train]
        idx_neg_test = [x for x in range(0, Neg) if x not in idx_neg_train]

        generate_data_grid(filename_data_pos, filename_data_neg, filename_data_grid)

        # first fold: train on the train half, score the test half
        model = learn_on_pos_neg_files(Classifier, filename_data_pos, filename_data_neg, '\t',
                                       idx_pos_train, idx_neg_train)
        score_feature_file(Classifier, filename_data_pos, filename_scrs=filename_scrs_pos,
                           delimeter='\t', append=0, rand_sel=idx_pos_test)
        score_feature_file(Classifier, filename_data_neg, filename_scrs=filename_scrs_neg,
                           delimeter='\t', append=0, rand_sel=idx_neg_test)
        score_feature_file(Classifier, filename_data_grid, filename_scrs=filename_scores_grid)

        # second fold: swap the halves and append the scores
        model = learn_on_pos_neg_files(Classifier, filename_data_pos, filename_data_neg, '\t',
                                       idx_pos_test, idx_neg_test)
        score_feature_file(Classifier, filename_data_pos, filename_scrs=filename_scrs_pos,
                           delimeter='\t', append=1, rand_sel=idx_pos_train)
        score_feature_file(Classifier, filename_data_neg, filename_scrs=filename_scrs_neg,
                           delimeter='\t', append=1, rand_sel=idx_neg_train)

        tpr, fpr, auc = IO.get_roc_data_from_scores_file_v2(filename_scrs_pos, filename_scrs_neg)

        if fig is not None:
            th = get_th_v2(filename_scrs_pos, filename_scrs_neg)
            IO.display_roc_curve_from_descriptions(plt.subplot(1, 3, 3), fig,
                                                   filename_scrs_pos, filename_scrs_neg, delim='\t')
            IO.display_distributions(plt.subplot(1, 3, 2), fig,
                                     filename_scrs_pos, filename_scrs_neg, delim='\t')
            IO.plot_2D_scores(plt.subplot(1, 3, 1), fig, filename_data_pos, filename_data_neg,
                              filename_data_grid, filename_scores_grid, th, noice_needed=0)
            plt.tight_layout()
            plt.show()
        return tpr, fpr, auc
  • 178. 178 Unsupervised learning EM algorithm. Supervised learning: a supervisor provides labelled pairs $(x^1, k^1), \dots, (x^n, k^n)$, from which the model $p(x, k) = f(x, k, \theta)$, the priors $p(k)$, and the posteriors $p(k \mid x)$ are learned. Unsupervised learning: only the features $(x^1, \dots, x^n)$ are given, and the same model $p(x, k) = f(x, k, \theta)$ must be learned with the states hidden; this is the job of the EM algorithm.
  • 179. 179 Unsupervised learning EM algorithm. Example: eight unlabelled points on the axis $0 \dots 6$, located at $0, 0, 0, 3, 3, 3, 5, 6$, to be explained by two Gaussian clusters.
  • 180. 180 (Figure: an initial random assignment of the points to the two clusters.)
  • 181. 181 M-step on the initial assignment: cluster 1 = {0, 0, 3, 5}: mean = ¼(0+0+3+5) = 2, variance = ¼(2² + 2² + 1² + 3²) = 4.5; cluster 2 = {0, 3, 3, 6}: mean = ¼(0+3+3+6) = 3, variance = ¼(3² + 0² + 0² + 3²) = 4.5.
  • 182. 182 E-step: re-assign every point to the cluster whose Gaussian explains it best, then repeat the M-step.
  • 183. 183 The clusters separate: cluster 1 = {0, 0, 0}: mean = ⅓(0+0+0) = 0, variance = ⅓(0² + 0² + 0²) = 0; cluster 2 = {3, 3, 3, 5, 6}: mean = ⅕(3+3+3+5+6) = 4, variance = ⅕(1² + 1² + 1² + 1² + 2²) = 1.6.
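  A minimal hard-assignment EM sketch (k-means-like E/M steps) for the eight 1-D points above; the initial split is an arbitrary assumption chosen to match the slides.

    import numpy as np

    x = np.array([0, 0, 0, 3, 3, 3, 5, 6], dtype=float)
    assign = np.array([0, 1, 0, 1, 1, 0, 0, 1])   # assumed initial random split

    for _ in range(5):
        # M-step: re-estimate mean and variance of each cluster
        mu = np.array([x[assign == c].mean() for c in (0, 1)])
        var = np.array([x[assign == c].var() for c in (0, 1)]) + 1e-9
        # E-step: re-assign each point by Gaussian log-likelihood
        ll = [-0.5 * np.log(var[c]) - (x - mu[c]) ** 2 / (2 * var[c]) for c in (0, 1)]
        assign = np.argmax(ll, axis=0)

    print("means:", mu, "variances:", var)   # converges to means ~0 and ~4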
  • 195. 195 Neyman-Pearson approach Limitations of the Bayesian approach
  $x \in X$: feature; $k$: hidden state
  $p(k)$: prior probability of the state
  $p(x \mid k)$: conditional probability to observe $x$ in state $k$
  $W : \ldots \to R$: penalty function
  • 197. 197 Neyman-Pearson approach Limitations of the Bayesian approach: example. $x \in X$: a heartbeat record; $k$: the state, normal or sick. The distributions $p(x \mid k)$ of the heartbeat for normal and sick patients can be estimated, but the prior $p(k)$ and the penalties $W(\text{normal}, \text{sick})$, $W(\text{sick}, \text{normal})$ are unknown (???), so the Bayesian risk cannot even be written down.
  • 198. 198 Neyman-Pearson approach The problem
  input: $x \in X$; output: $k \in K = \{\text{normal}, \text{risky}\}$
  known: $p(x \mid k = \text{norm})$ and $p(x \mid k = \text{risk})$
  find the partition $X_{\text{norm}} \cup X_{\text{risk}} = X$, $X_{\text{norm}} \cap X_{\text{risk}} = \emptyset$ such that
  error type 1 $= P(\text{false alarm}) = \sum_{x \in X_{\text{risk}}} p(x \mid \text{norm}) \to \min$
  error type 2 $= P(\text{miss the risk}) = \sum_{x \in X_{\text{norm}}} p(x \mid \text{risk}) \le \varepsilon$
  • 199. 199 Neyman-Pearson approach The problem, in indicator form:
  $\alpha_{\text{norm}}(x) = 1$ if $x \in X_{\text{norm}}$, else $0$; $\alpha_{\text{risk}}(x) = 1$ if $x \in X_{\text{risk}}$, else $0$.
  • 200. 200 Neyman-Pearson approach The problem as a linear program:
  $\sum_{x \in X} \alpha_{\text{risk}}(x)\, p(x \mid \text{norm}) \to \min$
  $-\sum_{x \in X} \alpha_{\text{norm}}(x)\, p(x \mid \text{risk}) \ge -\varepsilon$
  $\alpha_{\text{norm}}(x) + \alpha_{\text{risk}}(x) = 1 \quad \forall x \in X$
  $\alpha_{\text{norm}}(x) \ge 0, \;\; \alpha_{\text{risk}}(x) \ge 0 \quad \forall x \in X$
  • 201. 201 Neyman-Pearson approach A reminder of LP duality. Primal:
  $p_1 x_1 + \dots + p_n x_n \to \max$
  $a_{11}x_1 + \dots + a_{1n}x_n \le b_1, \;\dots,\; a_{k1}x_1 + \dots + a_{kn}x_n \le b_k$
  $a_{k+1,1}x_1 + \dots + a_{k+1,n}x_n = b_{k+1}, \;\dots,\; a_{m1}x_1 + \dots + a_{mn}x_n = b_m$
  $x_1 \ge 0, \dots, x_s \ge 0$; $x_{s+1}, \dots, x_n$ free variables
  Dual:
  $b_1 y_1 + \dots + b_m y_m \to \min$
  $y_1 \ge 0, \dots, y_k \ge 0$; $y_{k+1}, \dots, y_m$ free variables
  $a_{11}y_1 + \dots + a_{m1}y_m \ge p_1, \;\dots,\; a_{1s}y_1 + \dots + a_{ms}y_m \ge p_s$
  $a_{1,s+1}y_1 + \dots + a_{m,s+1}y_m = p_{s+1}, \;\dots,\; a_{1n}y_1 + \dots + a_{mn}y_m = p_n$
  • 202. 202 Complementary slackness ties the two optimal solutions together:
  $(a_{11}x_1 + \dots + a_{1n}x_n - b_1)\, y_1 = 0$ for every inequality constraint, and
  $(a_{11}y_1 + \dots + a_{m1}y_m - p_1)\, x_1 = 0$ for every sign-constrained variable.
  • 203. 203 Neyman-Pearson approach Solution. Taking the dual of the linear program on slide 200:
  $-\varepsilon \cdot \tau + 1 \cdot t(x_1) + \dots + 1 \cdot t(x_i) \to \max$
  $\tau \ge 0$; $t(x)$ free variables
  $-p(x \mid \text{risk}) \cdot \tau + 1 \cdot t(x) \le 0 \quad \forall x$
  $1 \cdot t(x) \le p(x \mid \text{norm}) \quad \forall x$
  • 204. 204 Neyman-Pearson approach Solution. Complementary slackness yields a likelihood-ratio rule:
  $x \in X_{\text{norm}} \Rightarrow \alpha_{\text{norm}}(x) = 1,\ \alpha_{\text{risk}}(x) = 0 \Rightarrow -p(x \mid \text{risk})\tau + t(x) = 0$ and $t(x) < p(x \mid \text{norm})$, hence $p(x \mid \text{risk}) \cdot \tau \le p(x \mid \text{norm})$
  $x \in X_{\text{risk}} \Rightarrow \alpha_{\text{norm}}(x) = 0,\ \alpha_{\text{risk}}(x) = 1 \Rightarrow t(x) = p(x \mid \text{norm})$ and $-p(x \mid \text{risk})\tau + t(x) \le 0$, hence $p(x \mid \text{risk}) \cdot \tau \ge p(x \mid \text{norm})$
  The optimal strategy thus compares the likelihood ratio with the threshold $\tau$.
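  An illustrative sketch of this rule for an assumed discrete case: grow X_norm greedily in increasing order of the likelihood ratio p(x|risk)/p(x|norm) while the miss rate stays within eps, which is exactly the threshold rule above. The two distributions and eps are invented for the example.

    import numpy as np

    p_norm = np.array([0.40, 0.30, 0.15, 0.10, 0.05])   # p(x | norm), assumed
    p_risk = np.array([0.05, 0.10, 0.15, 0.30, 0.40])   # p(x | risk), assumed
    eps = 0.20

    order = np.argsort(p_risk / p_norm)   # ascending ratio: most "normal" x first
    X_norm, miss = [], 0.0
    for x in order:
        if miss + p_risk[x] <= eps:       # keep x in X_norm while miss <= eps
            X_norm.append(x)
            miss += p_risk[x]
    X_risk = [x for x in range(len(p_norm)) if x not in X_norm]
    false_alarm = p_norm[X_risk].sum()    # the quantity being minimized
    print("X_norm:", X_norm, "miss:", miss, "false alarm:", false_alarm)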
  • 206. 206 Neyman-Pearson approach Some variations: two risky states.
  $\sum_{x \in X_{\text{risk}}} p(x \mid \text{norm}) \to \min$ subject to $\sum_{x \in X_{\text{norm}}} p(x \mid \text{risk}_1) \le \varepsilon$ and $\sum_{x \in X_{\text{norm}}} p(x \mid \text{risk}_2) \le \varepsilon$
  • 207. 207 Neyman-Pearson approach Some variations: two normal states.
  $\sum_{x \in X_{\text{risk}}} p(x \mid \text{norm}_1) + \sum_{x \in X_{\text{risk}}} p(x \mid \text{norm}_2) \to \min$ subject to $\sum_{x \in X_{\text{norm}}} p(x \mid \text{risk}) \le \varepsilon$
  • 208. 208 Neyman-Pearson approach Minimax problem:
  $\max\Big(\sum_{x \notin X_1} p(x \mid 1), \sum_{x \notin X_2} p(x \mid 2), \dots, \sum_{x \notin X_K} p(x \mid K)\Big) \to \min$
  • 210. 210 Missing values
  o Delete rows with missing values
  o Replace with the mean / median
  o Encode "missing" as another level of a categorical variable
  o Create a new engineered feature (e.g. an is-missing indicator)
  o Hot deck: copy the value from a similar ("donor") record
  (An illustrative sketch of these options follows below.)
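  An illustrative pandas sketch of the listed strategies; the tiny frame, the column names, and the "previous row as donor" rule for hot deck are assumptions.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                       "city": ["Kyiv", None, "Lviv", "Kyiv"]})

    dropped = df.dropna()                                         # delete missing rows
    filled = df.assign(age=df["age"].fillna(df["age"].median()))  # mean / median
    as_level = df.assign(city=df["city"].fillna("MISSING"))       # extra category level
    engineered = df.assign(age_missing=df["age"].isna())          # new indicator feature
    # hot deck: borrow the value from a similar ("donor") row, here the previous one
    hot_deck = df.sort_values("city").ffill()
    print(engineered)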