Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 5: Multivariate Methods
Multivariate Data
• Multiple measurements (sensors)
• d inputs/features/attributes: d-variate
• N instances/observations/examples

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$
Multivariate Parameters
• Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$
• Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$
• Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\!\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
Parameter Estimation
• Sample mean $\mathbf{m}$: $\quad m_i = \dfrac{1}{N}\sum_{t=1}^{N} x_i^t, \quad i = 1, \ldots, d$
• Covariance matrix $\mathbf{S}$: $\quad s_{ij} = \dfrac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)$
• Correlation matrix $\mathbf{R}$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
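A minimal numpy sketch of these estimators (the data matrix below is made up). Note the slide's estimator divides by N, whereas numpy's np.cov defaults to N−1:

```python
import numpy as np

# Hypothetical N x d data matrix: N = 5 instances, d = 3 features.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 0.7],
              [1.5, 2.5, 0.2],
              [3.0, 3.5, 0.9],
              [2.5, 1.0, 0.4]])
N, d = X.shape

m = X.mean(axis=0)        # sample mean m, one entry per feature
Z = X - m                 # centered data
S = (Z.T @ Z) / N         # covariance matrix S (divides by N, as on the slide)
s = np.sqrt(np.diag(S))   # per-feature standard deviations
R = S / np.outer(s, s)    # correlation matrix R, r_ij = s_ij / (s_i * s_j)
```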
Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances: not a good idea if the sample is small
• Use ‘missing’ as an attribute value: it may carry information
• Imputation: fill in the missing value
  • Mean imputation: use the most likely value (e.g., the mean)
  • Imputation by regression: predict the missing value from the other attributes
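A small sketch of both imputation strategies in numpy; the data, the NaN encoding of ‘missing’, and the choice to regress column 1 on column 0 are all hypothetical:

```python
import numpy as np

# Hypothetical data with missing entries encoded as NaN.
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.1],
              [4.0, 8.2],
              [np.nan, 3.9]])

# Mean imputation: replace each NaN by its column mean.
col_means = np.nanmean(X, axis=0)
X_mean = np.where(np.isnan(X), col_means, X)

# Imputation by regression: predict column 1 from column 0,
# fitting on the rows where both attributes are observed.
obs = ~np.isnan(X).any(axis=1)
a, b = np.polyfit(X[obs, 0], X[obs, 1], deg=1)   # x1 ≈ a*x0 + b
X_reg = X.copy()
miss = np.isnan(X_reg[:, 1]) & ~np.isnan(X_reg[:, 0])
X_reg[miss, 1] = a * X_reg[miss, 0] + b
```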
Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$
Multivariate Normal Distribution
• Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations)
• Bivariate case, $d = 2$:
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$
where $z_i = (x_i - \mu_i)/\sigma_i$ are the standardized variables.
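A short numpy sketch that computes the squared Mahalanobis distance and then evaluates the density through it; the bivariate parameters are made up:

```python
import numpy as np

def mahalanobis2(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    # Solve Sigma z = diff instead of explicitly inverting Sigma.
    return float(diff @ np.linalg.solve(Sigma, diff))

# Hypothetical bivariate example with correlation rho = 0.5.
mu = np.array([0.0, 0.0])
s1, s2, rho = 1.0, 2.0, 0.5
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2]])

x = np.array([1.0, 1.0])
d2 = mahalanobis2(x, mu, Sigma)

# The density can be written directly in terms of d2:
d = len(mu)
p = np.exp(-0.5 * d2) / ((2*np.pi)**(d/2) * np.sqrt(np.linalg.det(Sigma)))
```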
Bivariate Normal

[Figure]
Independent Inputs: Naive Bayes
• If the $x_i$ are independent, the off-diagonals of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:
$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$
• If the variances are also equal, this reduces to the Euclidean distance.
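A quick numerical check of this factorization, with made-up diagonal parameters: the product of the d univariate densities matches the multivariate density with $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_i^2)$:

```python
import numpy as np

# With a diagonal Sigma, the joint density factors into univariate normals.
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 0.5, 2.0])   # per-feature standard deviations
x = np.array([0.3, 0.8, -2.0])

# Product of d univariate densities ...
uni = np.exp(-0.5 * ((x - mu) / sigma)**2) / (np.sqrt(2*np.pi) * sigma)
p_product = uni.prod()

# ... equals the multivariate density with Sigma = diag(sigma^2).
Sigma = np.diag(sigma**2)
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)
p_joint = np.exp(-0.5 * d2) / ((2*np.pi)**(len(x)/2) * np.sqrt(np.linalg.det(Sigma)))

assert np.isclose(p_product, p_joint)
```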
Parametric Classification
• If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:
$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$
• The discriminant functions are
$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters
Given a labeled sample, with $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise:
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad \mathbf{m}_i = \frac{\sum_t r_i^t\,\mathbf{x}^t}{\sum_t r_i^t} \qquad \mathbf{S}_i = \frac{\sum_t r_i^t\,(\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$
Plugging the estimates into the discriminant:
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1}(\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
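A minimal sketch of these estimators, assuming integer labels y in {0, ..., K−1} so that (y == i) plays the role of the indicator $r_i^t$:

```python
import numpy as np

def estimate_class_params(X, y, K):
    """MLE of priors, means, and covariances per class."""
    N, d = X.shape
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[y == i]                 # rows with r_i^t = 1
        mi = Xi.mean(axis=0)
        Zi = Xi - mi
        Si = (Zi.T @ Zi) / len(Xi)     # divides by sum_t r_i^t
        priors.append(len(Xi) / N)
        means.append(mi)
        covs.append(Si)
    return np.array(priors), np.array(means), np.array(covs)
```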
Different $\mathbf{S}_i$
• Quadratic discriminant:
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1}(\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$
where
$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1} \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$
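A sketch that precomputes $(\mathbf{W}_i, \mathbf{w}_i, w_{i0})$ from the estimated parameters and classifies by the largest $g_i$; the function names are ours, not the book's:

```python
import numpy as np

def quadratic_discriminants(priors, means, covs):
    """Precompute (W_i, w_i, w_i0) for each class from the formulas above."""
    params = []
    for P, m, S in zip(priors, means, covs):
        S_inv = np.linalg.inv(S)
        W = -0.5 * S_inv
        w = S_inv @ m
        w0 = -0.5 * m @ S_inv @ m - 0.5 * np.log(np.linalg.det(S)) + np.log(P)
        params.append((W, w, w0))
    return params

def classify(x, params):
    """Pick the class with the largest g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    return int(np.argmax([x @ W @ x + w @ x + w0 for W, w, w0 in params]))
```

In practice np.linalg.slogdet is the safer choice when $|\mathbf{S}_i|$ can underflow; np.log of the determinant is kept here only to mirror the formula.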
[Figure: class likelihoods, the posterior for C1, and the discriminant at P(C1|x) = 0.5]
Common Covariance Matrix S
• Shared common sample covariance S:
$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$
• The discriminant reduces to
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1}(\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
which is a linear discriminant:
$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} \qquad \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$
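The shared-covariance variant, sketched the same way (pool the class covariances first, then form one linear pair per class):

```python
import numpy as np

def linear_discriminants(priors, means, covs):
    """Pool the class covariances, then form one (w_i, w_i0) pair per class."""
    S = sum(P * Si for P, Si in zip(priors, covs))   # S = sum_i P(C_i) * S_i
    S_inv = np.linalg.inv(S)
    return [(S_inv @ m, -0.5 * m @ S_inv @ m + np.log(P))
            for P, m in zip(priors, means)]

def classify_linear(x, params):
    """Pick the class with the largest g_i(x) = w_i^T x + w_i0."""
    return int(np.argmax([w @ x + w0 for w, w0 in params]))
```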
Common Covariance Matrix S

[Figure]
Diagonal S
• When the $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:
$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$
• Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean:
$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$
Diagonal S

[Figure: variances may be different]
Diagonal S, equal variances
• Nearest mean classifier: classify based on Euclidean distance to the nearest mean:
$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j^t - m_{ij}\right)^2 + \log \hat{P}(C_i)$$
• Each mean can be considered a prototype or template, and this is template matching.
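A nearest-mean sketch; the shared standard deviation s and the prior handling are illustrative assumptions:

```python
import numpy as np

def nearest_mean(x, means, priors=None, s=1.0):
    """Template matching: pick the class whose mean is closest to x."""
    d2 = ((means - x) ** 2).sum(axis=1)   # squared Euclidean distance to each m_i
    if priors is None:                    # equal priors: plain nearest mean
        return int(np.argmin(d2))
    # Unequal priors shift the boundary: g_i(x) = -d2/(2 s^2) + log P(C_i)
    return int(np.argmax(-d2 / (2 * s ** 2) + np.log(priors)))
```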
Diagonal S, equal variances

[Figure]
Model Selection
• As we increase complexity (a less restricted S), bias decreases and variance increases
• Assume simple models (allow some bias) to control variance (regularization)

Assumption                    Covariance matrix        No. of parameters
Shared, hyperspheric          S_i = S = s^2 I          1
Shared, axis-aligned          S_i = S, with s_ij = 0   d
Shared, hyperellipsoidal      S_i = S                  d(d+1)/2
Different, hyperellipsoidal   S_i                      K · d(d+1)/2
Discrete Features
• Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$
If the $x_j$ are independent (Naive Bayes'):
$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\,(1 - p_{ij})^{1 - x_j}$$
the discriminant is linear:
$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$
Estimated parameters:
$$\hat{p}_{ij} = \frac{\sum_t x_j^t\,r_i^t}{\sum_t r_i^t}$$
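A Bernoulli naive Bayes sketch under the same assumptions as before (labels 0, ..., K−1); the eps clipping is our guard against log 0, not part of the slide:

```python
import numpy as np

def fit_bernoulli_nb(X, y, K):
    """Estimate p_ij = p(x_j = 1 | C_i) and the priors from binary data X."""
    priors = np.array([(y == i).mean() for i in range(K)])
    p = np.array([X[y == i].mean(axis=0) for i in range(K)])   # p_hat, shape (K, d)
    return priors, p

def discriminants(x, priors, p, eps=1e-9):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    p = np.clip(p, eps, 1 - eps)   # avoid log(0) for features never/always on
    return (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1) + np.log(priors)
```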
Discrete Features
• Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$, with $p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$
If the $x_j$ are independent:
$$p(\mathbf{x} \mid C_i) = \prod_j \prod_k p_{ijk}^{z_{jk}}$$
$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$
Estimated parameters:
$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t\,r_i^t}{\sum_t r_i^t}$$
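The multinomial counterpart, assuming (our choice, for illustration) that the data are already one-hot encoded into an N × d × n array, zero-padded when features have different numbers of values:

```python
import numpy as np

def fit_multinomial_nb(Z, y, K):
    """Z[t, j, k] = 1 iff x_j^t = v_k; shape (N, d, n), zero-padded if the
    features have different arities n_j."""
    priors = np.array([(y == i).mean() for i in range(K)])
    p = np.array([Z[y == i].mean(axis=0) for i in range(K)])   # p_hat_ijk
    return priors, p

def discriminants(z, priors, p, eps=1e-9):
    """g_i(x) = sum_j sum_k z_jk log p_ijk + log P(C_i)."""
    return (z * np.log(np.clip(p, eps, 1.0))).sum(axis=(1, 2)) + np.log(priors)
```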
Multivariate Regression
• Multivariate linear model:
$$g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$
$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$
• Multivariate polynomial model: define new higher-order variables
$z_1 = x_1,\; z_2 = x_2,\; z_3 = x_1^2,\; z_4 = x_2^2,\; z_5 = x_1 x_2$
and use the linear model in this new z-space
(basis functions, kernel trick, SVM: Chapter 10)
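A least-squares sketch of both models: fit_linear solves the normal equations via numpy's lstsq, and quadratic_basis builds the z-space for the d = 2 example above (both function names are ours):

```python
import numpy as np

def fit_linear(X, r):
    """Least-squares fit of r ≈ w0 + w1*x1 + ... + wd*xd."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend the bias column
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w                                    # [w0, w1, ..., wd]

def quadratic_basis(X):
    """Map (x1, x2) -> (z1, ..., z5) = (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

# Polynomial regression is then the linear model fit in z-space:
#   w = fit_linear(quadratic_basis(X), r)
```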