i2ml-chap5-v1-1.ppt
- 3. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples
The data can be collected in an N \times d data matrix, where row t holds the d measurements of instance t:

X = \begin{bmatrix}
X_1^1 & X_2^1 & \cdots & X_d^1 \\
X_1^2 & X_2^2 & \cdots & X_d^2 \\
\vdots &        &        & \vdots \\
X_1^N & X_2^N & \cdots & X_d^N
\end{bmatrix}
- 4.
Multivariate Parameters
Mean: E[x] = \mu = [\mu_1, \ldots, \mu_d]^T

Covariance: \sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)

Correlation: \mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}

\Sigma \equiv \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^T\big] = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\
\vdots &        &        & \vdots \\
\sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2
\end{bmatrix}
- 5.
Parameter Estimation
Sample mean m: m_i = \dfrac{\sum_{t=1}^N x_i^t}{N}, \quad i = 1, \ldots, d

Covariance matrix S: s_i^2 = \dfrac{\sum_{t=1}^N (x_i^t - m_i)^2}{N}, \quad s_{ij} = \dfrac{\sum_{t=1}^N (x_i^t - m_i)(x_j^t - m_j)}{N}

Correlation matrix R: r_{ij} = \dfrac{s_{ij}}{s_i s_j}
- 6.
Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small
Use 'missing' as an attribute: may give information
Imputation: fill in the missing value
  Mean imputation: use the most likely value (e.g., the mean)
  Imputation by regression: predict based on other attributes
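Mean imputation can be sketched in a few lines of NumPy; the data matrix below is a made-up toy example, with NaN marking missing entries:

```python
import numpy as np

# Toy data matrix (N instances x d features); np.nan marks missing entries.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# Mean imputation: replace each missing value with the column (feature) mean
# computed over the observed entries only.
col_means = np.nanmean(X, axis=0)          # per-feature means, ignoring NaNs
missing = np.isnan(X)
X_imputed = np.where(missing, col_means, X)
print(X_imputed)
```

Imputation by regression would instead fit a model predicting the missing attribute from the observed ones and fill in the prediction.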
- 7.
Multivariate Normal Distribution
x \sim N_d(\mu, \Sigma)

p(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\dfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right]
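The density above translates directly into code. A minimal sketch (the function name and the sanity check are mine, not from the slides):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, straight from the formula above."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    expo = -0.5 * diff @ np.linalg.solve(Sigma, diff)
    return np.exp(expo) / norm

# Sanity check against the univariate case: d = 1, mu = 0, sigma^2 = 1,
# where the density at 0 is 1/sqrt(2*pi).
p = mvn_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
print(p)
```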
- 8.
Multivariate Normal Distribution
Mahalanobis distance: (x - \mu)^T \Sigma^{-1} (x - \mu)
measures the distance from x to \mu in terms of \Sigma (normalizes for differences in variances and correlations)
Bivariate: d = 2, with z_i = (x_i - \mu_i)/\sigma_i:

p(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\dfrac{1}{2(1-\rho^2)} \left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]
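The normalization the Mahalanobis distance performs is easy to see numerically. A small sketch (the covariance values are a made-up example):

```python
import numpy as np

def mahalanobis2(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])   # x1 has variance 4, x2 has variance 1
# The point (2, 0) is one standard deviation away along x1, so its squared
# Mahalanobis distance is 1 even though its squared Euclidean distance is 4.
d2 = mahalanobis2(np.array([2.0, 0.0]), mu, Sigma)
print(d2)
```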
- 9.
Bivariate Normal
- 10.
- 11.
Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonals of \Sigma are 0, and the Mahalanobis distance reduces to a weighted (by 1/\sigma_i) Euclidean distance:

p(x) = \prod_{i=1}^d p_i(x_i) = \dfrac{1}{(2\pi)^{d/2} \prod_{i=1}^d \sigma_i} \exp\left[-\dfrac{1}{2} \sum_{i=1}^d \left(\dfrac{x_i - \mu_i}{\sigma_i}\right)^2\right]

If the variances are also equal, it reduces to the Euclidean distance.
- 12.
Parametric Classification
If p(x \mid C_i) \sim N_d(\mu_i, \Sigma_i):

p(x \mid C_i) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[-\dfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right]

Discriminant functions are

g_i(x) = \log p(x \mid C_i) + \log P(C_i)
       = -\dfrac{d}{2} \log 2\pi - \dfrac{1}{2} \log |\Sigma_i| - \dfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log P(C_i)
- 13.
Estimation of Parameters
With r_i^t = 1 if x^t \in C_i and 0 otherwise:

\hat{P}(C_i) = \dfrac{\sum_t r_i^t}{N}

m_i = \dfrac{\sum_t r_i^t x^t}{\sum_t r_i^t}

S_i = \dfrac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}

g_i(x) = -\dfrac{1}{2} \log |S_i| - \dfrac{1}{2} (x - m_i)^T S_i^{-1} (x - m_i) + \log \hat{P}(C_i)
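The estimators above are a few lines of NumPy. A sketch on a made-up two-class sample (the data and labels are mine, chosen only for illustration):

```python
import numpy as np

# Toy labelled sample: two classes in d = 2 dimensions.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # class 0
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

priors, means, covs = [], [], []
for c in np.unique(y):
    Xc = X[y == c]
    priors.append(len(Xc) / len(X))      # P_hat(C_i) = N_i / N
    m = Xc.mean(axis=0)                  # m_i: per-class sample mean
    means.append(m)
    D = Xc - m
    covs.append(D.T @ D / len(Xc))       # S_i: maximum-likelihood estimate (divide by N_i)
print(priors, means[0])
```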
- 14.
Different Si
Quadratic discriminant:

g_i(x) = -\dfrac{1}{2} \log |S_i| - \dfrac{1}{2} (x - m_i)^T S_i^{-1} (x - m_i) + \log \hat{P}(C_i)
       = x^T W_i x + w_i^T x + w_{i0}

where

W_i = -\dfrac{1}{2} S_i^{-1}
w_i = S_i^{-1} m_i
w_{i0} = -\dfrac{1}{2} m_i^T S_i^{-1} m_i - \dfrac{1}{2} \log |S_i| + \log \hat{P}(C_i)
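The quadratic form can be assembled directly from m_i, S_i, and the prior. A sketch (the function names and the identity-covariance check are mine):

```python
import numpy as np

def quadratic_discriminant(m, S, prior):
    """Return (W, w, w0) for g(x) = x^T W x + w^T x + w0, following the slide."""
    S_inv = np.linalg.inv(S)
    W = -0.5 * S_inv
    w = S_inv @ m
    w0 = (-0.5 * m @ S_inv @ m
          - 0.5 * np.log(np.linalg.det(S))
          + np.log(prior))
    return W, w, w0

def g(x, W, w, w0):
    return x @ W @ x + w @ x + w0

# With identity covariance and mean at the origin, g(x) reduces to
# -||x||^2 / 2 + log prior.
W, w, w0 = quadratic_discriminant(np.zeros(2), np.eye(2), 0.5)
val = g(np.array([1.0, 1.0]), W, w, w0)
print(val)
```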
- 15.
(Figure: likelihoods; posterior for C_1; discriminant: P(C_1 | x) = 0.5)
- 16.
Common Covariance Matrix S
Shared common sample covariance S:

S = \sum_i \hat{P}(C_i) S_i

Discriminant reduces to

g_i(x) = -\dfrac{1}{2} (x - m_i)^T S^{-1} (x - m_i) + \log \hat{P}(C_i)

which is a linear discriminant:

g_i(x) = w_i^T x + w_{i0}

where

w_i = S^{-1} m_i
w_{i0} = -\dfrac{1}{2} m_i^T S^{-1} m_i + \log \hat{P}(C_i)
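With a shared covariance, the discriminant is linear, and with equal priors the decision boundary is the perpendicular bisector of the two means. A sketch (means, covariance, and priors below are made-up values):

```python
import numpy as np

def linear_discriminant(m, S, prior):
    """Return (w, w0) for g(x) = w^T x + w0 with shared covariance S."""
    S_inv = np.linalg.inv(S)
    w = S_inv @ m
    w0 = -0.5 * m @ S_inv @ m + np.log(prior)
    return w, w0

# Two classes sharing S = I with equal priors.
w1, b1 = linear_discriminant(np.array([0.0, 0.0]), np.eye(2), 0.5)
w2, b2 = linear_discriminant(np.array([2.0, 0.0]), np.eye(2), 0.5)
x = np.array([1.0, 0.0])          # midpoint between the two means
print(w1 @ x + b1, w2 @ x + b2)   # equal: the point lies on the boundary
```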
- 17.
Common Covariance Matrix S
- 18.
Diagonal S
When the x_j, j = 1, \ldots, d, are independent, \Sigma is diagonal:

p(x \mid C_i) = \prod_j p_j(x_j \mid C_i)   (Naive Bayes' assumption)

g_i(x) = -\dfrac{1}{2} \sum_{j=1}^d \left(\dfrac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)

Classify based on weighted Euclidean distance (in s_j units) to the nearest mean.
- 19.
Diagonal S
(variances may be different)
- 20.
Diagonal S, equal variances
Nearest mean classifier: classify based on Euclidean distance to the nearest mean.
Each mean can be considered a prototype or template, and this is template matching.
g_i(x) = -\dfrac{\|x - m_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\dfrac{1}{2s^2} \sum_{j=1}^d \left(x_j^t - m_{ij}\right)^2 + \log \hat{P}(C_i)
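The nearest mean classifier is a one-liner over the class templates. A sketch (the means and query points below are made-up examples):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean (template) is closest in Euclidean distance."""
    dists = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(dists))

# Two class templates.
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
a = nearest_mean_classify(np.array([1.0, 1.0]), means)
b = nearest_mean_classify(np.array([3.0, 3.5]), means)
print(a, b)
```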
- 21.
Diagonal S, equal variances
- 22.
Model Selection
As we increase complexity (less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).
Assumption                   Covariance matrix         No. of parameters
Shared, hyperspheric         S_i = S = s^2 I           1
Shared, axis-aligned         S_i = S, with s_{ij} = 0  d
Shared, hyperellipsoidal     S_i = S                   d(d+1)/2
Different, hyperellipsoidal  S_i                       K \cdot d(d+1)/2
- 23.
Discrete Features
Binary features: p_{ij} \equiv p(x_j = 1 \mid C_i)

If the x_j are independent (Naive Bayes'):

p(x \mid C_i) = \prod_{j=1}^d p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}

the discriminant is linear:

g_i(x) = \log p(x \mid C_i) + \log P(C_i)
       = \sum_j \left[x_j \log p_{ij} + (1 - x_j) \log(1 - p_{ij})\right] + \log P(C_i)

Estimated parameters:

\hat{p}_{ij} = \dfrac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}
- 24.
Discrete Features
Multinomial (1-of-n_j) features: x_j \in \{v_1, v_2, \ldots, v_{n_j}\}

with z_{jk} = 1 if x_j = v_k and 0 otherwise:

p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)

If the x_j are independent:

p(x \mid C_i) = \prod_{j=1}^d \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}

g_i(x) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)

\hat{p}_{ijk} = \dfrac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}
- 25.
Multivariate Regression
Multivariate linear model:

g(x^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t

E(w_0, w_1, \ldots, w_d \mid X) = \dfrac{1}{2} \sum_t \left[r^t - \left(w_0 + w_1 x_1^t + \cdots + w_d x_d^t\right)\right]^2

Multivariate polynomial model: define new higher-order variables

z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2

and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10).
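Minimizing the squared-error function above is an ordinary least-squares problem. A sketch on synthetic noiseless data (the generating weights 1, 2, 3 are made up, chosen so the fit should recover them exactly):

```python
import numpy as np

# Toy data generated from a known linear model r = 1 + 2*x1 + 3*x2 (no noise),
# so least squares should recover the weights exactly.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
r = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Augment with a column of ones for w0 and solve min_w ||r - Z w||^2,
# which minimizes the error function on the slide.
Z = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(Z, r, rcond=None)
print(np.round(w, 6))
```

For the polynomial model, one would simply append columns for z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2 to Z and solve the same way.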