Week 13 Feature Selection - Computer Vision Part 2
1. Program Studi Teknik Informatika
Fakultas Teknik - Universitas Surabaya
Feature Selection
Week 13
1604C055 - Machine Learning
2. Feature selection
• Using a huge number of features to build a machine learning model does not always produce good performance.
• Irrelevant features negatively affect model performance.
• Redundant features:
  – lead to overfitting,
  – reduce the generalization capability of the model,
  – reduce the accuracy of the model.
• Adding more and more features to the model:
  – increases the overall complexity of the model,
  – increases the computational time.
3. Feature selection
• Feature selection is a process used to automatically select the best subset of features in the dataset, i.e., those that contribute most to the prediction variable or output.
• Benefits:
  – Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
  – Improves accuracy: less misleading data means modeling accuracy improves.
  – Reduces training time: less data means that algorithms train faster.
4. Some techniques
• Filter methods select the best features by examining their statistical properties, e.g., variance threshold, correlation coefficient, chi-square test, ANOVA F-value statistic.
• Wrapper methods use trial and error to find the subset of features that produces models with the highest-quality predictions, e.g., forward feature selection, backward feature selection.
• Embedded methods select the best feature subset as part of, or as an extension of, a learning algorithm's training process.
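For orientation, the sketch below maps these three families to utilities in scikit-learn; this assumes scikit-learn is the toolkit used in this course (the later worked examples lean on it as well) and only shows the relevant imports, not a full pipeline.

```python
# Rough mapping of the three families to scikit-learn utilities
# (module paths are scikit-learn's own; a version >= 1.0 is assumed).
from sklearn.feature_selection import (
    VarianceThreshold,             # filter: variance threshold
    SelectKBest, chi2, f_classif,  # filter: chi-square test, ANOVA F-value
    SequentialFeatureSelector,     # wrapper: forward/backward selection
    SelectFromModel,               # embedded: selection from model-derived importances
)
```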
5. Variance threshold
• A simple method for feature selection based on variance.
• Remove all features whose variance is less than or equal to a threshold value.
• Variance of a feature with $n$ observations $x = (x_1, x_2, \ldots, x_n)$:

  $\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• For a binary feature with $n$ observations $x = (x_1, x_2, \ldots, x_n)$, $x_i \in \{0, 1\}$:

  $\mathrm{Var}(x) = p(1 - p)$

  where $p$ is the proportion of 1s (a quick numerical check follows below).
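As a quick sanity check, a minimal sketch assuming NumPy (the column of values is purely illustrative): the population variance of a binary feature indeed equals p(1 - p).

```python
import numpy as np

x = np.array([1, 0, 0, 0, 0])   # binary feature with p = 0.2
p = x.mean()                    # proportion of 1s
print(np.var(x))                # population variance (ddof=0) -> 0.16
print(p * (1 - p))              # p(1 - p)                     -> 0.16
```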
6. Variance threshold: example

  x1   x2   x3
   1    0    1
   0    1    0
   0    1    1
   0    1    0
   0    1    1
  p = 0.2   p = 0.8   p = 0.6

• Goal: select the features in which the proportion of 1s (or 0s) is less than 0.75.
• Threshold: t = 0.75(1 - 0.75) = 0.1875
• Var(x1) = 0.2(1 - 0.2) = 0.16 ≤ t
• Var(x2) = 0.8(1 - 0.8) = 0.16 ≤ t
• Var(x3) = 0.6(1 - 0.6) = 0.24 > t
• Selected feature: x3 (see the scikit-learn sketch below)
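The same example in code, as a minimal sketch assuming scikit-learn's VarianceThreshold, which removes every feature whose training-set variance does not exceed the given threshold:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# The three binary features from the table above (rows = observations).
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

# Keep features whose variance is strictly greater than 0.75 * (1 - 0.75) = 0.1875.
selector = VarianceThreshold(threshold=0.75 * (1 - 0.75))
X_selected = selector.fit_transform(X)

print(selector.variances_)     # [0.16 0.16 0.24]
print(selector.get_support())  # [False False  True] -> only x3 is kept
print(X_selected)              # the remaining column, x3
```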
10. Correlation coefficient
• If two features are highly correlated, then the information contained in those features is very similar.
• Highly correlated features can therefore be considered redundant.
• Remove features that are highly correlated with another feature (correlation greater than a threshold value).
• Correlation coefficient of two features with $n$ observations $x_1 = (x_{11}, x_{12}, \ldots, x_{1n})$ and $x_2 = (x_{21}, x_{22}, \ldots, x_{2n})$:

  $\mathrm{corr}(x_1, x_2) = \dfrac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\sqrt{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2}\,\sqrt{\sum_{i=1}^{n}(x_{2i} - \bar{x}_2)^2}}$
11. Correlation coefficient: example

  x1   x2   x3
   1    1    1
   2    3    0
   3    5    1
   4    7    0
   5    8    1

• Goal: select the features whose absolute correlation with every other feature is less than 0.95.
• |corr(x1, x2)| = 0.994 > 0.95
• |corr(x1, x3)| = 0 < 0.95
• |corr(x2, x3)| = 0.064 < 0.95
• x1 and x2 are highly correlated, so one of the pair (here x1) is dropped.
• Selected features: x2, x3 (see the pandas sketch below)
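A minimal sketch of this filter with pandas and NumPy; the column names are simply the ones from the table above, and which member of a highly correlated pair gets dropped is a convention, chosen here so the result matches the slide.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5],
                   "x2": [1, 3, 5, 7, 8],
                   "x3": [1, 0, 1, 0, 1]})

corr = df.corr().abs()  # absolute Pearson correlation matrix
# Keep only the upper triangle so every pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop a feature if it is correlated above 0.95 with any later feature.
to_drop = [row for row in upper.index if (upper.loc[row] > 0.95).any()]

print(corr.round(3))                   # corr(x1, x2) ~ 0.994, the others are small
print(to_drop)                         # ['x1']
df_selected = df.drop(columns=to_drop)  # x2 and x3 remain
```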
16. Chi-square test
• Can only be applied to categorical features.
• Used to examine the independence of two categorical vectors, i.e., a feature and the target.
• If the feature and target are independent, then the feature is considered irrelevant.
• For a numerical feature, the chi-square test can still be applied by first transforming the quantitative feature into a categorical one (e.g., by binning).
• Chi-square measures how much the expected counts E and the observed counts O deviate from each other.
17. Chi-square test
• The chi-square statistic ($\chi^2$) summarizes the difference between the observed number of observations in each class of a categorical feature and the expected number if that feature were independent of (i.e., had no relationship with) the target:

  $\chi^2_{stat} = \sum_{i=1}^{c} \frac{(O_i - E_i)^2}{E_i}$

  where the sum runs over the $c$ classes (cells of the contingency table).
• $O_i$: the number of observations in class $i$
• $E_i$: the expected number of observations in class $i$ if there is no relationship between the feature and the target
18. Chi-square test steps
• Define the hypotheses:
  – H0: feature and target are independent
  – H1: feature and target are not independent
• Build a contingency table.
• Find the expected values:
  – under H0 the feature and target are independent,
  – so P(A ∩ B) = P(A) P(B).
• Calculate the chi-square statistic χ²_stat.
• Accept or reject the null hypothesis.
19. Chi-square test steps
• Accept or reject the null hypothesis:
  – Choose the level of significance, usually α = 0.05.
  – Determine the degrees of freedom, df = (n_c - 1)(n_r - 1), where n_c and n_r are the number of columns and rows in the contingency table, respectively.
  – Determine χ²_{α,df} from the χ² distribution table, or compute the p-value:

    p-value = P(χ² > χ²_stat)

  – Reject H0 if χ²_stat > χ²_{α,df} or p-value < α (see the SciPy sketch below).
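These steps map almost one-to-one onto scipy.stats.chi2_contingency. A minimal sketch, assuming SciPy is available; correction=False disables Yates' continuity correction so the result on a 2x2 table matches the hand calculation in the next slides.

```python
from scipy.stats import chi2_contingency

def chi_square_select(table, alpha=0.05):
    """Decide whether a feature is relevant from its feature-vs-target
    contingency table (rows = feature categories, columns = target classes)."""
    chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
    # Reject H0 (independence) when p-value < alpha, i.e. the feature is relevant.
    return chi2_stat, p_value, dof, expected, p_value < alpha
```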
20. Chi-square test: example

Raw data (first rows):

  x1  x2  y
   A   M  1
   A   M  1
   B   M  0
   B   M  0
   B   F  0
   …   …  …

Contingency table of x1 and y:

  Feature \ Target |  0 |  1 | Total
  A                | 20 | 10 |  30
  B                | 40 | 30 |  70
  Total            | 60 | 40 | 100

• Expected values:
  – E_A0 = 100 × (30/100) × (60/100) = 18
  – E_A1 = 100 × (30/100) × (40/100) = 12
  – E_B0 = 100 × (70/100) × (60/100) = 42
  – E_B1 = 100 × (70/100) × (40/100) = 28
• Chi-square statistic:

  χ²_stat = (20 - 18)²/18 + (10 - 12)²/12 + (40 - 42)²/42 + (30 - 28)²/28 = 0.7937
21. Chi-square test: example
• Accept or reject the null hypothesis:
  – Level of significance α = 0.05
  – Degrees of freedom df = (2 - 1)(2 - 1) = 1
  – χ²_{0.05,1} = 3.841 from the χ² distribution table
  – Since χ²_stat = 0.7937 < χ²_{0.05,1}, H0 is accepted (we fail to reject it).
  – Conclusion: x1 and y are independent, so x1 should not be selected.
22. Chi-square test: example

Raw data (first rows):

  x1  x2  y
   A   M  1
   A   M  1
   B   M  0
   B   M  0
   B   F  0
   …   …  …

Contingency table of x2 and y:

  Feature \ Target |  0 |  1 | Total
  M                | 20 | 30 |  50
  F                | 40 | 10 |  50
  Total            | 60 | 40 | 100

• Expected values:
  – E_M0 = 100 × (50/100) × (60/100) = 30
  – E_M1 = 100 × (50/100) × (40/100) = 20
  – E_F0 = 100 × (50/100) × (60/100) = 30
  – E_F1 = 100 × (50/100) × (40/100) = 20
• Chi-square statistic:

  χ²_stat = (20 - 30)²/30 + (30 - 20)²/20 + (40 - 30)²/30 + (10 - 20)²/20 = 16.667
23. Chi-square test: example
• Accept or reject the null hypothesis:
  – Level of significance α = 0.05
  – Degrees of freedom df = (2 - 1)(2 - 1) = 1
  – χ²_{0.05,1} = 3.841
  – Since χ²_stat = 16.667 > χ²_{0.05,1}, H0 is rejected.
  – Conclusion: x2 and y are dependent, so x2 can be selected (see the sketch below).
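Feeding the two contingency tables above to scipy.stats.chi2_contingency (again with correction=False so the 2x2 results match the hand calculations) reproduces both decisions; the p-values in the comments are approximate.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency tables from the x1 and x2 examples
# (rows = feature categories, columns = target classes 0 and 1).
table_x1 = np.array([[20, 10],
                     [40, 30]])
table_x2 = np.array([[20, 30],
                     [40, 10]])

for name, table in [("x1", table_x1), ("x2", table_x2)]:
    chi2_stat, p_value, dof, _ = chi2_contingency(table, correction=False)
    decision = "select" if p_value < 0.05 else "do not select"
    print(f"{name}: chi2 = {chi2_stat:.4f}, p = {p_value:.4f} -> {decision}")
# x1: chi2 = 0.7937,  p ~ 0.37   -> do not select (H0 accepted)
# x2: chi2 = 16.6667, p < 0.001  -> select        (H0 rejected)
```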
27. ANOVA
• ANOVA (Analysis of Variance) is a statistical test used to examine whether two or more groups differ from each other significantly, by comparing the mean of each group.
• In feature selection, the observations of a feature are grouped based on the target class.
• If the group means are significantly different, then the feature is considered relevant.
28. ANOVA steps
• Define the hypotheses:
  – H0: all group mean values are the same
  – H1: at least one of the group mean values differs
• Calculate the total sum of squares (SST), the between-group sum of squares (SSB), and the error sum of squares (SSE).
• Determine the degrees of freedom.
• Calculate the between-group mean square (MSB) and the error mean square (MSE).
• Calculate the F statistic.
29. ANOVA steps
• Suppose $x_g = (x_{g1}, x_{g2}, \ldots, x_{g n_g})$ are the observations of the feature that belong to group (class) $g$, for $g = 1, 2, \ldots, k$, where $k$ is the number of groups (classes).
• Calculate the sums of squares (a code sketch follows below):

  $\bar{x} = \frac{1}{n}\sum_{g=1}^{k}\sum_{i=1}^{n_g} x_{gi}, \qquad n = \sum_{g=1}^{k} n_g, \qquad \bar{x}_g = \frac{1}{n_g}\sum_{i=1}^{n_g} x_{gi}$

  $SST = \sum_{g=1}^{k}\sum_{i=1}^{n_g} (x_{gi} - \bar{x})^2$

  $SSB = \sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x})^2$

  $SSE = SST - SSB$
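A minimal sketch of these quantities in NumPy, cross-checked against scipy.stats.f_oneway. Taking the F statistic as MSB/MSE with degrees of freedom k - 1 and n - k is the standard one-way ANOVA definition assumed here (the deck's step list above names MSB, MSE, and F but the formula itself is not shown on this slide); the sample data is purely illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

# Illustrative data: one feature split into k = 2 groups by target class.
groups = [np.array([2.1, 2.5, 1.9, 2.3]),   # observations with target class 0
          np.array([3.0, 3.4, 2.8, 3.2])]   # observations with target class 1

all_x = np.concatenate(groups)
n, k = all_x.size, len(groups)
grand_mean = all_x.mean()

sst = np.sum((all_x - grand_mean) ** 2)                            # total sum of squares
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)   # between-group sum of squares
sse = sst - ssb                                                    # error sum of squares

msb = ssb / (k - 1)    # between-group mean square
mse = sse / (n - k)    # error mean square
f_stat = msb / mse

print(f_stat)                         # hand-computed F statistic
print(f_oneway(*groups).statistic)    # SciPy's one-way ANOVA F, should match
```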