Program Studi Teknik Informatika
Fakultas Teknik – Universitas Surabaya
Feature Selection
Week 13
1604C055 - Machine Learning
Feature selection
• Using a huge number of features to build a machine learning model does not always produce good performance.
• Irrelevant features degrade model performance.
• Redundant features:
– lead to overfitting
– reduce the generalization capability of the model
– reduce the accuracy of the model.
• Adding more and more features to the model:
– increases the overall complexity of the model
– increases the computational time.
Feature selection
• Feature selection is a process used to automatically select the best subset of features in the dataset, i.e., those that contribute most to the prediction variable or output.
• Benefits:
– Reduces overfitting: less redundant data means less opportunity to make decisions based on noise.
– Improves accuracy: less misleading data means modeling accuracy improves.
– Reduces training time: less data means that algorithms train faster.
Some techniques
• Filter methods select the best features by examining their statistical properties, e.g., variance threshold, correlation coefficient, chi-square test, ANOVA F-value statistic.
• Wrapper methods use trial and error to find the subset of features that produces models with the highest-quality predictions, e.g., forward feature selection, backward feature selection (see the sketch below).
• Embedded methods select the best feature subset as part of, or as an extension of, a learning algorithm's training process.
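The rest of these slides focus on filter methods. As a brief illustration of a wrapper method, here is a minimal sketch using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24 and later); the estimator and dataset are placeholders chosen for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Forward feature selection: greedily add the feature that most improves
# cross-validated accuracy until 2 features have been selected
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=2,
                                direction="forward")
X_selected = sfs.fit_transform(X, y)
print(sfs.get_support())   # boolean mask of the selected features
```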
Variance threshold
• A simple method for feature selection based on variance.
• Remove all features whose variance is less than or equal to a threshold value.
• Variance of a feature with $n$ observations $x = (x_1, x_2, \ldots, x_n)$:

$$\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

• For a binary feature with $n$ observations $x = (x_1, x_2, \ldots, x_n)$, $x_i \in \{0, 1\}$:

$$\mathrm{Var}(x) = p(1 - p)$$

where $p$ is the proportion of ones.
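A quick NumPy check of the binary-feature shortcut, using the first column of the example that follows:

```python
import numpy as np

# For a binary feature, the population variance equals p(1 - p),
# where p is the proportion of ones
x = np.array([1, 0, 0, 0, 0])     # first column of the example below
p = x.mean()                       # p = 0.2
print(np.var(x), p * (1 - p))      # both print 0.16
```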
Variance threshold: example
๐’™๐Ÿ ๐’™๐Ÿ ๐’™๐Ÿ‘
1 0 1
0 1 0
0 1 1
0 1 0
0 1 1
๐‘ = 0.2 ๐‘ = 0.8 ๐‘ = 0.6
โ€ข Select feature with the
proportion of 1 or 0 is less than
0.75
โ€ข Threshold:
๐‘‡ = 0.75 1 โˆ’ 0.75 = 0.1875
โ€ข Var ๐‘ฅ1 โ‰ค ๐‘‡, Var ๐‘ฅ2 โ‰ค ๐‘‡
โ€ข Var ๐‘ฅ3 > ๐‘‡
โ€ข Selected feature: ๐‘ฅ3
Var ๐‘ฅ1 = 0.2 1 โˆ’ 0.2 = 0.16
Var ๐‘ฅ2 = 0.8 1 โˆ’ 0.8 = 0.16
Var ๐‘ฅ3 = 0.6 1 โˆ’ 0.6 = 0.24
Variance threshold with sklearn.feature_selection.VarianceThreshold
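A minimal sketch reproducing the example above:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# The binary example data above (columns x1, x2, x3)
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

# Remove binary features in which one value has a proportion of at
# least 0.75: threshold T = 0.75 * (1 - 0.75) = 0.1875
selector = VarianceThreshold(threshold=0.75 * (1 - 0.75))
X_selected = selector.fit_transform(X)

print(selector.variances_)     # [0.16 0.16 0.24]
print(selector.get_support())  # [False False  True] -> only x3 is kept
```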
Correlation coefficient
• If two features are highly correlated, then the information contained in those features is very similar.
• Highly correlated features can be considered redundant.
• Remove features that are highly correlated with another feature (correlation greater than a threshold value).
• Correlation coefficient of two features with $n$ observations $x_1 = (x_{11}, x_{12}, \ldots, x_{1n})$, $x_2 = (x_{21}, x_{22}, \ldots, x_{2n})$:

$$\mathrm{corr}(x_1, x_2) = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\sqrt{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2} \sqrt{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}}$$
Correlation coefficient: example
๐’™๐Ÿ ๐’™๐Ÿ ๐’™๐Ÿ‘
1 1 1
2 3 0
3 5 1
4 7 0
5 8 1
โ€ข Select features with correlation
coefficient with other features
less than 0.95
โ€ข corr ๐‘ฅ1, ๐‘ฅ2 > 0.95
โ€ข corr ๐‘ฅ1, ๐‘ฅ3 < 0.95
โ€ข corr ๐‘ฅ2, ๐‘ฅ3 < 0.95
โ€ข Selected feature: ๐‘ฅ2, ๐‘ฅ3
corr ๐‘ฅ1, ๐‘ฅ2 = 0.994
corr ๐‘ฅ1, ๐‘ฅ3 = 0
corr ๐‘ฅ2, ๐‘ฅ3 = 0.064
Correlation coefficient with pandas
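A minimal sketch with the example data above. Note that this common upper-triangle pattern drops the second member of each highly correlated pair (x2 here), whereas the slide keeps x2 and drops x1; which member of a correlated pair to drop is a convention choice:

```python
import numpy as np
import pandas as pd

# The example data above
df = pd.DataFrame({'x1': [1, 2, 3, 4, 5],
                   'x2': [1, 3, 5, 7, 8],
                   'x3': [1, 0, 1, 0, 1]})

# Absolute pairwise Pearson correlations
corr = df.corr().abs()

# Keep only the upper triangle (excluding the diagonal) so each
# feature pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one feature from every pair whose correlation exceeds 0.95
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_selected = df.drop(columns=to_drop)   # drops x2; x1 and x3 remain
```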
Chi-square test
• Can only be applied to categorical features.
• Used to examine the independence of two categorical vectors, i.e., feature and target.
• If the feature and target are independent, then the feature is considered irrelevant.
• For a numerical feature, the chi-square test can still be applied by first transforming the quantitative feature into a categorical one.
• Chi-square measures how much the expected counts $E$ and the observed counts $O$ deviate from each other.
Chi-square test
• The chi-square statistic ($\chi^2$) measures the difference between the observed number of observations in each class of a categorical feature and its expected value if that feature were independent of (i.e., had no relationship with) the target.

$$\chi^2_{stat} = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

• $O_i$: the number of observations in class $i$
• $E_i$: the expected number of observations in class $i$ if there is no relationship between the feature and the target
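The statistic translates directly into code; a minimal sketch:

```python
import numpy as np

def chi2_stat(observed, expected):
    """Chi-square statistic: the sum of (O_i - E_i)^2 / E_i over all cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return ((observed - expected) ** 2 / expected).sum()

# e.g., the worked example below: chi2_stat([20, 10, 40, 30],
# [18, 12, 42, 28]) gives approximately 0.7937
```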
Chi-square test steps
• Define the hypotheses:
– $H_0$: feature and target are independent
– $H_1$: feature and target are not independent
• Build a contingency table.
• Find the expected values:
– Under $H_0$, feature and target are independent
– $P(A \cap B) = P(A)P(B)$
• Calculate the chi-square statistic $\chi^2_{stat}$
• Accept or reject the null hypothesis
Chi-square test steps
• Accept or reject the null hypothesis:
– Choose a significance level, usually $\alpha = 0.05$
– Determine the degrees of freedom, $df = (n_c - 1)(n_r - 1)$, where $n_c$ and $n_r$ are the number of columns and rows in the contingency table, respectively
– Determine $\chi^2_{\alpha, df}$ from the $\chi^2$ distribution table, or the p-value:

$$p\text{-value} = P(\chi^2 > \chi^2_{stat})$$

– Reject $H_0$ if $\chi^2_{stat} > \chi^2_{\alpha, df}$ or p-value $< \alpha$
Chi-square test: example
๐’™๐Ÿ ๐’™๐Ÿ ๐’š
A M 1
A M 1
B M 0
B M 0
B F 0
โ€ฆ โ€ฆ โ€ฆ
Target
Feature
0 1 Total
A 20 10 30
B 40 30 70
Total 60 40 100
Contingency table
๐‘ฅ1 and ๐‘ฆ
โ€ข Expected value:
โ€“ ๐ธ๐ด0 = 100 ร—
30
100
ร—
60
100
= 18
โ€“ ๐ธ๐ด1 = 100 ร—
30
100
ร—
40
100
= 12
โ€“ ๐ธ๐ต0 = 100 ร—
70
100
ร—
60
100
= 42
โ€“ ๐ธ๐ต1 = 100 ร—
70
100
ร—
40
100
= 28
โ€ข Chi-square statistic
๐œ’๐‘ ๐‘ก๐‘Ž๐‘ก
2
=
20 โˆ’ 18 2
18
+
10 โˆ’ 12 2
12
+
40 โˆ’ 42 2
42
+
30 โˆ’ 28 2
28
= 0.7937
Chi-square test: example
• Accept or reject the null hypothesis:
– Significance level $\alpha = 0.05$
– Degrees of freedom $df = (2 - 1)(2 - 1) = 1$
– $\chi^2_{0.05,1} = 3.841$ from the $\chi^2$ distribution table
– Since $\chi^2_{stat} < \chi^2_{0.05,1}$, $H_0$ is accepted
– Conclusion: $x_1$ and $y$ are independent ($x_1$ is not selected)
Chi-square test: example
๐’™๐Ÿ ๐’™๐Ÿ ๐’š
A M 1
A M 1
B M 0
B M 0
B F 0
โ€ฆ โ€ฆ โ€ฆ
Target
Feature
0 1 Total
M 20 30 50
F 40 10 50
Total 60 40 100
Contingency table
๐‘ฅ2 and ๐‘ฆ
โ€ข Expected value:
โ€“ ๐ธ๐‘€0 = 100 ร—
50
100
ร—
60
100
= 30
โ€“ ๐ธ๐‘€1 = 100 ร—
50
100
ร—
40
100
= 20
โ€“ ๐ธ๐น0 = 100 ร—
50
100
ร—
60
100
= 30
โ€“ ๐ธ๐น1 = 100 ร—
50
100
ร—
40
100
= 20
โ€ข Chi-square statistic
๐œ’๐‘ ๐‘ก๐‘Ž๐‘ก
2
=
20 โˆ’ 30 2
30
+
30 โˆ’ 20 2
20
+
40 โˆ’ 30 2
30
+
10 โˆ’ 20 2
20
= 11.667
Chi-square test: example
• Accept or reject the null hypothesis:
– Significance level $\alpha = 0.05$
– Degrees of freedom $df = (2 - 1)(2 - 1) = 1$
– $\chi^2_{0.05,df} = 3.841$
– Since $\chi^2_{stat} > \chi^2_{0.05,df}$, $H_0$ is rejected
– Conclusion: $x_2$ and $y$ are dependent ($x_2$ can be selected)
Chi-square test with sklearn.feature_selection.chi2
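A minimal sketch, rebuilding the x2 example from its contingency table (the 0/1 encoding of M and F is an assumption made for illustration). Note that scikit-learn's chi2 treats feature values as counts, so its per-column scores do not, in general, reproduce the classic contingency-table statistic computed above; scipy.stats.chi2_contingency does:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Rebuild 100 samples of x2 and y from the contingency table above,
# one-hot encoding the categories M and F (chi2 requires non-negative
# feature values)
is_M = np.repeat([1, 1, 0, 0], [20, 30, 40, 10])
y    = np.repeat([0, 1, 0, 1], [20, 30, 40, 10])
X    = np.column_stack([is_M, 1 - is_M])   # columns: is_M, is_F

scores, p_values = chi2(X, y)
print(scores, p_values)

# SelectKBest keeps the k features with the highest scores
X_best = SelectKBest(score_func=chi2, k=1).fit_transform(X, y)
```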
ANOVA
• ANOVA (Analysis of Variance) is a statistical test used to examine whether two or more groups differ from each other significantly, by comparing the mean of each group.
• In feature selection, the observations of a feature are grouped based on the target class.
• If the group means are significantly different, then the feature is considered relevant.
ANOVA steps
• Define the hypotheses:
– $H_0$: all group mean values are the same
– $H_1$: at least one of the group mean values differs
• Calculate the total sum of squares (SST), the between-group sum of squares (SSB), and the error sum of squares (SSE).
• Determine the degrees of freedom.
• Calculate the between-group mean square (MSB) and the error mean square (MSE).
• Calculate the $F$ statistic.
ANOVA steps
โ€ข Suppose ๐‘ฅ๐‘–๐‘— = ๐‘ฅ๐‘–1, ๐‘ฅ๐‘–2, โ€ฆ , ๐‘ฅ๐‘–๐‘›๐‘–
is the observations of feature that belong to
group (class) ๐‘–, for ๐‘– = 1,2, โ€ฆ , ๐‘˜, where ๐‘˜ is the number of group (class).
โ€ข Calculate some of square:
๐‘ฅ =
1
๐‘
๐‘–=1
๐‘˜
๐‘—=1
๐‘›๐‘–
๐‘ฅ๐‘–๐‘— , ๐‘ =
๐‘–=1
๐‘˜
๐‘›๐‘– , ๐‘ฅ๐‘– =
1
๐‘›๐‘–
๐‘—=1
๐‘›๐‘–
๐‘ฅ๐‘–๐‘—
๐‘†๐‘†๐‘‡ =
๐‘–=1
๐‘˜
๐‘—=1
๐‘›๐‘–
๐‘ฅ๐‘–๐‘— โˆ’ ๐‘ฅ
2
๐‘†๐‘†๐ต =
๐‘–=1
๐‘˜
๐‘›๐‘– ๐‘ฅ๐‘– โˆ’ ๐‘ฅ 2
๐‘†๐‘†๐ธ = ๐‘†๐‘†๐‘‡ โˆ’ ๐‘†๐‘†๐ต
ANOVA steps
• Determine the degrees of freedom:
– Between groups: $df_B = k - 1$
– Error: $df_E = N - k$
– Total: $df_T = N - 1$
• Calculate the mean squares:
– $MSB = \dfrac{SSB}{df_B}$
– $MSE = \dfrac{SSE}{df_E}$
• Calculate the $F$ statistic: $F_{stat} = \dfrac{MSB}{MSE}$
ANOVA steps
• Accept or reject the null hypothesis:
– Choose a significance level, usually $\alpha = 0.05$
– Determine $F_{\alpha; df_B, df_E}$ from the $F$ distribution table, or the p-value:

$$p\text{-value} = P(F > F_{stat})$$

– Reject $H_0$ if $F_{stat} > F_{\alpha; df_B, df_E}$ or p-value $< \alpha$
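A minimal NumPy sketch of these steps on a hypothetical two-group feature, cross-checked against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

# Hypothetical feature values, grouped by a binary target class
g1 = np.array([1.0, 2.0, 3.0])   # observations where y = 0
g2 = np.array([6.0, 7.0, 8.0])   # observations where y = 1

x = np.concatenate([g1, g2])
N, k = len(x), 2
grand_mean = x.mean()

sst = ((x - grand_mean) ** 2).sum()                                 # total
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2))  # between
sse = sst - ssb                                                     # error

msb = ssb / (k - 1)        # between-group mean square, df_B = k - 1
mse = sse / (N - k)        # error mean square, df_E = N - k
f_stat = msb / mse
print(f_stat)              # 37.5

# Cross-check with SciPy's one-way ANOVA
f_ref, p_ref = stats.f_oneway(g1, g2)
print(f_ref, p_ref)        # same F statistic; p < 0.05 -> relevant feature
```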
ANOVA with sklearn.feature_selection.f_classif
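A minimal sketch; the first column reuses the toy groups from the manual computation above, plus a second, uninformative column for contrast:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Column 0: the toy feature from the manual example; column 1: noise
X = np.array([[1.0, 5.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [6.0, 2.0],
              [7.0, 5.0],
              [8.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

f_scores, p_values = f_classif(X, y)   # one ANOVA F-value per feature
print(f_scores)    # first entry is 37.5, matching the manual result

# Keep the single feature with the highest F-score
X_best = SelectKBest(score_func=f_classif, k=1).fit_transform(X, y)
```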