1. Feature Selection Methods:
Variance Threshold
21-11-2021 1
Department of Computer Science and Engineering
National Institute of Technology Silchar, Assam, INDIA
By: Nurul Amin Choudhury
PhD Scholar, NIT Silchar
2.
Variance Threshold for Feature Selection
• Main idea: remove numerical features with low variance.
• Assumption: features with higher variance may carry more useful information.
• It is a filter-based method.
• The idea is to compute the variance of each feature independently, and select the subset of features whose variance meets a user-specified threshold (𝜶).
• This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can therefore also be used for unsupervised learning.
• It is applicable only to numerical features.
• Drawback: because the relationships between feature variables are not considered, we may discard features that carry useful dependencies with other features.
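The filter described above is available off the shelf as scikit-learn's VarianceThreshold. A minimal sketch, using a small made-up feature matrix (the data and threshold value here are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical matrix: 5 samples, 3 numerical features.
# Column 0 is constant, column 1 varies slightly, column 2 varies a lot.
X = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 1.0, 2.0],
    [0.0, 3.0, 3.0],
    [0.0, 2.0, 4.0],
    [0.0, 2.0, 5.0],
])

# Features with variance below the threshold are removed.
# Note that no target y is passed: the method only looks at X.
selector = VarianceThreshold(threshold=0.5)
X_reduced = selector.fit_transform(X)

print(selector.variances_)   # per-feature population variances
print(X_reduced.shape)       # only the high-variance column remains
```

Here the per-feature variances are 0.0, 0.4 and 2.0, so only the last column survives the 0.5 threshold.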
3.
Example Problem
F1   F2   Class
5    1    C1
4    4    C1
3    5    C2
6    3    C2
2    4    C1
Table 1. Sample Dataset
Step 1. Let the variance threshold be 𝜶 = 2; a feature is kept only if its variance is >= 𝜶.
Step 2. Find the mean of features F1 and F2 respectively, using Uᵢ = (x₁ + x₂ + … + xₙ) / n.
Mean (F1): U1 = (5+4+3+6+2) / 5 = 4
Mean (F2): U2 = (1+4+5+3+4) / 5 = 3.4
Step 3. Find the variance of F1 and F2 using Pᵢ = [ (x₁ − Uᵢ)² + (x₂ − Uᵢ)² + … + (xₙ − Uᵢ)² ] / n.
Variance (F1): P1 = [ (5−4)² + (4−4)² + (3−4)² + (6−4)² + (2−4)² ] / 5
P1 = 2
Variance (F2): P2 = [ (1−3.4)² + (4−3.4)² + (5−3.4)² + (3−3.4)² + (4−3.4)² ] / 5
P2 = 1.84
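The means and variances from Steps 2 and 3 can be checked with a few lines of NumPy (note that np.var divides by n by default, i.e. it computes the population variance used on these slides):

```python
import numpy as np

# Features F1 and F2 from Table 1.
F1 = np.array([5, 4, 3, 6, 2], dtype=float)
F2 = np.array([1, 4, 5, 3, 4], dtype=float)

# Means (Step 2) and population variances (Step 3).
print(F1.mean(), F1.var())   # 4.0 and 2.0
print(F2.mean(), F2.var())   # 3.4 and approximately 1.84
```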
4.
Contd…
Step 4. Compare each variance with the threshold (𝜶 = 2).
P1 = 2. Is P1 >= 𝜶? Yes, it satisfies the condition, so F1 can be chosen for our feature subset.
P2 = 1.84. Is P2 >= 𝜶? No, it does not satisfy the condition, so F2 cannot be chosen for our feature subset.
Step 5. Finally, feature F1 is selected from the features F1 and F2.
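Steps 4 and 5 amount to a boolean mask over the feature columns. A minimal NumPy sketch of the whole selection on Table 1:

```python
import numpy as np

# Table 1 as a feature matrix; columns are F1 and F2.
X = np.array([[5, 1],
              [4, 4],
              [3, 5],
              [6, 3],
              [2, 4]], dtype=float)

alpha = 2.0                      # variance threshold from Step 1
variances = X.var(axis=0)        # per-feature population variance
keep = variances >= alpha        # Step 4: compare with alpha
X_selected = X[:, keep]          # Step 5: only F1 survives

print(variances)   # approximately [2.0, 1.84]
print(keep)        # [ True False]
```

Only the first column (F1) passes the threshold, matching the result on the slide.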