Machine Learning - II
Methods for feature/variable selection in Regression Analysis
Algorithms to choose the variables
Types:
1) Forward: this method begins by calculating and examining the
univariate chi-square, or individual predictive power, of each
variable. It looks for the predictive variable that has the most
variation, or the greatest differences between its levels, when
compared to the different levels of the target variable.
2) Stepwise: this method is very similar to forward selection. The main
difference is that if any variable, whether newly entered or already in
the model, becomes insignificant after it or another variable enters, it
is removed. This method offers some additional power over forward
selection in finding the best set of predictors. Its main disadvantage
is slower processing time, because each step considers every
variable for entry or removal.
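As an illustration of the forward method described above, here is a minimal sketch in Python. A hypothetical `score` function stands in for the chi-square / predictive-power statistic (higher is better); the variable names and numbers are invented for the example.

```python
def forward_select(candidates, score, threshold=0.0):
    """Greedily add the variable whose entry improves the score most,
    stopping when no remaining variable improves it beyond threshold."""
    selected = []
    remaining = list(candidates)
    current = score(selected)
    while remaining:
        # Evaluate every remaining variable added to the current set
        gains = {v: score(selected + [v]) - current for v in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= threshold:  # no variable improves the model
            break
        selected.append(best)
        remaining.remove(best)
        current = score(selected)
    return selected

# Toy score (hypothetical numbers): each variable has a fixed
# "predictive power", with a penalty for model size.
power = {"age": 3.0, "income": 2.0, "region": 0.5, "noise": -0.1}
def toy_score(subset):
    return sum(power[v] for v in subset) - 0.4 * len(subset)

print(forward_select(power, toy_score))  # ['age', 'income', 'region']
```

Note how "noise" never enters: adding it would lower the score, which is exactly the stopping condition the slides describe.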
Rupak Roy
3) Backward:
This method begins with all the variables in the model. Each variable
begins the process with a multivariate chi-square or a measure of
predictive power when considered in conjunction with all other
variables. It then removes any variable whose predictive power is
insignificant, beginning with the most insignificant variable. After each
variable is removed, the multivariate chi-square for all variables still in
the model is recalculated with one less variable.
This continues until all remaining variables have multivariate
significance. This method has one distinct benefit over forward and
stepwise selection: it allows variables of lower significance to be
considered in combinations that might never enter the model under the
forward and stepwise methods.
Therefore, the resulting model may depend on more equal
contributions from many variables instead of the dominance of one or
two very powerful variables.
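The backward procedure above can be sketched the same way: start with everything in the model and repeatedly drop the least significant variable. Again, a hypothetical `score` function stands in for the multivariate chi-square, and the data are invented.

```python
def backward_eliminate(candidates, score, threshold=0.0):
    """Start with all variables; repeatedly remove the one whose removal
    costs the least, while that cost stays at or below threshold."""
    selected = list(candidates)
    while len(selected) > 1:
        current = score(selected)
        # Loss in score from removing each variable in turn
        losses = {v: current - score([w for w in selected if w != v])
                  for v in selected}
        weakest = min(losses, key=losses.get)
        if losses[weakest] > threshold:  # every variable is significant
            break
        selected.remove(weakest)         # drop the most insignificant one
    return selected

# Toy score (hypothetical numbers), penalized for model size
power = {"x1": 2.5, "x2": 1.2, "x3": 0.05}
def toy_score(subset):
    return sum(power[v] for v in subset) - 0.3 * len(subset)

print(backward_eliminate(power, toy_score))  # ['x1', 'x2']
```

The weak variable "x3" survives the first scoring round in combination with the others and is only removed once its marginal contribution is re-measured, mirroring the recalculation step described above.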
4) Score:
This method constructs models using all possible subsets of variables
within the list of candidate variables, ranking them by the likelihood
score (chi-square) statistic. It does not derive the model coefficients;
it simply lists the best variables for each model along with the overall
chi-square.
To apply these algorithms and choose the best variables in R, use:
> step(m, direction = "both")
where
m = the regression model
direction = "both" indicates that both backward and forward selection
(i.e., stepwise selection) should be used
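A `direction = "both"` search alternates the two moves: try to add the best candidate, then re-check whether any variable already in the model has become removable. (Note that R's `step()` compares models by AIC by default rather than by a chi-square statistic.) Here is a minimal sketch of that combined loop, with a hypothetical `score` function and invented data:

```python
def stepwise_select(candidates, score, threshold=0.0):
    """Stepwise ("both" directions): after each forward step, drop any
    variable whose marginal contribution has fallen to threshold or below."""
    selected, remaining = [], list(candidates)
    improved = True
    while improved:
        improved = False
        current = score(selected)
        # Forward step: try to add the best remaining variable
        if remaining:
            gains = {v: score(selected + [v]) - current for v in remaining}
            best = max(gains, key=gains.get)
            if gains[best] > threshold:
                selected.append(best)
                remaining.remove(best)
                improved = True
        # Backward step: remove variables that no longer pull their weight
        current = score(selected)
        for v in list(selected):
            if current - score([w for w in selected if w != v]) <= threshold:
                selected.remove(v)
                remaining.append(v)
                current = score(selected)
                improved = True
    return selected

# Toy score (hypothetical numbers), penalized for model size
power = {"age": 3.0, "income": 2.0, "region": 0.5, "noise": -0.1}
def toy_score(subset):
    return sum(power[v] for v in subset) - 0.4 * len(subset)

print(stepwise_select(power, toy_score))  # ['age', 'income', 'region']
```

This mirrors the slides' description of stepwise selection: the extra backward pass is what lets a variable that entered early be removed later, at the cost of re-evaluating every variable on each step.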
Thank you