3. Build the classification model
1. Select features 2. Split dataset 3. Build models 4. Assess models
- Remove highly correlated features (correlation > 0.75)
- Features reduced from 436 to 208
- 3 feature subsets (number of features selected):
● LVQ (Learning Vector Quantization): 20
● RFE (Recursive Feature Elimination): 15
● Boruta: 11
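A minimal Python sketch of the correlation filter. It assumes a greedy rule that is common in practice, not necessarily the one used here. While any remaining pair of features exceeds the cutoff in absolute Pearson correlation, it drops whichever member of the most correlated pair has the larger mean absolute correlation with the other features. The function names `pearson` and `drop_correlated` and the tie-breaking rule are illustrative assumptions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns: Dict[str, List[float]], cutoff: float = 0.75) -> List[str]:
    """Return the feature names kept after greedy correlation filtering (hypothetical helper)."""
    keep = list(columns)
    # Absolute pairwise correlations, computed once up front.
    corr = {(p, q): abs(pearson(columns[p], columns[q]))
            for p in keep for q in keep if p != q}

    def mean_abs(f: str) -> float:
        # Mean absolute correlation of f with the other remaining features.
        others = [g for g in keep if g != f]
        return sum(corr[(f, g)] for g in others) / len(others)

    while len(keep) > 1:
        # Most correlated remaining pair.
        r, a, b = max((corr[(p, q)], p, q)
                      for i, p in enumerate(keep) for q in keep[i + 1:])
        if r <= cutoff:
            break
        # Assumption: drop the member of the pair that is more redundant overall.
        keep.remove(a if mean_abs(a) >= mean_abs(b) else b)
    return keep
```

For example, with `x2` an exact multiple of `x1`, one of the two is removed and the weakly correlated `x3` survives.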
- Model set: first 80% of the data
● Train: 70%
● Test: 30%
- Validation: last 20% of the data
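The split can be sketched as follows. The sketch assumes two things not stated on the slide: the rows are already sorted oldest to newest ("first" and "last"), and the 70/30 train/test split inside the model set is random with a fixed seed. `chronological_split` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], model_frac: float = 0.8,
                        train_frac: float = 0.7, seed: int = 0
                        ) -> Tuple[List[T], List[T], List[T]]:
    """Return (train, test, validation) from rows sorted oldest to newest."""
    cut = round(len(rows) * model_frac)
    # First 80% is the model set; the last 20% is held out untouched for validation.
    model_set, validation = list(rows[:cut]), list(rows[cut:])
    rng = random.Random(seed)
    rng.shuffle(model_set)  # assumption: random 70/30 split inside the model set
    n_train = round(len(model_set) * train_frac)
    return model_set[:n_train], model_set[n_train:], validation
```

Keeping the validation block at the end of the ordered data means it is never seen during model fitting or tuning.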
- Linear methods: Linear Discriminant Analysis and Logistic Regression
- Non-linear methods: Neural Network, SVM, kNN
- Trees and Rules: CART
- Ensembles of trees: Bagged CART, Random Forest and Stochastic Gradient Boosting
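Bagging fits each tree on a bootstrap resample of the training data and combines the trees by majority vote. The plain Python sketch below shows that mechanism with one deliberate substitution: the base learner is a decision stump (a depth-1 tree) rather than a full CART tree, which keeps the example short. All function names are hypothetical, and this is not the model fitted here.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A stump: (feature index, threshold, label when value <= threshold, label otherwise)
Stump = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Fit a depth-1 tree that minimises training misclassifications (stand-in for CART)."""
    best, best_err = None, len(y) + 1
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            err = (sum(lab != left_lab for lab in left)
                   + sum(lab != right_lab for lab in right))
            if err < best_err:
                best, best_err = (j, t, left_lab, right_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, left_lab, right_lab = stump
    return left_lab if row[j] <= t else right_lab


def fit_bagged(X, y, n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Fit one stump per bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_bagged(ensemble: List[Stump], row: Sequence[float]) -> int:
    """Majority vote over the ensemble."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]
```

Averaging over bootstrap resamples mainly reduces the variance of unstable learners such as deep trees. That is why bagging is usually paired with full CART trees rather than stumps.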
- Features selected using RFE gave the best results: the lowest error rate and the highest precision
- Bagged CART was selected based on Cohen's kappa
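A minimal sketch of the three assessment measures, computed from paired true and predicted labels. Cohen's kappa is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the label frequencies. The sketch assumes "precision" means positive predictive value for a chosen positive class in a binary task. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def error_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of misclassified cases (1 - accuracy)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive) -> float:
    """Share of cases predicted `positive` that truly are positive (assumed definition)."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


def cohens_kappa(y_true, y_pred) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal proportions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        return math.nan  # both label sequences use a single class: kappa is undefined
    return (p_o - p_e) / (1 - p_e)
```

Because kappa discounts chance agreement, it can rank models differently from raw accuracy when the classes are imbalanced.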
(Kursa and Rudnicki 2010), (Guyon and Elisseeff 2003), (Holte 1993), (Lee, Lessler, and Stuart 2010), (Cutler and Zhao 2001), (Mohanbir 1996), (Kohavi 1995), (Wilson 1927), (Cohen 1960)