Random Forests
rffit.1 <- randomForest(takes.loan ~ ., data=bank)
Most important parameters are:
Variable   Description                                          Default
ntree      Number of trees                                      500
mtry       Number of variables to randomly select at each node  • square root of # predictors for classification
                                                                • # predictors / 3 for regression
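As a minimal sketch of how the two parameters above are set, the following fits one forest with the classification defaults written out explicitly and one with different values. The bank data is not reproduced here, so the built-in iris data set is used as a stand-in (an assumption for illustration), and the object names rffit.default and rffit.alt are hypothetical.

```r
library(randomForest)

set.seed(42)  # forests are randomized; fix the seed for reproducibility

# iris is a stand-in for the bank data: 4 predictors, a 3-class factor response
p <- ncol(iris) - 1

# Classification default for mtry is floor(sqrt(p)); written out explicitly here
rffit.default <- randomForest(Species ~ ., data = iris,
                              ntree = 500, mtry = floor(sqrt(p)))

# A deliberately different setting, to compare against
rffit.alt <- randomForest(Species ~ ., data = iris, ntree = 1000, mtry = 3)

print(rffit.default$mtry)   # number of variables tried at each split
print(rffit.default$ntree)  # number of trees grown
print(rffit.alt)            # shows the OOB error estimate and confusion matrix
```

Printing a fitted forest reports the out-of-bag (OOB) error estimate, which gives a quick way to compare settings without a separate test set.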
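Tuning RF
rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)
• By default tuneRF returns a matrix of mtry values and their OOB errors; set doBest=TRUE to get back a forest fitted with the best mtry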
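The tuneRF call can be sketched end to end as follows. This assumes iris as stand-in data again; ntreeTry = 100 is an arbitrary choice for the search, and the names tune.res, best.mtry and rffit.tuned are hypothetical.

```r
library(randomForest)

set.seed(42)

# Stand-in data: predictors as a data frame, response as a factor
X <- iris[, 1:4]
y <- iris$Species

# Search over mtry, starting at 1 and scaling by stepFactor each step. The
# search keeps going in a direction only while the OOB error improves by
# more than 5% (relative).
tune.res <- tuneRF(X, y, mtryStart = 1, stepFactor = 2, improve = 0.05,
                   ntreeTry = 100, trace = FALSE, plot = FALSE)
print(tune.res)  # matrix with columns "mtry" and "OOBError"

# Refit a full-size forest with the mtry that gave the lowest OOB error
best.mtry <- tune.res[which.min(tune.res[, "OOBError"]), "mtry"]
rffit.tuned <- randomForest(X, y, mtry = best.mtry, ntree = 500)
print(rffit.tuned)
```

Passing doBest = TRUE to tuneRF would do the refit step internally and return the forest directly; the two-step version is shown here so the search results stay visible.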
Benefits of RF
• Good accuracy with default settings
• Relatively easy to make parallel
• Many implementations
  • R, Weka, RapidMiner, Mahout
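One way the parallelism point might look in R, as a sketch rather than a recommended setup: because the trees are grown independently, several smaller forests can be fitted separately and merged with combine() from the randomForest package. The example uses the base parallel package and iris as stand-in data; the split into 4 forests of 125 trees and the name rffit.par are assumptions for illustration.

```r
library(randomForest)
library(parallel)

# Use a random number generator designed for parallel streams
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# Grow 4 forests of 125 trees each. mc.cores = 1 keeps this portable;
# raise it on Unix-alikes (mclapply relies on forking, which Windows lacks).
forests <- mclapply(1:4, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 125, norm.votes = FALSE)
}, mc.cores = 1)

# Merge the pieces into a single 500-tree forest
rffit.par <- do.call(randomForest::combine, forests)
print(rffit.par$ntree)

# The combined forest predicts like any other randomForest object
head(predict(rffit.par, iris))
```

Note that combine() leaves the confusion matrix and error-rate components of the merged forest empty, so OOB error has to be assessed another way, for example on a held-out set.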
References
• A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
• Breiman, Leo. Classification and Regression Trees. Belmont, Calif: Wadsworth International Group, 1984. Print.
• Breiman, Leo and Adele Cutler. Random forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
• S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.