Regression tree (prostate cancer)
Advantages of a Tree
• Works for both classification and regression.
• Handles categorical predictors naturally.
• No formal distributional assumptions.
• Can handle highly non-linear interactions.
• Handles missing values in the variables.
Advantages of Random Forests
• Built-in estimates of accuracy.
• Automatic variable selection.
• Variable importance.
• Works well “off the shelf”.
• Handles “wide” data.
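The "built-in estimates of accuracy" are usually obtained from out-of-bag cases: each tree is grown on a bootstrap sample, and the cases that sample leaves out act as a test set for that tree. The following is a minimal sketch, not taken from the slides, that checks by simulation how large that left-out share is. The function name `oob_fraction` and the sizes used are illustrative assumptions.

```python
import math
import random

def oob_fraction(n, rng):
    """Fraction of n cases that a bootstrap sample of size n leaves out."""
    # Draw n case indices with replacement and record which ones appear.
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
n = 1000                 # hypothetical number of cases
fractions = [oob_fraction(n, rng) for _ in range(200)]
mean_oob = sum(fractions) / len(fractions)

# A given case is missed by all n draws with probability (1 - 1/n)^n,
# which approaches exp(-1), roughly 0.368, as n grows.
expected = (1 - 1 / n) ** n
print(round(mean_oob, 3), round(expected, 3), round(math.exp(-1), 3))
```

Roughly a third of the cases are therefore unseen by any given tree, which is why a forest can estimate its own error without a separate hold-out set.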
Grow a forest of many trees.
Each tree is a little different (slightly different data, different choices of predictors).
Combine the trees to get predictions for new data.
Idea: most of the trees are good for most of the data and make mistakes in different places.
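The recipe above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation behind the slides. It uses small depth-limited classification trees, Gini impurity, a bootstrap sample per tree, a random subset of `mtry` predictors at each split, and a majority vote. All function names, the toy data, and the parameter values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in the list."""
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, mtry, rng, depth=0, max_depth=3):
    # Stop when the node is pure or the depth limit is reached.
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    # "Different choices of predictors": only mtry random features are tried.
    features = rng.sample(range(len(X[0])), mtry)
    best = None
    for j in features:
        # Candidate thresholds: every observed value except the largest,
        # so both children are always non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # all tried features were constant in this node
        return majority(y)
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      mtry, rng, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      mtry, rng, depth + 1, max_depth))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def grow_forest(X, y, n_trees=50, mtry=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # "Slightly different data": a bootstrap sample for each tree.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                mtry, rng))
    return forest

def predict_forest(forest, row):
    # Combine the trees by majority vote.
    return majority([predict_tree(tree, row) for tree in forest])

# Toy data (made up): two well-separated groups on two predictors.
X = [[0.1 * i, 0.1 * i + 0.05] for i in range(10)] + \
    [[5 + 0.1 * i, 5.05 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10

forest = grow_forest(X, y, n_trees=25, mtry=1, seed=1)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [6.0, 6.0]))
```

For regression the vote would be replaced by an average of the tree predictions. Production implementations also grow much deeper trees and search for splits far more efficiently than this sketch does.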
RF handles thousands of predictors
Ramón Díaz-Uriarte, Sara Alvarez de Andrés
Bioinformatics Unit, Spanish National Cancer Center
March, 2005 http://ligarto.org/rdiaz
SVM, linear kernel
KNN/crossvalidation (Dudoit et al. JASA 2002)
Shrunken Centroids (Tibshirani et al. PNAS 2002)
“Given its performance, random forest and variable selection
using random forest should probably become part of the
standard tool-box of methods for the analysis of microarray data.”