3. Classification and Regression Trees (CART)
• Decision trees can be applied to both regression and classification
problems.
• Decision trees are powerful predictive models that use a tree structure
to describe the relationships between the features and the potential
outcomes.
• Decision trees are built using a heuristic called recursive partitioning
(divide and conquer); a minimal sketch of this procedure follows this list.
• We first consider regression problems (with continuous response Y )
and then move on to classification.
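To make the divide-and-conquer idea concrete, here is a minimal sketch of recursive partitioning for a regression tree in plain Python/NumPy. The names (best_split, grow) and the stopping rules (max_depth, min_size) are illustrative choices, not any particular library's API: each split minimises the total within-node sum of squared errors, and each leaf predicts the mean response.

    import numpy as np

    def best_split(X, y):
        """Return the (feature, threshold) pair minimising total squared error.
        X: 2D NumPy array of features; y: 1D array of responses."""
        best, best_sse = None, np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue  # skip splits that leave one side empty
                sse = ((left - left.mean()) ** 2).sum() \
                    + ((right - right.mean()) ** 2).sum()
                if sse < best_sse:
                    best, best_sse = (j, t), sse
        return best

    def grow(X, y, depth=0, max_depth=3, min_size=5):
        """Recursively partition the data; each leaf predicts the mean response."""
        split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
        if split is None:
            return {"leaf": True, "prediction": y.mean()}
        j, t = split
        mask = X[:, j] <= t
        return {"leaf": False, "feature": j, "threshold": t,
                "left":  grow(X[mask], y[mask], depth + 1, max_depth, min_size),
                "right": grow(X[~mask], y[~mask], depth + 1, max_depth, min_size)}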
5. Example
• In order to motivate regression trees, we begin with a simple example.
• We will look at housing data collected in California in the 1990 census.
• Aggregated data are available for 20,640 neighbourhoods in California.
• Our goal is to predict the median house price in each neighbourhood
using some or all of the following predictor variables:
Latitude; Longitude; Median income; Median house age; Average
occupancy per house; Average number of rooms per house; Average
number of bedrooms per house; Neighbourhood population size
7. We will begin by considering only latitude and longitude as potential predictors...
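As a sketch of this first step: scikit-learn ships a copy of the same 1990 census data (sklearn.datasets.fetch_california_housing; 20,640 rows, target = median house value in units of $100,000), so we can fit a small regression tree on latitude and longitude alone. The depth of 3 is an arbitrary choice for illustration.

    from sklearn.datasets import fetch_california_housing
    from sklearn.tree import DecisionTreeRegressor

    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["Latitude", "Longitude"]]   # location only, for now
    y = housing.target                            # median house value ($100,000s)

    tree = DecisionTreeRegressor(max_depth=3)     # shallow tree for illustration
    tree.fit(X, y)
    print(tree.get_n_leaves(), "leaves")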
13. Terminology
• Parent node: the immediate predecessor of a given node.
• Child node: an immediate successor of a parent node.
• Root node: the top node of the tree; all observations start together
in this node.
• Terminal node (leaf): a node with no children; predictions are made
at these nodes.
• Depth: the maximal length of a path from the root node to a
terminal node (these terms are illustrated in the sketch below).
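These terms can be checked directly on a fitted scikit-learn tree, whose tree_ attribute stores the structure as parallel arrays indexed by node. A sketch, again assuming the California housing data; -1 marks a missing child, i.e. a leaf:

    from sklearn.datasets import fetch_california_housing
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

    t = tree.tree_
    root = 0                                      # the root is always node 0
    print("children of root:", t.children_left[root], t.children_right[root])
    leaves = [i for i in range(t.node_count) if t.children_left[i] == -1]
    print(len(leaves), "terminal (leaf) nodes")
    print("depth:", tree.get_depth())             # longest root-to-leaf path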
25. The Problem of Overfitting
• A model that is excessively complex will fit the sample data very
well, but will predict new, unseen responses poorly.
• This is called overfitting the sample data.
26. Avoiding Overfitting
• We are interested in how well our model will predict the responses of
unseen data
• To assess this, we must divide our dataset into two parts: a training
set and a test set
• We will fit the model to the training data and then assess the
prediction error on the test set
• We will then choose the model with the lowest test error; a short
train/test sketch follows this list.
• This may be achieved by a technique called cross-validation.
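A sketch of the train/test idea with scikit-learn (the 80/20 split and the candidate depths are arbitrary choices): an unpruned tree (max_depth=None) drives the training error towards zero, but a shallower tree typically achieves a lower test error.

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    for depth in (2, 5, None):                    # None = grow until leaves are pure
        model = DecisionTreeRegressor(max_depth=depth).fit(X_train, y_train)
        print(depth,
              mean_squared_error(y_train, model.predict(X_train)),  # training MSE
              mean_squared_error(y_test, model.predict(X_test)))    # test MSE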
27. Cross-Validation
• In K-fold cross-validation, we randomly divide the set of observations
into K groups, or folds, of roughly equal size
• The model is fitted to all data excluding the kth fold and predictions
are then made for the data in the kth fold
• This is repeated for k = 1,…,K and the test errors are then combined
across folds
• 10-fold cross-validation is the most popular choice. When K = n, it is
called leave-one-out cross-validation. A 10-fold sketch follows below.
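A sketch of 10-fold cross-validation for choosing the tree depth, using scikit-learn's cross_val_score (the candidate depths are arbitrary; the scorer returns negated MSE, so we flip the sign before averaging):

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)

    for depth in (2, 4, 6, 8, 10):
        scores = cross_val_score(DecisionTreeRegressor(max_depth=depth), X, y,
                                 cv=10, scoring="neg_mean_squared_error")
        print(depth, -scores.mean())              # average test MSE over the 10 folds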