3. Classification and Regression Trees (CART)
• Decision trees can be applied to both regression and classification
problems.
• Decision trees are powerful predictive models that use a tree structure
to describe the relationships between the features and the potential
outcomes.
• Decision trees are built using a heuristic called recursive partitioning
(divide and conquer); a minimal sketch of this procedure follows this list.
• We first consider regression problems (with continuous response Y )
and then move on to classification.
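To make the divide-and-conquer idea concrete, here is a minimal sketch of recursive partitioning for a regression tree in plain Python/NumPy. The names (best_split, grow) and the stopping rules (max_depth, min_size) are illustrative choices, not any particular library's API: each split minimises the total within-node sum of squared errors, and each leaf predicts the mean response.

    import numpy as np

    def best_split(X, y):
        """Return the (feature, threshold) pair minimising total squared error.
        X: 2D NumPy array of features; y: 1D array of responses."""
        best, best_sse = None, np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue  # skip splits that leave one side empty
                sse = ((left - left.mean()) ** 2).sum() \
                    + ((right - right.mean()) ** 2).sum()
                if sse < best_sse:
                    best, best_sse = (j, t), sse
        return best

    def grow(X, y, depth=0, max_depth=3, min_size=5):
        """Recursively partition the data; each leaf predicts the mean response."""
        split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
        if split is None:
            return {"leaf": True, "prediction": y.mean()}
        j, t = split
        mask = X[:, j] <= t
        return {"leaf": False, "feature": j, "threshold": t,
                "left":  grow(X[mask], y[mask], depth + 1, max_depth, min_size),
                "right": grow(X[~mask], y[~mask], depth + 1, max_depth, min_size)}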
5. Example
• In order to motivate regression trees, we begin with a simple example.
• We will look at housing data collected in California in the 1990 census.
• Aggregated data are available for 20,640 neighbourhoods in California.
• Our goal is to predict the median house price in each neighbourhood
using some or all of the following predictor variables:
Latitude; Longitude; Median income; Median house age; Average
occupancy per house; Average number of rooms per house; Average
number of bedrooms per house; Neighbourhood population size
7. We will begin by considering only latitude and longitude as potential predictors...
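As a sketch of this first step: scikit-learn ships a copy of the same 1990 census data (sklearn.datasets.fetch_california_housing; 20,640 rows, target = median house value in units of $100,000), so we can fit a small regression tree on latitude and longitude alone. The depth of 3 is an arbitrary choice for illustration.

    from sklearn.datasets import fetch_california_housing
    from sklearn.tree import DecisionTreeRegressor

    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["Latitude", "Longitude"]]   # location only, for now
    y = housing.target                            # median house value ($100,000s)

    tree = DecisionTreeRegressor(max_depth=3)     # shallow tree for illustration
    tree.fit(X, y)
    print(tree.get_n_leaves(), "leaves")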
13. Terminology
• Parent node: the immediate predecessor of a given node.
• Child node: an immediate successor of a parent node.
• Root node: the top node of the tree; all observations start together
in this node.
• Terminal node (leaf): a node with no children; predictions are made
at these nodes.
• Depth: the maximal length of a path from the root node to a
terminal node (these terms are illustrated in the sketch below).
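These terms can be checked directly on a fitted scikit-learn tree, whose tree_ attribute stores the structure as parallel arrays indexed by node. A sketch, again assuming the California housing data; -1 marks a missing child, i.e. a leaf:

    from sklearn.datasets import fetch_california_housing
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

    t = tree.tree_
    root = 0                                      # the root is always node 0
    print("children of root:", t.children_left[root], t.children_right[root])
    leaves = [i for i in range(t.node_count) if t.children_left[i] == -1]
    print(len(leaves), "terminal (leaf) nodes")
    print("depth:", tree.get_depth())             # longest root-to-leaf path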
25. The Problem of Overfitting
• A model that is excessively complex will fit the sample data very
well, but will predict new, unseen responses poorly.
• This is called overfitting the sample data.
26. Avoiding Overfitting
• We are interested in how well our model will predict the responses of
unseen data
• To assess this, we must divide our dataset into two parts: a training
set and a test set
• We will fit the model to the training data and then assess the
prediction error on the test set
• We will then choose the model with the lowest test error; a short
train/test sketch follows this list.
• This may be achieved by a technique called cross-validation.
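A sketch of the train/test idea with scikit-learn (the 80/20 split and the candidate depths are arbitrary choices): an unpruned tree (max_depth=None) drives the training error towards zero, but a shallower tree typically achieves a lower test error.

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    for depth in (2, 5, None):                    # None = grow until leaves are pure
        model = DecisionTreeRegressor(max_depth=depth).fit(X_train, y_train)
        print(depth,
              mean_squared_error(y_train, model.predict(X_train)),  # training MSE
              mean_squared_error(y_test, model.predict(X_test)))    # test MSE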
27. Cross-Validation
• In K-fold cross-validation, we randomly divide the set of observations
into K groups, or folds, of roughly equal size
• The model is fitted to all data excluding the kth fold and predictions
are then made for the data in the kth fold
• This is repeated for k = 1,…,K and the test errors are then combined
across folds
• 10-fold cross-validation is the most popular choice. When K = n, it is
called leave-one-out cross-validation. A 10-fold sketch follows below.
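A sketch of 10-fold cross-validation for choosing the tree depth, using scikit-learn's cross_val_score (the candidate depths are arbitrary; the scorer returns negated MSE, so we flip the sign before averaging):

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)

    for depth in (2, 4, 6, 8, 10):
        scores = cross_val_score(DecisionTreeRegressor(max_depth=depth), X, y,
                                 cv=10, scoring="neg_mean_squared_error")
        print(depth, -scores.mean())              # average test MSE over the 10 folds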