2. Outline
• Classification: Basic Concepts
• Decision Tree Induction
• Prediction Using a Classification Model
• Model Evaluation and Selection
• Summary
3. Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  The class labels and the number of classes of the training data are known.
  New data is classified based on the training set.
• Unsupervised learning (clustering)
  The class labels of the training data are unknown.
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
(Diagram: Learning branches into Supervised Learning and Unsupervised Learning.)
4. Example: Supervised vs. Unsupervised
• Supervised: class labels √, number of classes √
• Unsupervised: class labels ✗, number of classes √ or ✗
  √ : known in advance
  ✗ : unknown in advance
5. Prediction Problems: Classification vs. Regression
• Classification and regression are the two major types of prediction problems.
• Classification:
  Predicts categorical class labels (discrete or nominal).
  Constructs a model from the training set and the values (class labels) of a classifying attribute, and uses it to classify new data.
• Regression:
  Models continuous-valued functions, i.e., predicts unknown or missing values.
Activity: classification or regression?
  For some loan application data → predict “safe” or “risky”.
  For a marketing manager → predict how much a given customer will spend during a sale.
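A minimal sketch of the distinction in the activity above: the first task outputs a discrete class label, the second a continuous value. The rules, thresholds, and data below are invented purely for illustration.

```python
# Classification: predict a categorical label ("safe"/"risky") for a loan applicant.
# Regression: predict a continuous amount a customer will spend.
# Both models here are toy assumptions, not a real method.

def classify_loan(income, debt):
    """Toy rule-based classifier: returns a discrete class label."""
    return "safe" if income - debt > 20_000 else "risky"

def predict_spend(past_purchases):
    """Toy regressor: predicts a continuous value (mean of past purchases)."""
    return sum(past_purchases) / len(past_purchases)

print(classify_loan(60_000, 10_000))        # a discrete label: safe
print(predict_spend([120.0, 80.0, 100.0]))  # a continuous value: 100.0
```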
6. Typical Applications of Classification
• Credit/loan approval: whether a customer should be approved or not.
• Medical diagnosis: whether a tumor is malignant or benign.
• Fraud detection: whether a transaction is fraudulent.
9. Step 1: Model Construction (Learning)
• Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label.
• The set of all tuples used to construct the model is called the training set.
• The model is represented in one of the following forms:
  1. Classification rules (IF-THEN statements).
  2. Decision tree.
  3. Mathematical formula.
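As a sketch of model construction, one simple way to learn the first representation (IF-THEN rules) from a training set is to pick, for each value of an attribute, the majority class label. The attribute names, tuples, and labels below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training set: each tuple has attributes plus a class label.
training_set = [
    {"age": "youth",  "student": "no",  "label": "no"},
    {"age": "youth",  "student": "yes", "label": "yes"},
    {"age": "mid",    "student": "no",  "label": "yes"},
    {"age": "senior", "student": "yes", "label": "yes"},
]

def learn_rules(data, attr):
    """One IF-THEN rule per value of `attr`: predict the majority class label."""
    by_value = defaultdict(list)
    for t in data:
        by_value[t[attr]].append(t["label"])
    return {v: Counter(labels).most_common(1)[0][0]
            for v, labels in by_value.items()}

for value, label in learn_rules(training_set, "age").items():
    print(f"IF age = {value} THEN label = {label}")
```

Real induction algorithms (decision trees, covered below) choose attributes and splits far more carefully; this only illustrates the training-set-to-model step.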
12. Step 2: Model Evaluation
• Estimate the accuracy of the model on a test set.
  The known label of each test sample is compared with the model’s classification result.
  The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
  The test set must be independent of the training set; otherwise over-fitting will occur.
• If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
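The accuracy-rate definition above can be sketched directly; the model and the labeled test tuples here are illustrative assumptions.

```python
def accuracy(model, test_set):
    """Percentage of test tuples whose predicted label matches the known label."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return 100.0 * correct / len(test_set)

# A toy classifier and an independent, labeled test set (invented data).
model = lambda x: "safe" if x["income"] > 30_000 else "risky"
test_set = [
    ({"income": 50_000}, "safe"),
    ({"income": 20_000}, "risky"),
    ({"income": 40_000}, "risky"),   # misclassified by this toy model
    ({"income": 10_000}, "risky"),
]
print(accuracy(model, test_set))  # 75.0
```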
14. Step 3: Using the Model in Prediction
• The model is used to classify unseen objects (new data).
  Assign a class label to a new tuple.
  Predict the value of an actual attribute (the class label).
  Example: predict whether the new data is “safe” or “risky”.
33. Decision Tree Induction: Algorithm
1. The tree is constructed in a top-down, recursive, divide-and-conquer manner.
2. At the start, all the training examples are at the root; the algorithm then breaks the dataset down into smaller and smaller subsets.
3. Examples are partitioned recursively based on selected attributes.
   • Attributes are categorical (if continuous-valued, they are discretized in advance).
4. The final result is a tree with decision nodes and leaf nodes.
   • A decision node tests an attribute.
   • A leaf node represents a classification or decision.
5. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
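The top-down recursive procedure can be sketched compactly for categorical attributes. The nested-dict tree representation and the pluggable `select` heuristic are assumptions of this sketch, not a fixed algorithm from the slides.

```python
from collections import Counter

def build_tree(rows, attrs, select):
    """Top-down recursive divide-and-conquer decision-tree induction."""
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1:        # stop: all samples belong to the same class
        return labels[0]
    if not attrs:                    # stop: no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = select(rows, attrs)       # heuristic choice of the splitting attribute
    node = {"attr": best, "children": {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node["children"][value] = build_tree(
            subset, [a for a in attrs if a != best], select)
    return node

# Toy demo: split on the first available attribute (a trivial `select`).
rows = [{"age": "youth", "label": "no"},
        {"age": "youth", "label": "no"},
        {"age": "senior", "label": "yes"}]
tree = build_tree(rows, ["age"], lambda r, a: a[0])
print(tree)
```

Plugging an attribute selection measure such as information gain into `select` gives the full algorithm.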
37. Decision Tree Induction: Algorithm
• Conditions for stopping partitioning:
  1. All samples for a given node belong to the same class.
  2. There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
  3. There are no samples left.
39. Attribute Partitioning Scenarios
• Partitioning differs depending on the attribute type and the tree type:
  Binary tree: 2-way split.
  Non-binary tree: multi-way split.
  (a) A is discrete-valued.
  (b) A is continuous-valued.
  (c) A is discrete-valued and a binary tree must be produced.
41. Attribute Selection Measures
• An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes.
• Different measures:
  Information Gain
  Gain Ratio
  Gini Index (Gini impurity)
42. Attribute Selection Measures
• The attribute selection measure provides a ranking for each attribute describing the given training tuples.
• The attribute with the best score is chosen as the splitting attribute.
• Best score: depending on the measure, either the highest (e.g., information gain) or the lowest (e.g., Gini impurity) value is the best.
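As a sketch, two of the measures above can be computed directly from class counts. The split below (9 positive / 5 negative tuples divided into three branches) is a hypothetical example chosen for illustration.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given per-class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def gini(counts):
    """Gini impurity of a class distribution given per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def info_gain(parent_counts, child_counts_list):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent_counts)
    expected = sum(sum(cc) / n * entropy(cc) for cc in child_counts_list)
    return entropy(parent_counts) - expected

# 9 "yes" / 5 "no" tuples split on a 3-valued attribute:
gain = info_gain([9, 5], [[2, 3], [4, 0], [3, 2]])
print(round(gain, 3))            # ~0.247 (higher is better for this measure)
print(round(gini([9, 5]), 3))    # ~0.459 (lower is better for this measure)
```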
141. Rule Extraction
• Rules can be extracted from a decision tree:
  Rules are easier to understand than large trees.
  The knowledge is represented in the form of IF-THEN rules.
  • Rule antecedent/precondition vs. rule consequent.
• One rule is created for each path from the root to a leaf.
  The attribute-value pairs along a path form a conjunction; the leaf holds the class prediction.
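The one-rule-per-path idea can be sketched as a recursive walk over a tree. The nested-dict tree representation is an assumption of this sketch; its contents mirror the buys_computer example in these slides.

```python
def extract_rules(tree, conditions=()):
    """One IF-THEN rule per root-to-leaf path; conditions form a conjunction."""
    if not isinstance(tree, dict):           # leaf: emit the finished rule
        ante = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {ante} THEN buys_computer = {tree}"]
    rules = []
    for value, subtree in tree["children"].items():
        rules += extract_rules(subtree, conditions + ((tree["attr"], value),))
    return rules

# Tree matching the example rules on the surrounding slides.
tree = {"attr": "age", "children": {
    "youth":    {"attr": "student", "children": {"no": "no", "yes": "yes"}},
    "mid_aged": "yes",
    "senior":   {"attr": "credit_rating",
                 "children": {"excellent": "yes", "fair": "no"}},
}}
for rule in extract_rules(tree):
    print(rule)
```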
142. Rule Extraction: Example 1
• Example: rule extraction from the buys_computer decision tree.
(Decision tree diagram: the root splits on age into the branches age <= 30, 30 < age <= 40, and age > 40.)
143. Rule Extraction
• Example: rules extracted from the buys_computer decision tree:
  IF age = youth AND student = yes THEN buys_computer = yes
  IF age = youth AND student = no THEN buys_computer = no
  IF age = mid-aged THEN buys_computer = yes
  IF age = senior AND credit_rating = excellent THEN buys_computer = yes
  IF age = senior AND credit_rating = fair THEN buys_computer = no
145. Rule Assessment
• A rule R can be assessed by its coverage and accuracy.
• Coverage(R): the percentage of tuples in the data set that are covered by R, i.e., whose attribute values hold true for the rule’s antecedent:
  coverage(R) = n_covers / |D|
• Accuracy(R): the percentage of tuples covered by R that are correctly classified by R:
  accuracy(R) = n_correct / n_covers
where n_covers = # of tuples covered by R, n_correct = # of tuples correctly classified by R, and |D| = # of tuples in the data set D.
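The two measures above can be sketched for a single rule; the data set D, the rule, and its consequent below are illustrative assumptions.

```python
def assess_rule(antecedent, consequent, D):
    """Return (coverage %, accuracy %) for one IF-THEN rule over data set D."""
    covered = [t for t in D if antecedent(t)]       # tuples satisfying the antecedent
    n_covers = len(covered)
    n_correct = sum(1 for t in covered if t["label"] == consequent)
    coverage = 100.0 * n_covers / len(D)
    accuracy = 100.0 * n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Toy data set and rule: IF age = youth AND student = yes THEN label = yes.
D = [
    {"age": "youth",  "student": "yes", "label": "yes"},
    {"age": "youth",  "student": "yes", "label": "no"},
    {"age": "youth",  "student": "no",  "label": "no"},
    {"age": "senior", "student": "no",  "label": "yes"},
]
rule = lambda t: t["age"] == "youth" and t["student"] == "yes"
print(assess_rule(rule, "yes", D))  # (50.0, 50.0): covers 2 of 4, classifies 1 of 2 correctly
```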