Decision Tree
Classification: Decision Tree
• A decision tree is a popular machine learning algorithm used
for both classification and regression tasks.
• It's a tree-like structure where each internal node represents a
decision based on a feature, each branch represents an
outcome of that decision, and each leaf node represents the
final output (class label or numerical value).
• Decision trees are particularly useful for their simplicity,
interpretability, and ease of visualization.
Decision Tree
• The process of building a decision tree involves recursively splitting the dataset into
subsets based on the most informative features at each node.
• The algorithm selects the feature that provides the best separation of the data,
typically using metrics like Gini impurity for classification tasks or mean squared
error for regression tasks.
• This process continues until a stopping criterion is met, such as a predefined depth
of the tree or when further splitting does not significantly improve the model.
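As an aside, these splitting and stopping choices map directly onto hyperparameters in a typical implementation. The sketch below (not from the slides) uses scikit-learn's DecisionTreeClassifier on synthetic data; the parameter values are arbitrary illustrations, not recommendations.

```python
# A minimal sketch: how the splitting criterion and the stopping rules map onto
# scikit-learn's DecisionTreeClassifier hyperparameters. Dataset and values are
# arbitrary, for illustration only.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",            # impurity metric used to pick the best split
    max_depth=3,                 # stop at a predefined depth
    min_samples_split=10,        # do not split nodes smaller than 10 samples
    min_impurity_decrease=0.01,  # stop when a split barely improves impurity
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```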
Pros and Cons of Decision Tree
Pros: Decision trees require little data preprocessing (no need for normalization or scaling) and can handle both numerical and categorical data.
Decision trees are widely used in various applications due to their simplicity, interpretability,
and effectiveness in capturing complex decision boundaries in the data.
Cons: Decision trees are prone to overfitting, especially when the tree becomes too deep.
Techniques like pruning or using ensemble methods like Random Forests can help mitigate
overfitting and improve generalization performance.
Terminologies
In decision trees, several key terminologies and concepts are used to describe
different elements and aspects of the tree-building process.
Root Node: The top node of the decision tree, from which the tree starts branching.
Decision Nodes (Internal Nodes): Nodes in the tree where decisions are made based
on the values of features. These nodes have branches leading to child nodes.
Leaf Nodes (Terminal Nodes): End nodes of the tree where the final decision or prediction is made. They do not have any child nodes.
Branches: The edges connecting nodes in the tree. Each branch corresponds to a possible outcome of a decision.
Splitting: The process of dividing a node into two or more child nodes based on a certain feature and a threshold or condition.
Decision Rule: The condition at each decision node that determines the direction
of the branch. It is based on a feature and a threshold.
Entropy: A measure of impurity or disorder in a set of data. Decision trees use
entropy to find the best feature to split the data.
Information Gain: The reduction in entropy or impurity achieved by splitting the
data based on a particular feature. Decision trees aim to maximize information
gain when choosing features for splitting.
Gini Impurity: Another measure of impurity used in decision trees. It represents
the probability of incorrectly classifying a randomly chosen element.
Pruning: The process of removing branches from the tree to reduce overfitting.
Pruning helps improve the generalization ability of the model.
Depth of the Tree: The length of the longest path from the root node to a leaf
node. It indicates how many decisions or splits are made before reaching a
prediction.
CART (Classification and Regression Trees)
• An algorithm used for constructing decision trees that can be applied to both classification and
regression tasks.
Random Forest
• An ensemble learning method that builds multiple decision trees and combines their predictions to
improve accuracy and reduce overfitting.
• These terminologies provide a foundation for understanding and discussing decision trees and their
construction. Different algorithms and implementations may use variations of these terms, but the
fundamental concepts remain consistent across the field of machine learning.
Example of a Decision Tree
[Figure: a small training-data table (two categorical attributes, Refund and MarSt; one continuous attribute, TaxInc; class label Cheat) alongside the decision-tree model built from it. The splitting attributes are Refund at the root (Yes → NO), then MarSt (Married → NO; Single, Divorced → test TaxInc), then TaxInc (< 80K → NO; ≥ 80K → YES).]
[Figure: an alternative decision tree for the same training data, this time with MarSt at the root and Refund tested below it, again ending in a TaxInc test (< 80K → NO; ≥ 80K → YES).]
There could be more than one tree that fits the same data!
Decision Tree Classification Task
[Figure: the overall workflow: a tree-induction algorithm learns a decision-tree model from the training set, and the model is then applied to the test set to assign class labels.]
Apply Model to Test Data
[Figure: the decision tree from the previous example, shown next to a single test record whose Cheat value is unknown.]
Start from the root of the tree and, at each decision node, follow the branch that matches the test record's attribute value.
The walk ends at a leaf labelled NO, so assign Cheat to “No”.
Decision Tree Classification
A decision tree is created in two phases:
■ Tree Building Phase
Repeatedly partition the training data until all the examples in each partition
belong to one class or the partition is sufficiently small
■ Tree Pruning Phase
Remove dependency on statistical noise or variation that may be particular
only to the training set
Tree Building Phase
General tree-growth algorithm (binary tree)
Partition(Data S)
    if all points in S are of the same class then return;
    for each attribute A do
        evaluate splits on attribute A;
    use the best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);
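A runnable, deliberately simplified Python sketch of this Partition() algorithm is shown below. It scores binary splits of the form A ≤ v with Gini impurity and prints the recursion; the helper names and the synthetic data are illustrative assumptions, not part of the original slides.

```python
# Simplified, runnable version of the Partition() tree-growth sketch above.
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold) of the best split A <= v, or (None, None)."""
    best_a, best_v, best_w = None, None, np.inf
    n = len(y)
    for a in range(X.shape[1]):
        for v in np.unique(X[:, a])[:-1]:          # candidate thresholds
            left = X[:, a] <= v
            w = (left.sum() / n) * gini(y[left]) + ((~left).sum() / n) * gini(y[~left])
            if w < best_w:
                best_a, best_v, best_w = a, v, w
    return best_a, best_v

def partition(X, y, depth=0, max_depth=3):
    """Recursively partition the data: Partition(S) -> Partition(S1), Partition(S2)."""
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) == 1 or depth == max_depth:    # one class left, or depth limit reached
        print("  " * depth + f"leaf: {dict(zip(classes.tolist(), counts.tolist()))}")
        return
    a, v = best_split(X, y)
    if a is None:                                  # no usable split on any attribute
        print("  " * depth + f"leaf: {dict(zip(classes.tolist(), counts.tolist()))}")
        return
    print("  " * depth + f"split: feature {a} <= {v:.2f}")
    left = X[:, a] <= v
    partition(X[left], y[left], depth + 1, max_depth)    # Partition(S1)
    partition(X[~left], y[~left], depth + 1, max_depth)  # Partition(S2)

# Tiny synthetic demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
partition(X, y)
```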
Tree Building Phase (cont.)
• The form of the split depends on the type of the attribute.
• Splits for numeric attributes are of the form A ≤ v, where v is a real number.
• Splits for categorical attributes are of the form A ∈ S′, where S′ is a subset of all possible values of A.
Why Decision Trees?
• Decision trees are powerful and popular tools for classification and prediction.
• Decision trees represent rules, which can be understood by humans and used in knowledge systems such as databases.
Key Requirements
Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
Predefined classes (target values): the target function has discrete output values (boolean or multiclass).
Sufficient data: enough training cases should be provided to learn the model.
Principled Criterion
Selection of an attribute to test at each node means choosing the most useful attribute for classifying examples.
■ Gini impurity: a measure of impurity or disorder in a set of data used in decision tree algorithms, particularly for classification problems.
■ Entropy: a measure of homogeneity of the set of examples.
■ Information gain: measures how well a given attribute separates the training examples according to their target classification. This measure is used to select among the candidate attributes at each step while growing the tree.
Calculation of Gini Impurity
Gini impurity is a measure of impurity or disorder in a set of data used in decision tree algorithms, particularly for classification problems.
• It indicates the likelihood of new, random data being misclassified if it were given a random class label according to the class distribution in the dataset.
• The Gini impurity for a node is calculated based on the distribution of class labels in that node.
The formula for Gini impurity is: Gini = 1 - Σ (pi)^2, summed over i = 1, ..., c, where:
• c is the number of classes.
• pi is the probability of the occurrence of class i in the node.
For a two-class problem, the Gini impurity ranges from 0 to 0.5, where 0 indicates perfect purity (all instances belong to a single class) and 0.5 indicates maximum impurity (an equal split between the two classes); with c classes the maximum is (c - 1)/c.
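A minimal helper for this formula, sketched in plain Python/NumPy (the function name is just illustrative):

```python
# Gini = 1 - sum_i pi^2, computed from the class labels found in a node.
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(["Yes"] * 5 + ["No"] * 5))  # 0.5: maximum impurity for two classes
print(gini_impurity(["Yes"] * 10))              # 0.0: perfectly pure node
```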
Here's a step-by-step example of calculating Gini impurity for a hypothetical dataset with two classes, "Yes" and "No".
[Table: the hypothetical dataset and the corresponding Gini calculation shown on the slide.]
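The slide's table is not reproduced here; as a stand-in with assumed counts, suppose the dataset has 6 "Yes" and 4 "No" instances. Then p(Yes) = 0.6 and p(No) = 0.4, so Gini = 1 - (0.6^2 + 0.4^2) = 1 - 0.52 = 0.48.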
• The node with uniform class distribution has the highest impurity.
• The minimum impurity is obtained when all records belong to the same class.
Several examples are given in the following table to demonstrate the Gini Impurity computation.
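The original table is not reproduced; an illustrative set of counts for a node with 6 records of two classes (C1, C2), computed with the formula above, would be: (0, 6) → Gini 0.000; (1, 5) → 0.278; (2, 4) → 0.444; (3, 3) → 0.500 (the uniform distribution, i.e. maximum impurity).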
• Gini Impurity is a measurement used to build Decision Trees to determine how the
features of a dataset should split nodes to form the tree.
• In the context of decision trees, when selecting a feature for splitting, the one that
results in the lowest Gini impurity after the split is chosen, as it indicates a more
"pure" separation of the classes.
Example: Gini Impurity
• For example, say you want to build a classifier that determines if someone will default on
their credit card. You have some labeled data with features, such as bins for age, income,
credit rating, and whether or not each person is a student.
• To find the best feature for the first split of the tree – the root node – you could calculate
how poorly each feature divided the data into the correct class, default ("yes") or didn't
default ("no").
• This calculation would measure the impurity of the split, and the feature with the lowest
impurity would determine the best feature for splitting the current node.
• This process would continue for each subsequent node using the remaining features.
• The attribute with the smallest Gini impurity is selected for splitting the node.
• If a data set D with n records is split on an attribute into two subsets D1 and D2 with sizes n1 and n2, respectively, the Gini impurity of the split is defined as the weighted average:
Gini_split(D) = (n1/n) * Gini(D1) + (n2/n) * Gini(D2)
• When training a decision tree, the attribute that provides the smallest Gini_split is chosen to split the node.
• To obtain the gain for an attribute, the weighted impurities of the branches are subtracted from the original impurity.
• The best split can also be chosen by maximizing the Gini gain, calculated as:
Gini gain = Gini(D) - Gini_split(D)
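As a worked illustration (the numbers here are made up, not from the slides): suppose a parent node D has 6 "Yes" and 4 "No" records, so Gini(D) = 1 - (0.6^2 + 0.4^2) = 0.48. A candidate split produces D1 = {4 Yes, 1 No} and D2 = {2 Yes, 3 No}, giving Gini(D1) = 1 - (0.8^2 + 0.2^2) = 0.32 and Gini(D2) = 1 - (0.4^2 + 0.6^2) = 0.48. The weighted impurity is (5/10)(0.32) + (5/10)(0.48) = 0.40, so the Gini gain of this split is 0.48 - 0.40 = 0.08.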
Gini Impurity and Gini Index
The Gini Index is essentially the same, expressed in percentage terms. It is
calculated as the Gini Impurity multiplied by 100
Gini Index = Gini Impurity ×100
Both Gini Impurity and Gini Index are widely used as criteria for evaluating
the quality of splits in decision trees, and the choice between them often
depends on the convention used in specific implementations or literature.
Entropy
A measure of homogeneity of the set of examples.
Given a set S of positive and negative examples of some target concept (a 2-class
problem), the entropy of set S relative to this binary classification is
E(S) = - p(P) log2 p(P) - p(N) log2 p(N)
Some Intuitions
The entropy is 0 if the outcome is "certain".
The entropy is maximum if we have no knowledge of the system (i.e., any outcome is equally possible).
[Figure: entropy of a 2-class problem plotted against the proportion of one of the two classes.]
Information gain is simply the expected reduction in entropy caused by partitioning the examples according to an attribute.
It is the number of bits saved when encoding the target value of an arbitrary member of S, by knowing the value of attribute A.
• For a two-class problem, the range of entropy is from 0 to 1, whereas the range of Gini impurity is from 0 to 0.5.
Calculation of Entropy
Entropy is a measure of impurity or disorder in a set of data. In the context of decision trees, entropy is used to determine the homogeneity of a node based on the distribution of class labels. The formula for entropy is:
H(S) = - Σ pi log2(pi), summed over the c classes, where pi is the proportion of class i in S.
The entropy is calculated for each node in a decision tree, and the goal is to minimize entropy by choosing the best features for splitting the data.
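A minimal Python sketch of the same calculation (classes with zero count never appear in the probability vector, so the 0 * log2(0) = 0 convention is handled implicitly):

```python
# H(S) = -sum_i pi * log2(pi), computed from the class labels in a node.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(-p * np.log2(p)) + 0.0)  # + 0.0 normalizes a possible -0.0

print(entropy(["Yes"] * 5 + ["No"] * 5))   # 1.0: maximum for two classes
print(entropy(["Yes"] * 10))               # 0.0: perfectly pure node
print(entropy(["Yes"] * 9 + ["No"] * 3))   # ~0.811
```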
Example:
Here's a step-by-step example of calculating entropy for a hypothetical dataset with two classes, "Yes" and "No".
[Table: the hypothetical dataset and calculation from the slide, which reports Gini Index = 0.48 and Entropy = 0.29.]
If all instances in a dataset belong to the same class (either all "Yes" or all "No"), then the dataset is perfectly pure, and there is no uncertainty or disorder in terms of class labels. In such a case, the entropy is 0, indicating complete homogeneity.
Recall the formula H(S) = - Σ pi log2(pi), where c is the number of classes and pi is the probability of class i in set S.
If all instances are of the same class, say "Yes," then p(Yes) = 1 and p(No) = 0. Substituting these values into the entropy formula:
H(S) = -(1 * log2(1) + 0 * log2(0)) = 0
Note that log2(0) is undefined, so we typically define 0 * log2(0) as 0 for the purpose of the calculation. Therefore H(S) = 0.
Similarly, if all instances are of the class "No," the entropy will also be 0.
If a dataset contains only one class, the entropy is 0, indicating perfect homogeneity and no uncertainty.
This is the ideal scenario for a node in a decision tree because it implies a clear and pure separation of
classes.
Entropy measure: let us get a feeling
As the data become purer and purer, the entropy value becomes smaller and smaller.
Difference Between Gini Impurity and Entropy
Both Gini impurity and entropy are measures of impurity or disorder used in decision tree
algorithms to determine the best split at each node. However, there are some differences
in how they calculate impurity.
Formula
Gini Impurity: It is calculated using the formula: Gini(t) = 1 - Σ (pi)^2, where pi is the
proportion of instances of class i in node t.
Entropy: It is calculated using the formula: Entropy(t) = -Σ pi * log2(pi), where pi is the
proportion of instances of class i in node t.
Scale
Gini Impurity: Gini impurity values range from 0 to 0.5 (for a two-class problem)
• where 0 represents perfect purity (all instances belong to a single class), and 0.5 represents maximum impurity (instances are evenly distributed across the classes).
Entropy: Entropy values range from 0 to log2(number of classes), i.e. 0 to 1 for a two-class problem
• where 0 represents perfect purity, and log2(number of classes) represents maximum impurity.
Decision Rule
Gini Impurity: Tends to favor larger partitions with dominant classes. It is less sensitive to imbalances in class distribution.
Entropy: Can be sensitive to imbalances in class distribution. It may result in slightly more balanced trees compared to Gini impurity.
Computation
Gini Impurity: Computationally less expensive compared to entropy, as it doesn't involve logarithmic
calculations.
Entropy: Requires logarithmic calculations, which can be relatively more computationally expensive.
In practice, both Gini impurity and entropy are commonly used, and the choice between them often
depends on the specific characteristics of the dataset and the desired properties of the resulting decision
tree.
Scikit-learn, a popular machine learning library in Python, allows users to choose between Gini impurity
and entropy as the criterion for splitting in decision tree classifiers.
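For example, a hedged sketch of that scikit-learn option:

```python
# The impurity criterion is a constructor argument in scikit-learn.
from sklearn.tree import DecisionTreeClassifier

gini_tree = DecisionTreeClassifier(criterion="gini")        # the default
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # information-gain style splits
```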
Information Gain
Information gain measures the expected reduction in entropy, or uncertainty:
Gain(S, A) = Entropy(S) - Σ over v in Values(A) of (|Sv| / |S|) * Entropy(Sv)
■ Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v: Sv = {s in S | A(s) = v}.
■ The first term in the equation for Gain is just the entropy of the original collection S.
■ The second term is the expected value of the entropy after S is partitioned using attribute A.
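A direct Python transcription of this definition, assuming a categorical attribute stored as column a of an array X (the names are illustrative; the entropy helper from earlier is repeated so the snippet is self-contained):

```python
# Gain(S, A) = Entropy(S) - sum_v (|Sv| / |S|) * Entropy(Sv)
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(-p * np.log2(p)) + 0.0)

def information_gain(X, y, a):
    """Information gain of the categorical attribute stored in column a of X."""
    total = entropy(y)              # entropy of the original collection S
    n = len(y)
    expected = 0.0
    for v in np.unique(X[:, a]):    # each value v in Values(A)
        mask = X[:, a] == v         # Sv = {s in S | A(s) = v}
        expected += (mask.sum() / n) * entropy(y[mask])
    return total - expected         # expected reduction in entropy
```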
Relationship Between Entropy and Information Gain
Here's how entropy and information gain are related:
Entropy (H)
Entropy is a measure of impurity or disorder in a set of data. It quantifies the
uncertainty associated with the distribution of class labels in a node.
Higher entropy indicates higher disorder and uncertainty.
Information Gain (IG)
Information gain measures the reduction in entropy (uncertainty) achieved by splitting the
data based on a particular feature.
The feature with the highest information gain is chosen as the decision feature for the
split, as it provides the most significant reduction in uncertainty.
• There is a strong relationship between entropy and information gain in the
context of decision trees. Information gain is a concept used to measure the
effectiveness of a feature in reducing the uncertainty or entropy in a dataset. The
higher the information gain, the more effective the feature is for making decisions
in a decision tree.
• The decision tree algorithm seeks to maximize information gain when
selecting features for splitting.
• It chooses the feature that results in the highest reduction of entropy in
the dataset, leading to more homogenous and well-separated subsets.
• This process is repeated recursively, building the decision tree by selecting
features that maximize information gain at each decision node.
A simple example
You want to guess the outcome of next week's game between the MallRats and the
Chinooks.
Available knowledge (attributes):
■ Was the game at home or away?
■ Was the starting time 5 pm, 7 pm, or 9 pm?
■ Did Joe play center or forward?
■ Was the opponent's center tall or not?
■ …
Basketball data
[Table: the basketball training examples (attributes such as where and when, plus the game outcome) used in the worked example below.]
What we know
The game will be away, at 9 pm, and Joe will play center on offense…
A classification problem
Generalizing the learned rule to new examples
Definition
● A decision tree is a classifier in the form of a tree structure
– Decision node: specifies a test on a single attribute
– Leaf node: indicates the value of the target attribute
– Arc/edge: a split on one attribute
– Path: a conjunction of tests that leads to the final decision
● Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
Illustration
(1) Which attribute to start with (the root)?
(2) Which node to proceed to next?
(3) When to stop / come to a conclusion?
Random split
• The tree can grow huge.
• These trees are hard to understand.
• Larger trees are typically less accurate than smaller trees.
Example
Before partitioning, the entropy is
■ H(10/20, 10/20) = - 10/20 log(10/20) - 10/20 log(10/20) = 1
Using the "where" attribute, divide into 2 subsets
■ Entropy of the first set: H(home) = - 6/12 log(6/12) - 6/12 log(6/12) = 1
■ Entropy of the second set: H(away) = - 4/8 log(4/8) - 4/8 log(4/8) = 1
Expected entropy after partitioning
■ 12/20 * H(home) + 8/20 * H(away) = 1
■ Information gain is 1 - 1 = 0
Using the "when" attribute, divide into 3 subsets
■ Entropy of the first set: H(5pm) = - 1/4 log(1/4) - 3/4 log(3/4) ≈ 0.81
■ Entropy of the second set: H(7pm) = - 9/12 log(9/12) - 3/12 log(3/12) ≈ 0.81
■ Entropy of the third set: H(9pm) = - 0/4 log(0/4) - 4/4 log(4/4) = 0
Expected entropy after partitioning
■ 4/20 * H(1/4, 3/4) + 12/20 * H(9/12, 3/12) + 4/20 * H(0/4, 4/4) = 0.65
Information gain = 1 - 0.65 = 0.35
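These numbers can be checked with a few lines of Python; the per-subset class counts below are taken directly from the calculation above, and H() mirrors the two-argument entropy notation used on the slide:

```python
# Checking the "where" vs. "when" information-gain computation above.
import numpy as np

def H(counts):
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()          # drop empty classes, normalize
    return float(np.sum(-p * np.log2(p)) + 0.0)

root = H([10, 10])                                          # 1.0
where = (12 / 20) * H([6, 6]) + (8 / 20) * H([4, 4])        # 1.0
when = (4 / 20) * H([1, 3]) + (12 / 20) * H([9, 3]) + (4 / 20) * H([0, 4])

print(root - where)  # 0.0   (no information gain from "where")
print(root - when)   # ~0.35 (information gain from "when")
```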
Decision
Knowing the "when" attribute values provides larger information gain than "where". Therefore the "when" attribute should be chosen for testing prior to the "where" attribute.
Similarly, we can compute the information gain for other attributes.
At each node, choose the attribute with the largest information gain.
Iris Data Set
• The Iris dataset is a well-known dataset in the field of machine learning and statistics. It was introduced by the
British biologist and statistician Sir Ronald A. Fisher in 1936. The dataset is commonly used for practicing and
demonstrating various classification algorithms, particularly in the context of supervised learning.
• The Iris dataset consists of measurements of four features (attributes) of iris flowers from three different
species: setosa, versicolor, and virginica. The four features are:
• Sepal Length: Length of the iris flower's sepal (the outermost whorl of a flower).
• Sepal Width: Width of the iris flower's sepal.
• Petal Length: Length of the iris flower's petal (the innermost whorl of a flower).
• Petal Width: Width of the iris flower's petal.
• Each sample in the dataset represents an individual iris flower, and the corresponding
label indicates the species to which the flower belongs (setosa, versicolor, or virginica).
• The Iris dataset is commonly used as a beginner's dataset for learning and practicing
classification algorithms due to its simplicity and the clear separation of the three species
based on the provided features.
• The Iris dataset is available in many machine learning libraries, including scikit-learn,
making it easily accessible for experimentation and learning purposes.
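As a hedged end-to-end sketch (not part of the original slides), the snippet below loads the Iris data from scikit-learn, fits a small decision tree, and prints the per-class precision/recall/F1 report discussed in the next section; exact scores depend on the train/test split and the chosen parameters.

```python
# Decision tree on the Iris data with a per-class metrics report.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test), target_names=iris.target_names))
```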
Iris Data Set
Setosa (Iris setosa)
• Setosa is one of the three species in the Iris dataset.
• It is characterized by its distinctive features, including smaller size, a unique petal shape, and color patterns.
• Setosa flowers typically have a shorter sepal length compared to the other two species.
Versicolor (Iris versicolor)
• Versicolor is another species in the Iris dataset.
• It is known for its versatility in terms of petal and sepal characteristics.
• Versicolor flowers have medium-sized petals and sepals.
Virginica (Iris virginica)
• Virginica is the third species included in the Iris dataset.
• It is characterized by larger petal and sepal sizes compared to Setosa and Versicolor.
• Virginica flowers often exhibit different color patterns.
Performance of Classification Algorithms
• Precision, recall, and F1-score are metrics commonly used to evaluate the
performance of classification algorithms, particularly in the context of binary or
multiclass classification problems.
• These metrics provide insights into how well a model is performing in terms of
correctly classifying instances from different classes.
• Let's break down each metric. These metrics are commonly reported together to
provide a comprehensive assessment of a model's performance, especially in
situations where achieving a balance between false positives and false negatives is
crucial.
• Precision: "Of all instances predicted as positive, how many are actually
positive?"
• Recall: "Of all actual positive instances, how many were correctly
predicted?"
• F1-Score: "A balance between precision and recall. It is the harmonic mean
of the two."
• Support: "The number of actual occurrences of the class in the dataset."
Let's use a hypothetical binary classification scenario to illustrate precision, recall, and F1-score. Consider
the following confusion matrix for a binary classification problem:
In this confusion matrix:
True Positives (TP) = 85
True Negatives (TN) = 90
False Positives (FP) = 10
False Negatives (FN) = 15
Let's create a simple example with two columns, "Actual Class" and "Predicted Class," each
containing ten rows. We will use binary classification, where the classes are either 0 or 1.
In this example:
True Positives (TP): Rows where both "Actual Class" and "Predicted Class" are 1. (e.g., row 2, row
5, row 8, row 9)
True Negatives (TN): Rows where both "Actual Class" and "Predicted Class" are 0. (e.g., row 6,
row 10)
False Positives (FP): Rows where "Actual Class" is 0, but "Predicted Class" is 1. (e.g., row 1, row 4)
False Negatives (FN): Rows where "Actual Class" is 1, but "Predicted Class" is 0. (e.g., row 3, row
7)
Now, let's calculate Precision, Recall, and F1-Score:
• Precision = TP / (TP + FP) = 4 / (4 + 2) = 0.67
• Recall = TP / (TP + FN) = 4 / (4 + 2) = 0.67
• F1-Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.67 * 0.67) / (0.67 + 0.67) = 0.67
In this example, Precision and Recall are both 0.67, indicating that out of the instances
predicted as positive, 67% are actually positive, and out of all actual positive instances,
67% were correctly predicted. The F1-Score provides a balanced measure, and in this case,
it is also 0.67.
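The ten-row table itself is not shown above, but label vectors consistent with the listed TP/TN/FP/FN rows can be reconstructed; the sketch below does so and confirms the 0.67 values with scikit-learn.

```python
# Label vectors reconstructed from the TP/TN/FP/FN rows listed above (rows 1..10).
from sklearn.metrics import precision_score, recall_score, f1_score

actual    = [0, 1, 1, 0, 1, 0, 1, 1, 1, 0]
predicted = [1, 1, 0, 1, 1, 0, 0, 1, 1, 0]

print(precision_score(actual, predicted))  # 4 / (4 + 2) = 0.666...
print(recall_score(actual, predicted))     # 4 / (4 + 2) = 0.666...
print(f1_score(actual, predicted))         # 0.666...
```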
Now, let's calculate precision, recall, and F1-score for the earlier confusion matrix (TP = 85, TN = 90, FP = 10, FN = 15) using the formulas mentioned earlier:
• Precision = TP / (TP + FP) = 85 / (85 + 10) ≈ 0.8947
• Recall = TP / (TP + FN) = 85 / (85 + 15) = 0.85
• F1-Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8947 * 0.85) / (0.8947 + 0.85) ≈ 0.872
So, in this example, precision is approximately 0.8947, recall is 0.85, and the F1-score is approximately 0.872.
Precision
• Precision is the ratio of correctly predicted positive observations to the total
predicted positives.
• It is calculated as the number of true positives divided by the sum of true positives
and false positives.
• Precision is a measure of how many of the predicted positive instances are actually
positive.
• The formula for precision is: Precision = TP / (TP + FP)
Recall (Sensitivity or True Positive Rate)
• Recall is the ratio of correctly predicted positive observations to the total actual
positives.
• It is calculated as the number of true positives divided by the sum of true positives
and false negatives.
• Recall measures the ability of the model to capture all the actual positive instances.
• The formula for recall is: Recall = TP / (TP + FN)
F1-Score
• F1-score is the harmonic mean of precision and recall. It provides a balance
between precision and recall.
• It is calculated as 2 * (Precision * Recall) / (Precision + Recall).
• F1-score reaches its best value at 1 (perfect precision and recall) and worst at 0.
• F1-score is particularly useful when the class distribution is imbalanced.
Support
• Support is the number of actual occurrences of the class in the specified dataset.
• It is the number of instances belonging to each class.
Practice Problem
Scenario: Email Spam Classification
• Suppose you have a dataset of emails labeled as either spam or ham (non-spam), and you want to build a classification model to identify spam emails. The initial performance of the model may look like the confusion matrix summarized below.
• What is the total number of rows in the dataset?
In this confusion matrix:
True Positives (TP): 80
True Negatives (TN): 950
False Positives (FP): 50
False Negatives (FN): 20
The total number of rows in the dataset (total instances or observations) can be calculated by adding
up all these values:
Total Rows = TP + TN + FP + FN
Using the values from the confusion matrix:
Total Rows = 80 + 950 + 50 + 20 = 1100
So, there are a total of 1100 rows or instances in the dataset based on the information provided in the
confusion matrix.
Each row corresponds to a specific instance (email in this case) that the model has made predictions
on, and the confusion matrix summarizes how those predictions align with the true classes.
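A quick sketch that checks this total and, as an optional extension using the formulas from earlier in this section, computes the corresponding metrics:

```python
# Counts from the practice-problem confusion matrix.
TP, TN, FP, FN = 80, 950, 50, 20

total = TP + TN + FP + FN
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(total)      # 1100
print(precision)  # 80 / 130 ~ 0.615
print(recall)     # 80 / 100 = 0.80
print(f1)         # ~ 0.696
```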
Editor's Notes

• #11 Decision tree technique (ID3 algorithm): construct the tree, then extract classification rules. Training data + classification algorithm → classifier model; evaluate the model's performance and accuracy on a test data set. If-Then classification rules read off the example tree: If Refund is Yes then Cheat is No. If Refund is No and MarSt is Married then Cheat is No. If Refund is No and MarSt is Single or Divorced and TaxInc < 80K then Cheat is No. If Refund is No and MarSt is Single or Divorced and TaxInc >= 80K then Cheat is Yes.