Entropy and Information Gain in Decision Trees
OMega TechEd
Entropy
Entropy is a machine learning metric that measures the unpredictability or impurity in a dataset.
It quantifies the disorder in the information being processed and determines how a decision tree chooses to split the data.
(Figure: an example of High Entropy vs. Low Entropy)
Entropy
A random variable with only one value, such as a coin that always comes up heads, has no uncertainty, so its entropy is defined as zero; we gain no information by observing its value.
For a two-class problem, entropy lies between 0 and 1; with more than two classes in the dataset, it can be greater than 1.
In general, the entropy of a random variable V with values vk, each with probability P(vk), is defined as:
H(V) = − ∑k P(vk) log2 P(vk)
Entropy of a fair coin flip:
H(Fair) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
How to calculate Entropy?
H(V) = − ∑k P(vk) log2 P(vk)
Example:
If we had a total of 10 data points in our dataset, with 3 belonging to the positive class and 7 to the negative class:
-3/10 * log2(3/10) - 7/10 * log2(7/10) ≈ 0.881
The entropy is approximately 0.88.
High entropy means low level of purity.
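To make the arithmetic concrete, here is a minimal Python sketch of the entropy formula applied to the 3-positive / 7-negative example above (the entropy function and its list-of-class-counts argument are illustrative, not from the slides):

import math

def entropy(counts):
    # H = -sum(p * log2(p)) over the class proportions; empty classes contribute nothing.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([3, 7]), 3))  # 3 positive, 7 negative -> 0.881
print(entropy([5, 5]))            # perfectly mixed classes -> 1.0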
Entropy (Cont.)
Different cases
(Figure: three cases: Entropy = 0, Entropy = 1, Entropy = 0.88)
If the dataset contains an equal number of positive and negative data points, the entropy is 1.
If the dataset contains only positive or only negative data points, the entropy is 0.
Information Gain
Information gain is the reduction in entropy achieved by splitting the dataset on an attribute; it reflects the pattern that attribute reveals in the data.
Mathematically, information gain can be expressed with the formula below:
Information Gain = (entropy of parent node) - (weighted average entropy of child nodes)
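A minimal sketch of this formula in Python, assuming each node is summarized by a list of its class counts (the helper names are illustrative):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    # Parent entropy minus the size-weighted average entropy of the child nodes.
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted

print(information_gain([5, 5], [[5, 0], [0, 5]]))  # a perfect split removes all entropy -> 1.0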
Decision tree using information gain
1. Select the attribute with the highest information gain as the root (parent) node; a minimal sketch of this step follows the list.
2. Build a child node for every value of the chosen attribute.
3. Repeat recursively on each child until the whole tree is constructed.
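A minimal sketch of step 1, assuming each example is a dict of attribute values with a parallel list of class labels (names such as best_attribute are illustrative, not a standard API):

import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # Group the labels by the attribute's value, then apply the gain formula.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

def best_attribute(rows, labels, attributes):
    # Step 1: the attribute with the highest information gain becomes the split.
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

Steps 2 and 3 would then partition the examples by the chosen attribute's values and recurse on each partition.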
Choosing the best attribute
We need a measure of “good” and “bad” attributes. One way to do this is to compute the information gain for each attribute.
Example:
At the root node of the restaurant problem, there are 6 positive (True) and 6 negative (False) examples, so
Entropy(Parent) = 1
(Figure: the Patrons split. Root: 6 positive, 6 negative. None (2 examples): 0 positive, 2 negative. Some (4): 4 positive, 0 negative. Full (6): 2 positive, 4 negative.)
Choosing the best attribute
E(Patrons=None) = 0
E(Patrons=Some) = 0
E(Patrons=Full) = -2/6 * log2(2/6) - 4/6 * log2(4/6)
= -1/3 * (-1.59) - 2/3 * (-0.59)
= 0.53 + 0.39 ≈ 0.92
Weighted average of the children's entropies:
E(Patrons) = 2/12 * 0 + 4/12 * 0 + 6/12 * 0.92 ≈ 0.46
Choosing the best attribute
Information Gain = (entropy of parent node) - (weighted average entropy of child nodes)
Gain(Patrons) = 1 - 0.46 ≈ 0.54
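A quick arithmetic check of this result, assuming the class counts shown in the Patrons figure (None: 0 positive / 2 negative, Some: 4 / 0, Full: 2 / 4):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [6, 6]                        # 6 positive, 6 negative at the root
children = [[0, 2], [4, 0], [2, 4]]    # Patrons = None, Some, Full
weighted = sum(sum(c) / sum(parent) * entropy(c) for c in children)
print(round(entropy(parent) - weighted, 2))  # Gain(Patrons) -> 0.54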
Choosing the best attribute
E(Type=French) = 1
E(Type=Italian) = 1
E(Type=Thai) = 1
E(Type=Burger) = 1
Weighted average of the children's entropies:
E(Type) = 2/12 * 1 + 2/12 * 1 + 4/12 * 1 + 4/12 * 1 = 1
(Figure: the Type split. Root: 6 positive, 6 negative. French (2 examples): 1 positive, 1 negative. Italian (2): 1 positive, 1 negative. Thai (4): 2 positive, 2 negative. Burger (4): 2 positive, 2 negative.)
Choosing the best attribute
Information Gain = (entropy of parent node) - (weighted average entropy of child nodes)
Gain(Type) = 1 - 1 = 0
This confirms that Patrons is a better attribute than Type; in fact, at the root Patrons gives the highest information gain of any attribute.
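Running the same check for both attributes, assuming the class counts from the two figures, confirms the greedy choice at the root:

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    # Parent entropy minus the size-weighted average entropy of the children.
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

parent = [6, 6]
print(round(gain(parent, [[0, 2], [4, 0], [2, 4]]), 2))          # Patrons -> 0.54
print(round(gain(parent, [[1, 1], [1, 1], [2, 2], [2, 2]]), 2))  # Type    -> 0.0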
Thank you
Reference:
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed.
