Attribute Selection Measure
The information gain measure is used to select the test attribute at each node in the tree.
The attribute with the highest information gain is chosen as the test attribute for the node.
Algorithm
Let S be a set consisting of s data samples.
The class label attribute has m distinct values, defining m distinct classes Ci (for i = 1, 2, …, m).
Let si be the number of samples of S in class Ci.
1) The expected information needed to classify a given sample is given by
I(s1, s2, …, sm) = −Σ pi log2(pi), summed over i = 1, …, m
where pi is the probability that an arbitrary sample belongs to class Ci, estimated by si/s.
2) The entropy, or expected information based on partitioning S into subsets by attribute A, is given by
E(A) = Σ ((s1j + … + smj) / s) · I(s1j, …, smj), summed over j = 1, …, v
where attribute A has v distinct values that partition S into subsets S1, …, Sv, and sij is the number of samples of class Ci in subset Sj.
3) Calculate the gain value:
Gain(A) = I(s1, s2, …, sm) − E(A)
The algorithm computes the information gain of each attribute. The attribute with the highest
information gain is chosen as the test attribute for the given set S.
A node is created and labeled with that attribute, branches are created for each value of the attribute,
and the samples are partitioned accordingly.
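The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original slides; the function names and the assumption that samples are a list of dicts keyed by attribute name are my own.

```python
import math
from collections import Counter

def expected_info(class_counts):
    """I(s1, ..., sm): expected information needed to classify a sample,
    given the number of samples in each class."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def entropy_of_attribute(samples, attribute, class_label):
    """E(A): expected information after partitioning `samples`
    (a list of dicts) on the values of `attribute`."""
    total = len(samples)
    e = 0.0
    for value in set(s[attribute] for s in samples):
        subset = [s for s in samples if s[attribute] == value]
        counts = list(Counter(s[class_label] for s in subset).values())
        e += (len(subset) / total) * expected_info(counts)
    return e

def gain(samples, attribute, class_label):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    counts = list(Counter(s[class_label] for s in samples).values())
    return expected_info(counts) - entropy_of_attribute(samples, attribute, class_label)
```

Selecting the test attribute then amounts to evaluating `gain` for every candidate attribute and picking the maximum.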
Induction of a decision tree
Training data tuples from the customer database:

S.No  Age    Income  Marital status  Employed  Class: diagnosed
 1    31-40  High    Unmarried       No        No
 2    31-40  High    Unmarried       Yes       No
 3    41-50  High    Unmarried       No        Yes
 4    51-60  Medium  Unmarried       No        Yes
 5    51-60  Low     Married         No        Yes
 6    51-60  Low     Married         Yes       No
 7    41-50  Low     Married         Yes       Yes
 8    31-40  Medium  Unmarried       No        No
 9    31-40  Low     Married         No        Yes
10    51-60  Medium  Married         No        Yes
11    31-40  Medium  Married         Yes       Yes
12    41-50  Medium  Unmarried       Yes       Yes
13    41-50  High    Married         No        Yes
14    51-60  Medium  Unmarried       Yes       No
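To reproduce the numbers worked out below, the table can be encoded as plain Python data compatible with the sketch above. The field names (`age`, `marital_status`, `diagnosed`, …) are labels I chose for the columns, not names from the slides.

```python
# The 14 training tuples above, encoded as a list of dicts.
training_samples = [
    {"age": "31-40", "income": "High",   "marital_status": "Unmarried", "employed": "No",  "diagnosed": "No"},
    {"age": "31-40", "income": "High",   "marital_status": "Unmarried", "employed": "Yes", "diagnosed": "No"},
    {"age": "41-50", "income": "High",   "marital_status": "Unmarried", "employed": "No",  "diagnosed": "Yes"},
    {"age": "51-60", "income": "Medium", "marital_status": "Unmarried", "employed": "No",  "diagnosed": "Yes"},
    {"age": "51-60", "income": "Low",    "marital_status": "Married",   "employed": "No",  "diagnosed": "Yes"},
    {"age": "51-60", "income": "Low",    "marital_status": "Married",   "employed": "Yes", "diagnosed": "No"},
    {"age": "41-50", "income": "Low",    "marital_status": "Married",   "employed": "Yes", "diagnosed": "Yes"},
    {"age": "31-40", "income": "Medium", "marital_status": "Unmarried", "employed": "No",  "diagnosed": "No"},
    {"age": "31-40", "income": "Low",    "marital_status": "Married",   "employed": "No",  "diagnosed": "Yes"},
    {"age": "51-60", "income": "Medium", "marital_status": "Married",   "employed": "No",  "diagnosed": "Yes"},
    {"age": "31-40", "income": "Medium", "marital_status": "Married",   "employed": "Yes", "diagnosed": "Yes"},
    {"age": "41-50", "income": "Medium", "marital_status": "Unmarried", "employed": "Yes", "diagnosed": "Yes"},
    {"age": "41-50", "income": "High",   "marital_status": "Married",   "employed": "No",  "diagnosed": "Yes"},
    {"age": "51-60", "income": "Medium", "marital_status": "Unmarried", "employed": "Yes", "diagnosed": "No"},
]
```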
Training data tuples from the Palayam, Kanyakumari district database (30 samples)
This data set was collected at Palayam, Kanyakumari district, during a medical camp on 07/04/2019.
Thirty samples were collected for research purposes, with 64 attributes recorded per sample; 5 of these attributes were selected for testing.
Attributes collected: 64
Attributes selected: 05
The class label attribute, diagnosis, has 2 distinct values ({yes, no}), therefore m = 2.
C1 corresponds to 'yes' and C2 corresponds to 'no'.
There are 9 samples of class 'yes' and 5 samples of class 'no'.
1) First compute the expected information needed to classify a given sample:
I(s1, s2) = I(9, 5) = −[(9/14) log2(9/14) + (5/14) log2(5/14)] = 0.940
Next, compute the entropy of each attribute, starting with the attribute 'age'.
For age = '31-40': s11 = 2, s21 = 3, so I(s11, s21) = 0.971
For age = '41-50': s12 = 4, s22 = 0, so I(s12, s22) = 0
For age = '51-60': s13 = 3, s23 = 2, so I(s13, s23) = 0.971
2) Then the entropy is calculated:
E(age) = ((2+3)/14)·I(s11, s21) + ((4+0)/14)·I(s12, s22) + ((3+2)/14)·I(s13, s23)
       = (5/14)(0.971) + 0 + (5/14)(0.971)
       = 0.694
3) Gain(age) = I(s1, s2) − E(age) = 0.940 − 0.694 = 0.246
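As a sanity check, the same numbers fall out of the functions and `training_samples` sketched earlier (assuming those illustrative definitions):

```python
# Reproducing the worked values; results match the text up to rounding.
i_total = expected_info([9, 5])                                        # ≈ 0.940
e_age   = entropy_of_attribute(training_samples, "age", "diagnosed")   # ≈ 0.694
g_age   = gain(training_samples, "age", "diagnosed")                   # ≈ 0.247 (0.246 in the text due to rounding)
print(round(i_total, 3), round(e_age, 3), round(g_age, 3))
```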
4) Decision tree
Since age has the highest information gain, it becomes the test attribute at the root node of
the decision tree.
5) Rule-based classifier
Generating classification rules from a decision tree
The rules extracted from the above decision tree are:
IF age = '31-40' AND marital status = 'unmarried' THEN diagnosis = 'no'
IF age = '31-40' AND marital status = 'married' THEN diagnosis = 'yes'
IF age = '41-50' THEN diagnosis = 'yes'
IF age = '51-60' AND employed = 'yes' THEN diagnosis = 'no'
IF age = '51-60' AND employed = 'no' THEN diagnosis = 'yes'
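The five rules map directly onto conditional logic. A minimal sketch, reusing the dict keys assumed for `training_samples` (again illustrative names, not from the slides):

```python
def classify(sample):
    """Apply the five extracted rules to a sample given as a dict."""
    if sample["age"] == "31-40":
        return "No" if sample["marital_status"] == "Unmarried" else "Yes"
    if sample["age"] == "41-50":
        return "Yes"
    if sample["age"] == "51-60":
        return "No" if sample["employed"] == "Yes" else "Yes"
    return None  # no rule fires (cannot happen for the three age ranges above)

# Example: an unmarried customer aged 31-40 is classified as 'No'.
print(classify({"age": "31-40", "marital_status": "Unmarried", "employed": "No"}))
```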
The J48 algorithm uses the training samples to estimate the accuracy of each rule. A rule can be pruned
by removing any condition in its antecedent that does not improve the estimated accuracy of the
rule.
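J48 (the Weka implementation of C4.5) actually bases this on a pessimistic error estimate; the sketch below only illustrates the simpler idea described above, greedily dropping any antecedent condition whose removal does not lower the rule's estimated accuracy on the training samples. The helper names and the (attribute, value) rule representation are hypothetical.

```python
def rule_accuracy(conditions, predicted_class, samples, class_label="diagnosed"):
    """Accuracy of a rule (a list of (attribute, value) conditions) on the
    samples its antecedent covers; 0.0 if it covers nothing."""
    covered = [s for s in samples if all(s[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(s[class_label] == predicted_class for s in covered) / len(covered)

def prune_rule(conditions, predicted_class, samples):
    """Greedily remove any condition whose removal does not reduce the
    estimated accuracy of the rule."""
    conditions = list(conditions)
    improved = True
    while improved and len(conditions) > 1:
        improved = False
        base = rule_accuracy(conditions, predicted_class, samples)
        for cond in list(conditions):
            trial = [c for c in conditions if c != cond]
            if rule_accuracy(trial, predicted_class, samples) >= base:
                conditions = trial
                improved = True
                break
    return conditions
```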
