SlideShare a Scribd company logo
1 of 8
Prof. Neeraj Bhargava
Vishal Dutt
Department of Computer Science, School of
Engineering & System Sciences
MDS University, Ajmer
2
Attribute Selection Measure: Information
Gain (ID3/C4.5)
 S contains si tuples of class Ci for i = {1, …, m}
 information measures info (entropy) required to classify
any arbitrary tuple
 Assume a set of training examples, S. If we make attribute
A, with v values, the root of the current tree, this will
partition S into v subsets. The expected information
needed to complete the tree after making A the root is:
s
s
log
s
s
),...,s,ssI(
i
m
i
i
m21 2
1


)s,...,s(I
s
s...s
E(A) mjj
v
j
mjj
1
1
1



Information gain
 information gained by branching on
attribute A
 We will choose the attribute with the
highest information gain to branch the
current tree.
E(A))s,...,s,I(sGain(A) m  21
3
Attribute Selection by info gain
 Class P: buys_computer = “yes”
 Class N: buys_computer = “no”
 I(p, n) = I(9, 5) =0.940
 Compute the entropy for age:
means “age <=30” has 5
out of 14 samples, with 2
yes’es and 3 no’s. Hence
Similarly,
4
age pi ni I(pi, ni)
<=30 2 3 0.971
30…40 4 0 0
>40 3 2 0.971
694.0)2,3(
14
5
)0,4(
14
4
)3,2(
14
5
)(


I
IIageE
048.0)_(
151.0)(
029.0)(



ratingcreditGain
studentGain
incomeGain
246.0)(),()(  ageEnpIageGain
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
)3,2(
14
5
I
We build the following tree
5
age?
overcast
student? credit rating?
no yes fairexcellent
<=30 >40
no noyes yes
yes
30..40
Extracting Classification Rules from Trees
 Represent the knowledge in the form of IF-THEN rules
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction.
The leaf node holds the class prediction
 Rules are easier for humans to understand
 Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer =
“yes”
IF age = “<=30” AND credit_rating = “fair” THEN buys_computer = “no”
6
Avoid Overfitting in Classification
 Overfitting: An tree may overfit the training data
 Good accuracy on training data but poor on test exmples
 Too many branches, some may reflect anomalies due to
noise or outliers
 Two approaches to avoid overfitting
 Prepruning: Halt tree construction early
 Difficult to decide
 Postpruning: Remove branches from a “fully grown”
tree—get a sequence of progressively pruned trees.
 This method is commonly used (based on validation set or
statistical estimate or MDL)
7
Enhancements to basic decision
tree induction
 Allow for continuous-valued attributes
 Dynamically define new discrete-valued attributes that
partition the continuous attribute value into a discrete set
of intervals
 Handle missing attribute values
 Assign the most common value of the attribute
 Assign probability to each of the possible values
 Attribute construction
 Create new attributes based on existing ones that are
sparsely represented.
 This reduces fragmentation, repetition, and replication 8

More Related Content

What's hot

Nearest Class Mean Metric Learning
Nearest Class Mean Metric LearningNearest Class Mean Metric Learning
Nearest Class Mean Metric LearningSangjun Han
 
Real life application of discrete math
Real life application of discrete mathReal life application of discrete math
Real life application of discrete mathSanad Bhowmik
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorialbutest
 
Matrix and it's Application
Matrix and it's ApplicationMatrix and it's Application
Matrix and it's ApplicationMahmudle Hassan
 
Disease prediction using machine learning
Disease prediction using machine learningDisease prediction using machine learning
Disease prediction using machine learningJinishaKG
 
Distributed Decision Tree Induction
Distributed Decision Tree InductionDistributed Decision Tree Induction
Distributed Decision Tree Inductiongregoryg
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaMacha Pujitha
 
support vector machine and associative classification
support vector machine and associative classificationsupport vector machine and associative classification
support vector machine and associative classificationrajshreemuthiah
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine LearningPranav Challa
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revisedKrish_ver2
 
Matrices in computer applications
Matrices in computer applicationsMatrices in computer applications
Matrices in computer applicationsRayyan777
 
Neural netorksmatching
Neural netorksmatchingNeural netorksmatching
Neural netorksmatchingMasa Kato
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesVimal Gupta
 

What's hot (16)

Nearest Class Mean Metric Learning
Nearest Class Mean Metric LearningNearest Class Mean Metric Learning
Nearest Class Mean Metric Learning
 
Real life application of discrete math
Real life application of discrete mathReal life application of discrete math
Real life application of discrete math
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorial
 
Matrix and it's Application
Matrix and it's ApplicationMatrix and it's Application
Matrix and it's Application
 
Disease prediction using machine learning
Disease prediction using machine learningDisease prediction using machine learning
Disease prediction using machine learning
 
Distributed Decision Tree Induction
Distributed Decision Tree InductionDistributed Decision Tree Induction
Distributed Decision Tree Induction
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
 
ID3 ALGORITHM
ID3 ALGORITHMID3 ALGORITHM
ID3 ALGORITHM
 
support vector machine and associative classification
support vector machine and associative classificationsupport vector machine and associative classification
support vector machine and associative classification
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised
 
Matrices in computer applications
Matrices in computer applicationsMatrices in computer applications
Matrices in computer applications
 
Neural netorksmatching
Neural netorksmatchingNeural netorksmatching
Neural netorksmatching
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 

Similar to 11 id3 &amp; c4

Business Analytics using R.ppt
Business Analytics using R.pptBusiness Analytics using R.ppt
Business Analytics using R.pptRohit Raj
 
Data Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptData Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptRvishnupriya2
 
Data Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptData Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptRvishnupriya2
 
unit classification.pptx
unit  classification.pptxunit  classification.pptx
unit classification.pptxssuser908de6
 
08 classbasic
08 classbasic08 classbasic
08 classbasicengrasi
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf321106410027
 
Dataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptxDataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptxHimanshuSharma997566
 
Managing Data: storage, decisions and classification
Managing Data: storage, decisions and classificationManaging Data: storage, decisions and classification
Managing Data: storage, decisions and classificationEdward Blurock
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classificationLokarchanaD
 
3_learning.ppt
3_learning.ppt3_learning.ppt
3_learning.pptbutest
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsSalah Amean
 
Classification and Prediction
Classification and PredictionClassification and Prediction
Classification and PredictionSahilKumar542
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision treehktripathy
 

Similar to 11 id3 &amp; c4 (20)

Data Mining
Data MiningData Mining
Data Mining
 
Classification
Classification Classification
Classification
 
Business Analytics using R.ppt
Business Analytics using R.pptBusiness Analytics using R.ppt
Business Analytics using R.ppt
 
Data Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptData Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.ppt
 
Data Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.pptData Mining Concepts and Techniques.ppt
Data Mining Concepts and Techniques.ppt
 
Unit 3classification
Unit 3classificationUnit 3classification
Unit 3classification
 
unit classification.pptx
unit  classification.pptxunit  classification.pptx
unit classification.pptx
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 
Dataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptxDataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptx
 
Managing Data: storage, decisions and classification
Managing Data: storage, decisions and classificationManaging Data: storage, decisions and classification
Managing Data: storage, decisions and classification
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classification
 
3_learning.ppt
3_learning.ppt3_learning.ppt
3_learning.ppt
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
Classification and Prediction
Classification and PredictionClassification and Prediction
Classification and Prediction
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
 
Dbm630 lecture06
Dbm630 lecture06Dbm630 lecture06
Dbm630 lecture06
 

More from Vishal Dutt

Grid computing components
Grid computing componentsGrid computing components
Grid computing componentsVishal Dutt
 
Python files / directories part16
Python files / directories  part16Python files / directories  part16
Python files / directories part16Vishal Dutt
 
Python Classes and Objects part14
Python Classes and Objects  part14Python Classes and Objects  part14
Python Classes and Objects part14Vishal Dutt
 
Python Classes and Objects part13
Python Classes and Objects  part13Python Classes and Objects  part13
Python Classes and Objects part13Vishal Dutt
 
Python files / directories part15
Python files / directories  part15Python files / directories  part15
Python files / directories part15Vishal Dutt
 
Python functions part12
Python functions  part12Python functions  part12
Python functions part12Vishal Dutt
 
Python functions part11
Python functions  part11Python functions  part11
Python functions part11Vishal Dutt
 
Python functions part10
Python functions  part10Python functions  part10
Python functions part10Vishal Dutt
 
Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Vishal Dutt
 
Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Vishal Dutt
 
Python decision making_loops part7
Python decision making_loops part7Python decision making_loops part7
Python decision making_loops part7Vishal Dutt
 
Python decision making_loops part6
Python decision making_loops part6Python decision making_loops part6
Python decision making_loops part6Vishal Dutt
 
Python decision making part5
Python decision making part5Python decision making part5
Python decision making part5Vishal Dutt
 
Python decision making part4
Python decision making part4Python decision making part4
Python decision making part4Vishal Dutt
 
Python operators part3
Python operators part3Python operators part3
Python operators part3Vishal Dutt
 

More from Vishal Dutt (20)

Grid computing components
Grid computing componentsGrid computing components
Grid computing components
 
Python files / directories part16
Python files / directories  part16Python files / directories  part16
Python files / directories part16
 
Python Classes and Objects part14
Python Classes and Objects  part14Python Classes and Objects  part14
Python Classes and Objects part14
 
Python Classes and Objects part13
Python Classes and Objects  part13Python Classes and Objects  part13
Python Classes and Objects part13
 
Python files / directories part15
Python files / directories  part15Python files / directories  part15
Python files / directories part15
 
Python functions part12
Python functions  part12Python functions  part12
Python functions part12
 
Python functions part11
Python functions  part11Python functions  part11
Python functions part11
 
Python functions part10
Python functions  part10Python functions  part10
Python functions part10
 
List view5
List view5List view5
List view5
 
Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Python decision making_loops_control statements part9
Python decision making_loops_control statements part9
 
List view4
List view4List view4
List view4
 
List view3
List view3List view3
List view3
 
Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Python decision making_loops_control statements part8
Python decision making_loops_control statements part8
 
Python decision making_loops part7
Python decision making_loops part7Python decision making_loops part7
Python decision making_loops part7
 
Python decision making_loops part6
Python decision making_loops part6Python decision making_loops part6
Python decision making_loops part6
 
List view2
List view2List view2
List view2
 
List view1
List view1List view1
List view1
 
Python decision making part5
Python decision making part5Python decision making part5
Python decision making part5
 
Python decision making part4
Python decision making part4Python decision making part4
Python decision making part4
 
Python operators part3
Python operators part3Python operators part3
Python operators part3
 

Recently uploaded

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 

Recently uploaded (20)

Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 

11 id3 &amp; c4

  • 1. Prof. Neeraj Bhargava Vishal Dutt Department of Computer Science, School of Engineering & System Sciences MDS University, Ajmer
  • 2. 2 Attribute Selection Measure: Information Gain (ID3/C4.5)  S contains si tuples of class Ci for i = {1, …, m}  information measures info (entropy) required to classify any arbitrary tuple  Assume a set of training examples, S. If we make attribute A, with v values, the root of the current tree, this will partition S into v subsets. The expected information needed to complete the tree after making A the root is: s s log s s ),...,s,ssI( i m i i m21 2 1   )s,...,s(I s s...s E(A) mjj v j mjj 1 1 1   
  • 3. Information gain  information gained by branching on attribute A  We will choose the attribute with the highest information gain to branch the current tree. E(A))s,...,s,I(sGain(A) m  21 3
  • 4. Attribute Selection by info gain  Class P: buys_computer = “yes”  Class N: buys_computer = “no”  I(p, n) = I(9, 5) =0.940  Compute the entropy for age: means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence Similarly, 4 age pi ni I(pi, ni) <=30 2 3 0.971 30…40 4 0 0 >40 3 2 0.971 694.0)2,3( 14 5 )0,4( 14 4 )3,2( 14 5 )(   I IIageE 048.0)_( 151.0)( 029.0)(    ratingcreditGain studentGain incomeGain 246.0)(),()(  ageEnpIageGain age income student credit_rating buys_computer <=30 high no fair no <=30 high no excellent no 31…40 high no fair yes >40 medium no fair yes >40 low yes fair yes >40 low yes excellent no 31…40 low yes excellent yes <=30 medium no fair no <=30 low yes fair yes >40 medium yes fair yes <=30 medium yes excellent yes 31…40 medium no excellent yes 31…40 high yes fair yes >40 medium no excellent no )3,2( 14 5 I
  • 5. We build the following tree 5 age? overcast student? credit rating? no yes fairexcellent <=30 >40 no noyes yes yes 30..40
  • 6. Extracting Classification Rules from Trees  Represent the knowledge in the form of IF-THEN rules  One rule is created for each path from the root to a leaf  Each attribute-value pair along a path forms a conjunction. The leaf node holds the class prediction  Rules are easier for humans to understand  Example IF age = “<=30” AND student = “no” THEN buys_computer = “no” IF age = “<=30” AND student = “yes” THEN buys_computer = “yes” IF age = “31…40” THEN buys_computer = “yes” IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes” IF age = “<=30” AND credit_rating = “fair” THEN buys_computer = “no” 6
  • 7. Avoid Overfitting in Classification  Overfitting: An tree may overfit the training data  Good accuracy on training data but poor on test exmples  Too many branches, some may reflect anomalies due to noise or outliers  Two approaches to avoid overfitting  Prepruning: Halt tree construction early  Difficult to decide  Postpruning: Remove branches from a “fully grown” tree—get a sequence of progressively pruned trees.  This method is commonly used (based on validation set or statistical estimate or MDL) 7
  • 8. Enhancements to basic decision tree induction  Allow for continuous-valued attributes  Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals  Handle missing attribute values  Assign the most common value of the attribute  Assign probability to each of the possible values  Attribute construction  Create new attributes based on existing ones that are sparsely represented.  This reduces fragmentation, repetition, and replication 8