Advanced Machine Learning with Python
Session 9: Decision Trees
SIGKDD
Carlos Santillan
Bentley Systems Inc
csantill@gmail.com
Decision Trees
A tree-like graph decision support model
Growing a Tree
Types of Decision Trees
There are two main Types
• Classification Tree (Categorical Value Decision Tree)
• Regression Tree (Continuous Variable Decision Tree)
CART (Classification and Regression Tree) is used to refer to both.
The type of a decision tree is determined by the type of the target variable.
Nodes
1.Root Node
2.Internal Node (Decision Node)
3.Leaf (terminal)
Depth - Length of the longest path
from root to leaf
Decision Stump (One level decision Tree)
Decision Tree Terms
Decision Tree Algorithm
The basic greedy algorithm is as follows:
Start at node N and find the “best attribute” to split on
Partition N into N1, N2, … according to the best split
Repeat for each new node until a “stop condition” is met
Growing an optimal decision tree is an NP-complete
problem
Fortunately, greedy algorithms offer good accuracy and
performance
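The greedy loop above can be sketched in plain Python. This is only an illustration, not the code from the demo: the function names (`gini`, `best_split`, `grow`, `predict`), the dictionary-based tree, the choice of Gini impurity as the criterion, and the `max_depth` stop condition are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, threshold) pair; keep the lowest weighted impurity."""
    best = None  # (weighted_impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data until a stop condition is met."""
    split = best_split(rows, labels)
    # Stop conditions: pure node, depth limit reached, or no valid split left.
    if gini(labels) == 0.0 or depth >= max_depth or split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its majority class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["leaf"]


# Toy data (made up for illustration): one numeric feature, two classes.
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = grow(rows, labels)
```

Each call picks the locally best split without looking ahead, which is what makes the procedure greedy rather than optimal.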
What is the “Best Attribute” to split on?
Different criteria can be used to determine
the best attribute to split on.
• Information Gain
• Gini Index
• Classification Error
• Gain Ratio (Normalized Information Gain)
• Variance Reduction
Purity
Entropy
Def: Measure of impurity in our sample
• Entropy = 0 (all elements belong to the same class)
• Entropy = 1 (elements evenly split between two classes)
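As a small sketch of the definition, entropy can be computed as the sum of -p log2(p) over the class proportions p. The helper name `entropy` and the toy labels are made up for illustration. The maximum of 1 applies to two classes; with k equally likely classes the maximum is log2(k).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


pure = entropy(["a"] * 8)               # all elements in one class
even = entropy(["a"] * 4 + ["b"] * 4)   # evenly split between two classes
four = entropy(["a", "b", "c", "d"])    # four equally likely classes
```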
Information Gain
Information Gain = Entropy (Parent) - [Weighted Average]
Entropy (Children)
If we split at X < 4
• Entropy (X < 4) = 0.86
• Entropy (X ≥ 4) = 0
Information Gain = 0.95 - (14/16) (0.86) - (2/16) (0)
Information Gain ≈ 0.20
Information Gain
IG = Entropy (Parent) - [Weighted Average] Entropy (Children)
If we split at X < 3
• Entropy (X < 3) = 0
• Entropy (X ≥ 3) = 0.811
Information Gain = 0.95 - (8/16) (0) - (8/16) (0.811)
Information Gain ≈ 0.54
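The two splits can be checked numerically. The sketch below assumes class counts that reproduce the entropies quoted on the slides (a parent of 6 and 10 samples, children of 4/10 and 2/0 for X < 4, and 0/8 and 6/2 for X < 3); the original figure is not available, so these counts are an inference, and the function names are hypothetical. Each child is weighted by its share of the 16 samples, so the weights sum to 1.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) from per-class sample counts."""
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average of the child entropies."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = (6, 10)               # assumed class counts, 16 samples in total
split_x4 = [(4, 10), (2, 0)]   # children for X < 4 and X >= 4
split_x3 = [(0, 8), (6, 2)]    # children for X < 3 and X >= 3

ig_x4 = information_gain(parent, split_x4)
ig_x3 = information_gain(parent, split_x3)
print(f"IG(X < 4) = {ig_x4:.3f}, IG(X < 3) = {ig_x3:.3f}")
```

Under these assumed counts the X < 3 split gives the larger gain. The code uses unrounded entropies, so its results can differ in the last digit from a hand calculation with rounded inputs.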
GINI Index
Definition: Expected error rate when an element is labeled at random according to the class distribution
• GINI = 0 (all elements belong to the same class)
• GINI = 0.5 (elements evenly split between two classes)
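A minimal sketch of the Gini impurity, computed as 1 minus the sum of squared class proportions. The helper name `gini` and the toy labels are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


pure = gini(["a"] * 8)               # all elements in one class
even = gini(["a"] * 4 + ["b"] * 4)   # evenly split between two classes
```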
GINI Gain
If we split at X < 4
• Gini (X < 4) = 0.4081
• Gini (X ≥ 4) = 0
Gini Gain = 0.4687 - (14/16) (0.4081) - (2/16) (0)
Gini Gain ≈ 0.1116
GINI Gain
If we split at X < 3
• Gini (X < 3) = 0
• Gini (X ≥ 3) = 0.375
Gini Gain = 0.4687 - (8/16) (0) - (8/16) (0.375)
Gini Gain = 0.2812
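The Gini gain of both splits can be verified the same way as information gain: parent impurity minus the size-weighted child impurities. The class counts below are an assumption chosen to reproduce the impurities quoted on the slides (6 and 10 samples in the parent), and the function names are hypothetical.

```python
def gini(counts):
    """Gini impurity from per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_gain(parent, children):
    """Parent impurity minus the size-weighted average of the child impurities."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


parent = (6, 10)                                  # assumed class counts
gain_x4 = gini_gain(parent, [(4, 10), (2, 0)])    # split at X < 4
gain_x3 = gini_gain(parent, [(0, 8), (6, 2)])     # split at X < 3
print(f"Gini gain: X < 4 -> {gain_x4:.4f}, X < 3 -> {gain_x3:.4f}")
```

With these counts both criteria rank the X < 3 split above the X < 4 split.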
When to use which?
● Gini for continuous attributes
● Entropy for categorical attributes
● Entropy is slower to calculate than Gini because it requires a logarithm
● Gini may fail with very small probability
● In theory, the two criteria disagree on the chosen split in only about 2% of cases
When to stop growing?
• All data points at a leaf are pure (same class)
• The tree reaches depth k
• The number of cases in a node is less than the minimum number of cases
• The improvement in the splitting criterion is less than a certain threshold
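In scikit-learn, these stop conditions map roughly onto constructor parameters of `DecisionTreeClassifier`. The sketch below is an illustration only: the iris dataset and the specific parameter values are arbitrary assumptions, not taken from the demo.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=3,                 # stop when the tree reaches depth k
    min_samples_split=10,        # do not split nodes with fewer than 10 cases
    min_impurity_decrease=0.01,  # require at least this much impurity reduction
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```

Pure leaves need no parameter: a node whose samples all share one class is never split further.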
Pruning
Prevents overfitting
Smaller trees may be more accurate
Strategies:
• Prepruning : Stop growing when information becomes
unreliable
• Postpruning : fully grow a tree and remove unreliable parts
Note: Pruning was not supported by scikit-learn when these slides were written; versions 0.22 and later provide minimal cost-complexity pruning
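Recent scikit-learn releases (0.22 and later) expose post-pruning through the `ccp_alpha` parameter of `DecisionTreeClassifier`. A minimal sketch, assuming such a version is installed; the iris data and the value 0.02 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: no depth limit and no pruning.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Same tree, post-pruned with minimal cost-complexity pruning.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())
```

Larger `ccp_alpha` values remove more of the tree. In practice the value is usually chosen by cross-validation rather than fixed by hand.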
Algorithms
ID3 (Iterative Dichotomiser 3): greedy algorithm, categorical attributes
(entropy)
C4.5: improves on ID3, supports categorical and continuous attributes
(entropy)
C5.0 (See5): successor to C4.5
CART: similar to C4.5 (Gini impurity)
Pros
• Easy to Understand (white box)
• Supports both Numerical and Categorical data
• Fast (greedy) algorithms
• Performs well with large datasets
• Accurate
• Feature importance
Cons
• Prone to overfitting without pruning or cross-validation
• Information gain is biased toward features with many distinct values
• Sensitive to small changes in the data
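The bias of information gain can be illustrated with a made-up example: a row identifier that is unique per sample splits the data into pure single-element groups and so gets the maximum gain, even though it is useless for prediction. Gain ratio (normalized information gain) divides by the entropy of the partition itself, which penalizes such features. The data and function names below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(items)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_and_ratio(values, labels):
    """Information gain of a categorical feature and its gain ratio."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(values)  # zero if the feature has a single value
    return gain, gain / split_info


labels = ["yes"] * 4 + ["no"] * 4
row_id = list(range(8))                            # unique per row
colour = ["r", "r", "r", "g", "g", "g", "g", "g"]  # two values, informative

id_gain, id_ratio = gain_and_ratio(row_id, labels)
colour_gain, colour_ratio = gain_and_ratio(colour, labels)
print(f"gain:  id={id_gain:.3f} colour={colour_gain:.3f}")
print(f"ratio: id={id_ratio:.3f} colour={colour_ratio:.3f}")
```

Information gain prefers the identifier, while gain ratio prefers the genuinely informative feature.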
DEMO
Resources
• https://github.com/csantill/AustinSIGKDD-DecisionTrees
• Decision Forests for Classification, Regression, Density
Estimation, Manifold Learning and Semi-Supervised Learning
• Classification and Regression Trees
• A Visual Introduction to Machine Learning
• A Complete Tutorial on Tree Based Modeling from Scratch
• Theoretical Comparison between the Gini Index and
Information Gain Criteria
Thank You
Carlos Santillan
