SlideShare a Scribd company logo
Machine Learning with
DECISION TREES
Ramandeep Kaur
Software Consultant
Knoldus Software LLP
Agenda
● What are Decision trees?
● Appropriate problems for DTrees
● Information gain , Entropy
● Issues with DTrees
● Overfitting
● Pruning
● Demo
What are Decision trees?
● A decision tree is a tree in which each branch
node represents a choice between a number
of alternatives, and each leaf node represents
a decision.
● A type of supervised learning algorithm.
● It is one of the most widely used and practical
methods for Inductive Inference.
Example
Appropriate problems for DTrees
● Instances are represented by attribute-value pairs
● The target function has discrete output values
● The training data may contain errors
● The training data may have missing attribute values
How to select the deciding node?
Which is the best Classifier?
Which node can be described easily?
➔ Less impure node requires less information to describe it.
➔ More impure node requires more information.
===> Information theory is a measure to
define this degree of disorganization in a
system known as Entropy.
So, we can conclude
Entropy - measuring
homogeneity of a learning set
● Entropy is a measure of the uncertainty about a source of
messages.
● Given a collection S, containing positive and negative
examples of some target concept, the entropy of S relative to
this classification.
where, pi is the proportion of S belonging to class i
●
Entropy is 0 if all the members
of S belong to the same class.
● Entropy is 1 when the collection
contains an equal no. of +ve
and -ve examples.
●
Entropy is between 0 and 1 if
the collection contains unequal
no. of +ve and -ve examples.
Information gain
● Decides which attribute goes into a decision node.
● To minimize the decision tree depth, the attribute with the
most entropy reduction is the best choice!
Where:
● S is each value v of all possible values of attribute A
● Sv = subset of S for which attribute A has value v
● |Sv| = number of elements in Sv
● |S| = number of elements in S
The information gain, Gain(S,A) of an attribute A,
Issues in Decision Tree
Learning
ISSUES
● How deeply to grow the decision tree?
● Handling continuous attributes
● Choosing an appropriate attribute selection
measure
● Handling training data with missing attribute
values
Overfitting in Decision Trees
● If a decision tree is fully grown, it may lose
some generalization capability.
● This is a phenomenon known as overfitting.
Overfitting
A hypothesis overfits the training examples if
some other hypothesis that fits the training
examples less well actually performs better
over the entire distribution of instances (i.e,
including instances beyond the training
examples)
Why ??
Causes of Overfitting
● Overfitting Due to Presence of Noise -
Mislabeled instances may contradict the class
labels of other similar records.
● Overfitting Due to Lack of Representative
Instances - Lack of representative instances
in the training data can prevent refinement of
the learning algorithm.
Overfitting Due to Noise: An
Example
Overfitting Due to Noise
Overfitting Due to Noise
Overfitting Due to Lack of
Samples
Overfitting Due to Lack of
Samples
● Although the model’s training error
is zero, its error rate on the test set
is 30%.
● Humans, elephants, and dolphins
are misclassified because the
decision tree classifies all
warmblooded vertebrates that do not
hibernate as non-mammals.
A good model must not only fit the
training data well
but also accurately classify records
it has never seen.
Avoiding overfitting in decision
tree learning
● approaches that stop growing the tree earlier,
before it reaches the point where it perfectly
classifies the training data.
● approaches that allow the tree to overfit the
data, and then post-prune the tree.
Approaches to implement
● separate set of examples
● a statistical test: chi-square test
● a heuristic called the Minimum Description
Length principle - (explicit measure of the
complexity for encoding the training examples
and the decision tree, halting growth of the
tree when this encoding size is minimized.)
PRUNING
● Consider each of the decision nodes in the tree to be candidates for
pruning.
● Pruning a decision node consists of removing the subtree rooted at
that node, making it a leaf node, and assigning it the most common
classification of the training examples affiliated with that node.
● Nodes are removed only if the resulting pruned tree performs no
worse than the original over the validation set.
● Pruning of nodes continues until further pruning is harmful (i.e.,
decreases accuracy of the tree over the validation set).
DEMO
References
● Machine Learning – Tom Mitchell
● https://www3.nd.edu/~rjohns15/cse40647.sp14/ww
Thank You

More Related Content

What's hot

Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
amalalhait
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
Mohit Rajput
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
Archana Swaminathan
 
supervised learning
supervised learningsupervised learning
supervised learning
Amar Tripathi
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
Haris Jamil
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
Samra Shahzadi
 
Decision tree
Decision treeDecision tree
Decision tree
Ami_Surati
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
Paras Kohli
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
Mohammad Junaid Khan
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
Lippo Group Digital
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
CosmoAIMS Bassett
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
kishanthkumaar
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
Krish_ver2
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
Knoldus Inc.
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
Mahbubur Rahman Shimul
 
Feature selection
Feature selectionFeature selection
Feature selection
Dong Guo
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
SHUBHAM GUPTA
 

What's hot (20)

Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
Decision tree
Decision treeDecision tree
Decision tree
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
 
Feature selection
Feature selectionFeature selection
Feature selection
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 

Similar to Machine Learning with Decision trees

MLF-2.pptx
MLF-2.pptxMLF-2.pptx
MLF-2.pptx
DevarapalliVamsi1
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
RaflyRizky2
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Maninda Edirisooriya
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
kibriaswe
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
digitalzombie
 
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
zohebmusharraf
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
Valerii Klymchuk
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.
DrezzingGaming
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
Kunal Jain
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision Trees
Rupak Roy
 
Machine Learning
Machine Learning Machine Learning
Machine Learning
GaytriDhingra1
 
AI -learning and machine learning.pptx
AI  -learning and machine learning.pptxAI  -learning and machine learning.pptx
AI -learning and machine learning.pptx
GaytriDhingra1
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
Rupak Roy
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
AdityaSoraut
 
Decision tree
Decision tree Decision tree
Decision tree
Learnbay Datascience
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Sherri Gunder
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
Chitrachitrap
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
Abhimanyu Dwivedi
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
DurgaDevi310087
 

Similar to Machine Learning with Decision trees (20)

MLF-2.pptx
MLF-2.pptxMLF-2.pptx
MLF-2.pptx
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision Trees
 
Machine Learning
Machine Learning Machine Learning
Machine Learning
 
AI -learning and machine learning.pptx
AI  -learning and machine learning.pptxAI  -learning and machine learning.pptx
AI -learning and machine learning.pptx
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
Decision tree
Decision tree Decision tree
Decision tree
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
 

More from Knoldus Inc.

Using InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in JmeterUsing InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in Jmeter
Knoldus Inc.
 
Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)
Knoldus Inc.
 
Stakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) PresentationStakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) Presentation
Knoldus Inc.
 
Introduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) PresentationIntroduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) Presentation
Knoldus Inc.
 
Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)
Knoldus Inc.
 
Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)
Knoldus Inc.
 
Clean Code in Test Automation Differentiating Between the Good and the Bad
Clean Code in Test Automation  Differentiating Between the Good and the BadClean Code in Test Automation  Differentiating Between the Good and the Bad
Clean Code in Test Automation Differentiating Between the Good and the Bad
Knoldus Inc.
 
Integrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test AutomationIntegrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test Automation
Knoldus Inc.
 
State Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptxState Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptx
Knoldus Inc.
 
Authentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptxAuthentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptx
Knoldus Inc.
 
OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)
Knoldus Inc.
 
Supply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptxSupply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptx
Knoldus Inc.
 
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Knoldus Inc.
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
Knoldus Inc.
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
Knoldus Inc.
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
Knoldus Inc.
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
Knoldus Inc.
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
Knoldus Inc.
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
Knoldus Inc.
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
Knoldus Inc.
 

More from Knoldus Inc. (20)

Using InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in JmeterUsing InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in Jmeter
 
Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)
 
Stakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) PresentationStakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) Presentation
 
Introduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) PresentationIntroduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) Presentation
 
Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)
 
Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)
 
Clean Code in Test Automation Differentiating Between the Good and the Bad
Clean Code in Test Automation  Differentiating Between the Good and the BadClean Code in Test Automation  Differentiating Between the Good and the Bad
Clean Code in Test Automation Differentiating Between the Good and the Bad
 
Integrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test AutomationIntegrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test Automation
 
State Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptxState Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptx
 
Authentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptxAuthentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptx
 
OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)
 
Supply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptxSupply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptx
 
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
 

Recently uploaded

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 

Recently uploaded (20)

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 

Machine Learning with Decision trees

  • 1. Machine Learning with DECISION TREES Ramandeep Kaur Software Consultant Knoldus Software LLP
  • 2. Agenda ● What are Decision trees? ● Appropriate problems for DTrees ● Information gain , Entropy ● Issues with DTrees ● Overfitting ● Pruning ● Demo
  • 3. What are Decision trees? ● A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. ● A type of supervised learning algorithm.
  • 4. ● It is one of the most widely used and practical methods for Inductive Inference.
  • 6. Appropriate problems for DTrees ● Instances are represented by attribute-value pairs ● The target function has discrete output values ● The training data may contain errors ● The training data may have missing attribute values
  • 7. How to select the deciding node? Which is the best Classifier?
  • 8. Which node can be described easily?
  • 9. ➔ Less impure node requires less information to describe it. ➔ More impure node requires more information. ===> Information theory is a measure to define this degree of disorganization in a system known as Entropy. So, we can conclude
  • 10. Entropy - measuring homogeneity of a learning set ● Entropy is a measure of the uncertainty about a source of messages. ● Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this classification. where, pi is the proportion of S belonging to class i
  • 11. ● Entropy is 0 if all the members of S belong to the same class. ● Entropy is 1 when the collection contains an equal no. of +ve and -ve examples. ● Entropy is between 0 and 1 if the collection contains unequal no. of +ve and -ve examples.
  • 12. Information gain ● Decides which attribute goes into a decision node. ● To minimize the decision tree depth, the attribute with the most entropy reduction is the best choice!
  • 13. Where: ● S is each value v of all possible values of attribute A ● Sv = subset of S for which attribute A has value v ● |Sv| = number of elements in Sv ● |S| = number of elements in S The information gain, Gain(S,A) of an attribute A,
  • 14. Issues in Decision Tree Learning
  • 15. ISSUES ● How deeply to grow the decision tree? ● Handling continuous attributes ● Choosing an appropriate attribute selection measure ● Handling training data with missing attribute values
  • 16. Overfitting in Decision Trees ● If a decision tree is fully grown, it may lose some generalization capability. ● This is a phenomenon known as overfitting.
  • 17. Overfitting A hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances (i.e, including instances beyond the training examples)
  • 19. Causes of Overfitting ● Overfitting Due to Presence of Noise - Mislabeled instances may contradict the class labels of other similar records. ● Overfitting Due to Lack of Representative Instances - Lack of representative instances in the training data can prevent refinement of the learning algorithm.
  • 20. Overfitting Due to Noise: An Example
  • 23. Overfitting Due to Lack of Samples
  • 24. Overfitting Due to Lack of Samples ● Although the model’s training error is zero, its error rate on the test set is 30%. ● Humans, elephants, and dolphins are misclassified because the decision tree classifies all warmblooded vertebrates that do not hibernate as non-mammals.
  • 25. A good model must not only fit the training data well but also accurately classify records it has never seen.
  • 26. Avoiding overfitting in decision tree learning ● approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data. ● approaches that allow the tree to overfit the data, and then post-prune the tree.
  • 27. Approaches to implement ● separate set of examples ● a statistical test: chi-square test ● a heuristic called the Minimum Description Length principle - (explicit measure of the complexity for encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized.)
  • 28. PRUNING ● Consider each of the decision nodes in the tree to be candidates for pruning. ● Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node. ● Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set. ● Pruning of nodes continues until further pruning is harmful (i.e., decreases accuracy of the tree over the validation set).
  • 29.
  • 30.
  • 31. DEMO
  • 32. References ● Machine Learning – Tom Mitchell ● https://www3.nd.edu/~rjohns15/cse40647.sp14/ww