Machine Learning - IV
Decision Trees
What is a decision tree?
A decision tree is a type of supervised machine learning model. In
simple words, it is a branching method in which the data is repeatedly
split according to a chosen parameter (an input variable and a split value).
Rupak Roy
Binary target variable
If the target variable for a decision tree is binary, we use a binary
decision tree.
Target variable:
1 for Sales, 0 for No-sales
[Figure: binary decision tree for the Sales / No-sales target. Nodes are annotated with percentages (30%, 50%, 70%, 100%) and with advertising spend (Adv.) and sales values, ranging from Adv. = 900, Sales = 6,310 up to Adv. = 8,000, Sales = 91,000.]
Continuous target variable
If the target variable is numeric, like income (a continuous variable, not
a discrete one like Yes or No), we use a regression tree for prediction.
Each split in the tree is chosen to decrease the variance in the values
of the target variable within each child node.
In simple words:
If average income is less than 70K, group those records and grow a new
branch for them, say under 60K. If it is again less than 60K, grow
further branches below that (50 and 10 in the figure).
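[Figure: regression tree on Avg. Income. The root splits at $70K and the next node at 60K; further node labels are 50, 10, 49 and 38; branches are marked Yes / No.]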
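As a rough sketch of how a regression tree might pick one split, the snippet below scans candidate thresholds on a single input and keeps the one that most reduces the size-weighted variance of the target in the two child nodes. The function names and the experience/income figures are illustrative assumptions, not data from the slides.

```python
from statistics import pvariance

def weighted_child_variance(left, right):
    """Size-weighted average of the target variance in the two child nodes."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n

def best_variance_split(xs, ys):
    """Return (threshold, variance_reduction) for the best split 'x <= threshold'."""
    parent_var = pvariance(ys)
    best = (None, 0.0)
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        reduction = parent_var - weighted_child_variance(left, right)
        if reduction > best[1]:
            best = (threshold, reduction)
    return best

# Hypothetical data: years of experience vs. income in thousands.
experience = [1, 2, 3, 10, 11, 12]
income = [30, 32, 31, 70, 72, 71]
threshold, reduction = best_variance_split(experience, income)
print(threshold, round(reduction, 2))
```

A full regression tree would repeat this search over every input variable and then recurse into each child node.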
Continuous target variable
Example: a company wants to impute missing values in the income
field for its customers. The average income of a person is 30K. Rather
than filling every gap with that single average, the company can assign
the missing values using the rules created from a decision tree, which
gives a better estimate.
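A minimal sketch of this idea, assuming a tree has already been fitted and has produced one rule. The rule (a single split on age at 35), the field names and the customer figures are all hypothetical.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing income (in thousands).
customers = [
    {"age": 24, "income": 18},
    {"age": 29, "income": 22},
    {"age": 31, "income": None},
    {"age": 45, "income": 41},
    {"age": 52, "income": 39},
    {"age": 48, "income": None},
]

def leaf_for(record):
    """Assumed rule from a fitted tree: a single split on age at 35."""
    return "age<=35" if record["age"] <= 35 else "age>35"

# Average the known incomes within each leaf.
known = {}
for record in customers:
    if record["income"] is not None:
        known.setdefault(leaf_for(record), []).append(record["income"])
leaf_means = {leaf: mean(values) for leaf, values in known.items()}

# Fill each gap with the mean of its own leaf instead of the overall mean.
imputed = [
    {**r, "income": leaf_means[leaf_for(r)]} if r["income"] is None else r
    for r in customers
]
```

In this toy data the overall mean of the known incomes is 30, but the two leaves average 20 and 40, so the leaf-based fill differs by group.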
Terminology:
• The base node is also known as the root node.
• Any node that can be split further is called a decision node.
• Nodes that cannot be split further are called terminal nodes
or leaf nodes.
• The process of cutting down the tree or removing sections of it is
called pruning.
• The process of adding a whole section to a tree is called grafting.
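The terminology above can be illustrated with a small data structure. This is only a sketch: the `Node` class, the helper names and the node labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a binary tree; a node with no children is a leaf."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def count_leaves(node: Node) -> int:
    """Count the terminal (leaf) nodes below and including this node."""
    if node.is_leaf():
        return 1
    children = (node.left, node.right)
    return sum(count_leaves(child) for child in children if child is not None)

def prune(node: Node) -> None:
    """Pruning: drop everything below this node, turning it into a leaf."""
    node.left = node.right = None

# Root (base) node -> decision node -> leaf nodes.
root = Node(
    "income < 70K?",
    left=Node("income < 60K?", left=Node("leaf A"), right=Node("leaf B")),
    right=Node("leaf C"),
)
leaves_before = count_leaves(root)
prune(root.left)  # the decision node becomes a terminal node
leaves_after = count_leaves(root)
```

Grafting would be the reverse operation: assigning a whole subtree to `left` or `right` of an existing node.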
Data preparation for decision trees
Most decision trees can handle both categorical and continuous variables,
so little data transformation is needed.
Classification trees are used if the target variable is discrete.
Regression trees are used when the target variable is continuous.
Removing records because of missing values is likely to create a biased
training set, because the records with missing values are unlikely to be
a random sample of the population. Removing them also carries
the risk that the information associated with them will be lost.
Replacing them with imputed values risks distorting
important properties of the data, which tends to create a biased
model.
Treating missing values as a separate category is usually better than
replacing them with average values.
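The difference between dropping records and treating a missing value as its own category can be sketched as follows. The records, the field meanings and the "MISSING" label are hypothetical.

```python
from collections import Counter

# Hypothetical records: (occupation, bought) with None for a missing occupation.
records = [
    ("clerk", 0), ("clerk", 0), ("engineer", 1), ("engineer", 1),
    (None, 1), (None, 1), (None, 0),
]

# Option 1: drop rows with a missing value, which loses 3 of the 7 records.
dropped = [(occ, y) for occ, y in records if occ is not None]

# Option 2: keep every row and give the gap its own category.
as_category = [(occ if occ is not None else "MISSING", y) for occ, y in records]

# The tree can now split on "MISSING" like any other level.
rate_by_level = {}
for level, count in Counter(occ for occ, _ in as_category).items():
    positives = sum(y for occ, y in as_category if occ == level)
    rate_by_level[level] = positives / count
```

Here the records with a missing occupation have their own purchase rate, which would be lost if they were dropped or overwritten with an average.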
Decision trees are a non-parametric technique.
What is a non-parametric technique?
A parametric statistical test is one that makes assumptions about the parameters
(defining properties) of the population distribution(s) from which one’s data are
drawn, while a non-parametric test is one that makes no such assumptions.
For practical purposes, you can think of “parametric” as referring to tests, such as the t-test
and the analysis of variance, that assume the underlying source population(s) to be
normally distributed; they generally also assume that one’s measures derive from an
equal-interval scale. And you can think of “non-parametric” as referring to tests that do
not make these particular assumptions.
Examples of non-parametric tests include:
• the various forms of chi-square tests,
• the Fisher Exact Probability test,
• the Mann-Whitney Test,
• the Wilcoxon Signed-Rank Test,
• the Kruskal-Wallis Test and the Friedman Test.
Non-parametric tests are sometimes spoken of as "distribution-free" tests.
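Hence decision trees are largely unaffected by outliers in the input variables, because a split depends on the ordering of the values and not on their magnitude.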
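A small sketch of why an extreme input value need not change a tree: the best split below groups the same records whether the largest spend is 6 or 6000, because only the ordering of the values matters. The helper names and data are assumptions; note that an extreme value in a continuous target can still shift the averages predicted in the leaves.

```python
def misclassified(labels):
    """Errors made by predicting the majority class of a 0/1 label list."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def best_left_group(xs, ys):
    """Indices of the records sent to the left child by the best split on x."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_errors, best_left = None, None
    for cut in range(1, len(order)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue  # cannot split between two equal values
        left, right = order[:cut], order[cut:]
        errors = (misclassified([ys[i] for i in left])
                  + misclassified([ys[i] for i in right]))
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, set(left)
    return best_left

spend = [1, 2, 3, 4, 5, 6]
bought = [0, 0, 0, 1, 1, 1]
spend_with_outlier = [1, 2, 3, 4, 5, 6000]  # last value made extreme

same_grouping = (best_left_group(spend, bought)
                 == best_left_group(spend_with_outlier, bought))
```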
Steps for Decision tree
1. Find the split
- Identify all possible split options
- Choose the best split value for the tree
2. Grow the tree
- Continue growing the tree as far as possible
3. Prune the tree
- Stop/prune the tree using a size-based criterion
4. Extract the rules
- Extract the rules generated from the tree.
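The four steps can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: it uses Gini impurity to score splits, a depth limit as the size-based stopping rule (rather than pruning after growth), and hypothetical data and function names.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def find_split(rows):
    """Step 1: try every (feature, threshold) and keep the purest split."""
    best = None
    for f in range(len(rows[0][0])):
        # Skip the largest value so the right child is never empty.
        for threshold in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    return best

def grow(rows, depth=0, max_depth=2):
    """Steps 2-3: grow recursively, stopping at a size-based limit (max_depth)."""
    labels = [y for _, y in rows]
    majority = int(sum(labels) * 2 >= len(labels))
    split = find_split(rows) if depth < max_depth and gini(labels) > 0 else None
    if split is None:
        return {"predict": majority}
    _, f, threshold, left, right = split
    return {"feature": f, "threshold": threshold,
            "left": grow(left, depth + 1, max_depth),
            "right": grow(right, depth + 1, max_depth)}

def extract_rules(tree, names, path=()):
    """Step 4: turn each root-to-leaf path into an IF ... THEN rule."""
    if "predict" in tree:
        condition = " AND ".join(path) or "TRUE"
        return [f"IF {condition} THEN class={tree['predict']}"]
    name, t = names[tree["feature"]], tree["threshold"]
    return (extract_rules(tree["left"], names, path + (f"{name} <= {t}",))
            + extract_rules(tree["right"], names, path + (f"{name} > {t}",)))

# Hypothetical rows: ((advertising spend in thousands, store size), sale made?)
data = [((1, 10), 0), ((2, 12), 0), ((6, 11), 1), ((8, 30), 1), ((9, 9), 1)]
tree = grow(data)
rules = extract_rules(tree, ["adv", "size"])
for rule in rules:
    print(rule)
```

On this data a single split on advertising spend already separates the two classes, so the tree stops after one level and yields two rules.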
Finding the right split
The split that creates the most homogeneous populations is considered to be
the best split.
[Figure: a poor split (mixed child nodes) next to a good split (homogeneous child nodes).]
There are various decision tree algorithms that help to split the data into
smaller and smaller groups in such a way that each new node has greater purity
than its parent node with respect to the target variable.
Splits are evaluated based on node purity in terms of the target variable.
This means the splitting criterion depends on the type of the target variable
and not on the type of the input variable.
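To make "purity" concrete, the sketch below scores a poor split and a good split of the same parent node using information gain, one common purity-based criterion. The class counts and function names are assumptions chosen for illustration.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = (5, 5)                # 5 sales, 5 no-sales
poor_split = [(3, 2), (2, 3)]  # children almost as mixed as the parent
good_split = [(5, 0), (0, 5)]  # each child holds a single class

poor_gain = information_gain(parent, poor_split)
good_gain = information_gain(parent, good_split)
```

The good split removes all of the parent's uncertainty, while the poor split removes almost none, so a tree algorithm would choose the good one.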
Finding the right split algorithm
1. For a categorical target variable we use
- Gini - Chi-square - Information Gain
2. For a continuous target variable
- Reduction in variance
3. Other methods:
- Gain Ratio is an improvement over the information gain
measure.
- F-test: it measures the variance in distributions between parent and
child nodes. It is used when the target variable is continuous.
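For reference, the usual textbook forms of these measures are given below. The notation is an assumption, not taken from the slides: p_k is the share of class k in a node, n_j / n is the share of the parent's records that fall in child j, and O and E are the observed and expected class counts in a child node.

```latex
\begin{align*}
\text{Gini impurity:} \quad & G = 1 - \sum_{k} p_k^{2} \\
\text{Entropy:} \quad & H = -\sum_{k} p_k \log_2 p_k \\
\text{Information gain:} \quad & IG = H_{\text{parent}} - \sum_{j} \frac{n_j}{n}\, H_j \\
\text{Gain ratio:} \quad & GR = \frac{IG}{-\sum_{j} \frac{n_j}{n} \log_2 \frac{n_j}{n}} \\
\text{Chi-square:} \quad & \chi^{2} = \sum \frac{(O - E)^{2}}{E} \\
\text{Reduction in variance:} \quad & \Delta = \mathrm{Var}_{\text{parent}} - \sum_{j} \frac{n_j}{n}\, \mathrm{Var}_j
\end{align*}
```

For Gini and entropy, lower values mean a purer node; for information gain, gain ratio, chi-square and reduction in variance, higher values indicate a better split.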
Next
Let’s learn each of them in detail
Machine Learning - Decision Trees

  • 2. What is decision Tree? Decision trees are a type of supervised machine learning model or in simple words a branching method where the data is continuously split according to a certain parameter. Rupak Roy
  • 3. Binary target variable If the target variable for a decision tree is a binary we will use Binary Decision tree Target Variable: 1 for Sales, 0 for No-sales 30% 50% 70% 100% Adv.= 1K Sales =10K Adv. =8,000, Sales = 91,000 Adv. =6,000, Sales = 65,000 Adv. =4,500, Sales = 33,000 Adv. =1500, Sales = 32000 Adv. =2000, Sales = 26,000 Adv. =2,000 Sales = 9000 Adv.= 900 Sales = 6310 Rupak Roy
  • 4. Continuous target variable: If the target variable is numeric, like Income (a continuous variable, not a discrete one like Yes or No), we use a regression tree for prediction. Each split in the tree is chosen to decrease the variance in the values of the target variable within each child node. In simple words: if average income is less than 70K, categorize it and create a new subtree at 60K; again, if it is less than 60K, create new subtrees at 50 and 10. [Figure: tree rooted at Avg. Income $70K with Yes/No branches and nodes 60K, 50, 10, 49, 38]
  • 5. Continuous target variable, example: A company wants to impute missing values in the income field for its customers. The average income of a person is 30K. Instead of that single average, the company can assign the missing values using the rules created from a decision tree for a better estimate. Terminology: • The base node is also known as the root node. • Any node that can be split further is called a decision node. • Nodes that cannot be split further are called terminal nodes or leaf nodes. • The process of cutting down the tree or removing sections of it is called pruning. • The process of adding a whole section to a tree is called grafting.
  • 6. Data preparation for decision trees: Most decision trees can handle both categorical and continuous variables, so little data transformation is needed. Classification trees are used when the target variable is discrete. Regression trees are used when the target variable is continuous. Removing records because of missing values is likely to create a biased training set, because the records with missing values are unlikely to be a random sample of the population; it also risks losing important information associated with those records. Replacing missing values with imputed values risks distorting important properties of the data, which tends to create a biased model. Treating missing values as a separate category is better than replacing them with average values.
  • 7. Decision trees are a non-parametric technique. What is a non-parametric technique? A parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one’s data are drawn, while a non-parametric test is one that makes no such assumptions. For practical purposes, you can think of “parametric” as referring to tests, such as the t-test and the analysis of variance, that assume the underlying source population(s) to be normally distributed; they generally also assume that one’s measures derive from an equal-interval scale. And you can think of “non-parametric” as referring to tests that do not make these particular assumptions. Examples of non-parametric tests include: • the various forms of chi-square tests, • the Fisher Exact Probability test, • the Mann-Whitney Test, • the Wilcoxon Signed-Rank Test, • the Kruskal-Wallis Test and the Friedman Test. Non-parametric tests are sometimes spoken of as "distribution-free" tests. Because decision trees make no such distributional assumptions, they are largely unaffected by outliers.
  • 8. Steps for a decision tree: 1. Find the split: identify all possible split options, then choose the best split value for the tree. 2. Grow the tree: continue growing the tree as much as possible. 3. Prune the tree: stop/prune the tree using a size-based criterion. 4. Extract the rules: extract the rules generated from the tree.
  • 9. Finding the right split: The split that creates the most homogeneous populations is considered the best split. [Figure: a poor split next to a good (homogeneous) split] There are various decision tree algorithms that help split the data into smaller and smaller groups in such a way that each new node has greater purity than its parent node with respect to the target variable. Splits are evaluated on node purity in terms of the target variable. This means the splitting criterion depends on the type of the target variable and not on the type of the input variable.
  • 10. Finding the right split algorithm: 1. For a categorical target variable we use: Gini, Chi-square, Information Gain. 2. For a continuous target variable: Reduction in variance. 3. Other methods: Information Gain Ratio, an improvement over the information gain measure; F-test, which measures the variance in distributions between parent and child nodes and is used when the target variable is continuous.
  • 11. Next: let’s learn each of them in detail.
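
As a first look at the split measures named in slide 10, here is a minimal Python sketch of Gini, information gain (via entropy) and reduction in variance, plus a brute-force search for the best threshold on one numeric input. It is only an illustration, not the slides' own code: the function names (`gini`, `entropy`, `variance`, `split_gain`, `best_split`) and the small `adv`, `sale` and `income` lists are made up for this example, and chi-square and the F-test are not covered.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; information gain is the drop in entropy."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance, the impurity used for a continuous target."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_gain(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def best_split(xs, ys, impurity):
    """Try the midpoint between each pair of consecutive distinct x values.

    Returns (threshold, gain) for the split with the largest gain,
    or (None, 0.0) if no split improves purity.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = split_gain(ys, left, right, impurity)
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Made-up data: advertising spend, a binary sale flag and a numeric income.
adv = [1, 2, 3, 6, 7, 8]
sale = [0, 0, 0, 1, 1, 1]
income = [10, 12, 11, 50, 52, 51]

print(best_split(adv, sale, gini))        # (4.5, 0.5)
print(best_split(adv, sale, entropy))     # information gain, categorical target
print(best_split(adv, income, variance))  # reduction in variance, continuous target
```

Note that the same `best_split` search is reused for both target types and only the impurity function changes, which mirrors the point in slide 9 that the splitting criterion depends on the target variable rather than on the input variable.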