SlideShare a Scribd company logo
DECISION TREE
Content
 What is decision tree?
 Example
 How to select the deciding node?
 Entropy
 Information gain
 Gini Impurity
 Steps for Making decision tree
 Pros.
 Cons.
What is decision tree?
 Decision tree is the most powerful and
popular tool for classification and prediction.
A Decision tree is a flowchart like tree
structure, where each internal node
denotes a test on an attribute, each branch
represents an outcome of the test, and
each leaf node (terminal node) holds a
class label.
 It is a type of superised learning algorithm.
 It is one of the most widely used and
practical method for inductive inference
Example
 Let's assume we want to play
badminton on a particular day — say
Saturday — how will you decide
whether to play or not. Let's say you
go out and check if it's hot or cold,
check the speed of the wind and
humidity, how the weather is, i.e. is it
sunny, cloudy, or rainy. You take all
these factors into account to decide if
you want to play or not.
So, you calculate all these factors for the last ten days
and form a lookup table like the one below.
Day Weather Temperature Humidity Wind
1 Sunny Hot High Weak
2 Cloudy Hot High Weak
3 Sunny Mild Normal Strong
4 Cloudy Mild High Strong
5 Rainy Mild High Strong
6 Rainy Cool Normal Strong
7 Rainy Mild High Weak
8 Sunny Hot High Strong
9 Cloudy Hot Normal Weak
10 Rainy Mild High Strong
 A decision tree would be a great way to represent
data like this because it takes into account all the
possible paths that can lead to the final decision by
following a tree-like structure.
How to select the deciding
node?
Which is the best Classifier?
So,we can conclude
 Less impure node requires less
information to describe it.
 More impure node requires more
information.
 Information theory is a measure to
define this degree of disorganization in
a system known as Entropy.
Entropy - measuring
homogeneity of a learning set
 Entropy is a measure of the uncertainty about a
source of messages.
 Given a collection S, containing positive and
negative examples of some target concept, the
entropy of S relative to this classification.
where, pi is the proportion of S belonging to class i
Entropy is 0 if all
the members of S
belong to the same
class.
Entropy is 1 when
the collection
contains an equal
no. of +ve and -ve
examples.
Entropy is between
0 and 1 if the
collection contains
unequal no. of +ve
and -ve examples.
Information gain
 Decides which attribute goes into a
decision node.
 To minimize the decision tree depth,
the attribute with the most entropy
reduction is the best choice!
 They are of 2 types-
1. High Information Gain
2. Low Information Gain
 The information gain, Gain(S,A) of an attribute .
Where:
S is each value v of all possible values of attributeA Sv = subset of
S for which attribute A has valuev
|Sv| = number of elements in Sv
|S| = number of elements in S
High Information Gain
 An attribute with high information gain splits the
data into groups with an uneven number of
positives and negatives and as a result helps in
separating the two from each other.
Low Information Gain
 An attribute with low information gain
splits the data relatively evenly and as
a result doesn’t bring us any closer to
a decision.
Gini Impurity
 Gini Impurity is a measurement of the
likelihood of an incorrect classification of a new
instance of a random variable, if that new
instance were randomly classified according to
the distribution of class labels from the data
set.
 If our dataset is Pure then likelihood of
incorrect classification is 0. If our sample is
mixture of different classes then likelihood of
incorrect classification will be high.
 They are of 2 types
 Pure means, in a selected sample of
dataset all data belongs to same class.
 Impure means, data is mixture of different
classes.
Steps for Making decision tree
 Get list of rows (dataset) which are taken into
consideration for making decision tree (recursively
at each nodes).
 Calculate uncertanity of our dataset or Gini
impurity or how much our data is mixed up etc.
 Generate list of all question which needs to be
asked at that node.
 Partition rows into True rows and False rows based
on each question asked.
 Calculate information gain based on gini impurity
and partition of data from previous step.
 Update highest information gain based on each
question asked.
 Update best question based on information gain
(higher information gain).
 Divide the node on best question. Repeat again
from step 1 again until we get pure node (leaf
Pros.
 Easy to use and understand.
 Can handle both categorical and
numerical data.
 Resistant to outliers, hence require
little data preprocessing.
Cons.
 Prone to overfitting.
 Require some kind of measurement
as to how well they are doing.
 Need to be careful with parameter
tuning.
 Can create biased learned trees if
some classes dominate.
Decision tree

More Related Content

What's hot

Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
Marc Garcia
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
Krish_ver2
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
Milind Gokhale
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Edureka!
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
Palin analytics
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
Krish_ver2
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
Xueping Peng
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 
Decision tree
Decision treeDecision tree
Decision tree
SEMINARGROOT
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
Palin analytics
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
Decision tree
Decision treeDecision tree
Decision tree
R A Akerkar
 
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Simplilearn
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Simplilearn
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
Random forest
Random forestRandom forest
Random forest
Ujjawal
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3
Xueping Peng
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 

What's hot (20)

Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Random forest
Random forestRandom forest
Random forest
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 

Similar to Decision tree

Decision tree
Decision tree Decision tree
Decision tree
Learnbay Datascience
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
Abhimanyu Dwivedi
 
Dbm630 lecture06
Dbm630 lecture06Dbm630 lecture06
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
Kush Kulshrestha
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
JayabharathiMuraliku
 
10 decision tree
10 decision tree10 decision tree
10 decision tree
Vishal Dutt
 
Desicion tree and neural networks
Desicion tree and neural networksDesicion tree and neural networks
Desicion tree and neural networks
jaskarankaur21
 
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docxBUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
curwenmichaela
 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in ai
Robert Antony
 
Lt. 5 Pattern Reg.pptx
Lt. 5  Pattern Reg.pptxLt. 5  Pattern Reg.pptx
Lt. 5 Pattern Reg.pptx
ssuser5c580e1
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
Lippo Group Digital
 
Handout11
Handout11Handout11
Handout11
butest
 
Machine learning and decision trees
Machine learning and decision treesMachine learning and decision trees
Machine learning and decision trees
Padma Metta
 
Statistics for management
Statistics for managementStatistics for management
Statistics for management
John Prarthan
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Sherri Gunder
 
Data mining approaches and methods
Data mining approaches and methodsData mining approaches and methods
Data mining approaches and methods
sonangrai
 
ML_Unit_1_Part_C
ML_Unit_1_Part_CML_Unit_1_Part_C
ML_Unit_1_Part_C
Srimatre K
 
data
datadata
data
som allul
 
Classifiers
ClassifiersClassifiers
Classifiers
Ayurdata
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.ppt
butest
 

Similar to Decision tree (20)

Decision tree
Decision tree Decision tree
Decision tree
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Dbm630 lecture06
Dbm630 lecture06Dbm630 lecture06
Dbm630 lecture06
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
10 decision tree
10 decision tree10 decision tree
10 decision tree
 
Desicion tree and neural networks
Desicion tree and neural networksDesicion tree and neural networks
Desicion tree and neural networks
 
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docxBUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in ai
 
Lt. 5 Pattern Reg.pptx
Lt. 5  Pattern Reg.pptxLt. 5  Pattern Reg.pptx
Lt. 5 Pattern Reg.pptx
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Handout11
Handout11Handout11
Handout11
 
Machine learning and decision trees
Machine learning and decision treesMachine learning and decision trees
Machine learning and decision trees
 
Statistics for management
Statistics for managementStatistics for management
Statistics for management
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
 
Data mining approaches and methods
Data mining approaches and methodsData mining approaches and methods
Data mining approaches and methods
 
ML_Unit_1_Part_C
ML_Unit_1_Part_CML_Unit_1_Part_C
ML_Unit_1_Part_C
 
data
datadata
data
 
Classifiers
ClassifiersClassifiers
Classifiers
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.ppt
 

Recently uploaded

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Decision tree

  • 2. Content  What is decision tree?  Example  How to select the deciding node?  Entropy  Information gain  Gini Impurity  Steps for Making decision tree  Pros.  Cons.
  • 3. What is decision tree?  Decision tree is the most powerful and popular tool for classification and prediction. A Decision tree is a flowchart like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.  It is a type of superised learning algorithm.  It is one of the most widely used and practical method for inductive inference
  • 4. Example  Let's assume we want to play badminton on a particular day — say Saturday — how will you decide whether to play or not. Let's say you go out and check if it's hot or cold, check the speed of the wind and humidity, how the weather is, i.e. is it sunny, cloudy, or rainy. You take all these factors into account to decide if you want to play or not.
  • 5. So, you calculate all these factors for the last ten days and form a lookup table like the one below. Day Weather Temperature Humidity Wind 1 Sunny Hot High Weak 2 Cloudy Hot High Weak 3 Sunny Mild Normal Strong 4 Cloudy Mild High Strong 5 Rainy Mild High Strong 6 Rainy Cool Normal Strong 7 Rainy Mild High Weak 8 Sunny Hot High Strong 9 Cloudy Hot Normal Weak 10 Rainy Mild High Strong
  • 6.  A decision tree would be a great way to represent data like this because it takes into account all the possible paths that can lead to the final decision by following a tree-like structure.
  • 7. How to select the deciding node? Which is the best Classifier?
  • 8. So,we can conclude  Less impure node requires less information to describe it.  More impure node requires more information.  Information theory is a measure to define this degree of disorganization in a system known as Entropy.
  • 9. Entropy - measuring homogeneity of a learning set  Entropy is a measure of the uncertainty about a source of messages.  Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this classification. where, pi is the proportion of S belonging to class i
  • 10. Entropy is 0 if all the members of S belong to the same class. Entropy is 1 when the collection contains an equal no. of +ve and -ve examples. Entropy is between 0 and 1 if the collection contains unequal no. of +ve and -ve examples.
  • 11. Information gain  Decides which attribute goes into a decision node.  To minimize the decision tree depth, the attribute with the most entropy reduction is the best choice!  They are of 2 types- 1. High Information Gain 2. Low Information Gain
  • 12.  The information gain, Gain(S,A) of an attribute . Where: S is each value v of all possible values of attributeA Sv = subset of S for which attribute A has valuev |Sv| = number of elements in Sv |S| = number of elements in S
  • 13. High Information Gain  An attribute with high information gain splits the data into groups with an uneven number of positives and negatives and as a result helps in separating the two from each other.
  • 14. Low Information Gain  An attribute with low information gain splits the data relatively evenly and as a result doesn’t bring us any closer to a decision.
  • 15. Gini Impurity  Gini Impurity is a measurement of the likelihood of an incorrect classification of a new instance of a random variable, if that new instance were randomly classified according to the distribution of class labels from the data set.  If our dataset is Pure then likelihood of incorrect classification is 0. If our sample is mixture of different classes then likelihood of incorrect classification will be high.  They are of 2 types  Pure means, in a selected sample of dataset all data belongs to same class.  Impure means, data is mixture of different classes.
  • 16. Steps for Making decision tree  Get list of rows (dataset) which are taken into consideration for making decision tree (recursively at each nodes).  Calculate uncertanity of our dataset or Gini impurity or how much our data is mixed up etc.  Generate list of all question which needs to be asked at that node.  Partition rows into True rows and False rows based on each question asked.  Calculate information gain based on gini impurity and partition of data from previous step.  Update highest information gain based on each question asked.  Update best question based on information gain (higher information gain).  Divide the node on best question. Repeat again from step 1 again until we get pure node (leaf
  • 17. Pros.  Easy to use and understand.  Can handle both categorical and numerical data.  Resistant to outliers, hence require little data preprocessing.
  • 18. Cons.  Prone to overfitting.  Require some kind of measurement as to how well they are doing.  Need to be careful with parameter tuning.  Can create biased learned trees if some classes dominate.