Machine Learning
Decision Tree and Random Forest
Machine Learning
• Introduction
• What is ML, DL, AI?
• Decision Tree
Definition
Why Decision Tree?
Basic Terminology
Challenges
• Random Forest
Definition
Why Random Forest
How it works?
• Advantages & Disadvantages
Machine Learning
According to Arthur Samuel (1959), “Machine Learning is a field of study that gives computers the
ability to learn without being explicitly programmed”.
Machine learning is the study and design of algorithms that learn by processing input data (learning
samples).
The most widely used definition of machine learning is that of Carnegie Mellon University Professor
Tom Mitchell: “A computer program is said to learn from experience ‘E’, with respect to some
class of tasks ‘T’ and performance measure ‘P’ if its performance at tasks in ‘T’ as measured by
‘P’ improves with experience ‘E’”.
Figure: Diagram relating AI, ML, DL and DS
Decision Tree
A decision tree is a supervised machine learning algorithm that can be used
for classification as well as regression problems. It has a tree-like structure
in which the predicted target values (the results, or inferences) sit at the leaf nodes.
Why Decision Tree?
• Helpful for more complex problems where a linear prediction line does not
perform well
• Gives a clear graphical representation of each possible outcome
Figure: Effectiveness vs. Dose (mg). Prediction can be done with a linear regression line
Dose (mg)  Age  Sex  Effect
10         25   F    95
20         78   M    0
35         52   F    98
5          12   M    44
…          …    …    …
Figure: Prediction cannot be done with a linear regression line
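The contrast between the two cases can be reproduced with a minimal sketch. The data below is synthetic and assumed for illustration only (a drug that works only at medium doses); it is not the dataset from the slides. `LinearRegression` and `DecisionTreeRegressor` come from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): effectiveness is high only for medium doses.
dose = np.arange(1, 41, dtype=float).reshape(-1, 1)  # 1 mg .. 40 mg
effect = np.where((dose.ravel() >= 15) & (dose.ravel() <= 30), 100.0, 0.0)

# A straight line cannot follow a low / high / low pattern.
linear = LinearRegression().fit(dose, effect)

# A shallow tree can: two thresholds are enough to isolate the medium doses.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(dose, effect)

# score() returns R^2 on the given data; values closer to 1 mean a better fit.
linear_r2 = linear.score(dose, effect)
tree_r2 = tree.score(dose, effect)
print(f"linear R^2 = {linear_r2:.2f}, tree R^2 = {tree_r2:.2f}")
```

On data shaped like this, the line explains almost none of the variation while the tree separates the effective dose range with two splits.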
Figure: Sample dataset (the dose table) and sample decision tree
Decision Tree
Figure: Tree structure with a root node, intermediate nodes and leaf nodes
Root Node: The top-most node of a decision tree. It has no parent node and represents
the entire population or sample.
Leaf / Terminal Nodes: Nodes that do not have any child node.
Challenges in building a Decision Tree:
1. How to decide the splitting criteria?
• Target Variable (Categorical)
• Target Variable (Continuous)
2. How to decide the depth of the decision tree / when to stop?
• Nearly all data points have been covered
• Check for node purity / homogeneity
3. Overfitting
• Pre-Pruning
• Post-Pruning
How to build a decision tree using criteria:
Love Popcorn  Love Soda  Gender  Love Ice Cream
Y             Y          M       N
Y             N          F       N
N             Y          M       Y
N             Y          M       Y
Y             Y          F       Y
Y             N          F       N
N             N          M       N
Y             N          M       ?
Root node?
How to decide splitting Criteria?
1. Target Variable (Categorical):
Gini Impurity: Measures how mixed the target classes are after splitting on a feature. The feature with the lowest impurity is chosen as the root node or split node.
Entropy / Information Gain: Entropy measures the impurity of a node, and information gain is the reduction in
entropy achieved by a split, so the higher the entropy after a split, the lower the information gain. The feature
with the highest information gain is chosen as the root node or split node.
Chi Square:
2. Target Variable (Continuous):
• Reduction in variance: When the target variable is continuous, the split that most reduces the variance
of the target within the child nodes is chosen as the splitting node.
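As a rough sketch, the criteria above can be written as small Python functions. The function names are hypothetical, the example labels are made up, and Chi-square is left out.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    """Variance of the parent minus the size-weighted variance of the children."""
    n = len(parent)
    return variance(parent) - sum(len(ch) / n * variance(ch) for ch in children)

# A perfectly mixed parent that one split separates into two pure children.
parent = ["Y", "Y", "N", "N"]
children = [["Y", "Y"], ["N", "N"]]
print(gini(parent), entropy(parent), information_gain(parent, children))

# Continuous target: the split removes all within-node variance.
print(variance_reduction([1.0, 1.0, 3.0, 3.0], [[1.0, 1.0], [3.0, 3.0]]))
```

For a categorical target, a split is preferred when it gives low Gini impurity or high information gain; for a continuous target, when it gives a high variance reduction.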
How to build a decision tree using criteria (Gini Index / impurity):
G.I. of leaf love popcorn (yes): 0.375
G.I. of leaf love popcorn (no): 0.444
G.I of feature love popcorn: 0.404
G.I. of leaf love Soda (yes): 0.375
G.I. of leaf love Soda (no): 0
G.I of feature love soda: 0.214
G.I. of leaf Gender (Male): 0.5
G.I. of leaf Gender (Female): 0.444
G.I of feature Gender: 0.476
Figure: Feature description with target variable i.e. Love Ice-cream
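The figures above can be checked with a short script. This is a sketch, assuming the Gini impurity of a feature is the size-weighted average of the impurities of its leaves; the column names are illustrative, and the unlabeled last row of the table is excluded.

```python
from collections import Counter

# Labeled rows of the table: (love_popcorn, love_soda, gender, love_ice_cream)
rows = [
    ("Y", "Y", "M", "N"),
    ("Y", "N", "F", "N"),
    ("N", "Y", "M", "Y"),
    ("N", "Y", "M", "Y"),
    ("Y", "Y", "F", "Y"),
    ("Y", "N", "F", "N"),
    ("N", "N", "M", "N"),
]
features = {"love_popcorn": 0, "love_soda": 1, "gender": 2}
TARGET = 3  # column index of love_ice_cream

def gini(labels):
    """Gini impurity of one leaf."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, col):
    """Size-weighted average of the leaf impurities after splitting on col."""
    total = 0.0
    for value in {r[col] for r in rows}:
        labels = [r[TARGET] for r in rows if r[col] == value]
        total += len(labels) / len(rows) * gini(labels)
    return total

scores = {name: weighted_gini(rows, col) for name, col in features.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# The feature with the lowest weighted impurity becomes the root node.
root = min(scores, key=scores.get)
print("root:", root)
```

The printed values agree with the figures above up to rounding, and the lowest impurity belongs to Love Soda, which therefore becomes the root node.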
Decision Tree
Figure: Initial Decision Tree
Next node?
Decision Tree
Love Soda  Love Popcorn  Gender  Love Ice Cream
Y          Y             M       N
Y          N             M       Y
Y          N             M       Y
Y          Y             F       Y
Figure: Subset of the data at the intermediate node (Love Soda = Y)
Decision Tree
G.I. of leaf love popcorn (yes): 0.5
G.I. of leaf love popcorn (no): 0
G.I of feature love popcorn: 0.25
G.I. of leaf Gender (Male): 0.444
G.I. of leaf Gender (Female): 0
G.I of feature Gender: 0.333
Figure: Feature description with target variable i.e. Love Ice-cream
Decision Tree
Love Soda  Love Popcorn  Gender  Love Ice Cream
Y          Y             M       N
Y          Y             F       Y
Decision Tree
Figure: Final Decision tree
Love Popcorn  Love Soda  Gender  Love Ice Cream
Y             Y          M       N
Y             N          F       N
N             Y          M       Y
N             Y          M       Y
Y             Y          F       Y
Y             N          F       N
N             N          M       N
Y             Y          M       ?
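The whole procedure (pick the feature with the lowest weighted Gini impurity, split, repeat on each subset until the node is pure) can be sketched as a small recursive builder. This is an illustrative implementation under those assumptions, not the code behind the slides; the `build` and `predict` helpers are hypothetical names.

```python
from collections import Counter

# Labeled rows: ((love_popcorn, love_soda, gender), love_ice_cream)
rows = [
    (("Y", "Y", "M"), "N"),
    (("Y", "N", "F"), "N"),
    (("N", "Y", "M"), "Y"),
    (("N", "Y", "M"), "Y"),
    (("Y", "Y", "F"), "Y"),
    (("Y", "N", "F"), "N"),
    (("N", "N", "M"), "N"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(data, col):
    total = 0.0
    for value in {x[col] for x, _ in data}:
        labels = [y for x, y in data if x[col] == value]
        total += len(labels) / len(data) * gini(labels)
    return total

def build(data, cols):
    """Return a leaf label, or (column, {value: subtree}) for a split node."""
    labels = [y for _, y in data]
    # Stop when the node is pure or no features remain; predict the majority class.
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = min(cols, key=lambda c: weighted_gini(data, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted({x[best] for x, _ in data}):
        subset = [(x, y) for x, y in data if x[best] == value]
        branches[value] = build(subset, rest)
    return (best, branches)

def predict(node, x):
    """Walk from the root to a leaf, following the branch for each feature value."""
    while isinstance(node, tuple):
        col, branches = node
        node = branches[x[col]]
    return node

tree = build(rows, [0, 1, 2])
print(tree)
print(predict(tree, ("Y", "Y", "M")))  # the unlabeled row from the table
```

In this sketch the root splits on Love Soda (column 1), the Love Soda = Y branch splits on Love Popcorn, and the remaining mixed node splits on Gender, so the unlabeled row is routed through all three features before reaching a leaf.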
Decision Tree
Overfitting problem: Decision trees are prone to overfitting because of their high variance.
Small changes in the training data can produce a very different tree, which makes the results
unstable. It can be overcome with the following methods:
Pre-Pruning: Tune hyperparameters (for example, the maximum depth) while fitting the decision tree classifier, so the tree stops growing early.
Post-Pruning: Grow the full tree first, then prune it back using the cost-complexity
pruning parameter (ccp_alpha).
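Pre-pruning can be sketched with scikit-learn's `DecisionTreeClassifier` by limiting growth up front. The hyperparameter values below are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: keeps splitting until the leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: stop early by capping the depth and requiring a
# minimum number of samples in every leaf (illustrative values).
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre)]:
    print(name,
          "nodes:", model.tree_.node_count,
          "depth:", model.get_depth(),
          "test accuracy:", round(model.score(X_test, y_test), 3))
```

Comparing node counts and test accuracy for the two models shows how much of the full tree is needed to generalize; in practice the limits would be chosen with cross-validation rather than fixed by hand.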
Hands-On Decision Tree
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train,y_train)
pred=clf.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
Hands-On Decision Tree
from sklearn import tree
plt.figure(figsize=(15,10))
tree.plot_tree(clf,filled=True)
Hands-On Decision Tree
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
ccp_alphas
# Fit one tree per candidate alpha; a larger alpha prunes more aggressively
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
    clfs[-1].tree_.node_count, ccp_alphas[-1]))
# Compare training and test accuracy across the candidate alphas
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots()
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker='o', label="train",
        drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker='o', label="test",
        drawstyle="steps-post")
ax.legend()
plt.show()
Hands-On Decision Tree
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=0.012)
clf.fit(X_train,y_train)
pred=clf.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
from sklearn import tree
plt.figure(figsize=(15,10))
tree.plot_tree(clf,filled=True)
Random Forest:
Definition: Random forest is an ensemble technique based on BAGGING (Bootstrap
Aggregation). It works on the principle of the “Wisdom of the Crowd”.
Why Random Forest?
Random forests are mostly used to overcome the overfitting of a single decision
tree classifier. Combining many trees reduces the variance of the model, which usually
improves accuracy on unseen data.
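A minimal sketch of this comparison, assuming the same breast cancer dataset and split used in the hands-on section and scikit-learn's `RandomForestClassifier` with 100 trees (an illustrative setting):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, fully grown decision tree.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A forest of 100 trees, each fit on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"single tree test accuracy:   {tree_acc:.3f}")
print(f"random forest test accuracy: {forest_acc:.3f}")
```

The exact numbers depend on the split and the random seed, so the comparison is best repeated over several seeds or with cross-validation before drawing conclusions.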
Random Forest
How it works?
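In outline, bagging trains each tree on a bootstrap sample (rows drawn with replacement) and combines the individual predictions by majority vote. The sketch below illustrates only that mechanism: one-feature lookup rules stand in for full decision trees, and each rule uses a single randomly chosen feature. Both are simplifying assumptions, and all function names are hypothetical.

```python
import random
from collections import Counter

# Toy data reused from the ice-cream example:
# ((love_popcorn, love_soda, gender), love_ice_cream)
rows = [
    (("Y", "Y", "M"), "N"),
    (("Y", "N", "F"), "N"),
    (("N", "Y", "M"), "Y"),
    (("N", "Y", "M"), "Y"),
    (("Y", "Y", "F"), "Y"),
    (("Y", "N", "F"), "N"),
    (("N", "N", "M"), "N"),
]

def bootstrap(data, rng):
    """Draw len(data) rows with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(data, rng):
    """Toy base learner: majority label per value of one random feature."""
    col = rng.randrange(len(data[0][0]))
    votes = {}
    for x, y in data:
        votes.setdefault(x[col], Counter())[y] += 1
    default = Counter(y for _, y in data).most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    # Fall back to the overall majority for values missing from the sample.
    return lambda x: table.get(x[col], default)

def train_forest(data, n_trees=25, seed=0):
    """Fit n_trees base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap(data, rng), rng) for _ in range(n_trees)]

def predict(forest, x):
    """Majority vote over the predictions of all base learners."""
    votes = Counter(learner(x) for learner in forest)
    return votes.most_common(1)[0][0]

forest = train_forest(rows)
print(predict(forest, ("Y", "N", "M")))
```

A real random forest grows a full decision tree on each bootstrap sample and considers a random subset of features at every split, but the sample-then-vote structure is the same.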
Decision Tree & Random Forest
Decision Tree:
Advantages:
1. Simple and easy to implement, like IF-ELSE statements
2. Easy to visualize and understand
3. Used for Classification as well as for Regression
Disadvantages:
1. Overfitting
2. Unstable results
3. Sensitive to noisy data
4. Less effective with large datasets
Random Forest:
Advantages:
1. Overcomes the overfitting problem of decision trees
2. Used for Classification as well as for Regression
Disadvantages:
1. Higher training time than a decision tree
2. Less effective with small datasets
3. Requires more computation power and resources

Internship report on mechanical engineering
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 

Random forest and decision tree

  • 1. Machine Learning Decision Tree and Random Forest
  • 2. Machine Learning • Introduction • What is ML, DL, AI? • Decision Tree: Definition, Why Decision Tree?, Basic Terminology, Challenges • Random Forest: Definition, Why Random Forest?, How it works? • Advantages & Disadvantages
  • 3. Machine Learning According to Arthur Samuel (1959), “Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed”. Machine learning is the study and design of algorithms that can learn by processing input data (learning samples). The most widely used definition of machine learning is that of Carnegie Mellon University Professor Tom Mitchell: “A computer program is said to learn from experience ‘E’ with respect to some class of tasks ‘T’ and performance measure ‘P’ if its performance at tasks in ‘T’, as measured by ‘P’, improves with experience ‘E’”.
  • 5. Decision Tree & Random Forest • Decision Tree: Definition, Why Decision Tree?, Basic Terminology, Challenges • Random Forest: Definition, Why Random Forest?, How it works?
  • 6. Decision Tree A decision tree is a supervised machine learning algorithm that can be used for classification as well as regression problems. It has a tree-like structure in which each leaf node holds a predicted value of the target (the result or inference). Why Decision Tree? • Helpful for more complex problems where a linear prediction line does not perform well • Gives a clear graphical representation of each possible outcome
  • 7. Why Decision Tree? Plot of Effectiveness against Dose (mg): prediction can be done with a linear regression line. Dataset below: prediction cannot be done with a linear regression line.
      Dose (mg) | Age | Sex | Effect
      10        | 25  | F   | 95
      20        | 78  | M   | 0
      35        | 52  | F   | 98
      5         | 12  | M   | 44
      …         | …   | …   | …
  • 8. Why Decision Tree? Sample dataset:
      Dose (mg) | Age | Sex | Effect
      10        | 25  | F   | 95
      20        | 78  | M   | 0
      35        | 52  | F   | 98
      5         | 12  | M   | 44
      …         | …   | …   | …
    Figure: Sample Decision Tree
  • 9. Decision Tree Figure: tree diagram with a root node, intermediate nodes and leaf nodes. Root Node: the top-most node of a decision tree. It does not have a parent node and represents the entire population or sample. Leaf / Terminal Nodes: nodes that do not have any child node.
  • 10. Challenges in building a Decision Tree: 1. How to decide the splitting criterion? • Target variable (categorical) • Target variable (continuous) 2. How to decide the depth of the decision tree, i.e. when to stop? • Nearly all data points have been covered • Check for node purity / homogeneity 3. Over fitting • Pre Pruning • Post Pruning
  • 11. How to build a decision tree using criteria:
      Love Popcorn | Love Soda | Gender | Love Ice cream
      Y            | Y         | M      | N
      Y            | N         | F      | N
      N            | Y         | M      | Y
      N            | Y         | M      | Y
      Y            | Y         | F      | Y
      Y            | N         | F      | N
      N            | N         | M      | N
      Y            | N         | M      | ?
    Root node?
  • 12. How to decide splitting Criteria? 1. If the target variable is categorical: Gini Impurity: measures how mixed the target classes are after splitting on a feature. The feature with the lowest impurity is chosen as the root node or split node. Entropy / Information Gain: entropy also measures impurity, and information gain is the reduction in entropy achieved by a split, so the higher the entropy left after a split, the lower the information gain. The feature with the highest information gain is chosen as the root node or split node. Chi Square:
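The two impurity measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook formulas (Gini = 1 − Σ p², entropy = −Σ p·log₂ p); the function names are hypothetical and are not taken from the slides.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# A node with one "Y" and three "N" labels is fairly pure.
print(gini_impurity(["Y", "N", "N", "N"]))  # 0.375
# A perfect split of a 50/50 node recovers the full 1 bit of entropy.
print(information_gain(["Y", "Y", "N", "N"], [["Y", "Y"], ["N", "N"]]))  # 1.0
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a low weighted impurity (or a high information gain) marks a good split.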
  • 13. How to decide splitting Criteria? 2. If the target variable is continuous: • Reduction in variance: compare the variance of the target before a split with the weighted variance of the target in the resulting child nodes. The feature whose split reduces the variance the most is chosen as the splitting node.
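Reduction in variance can be sketched as follows. The Effect values are the four visible rows of the dose dataset on the earlier slides; the split on Age < 60 is a hypothetical threshold picked only for illustration, and the function names are not from the slides.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted


# (age, effect) pairs from the four visible rows of the sample dataset.
rows = [(25, 95), (78, 0), (52, 98), (12, 44)]
parent = [effect for _, effect in rows]

# Hypothetical candidate split: Age < 60 versus Age >= 60.
left = [effect for age, effect in rows if age < 60]
right = [effect for age, effect in rows if age >= 60]

print(variance_reduction(parent, left, right))
```

A regression tree would evaluate this quantity for every candidate split and keep the one with the largest reduction.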
  • 14. How to build a decision tree using criteria (Gini Index / impurity):
      Love Popcorn | Love Soda | Gender | Love Ice cream
      Y            | Y         | M      | N
      Y            | N         | F      | N
      N            | Y         | M      | Y
      N            | Y         | M      | Y
      Y            | Y         | F      | Y
      Y            | N         | F      | N
      N            | N         | M      | N
      Y            | N         | M      | ?
    Root node?
  • 15. Decision Tree Figure: Feature description with target variable, i.e. Love Ice-cream
      Love Popcorn: G.I. of leaf (yes): 0.375; G.I. of leaf (no): 0.444; G.I. of feature: 0.405
      Love Soda: G.I. of leaf (yes): 0.375; G.I. of leaf (no): 0; G.I. of feature: 0.214
      Gender: G.I. of leaf (Male): 0.5; G.I. of leaf (Female): 0.444; G.I. of feature: 0.476
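The feature-level Gini values on this slide can be reproduced from the seven labeled rows of the toy dataset. The sketch below assumes the usual weighted-average definition (each leaf's impurity weighted by its share of the rows); the variable and function names are illustrative only.

```python
from collections import Counter

# Labeled rows: (love_popcorn, love_soda, gender, love_ice_cream)
ROWS = [
    ("Y", "Y", "M", "N"),
    ("Y", "N", "F", "N"),
    ("N", "Y", "M", "Y"),
    ("N", "Y", "M", "Y"),
    ("Y", "Y", "F", "Y"),
    ("Y", "N", "F", "N"),
    ("N", "N", "M", "N"),
]
FEATURES = {"Love Popcorn": 0, "Love Soda": 1, "Gender": 2}
TARGET = 3


def gini(labels):
    """Gini impurity of one leaf: 1 - sum(p_k ** 2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, col):
    """Size-weighted Gini impurity of the leaves produced by splitting on col."""
    leaves = {}
    for row in rows:
        leaves.setdefault(row[col], []).append(row[TARGET])
    n = len(rows)
    return sum(len(leaf) / n * gini(leaf) for leaf in leaves.values())


scores = {name: weighted_gini(ROWS, col) for name, col in FEATURES.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
# Love Popcorn: 0.405
# Love Soda: 0.214
# Gender: 0.476

best = min(scores, key=scores.get)
print("Root node:", best)  # Root node: Love Soda
```

Love Soda has the lowest weighted impurity, so it becomes the root node.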
  • 16. Figure: Initial Decision Tree Next node? Decision Tree
  • 17. Decision Tree Figure: Subset of the data at the intermediate node (Love Soda = Y), with target variable Love Ice-cream
      Love Soda | Love Popcorn | Gender | Love Ice cream
      Y         | Y            | M      | N
      Y         | N            | M      | Y
      Y         | N            | M      | Y
      Y         | Y            | F      | Y
  • 18. Decision Tree Figure: Feature description with target variable, i.e. Love Ice-cream
      Love Popcorn: G.I. of leaf (yes): 0.5; G.I. of leaf (no): 0; G.I. of feature: 0.25
      Gender: G.I. of leaf (Male): 0.444; G.I. of leaf (Female): 0; G.I. of feature: 0.333
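The second-level values can be checked the same way by restricting the data to the rows where Love Soda is "Y" and scoring only the two remaining features. As before, this is a sketch with illustrative names, using the weighted-average Gini definition.

```python
from collections import Counter

# Labeled rows: (love_popcorn, love_soda, gender, love_ice_cream)
ROWS = [
    ("Y", "Y", "M", "N"),
    ("Y", "N", "F", "N"),
    ("N", "Y", "M", "Y"),
    ("N", "Y", "M", "Y"),
    ("Y", "Y", "F", "Y"),
    ("Y", "N", "F", "N"),
    ("N", "N", "M", "N"),
]
COLS = {"Love Popcorn": 0, "Love Soda": 1, "Gender": 2}
TARGET = 3


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, col):
    leaves = {}
    for row in rows:
        leaves.setdefault(row[col], []).append(row[TARGET])
    n = len(rows)
    return sum(len(leaf) / n * gini(leaf) for leaf in leaves.values())


# Rows that reach the intermediate node below the root (Love Soda = Y).
subset = [row for row in ROWS if row[COLS["Love Soda"]] == "Y"]

# Love Soda has been used already, so only the other features are candidates.
remaining = {name: col for name, col in COLS.items() if name != "Love Soda"}
scores = {name: weighted_gini(subset, col) for name, col in remaining.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
# Love Popcorn: 0.250
# Gender: 0.333
```

Love Popcorn has the lower impurity on this subset, so it is the next split under the Love Soda = Y branch.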
  • 20. Decision Tree Figure: Final Decision tree
      Love Popcorn | Love Soda | Gender | Love Ice cream
      Y            | Y         | M      | N
      Y            | N         | F      | N
      N            | Y         | M      | Y
      N            | Y         | M      | Y
      Y            | Y         | F      | Y
      Y            | N         | F      | N
      N            | N         | M      | N
      Y            | Y         | M      | ?
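Putting the steps together, a small recursive builder can grow a tree from the toy data and classify the unlabeled row. This is only a sketch under simple assumptions (split on the lowest weighted Gini, stop when a node is pure or no features remain); it is not necessarily the exact tree drawn in the slide's figure, and all names are hypothetical.

```python
from collections import Counter

# Labeled rows: (love_popcorn, love_soda, gender, love_ice_cream)
ROWS = [
    ("Y", "Y", "M", "N"),
    ("Y", "N", "F", "N"),
    ("N", "Y", "M", "Y"),
    ("N", "Y", "M", "Y"),
    ("Y", "Y", "F", "Y"),
    ("Y", "N", "F", "N"),
    ("N", "N", "M", "N"),
]
COLS = {"Love Popcorn": 0, "Love Soda": 1, "Gender": 2}
TARGET = 3


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, col):
    leaves = {}
    for row in rows:
        leaves.setdefault(row[col], []).append(row[TARGET])
    return sum(len(leaf) / len(rows) * gini(leaf) for leaf in leaves.values())


def build_tree(rows, cols):
    """Return a class label (leaf) or a dict describing a split node."""
    labels = [row[TARGET] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not cols:  # pure node, or nothing left to split on
        return majority
    best = min(cols, key=lambda name: weighted_gini(rows, cols[name]))
    rest = {name: col for name, col in cols.items() if name != best}
    branches = {}
    for value in sorted({row[cols[best]] for row in rows}):
        part = [row for row in rows if row[cols[best]] == value]
        branches[value] = build_tree(part, rest)
    return {"feature": best, "index": cols[best],
            "branches": branches, "default": majority}


def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["index"]], tree["default"])
    return tree


tree = build_tree(ROWS, COLS)
print(tree["feature"])                 # Love Soda
print(predict(tree, ("Y", "Y", "M")))  # N
```

For the row (Love Popcorn = Y, Love Soda = Y, Gender = M), this sketch follows Love Soda = Y, then Love Popcorn = Y, then Gender = M and predicts "N".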
  • 21. Decision Tree Over fitting Problem: Decision trees are prone to over fitting because of the high variance of the outcome they produce, which makes their results unstable on new data. It can be overcome with the following methods: Pre Pruning: tune hyperparameters that limit tree growth (for example the maximum depth) while fitting the decision tree classifier. Post Pruning: grow the full tree first, then prune it back with the cost-complexity pruning (CCP) alpha parameter.
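Pre-pruning can be sketched with scikit-learn's `DecisionTreeClassifier` by limiting growth at fit time. The values `max_depth=3` and `min_samples_leaf=5` below are arbitrary illustrative choices, not settings from the slides; in practice they would be tuned, for example with a grid search.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unrestricted tree: grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: growth is capped while fitting (illustrative values).
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pruned)]:
    print(
        f"{name}: nodes={model.tree_.node_count}, depth={model.get_depth()}, "
        f"train={model.score(X_train, y_train):.3f}, "
        f"test={model.score(X_test, y_test):.3f}"
    )
```

Comparing train and test accuracy for the two models shows how much of the full tree's training accuracy comes from memorizing the training set.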
  • 22. Hands-On Decision Tree
      %matplotlib inline
      import matplotlib.pyplot as plt
      from sklearn.datasets import load_breast_cancer
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier
      # Load the data and hold out a test set
      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
      # Fit an unpruned tree and measure its test accuracy
      clf = DecisionTreeClassifier(random_state=0)
      clf.fit(X_train, y_train)
      pred = clf.predict(X_test)
      accuracy_score(y_test, pred)
  • 23. Hands-On Decision Tree
      from sklearn import tree
      plt.figure(figsize=(15, 10))
      tree.plot_tree(clf, filled=True)
  • 24. Hands-On Decision Tree
      # Candidate alphas for cost-complexity pruning
      path = clf.cost_complexity_pruning_path(X_train, y_train)
      ccp_alphas, impurities = path.ccp_alphas, path.impurities
      ccp_alphas
      # Fit one tree per candidate alpha
      clfs = []
      for ccp_alpha in ccp_alphas:
          clf = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
          clf.fit(X_train, y_train)
          clfs.append(clf)
      print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
          clfs[-1].tree_.node_count, ccp_alphas[-1]))
      # Compare train and test accuracy across alphas
      train_scores = [clf.score(X_train, y_train) for clf in clfs]
      test_scores = [clf.score(X_test, y_test) for clf in clfs]
      fig, ax = plt.subplots()
      ax.set_xlabel("alpha")
      ax.set_ylabel("accuracy")
      ax.set_title("Accuracy vs alpha for training and testing sets")
      ax.plot(ccp_alphas, train_scores, marker='o', label="train", drawstyle="steps-post")
      ax.plot(ccp_alphas, test_scores, marker='o', label="test", drawstyle="steps-post")
      ax.legend()
      plt.show()
  • 25. Hands-On Decision Tree
      from sklearn import tree
      from sklearn.metrics import accuracy_score
      # Refit with the alpha read off the accuracy vs alpha plot
      clf = DecisionTreeClassifier(random_state=0, ccp_alpha=0.012)
      clf.fit(X_train, y_train)
      pred = clf.predict(X_test)
      accuracy_score(y_test, pred)
      # Draw the pruned tree
      plt.figure(figsize=(15, 10))
      tree.plot_tree(clf, filled=True)
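Reading `ccp_alpha` off a plot of test accuracy, as above, uses the test set for model selection. One possible alternative, sketched here as an assumption rather than the slides' method, is to pick the alpha by cross-validation on the training data only and keep the test set for the final check.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alphas from the pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree to a single node) and clip tiny
# negative values that floating-point error can produce.
candidates = np.unique(np.clip(path.ccp_alphas[:-1], 0, None))

# Mean 5-fold cross-validated accuracy for each candidate alpha.
cv_means = []
for alpha in candidates:
    model = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    cv_means.append(cross_val_score(model, X_train, y_train, cv=5).mean())

best_alpha = candidates[int(np.argmax(cv_means))]
final = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
final.fit(X_train, y_train)
print(f"best ccp_alpha={best_alpha:.4f}, test accuracy={final.score(X_test, y_test):.3f}")
```

The selected alpha may differ from the 0.012 used on the slide, since it depends on the folds and on the scikit-learn version.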
  • 26. Random Forest: Definition: Random forest is an ensemble technique based on BAGGING (Bootstrap Aggregation): many decision trees are trained on bootstrap samples of the data and their predictions are combined. It works on the principle of the “Wisdom of the Crowd”. Why Random Forest? Random forests are mostly used to overcome the over fitting of a single decision tree classifier: combining many trees reduces the variance problem of a decision tree and usually gives more accurate results.
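The bagging idea can be sketched by hand: draw bootstrap samples, fit one tree per sample with a random subset of features considered at each split, and take a majority vote. The number of trees (25) and the seeds are arbitrary choices for illustration; scikit-learn's `RandomForestClassifier` packages the same idea in a single estimator.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a majority vote over two classes cannot tie

trees = []
for i in range(n_trees):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Aggregate: each tree votes, and the majority class wins (labels are 0/1).
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
forest_acc = (forest_pred == y_test).mean()

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"single tree: {single.score(X_test, y_test):.3f}")
print(f"bagged trees: {forest_acc:.3f}")
```

Because each tree sees a different sample and different features, their individual errors are less correlated, and averaging the votes lowers the variance of the combined prediction.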
  • 28. Decision Tree & Random Forest Decision Tree: Advantages: 1. Simple and easy to implement, like IF-ELSE statements 2. Easy to visualize and understand 3. Used for Classification as well as for Regression Disadvantages: 1. Over fitting 2. Unstable results 3. Sensitive to noisy data 4. Less effective with large datasets
  • 29. Decision Tree & Random Forest Random Forest: Advantages: 1. Overcomes the over fitting problem of a decision tree 2. Used for Classification as well as for Regression Disadvantages: 1. Higher training time than a decision tree 2. Less effective with small datasets 3. Requires more computation power and resources
  • 30. Decision Tree & Random Forest References: • https://en.wikipedia.org/wiki/Decision_tree • https://en.wikipedia.org/wiki/Random_forest • https://en.wikipedia.org/wiki/Random_forest#Bagging • https://en.wikipedia.org/wiki/Decision_tree#Association_rule_induction • https://en.wikipedia.org/wiki/Decision_tree#Advantages_and_disadvantages • https://en.wikipedia.org/wiki/Machine_learning • https://en.wikipedia.org/wiki/Machine_learning#Artificial_intelligence • https://en.wikipedia.org/wiki/Machine_learning#Overfitting • https://www.abrisconsult.com/artificial-intelligence-and-data-science/
  • 31. Decision Tree & Random Forest