Decision Tree
A decision tree is a graph that represents choices and their
results in the form of a tree.
The nodes in the graph represent an event or choice, and
the edges represent the decision rules or conditions.
Decision trees are widely used in machine learning and data
mining; the examples here use R.
Examples
• predicting whether an email is spam or not spam.
• predicting whether a tumour is cancerous or not.
• predicting whether a loan is a good or bad credit risk.
In each case, the prediction is based on the factors observed
for that case.
• Generally, a model is created with observed data
also called training data.
• Then a set of validation data is used to verify and
improve the model.
• R has packages which are used to create and
visualize decision trees.
• For a new set of predictor variables, we use this model
to arrive at a decision on the category (yes/no,
spam/not spam) of the data.
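The train, validate and predict workflow described above can be sketched as follows. This is only an illustration and not the deck's own example: the built-in iris data set, the rpart package, the 70/30 split and the fixed seed are all assumptions chosen here.

```r
library(rpart)

set.seed(42)  # assumption: fixed seed so the split is reproducible

# Split the observed data into training and validation sets (70/30).
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Create the model from the training data.
fit <- rpart(Species ~ ., data = train, method = "class")

# Use the validation data to check how well the model predicts the category.
pred <- predict(fit, newdata = valid, type = "class")
accuracy <- mean(pred == valid$Species)

print(table(predicted = pred, actual = valid$Species))
print(accuracy)
```

If the validation accuracy is poor, the model settings can be adjusted and the tree refitted before it is used on genuinely new data.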
• The R package "party" is used to create decision
trees.
• Install R Package
• Use the command below in the R console to install
the package. Dependent packages, if any, must be installed
as well (install.packages() normally does this automatically).
install.packages("party")
• The package "party" provides the function ctree(), which
fits a conditional inference tree by recursive partitioning
and is used to create and analyze decision trees.
• Syntax
• The basic syntax for creating a decision tree in R is:
ctree(formula, data)
• Following is the description of the parameters used:
• formula is a formula describing the response and
predictor variables (response ~ predictors).
• data is the name of the data set used.
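As a small illustration of the formula argument, the sketch below builds a formula in base R and pulls out its response and predictors. The variable names are the ones used in the readingSkills example in this deck; the helper names response and predictors are introduced here for illustration only.

```r
# A model formula: response on the left of ~, predictors on the right.
f <- nativeSpeaker ~ age + shoeSize + score

# The first variable in the formula is the response.
response <- all.vars(f)[1]

# The term labels are the predictors.
predictors <- attr(terms(f), "term.labels")

print(response)
print(predictors)
```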
Input Data
• We will use the data set named readingSkills, which
ships with the "party" package, to create a decision tree.
• It contains the variables "nativeSpeaker", "age",
"shoeSize" and "score"; we use age, shoe size and score to
predict whether the person is a native speaker or not.
• Here is the sample data.
# Load the party package. It will automatically load other
# dependent packages.
library(party)
# Print some records from data set readingSkills.
print(head(readingSkills))
• Example
• We will use the ctree() function to create the
decision tree and see its graph.
# Load the party package. It will automatically load other
# dependent packages.
library(party)
# Create the input data frame.
InputData <- readingSkills[c(1:105),]
# Give the chart file a name. It is the name of the output file.
png(file = "decision_tree.png")
# Create the tree.
outputTree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = InputData)
# Plot the tree.
plot(outputTree)
# Save the file.
dev.off()
• When we execute the above code, it produces the
following result:
null device
          1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’: as.Date, as.Date.numeric
Loading required package: sandwich
The resulting tree has 4 terminal nodes.
The input variables are age, shoeSize and score.
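Because the tree above is trained on the first 105 rows only, the remaining rows of readingSkills can serve as a check. The following is a sketch under that assumption; the names train_data, test_data and confusion are introduced here, and the code needs the party package to be installed.

```r
library(party)

# Rows 1..105 train the tree (as in the example); the rest are held out.
train_rows <- 1:105
train_data <- readingSkills[train_rows, ]
test_data  <- readingSkills[-train_rows, ]

tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train_data)

# Predict the class of each held-out row and compare with the true class.
pred <- predict(tree, newdata = test_data)
confusion <- table(predicted = pred, actual = test_data$nativeSpeaker)
print(confusion)
```

The diagonal of the table counts the held-out rows that the tree classified correctly.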
• Load the library
library(rpart)
• Create a data frame holding the new observation to classify
nativeSpeaker_find <- data.frame("age" = 11, "shoeSize" = 30.63692, "score" = 55.721149)
• Create an rpart object "fit" (readingSkills comes from the
party package loaded earlier)
fit <- rpart(nativeSpeaker ~ age + shoeSize + score, data = readingSkills)
• Use the predict function
prediction <- predict(fit, newdata = nativeSpeaker_find, type = "class")
• Print the return value from the predict function
print(prediction)
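With type = "class", predict() returns a class label. The same function can also return class probabilities with type = "prob". The sketch below shows both; it uses the kyphosis data set that ships with rpart, which is an assumption made here so the example runs without the party package.

```r
library(rpart)

# Classification tree on the kyphosis data bundled with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Treat the first three rows as "new" observations for illustration.
new_obs <- kyphosis[1:3, ]

labels <- predict(fit, newdata = new_obs, type = "class")  # predicted class
probs  <- predict(fit, newdata = new_obs, type = "prob")   # class probabilities

print(labels)
print(probs)
```

Each row of probs sums to 1, and the predicted label is the class with the larger probability.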
• R’s rpart package provides a powerful framework
for growing classification and regression trees. To
see how it works, let’s get started with a minimal
example.
• First let’s define a problem.
• There’s a common scam amongst motorists
whereby a person will slam on his brakes in heavy
traffic with the intention of being rear-ended.
• The person will then file an insurance claim for
personal injury and damage to his vehicle, alleging
that the other driver was at fault.
• Suppose we want to predict which of an insurance
company’s claims are fraudulent using a decision
tree.
• To start, we need to build a training set of claims
whose fraud status is known.
train <- data.frame(ClaimID = c(1,2,3), RearEnd = c(TRUE, FALSE, TRUE), Fraud = c(TRUE, FALSE, TRUE))
• In order to grow our decision tree, we first have to load the rpart
package. Then we can use the rpart() function, specifying the model
formula, data, and method parameters.
• In this case, we want to classify the feature Fraud using the
predictor RearEnd, so our call to rpart() looks like this:
library(rpart)
mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class")
mytree
• Notice the output shows only a root node.
• This is because rpart has some default parameters that prevented our
tree from growing, namely minsplit and minbucket. minsplit is “the
minimum number of observations that must exist in a node in order for a
split to be attempted” and minbucket is “the minimum number of
observations in any terminal node”.
• Setting both to small values lets the tree split:
mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class", minsplit = 2, minbucket = 1)
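To see why the first call produced only a root node, it can help to inspect the defaults directly. The sketch below reads them from rpart.control() and compares the number of nodes in the two trees; the names default_tree and small_tree are introduced here for illustration.

```r
library(rpart)

# Default stopping parameters used by rpart().
defaults <- rpart.control()
print(defaults$minsplit)
print(defaults$minbucket)

train <- data.frame(ClaimID = c(1, 2, 3),
                    RearEnd = c(TRUE, FALSE, TRUE),
                    Fraud   = c(TRUE, FALSE, TRUE))

# With only 3 observations, the default minsplit blocks any split.
default_tree <- rpart(Fraud ~ RearEnd, data = train, method = "class")

# Relaxing the limits allows the single split on RearEnd.
small_tree <- rpart(Fraud ~ RearEnd, data = train, method = "class",
                    control = rpart.control(minsplit = 2, minbucket = 1))

# Each row of $frame is one node of the tree.
print(nrow(default_tree$frame))
print(nrow(small_tree$frame))
```

Passing the values through control = rpart.control(...) is equivalent to passing minsplit and minbucket directly to rpart().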
Now our tree has a root node, one split and two leaves
(terminal nodes).
Observe that rpart encoded our boolean variable as an
integer (false = 0, true = 1).
We can plot mytree by loading the rattle package (and
some helper packages) and using
the fancyRpartPlot() function.
library(rattle)
library(rpart.plot)
library(RColorBrewer)
# plot mytree
fancyRpartPlot(mytree, caption = NULL)
• The decision tree correctly identified that if a
claim involved a rear-end collision, the claim was
most likely fraudulent.
• By default, rpart uses Gini impurity to choose splits. To split
on information gain instead, pass parms = list(split = 'information'):
mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class", parms = list(split = 'information'), minsplit = 2, minbucket = 1)
mytree
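The two splitting criteria can be computed by hand for the tiny fraud data set. The helper functions gini() and entropy() below are written here for illustration and are not part of rpart; entropy uses log base 2 by convention, which is an assumption (the base does not change which split is chosen).

```r
# Gini impurity: 1 - sum of squared class proportions.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Entropy: -sum(p * log2(p)) over the classes that are present.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

fraud <- c(TRUE, FALSE, TRUE)   # Fraud column of the training set

# Root node: two fraudulent claims, one legitimate claim.
print(gini(fraud))
print(entropy(fraud))

# After splitting on RearEnd, each leaf holds a single class.
rear_end_leaf <- fraud[c(1, 3)]
print(gini(rear_end_leaf))
print(entropy(rear_end_leaf))
```

Both measures drop to zero in the leaves, which is why either criterion selects the split on RearEnd here.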
Example on the cars data set
fit <- rpart(speed ~ dist, data = cars)
fit
plot(fit)
text(fit, use.n = TRUE)
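Since speed is numeric, rpart fits a regression tree here rather than a classification tree. The sketch below confirms the method rpart chose and predicts the speed for one new stopping distance; the value dist = 50 and the name new_car are assumptions made for illustration.

```r
library(rpart)

fit <- rpart(speed ~ dist, data = cars)

# For a numeric response rpart selects the "anova" (regression) method.
print(fit$method)

# Predict speed for a hypothetical stopping distance of 50.
new_car <- data.frame(dist = 50)
pred <- predict(fit, newdata = new_car)
print(pred)
```

The prediction is the mean speed of the training observations that fall into the same leaf as the new observation.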
How to Use optim Function in R
• optim() performs general-purpose optimization: it searches for the
parameter values that minimize (or maximize) a function.
• optim(par, fn, data, ...)
• where:
• par: Initial values for the parameters to be optimized over
• fn: A function to be minimized (the default) or maximized; optim passes
the vector of parameters to it as the first unnamed argument
• data: The name of the object in R that contains the data; it is passed
through to fn as an extra named argument
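Before the regression examples, a minimal sketch of optim() on a function with a known minimum may help. The function bowl and its minimum at (3, -1) are made up here for illustration. Maximization uses control = list(fnscale = -1), which is a documented optim option.

```r
# A simple function of two parameters with its minimum at (3, -1).
bowl <- function(par) (par[1] - 3)^2 + (par[2] + 1)^2

# Minimize, starting from (0, 0); the default method is Nelder-Mead.
res <- optim(par = c(0, 0), fn = bowl)
print(res$par)          # estimated parameters
print(res$value)        # function value at the estimate
print(res$convergence)  # 0 indicates successful convergence

# To maximize instead, flip the scale with fnscale = -1.
hill <- function(par) -bowl(par)
res_max <- optim(par = c(0, 0), fn = hill, control = list(fnscale = -1))
print(res_max$par)
```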
• The following examples show how to use this function in the
following scenarios:
• 1. Find coefficients for a linear regression model.
• 2. Find coefficients for a quadratic regression model.
Find Coefficients for Linear Regression Model
• The following code shows how to use
the optim() function to find the coefficients for a
linear regression model by minimizing the residual
sum of squares:
# create data frame
df <- data.frame(x=c(1, 3, 3, 5, 6, 7, 9, 12), y=c(4, 5, 8, 6, 9, 10, 13, 17))
# define function to minimize residual sum of squares
min_residuals <- function(data, par) {
  with(data, sum((par[1] + par[2] * x - y)^2))
}
# find coefficients of linear regression model
optim(par=c(0, 1), fn=min_residuals, data=df)
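The linear result can be checked against lm() in the same way the deck later checks the quadratic model. The sketch below reuses the data from the example; the names rss, fit_optim and fit_lm are introduced here.

```r
df <- data.frame(x = c(1, 3, 3, 5, 6, 7, 9, 12),
                 y = c(4, 5, 8, 6, 9, 10, 13, 17))

# Residual sum of squares for intercept par[1] and slope par[2].
rss <- function(data, par) {
  with(data, sum((par[1] + par[2] * x - y)^2))
}

# Numerical minimization versus the closed-form least-squares fit.
fit_optim <- optim(par = c(0, 1), fn = rss, data = df)
fit_lm <- lm(y ~ x, data = df)

print(fit_optim$par)
print(coef(fit_lm))
```

The two sets of coefficients should agree closely; small differences come from optim() stopping at a numerical tolerance.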
Find Coefficients for Quadratic Regression Model
• The following code shows how to use the optim() function to find the
coefficients for a quadratic regression model by minimizing the residual sum
of squares:
# create data frame
df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60), y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
# define function to minimize residual sum of squares
min_residuals <- function(data, par) {
  with(data, sum((par[1] + par[2]*x + par[3]*x^2 - y)^2))
}
# find coefficients of quadratic regression model
optim(par=c(0, 0, 0), fn=min_residuals, data=df)
• Using the values returned under $par, we can
write the following fitted quadratic regression
model:
• y = -18.261 + 6.744x - 0.101x^2
• We can verify this is correct by using the built-in
lm() function in R:
# create data frame
df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60), y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
# create a new variable for x^2
df$x2 <- df$x^2
# fit quadratic regression model
quadraticModel <- lm(y ~ x + x2, data=df)
# display coefficients of quadratic regression model
summary(quadraticModel)$coef
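As an alternative to creating the x2 column, the squared term can be written inside the formula with I(). The sketch below fits both versions on the same data and shows that they give the same coefficients; the names model_column and model_inline are introduced here.

```r
df <- data.frame(x = c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 y = c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

# Version 1: explicit squared column, as in the example above.
df$x2 <- df$x^2
model_column <- lm(y ~ x + x2, data = df)

# Version 2: I() protects ^ so it means arithmetic squaring in the formula.
model_inline <- lm(y ~ x + I(x^2), data = df)

print(coef(model_column))
print(coef(model_inline))
```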
What are appropriate problems for Decision tree learning?
• Although a variety of decision-tree learning methods have been
developed with somewhat differing capabilities and
requirements, decision-tree learning is generally best suited to
problems with the following characteristics:
1. Instances are represented by attribute-value pairs.
• “Instances are described by a fixed set of attributes (e.g.,
Temperature) and their values (e.g., Hot).
• The easiest situation for decision tree learning is when each
attribute takes on a small number of disjoint possible values (e.g.,
Hot, Mild, Cold).
• However, extensions to the basic algorithm allow handling real-
valued attributes as well (e.g., representing Temperature
numerically).”
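Example
# student names (attribute) and ages (target); both vectors must have the
# same length, so only the first six ages from the original list are kept
snames <- c("ram", "shyam", "tina", "simi", "rahul", "raj")
sage <- c(17, 16, 17, 18, 16, 16)
d <- data.frame(snames = factor(snames), sage = sage)
s <- ctree(sage ~ snames, data = d)
s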
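The Temperature example in the text can be written out as attribute-value pairs in R. The sketch below turns a real-valued attribute into the disjoint values Cold, Mild and Hot with cut(); the readings and the thresholds 10 and 25 are hypothetical values chosen here for illustration.

```r
# Hypothetical temperature readings in degrees Celsius.
temperature_c <- c(5, 18, 30, 12, 27)

# Discretize into three disjoint values; the thresholds are assumptions.
temperature_level <- cut(temperature_c,
                         breaks = c(-Inf, 10, 25, Inf),
                         labels = c("Cold", "Mild", "Hot"))

# Each row is one instance described by attribute-value pairs.
instances <- data.frame(temperature_c, temperature_level)
print(instances)
```

A tree-growing function can use either column: the factor gives a small set of discrete values, while the numeric column is split on thresholds chosen by the algorithm.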
2. The target function has discrete output values.
• “The decision tree is usually used for Boolean
classification (e.g., yes or no).
• Decision tree methods easily extend to
learning functions with more than two
possible output values.
• A more substantial extension allows learning
target functions with real-valued outputs,
though the application of decision trees in this
setting is less common.”
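Example
# StuDummy.csv has the columns student.name, annual.attendance, annual.score, eligible
r <- read.csv("StuDummy.csv")
# treat eligible as a categorical target so that ctree fits a classification tree
r$eligible <- factor(r$eligible)
fit <- ctree(eligible ~ annual.attendance + annual.score, data = r)
fit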

More Related Content

What's hot

Factors.pptx
Factors.pptxFactors.pptx
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in REshwar Sai
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
FAO
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
Yugal Kumar
 
Loops in R
Loops in RLoops in R
Loops in R
Chris Orwa
 
Linear Regression.pptx
Linear Regression.pptxLinear Regression.pptx
Linear Regression.pptx
Ramakrishna Reddy Bijjam
 
NUMPY-2.pptx
NUMPY-2.pptxNUMPY-2.pptx
NUMPY-2.pptx
MahendraVusa
 
All pairs shortest path algorithm
All pairs shortest path algorithmAll pairs shortest path algorithm
All pairs shortest path algorithmSrikrishnan Suresh
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
Andrew Ferlitsch
 
Python : Data Types
Python : Data TypesPython : Data Types
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
Sourabh Sahu
 
Branch and bound
Branch and boundBranch and bound
Branch and bound
Nv Thejaswini
 
Graph coloring problem
Graph coloring problemGraph coloring problem
Graph coloring problem
V.V.Vanniaperumal College for Women
 
Data Reduction
Data ReductionData Reduction
Data Reduction
Rajan Shah
 
The Design and Analysis of Algorithms.pdf
The Design and Analysis of Algorithms.pdfThe Design and Analysis of Algorithms.pdf
The Design and Analysis of Algorithms.pdf
Saqib Raza
 
Import Data using R
Import Data using R Import Data using R
Import Data using R
Rupak Roy
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
Bill Liu
 
9. chapter 8 np hard and np complete problems
9. chapter 8   np hard and np complete problems9. chapter 8   np hard and np complete problems
9. chapter 8 np hard and np complete problems
Jyotsna Suryadevara
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
Piyush rai
 

What's hot (20)

Factors.pptx
Factors.pptxFactors.pptx
Factors.pptx
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in R
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Loops in R
Loops in RLoops in R
Loops in R
 
Linear Regression.pptx
Linear Regression.pptxLinear Regression.pptx
Linear Regression.pptx
 
NUMPY-2.pptx
NUMPY-2.pptxNUMPY-2.pptx
NUMPY-2.pptx
 
All pairs shortest path algorithm
All pairs shortest path algorithmAll pairs shortest path algorithm
All pairs shortest path algorithm
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
 
Python : Data Types
Python : Data TypesPython : Data Types
Python : Data Types
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Branch and bound
Branch and boundBranch and bound
Branch and bound
 
Graph coloring problem
Graph coloring problemGraph coloring problem
Graph coloring problem
 
Data Reduction
Data ReductionData Reduction
Data Reduction
 
The Design and Analysis of Algorithms.pdf
The Design and Analysis of Algorithms.pdfThe Design and Analysis of Algorithms.pdf
The Design and Analysis of Algorithms.pdf
 
Import Data using R
Import Data using R Import Data using R
Import Data using R
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
 
9. chapter 8 np hard and np complete problems
9. chapter 8   np hard and np complete problems9. chapter 8   np hard and np complete problems
9. chapter 8 np hard and np complete problems
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
 

Similar to Decision Tree.pptx

Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
Ramakrishna Reddy Bijjam
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Yao Yao
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
AmanBhalla14
 
SMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachSMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachReza Rahimi
 
R decision tree
R   decision treeR   decision tree
R decision tree
Learnbay Datascience
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
vikassingh569137
 
Slides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSlides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MD
SonaCharles2
 
17641.ppt
17641.ppt17641.ppt
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
SujaAldrin
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptx
rani marri
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and Classification
Brigitte Mueller
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
OllieShoresna
 
How to obtain and install R.ppt
How to obtain and install R.pptHow to obtain and install R.ppt
How to obtain and install R.ppt
rajalakshmi5921
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
Rajendran
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessing
AbdurRazzaqe1
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
Zihui Li
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
The Statistical and Applied Mathematical Sciences Institute
 
CS301-lec01.ppt
CS301-lec01.pptCS301-lec01.ppt
CS301-lec01.ppt
omair31
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013
Sanjeev Mishra
 

Similar to Decision Tree.pptx (20)

Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
SMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachSMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning Approach
 
R decision tree
R   decision treeR   decision tree
R decision tree
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Slides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSlides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MD
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptx
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and Classification
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
 
How to obtain and install R.ppt
How to obtain and install R.pptHow to obtain and install R.ppt
How to obtain and install R.ppt
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessing
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
 
CS301-lec01.ppt
CS301-lec01.pptCS301-lec01.ppt
CS301-lec01.ppt
 
cluster(python)
cluster(python)cluster(python)
cluster(python)
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013
 

More from Ramakrishna Reddy Bijjam

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
Ramakrishna Reddy Bijjam
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
Ramakrishna Reddy Bijjam
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
Ramakrishna Reddy Bijjam
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
Ramakrishna Reddy Bijjam
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
Ramakrishna Reddy Bijjam
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
Ramakrishna Reddy Bijjam
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
Ramakrishna Reddy Bijjam
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
Ramakrishna Reddy Bijjam
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
Ramakrishna Reddy Bijjam
 
Pandas.pptx
Pandas.pptxPandas.pptx
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
Ramakrishna Reddy Bijjam
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
Ramakrishna Reddy Bijjam
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
Ramakrishna Reddy Bijjam
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
Ramakrishna Reddy Bijjam
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
Ramakrishna Reddy Bijjam
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
Ramakrishna Reddy Bijjam
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
Ramakrishna Reddy Bijjam
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
Ramakrishna Reddy Bijjam
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
Ramakrishna Reddy Bijjam
 
Apriori.pptx
Apriori.pptxApriori.pptx

More from Ramakrishna Reddy Bijjam (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
 
Pandas.pptx
Pandas.pptxPandas.pptx
Pandas.pptx
 
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
 
Apriori.pptx
Apriori.pptxApriori.pptx
Apriori.pptx
 

Recently uploaded

FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Decision Tree.pptx

  • 1. Decision Tree
      • A decision tree is a graph that represents choices and their results in the form of a tree.
      • The nodes of the graph represent an event or choice, and the edges represent the decision rules or conditions.
      • Decision trees are widely used in machine learning and data mining applications; these slides build them in R.
  • 2. Examples
      • Predicting whether an email is spam or not spam.
      • Predicting whether a tumour is cancerous or not.
      • Predicting whether a loan is a good or bad credit risk, based on the factors in each case.
      • Generally, a model is created from observed data, also called training data.
      • A set of validation data is then used to verify and improve the model.
      • R has packages for creating and visualizing decision trees.
      • For a new set of predictor values, we use this model to decide the category (yes/no, spam/not spam) of the data.
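The train-then-validate workflow described on this slide can be sketched as follows. This is a minimal illustration, not part of the original deck: it assumes the rpart package and the built-in iris data set, and the 70/30 split, the seed and all variable names are arbitrary choices made for the example.

```r
# Sketch: fit a tree on training data, then check it on validation data.
library(rpart)

set.seed(42)                                   # arbitrary seed, for reproducibility
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))  # assumed 70/30 split

train_data <- iris[train_idx, ]
valid_data <- iris[-train_idx, ]

# Build the model from the training data only.
model <- rpart(Species ~ ., data = train_data, method = "class")

# Use the held-out validation data to verify the model.
predicted <- predict(model, newdata = valid_data, type = "class")
confusion <- table(actual = valid_data$Species, predicted = predicted)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

If the validation accuracy is poor, the usual next step is to adjust the tree's control parameters and refit on the training data, then validate again.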
  • 3. Install R Package
      • The R package "party" is used to create decision trees.
      • Use the command below in the R console to install the package. The packages it depends on are installed along with it by default.
      install.packages("party")
      • The package "party" provides the function ctree(), which builds a conditional inference tree by recursive partitioning and is used to create and analyze decision trees.
  • 4. Syntax
      • The basic syntax for creating a decision tree in R is:
      ctree(formula, data)
      • The parameters are:
      • formula is a formula describing the response and predictor variables.
      • data is the data set (data frame) that contains those variables.
    Input Data
      • We will use the data set readingSkills, which ships with the party package, to create a decision tree.
      • It records each person's "age", "shoeSize" and reading "score", together with whether the person is a native speaker ("nativeSpeaker"), so a tree can predict native-speaker status from the other three variables.
      • Here is the sample data.
  • 5.
      # Load the party package. It will automatically load other
      # dependent packages.
      library(party)
      # Print some records from data set readingSkills.
      print(head(readingSkills))
    Example
      • We will use the ctree() function to create the decision tree and see its graph.
  • 6.
      # Load the party package. It will automatically load other
      # dependent packages.
      library(party)
      # Create the input data frame.
      InputData <- readingSkills[c(1:105), ]
      # Give the chart file a name. It is the name of the output file.
      png(file = "decision_tree.png")
      # Create the tree.
      outputTree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = InputData)
      # Plot the tree.
      plot(outputTree)
      # Save the file.
      dev.off()
  • 7. When we execute the above code, it produces the following result:
      null device
                1
      Loading required package: methods
      Loading required package: grid
      Loading required package: mvtnorm
      Loading required package: modeltools
      Loading required package: stats4
      Loading required package: strucchange
      Loading required package: zoo
      Attaching package: ‘zoo’
      The following objects are masked from ‘package:base’:
          as.Date, as.Date.numeric
      Loading required package: sandwich
  • 8. The resulting tree has 4 terminal nodes. The input variables are age, shoeSize and score.
  • 9.
      # Load the library. readingSkills comes from the party package.
      library(rpart)
      data("readingSkills", package = "party")
      # New observation to classify.
      nativeSpeaker_find <- data.frame(age = 11, shoeSize = 30.63692, score = 55.721149)
      # Create an rpart object "fit".
      fit <- rpart(nativeSpeaker ~ age + shoeSize + score, data = readingSkills)
      # Use the predict function.
      prediction <- predict(fit, newdata = nativeSpeaker_find, type = "class")
      # Print the return value from the predict function.
      print(prediction)
  • 10.
      • R’s rpart package provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example.
      • First let’s define a problem.
      • There’s a common scam amongst motorists whereby a person will slam on his brakes in heavy traffic with the intention of being rear-ended.
      • The person will then file an insurance claim for personal injury and damage to his vehicle, alleging that the other driver was at fault.
      • Suppose we want to predict which of an insurance company’s claims are fraudulent using a decision tree.
  • 11.
      • To start, we need to build a training set of claims whose fraud status is known.
      train <- data.frame(ClaimID = c(1, 2, 3), RearEnd = c(TRUE, FALSE, TRUE), Fraud = c(TRUE, FALSE, TRUE))
      • In order to grow our decision tree, we first have to load the rpart package. Then we can use the rpart() function, specifying the model formula, data, and method parameters.
      • In this case, we want to classify the response Fraud using the predictor RearEnd, so our call to rpart() is:
      library(rpart)
      mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class")
      mytree
      • Notice the output shows only a root node.
      • This is because rpart has some default parameters that prevented our tree from growing, namely minsplit and minbucket.
      • minsplit is “the minimum number of observations that must exist in a node in order for a split to be attempted” and minbucket is “the minimum number of observations in any terminal node”.
  • 12.
      mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class", minsplit = 2, minbucket = 1)
      • Now our tree has a root node, one split and two leaves (terminal nodes).
      • Observe that rpart encoded our boolean variable as an integer (false = 0, true = 1).
      • We can plot mytree by loading the rattle package (and some helper packages) and using the fancyRpartPlot() function.
      library(rattle)
      library(rpart.plot)
      library(RColorBrewer)
      # plot mytree
      fancyRpartPlot(mytree, caption = NULL)
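One way to confirm the effect of minsplit and minbucket is to count the rows of each tree's frame component, which holds one row per node. The sketch below rebuilds the three-row training set from the slides; the names default_tree and grown_tree are introduced here for illustration only.

```r
# Sketch: compare node counts with default and relaxed control parameters.
library(rpart)

train <- data.frame(
  ClaimID = c(1, 2, 3),
  RearEnd = c(TRUE, FALSE, TRUE),
  Fraud   = c(TRUE, FALSE, TRUE)
)

# Default minsplit is larger than our three observations, so no split is tried.
default_tree <- rpart(Fraud ~ RearEnd, data = train, method = "class")

# Relaxed limits allow a split on this tiny data set.
grown_tree <- rpart(Fraud ~ RearEnd, data = train, method = "class",
                    minsplit = 2, minbucket = 1)

# tree$frame has one row per node; leaves are marked "<leaf>" in the var column.
n_nodes_default <- nrow(default_tree$frame)
n_nodes_grown   <- nrow(grown_tree$frame)
n_leaves_grown  <- sum(grown_tree$frame$var == "<leaf>")

print(c(default = n_nodes_default, grown = n_nodes_grown, leaves = n_leaves_grown))
```

On real data these limits guard against overfitting, so lowering them this far only makes sense for a toy example.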
  • 13.
      • The decision tree correctly identified that if a claim involved a rear-end collision, the claim was most likely fraudulent.
      • By default, rpart() chooses splits using the Gini index. To split on information gain instead, set the parms argument:
      mytree <- rpart(Fraud ~ RearEnd, data = train, method = "class", parms = list(split = 'information'), minsplit = 2, minbucket = 1)
      mytree
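To make the two splitting criteria concrete, the sketch below computes Gini impurity and entropy by hand for the nodes of the toy fraud tree. The helper functions gini_impurity() and entropy() are written for this illustration and are not part of rpart; rpart's internal scaling of these measures may differ.

```r
# Sketch: node impurity measures behind the "gini" and "information" criteria.

# Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
gini_impurity <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes.
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Root node of the toy data: 2 fraudulent claims, 1 legitimate claim.
root <- c(fraud = 2, not_fraud = 1)
# Leaf reached when RearEnd is TRUE: both claims are fraudulent.
rear_end_leaf <- c(fraud = 2, not_fraud = 0)

print(c(gini = gini_impurity(root), entropy = entropy(root)))
print(c(gini = gini_impurity(rear_end_leaf), entropy = entropy(rear_end_leaf)))
```

Both measures are zero for a pure node, which is why the split on RearEnd is attractive under either criterion.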
  • 14. Example on the cars data set
      library(rpart)
      # speed is numeric, so rpart grows a regression tree.
      fit <- rpart(speed ~ dist, data = cars)
      fit
      plot(fit)
      text(fit, use.n = TRUE)
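Once the tree on the cars data is fitted, it can be used to predict speed for new stopping distances. The sketch below is an illustration only; the distances in new_obs are made-up values, not taken from the slides.

```r
# Sketch: predict from the regression tree fitted on the built-in cars data.
library(rpart)

fit <- rpart(speed ~ dist, data = cars)

# Hypothetical stopping distances to predict for.
new_obs <- data.frame(dist = c(10, 50, 100))

# For a regression tree, predict() returns the mean response of the leaf
# each observation falls into.
predicted_speed <- predict(fit, newdata = new_obs)
print(predicted_speed)
```

Because every prediction is a leaf mean, the fitted values form a step function of dist rather than a smooth curve.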
  • 15. How to Use the optim Function in R
      • optim() performs general-purpose optimization. It minimizes by default; to maximize, pass control = list(fnscale = -1).
      optim(par, fn, data, ...)
      • where:
      • par: initial values for the parameters to be optimized over
      • fn: the function to be minimized (or maximized); optim() passes it the current parameter vector as its first unnamed argument
      • data: the R object that contains the data; it is not a formal argument of optim() but is passed on to fn through ...
      • The following examples show how to use this function in two scenarios:
      • 1. Find coefficients for a linear regression model.
      • 2. Find coefficients for a quadratic regression model.
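Since optim() minimizes by default, maximization needs a negative fnscale in the control list. The sketch below uses a made-up two-parameter objective, objective(), whose maximum is at (1, -2); it is an illustration only and not one of the two regression scenarios.

```r
# Sketch: maximizing with optim() by setting fnscale = -1.

# Hypothetical objective with a single maximum of 0 at p = (1, -2).
objective <- function(p) {
  -((p[1] - 1)^2 + (p[2] + 2)^2)
}

result <- optim(par = c(0, 0), fn = objective,
                control = list(fnscale = -1))  # negative fnscale means maximize

print(result$par)          # estimated location of the maximum
print(result$value)        # objective value at that point
print(result$convergence)  # 0 indicates successful convergence
```

An equivalent approach is to minimize the negated objective, which is what fnscale = -1 does internally.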
  • 16. Find Coefficients for Linear Regression Model
      • The following code shows how to use the optim() function to find the coefficients for a linear regression model by minimizing the residual sum of squares:
      #create data frame
      df <- data.frame(x=c(1, 3, 3, 5, 6, 7, 9, 12),
                       y=c(4, 5, 8, 6, 9, 10, 13, 17))
      #define function to minimize residual sum of squares
      min_residuals <- function(data, par) {
        with(data, sum((par[1] + par[2] * x - y)^2))
      }
      #find coefficients of linear regression model
      optim(par=c(0, 1), fn=min_residuals, data=df)
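As a check on the linear fit, the values optim() returns under $par can be compared with the coefficients from lm(), in the same way the deck later verifies the quadratic model. This sketch reuses the data and objective from the slide; the names result and lm_coef are introduced here.

```r
# Sketch: compare optim() estimates with the closed-form least-squares fit.
df <- data.frame(x = c(1, 3, 3, 5, 6, 7, 9, 12),
                 y = c(4, 5, 8, 6, 9, 10, 13, 17))

# Residual sum of squares for intercept par[1] and slope par[2].
min_residuals <- function(data, par) {
  with(data, sum((par[1] + par[2] * x - y)^2))
}

# data = df is forwarded to min_residuals through optim()'s ... argument.
result <- optim(par = c(0, 1), fn = min_residuals, data = df)

# Reference solution from ordinary least squares.
lm_coef <- coef(lm(y ~ x, data = df))

print(result$par)
print(lm_coef)
```

The two sets of numbers should agree to a few decimal places; optim()'s default Nelder-Mead search is iterative, so small differences from the exact lm() solution are expected.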
  • 17. Find Coefficients for Quadratic Regression Model
      • The following code shows how to use the optim() function to find the coefficients for a quadratic regression model by minimizing the residual sum of squares:
      #create data frame
      df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                       y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
      #define function to minimize residual sum of squares
      min_residuals <- function(data, par) {
        with(data, sum((par[1] + par[2]*x + par[3]*x^2 - y)^2))
      }
      #find coefficients of quadratic regression model
      optim(par=c(0, 0, 0), fn=min_residuals, data=df)
  • 18.
      • Using the values returned under $par, we can write the following fitted quadratic regression model:
      y = -18.261 + 6.744x - 0.101x^2
      • We can verify this is correct by using the built-in lm() function in R:
  • 19.
      #create data frame
      df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                       y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
      #create a new variable for x^2
      df$x2 <- df$x^2
      #fit quadratic regression model
      quadraticModel <- lm(y ~ x + x2, data=df)
      #display coefficients of quadratic regression model
      summary(quadraticModel)$coef
  • 20. What are appropriate problems for Decision tree learning?
      • Although a variety of decision-tree learning methods have been developed with somewhat differing capabilities and requirements, decision-tree learning is generally best suited to problems with the following characteristics:
    1. Instances are represented by attribute-value pairs.
      • “Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).
      • The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold).
      • However, extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically).”
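  • 21. Example
      library(party)
      snames <- c("ram", "shyam", "tina", "simi", "rahul", "raj")
      # The original slide lists eight ages for six names; only the first six are used here.
      sage <- c(17, 16, 17, 18, 16, 16)
      # Keep sage numeric and make snames a factor (cbind() would turn both into text).
      d <- data.frame(snames = factor(snames), sage = sage)
      s <- ctree(sage ~ snames, data = d)
      s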
  • 22. 2. The target function has discrete output values.
      • “Decision trees are usually used for Boolean classification (e.g., yes or no).
      • Decision tree methods easily extend to learning functions with more than two possible output values.
      • A more substantial extension allows learning target functions with real-valued outputs, though the application of decision trees in this setting is less common.”
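The difference between discrete and real-valued targets can be seen by fitting both kinds of tree. The sketch below is an illustration that assumes the rpart package and the built-in iris data set, neither of which this slide uses: Species is a three-level factor (more than two output values) and Sepal.Length is numeric.

```r
# Sketch: a classification tree (discrete target) next to a regression tree
# (real-valued target), both grown with rpart on the built-in iris data.
library(rpart)

# Discrete target with three possible values.
class_tree <- rpart(Species ~ ., data = iris, method = "class")

# Real-valued target; "anova" is rpart's method for regression trees.
reg_tree <- rpart(Sepal.Length ~ Petal.Length + Petal.Width,
                  data = iris, method = "anova")

first_rows <- iris[1:3, ]
class_pred <- predict(class_tree, newdata = first_rows, type = "class")
reg_pred   <- predict(reg_tree, newdata = first_rows)

print(class_pred)  # a factor of predicted class labels
print(reg_pred)    # numeric leaf means
```

The classification tree returns one of a fixed set of labels, while the regression tree returns the mean of the training responses in the matching leaf.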
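  • 23. Example
      library(party)
      # StuDummy.csv has the columns student.name, annual.attendance, annual.score and eligible.
      r <- read.csv("StuDummy.csv")
      # The response must be a factor for a classification tree.
      r$eligible <- factor(r$eligible)
      fit <- ctree(eligible ~ annual.attendance + annual.score, data = r)
      fit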