SlideShare a Scribd company logo
1 of 23
Decision Tree
Decision tree is a graph to represent choices and their
results in form of a tree.
The nodes in the graph represent an event or choice and
the edges of the graph represent the decision rules or
conditions.
It is mostly used in Machine Learning and Data Mining
applications using R.
Examples
• predicting an email as spam or not spam.
• predicting of a tumour is cancerous or not .
• predicting a loan as a good or bad credit risk based
on the factors in each of these.
• Generally, a model is created with observed data
also called training data.
• Then a set of validation data is used to verify and
improve the model.
• R has packages which are used to create and
visualize decision trees.
• For new set of predictor variable, we use this model
to arrive at a decision on the category (yes/No,
spam/not spam) of the data.
• The R package "party" is used to create decision
trees.
• Install R Package
• Use the below command in R console to install
the package. You also have to install the
dependent packages if any.
• install.packages("party")
• The package "party" has the function ctree()(is
used to create recursive tree), conditional
inference tree, which is used to create and
analyze decision tree.
• Syntax
• The basic syntax for creating a decision tree in R is −
• ctree(formula, data) Following is the description of
the parameters used −
• formula is a formula describing the predictor and
response variables.
• data is the name of the data set used.
Input Data
• We will use the R in-built data set
named readingSkills to create a decision tree.
• It describes the score of someone's readingSkills if
we know the variables "age","shoesize","score" and
whether the person is a native speaker or not.
• Here is the sample data.
• # Load the party package.
• It will automatically load other # dependent
packages.
• library(party) # Print some records from data
set readingSkills.
• print(head(readingSkills))
• Example
• We will use the ctree() function to create the
decision tree and see its graph.
• # Load the party package.
• It will automatically load other # dependent packages.
• library(party)
• # Create the input data frame.
• InputData <- readingSkills[c(1:105),]
• # Give the chart file a name. It is the name of the output
file
• png(file = "decision_tree.png")
• # Create the tree.
• outputTree <- ctree( nativeSpeaker ~ age + shoeSize +
score, data = InputData )
• # Plot the tree.
• plot(outputTree )
• # Save the file.
• dev.off()
• When we execute the above code, it produces the
following result −
• null device 1
• Loading required package: methods
• Loading required package: grid
• Loading required package: mvtnorm
• Loading required package: modeltools
• Loading required package: stats4
• Loading required package: strucchange
• Loading required package: zoo
• Attaching package: ‘zoo’
• The following objects are masked from
‘package:base’: as.Date, as.Date.numeric
• Loading required package: sandwich
The tree will be like 4 terminal nodes
The number of input variables are age,shoeSize,score
• Load library
• library(rpart)
• nativeSpeaker_find<-
data.frame(“age”=11,”shoeSize”=30.63692,”score”=5
5.721149)
• Create an rpart object”fit”
• fit<-
rpart(nativeSpeaker~age+shoeSize+score,data=readin
gSkills)
• Use predict function
• prediction<-predict(fit,newdata=nativeSpeaker_find,
type=“class”)
• Print the return value from predict function
• print(predict)
• R’s rpart package provides a powerful framework
for growing classification and regression trees. To
see how it works, let’s get started with a minimal
example.
• First let’s define a problem.
• There’s a common scam amongst motorists
whereby a person will slam on his breaks in heavy
traffic with the intention of being rear-ended.
• The person will then file an insurance claim for
personal injury and damage to his vehicle, alleging
that the other driver was at fault.
• Suppose we want to predict which of an insurance
company’s claims are fraudulent using a decision
tree.
• To start, we need to build a training set of known
fraudulent claims.
• train <- data.frame( ClaimID = c(1,2,3), RearEnd = c(TRUE, FALSE, TRUE),
Fraud = c(TRUE, FALSE, TRUE) )
• In order to grow our decision tree, we have to first load the rpart
package. Then we can use the rpart() function, specifying the model
formula, data, and method parameters.
• In this case, we want to classify the feature Fraud using the
predictor RearEnd, so our call to rpart()
• library(rpart) mytree <- rpart( Fraud ~ RearEnd, data = train, method =
"class" )
• Mytree
• Notice the output shows only a root node.
• This is because rpart has some default parameters that prevented our
tree from growing.
• Namely minsplit and minbucket. minsplit is “the minimum number of
observations that must exist in a node in order for a split to be
attempted” and minbucket is “the minimum number of observations in
any terminal node”.
• mytree <- rpart( Fraud ~ RearEnd, data = train,
method = "class", minsplit = 2, minbucket = 1 )
Now our tree has a root node, one split and two leaves
(terminal nodes).
Observe that rpart encoded our boolean variable as an
integer (false = 0, true = 1).
We can plot mytree by loading the rattle package (and
some helper packages) and using
the fancyRpartPlot() function.
library(rattle)
library(rpart.plot)
library(RColorBrewer)
# plot mytree
fancyRpartPlot(mytree, caption = NULL)
• The decision tree correctly identified that if a
claim involved a rear-end collision, the claim was
most likely fraudulent.
• mytree <- rpart( Fraud ~ RearEnd, data = train,
method = "class", parms = list(split =
'information'), minsplit = 2, minbucket = 1 )
mytree
Example on MTCARS
• fit<-rpart(speed ~ dist,data=cars)
• fit
• plot(fit)
• text(fit,use.n = TRUE)
How to Use optim Function in R
• A function to be minimized (or maximized), with first
argument the vector of parameters over which
minimization is to take place.
• optim(par, fn, data, ...)
• where:
• par: Initial values for the parameters to be optimized over
• fn: A function to be minimized or maximized
• data: The name of the object in R that contains the data
• The following examples show how to use this function in the
following scenarios:
• 1. Find coefficients for a linear regression model.
• 2. Find coefficients for a quadratic regression model.
• Find Coefficients for Linear Regression Model
• The following code shows how to use
the optim() function to find the coefficients for a
linear regression model by minimizing the residual
sum of squares:
• #create data frame
• df <- data.frame(x=c(1, 3, 3, 5, 6, 7, 9, 12), y=c(4, 5, 8,
6, 9, 10, 13, 17))
• #define function to minimize residual sum of squares
• min_residuals <- function(data, par)
• {
• with(data, sum((par[1] + par[2] * x - y)^2)) }
• #find coefficients of linear regression model
• optim(par=c(0, 1), fn=min_residuals, data=df)
Find Coefficients for Quadratic Regression Model
• The following code shows how to use the optim() function to find the
coefficients for a quadratic regression model by minimizing the residual sum
of squares:
• #create data frame
• df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60), y=c(14, 28, 50,
70, 89, 94, 90, 75, 59, 44, 27))
• #define function to minimize residual sum of squares
• min_residuals <- function(data, par)
• {
• with(data, sum((par[1] + par[2]*x + par[3]*x^2 - y)^2))
• }
• #find coefficients of quadratic regression model
• optim(par=c(0, 0, 0), fn=min_residuals, data=df)
• Using the values returned under $par, we can
write the following fitted quadratic regression
model:
• y = -18.261 + 6.744x – 0.101x2
• We can verify this is correct by using the built-
in lm() function in R:
• #create data frame
• df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47,
51, 55, 60), y=c(14, 28, 50, 70, 89, 94, 90, 75, 59,
44, 27))
• #create a new variable for
• x^2 df$x2 <- df$x^2
• #fit quadratic regression model
• quadraticModel <- lm(y ~ x + x2, data=df)
• #display coefficients of quadratic regression
model
• summary(quadraticModel)$coef
What are appropriate problems for Decision tree learning?
• Although a variety of decision-tree learning methods have been
developed with somewhat differing capabilities and
requirements, decision-tree learning is generally best suited to
problems with the following characteristics:
1. Instances are represented by attribute-value pairs.
• “Instances are described by a fixed set of attributes (e.g.,
Temperature) and their values (e.g., Hot).
• The easiest situation for decision tree learning is when each
attribute takes on a small number of disjoint possible values (e.g.,
Hot, Mild, Cold).
• However, extensions to the basic algorithm allow handling real-
valued attributes as well (e.g., representing Temperature
numerically).”
Example
• snames<- c(“ram”,”shyam”,”tina”,”simi”,”rahul”,”raj”)
• sage<-c(17,16,17,18,16,16,17,17)
• d<- data.frame(cbind(snames,sage))
• S<-ctree(sage~snames,data=d)
• s
2. The target function has discrete output values.
• “The decision tree is usually used for Boolean
classification (e.g., yes or no) kind of example.
• Decision tree methods easily extend to
learning functions with more than two
possible output values.
• A more substantial extension allows learning
target functions with real-valued outputs,
though the application of decision trees in this
setting is less common.”
Example
• R<-read.csv(“StuDummy.csv”) (student.name,
annual.attendance,annual.score, eligible )
• Fit<- ctree(r$ eligible
~r$annual.attendance+annual.score)
• fit

More Related Content

What's hot

DomainKeys Identified Mail (DKIM).pptx
DomainKeys Identified Mail (DKIM).pptxDomainKeys Identified Mail (DKIM).pptx
DomainKeys Identified Mail (DKIM).pptxSrijanKumarShetty
 
Prolog Programming : Basics
Prolog Programming : BasicsProlog Programming : Basics
Prolog Programming : BasicsMitul Desai
 
Design and Analysis of Algorithms
Design and Analysis of AlgorithmsDesign and Analysis of Algorithms
Design and Analysis of AlgorithmsArvind Krishnaa
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based ClusteringSSA KPI
 
Introduction to matplotlib
Introduction to matplotlibIntroduction to matplotlib
Introduction to matplotlibPiyush rai
 
Database security and security in networks
Database security and security in networksDatabase security and security in networks
Database security and security in networksG Prachi
 
unit-4-dynamic programming
unit-4-dynamic programmingunit-4-dynamic programming
unit-4-dynamic programminghodcsencet
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expressionMegha V
 
Course 102: Lecture 24: Archiving and Compression of Files
Course 102: Lecture 24: Archiving and Compression of Files Course 102: Lecture 24: Archiving and Compression of Files
Course 102: Lecture 24: Archiving and Compression of Files Ahmed El-Arabawy
 
Decomposition using Functional Dependency
Decomposition using Functional DependencyDecomposition using Functional Dependency
Decomposition using Functional DependencyRaj Naik
 
Algorithm Using Divide And Conquer
Algorithm Using Divide And ConquerAlgorithm Using Divide And Conquer
Algorithm Using Divide And ConquerUrviBhalani2
 
Elliptic Curve Cryptography
Elliptic Curve CryptographyElliptic Curve Cryptography
Elliptic Curve CryptographyJorgeVillamarin5
 

What's hot (20)

DomainKeys Identified Mail (DKIM).pptx
DomainKeys Identified Mail (DKIM).pptxDomainKeys Identified Mail (DKIM).pptx
DomainKeys Identified Mail (DKIM).pptx
 
Prolog Programming : Basics
Prolog Programming : BasicsProlog Programming : Basics
Prolog Programming : Basics
 
Design and Analysis of Algorithms
Design and Analysis of AlgorithmsDesign and Analysis of Algorithms
Design and Analysis of Algorithms
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
 
Database Connection
Database ConnectionDatabase Connection
Database Connection
 
Introduction to matplotlib
Introduction to matplotlibIntroduction to matplotlib
Introduction to matplotlib
 
Database security and security in networks
Database security and security in networksDatabase security and security in networks
Database security and security in networks
 
unit-4-dynamic programming
unit-4-dynamic programmingunit-4-dynamic programming
unit-4-dynamic programming
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Public key algorithm
Public key algorithmPublic key algorithm
Public key algorithm
 
svm classification
svm classificationsvm classification
svm classification
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expression
 
Rc4
Rc4Rc4
Rc4
 
Course 102: Lecture 24: Archiving and Compression of Files
Course 102: Lecture 24: Archiving and Compression of Files Course 102: Lecture 24: Archiving and Compression of Files
Course 102: Lecture 24: Archiving and Compression of Files
 
Minimum spanning tree
Minimum spanning treeMinimum spanning tree
Minimum spanning tree
 
Decomposition using Functional Dependency
Decomposition using Functional DependencyDecomposition using Functional Dependency
Decomposition using Functional Dependency
 
Algorithm Using Divide And Conquer
Algorithm Using Divide And ConquerAlgorithm Using Divide And Conquer
Algorithm Using Divide And Conquer
 
Elliptic Curve Cryptography
Elliptic Curve CryptographyElliptic Curve Cryptography
Elliptic Curve Cryptography
 

Similar to Decision Tree.pptx

Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
SMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachSMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachReza Rahimi
 
Slides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSlides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSonaCharles2
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in RSujaAldrin
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptxrani marri
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and ClassificationBrigitte Mueller
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NOllieShoresna
 
How to obtain and install R.ppt
How to obtain and install R.pptHow to obtain and install R.ppt
How to obtain and install R.pptrajalakshmi5921
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsRajendran
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessingAbdurRazzaqe1
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
CS301-lec01.ppt
CS301-lec01.pptCS301-lec01.ppt
CS301-lec01.pptomair31
 

Similar to Decision Tree.pptx (20)

Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
SMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning ApproachSMS Spam Filter Design Using R: A Machine Learning Approach
SMS Spam Filter Design Using R: A Machine Learning Approach
 
R decision tree
R   decision treeR   decision tree
R decision tree
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Slides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSlides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MD
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
software engineering modules iii & iv.pptx
software engineering  modules iii & iv.pptxsoftware engineering  modules iii & iv.pptx
software engineering modules iii & iv.pptx
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and Classification
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
 
How to obtain and install R.ppt
How to obtain and install R.pptHow to obtain and install R.ppt
How to obtain and install R.ppt
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessing
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
CS301-lec01.ppt
CS301-lec01.pptCS301-lec01.ppt
CS301-lec01.ppt
 
cluster(python)
cluster(python)cluster(python)
cluster(python)
 

More from Ramakrishna Reddy Bijjam

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxRamakrishna Reddy Bijjam
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxRamakrishna Reddy Bijjam
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxRamakrishna Reddy Bijjam
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxRamakrishna Reddy Bijjam
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxRamakrishna Reddy Bijjam
 

More from Ramakrishna Reddy Bijjam (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
 
Pandas.pptx
Pandas.pptxPandas.pptx
Pandas.pptx
 
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
 
Apriori.pptx
Apriori.pptxApriori.pptx
Apriori.pptx
 

Recently uploaded

RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 

Recently uploaded (20)

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 

Decision Tree.pptx

  • 1. Decision Tree Decision tree is a graph to represent choices and their results in form of a tree. The nodes in the graph represent an event or choice and the edges of the graph represent the decision rules or conditions. It is mostly used in Machine Learning and Data Mining applications using R.
  • 2. Examples • predicting an email as spam or not spam. • predicting of a tumour is cancerous or not . • predicting a loan as a good or bad credit risk based on the factors in each of these. • Generally, a model is created with observed data also called training data. • Then a set of validation data is used to verify and improve the model. • R has packages which are used to create and visualize decision trees. • For new set of predictor variable, we use this model to arrive at a decision on the category (yes/No, spam/not spam) of the data.
  • 3. • The R package "party" is used to create decision trees. • Install R Package • Use the below command in R console to install the package. You also have to install the dependent packages if any. • install.packages("party") • The package "party" has the function ctree()(is used to create recursive tree), conditional inference tree, which is used to create and analyze decision tree.
  • 4. • Syntax • The basic syntax for creating a decision tree in R is − • ctree(formula, data) Following is the description of the parameters used − • formula is a formula describing the predictor and response variables. • data is the name of the data set used. Input Data • We will use the R in-built data set named readingSkills to create a decision tree. • It describes the score of someone's readingSkills if we know the variables "age","shoesize","score" and whether the person is a native speaker or not. • Here is the sample data.
  • 5. • # Load the party package. • It will automatically load other # dependent packages. • library(party) # Print some records from data set readingSkills. • print(head(readingSkills)) • Example • We will use the ctree() function to create the decision tree and see its graph.
  • 6. • # Load the party package. • It will automatically load other # dependent packages. • library(party) • # Create the input data frame. • InputData <- readingSkills[c(1:105),] • # Give the chart file a name. It is the name of the output file • png(file = "decision_tree.png") • # Create the tree. • outputTree <- ctree( nativeSpeaker ~ age + shoeSize + score, data = InputData ) • # Plot the tree. • plot(outputTree ) • # Save the file. • dev.off()
  • 7. • When we execute the above code, it produces the following result − • null device 1 • Loading required package: methods • Loading required package: grid • Loading required package: mvtnorm • Loading required package: modeltools • Loading required package: stats4 • Loading required package: strucchange • Loading required package: zoo • Attaching package: ‘zoo’ • The following objects are masked from ‘package:base’: as.Date, as.Date.numeric • Loading required package: sandwich
  • 8. The tree will be like 4 terminal nodes The number of input variables are age,shoeSize,score
  • 9. • Load library • library(rpart) • nativeSpeaker_find<- data.frame(“age”=11,”shoeSize”=30.63692,”score”=5 5.721149) • Create an rpart object”fit” • fit<- rpart(nativeSpeaker~age+shoeSize+score,data=readin gSkills) • Use predict function • prediction<-predict(fit,newdata=nativeSpeaker_find, type=“class”) • Print the return value from predict function • print(predict)
  • 10. • R’s rpart package provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example. • First let’s define a problem. • There’s a common scam amongst motorists whereby a person will slam on his breaks in heavy traffic with the intention of being rear-ended. • The person will then file an insurance claim for personal injury and damage to his vehicle, alleging that the other driver was at fault. • Suppose we want to predict which of an insurance company’s claims are fraudulent using a decision tree.
  • 11. • To start, we need to build a training set of known fraudulent claims. • train <- data.frame( ClaimID = c(1,2,3), RearEnd = c(TRUE, FALSE, TRUE), Fraud = c(TRUE, FALSE, TRUE) ) • In order to grow our decision tree, we have to first load the rpart package. Then we can use the rpart() function, specifying the model formula, data, and method parameters. • In this case, we want to classify the feature Fraud using the predictor RearEnd, so our call to rpart() • library(rpart) mytree <- rpart( Fraud ~ RearEnd, data = train, method = "class" ) • Mytree • Notice the output shows only a root node. • This is because rpart has some default parameters that prevented our tree from growing. • Namely minsplit and minbucket. minsplit is “the minimum number of observations that must exist in a node in order for a split to be attempted” and minbucket is “the minimum number of observations in any terminal node”.
  • 12. • mytree <- rpart( Fraud ~ RearEnd, data = train, method = "class", minsplit = 2, minbucket = 1 ) Now our tree has a root node, one split and two leaves (terminal nodes). Observe that rpart encoded our boolean variable as an integer (false = 0, true = 1). We can plot mytree by loading the rattle package (and some helper packages) and using the fancyRpartPlot() function. library(rattle) library(rpart.plot) library(RColorBrewer) # plot mytree fancyRpartPlot(mytree, caption = NULL)
  • 13. • The decision tree correctly identified that if a claim involved a rear-end collision, the claim was most likely fraudulent. • mytree <- rpart( Fraud ~ RearEnd, data = train, method = "class", parms = list(split = 'information'), minsplit = 2, minbucket = 1 ) mytree
  • 14. Example on MTCARS • fit<-rpart(speed ~ dist,data=cars) • fit • plot(fit) • text(fit,use.n = TRUE)
  • 15. How to Use optim Function in R • A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. • optim(par, fn, data, ...) • where: • par: Initial values for the parameters to be optimized over • fn: A function to be minimized or maximized • data: The name of the object in R that contains the data • The following examples show how to use this function in the following scenarios: • 1. Find coefficients for a linear regression model. • 2. Find coefficients for a quadratic regression model.
  • 16. • Find Coefficients for Linear Regression Model • The following code shows how to use the optim() function to find the coefficients for a linear regression model by minimizing the residual sum of squares: • #create data frame • df <- data.frame(x=c(1, 3, 3, 5, 6, 7, 9, 12), y=c(4, 5, 8, 6, 9, 10, 13, 17)) • #define function to minimize residual sum of squares • min_residuals <- function(data, par) • { • with(data, sum((par[1] + par[2] * x - y)^2)) } • #find coefficients of linear regression model • optim(par=c(0, 1), fn=min_residuals, data=df)
  • 17. Find Coefficients for Quadratic Regression Model • The following code shows how to use the optim() function to find the coefficients for a quadratic regression model by minimizing the residual sum of squares: • #create data frame • df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60), y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27)) • #define function to minimize residual sum of squares • min_residuals <- function(data, par) • { • with(data, sum((par[1] + par[2]*x + par[3]*x^2 - y)^2)) • } • #find coefficients of quadratic regression model • optim(par=c(0, 0, 0), fn=min_residuals, data=df)
  • 18. • Using the values returned under $par, we can write the following fitted quadratic regression model: • y = -18.261 + 6.744x – 0.101x2 • We can verify this is correct by using the built- in lm() function in R:
  • 19. • #create data frame • df <- data.frame(x=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60), y=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27)) • #create a new variable for • x^2 df$x2 <- df$x^2 • #fit quadratic regression model • quadraticModel <- lm(y ~ x + x2, data=df) • #display coefficients of quadratic regression model • summary(quadraticModel)$coef
  • 20. What are appropriate problems for Decision tree learning? • Although a variety of decision-tree learning methods have been developed with somewhat differing capabilities and requirements, decision-tree learning is generally best suited to problems with the following characteristics: 1. Instances are represented by attribute-value pairs. • “Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). • The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). • However, extensions to the basic algorithm allow handling real- valued attributes as well (e.g., representing Temperature numerically).”
  • 21. Example • snames<- c(“ram”,”shyam”,”tina”,”simi”,”rahul”,”raj”) • sage<-c(17,16,17,18,16,16,17,17) • d<- data.frame(cbind(snames,sage)) • S<-ctree(sage~snames,data=d) • s
  • 22. 2. The target function has discrete output values. • “The decision tree is usually used for Boolean classification (e.g., yes or no) kind of example. • Decision tree methods easily extend to learning functions with more than two possible output values. • A more substantial extension allows learning target functions with real-valued outputs, though the application of decision trees in this setting is less common.”
  • 23. Example • R<-read.csv(“StuDummy.csv”) (student.name, annual.attendance,annual.score, eligible ) • Fit<- ctree(r$ eligible ~r$annual.attendance+annual.score) • fit