ACTIVE LEARNING
ASSIGNMENT FOR THE
SUBJECT
“DATA MINING
&
BUSINESS INTELLIGENCE”
CART – Classification & Regression Trees
Guided By : -
Mitali Sonar
Prepared By :-
Hemant H. Chetwani
(130410107010 LY CE-II)
CART ??
Classification And Regression Trees
Classification
• Classification is a data mining technique that assigns each data
item to one of a set of predefined groups (classes).
• It maps the data into predefined groups or classes
and searches for new patterns.
• For example, you may wish to use classification to
predict whether the weather on a particular day will
be “sunny”, “rainy”, or “cloudy”.
Regression
• Used to predict a numeric value for an individual on the basis of
information gained from a previous sample of similar individuals.
• For example, a person who wants to save for the future can base
the estimate on current and several past savings values, using a
linear regression formula to predict future savings.
• It may also be used to model the effect of doses in
medicine or agriculture, to predict the response of a customer to a
mailing, and to evaluate the risk that a client will not pay back a
loan taken from the bank.
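The savings example can be sketched as a least-squares line fit. This is only an illustration: the function name `fit_line` and the savings figures are made up for the sketch and do not come from the slides.

```python
# Ordinary least-squares fit of a straight line y = intercept + slope * x.
# All numbers below are hypothetical illustration data.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

years = [1, 2, 3, 4, 5]                          # past periods
savings = [100.0, 120.0, 140.0, 160.0, 180.0]    # savings observed in each period

intercept, slope = fit_line(years, savings)
forecast_year_6 = intercept + slope * 6          # predicted savings for the next period
```

With perfectly linear data like this the fit reproduces the trend exactly; real savings histories would scatter around the line.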
What is CART?
• Classification And Regression Trees
• Developed by Breiman, Friedman, Olshen and Stone in the early 1980s.
• Introduced tree-based modeling into the statistical mainstream, with a
rigorous approach involving cross-validation to select the optimal
tree.
• One of many tree-based modeling techniques:
• CART -- the classic
• CHAID
• C5.0
• Software package variants (SAS, S-Plus, R…)
Philosophy
“Data analysis can be done from a number of different
viewpoints. Tree structured regression offers an interesting
alternative for looking at regression type problems. It has
sometimes given clues to data structure not apparent from a
linear regression analysis. Like any tool, its greatest benefit lies
in its intelligent and sensible application.”
--Breiman, Friedman, Olshen, Stone
Working
When & What ?
• If the dependent variable is categorical, CART produces a
classification tree. If it is continuous, CART
produces a regression tree.
THE KEY IDEA
Recursive Partitioning
• Take all of your data.
• Consider all possible values of all variables.
• Select the variable/value (X=t1) that produces the greatest
“separation” in the target.
• (X=t1) is called a “split”.
• If (X< t1) then send the data point to the “left”; otherwise, send it
to the “right”.
• Now repeat the same process on these two “nodes”.
You get a “tree”.
Note: CART only uses binary splits.
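The recursion described on this slide can be sketched in a few lines. This is a minimal illustration, not the CART implementation: "separation" is measured here simply by the number of records a majority vote would misclassify, and the names `grow`, `predict` and the toy data are assumptions made for the sketch.

```python
from collections import Counter

def majority(labels):
    """Most common class label in a node."""
    return Counter(labels).most_common(1)[0][0]

def errors(labels):
    """Number of records a majority-vote label would get wrong."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def grow(rows, labels):
    """Recursively partition (rows, labels) into a binary tree of nested dicts."""
    if len(set(labels)) == 1:                        # pure node: stop splitting
        return {"leaf": labels[0]}
    best = None
    for j in range(len(rows[0])):                    # every variable
        for t in sorted({r[j] for r in rows})[1:]:   # every value that separates the node
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            score = errors(left) + errors(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:                                 # all rows identical: cannot split
        return {"leaf": majority(labels)}
    _, j, t = best
    go_left = [r[j] < t for r in rows]
    return {
        "var": j, "t": t,
        "left": grow([r for r, g in zip(rows, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g]),
        "right": grow([r for r, g in zip(rows, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g]),
    }

def predict(tree, row):
    """Route one record down the tree: X < t goes left, otherwise right."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["var"]] < tree["t"] else tree["right"]
    return tree["leaf"]

rows = [(1, 5), (2, 6), (3, 1), (4, 2)]              # hypothetical records, two variables
labels = ["a", "a", "b", "b"]
tree = grow(rows, labels)
```

Every split leaves both children non-empty, so each recursive call works on strictly fewer records and the recursion always terminates.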
CART GENERATION
STEPS
STEP 1
• Starting with the first variable, CART splits the variable at all of
its possible split points. At each possible split point of the
variable, the sample splits into two child nodes (a binary split).
• Cases with a “yes” response to the question posed are sent
to the left node and cases with a “no” response are sent to the right
node.
• It is also possible to define these splits based on linear
combinations of variables.
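As a rough sketch of Step 1 for a single numeric variable: the slides only say "all possible split points", so taking the midpoints between consecutive distinct values is an assumption of this example, as are the names `candidate_splits` and `split_node` and the sample values.

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct sorted values of one variable."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]

def split_node(rows, var, threshold):
    """Ask "is row[var] < threshold?": yes goes left, no goes right."""
    left = [r for r in rows if r[var] < threshold]
    right = [r for r in rows if r[var] >= threshold]
    return left, right

lot_sizes = [14.0, 16.0, 16.0, 20.0, 22.0]   # hypothetical values of one variable
thresholds = candidate_splits(lot_sizes)      # [15.0, 18.0, 21.0]
```

A variable with k distinct values therefore yields at most k - 1 candidate binary splits.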
STEP 2
• CART then applies its goodness-of-split criterion to each split
point and evaluates the reduction in impurity, or
heterogeneity, due to the split.
• This is based on the “split criterion”, which works in the
following fashion:
Suppose the dependent variable is categorical, taking on
the values 1 and 2.
The probabilities of these two classes at a given
node t are p(1|t) and p(2|t), respectively.
STEP 2 (continued)
• A measure of heterogeneity, or impurity, at node t, denoted i(t), is a
function of these probabilities.
• In the case of categorical dependent variables, CART allows
for a number of specifications of this function.
• The objective is to maximize the reduction in the degree of
heterogeneity i(t).
i(t) = N ( p(1|t), p(2|t) ),
where N is a generic function.
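For concreteness, here are three widely used choices for the generic function in the two-class case, followed by the reduction in impurity that a split is meant to maximize. The slides do not say which specification is intended, so treat these as common examples. The symbols t_L, t_R (the left and right child nodes) and p_L, p_R (the fractions of the cases in t sent to each child) are notation introduced for this sketch.

```latex
% Gini index
i(t) = 1 - p(1|t)^2 - p(2|t)^2 = 2\, p(1|t)\, p(2|t)

% Entropy
i(t) = -\, p(1|t) \log_2 p(1|t) \;-\; p(2|t) \log_2 p(2|t)

% Misclassification rate
i(t) = 1 - \max\bigl( p(1|t),\; p(2|t) \bigr)

% Reduction in impurity achieved by a split s of node t
\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)
```

All three are zero for a pure node and largest when p(1|t) = p(2|t) = 1/2.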
STEPS 3, 4 & 5
• CART selects the best split on the variable as the split for which the
reduction in impurity is the highest, as described in step 2.
• Steps 1-3 are repeated for each of the remaining variables at
the root node. CART then ranks the “best” splits on each
variable according to the reduction in impurity achieved by
each split.
• It selects the variable and split point that most reduce the
impurity of the root or parent node.
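Steps 3 to 5 might look like the following sketch, which finds the best split for each variable and then ranks the variables. Using the Gini index as the impurity measure is an assumption here, and the function names and the four sample records are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one node."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split_for(rows, labels, var):
    """Return (impurity reduction, threshold) of the best split on one variable."""
    parent = gini(labels)
    n = len(labels)
    best = (0.0, None)                       # no useful split found yet
    for t in sorted({r[var] for r in rows})[1:]:
        left = [y for r, y in zip(rows, labels) if r[var] < t]
        right = [y for r, y in zip(rows, labels) if r[var] >= t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        reduction = parent - child
        if reduction > best[0]:
            best = (reduction, t)
    return best

def rank_variables(rows, labels):
    """(reduction, threshold, variable) per variable, largest reduction first."""
    ranked = [(*best_split_for(rows, labels, v), v) for v in range(len(rows[0]))]
    return sorted(ranked, key=lambda item: item[0], reverse=True)

rows = [(1, 7), (2, 3), (3, 8), (4, 4)]      # hypothetical records, two variables
labels = ["no", "no", "yes", "yes"]
ranking = rank_variables(rows, labels)
winner = ranking[0]                          # the split used at this node
```

In this toy data the first variable separates the classes perfectly at threshold 3, so it heads the ranking with a reduction equal to the parent's full impurity of 0.5.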
STEPS 6 & 7
• CART then assigns classes to these nodes according to a rule
that minimizes misclassification costs. Although all
classification tree procedures will generate some errors, there
are algorithms within CART designed to minimize these.
• Steps 1-6 are repeatedly applied to each non-terminal child
node at each of the successive stages.
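One simple reading of "a rule that minimizes misclassification costs" is to label each node with the class whose expected cost is lowest. The sketch below assumes the class proportions in the node are used directly (no prior adjustment), and the cost matrices and class names are invented for illustration.

```python
def assign_class(class_counts, cost):
    """Pick the label with the lowest expected misclassification cost.

    class_counts: {true_class: number of training cases in the node}
    cost[true][predicted]: cost of predicting `predicted` when the truth is `true`
    """
    total = sum(class_counts.values())

    def expected_cost(predicted):
        return sum(n / total * cost[true][predicted]
                   for true, n in class_counts.items())

    return min(class_counts, key=expected_cost)

node = {"repays": 8, "defaults": 2}                   # hypothetical node contents
equal_cost = {"repays": {"repays": 0, "defaults": 1},
              "defaults": {"repays": 1, "defaults": 0}}
costly_default = {"repays": {"repays": 0, "defaults": 1},
                  "defaults": {"repays": 10, "defaults": 0}}

label_equal = assign_class(node, equal_cost)          # reduces to the majority class
label_costly = assign_class(node, costly_default)     # the expensive error dominates
```

With equal costs the rule is plain majority voting; when missing a defaulter is ten times as costly, the same node is labelled "defaults" even though defaulters are the minority.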
STEP 8
• CART continues the splitting process and builds a large tree.
The largest possible tree is reached if splitting
continues until every observation constitutes a terminal node.
• Obviously, such a tree will have a large number of terminal
nodes that are either pure or very small in content.
• Having generated a large tree, CART then prunes the result
using cross-validation and creates a sequence of nested
trees. This also produces a cross-validation error rate for each tree, from
which the optimal tree is selected.
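The slides do not spell out how the nested sequence is produced. In the standard description of CART it comes from cost-complexity pruning, sketched below. The notation is introduced for this sketch: R(T) is the misclassification cost of tree T on the training data, |T̃| is its number of terminal nodes, α ≥ 0 is a penalty per terminal node, and R^cv is the cross-validated error estimate.

```latex
% Cost-complexity measure of a tree T
R_\alpha(T) = R(T) + \alpha\, |\tilde{T}|

% Increasing \alpha from 0 yields a nested sequence of pruned subtrees
T_{\max} \supset T_1 \supset T_2 \supset \dots \supset \{ \text{root} \}

% The selected tree minimizes the cross-validated error over the sequence
T^{*} = \arg\min_{k} \; R^{\mathrm{cv}}(T_k)
```

A larger α favours smaller trees, so the penalty trades training accuracy against tree size.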
Simple Example
• Goal: Classify a record as “owner” or “not owner”.
• A rule might be: “If lot size < 19 and income > 84.75, then class =
owner”.
• Recursive partitioning
Repeatedly split the records into two parts so as to achieve
maximum homogeneity within the new parts.
• Pruning the tree
Simplify the tree by pruning peripheral branches to avoid overfitting.
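The example rule translates directly into code, since each path from the root of a tree to a leaf is a conjunction of threshold tests. The function name `classify` is hypothetical, the units of lot size and income are not given in the slides, and sending every other record to "not owner" is an assumption: a full tree would contain further rules.

```python
def classify(lot_size, income):
    """Apply the single example rule; every other record falls to the other class."""
    if lot_size < 19 and income > 84.75:
        return "owner"
    return "not owner"
```

For instance, `classify(18, 90)` returns `"owner"`, while `classify(20, 90)` returns `"not owner"` because the lot-size test fails.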
Impurity
• Obtain an overall impurity measure (weighted average over the individual
rectangles).
• At each successive stage, compare this measure across all
possible splits in all variables.
• Choose the split that reduces impurity the most.
• Chosen split points become nodes on the tree.
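The weighted average can be worked through with a small sketch. It assumes the Gini index as the per-rectangle impurity and weights each rectangle by its share of the records; the class counts are made up for illustration.

```python
def gini_from_counts(counts):
    """Gini impurity of one rectangle given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def overall_impurity(rectangles):
    """Average the rectangles' impurities, weighted by their share of records."""
    total = sum(sum(counts) for counts in rectangles)
    return sum(sum(counts) / total * gini_from_counts(counts)
               for counts in rectangles)

before = overall_impurity([(12, 12)])        # one rectangle, 12 records per class
after = overall_impurity([(9, 3), (3, 9)])   # hypothetical split into two rectangles
```

Here the overall impurity drops from 0.5 to 0.375, so this split would be preferred over any candidate that leaves the measure higher.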
[Figure: First Split – The Tree]
[Figure: Tree after three splits]
[Figure: Tree after all splits]
Summary
• Classification and Regression Trees are an easily
understandable and transparent method for predicting or
classifying new records.
• A tree is a graphical representation of a set of rules.
• Trees must be pruned to avoid over-fitting the training
data.
• As trees do not make any assumptions about the data
structure, they usually require large samples.
CART – Classification & Regression Trees

  • 1. ACTIVE LEARNING ASSIGNMENT FOR THE SUBJECT “DATA MINING & BUSINESS INTELLIGENCE” CART – Classification & Regression Trees Guided By : - Mitali Sonar Prepared By :- Hemant H. Chetwani (130410107010 LY CE-II)
  • 5. Classification  Classification is a data mining technique used for systematic placement of group membership of data.  It maps the data into predefined groups or classes and searches for new patterns.  For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy”, or “cloudy”.
  • 6. Regression  Used to predict for individuals on the basis of information gained from a previous sample of similar individuals.  For example, A person wants do some savings for future and then It will be based on his current values and several past values. He uses a linear regression formula to predict his future savings.  It may also be used in modelling the effect of doses in medicines or agriculture, response of a customer to a mail and evaluate the risk that the client will not pay back the loan taken from the bank.
  • 7. What is CART?  Classification And Regression Trees  Developed by Breiman, Friedman, Olshen, Stone in early 80’s.  Introduced tree-based modeling into the statistical mainstream, rigorous approach involving cross-validation to select the optimal tree.  One of many tree-based modeling techniques.  CART -- the classic  CHAID  C5.0  Software package variants (SAS, S-Plus, R…)
  • 8. Philosophy “Data analysis can be done from a number of different viewpoints. Tree structured regression offers an interesting alternative for looking at regression type problems. It has sometimes given clues to data structure not apparent from a linear regression analysis. Like any tool, its greatest benefit lies in its intelligent and sensible application.” --Breiman, Friedman, Olshen, Stone
  • 10. When & What ?  If the dependent variable is categorical, CART produces a classification tree. And if the variable is continuous, it produces a regression tree.
  • 11. THE KEY IDEA: Recursive Partitioning
      - Take all of your data.
      - Consider all possible values of all variables.
      - Select the variable/value pair (X = t1) that produces the greatest “separation” in the target.
      - (X = t1) is called a “split”.
      - If X < t1, send the data point to the “left”; otherwise, send it to the “right”.
      - Now repeat the same process on these two “nodes”.
      - The result is a “tree”.
      - Note: CART uses only binary splits.
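The recursion described on this slide can be sketched in a few lines. This is an illustrative toy, not the CART software itself: it assumes numeric predictors, uses the Gini index as the measure of "separation", takes midpoints between observed values as candidate thresholds, and grows the tree until every node is pure (no pruning). All names and the sample data are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Search every variable and threshold; return (impurity, feature, threshold)."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # None when no variable has two distinct values


def grow(rows, labels):
    """Recursively partition the data with binary splits."""
    split = None if gini(labels) == 0.0 else best_split(rows, labels)
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow(*map(list, zip(*left))),    # X < t goes left
        "right": grow(*map(list, zip(*right))),  # otherwise goes right
    }


def predict(tree, row):
    """Walk from the root to a leaf, following the splits."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


# Toy data, invented for illustration: two numeric variables, two classes.
rows = [(1.0, 5.0), (2.0, 6.0), (3.0, 1.0), (4.0, 2.0)]
labels = ["a", "a", "b", "b"]
tree = grow(rows, labels)
```

On this toy data a single split on the first variable already separates the two classes, so the tree has one internal node and two leaves.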
  • 13. STEP 1
      - Starting with the first variable, CART splits the variable at all of its possible split points. At each possible split point, the sample splits into two child nodes (a binary split).
      - Cases with a “yes” response to the question posed are sent to the left node, and cases with a “no” response are sent to the right node.
      - It is also possible to define these splits based on linear combinations of variables.
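Step 1 can be illustrated for a single numeric variable. The sketch below assumes that candidate split points are the midpoints between consecutive distinct observed values (a common convention, not something the slide specifies) and that the question posed at each split is "is the value below the threshold?". The helper names are hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of one variable."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def partition(rows, feature, threshold):
    """Send 'yes' cases (value < threshold) left and 'no' cases right."""
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return left, right


thresholds = candidate_thresholds([3, 1, 2, 2])
left, right = partition([(1,), (2,), (3,)], feature=0, threshold=2.5)
```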
  • 14. STEP 2
      - CART then applies its goodness-of-split criterion to each split point and evaluates the reduction in impurity, or heterogeneity, due to the split.
      - This is based on the “split criterion”, which works as follows: suppose the dependent variable is categorical, taking the values 1 and 2. The probabilities of these two classes at a given node t are p(1|t) and p(2|t), respectively.
  • 15. STEP 2
      - A measure of heterogeneity, or impurity, at node t, written i(t), is a function of these probabilities: i(t) = N( p(1|t), p(2|t) ), where N is a generic function.
      - In the case of categorical dependent variables, CART allows for a number of specifications of this function.
      - The objective is to maximize the reduction in the degree of heterogeneity i(t).
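To make the generic impurity function concrete, here are three commonly used specifications for the two-class case: the Gini index, entropy, and misclassification error. The slides do not say which ones they have in mind, so treat this selection as an assumption. Each is zero for a pure node and largest when the two classes are equally likely.

```python
from math import log2


def gini(p1, p2):
    """Gini index: 1 - p1^2 - p2^2."""
    return 1.0 - p1 ** 2 - p2 ** 2


def entropy(p1, p2):
    """Entropy in bits; terms with zero probability contribute nothing."""
    return -sum(p * log2(p) for p in (p1, p2) if p > 0)


def misclassification(p1, p2):
    """Error rate when the node predicts its majority class."""
    return 1.0 - max(p1, p2)


# Most heterogeneous node: both classes equally likely.
mixed = (gini(0.5, 0.5), entropy(0.5, 0.5), misclassification(0.5, 0.5))

# Pure node: only class 1 is present.
pure = (gini(1.0, 0.0), entropy(1.0, 0.0), misclassification(1.0, 0.0))
```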
  • 16. STEPS 3, 4 & 5
      - CART selects the best split on the variable as the split for which the reduction in impurity is highest, as described in step 2.
      - Steps 1-3 are repeated for each of the remaining variables at the root node. CART then ranks the “best” splits on each variable according to the reduction in impurity achieved by each split.
      - It selects the variable and split point that most reduce the impurity of the root or parent node.
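Steps 3 to 5 amount to finding the best split per variable and then ranking the variables by the impurity reduction each achieves. The sketch below assumes the Gini index as the impurity measure and defines the reduction as the parent impurity minus the size-weighted average of the child impurities. The variable names and the six records are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_for_variable(rows, labels, j):
    """Return (impurity reduction, threshold) of the best split on variable j."""
    parent = gini(labels)
    best_gain, best_t = 0.0, None
    distinct = sorted({r[j] for r in rows})
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [y for r, y in zip(rows, labels) if r[j] < t]
        right = [y for r, y in zip(rows, labels) if r[j] >= t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def rank_variables(rows, labels, names):
    """Rank variables by the impurity reduction of their best split."""
    ranked = [
        (name, *best_split_for_variable(rows, labels, j))
        for j, name in enumerate(names)
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical records: (lot_size, income) with an ownership label.
rows = [(14.0, 60.0), (16.0, 85.0), (18.0, 70.0),
        (20.0, 65.0), (22.0, 90.0), (24.0, 75.0)]
labels = ["non", "non", "non", "owner", "owner", "owner"]
ranking = rank_variables(rows, labels, ["lot_size", "income"])
```

The first entry of `ranking` is the variable and split point that would be used at the root node.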
  • 17. STEPS 6 & 7
      - CART then assigns classes to these nodes according to a rule that minimizes misclassification costs. Although all classification tree procedures generate some errors, CART includes algorithms designed to minimize them.
      - Steps 1-6 are applied repeatedly to each non-terminal child node at each successive stage.
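One way to read the class-assignment rule in step 6 is: for each candidate class, add up the cost of every case in the node that would be misclassified, and pick the class with the lowest total. The sketch below follows that reading; the cost-table layout `cost[true][predicted]`, the class names, and the counts are assumptions made for the example.

```python
def assign_class(class_counts, cost):
    """Pick the class whose prediction gives the lowest total cost in the node.

    class_counts maps each class to the number of cases in the node;
    cost[true][predicted] is the cost of predicting `predicted` for a `true` case.
    """
    def total_cost(predicted):
        return sum(n * cost[true][predicted] for true, n in class_counts.items())

    return min(class_counts, key=total_cost)


# Hypothetical node: 7 clients who repaid a loan, 3 who defaulted.
node = {"repay": 7, "default": 3}

# Equal costs: the majority class wins.
equal_costs = {
    "repay": {"repay": 0, "default": 1},
    "default": {"repay": 1, "default": 0},
}

# Missing a default is five times as costly as a false alarm.
unequal_costs = {
    "repay": {"repay": 0, "default": 1},
    "default": {"repay": 5, "default": 0},
}

majority_choice = assign_class(node, equal_costs)
cost_aware_choice = assign_class(node, unequal_costs)
```

With equal costs the rule reduces to majority voting; with unequal costs the node can be labeled with the minority class.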
  • 18. STEP 8
      - CART continues the splitting process and builds a large tree. The largest possible tree results when splitting continues until every observation constitutes its own terminal node.
      - Obviously, such a tree will have a large number of terminal nodes that are either pure or contain very few observations.
      - Having generated a large tree, CART then prunes it using cross-validation, creating a sequence of nested trees. This also produces a cross-validation error rate for each tree, from which the optimal tree is selected.
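The final selection step can be sketched as follows, assuming pruning has already produced a sequence of nested trees, each summarized by its number of terminal nodes and its cross-validation error rate. The sketch covers only the selection, not the pruning itself. The numbers are invented purely to show the mechanics, and breaking ties in favor of the smaller tree is an assumption rather than something the slides state.

```python
def select_optimal_tree(candidates):
    """Choose the (n_leaves, cv_error) pair with the lowest cross-validation error.

    Ties are broken in favor of the tree with fewer terminal nodes.
    """
    return min(candidates, key=lambda c: (c[1], c[0]))


# Hypothetical nested sequence, from the root-only tree to the full tree.
nested_trees = [(1, 0.40), (3, 0.25), (5, 0.18), (9, 0.18), (17, 0.22)]
optimal = select_optimal_tree(nested_trees)
```

In this made-up sequence the error falls as the tree grows, then rises again for the largest tree, which is the overfitting pattern that pruning is meant to guard against.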
  • 19. Simple Example
      - Goal: classify a record as “owner” or “not owner”.
      - A rule might be: “If lot size < 19 and income > 84.75, then class = owner.”
      - Recursive partitioning: repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts.
      - Pruning the tree: simplify the tree by pruning peripheral branches to avoid overfitting.
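The example rule on this slide translates directly into code, which shows how one path through a tree is just a conjunction of threshold tests. Because the slide gives only this one rule, the sketch returns `None` when the rule does not apply instead of guessing what the other branches of the tree predict.

```python
def classify(lot_size, income):
    """Apply the single rule from the slide; None means the rule does not fire."""
    if lot_size < 19 and income > 84.75:
        return "owner"
    return None


matched = classify(lot_size=18.0, income=90.0)
unmatched = classify(lot_size=20.0, income=90.0)
```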
  • 20. Impurity
      - Obtain an overall impurity measure (the weighted average of the impurities of the individual rectangles).
      - At each successive stage, compare this measure across all possible splits in all variables.
      - Choose the split that reduces impurity the most.
      - The chosen split points become nodes on the tree.
  • 21. First Split – The Tree
  • 23. Tree after all splits
  • 24. Summary
      - Classification and Regression Trees are an easily understandable and transparent method for predicting or classifying new records.
      - A tree is a graphical representation of a set of rules.
      - Trees must be pruned to avoid over-fitting of the training data.
      - As trees do not make any assumptions about the data structure, they usually require large samples.