Proprietary Information created by Parth Khare
Machine Learning
Classification & Decision Trees
04/01/2013
Contents
 Recursive Partitioning
 Classification
 Regression/Decision
 Bagging
 Random Forest
 Boosting
 Gradient Boosting
 Questions
Detail and flow
 What is the difference between supervised and unsupervised learning?
 What is ML? How does it differ from classical statistics?
 Supervised learning: one machine-learning application is trees
 Most elementary analysis: CART (classification and regression trees)
Basics
 Supervised Learning:
 Called “supervised” because the outcome variable is present to guide the learning process
 Builds a learner (model) to predict the outcome for new, unseen objects
 Alternatively,
 Unsupervised Learning:
 We observe only the features and have no measurements of the outcome
 The task is rather to describe how the data are organized or clustered
Machine Learning vs. Statistics
‘learning’ vs. ‘fitting’
 Machine learning, a branch of artificial intelligence, is the construction and study of systems that can learn from data.
 Statistics bases everything on probability models: assume the data are samples from a random variable with some distribution, then make inferences about the parameters of that distribution
 Machine learning may use probability models, and when it does, it overlaps with statistics.
 It isn't so committed to probability
 It may use other approaches to problem solving that are not based on probability
 The basic optimization concept for trees is the same as for parametric techniques: minimizing an error metric. Instead of a squared-error function or MLE, machine learning optimizes entropy, node impurity, etc.
 An application -> Trees
Decision Tree Approach: Parlance
 A decision tree represents a hierarchical segmentation of the data
 The original segment is called the root node and is the entire data set
 The root node is partitioned into two or more segments by applying a series of simple rules over an input variable
 For example, risk = low, risk = not low
 Each rule assigns the observations to a segment based on its input value
 Each resulting segment can be further partitioned into sub-segments, and so on
 For example, risk = low can be partitioned into income = low and income = not low
 The segments are also called nodes, and the final segments are called leaf nodes or leaves
 The final node surviving the partitions is called the terminal node
Decision Tree Example: Risk Assessment (Loan)

Income
├── < $30k   → Age
│   ├── < 25    → not on-time
│   └── >= 25   → on-time
└── >= $30k  → Credit Score
    ├── < 600   → not on-time
    └── >= 600  → on-time
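To make the example concrete, here is a minimal R sketch that fits a CART tree of this shape with the rpart package. The loans data frame and its columns (income, age, credit_score, on_time) are hypothetical, simulated purely for illustration:

```r
library(rpart)

## Hypothetical loan data, simulated so the tree above is recoverable
set.seed(1)
n <- 500
loans <- data.frame(
  income       = runif(n, 10e3, 60e3),
  age          = sample(18:70, n, replace = TRUE),
  credit_score = sample(300:850, n, replace = TRUE)
)
loans$on_time <- factor(ifelse(
  (loans$income >= 30e3 & loans$credit_score >= 600) |
  (loans$income <  30e3 & loans$age >= 25),
  "on-time", "not on-time"))

## Fit a classification tree (CART): root node = whole data set,
## each split applies a simple rule over one input variable
fit <- rpart(on_time ~ income + age + credit_score,
             data = loans, method = "class")
print(fit)   # text view of the root, internal nodes, and leaves
```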
CART: Heuristic and Visual
 Generic supervised learning problem:
 given data (x1, y1), (x2, y2), …, (xn, yn) and a new point ‘x’, the supervised learning objective is to associate a ‘y’ with this new ‘x’
 Main idea: form a binary tree and minimize the error in each leaf
 Given a dataset, a decision tree chooses a sequence of binary splits of the data
Growing the tree
 Growing the tree involves successively partitioning the data, i.e. recursive partitioning
 If an input variable is binary, then the two categories can be used to split the data (relative concentration of 0s and 1s)
 If an input variable is interval, a splitting value is used to classify the data into two segments
 For example, if household income is interval and there are 100 possible incomes in the data set, then there are 100 possible splitting values
 For example, income < $30k and income >= $30k
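A sketch of this enumeration (the helpers entropy and best_split are hypothetical names, and the vectors are toy values): every distinct value of an interval input is a candidate split point, scored here by the weighted entropy of the two resulting segments.

```r
## Binary entropy of a segment's class labels
entropy <- function(y) {
  p <- prop.table(table(y))
  p <- p[p > 0]               # avoid 0 * log2(0)
  -sum(p * log2(p))
}

## Try every distinct value of x as a split point; keep the best
best_split <- function(x, y) {
  candidates <- sort(unique(x))      # e.g. 100 incomes -> 100 splits
  scores <- sapply(candidates, function(s) {
    left  <- y[x <  s]
    right <- y[x >= s]
    (length(left) * entropy(left) + length(right) * entropy(right)) /
      length(y)
  })
  candidates[which.min(scores)]
}

income <- c(12, 25, 28, 31, 45, 52) * 1e3
paid   <- c("no", "no", "yes", "yes", "yes", "yes")
best_split(income, paid)   # 28000: the split yielding pure segments
```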
Classification Tree: again (reference)
 Represented by a series of binary splits.
 Each internal node represents a value query on one of the variables, e.g. “Is X3 > 0.4?”. If the answer is “Yes”, go right; else go left.
 The terminal nodes are the decision nodes. Typically each terminal node is dominated by one of the classes.
 The tree is grown using training data, by recursive splitting.
 The tree is often pruned to an optimal size, evaluated by cross-validation.
 New observations are classified by passing their X down to a terminal node of the tree, and then using majority vote.
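A sketch of growing and then pruning with rpart, reusing the hypothetical loans data from the earlier sketch; rpart's cptable reports the cross-validated error (xerror) for each candidate subtree size:

```r
## Grow a deliberately large tree, then prune to the CV-optimal size
fit <- rpart(on_time ~ income + age + credit_score,
             data = loans, method = "class",
             control = rpart.control(cp = 0, minsplit = 5))

printcp(fit)   # cross-validated error for each candidate subtree
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

## New observations pass down to a terminal node; majority vote decides
predict(pruned, newdata = loans[1:3, ], type = "class")
```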
Evaluating the partitions
 When the target is categorical, for each partition of an input variable a chi-square statistic is computed
 A contingency table is formed that maps responders and non-responders against the partitioned input variable
 For example, the null hypothesis might be that there is no difference between people with income < $30k and those with income >= $30k in making an on-time loan payment
 The lower the p-value, the stronger the evidence to reject this hypothesis, meaning that the income split is a discriminating factor
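A minimal sketch of that test in R; the counts in the contingency table are invented for illustration:

```r
## Contingency table: income split vs. on-time payment (made-up counts)
tab <- matrix(c(60, 40,    # income <  $30k: not on-time, on-time
                25, 75),   # income >= $30k: not on-time, on-time
              nrow = 2, byrow = TRUE,
              dimnames = list(income  = c("< $30k", ">= $30k"),
                              payment = c("not on-time", "on-time")))
chisq.test(tab)   # a small p-value suggests the split discriminates
```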
Splitting Criteria: Categorical
 Information Gain -> Entropy
 The rarity of an event is defined as: -log2(pi)
 Impurity measure:
−Pr(Y=0) · log2[Pr(Y=0)] − Pr(Y=1) · log2[Pr(Y=1)]
e.g. at Pr(Y=0) = 0.5 the impurity reaches its maximum of 1
 Entropy sums up the rarity of response and non-response over all observations
 Entropy ranges from the best case of 0 (all responders or all non-responders) to 1 (equal mix of responders and non-responders)
Link: http://www.youtube.com/watch?v=p17C9q2M00Q
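A quick numeric check of the impurity measure (a sketch; node_entropy is a hypothetical helper) confirms the 0-to-1 range and the maximum at Pr(Y=0) = 0.5:

```r
## Binary entropy of a node as a function of p1 = Pr(Y = 1)
node_entropy <- function(p1) {
  p <- c(1 - p1, p1)
  p <- p[p > 0]               # pure nodes contribute no impurity
  -sum(p * log2(p))
}
node_entropy(0.5)   # 1     -> worst case: equal mix
node_entropy(0.9)   # ~0.47
node_entropy(1.0)   # 0     -> best case: all one class
```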
Splitting Criteria: Continuous
 An F-statistic is used to measure the degree of separation of a split for an interval target, such as revenue
 Similar to the sum-of-squares discussion under multiple regression, the F-statistic is based on the ratio of the sum of squares between the groups to the sum of squares within groups, both adjusted for their degrees of freedom
 The null hypothesis is that there is no difference in the target mean between the two groups
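A sketch of this F-test on toy data (two simulated revenue segments; all numbers invented):

```r
set.seed(2)
revenue <- c(rnorm(50, mean = 100, sd = 20),   # income <  $30k segment
             rnorm(50, mean = 130, sd = 20))   # income >= $30k segment
segment <- factor(rep(c("<30k", ">=30k"), each = 50))

## F = (between-group SS / df) / (within-group SS / df)
anova(lm(revenue ~ segment))   # F value and p-value for the split
## equivalently: oneway.test(revenue ~ segment, var.equal = TRUE)
```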
Contents
 Recursive Partitioning
 Classification
 Regression/Decision
 Bagging
 Random Forest
 Boosting
 Gradient Boosting
Bagging
 Ensemble models: combine the results from different models
 An ensemble classifier using many decision tree models
 Bagging: Bootstrapped Samples of data
Working: Random Forest
 A different subset of the training data is selected (~2/3), with replacement, to train each tree
 The remaining training data (out-of-bag, OOB) are used to estimate error and variable importance
 Class assignment is made by majority vote across all of the trees; for regression, the average of the results is used
 A randomly selected subset of variables is used to split each node
 The number of variables used is decided by the user (mtry parameter in R)
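A minimal sketch with the randomForest package, again on the hypothetical loans data, wiring up the points above: a bootstrap sample per tree, mtry variables tried at each split, and OOB-based error and importance:

```r
library(randomForest)

rf <- randomForest(on_time ~ income + age + credit_score,
                   data  = loans,
                   ntree = 500,         # trees, each on a bootstrap sample
                   mtry  = 2,           # variables tried at each split
                   importance = TRUE)   # OOB-based variable importance
print(rf)        # OOB error estimate and confusion matrix
importance(rf)   # variable importance from the OOB data
```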
Bagging: Stanford
 Suppose
 C(S, x) is a classifier, such as a tree, based on our training data S, producing a predicted class label at input point x.
 To bag C, we draw bootstrap samples S*1, ..., S*B, each of size N, with replacement from the training data.
 Then
 Ĉbag(x) = Majority Vote{ C(S*b, x) : b = 1, ..., B }
 Bagging can dramatically reduce the variance of unstable procedures (like trees), leading to improved prediction.
 However, any simple structure in C (e.g. a tree) is lost.
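The recipe above translates almost line for line into R; a sketch with rpart as the base classifier C (bag_predict is a hypothetical helper, reusing the hypothetical loans data):

```r
library(rpart)

## Draw B bootstrap samples S*b, fit C(S*b, x) on each, majority-vote
bag_predict <- function(formula, data, newdata, B = 25) {
  votes <- replicate(B, {
    boot <- data[sample(nrow(data), replace = TRUE), ]   # S*b, size N
    fit  <- rpart(formula, data = boot, method = "class")
    as.character(predict(fit, newdata, type = "class"))
  })
  apply(votes, 1, function(v) names(which.max(table(v))))  # majority vote
}

bag_predict(on_time ~ income + age + credit_score,
            data = loans, newdata = loans[1:5, ])
```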
Bootstrapped samples
[Figure: illustration of bootstrap resampling of the training data; image not reproduced]
Contents
 Recursive Partitioning
 Classification
 Regression/Decision
 Bagging
 Random Forest
 Boosting
 Gradient Boosting
Boosting
Make copies of the data
 Boosting idea: based on the “strength of weak learnability” principle
 Example:
IF Gender = MALE AND Age <= 25 THEN claim_freq. = ‘high’
A combination of weak learners increases accuracy
 Simple or “weak” learners are not perfect!
 Every “boosting” algorithm can be interpreted as optimizing the loss function in a “greedy stage-wise” manner
Working: Gradient Descent
 A first tree is created and its residuals are observed
 Then a tree is fitted on the residuals of the first tree, and so on
 In this way, boosting grows trees in series, with later trees dependent on the results of previous trees
 Tuning knobs: shrinkage, CV folds, interaction depth
 Variants: AdaBoost, DirectBoost, Laplace loss (Gaussian boost)
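A sketch with the gbm package showing those tuning knobs (shrinkage, CV folds, interaction depth); gbm's bernoulli loss expects a 0/1 target, so a numeric copy of the hypothetical loans outcome is made first:

```r
library(gbm)

loans$y <- as.integer(loans$on_time == "on-time")   # 0/1 target
boost <- gbm(y ~ income + age + credit_score,
             data = loans,
             distribution = "bernoulli",
             n.trees = 1000,            # trees grown in series
             shrinkage = 0.01,          # learning rate
             interaction.depth = 3,     # depth of each weak learner
             cv.folds = 5)              # cross-validation folds
best_iter <- gbm.perf(boost, method = "cv")   # trees chosen by CV error
```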
GBM
 Gradient tree boosting is a generalization of boosting to arbitrary differentiable loss functions. GBRT is an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems.
 What it does, essentially
 By sequentially learning from the errors of the previous trees, gradient boosting in a way tries to ‘learn’ the distribution of the target variable. So, analogous to how we use different types of distributions in GLM modeling, GBM replicates the distribution in the given data as closely as possible.
 This comes with an additional risk of over-fitting, mitigated by methods like internal cross-validation, a minimum number of observations per node, etc.
 Parameters at work: OOB data/error
 The first tree of GBM is built on the training data and subsequent trees are developed on the errors from the previous trees. This process carries on.
 For OOB, the training data are also split in two parts: the trees are developed on one part and tested on the other. This second part is called the OOB data, and the error obtained is known as the OOB error.
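In the gbm package this corresponds only loosely to the two-part split described above: with bag.fraction < 1 each tree is fit on a random subsample and scored on the rows it did not see. A sketch, continuing the example above (hypothetical loans data and y target):

```r
boost_oob <- gbm(y ~ income + age + credit_score,
                 data = loans, distribution = "bernoulli",
                 n.trees = 1000, shrinkage = 0.01,
                 interaction.depth = 3,
                 bag.fraction = 0.5)    # each tree fit on half the rows
gbm.perf(boost_oob, method = "OOB")     # iterations chosen by OOB error
```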
Summary: RF and GBM
Main similarities:
 Both derive many benefits from ensembling, with few disadvantages
 Both can be applied to ensembling decision trees
Main differences:
 Boosting performs an exhaustive search for the best predictor to split on; RF searches only a small subset
 Boosting grows trees in series, with later trees dependent on the results of previous trees
 RF grows trees in parallel, independently of one another
 RF cannot work with missing values; GBM can
More differences between RF and GBM
 The algorithmic difference:
 Random forests are trained on random samples of the data (with further randomization available, such as feature randomization) and trust this randomization to deliver better generalization performance beyond the training set.
 At the other end of the spectrum, the gradient boosted trees algorithm additionally tries to find an optimal linear combination of trees (the final model is a weighted sum of the predictions of the individual trees) with respect to the given training data. This extra tuning might be deemed the difference. Note that there are many variations of these algorithms as well.
 On the practical side, owing to this tuning stage:
 Gradient boosted trees are more sensitive to noisy data. The final tuning stage makes GBT more likely to overfit, so if the test cases differ markedly from the training cases the algorithm starts to fall short.
 Random forests, by contrast, are better at resisting overfitting, although they can lag behind GBT in the opposite situation, when the extra tuning would have paid off.
Questions
 Concept/ Interpretation
 Application
For further details contact:
Parth Khare
https://www.linkedin.com/profile/view?id=43877647&trk=nav_responsive_tab_profile
