Predrag Radenković 3237/10
Faculty of Electrical Engineering
University Of Belgrade
Random Forest
Definition
 Random forest (or random forests) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes predicted by the individual trees.
 The term comes from random decision forests, first proposed by Tin Kam Ho of Bell Labs in 1995.
 The method combines Breiman's "bagging" idea with the random selection of features.
2/14 Predrag Radenković 3237/10
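The "mode of the classes" voting in the definition above can be sketched in a few lines of plain Python (a toy illustration, not part of the original slides; the function name and labels are hypothetical):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Return the majority-vote (mode) class among individual tree predictions."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

# Hypothetical votes from five trees in an ensemble:
print(forest_predict(["cat", "dog", "cat", "cat", "dog"]))  # cat wins 3-2
```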
Decision trees
 Decision trees are the individual learners that are combined. They are one of the most popular learning methods, commonly used for data exploration.
 One type of decision tree is called CART: classification and regression tree.
 CART uses greedy, top-down, binary, recursive partitioning that divides the feature space into sets of disjoint rectangular regions.
 Regions should be pure with respect to the response variable.
 A simple model is fit in each region: majority vote for classification, a constant value for regression.
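The greedy split that CART performs at each node can be sketched for a single predictor as follows (a minimal illustration assuming a Gini purity measure; `best_split` and the toy data are hypothetical, not from the slides):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Greedily search for the threshold on one predictor that minimizes
    the weighted Gini impurity of the two child regions."""
    best_t, best_score = None, float("inf")
    n = len(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data: the classes separate cleanly at x = 2
print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2, 0.0): pure children
```

Recursing this split on each child region produces the disjoint rectangular regions described above.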
Decision trees involve greedy, recursive partitioning.
 Simple dataset with two predictors
 Greedy, recursive partitioning along TI and PE
Algorithm
Each tree is constructed using the following algorithm:
1. Let the number of training cases be N, and the number of
variables in the classifier be M.
2. We are told the number m of input variables to be used to
determine the decision at a node of the tree; m should be much
less than M.
3. Choose a training set for this tree by sampling N times with
replacement from all N available training cases (i.e. take a
bootstrap sample). Use the rest of the cases to estimate the
error of the tree, by predicting their classes.
4. For each node of the tree, randomly choose m variables on
which to base the decision at that node. Calculate the best split
based on these m variables in the training set.
5. Each tree is fully grown and not pruned (as may be done in
constructing a normal tree classifier).
For prediction, a new sample is pushed down each tree and assigned
the label of the training samples in the terminal node it ends up in.
This procedure is iterated over all trees in the ensemble, and the
majority vote of all trees is reported as the random forest prediction.
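Steps 1-5 above can be sketched as follows (a simplified outline, not a full implementation; `train_tree` is a hypothetical placeholder for growing one unpruned tree that picks the best of m random variables at each node):

```python
import random
from collections import Counter

def bootstrap_sample(data):
    """Step 3: draw N cases with replacement; the rest are the tree's
    held-out (out-of-bag) cases, usable for error estimation."""
    n = len(data)
    picks = [random.randrange(n) for _ in range(n)]
    in_bag = [data[i] for i in picks]
    out_of_bag = [data[i] for i in range(n) if i not in set(picks)]
    return in_bag, out_of_bag

def build_forest(data, n_trees, m, train_tree):
    """Grow n_trees trees, each on its own bootstrap sample (steps 3-5).
    `train_tree(sample, m)` must return a callable tree: x -> label."""
    forest = []
    for _ in range(n_trees):
        in_bag, _oob = bootstrap_sample(data)
        forest.append(train_tree(in_bag, m))
    return forest

def predict(forest, x):
    """Push x down every tree and report the majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

For example, `build_forest(training_pairs, n_trees=100, m=3, train_tree=...)` followed by `predict(forest, x)` mirrors the prediction procedure described above.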
Algorithm flow chart
 For computer scientists:
Random Forest – practical consideration
 Splits are chosen according to a purity measure:
 E.g. squared error (regression), Gini index or
deviance (classification)
 How to select N?
 Build trees until the error no longer decreases
 How to select M?
 Try the recommended default, half of it, and twice
it, then pick the best.
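The default/half/double heuristic above can be read as follows, assuming the common square-root-of-features default for classification (the function name and that default are assumptions, not from the slides):

```python
import math

def candidate_m_values(n_features):
    """Candidate numbers of variables to try per split: the recommended
    default (sqrt of the feature count, a common classification choice),
    half of it, and twice it; keep whichever validates best."""
    default = max(1, int(math.sqrt(n_features)))
    half = max(1, default // 2)
    double = min(n_features, default * 2)
    return sorted({half, default, double})

print(candidate_m_values(100))  # [5, 10, 20]
```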
Features and Advantages
The advantages of random forest are:
 It is one of the most accurate learning algorithms
available. For many data sets, it produces a highly
accurate classifier.
 It runs efficiently on large databases.
 It can handle thousands of input variables without
variable deletion.
 It gives estimates of what variables are important in
the classification.
 It generates an internal unbiased estimate of the
generalization error as the forest building progresses.
 It has an effective method for estimating missing data
and maintains accuracy when a large proportion of
the data are missing.
Features and Advantages
 It has methods for balancing error in class-imbalanced data sets.
 Generated forests can be saved for future use on
other data.
 Prototypes are computed that give information about
the relation between the variables and the
classification.
 It computes proximities between pairs of cases that
can be used in clustering, locating outliers, or (by
scaling) give interesting views of the data.
 The capabilities of the above can be extended to
unlabeled data, leading to unsupervised clustering,
data views and outlier detection.
 It offers an experimental method for detecting variable
interactions.
Disadvantages
 Random forests have been observed to overfit for
some datasets with noisy classification/regression
tasks.
 For data including categorical variables with
different numbers of levels, random forests are
biased in favor of attributes with more levels.
Therefore, the variable importance scores from
random forest are not reliable for this type of data.
RF - Additional information
Estimating the test error:
 While growing the forest, estimate the test error from
the training samples
 For each tree grown, 33-36% of samples are not
selected in the bootstrap; these are called out-of-bag
(OOB) samples
 Using the OOB samples as input to the
corresponding tree, predictions are made as if
they were novel test samples
 Through bookkeeping, the majority vote
(classification) or average (regression) is computed
for each OOB sample over all trees.
 Such an estimated test error is very accurate in
practice, given a reasonable N
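The OOB bookkeeping described above might look like this (a sketch; representing each tree as a `(predict_fn, oob_indices)` pair is an assumption made for illustration, not from the slides):

```python
from collections import Counter

def oob_error(data, trees):
    """Estimate the test error from out-of-bag samples. `data` is a list
    of (x, y) pairs; `trees` is a list of (predict_fn, oob_indices) pairs,
    so each tree votes only on cases it never saw during its bootstrap."""
    votes = {i: Counter() for i in range(len(data))}
    for predict_fn, oob_indices in trees:
        for i in oob_indices:
            x, _y = data[i]
            votes[i][predict_fn(x)] += 1
    wrong = total = 0
    for i, (x, y) in enumerate(data):
        if votes[i]:  # skip cases that were in-bag for every tree
            total += 1
            if votes[i].most_common(1)[0][0] != y:
                wrong += 1
    return wrong / total if total else 0.0
```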
RF - Additional information
Estimating the importance of each predictor:
 Denote by ê the OOB estimate of the loss when
using the original training set, D.
 For each predictor xp where p ∈ {1,…,k}:
 Randomly permute the pth predictor to generate a new
set of samples D' = {(y1,x'1),…,(yN,x'N)}
 Compute the OOB estimate êp of the prediction error
with the new samples
 A measure of the importance of predictor xp is êp − ê,
the increase in error due to random perturbation
of the pth predictor
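The permute-and-remeasure procedure above can be sketched as follows (a toy illustration; `error_fn` stands in for the OOB error estimate, and all names are hypothetical):

```python
import random

def permutation_importance(data, p, error_fn):
    """Importance of predictor p: permute column p across the samples,
    recompute the error with `error_fn`, and report the increase over
    the baseline error on the intact data."""
    baseline = error_fn(data)
    xs = [list(x) for x, _ in data]
    ys = [y for _, y in data]
    column = [row[p] for row in xs]
    random.shuffle(column)           # break the link between xp and y
    for row, value in zip(xs, column):
        row[p] = value
    permuted = list(zip(xs, ys))
    return error_fn(permuted) - baseline
```

An important predictor yields a large increase; a predictor the forest never relied on yields an increase near zero.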
Conclusions & summary:
 Fast fast fast!
 RF is fast to build. Even faster to predict!
 Practically speaking, not requiring cross-validation for
model selection alone can speed up training by 10x-100x
or more.
 Fully parallelizable … to go even faster!
 Automatic predictor selection from large number of candidates
 Resistance to overtraining
 Ability to handle data without preprocessing
 data does not need to be rescaled, transformed, or modified
 resistant to outliers
 automatic handling of missing values
 Cluster identification can be used to generate tree-based
clusters through sample proximity
Thank you!
