LECTURE: INTRODUCTION TO RANDOM FOREST AND GRADIENT BOOSTING METHODS
Presented by Shreyas S.K
AD RESEARCH GROUP
30-03-2019
WHAT IS MACHINE LEARNING ABOUT?
APPLICATIONS OF MACHINE LEARNING
ANATOMY OF DECISION TREE
• Trees that predict categorical results are called classification trees; trees that predict continuous values are called regression trees
• Each node tests a rule (a condition on an input variable)
• The outcome of each test is a Boolean (True/False) that decides which branch a sample follows
• Splitting is the process of dividing a node into two or more sub-nodes
• The root node represents the entire population
• A sub-node that splits into further sub-nodes is a decision node
• Nodes that do not split are called terminal nodes or leaf nodes
Figure: decision tree for a regression dataset, with the ROOT NODE, a DECISION NODE and a LEAF NODE labelled. Each node shows:
X[i]: input variable in the dataset
MSE: mean squared error of all samples in the node
Samples: total number of samples in the node
Value: average value of the output variable over all samples in the node
DECISION TREES FOR CLASSIFICATION
Predict whether or not to play tennis based on Temperature, Humidity, Wind and Outlook
• A good decision tree is one that makes correct predictions on unseen data
• The split at each node is chosen using the Gini score (Gini impurity)
• The best split is the one that yields the lowest weighted Gini score across the resulting sub-nodes
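The Gini-based split rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition of Gini impurity (one minus the sum of squared class proportions); the function names and the toy labels are hypothetical and are not the slide's tennis data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_c ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Gini score of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy labels: one split separates the classes, the other does not.
pure_split = weighted_gini(["yes", "yes"], ["no", "no"])
mixed_split = weighted_gini(["yes", "no"], ["yes", "no"])
```

The split with the lower score (here `pure_split`) is the one a tree would prefer.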
DECISION TREE FOR REGRESSION
Predict the average precipitation based on the Slope and Elevation of the Himalayan region
• Regression trees predict continuous values
• The value at each leaf is the average of all samples in that leaf
• The best split at each node is chosen using MSE or the weighted average of the sub-nodes' standard deviations
BEST SPLIT BASED ON STANDARD DEVIATION
Weighted standard deviation
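A minimal sketch of choosing a split by weighted standard deviation, assuming a single numeric input and candidate thresholds at the midpoints between neighbouring values. The helper names and the elevation/precipitation numbers are hypothetical, chosen only to illustrate the idea.

```python
from statistics import pstdev


def weighted_std(left, right):
    """Standard deviation of each child, weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * pstdev(left) + len(right) / n * pstdev(right)


def best_threshold(xs, ys):
    """Try midpoints between sorted distinct x values; return (threshold, score)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = weighted_std(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: precipitation jumps once elevation passes 2.5.
elevation = [1.0, 2.0, 3.0, 4.0]
precipitation = [10.0, 10.0, 20.0, 20.0]
threshold, score = best_threshold(elevation, precipitation)
```

The lower the weighted standard deviation, the more homogeneous the two sub-nodes are.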
HOW LONG TO KEEP SPLITTING?
• Until:
• Leaf nodes are pure (only one class remains)
• A maximum depth is reached
• A performance metric is achieved
• Problem:
• Decision trees tend to overfit
• Small changes in the data greatly affect the predictions
• Solution:
• Prune the trees
• Restrict the tree from growing to its fullest
• Maintain a minimum number of samples in leaf nodes
Pros and Cons of Classification and Regression Trees
Advantages
• Simple to understand, interpret and visualise
• Can handle both numerical and categorical data
• Require little data preparation
• Non-linear relationships between parameters do not affect tree performance
• Implicitly perform feature selection
Disadvantages
• Prone to creating over-complex trees that generalise poorly
• Unstable: small variations in the data can result in a completely different tree
• Create biased trees if some classes dominate
• Cannot guarantee returning the globally optimal decision tree
Ensemble methods such as Bagging and Boosting combine many trees to overcome these weaknesses; Bagging mainly lowers the variance of individual trees
ANALOGY OF ENSEMBLE LEARNING
Decision Tree 1, Decision Tree 2, Decision Tree 3
Predicted outputs: 2.6, 2.95, 3.2
Averaged output: 2.91
Desired output: 2.85
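The averaging in this analogy can be checked directly. A minimal sketch using the three predictions and the desired output from the slide; the variable names are illustrative.

```python
# Predictions of the three decision trees and the desired output from the slide.
tree_predictions = [2.6, 2.95, 3.2]
desired = 2.85

# The ensemble output is the plain average of the individual predictions.
ensemble = sum(tree_predictions) / len(tree_predictions)

# Absolute error of each tree, and of the averaged prediction.
individual_errors = [abs(p - desired) for p in tree_predictions]
ensemble_error = abs(ensemble - desired)
```

In this example the averaged prediction lies closer to the desired output than any single tree does.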
RANDOM FOREST METHOD
Training dataset → Bootstrap sample 1, Bootstrap sample 2, …, Bootstrap sample k
Each bootstrap sample: In Bag (2/3), Out of Bag (1/3)
One tree per bootstrap sample → Prediction 1, Prediction 2, …, Prediction k
Final output: average of the k predictions
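A minimal sketch of how one bootstrap sample divides the training rows into in-bag and out-of-bag sets, assuming sampling with replacement and a sample size equal to the dataset size. The function name, the seed and the dataset size are hypothetical. For large datasets the expected out-of-bag share is about 1/e ≈ 0.368, which the slide rounds to 1/3.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    # Rows that were never drawn form the out-of-bag set for this tree.
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10_000
in_bag, out_of_bag = bootstrap_split(n, rng)
oob_fraction = len(out_of_bag) / n
```

The out-of-bag rows were not seen by the tree trained on this sample, so they can serve as a built-in validation set for that tree.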
RANDOM FOREST – A BAGGING APPROACH
PSEUDO CODE FOR RANDOM FOREST METHOD
1. Randomly select “k” features from the total “m” features
• k < m
2. Among the “k” features, find the best split point for node “d”
3. Split the node into daughter nodes using the best split
4. Repeat steps 1 to 3 until a predefined number of nodes is reached
5. Build a forest by repeating steps 1 to 4 “n” times to create “n” trees
6. Take the test features and use the rules of each randomly created tree to predict the output
7. Count the votes for each predicted target
8. The predicted target with the most votes is the final prediction (for regression, average the predictions instead)
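The steps above can be sketched in plain Python. This is an illustrative toy, not a reference implementation: it assumes numeric features, string class labels, Gini impurity as the split criterion, a maximum depth in place of the slide's "predefined number of nodes", and one bootstrap sample per tree as in the bagging diagram. All function names and the data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, k, max_depth, rng):
    """Grow one tree; a leaf is a class label, an internal node is a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    features = rng.sample(range(len(rows[0])), k)  # step 1: k of m features
    split = best_split(rows, labels, features)     # step 2: best split point
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]   # step 3
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       k, max_depth - 1, rng),                # step 4
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       k, max_depth - 1, rng))


def predict_tree(tree, row):
    """Follow the split rules until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def build_forest(rows, labels, n_trees, k, max_depth, seed=0):
    """Step 5: grow n_trees trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], k, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Steps 6 to 8: collect one vote per tree and return the majority class."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated toy data with m = 2 features and k = 1.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["A"] * 4 + ["B"] * 4
forest = build_forest(rows, labels, n_trees=25, k=1, max_depth=3)
```

Choosing k smaller than m decorrelates the trees, which is what lets the vote reduce variance compared with a single tree.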
OVERFITTING – HIGH VARIANCE
• High variance
• The outcome can vary even with the tiniest changes in the input
• The model does not generalise well to new data
• High variance compared to “PHYSICAL BALANCE”
• If you are balancing on one foot while standing on solid ground, you’re not likely to fall over.
• But what if there are suddenly 100 mph wind gusts? I bet you’d fall over.
• That’s because your ability to balance on one leg is highly dependent on the factors in your environment.
• If even one thing changes, it could completely mess you up!
• Likewise, if we change any factor in an overfitted model’s training data, we could completely change the outcome.
• Such a model is not stable, and therefore not one on which we would want to base decisions.
Don’t fall, lil guy!!
APPLICATIONS OF RANDOM FOREST
1. Banking
• To distinguish loyal customers from fraudulent ones
• The growth of a bank depends heavily on loyal customers
• To identify customers who are not profitable to the bank
• The bank won’t approve loans to such customers once they are identified
2. Medicine
• To identify a disease by analysing the patient’s medical records
• To identify the correct combination of components to validate a medicine
3. E-commerce
• To estimate the likelihood of a customer liking a recommended product
GRADIENT BOOSTING METHOD
Workflow: Create a decision tree on known response values → Make predictions → Calculate errors (residuals) → Fit a new tree using the errors as response values → Combine the new tree with the trees from previous iterations
• Tuning parameters:
1. Number of trees
2. Maximum depth of each tree
3. Maximum features at each split
4. Learning rate
5. Minimum samples in leaf
• Builds decision trees sequentially
• Each new tree is fitted to the errors (residuals) of the current model, so every stage of training concentrates on the values that are still mispredicted
• Builds more accurate models because the final output is the sum of the predictions of all decision trees, each scaled by the learning rate
GRADIENT BOOSTING – A BOOSTING APPROACH
PSEUDO CODE FOR GRADIENT BOOSTING METHOD
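1. Initialize the approximation function F(x) with a constant value
2. For m = 1 to M do:
• Calculate the pseudo responses (the negative gradient of the loss with respect to the current predictions)
• Fit a regression tree to the pseudo responses using the training set
• Calculate the step size using a line search
• Update the model by adding the new tree, scaled by the step size, to the current approximation
3. End the algorithm: the approximation after the M-th update is the final output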
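A minimal sketch of this loop in plain Python, assuming squared-error loss (so the pseudo responses are ordinary residuals), a single numeric feature, and depth-one regression trees ("stumps") as the base learner. The function names, the `learning_rate` argument and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, targets):
    """Least-squares stump on one feature: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def gradient_boost(xs, ys, n_trees, learning_rate=1.0):
    """Return (f0, [(step, stump), ...]) fitted with squared-error loss."""
    f0 = sum(ys) / len(ys)  # step 1: the best constant is the mean
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):  # step 2
        residuals = [y - p for y, p in zip(ys, preds)]  # pseudo responses
        stump = fit_stump(xs, residuals)                # fit tree to them
        h = [stump_predict(stump, x) for x in xs]
        # Line search has a closed form for squared error: gamma = <r, h> / <h, h>.
        denom = sum(v * v for v in h)
        gamma = sum(r * v for r, v in zip(residuals, h)) / denom if denom else 0.0
        step = learning_rate * gamma
        preds = [p + step * v for p, v in zip(preds, h)]  # update the model
        stumps.append((step, stump))
    return f0, stumps


def predict(model, x):
    """Final output: initial constant plus the sum of all scaled tree outputs."""
    f0, stumps = model
    return f0 + sum(step * stump_predict(s, x) for step, s in stumps)


def sse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 20, 22, 40, 42]
errors = [sse(gradient_boost(xs, ys, m), xs, ys) for m in (0, 1, 10)]
```

For a least-squares stump the line search returns a step size of about 1, so a learning rate below 1 is what actually shrinks each update in practice.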
AN EXAMPLE BASED ON GRADIENT BOOSTING
Predict the age of a person based on whether they play video games, enjoy gardening, and like wearing hats
Objective: minimize the squared error
LOSS FUNCTION :- SQUARED ERROR
F_0 = \frac{1}{n} \sum_{k=1}^{n} \mathrm{Age}_k
\mathrm{PseudoResidual}_0 = \mathrm{Age} - F_0
F_1 = F_0 + \gamma_0 h_0
SSE = \sum_{k=1}^{n} (\mathrm{Age}_k - F_1)^2
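Why the initial model is the mean and why the pseudo-residual is simply Age − F can be shown in two lines. This is a short derivation, assuming the loss carries the conventional factor of 1/2 (the slide's SSE omits it, which only rescales the gradient by 2); the symbols c and x_k are introduced here for illustration.

```latex
% Best constant model: minimise the squared error over a constant c.
\frac{d}{dc}\,\frac{1}{2}\sum_{k=1}^{n}\bigl(\mathrm{Age}_k - c\bigr)^2
  = -\sum_{k=1}^{n}\bigl(\mathrm{Age}_k - c\bigr) = 0
\quad\Longrightarrow\quad
F_0 = c = \frac{1}{n}\sum_{k=1}^{n}\mathrm{Age}_k

% Pseudo-residual: negative gradient of the loss with respect to the prediction.
L = \frac{1}{2}\sum_{k=1}^{n}\bigl(\mathrm{Age}_k - F(x_k)\bigr)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x_k)} = \mathrm{Age}_k - F(x_k)
```

With squared-error loss, fitting a tree to the pseudo-residuals is therefore the same as fitting it to the ordinary residuals.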
BOOSTING – SEQUENTIAL ACCUMULATION
Tree1 Residual = Age – Tree1 Prediction
Combined Prediction = Tree1 Prediction + Tree2 Prediction
THANK YOU FOR PATIENT HEARING!!!!..



Editor's Notes

  1. Good morning everyone! My name is Shreyas. I’ll be giving a presentation on the topic “Introduction to Random Forest and Gradient Boosting Methods”.
  2. Ensemble methods can be explained by analogy with the popular show “Who Wants to Be a Millionaire?”. There are three lifelines in this show, as shown. At each stage of training we build decision trees, and each of them gives an output. Each tree is a weak learner, as its predicted output is only somewhat better than random guessing. Here we combine a set of weak learners into a strong learner by averaging their outputs. The probability of getting a correct answer from a single friend is comparatively lower than that of getting one from the audience as a whole.
  3. Data splitting divides the total dataset into training and testing sets. The training set is then further divided into bootstrap samples.
  4. In each iteration, the combined prediction after adding the new tree is stronger than the prediction from the previous iteration.