Gradient boosted trees
Nihar Ranjan
Data Mining
 Data Mining: It is the process of extracting patterns from data. The patterns should be:
 Valid: they hold on new data with some certainty.
 Novel: they are non-obvious to the system.
 Useful: it should be possible to act on them.
 Understandable: Humans should be able to interpret the pattern.
 Also known as Knowledge Discovery in Databases (KDD).
Data Mining might mean:
 Statistics
 Visualization
 Artificial Intelligence
 Database Technology
 Machine Learning
 Neural Networks
 Information Retrieval
 Knowledge-based systems
 Knowledge acquisition
 Pattern Recognition
 High-performance computing
 And so on….
What's needed?
 Suitable data
 Computing power
 Data mining software
 Someone who knows both the nature of the data and the software tools
 A reason, theory, or hunch
Typical applications of Data Mining and KDD
Data Mining and KDD have widespread applications. Some examples include: marketing, healthcare, financial services, and so on….
Some basic techniques
Predictive model: It describes what is likely to happen in the future by analyzing the current data. It uses statistical analysis, machine learning algorithms and other forecasting techniques to predict what might happen. It is not exact, since it is essentially a projection into the future made with the data and the chosen statistical/machine-learning techniques. E.g. performance analysis.
Descriptive model: It gives a view into the past and tells what exactly happened. It involves data aggregation and data mining. It is accurate, since it describes exactly what happened in the past. E.g. sentiment analysis.
Prescriptive model: This is a relatively new field in data science, a step above the predictive and descriptive models. It provides a viable solution to the problem at hand and the impact of adopting that solution on future trends. It is still an evolving technique. E.g. the Google self-driving car.
Some basic techniques
Predictive
 Regression
 Classification
 Collaborative Filtering
Descriptive
 Clustering
 Association rules and variants
 Deviation detection
Key data mining tasks
Classification: mapping data into predefined groups or classes.
Regression: mapping a data item to a real-valued prediction variable.
Clustering: grouping similar data together into clusters (a quick illustration of all three tasks follows below).
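As a quick, hedged illustration (not from the original slides), the sketch below shows one scikit-learn estimator per task on a toy dataset; the dataset and model choices are arbitrary placeholders.

```python
# Illustrative only: one scikit-learn estimator per data mining task, on toy data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: map samples into predefined classes.
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(X[:3]))

# Regression: map samples to a real-valued target (here, petal width from the other features).
reg = LinearRegression().fit(X[:, :3], X[:, 3])
print(reg.predict(X[:3, :3]))

# Clustering: group similar samples together without using the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])
```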
Key learning tasks in Machine Learning
Supervised learning: A set of well-labelled data is given, with defined input and output variables (the training data), and the algorithms learn to predict the output from the input data.
Unsupervised learning: The data given is not labelled, i.e. only input variables are given, with no corresponding output variables. The algorithms find patterns and draw inferences from the given data. This is "pure Data Mining".
Semi-supervised: Some data is labelled but most of it is unlabelled, so a mixture of supervised and unsupervised techniques can be used.
Some basic Data Mining Methods
 Decision Trees
 Neural Networks
 Cluster/Nearest Neighbour
 Genetic Algorithms/Evolutionary Computing
 Bayesian Networks
 Statistics
 Hybrids
Gradient boosted trees
 We are interested in Gradient boosted trees.
 We will use RapidMiner (possibly Python?).
Gradient boosted trees
 Decision Trees
 We will discuss a bit about decision trees first.
 A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).
 A decision tree takes a set of input features and splits the input data recursively based on those features.
 The process is repeated until some stop condition is met, e.g. the depth of the tree, no more information gain possible, etc. (a minimal sketch follows below).
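A minimal decision-tree sketch, assuming scikit-learn in Python (the slides only mention RapidMiner and possibly Python); here max_depth plays the role of the stop condition mentioned above.

```python
# Minimal decision-tree sketch: max_depth is the stop condition.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree))                      # each split is a rule on one feature
print("test accuracy:", tree.score(X_test, y_test))
```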
Gradient boosted trees
 Decision Trees have been around for a long time and are also known to suffer from bias and variance.
 We have a large bias with simple trees and a large variance with complex trees.
 Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree.
 The main principle behind an ensemble model is that a group of weak learners come together to form a strong learner.
 A few ensemble methods: Bagging, Boosting.
 We will look at each of them.
Gradient boosted trees
 Bagging
 It is used when our goal is to reduce the variance of a decision tree.
 The idea here is to take subsets of the training data, chosen randomly with replacement.
 Each subset of data is then used to train its own decision tree.
 We thus end up with an ensemble of different models, and their average is much more robust than a single decision tree for predictive analysis (see the sketch after this list).
 Random Forest is an extension of Bagging.
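A hedged bagging sketch assuming scikit-learn's BaggingClassifier (an assumption, not a tool named in the slides); by default its base estimator is a decision tree, and each tree is trained on a bootstrap sample drawn with replacement.

```python
# Bagging sketch: each of the 100 trees sees a bootstrap sample (random subset
# drawn with replacement), and their predictions are combined by voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```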
Gradient boosted trees
 Random Forest
 It is basically a collection, or ensemble, of numerous decision trees. A collection of trees is generally called a forest.
 It is also a bagging technique, with a key difference: it takes a random subset of the features at each split, and the trees are pruned with a stopping criterion for node splits.
 Each tree is grown as large as possible.
 The above steps are repeated, and the prediction is given by aggregating the predictions from the n trees.
 It is used for both classification and regression.
 It handles high-dimensional data and missing values well and maintains accuracy, but it does not give precise values for the regression model, as the final prediction is based on the mean of the predictions from the subset trees (see the sketch after this list).
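A hedged random-forest sketch, again assuming scikit-learn; max_features is the per-split feature subset the slide refers to.

```python
# Random forest sketch: like bagging, but only a random subset of features
# (max_features) is considered at each split, which decorrelates the trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print("5-fold accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```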
Gradient boosted trees
 Boosting
 Boosting refers to a family of algorithms which convert weak learners into strong learners.
 It learns sequentially from the errors of the prior random sample (in our case, a tree).
 The weak learners are trained sequentially, each trying to correct its predecessor.
 The early learners fit simple models to the data, and the data is then analyzed for errors.
 All the weak learners, each with an accuracy only slightly better than random guessing (an error rate just below 0.5), are combined in some way to get a strong classifier with a higher accuracy.
 When an input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly.
 By combining the whole set at the end, the weak learners are converted into a better-performing model.
Gradient boosted trees
Types of boosting
 AdaBoost: short for Adaptive Boosting.
 We start from weak classifiers and learn to combine them linearly so that the error is reduced. The result is a strong classifier built by boosting the weak classifiers.
 We train an algorithm, say a decision tree, on a model in which all features have been given equal weights.
 A model is built on a subset of the data, predictions are made on the whole dataset, and the errors are calculated from the predictions and the actual values.
Gradient boosted trees
 Adaboost
 While creating the next model, higher weights are given to the data points which were predicted incorrectly, i.e. misclassified.
 Weights can be determined using the error value, i.e. the higher the error, the higher the weight associated with the observation.
 This process is repeated until the error function no longer changes, or the maximum number of estimators is reached.
 It is used for both classification and regression problems; mostly decision stumps are used with AdaBoost, but any machine learning algorithm that accepts weights on the training data set can be used as the base learner (see the sketch after this list).
 One of the applications of Adaboost is face recognition systems.
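A hedged AdaBoost sketch assuming scikit-learn; its default base learner is a decision stump (a depth-1 tree), matching the slide, and learning_rate scales each stump's contribution.

```python
# AdaBoost sketch: the default base learner is a decision stump (depth-1 tree);
# each round re-weights the training points the previous stumps got wrong.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print("5-fold accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```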
Gradient boosted trees
 Types of Boosting
 Gradient Boosting
 We will cover this in detail now.
 There are other implementations of gradient boosting, such as XGBoost and LightGBM.
Gradient boosted trees
 Gradient Boost
 It is also a machine learning technique which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
 Thus, the resulting models may be referred to as gradient boosted trees.
 Like other boosting methods, it builds the model in a sequential or stage-wise fashion (see the sketch below).
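A hedged sketch, assuming scikit-learn's GradientBoostingRegressor, of the stage-wise build-up: staged_predict exposes the ensemble's prediction after each added tree.

```python
# Gradient boosting sketch: the ensemble is built stage-wise; staged_predict
# yields the prediction after each added tree, so we can watch the error fall.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbt = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3,
                                random_state=0).fit(X_train, y_train)
for i, y_pred in enumerate(gbt.staged_predict(X_test)):
    if (i + 1) % 50 == 0:
        print(f"after {i + 1:3d} trees, test MSE = {mean_squared_error(y_test, y_pred):.1f}")
```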
Gradient boosted trees
 We shall now look at some of the maths behind it.
 The objective of any supervised learning algorithm is to define a loss function and minimize it.
 We use the mean squared error, defined as MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)².
 We want our loss function (MSE) on the predictions to be minimal, using gradient descent and updating our predictions based on a learning rate (a short derivation follows below).
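The step connecting the MSE to residuals (my addition, using only the standard definitions above): differentiating the loss with respect to one prediction shows that its negative gradient is, up to a constant factor, the residual.

```latex
\frac{\partial}{\partial \hat{y}_i}\,\frac{1}{n}\sum_{j=1}^{n}\bigl(y_j - \hat{y}_j\bigr)^{2}
  \;=\; -\frac{2}{n}\bigl(y_i - \hat{y}_i\bigr)
  \;\propto\; -\,r_i ,
\qquad r_i = y_i - \hat{y}_i .
```

So taking a gradient-descent step on the MSE means nudging each prediction toward its residual, which is exactly what the next tree in the ensemble is fitted to.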
Gradient boosted trees
 We will first see what a learning rate is.
 Learning rates are the hyperparameters which control how much we adjust the weights of our model with respect to the loss gradient. The learning rate affects how quickly our model can converge to a local minimum (i.e. arrive at the best accuracy).
 The relationship is given by the formula: new_weight = existing_weight - learning_rate * gradient
 In gradient boosted trees, the analogous update is applied to the predictions: new_predictions = old_predictions + learning_rate * (predictions of the tree fitted to the residuals).
 We basically update the predictions such that the sum of our residuals is close to zero (or minimal) and the predicted values are sufficiently close to the actual values.
 Learning rates are tuned so as to prevent the overfitting which gradient boosted trees are prone to (a from-scratch sketch follows below).
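A from-scratch sketch (my own, assuming scikit-learn trees and NumPy) of the loop described above: each regression tree is fitted to the current residuals, and its contribution is shrunk by the learning rate before being added to the running predictions.

```python
# From-scratch gradient boosting for MSE: each tree is fitted to the current
# residuals, and its contribution is shrunk by the learning rate.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate = 0.1
pred = np.full_like(y, y.mean())          # initial prediction F0 = mean(y)
trees = []
for m in range(100):
    residuals = y - pred                  # negative gradient of MSE w.r.t. pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=m).fit(X, residuals)
    pred = pred + learning_rate * tree.predict(X)   # new_pred = old_pred + lr * tree
    trees.append(tree)

print("final training MSE:", np.mean((y - pred) ** 2))
```

A smaller learning rate needs more trees but shrinks each step, which is the usual lever for keeping overfitting in check.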
Gradient boosted trees
 In gradient boosted trees, models are trained sequentially, and each model minimizes the loss function (y = ax + b + e, where e is an error term that needs special attention) of the whole system using the gradient descent method, as explained earlier.
 The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
 The principal idea behind this algorithm is to construct new base learners which are maximally correlated with the negative gradient of the loss function, associated with the whole ensemble.
 Pros of gradient boosted trees: fast, easy to tune, not sensitive to scale (features can be a mix of continuous and categorical data), good performance, lots of software available (well supported and tested).
 Cons: sensitive to overfitting and noise (one should always cross-validate; see the sketch below).
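A hedged cross-validation sketch, assuming scikit-learn; the hyperparameter values are arbitrary and only illustrate how the overfitting risk mentioned above is monitored.

```python
# Cross-validation sketch: gradient boosted trees are prone to overfitting, so
# settings (tree count, depth, learning rate) are judged by held-out CV scores.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
gbt = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3,
                                 random_state=0)
scores = cross_val_score(gbt, X, y, cv=5)
print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```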
Thanks!
