NGBoost:
Natural
Gradient
Boosting
Mohamed Ali Habib
Outlines
• Introduction.
• What is probabilistic regression?
• Why is it useful?
• How do other methods compare to NGBoost?
• Gradient Boosting Algorithm.
• NGBoost:
• Main components.
• Steps.
• Usage.
• Experiments & Results.
• Computational Complexity.
• Future Work.
• References.
Introduction
What is probabilistic regression?
(Standard Regression)
Note: This use of conditional probability distributions is already the norm in classification
Why is probabilistic regression (prediction) useful?
The measure of uncertainty makes probabilistic prediction crucial in applications like healthcare and
weather forecasting.
Why is probabilistic regression (prediction) useful?
All in all, probabilistic regression (prediction) provides better insight than standard (scalar)
regression: instead of a single value E[Y|X=x] for a given X=x, it yields the full distribution P(Y|X=x).
Problems with existing methods
Methods:
• Post-hoc variance.
• Generalized Additive Models for Location,
Scale and Shape (GAMLSS).
• Bayesian methods like MCMC.
• Bayesian deep learning.
Problems:
• Inflexible.
• Slow.
• Require expert knowledge.
• Make strong assumptions about the nature of the data
(homoscedasticity*).
Limitations of deep learning methods: difficult to use
out-of-the-box
• Require expert knowledge.
• Usually perform only on par with traditional
methods on limited size or tabular data.
• Require extensive hyperparameter tuning.
* Homoscedasticity: all random variables in a
sequence have the same finite variance.
Gradient Boosting
Machines (GBMs)
• A set of highly modular methods
that:
• Work out-of-the-box.
• Perform well on structured
data, even with small datasets.
• Demonstrated empirical success on
Kaggle and other data science
competitions.
Source: what algorithms are most successful on Kaggle?
Problems related to GBMs
• Assume Homoscedasticity: constant variance.
• Predicted distributions should have at least two
degrees of freedom (two parameters) to
effectively convey both the magnitude and the
uncertainty of the predictions.
What is the solution then?
(Spoiler alert) it is NGBoost.
NGBoost solves the problem of simultaneously boosting multiple parameters from
the base learners using:
• A multiparameter boosting approach.
• Use of natural gradients.
Gradient
Boosting
Algorithm
• An ensemble of simple models is involved in making a prediction.
• Results in a prediction model in the form of an ensemble of weak models.
• Intuition: the best possible next model, when combined with previous models,
minimizes the overall prediction error.
• Components:
• A loss function to be optimized.
• E.g., MSE or Logarithmic Loss.
• A weak learner to make predictions.
• Most common choice is Decision Trees or Regression Trees.
• It is common to constrain the learner, e.g. by specifying a maximum number of
layers, nodes, splits or leaf nodes.
• An additive model to add weak learners to minimize the loss function.
• A gradient descent procedure is used to minimize the loss when adding
trees.
Gradient
Boosting
Algorithm
Gradient Boosting
Algorithm
Explanation:
Step 1: Initialize prediction to a constant whose value minimizes
the loss. You could solve using Gradient Descent or manually if
problem is trivial.
Step 2: build the trees (weak learners)
(A) Compute residuals between prediction and observed data.
Use the prediction of the previous step, F(x) = F_{m−1}(x), which is
F_0(x) for m = 1.
(B) Fit a tree to the residuals (make the residuals the target
output). j here indexes the leaf nodes.
(C) Determine the output for each leaf in the tree. E.g., if a leaf contains 14.7
and 2.7, its output is the value of γ that minimizes the
summation. Unlike Step 1, here the previous prediction
F_{m−1}(x_i) is taken into account.
(D) Make a new prediction for each sample. The summation
accounts for the case where a single sample ends up in
multiple leaves, so you take a scaled sum of the leaf outputs γ.
Choosing a small learning rate ν improves the final prediction.
Step 3: The final prediction is F_M(x), i.e. the initial constant plus the scaled contributions of all the trees.
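To make the steps concrete, here is a minimal sketch of the algorithm for the squared-error loss, where the residuals in Step 2(A) are exactly the negative gradients; the tree depth, learning rate, and number of trees are illustrative placeholders, not values from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for the squared-error loss (illustrative sketch)."""
    # Step 1: the constant prediction minimizing squared error is the mean of y.
    F0 = float(np.mean(y))
    F = np.full(len(y), F0)
    trees = []
    for m in range(n_trees):
        # Step 2(A): residuals = negative gradient of the squared-error loss.
        residuals = y - F
        # Step 2(B)/(C): fit a small regression tree to the residuals; for squared
        # error, each leaf's output is the mean residual in that leaf.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        # Step 2(D): update predictions with a shrunken (learning-rate-scaled) step.
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return F0, trees

def gradient_boost_predict(X, F0, trees, learning_rate=0.1):
    # Step 3: final prediction = initial constant + scaled sum of all tree outputs.
    return F0 + learning_rate * sum(t.predict(X) for t in trees)
```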
To learn more:
• Paper: Greedy Function Approximation: A Gradient Boosting Machine, Jerome H. Friedman.
• Video explanations: Gradient Boost part 1, part 2, part 3, part 4.
• Decision Trees video explanation: Decision Trees.
• AdaBoost video explanation: AdaBoost.
NGBoost: Natural Gradient Boosting
• A method for probabilistic prediction with competitive state-of-the-art performance on a variety
of datasets.
• Combines a multiparameter boosting algorithm with the natural gradient to efficiently estimate how
the parameters of the presumed outcome distribution vary with the observed features.
• In a standard prediction setting:
• the object of interest is the estimate of a scalar function Ε(𝑦|𝑥) where 𝑥 is the vector of covariates
(observed features) and 𝑦 is the prediction target.
• For NGBoost:
• The object of interest is a conditional probability distribution P_θ(y|x).
• Assuming P_θ(y|x) has a parametric form with p parameters, where θ ∈ ℝ^p
(a vector of p parameters).
NGBoost: Natural
Gradient Boosting
Components:
• Base learner (e.g. Regression Tree).
• Parametric probability distribution (Normal, Laplace, Poisson, etc.).
• Scoring Rule (MLE, CRPS, etc.).
NGBoost:
Natural
Gradient
Boosting
Steps:
1. Pick a scoring rule to grade our estimate of P(Y|X=x)
2. Assume that P(Y|X=x) has some parametric form
3. Fit the parameters θ(x) as a function of x using
gradient boosting
4. Use the natural gradient to correct the training
dynamics of this approach
Proper Scoring Rule
A proper scoring rule 𝑆(𝑃, 𝑦) must satisfy:
E_{y~Q}[S(Q, y)] ≤ E_{y~Q}[S(P, y)]   ∀ P, Q
Q: the true distribution of outcomes y
P: any other distribution (e.g. the predicted distribution) of outcomes y
In other words, the scoring rule assigns a score to the forecast such that the true distribution 𝑄
of the outcomes gets the best score in expectation compared to other distributions, like 𝑃.
(Gneiting and Raftery, 2007. Strictly Proper Scoring Rules, Prediction, and Estimation.)
1. Pick a scoring rule to grade our estimate of P(Y|X=x)
Point prediction → loss function
Probabilistic prediction → scoring rule
Example scoring rule: negative log-likelihood
Notes:
• A scoring rule in probabilistic regression is analogous to the loss function in standard regression.
• Minimizing the NLL yields the Maximum Likelihood Estimate (MLE).
• Taking the log simplifies the calculus.
• NLL (MLE) is the most common proper scoring rule.
• CRPS is another good alternative to MLE.
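As a small numerical illustration (assuming a Normal predictive distribution; this example is not from the slides), the NLL scoring rule for a prediction (μ, σ) and an observed y can be computed directly:

```python
import numpy as np

def nll_normal(y, mu, sigma):
    """NLL of observation y under N(mu, sigma^2): the scoring rule S(P_theta, y)."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

# A sharp, well-centred forecast scores better (lower NLL) than a vague, off-centre one.
print(nll_normal(y=2.0, mu=2.0, sigma=0.5))  # ~0.23
print(nll_normal(y=2.0, mu=1.0, sigma=2.0))  # ~1.74
```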
2. Assume P(Y|X=x) has some parametric form
(Figure: example Normal predictive distributions with (μ = 1, σ = 1), (μ = 2, σ = 0.5),
(μ = 2.5, σ = 0.75), and (μ = 3.5, σ = 1.5).)
Note:
here they are
assuming a normal
distribution, but
you can swap out
with any other
distribution
(Poisson, Bernoulli,
etc.) that fits your
application.
3. Fit the parameters θ(x) as a function of x using gradient boosting
(Figure: the same example distributions, (μ = 1, σ = 1), (μ = 2, σ = 0.5), (μ = 2.5, σ = 0.75),
(μ = 3.5, σ = 1.5), now fit as functions of x.)
This approach performs poorly in practice.
What we get:
What we want:
The algorithm fails to adjust the mean, which hurts the prediction.
What could be the solution?
Use natural gradients instead of ordinary gradients.
What we typically do: gradient descent in the parameter space
• Pick a small region around your value of 𝜃
• Find the direction to step in, within that ball, that decreases the score (a.k.a. the gradient).
What we want to do: Gradient descent in the space of distributions
Every point in this space represents
some distribution.
Parametrizing the space of distributions:
θ is just a “name” for P.
Each distribution has such
a name (i.e. is “identified”
by its parameters).
The problem is:
Gradient descent in the parameter space is not gradient descent in the distribution space, because
distances in the two spaces don’t correspond: the spaces have different shape and density.
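A quick worked example of the mismatch (not from the slides): for two Normal distributions that differ only in their means, KL(N(μ₁, σ²) ‖ N(μ₂, σ²)) = (μ₁ − μ₂)² / (2σ²). Shifting the mean by 1 is the same step in parameter space whether σ = 0.1 or σ = 10, but the distributional distance is 50 in the first case and only 0.005 in the second, so equal parameter steps can correspond to wildly unequal changes in the distribution.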
4. Use the natural gradient to correct the training dynamics of this
approach.
this is the natural gradient
Idea: do gradient descent in the distribution space by
searching parameters in the transformed region.
• I_s(θ) is the Riemannian
metric of the space of
distributions.
• It depends on the
parametric form chosen
and the score function.
• If the score is NLL, this is
the Fisher information.
Here’s the trick:
• Multiplying the ordinary gradient by the inverse of the Riemannian metric implicitly transforms the optimal direction
in parameter space into the optimal direction in the distribution space.
• We can conveniently compute the natural gradient by applying this transformation to the ordinary gradient.
Proper scoring rules
and corresponding
gradients for fitting a
Normal distribution
~N(0, 1)
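As an illustrative sketch (assuming the common parametrization θ = (μ, log σ) and the NLL score; this is not the library’s internal code), the metric is the Fisher information, which for this parametrization is diag(1/σ², 2), and the natural gradient is obtained by solving against it:

```python
import numpy as np

def nll_gradients_normal(y, mu, log_sigma):
    """Ordinary and natural NLL gradients for N(mu, sigma^2), theta = (mu, log sigma).
    Illustrative sketch only."""
    sigma2 = np.exp(2 * log_sigma)
    # Ordinary gradient of the NLL with respect to (mu, log sigma).
    grad = np.array([(mu - y) / sigma2,
                     1.0 - (y - mu) ** 2 / sigma2])
    # Fisher information (Riemannian metric) in this parametrization: diag(1/sigma^2, 2).
    fisher = np.diag([1.0 / sigma2, 2.0])
    # Natural gradient = inverse metric times ordinary gradient.
    natural_grad = np.linalg.solve(fisher, grad)
    return grad, natural_grad
```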
NGBoost
Explanation:
1. Estimate a common θ^(0) such that it minimizes S.
2. For each iteration m:
• Compute the natural gradient g_i^(m) of S with
respect to each example's predicted parameters up
to that stage, θ_i^(m−1).
• Fit base learners, one per parameter, to the natural
gradients, e.g. f^(m) = (f_μ^(m), f_{log σ}^(m)).
• Compute a scaling factor ρ^(m) (a scalar) that
minimizes the true scoring rule along the projected
gradient, via line search. In practice, the authors found
that starting from ρ = 1 and successively halving it
works well.
• Update the predicted parameters.
Notes:
• The learning rate η is typically 0.1 or 0.01, following
Friedman's recommendation for gradient boosting.
• Sub-sampling mini-batches can improve computational
performance on large datasets.
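Putting the pieces together, below is a highly simplified sketch of this fitting loop (Normal distribution, NLL score, closed-form natural gradients, squared-error trees as base learners; the line search for ρ is omitted and all hyperparameters are placeholders rather than the paper's settings):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ngboost_fit_sketch(X, y, n_stages=500, eta=0.01, max_depth=3):
    """Toy NGBoost-style loop: one base learner per distribution parameter per stage."""
    # 1. Common initial parameters theta^(0) = (mu, log sigma) fit to the marginal of y.
    mu = np.full(len(y), np.mean(y), dtype=float)
    log_sigma = np.full(len(y), np.log(np.std(y)), dtype=float)
    stages = []
    for m in range(n_stages):
        sigma2 = np.exp(2 * log_sigma)
        # 2a. Natural gradients of the NLL for a Normal with theta = (mu, log sigma):
        #     Fisher^-1 times the ordinary gradient, written in closed form.
        nat_mu = mu - y
        nat_ls = 0.5 * (1.0 - (y - mu) ** 2 / sigma2)
        # 2b. Fit one tree per parameter to its natural gradient.
        f_mu = DecisionTreeRegressor(max_depth=max_depth).fit(X, nat_mu)
        f_ls = DecisionTreeRegressor(max_depth=max_depth).fit(X, nat_ls)
        # 2c. Scaling factor rho omitted (the paper uses a halving line search, often rho = 1).
        # 2d. Update the predicted parameters, stepping against the gradient.
        mu -= eta * f_mu.predict(X)
        log_sigma -= eta * f_ls.predict(X)
        stages.append((f_mu, f_ls))
    return mu, log_sigma, stages
```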
Experiments
• UCI ML Repository benchmarks.
• Probabilistic Regression:
• Configuration:
• Data split: 70% training, 20% validation, and 10% testing.
• Repeated 20 times.
• Ablation:
• 2nd-Order boosting: use 2nd order gradients instead of natural gradients.
• Multiparameter boosting: using ordinary gradients instead of natural
gradients.
• Homoscedastic boosting: assuming constant variance, to see the benefit of
allowing parameters other than the conditional mean to vary across x.
• Why? To demonstrate that multiparameter boosting and the natural
gradient work together to improve performance.
• Point estimation.
Results
The result is equal or better performance than state-of-the-art probabilistic prediction methods.
Results
Ablation
Results
NGBoost is competitive for point prediction too
Usage
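The original slide showed a code screenshot; a minimal usage sketch with the open-source `ngboost` Python package (`pip install ngboost`) looks roughly like the following. Class and method names (`NGBRegressor`, `pred_dist`) follow the package's documentation, the dataset here is synthetic, and the distribution-parameter access may differ slightly across package versions:

```python
from ngboost import NGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for a real dataset.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Defaults: Normal distribution, NLL (MLE) scoring rule, shallow regression trees.
ngb = NGBRegressor(n_estimators=500, learning_rate=0.01).fit(X_train, y_train)

point_preds = ngb.predict(X_test)    # point predictions (predicted means)
pred_dists = ngb.pred_dist(X_test)   # full predictive distributions
print(pred_dists.params["loc"][:5])    # per-example mu
print(pred_dists.params["scale"][:5])  # per-example sigma
```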
Computational
Complexity
Difference between NGBoost and other boosting algorithms:
• NGBoost fits one series of learners per distribution parameter, whereas
standard boosting fits only a single series of learners.
• For the natural gradient, a p×p matrix I_s^{−1} is computed at each step,
where p is the number of distribution parameters.
In practice:
• The matrix is small for most commonly used distributions: only 2×2 for a
Normal distribution.
• If the dataset is huge, it may still be expensive to compute a large
number of such matrices at each iteration.
Future work
• Apply NGBoost to other prediction settings (e.g.
classification, survival prediction).
• Joint prediction: 𝑃𝜃(𝑧, 𝑦|𝑥)
• Technical innovations:
• Better tree-based base learners and
regularization are likely to improve
performance, especially on large
datasets.
References
• NGBoost: Natural Gradient Boosting for
Probabilistic Prediction
• NGBoost: Stanford ML Group
Thank you