Gaussian Process in Machine Learning
Subject: Machine Learning
Dr. Varun Kumar
Subject: Machine Learning Dr. Varun Kumar (IIIT Surat) Lecture 15 1 / 16
Outline
1 Introduction to Gaussian Distributed Random Variable
2 Central Limit Theorem
3 MLE Vs MAP
4 Gaussian Process for Linear Regression
5 References
Introduction to Gaussian Distributed Random Variable (rv)
Gaussian distribution
1 The general expression for the PDF of a univariate Gaussian
distributed random variable is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where σ → standard deviation, µ → mean, σ² → variance.
2 The general expression for the PDF of a multivariate Gaussian
distributed random vector is
$$P(X; \mu_x, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det \Sigma}}\, e^{-\frac{1}{2}(X-\mu_x)^T \Sigma^{-1} (X-\mu_x)}$$
X → d-dimensional input random vector, i.e. X = [x_1, x_2, ..., x_d]^T
µ_x → d-dimensional mean vector, i.e. µ_x = [µ_{x_1}, µ_{x_2}, ..., µ_{x_d}]^T
Σ → covariance matrix of size d × d
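As a quick numerical check of the two density formulas, the sketch below evaluates the multivariate expression with NumPy and confirms that d = 1 reduces to the univariate case. The function name `gaussian_pdf` and the test values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a d-dimensional Gaussian N(mean, cov) evaluated at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mean.shape[0]
    diff = x - mean
    # Solve cov @ z = diff instead of forming the explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

# d = 1: the covariance "matrix" is just the variance sigma^2.
sigma = 2.0
p_uni = gaussian_pdf(1.0, 0.0, sigma**2)

# d = 2 with identity covariance, evaluated at the mean: 1 / (2 * pi).
p_biv = gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical-stability choice; it does not change the formula.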
Properties of Gaussian distributed random variable
1. The sum of two independent Gaussian distributed rvs is also Gaussian. Let
X_1 ∼ N(µ_{X_1}, Σ_{X_1X_1}) and X_2 ∼ N(µ_{X_2}, Σ_{X_2X_2}) be two independent
Gaussian distributed rvs. Then
$$Z = X_1 + X_2 \sim N(\mu_{X_1} + \mu_{X_2},\ \Sigma_{X_1X_1} + \Sigma_{X_2X_2})$$
2. Normalization: the density integrates to one.
$$\int_y p(y; \mu, \Sigma)\, dy = 1$$
3. Marginalization: the marginal of a joint Gaussian is also a Gaussian distribution.
$$p(X_1) = \int_{X_2} p(X_1, X_2; \mu, \Sigma)\, dX_2$$
4. Conditioning: the conditional distribution of X_1 given X_2 is also a Gaussian distribution.
$$p(X_1 | X_2) = \frac{p(X_1, X_2; \mu, \Sigma)}{\int_{X_1} p(X_1, X_2; \mu, \Sigma)\, dX_1}$$
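The marginalization and conditioning properties can be checked numerically for a bivariate Gaussian. The slides do not give the closed form of the conditional; the sketch below assumes the standard result (conditional mean µ_1 + Σ_12 Σ_22⁻¹ (x_2 − µ_2), conditional variance Σ_11 − Σ_12² / Σ_22) and verifies it against Bayes' rule. All function names and numbers are illustrative.

```python
import math

def condition_bivariate(mu1, mu2, s11, s12, s22, x2):
    """Mean and variance of X1 | X2 = x2 for a jointly Gaussian pair.

    (X1, X2) has mean (mu1, mu2) and covariance [[s11, s12], [s12, s22]].
    """
    cond_mean = mu1 + s12 / s22 * (x2 - mu2)
    cond_var = s11 - s12 * s12 / s22
    return cond_mean, cond_var

def joint_pdf(x1, x2, mu1, mu2, s11, s12, s22):
    """Bivariate Gaussian density, written out explicitly for d = 2."""
    det = s11 * s22 - s12 * s12
    d1, d2 = x1 - mu1, x2 - mu2
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def normal_pdf(x, mu, var):
    """Univariate Gaussian density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Illustrative parameters (assumed, not from the lecture).
params = dict(mu1=1.0, mu2=-1.0, s11=2.0, s12=0.8, s22=1.0)
x1, x2 = 0.5, 0.3

m, v = condition_bivariate(x2=x2, **params)

# Bayes' rule: p(x1 | x2) = p(x1, x2) / p(x2), where the marginal
# p(x2) = N(mu2, s22) follows from the marginalization property.
lhs = joint_pdf(x1, x2, **params) / normal_pdf(x2, params["mu2"], params["s22"])
rhs = normal_pdf(x1, m, v)
```

Here `lhs` and `rhs` agree up to floating-point error, which is the conditioning property in its simplest form.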
Central limit theorem
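⇒ Let {X_1, ..., X_n} be a random sample of size n.
⇒ All samples are independent and identically distributed (i.i.d.) with mean µ and finite variance σ².
⇒ The sample average
$$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$$
is approximately Gaussian distributed as n → ∞.
⇒ By the law of large numbers, the sample average converges almost
surely to the expected value µ; its variance is σ²/n.
⇒ Let Z be the standardized sample average,
$$Z = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}, \quad n \to \infty$$
⇒ Resultant PDF: for large n, X̄_n is approximately N(µ, σ²/n) and Z converges in distribution to N(0, 1):
$$f_{\bar{X}_n}(x) \approx \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{n(x-\mu)^2}{2\sigma^2}}, \qquad f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$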
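The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the lecture: it draws sample means of Uniform(0, 1) variables (µ = 0.5, σ² = 1/12), standardizes them as Z = √n (X̄_n − µ)/σ, and checks that Z behaves like a standard normal variable.

```python
import math
import random
import statistics

def standardized_means(n, trials, seed=0):
    """Draw `trials` sample means of n Uniform(0, 1) variables and standardize them."""
    rng = random.Random(seed)
    mu = 0.5                        # mean of Uniform(0, 1)
    sigma = math.sqrt(1.0 / 12.0)   # standard deviation of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        zs.append(math.sqrt(n) * (xbar - mu) / sigma)
    return zs

z = standardized_means(n=50, trials=4000)
z_mean = statistics.mean(z)       # should be close to 0
z_std = statistics.pstdev(z)      # should be close to 1
# For a standard normal, about 68% of the mass lies within one standard deviation.
frac_within_1 = sum(abs(v) <= 1.0 for v in z) / len(z)
print(f"mean={z_mean:.3f} std={z_std:.3f} P(|Z|<=1)={frac_within_1:.3f}")
```

The underlying samples are uniform, not Gaussian, yet the standardized average is close to N(0, 1) already at n = 50.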
MLE vs MAP
Maximum likelihood estimator (MLE)
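Let y = ax + n, where n ∼ N(0, σ²)
$$\hat{x}_{MLE}(y) = \arg\max_x f_Y(y|x), \qquad f_Y(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-ax)^2}{2\sigma^2}}$$
For a measured value y = ȳ, the likelihood is maximized when ȳ = a x̂_MLE, i.e. x̂_MLE = ȳ/a.
Note: MLE does not require any assumption on the distribution of x.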
Maximum a posteriori probability (MAP)
1 Maximum a priori
$$x_{apriori} = \arg\max_x f_X(x)$$
2 Maximum a posteriori probability (MAP)
$$\hat{x}_{MAP} = \arg\max_x f_X(x|y), \qquad f_X(x|y) = \frac{f_Y(y|x)\, f_X(x)}{f_Y(y)} = \frac{f_Y(y|x)\, f_X(x)}{\int_X f_Y(y|x)\, f_X(x)\, dx}$$
⇒ If the prior f_X(x) is uniform, then
x̂_MLE = x̂_MAP
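To make the difference concrete for the model y = ax + n, the sketch below compares the two estimators. The slides do not fix a prior, so a zero-mean Gaussian prior x ∼ N(0, prior_var) is assumed here purely for illustration; the function names and numbers are hypothetical.

```python
def x_mle(y, a):
    """Maximize f_Y(y|x) for y = a*x + n: the exponent vanishes at x = y / a."""
    return y / a

def x_map(y, a, noise_var, prior_var):
    """MAP estimate under an ASSUMED zero-mean Gaussian prior x ~ N(0, prior_var).

    Setting the derivative of log f_Y(y|x) + log f_X(x) to zero gives
    x = a * prior_var * y / (a^2 * prior_var + noise_var).
    """
    return a * prior_var * y / (a * a * prior_var + noise_var)

y_obs, a, noise_var = 3.0, 2.0, 1.0

mle = x_mle(y_obs, a)                                   # 1.5
map_tight = x_map(y_obs, a, noise_var, prior_var=0.25)  # shrunk toward the prior mean 0
map_flat = x_map(y_obs, a, noise_var, prior_var=1e9)    # nearly flat prior
```

With a tight prior the MAP estimate is pulled toward the prior mean; as the prior becomes flat (approaching uniform), the MAP estimate approaches the MLE, matching the last statement above.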
Linear regression
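Suppose we have data D = {(x_1, y_1), ..., (x_n, y_n)}.
⇒ MLE: $p(D|w) = \prod_{i=1}^{n} p(y_i | x_i; w)$, where $y_i | x_i; w \sim N(w^T x_i, \sigma^2)$
⇒ MAP: $p(w|D) \propto p(D|w)\, p(w)$, with
$$p(w|D) = \frac{p(D|w)\, p(w)}{p(D)} = \frac{p(D|w)\, p(w)}{\int_w p(D|w)\, p(w)\, dw}$$
⇒ Predictive distribution:
$$p(y|x; D) = \int_w p(y|x; w)\, p(w|D)\, dw$$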
Continued–
In general, the posterior predictive distribution is
$$P(Y|D, X) = \int_w P(Y, w|D, X)\, dw = \int_w P(Y|w, D, X)\, P(w|D)\, dw$$
The above is often intractable in closed form.
For a Gaussian likelihood and a Gaussian prior, however, the predictive distribution is Gaussian, with mean and covariance
$$P(y_*|D, x) \sim N(\mu_{y_*|D}, \Sigma_{y_*|D})$$
where
$$\mu_{y_*|D} = K_*^T (K + \sigma^2 I)^{-1} y$$
and
$$\Sigma_{y_*|D} = K_{**} - K_*^T (K + \sigma^2 I)^{-1} K_*$$
Gaussian process
⇒ Problem:
f is an infinite-dimensional object (a function), but the multivariate Gaussian
distribution is defined only for finite-dimensional random vectors.
⇒ Definition: A GP is a collection of random variables (RVs) such that
the joint distribution of every finite subset of RVs is multivariate
Gaussian:
f ∼ GP(µ, k)
where µ(x) and k(x, x′) are the mean and covariance functions.
⇒ We need to model the predictive distribution P(f_*|x_*, D).
⇒ We can use a Bayesian approach with a GP prior,
P(f|x) ∼ N(µ, Σ), and condition it on the training data D to model
the joint distribution of f = f(X) (vector of training observations)
and f_* = f(x_*) (prediction at test input).
Gaussian Process Regression (GPR)
We observe the training labels, which are drawn from the zero-mean Gaussian prior:
y = [y_1, y_2, ..., y_n, y_t]^T ∼ N(0, Σ)
⇒ All training and test labels are drawn from an (n+m)-dimensional Gaussian
distribution.
⇒ n is the number of training points.
⇒ m is the number of testing points.
We consider the following properties of Σ:
1 Σ_ij = E((Y_i − µ_i)(Y_j − µ_j))
2 Σ is always positive semi-definite.
3 Σ_ii = Var(Y_i), thus Σ_ii ≥ 0
4 If Y_i and Y_j are nearly independent, i.e. x_i is very different from x_j, then
Σ_ij = Σ_ji = 0. If x_i is similar to x_j, then Σ_ij = Σ_ji > 0
Continued–
We can observe that this is very similar to the kernel matrix in SVMs.
Therefore, we can simply let Σ_ij = K(x_i, x_j). For example,
(a) If we use the RBF kernel, then
$$\Sigma_{ij} = \tau\, e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}$$
(b) If we use the polynomial kernel, then $\Sigma_{ij} = \tau (1 + x_i^T x_j)^d$.
We can decompose Σ as
$$\Sigma = \begin{pmatrix} K & K_* \\ K_*^T & K_{**} \end{pmatrix}$$
where
K is the training kernel matrix.
K_* is the training-testing kernel matrix.
K_*^T is the testing-training kernel matrix.
K_** is the testing kernel matrix.
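The block decomposition can be built directly from a kernel function. The following is a minimal sketch using the RBF kernel with NumPy; the toy inputs, the default values τ = 1 and σ = 1, and the name `rbf_kernel` are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, tau=1.0, sigma=1.0):
    """tau * exp(-||a - b||^2 / (2 sigma^2)) for every row a of A and b of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return tau * np.exp(-sq_dist / (2.0 * sigma**2))

# Hypothetical toy inputs: n = 4 training points, m = 2 test points, 1 feature.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
X_test = np.array([[0.5], [10.0]])

K = rbf_kernel(X_train, X_train)           # (n, n) training kernel matrix
K_star = rbf_kernel(X_train, X_test)       # (n, m) training-testing kernel matrix
K_star_star = rbf_kernel(X_test, X_test)   # (m, m) testing kernel matrix

# Full (n + m) x (n + m) covariance of training and test labels.
Sigma = np.block([[K, K_star], [K_star.T, K_star_star]])
min_eig = np.linalg.eigvalsh(Sigma).min()  # non-negative up to round-off (PSD)
```

The test point at 10.0 is far from every training input, so its column of `K_star` is essentially zero, which matches the covariance property that very different inputs give Σ_ij ≈ 0.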
Continued–
The conditional distribution of the (noise-free) values of the latent function f
can be written as:
$$f_* | (Y_1 = y_1, ..., Y_n = y_n, x_1, ..., x_n, x_t) \sim N(K_*^T K^{-1} y,\ K_{**} - K_*^T K^{-1} K_*)$$
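The conditional distribution above translates into a few lines of linear algebra. The sketch below is one possible implementation under stated assumptions: an RBF kernel with τ = 1 and σ = 1, toy data generated from sin(x), a Cholesky-based solve in place of an explicit inverse, and a small `jitter` term added to the diagonal for numerical stability. None of these choices come from the lecture.

```python
import numpy as np

def rbf_kernel(A, B, tau=1.0, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return tau * np.exp(-np.sum(diff**2, axis=2) / (2.0 * sigma**2))

def gp_posterior(X_train, y_train, X_test, noise_var=0.0, jitter=1e-10):
    """Posterior mean and covariance of f_* given the training data.

    noise_var = 0 gives the noise-free formula; noise_var > 0 replaces
    K by K + noise_var * I. The jitter is a numerical safeguard (assumption).
    """
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train) + (noise_var + jitter) * np.eye(n)
    K_star = rbf_kernel(X_train, X_test)
    K_star_star = rbf_kernel(X_test, X_test)
    # Cholesky solve avoids forming K^{-1} explicitly; the cost is still O(n^3).
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    V = np.linalg.solve(L, K_star)                             # L^{-1} K_*
    mean = K_star.T @ alpha                # K_*^T K^{-1} y
    cov = K_star_star - V.T @ V            # K_** - K_*^T K^{-1} K_*
    return mean, cov

# Hypothetical toy data: noise-free samples of sin(x).
X_train = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y_train = np.sin(X_train).ravel()
X_test = np.array([[0.0], [0.5], [6.0]])

mean, cov = gp_posterior(X_train, y_train, X_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The predictive standard deviation is nearly zero at a training input (x = 0), larger between training points (x = 0.5), and close to the prior value of 1 far from the data (x = 6). This is the uncertainty estimate that a GP provides alongside its predictions.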
Conclusion
Gaussian Process Regression has the following properties:
1 GPs are an elegant and powerful ML method.
2 We get a measure of uncertainty for the predictions for free.
3 GPs work very well for regression problems with small training data
set sizes.
4 Running time is O(n³) because of the matrix inversion (gets slow when n ≫ 0) ⇒
use sparse GPs for large n.
5 GPs are a little bit more involved for classification (non-Gaussian
likelihood).
6 We can model non-Gaussian likelihoods in regression and do
approximate inference, e.g. for count data (Poisson distribution).
References
T. M. Mitchell, The discipline of machine learning. Carnegie Mellon University,
School of Computer Science, Machine Learning , 2006, vol. 9.
E. Alpaydin, Introduction to machine learning. MIT press, 2020.
K. Weinberger,
https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote15.html,
May 2018.
Subject: Machine Learning Dr. Varun Kumar (IIIT Surat) Lecture 15 16 / 16

More Related Content

What's hot

Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Scale Invariant feature transform
Scale Invariant feature transformScale Invariant feature transform
Scale Invariant feature transform
Shanker Naik
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
Adri Jovin
 
Introduction to XGboost
Introduction to XGboostIntroduction to XGboost
Introduction to XGboost
Shuai Zhang
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
Sangwoo Mo
 
boosting algorithm
boosting algorithmboosting algorithm
boosting algorithm
Prithvi Paneru
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
Pradeep Redddy Raamana
 
Deep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and RegularizationDeep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and Regularization
Yan Xu
 
K means clustering
K means clusteringK means clustering
K means clustering
keshav goyal
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
Kazuki Yoshida
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
zekeLabs Technologies
 
Boosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning MethodBoosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning Method
Kirkwood Donavin
 
Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine Learning
Paxcel Technologies
 
Markov Random Field (MRF)
Markov Random Field (MRF)Markov Random Field (MRF)
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313Slideshare
 
Region Splitting and Merging Technique For Image segmentation.
Region Splitting and Merging Technique For Image segmentation.Region Splitting and Merging Technique For Image segmentation.
Region Splitting and Merging Technique For Image segmentation.
SomitSamanto1
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
Mark Chang
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
Mikko Mäkipää
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 

What's hot (20)

Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Scale Invariant feature transform
Scale Invariant feature transformScale Invariant feature transform
Scale Invariant feature transform
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Introduction to XGboost
Introduction to XGboostIntroduction to XGboost
Introduction to XGboost
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
boosting algorithm
boosting algorithmboosting algorithm
boosting algorithm
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 
Deep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and RegularizationDeep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and Regularization
 
K means clustering
K means clusteringK means clustering
K means clustering
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Boosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning MethodBoosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning Method
 
Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine Learning
 
Markov Random Field (MRF)
Markov Random Field (MRF)Markov Random Field (MRF)
Markov Random Field (MRF)
 
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313
 
Region Splitting and Merging Technique For Image segmentation.
Region Splitting and Merging Technique For Image segmentation.Region Splitting and Merging Technique For Image segmentation.
Region Splitting and Merging Technique For Image segmentation.
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 

Similar to Gaussian process in machine learning

Concentration inequality in Machine Learning
Concentration inequality in Machine LearningConcentration inequality in Machine Learning
Concentration inequality in Machine Learning
VARUN KUMAR
 
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
The Statistical and Applied Mathematical Sciences Institute
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
Frank Nielsen
 
Discussion about random variable ad its characterization
Discussion about random variable ad its characterizationDiscussion about random variable ad its characterization
Discussion about random variable ad its characterization
Geeta Arora
 
The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...
Joe Suzuki
 
Newton's Divide and Difference Interpolation
Newton's Divide and Difference InterpolationNewton's Divide and Difference Interpolation
Newton's Divide and Difference Interpolation
VARUN KUMAR
 
The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
Pedro222284
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
Valentin De Bortoli
 
Application of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine LearningApplication of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine Learning
VARUN KUMAR
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
Edgar Marca
 
Basic terminology description in convex optimization
Basic terminology description in convex optimizationBasic terminology description in convex optimization
Basic terminology description in convex optimization
VARUN KUMAR
 
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
A G
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Alexander Litvinenko
 
A new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributionsA new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributions
Frank Nielsen
 
Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
Christian Robert
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
VARUN KUMAR
 
Hyers ulam rassias stability of exponential primitive mapping
Hyers  ulam rassias stability of exponential primitive mappingHyers  ulam rassias stability of exponential primitive mapping
Hyers ulam rassias stability of exponential primitive mapping
Alexander Decker
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
The Statistical and Applied Mathematical Sciences Institute
 
Normal density and discreminant analysis
Normal density and discreminant analysisNormal density and discreminant analysis
Normal density and discreminant analysis
VARUN KUMAR
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed point
Alexander Decker
 

Similar to Gaussian process in machine learning (20)

Concentration inequality in Machine Learning
Concentration inequality in Machine LearningConcentration inequality in Machine Learning
Concentration inequality in Machine Learning
 
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces -...
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Discussion about random variable ad its characterization
Discussion about random variable ad its characterizationDiscussion about random variable ad its characterization
Discussion about random variable ad its characterization
 
The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...
 
Newton's Divide and Difference Interpolation
Newton's Divide and Difference InterpolationNewton's Divide and Difference Interpolation
Newton's Divide and Difference Interpolation
 
The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Application of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine LearningApplication of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine Learning
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
Basic terminology description in convex optimization
Basic terminology description in convex optimizationBasic terminology description in convex optimization
Basic terminology description in convex optimization
 
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
A Fast Algorithm for Solving Scalar Wave Scattering Problem by Billions of Pa...
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
A new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributionsA new implementation of k-MLE for mixture modelling of Wishart distributions
A new implementation of k-MLE for mixture modelling of Wishart distributions
 
Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Hyers ulam rassias stability of exponential primitive mapping
Hyers  ulam rassias stability of exponential primitive mappingHyers  ulam rassias stability of exponential primitive mapping
Hyers ulam rassias stability of exponential primitive mapping
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
 
Normal density and discreminant analysis
Normal density and discreminant analysisNormal density and discreminant analysis
Normal density and discreminant analysis
 
(α ψ)- Construction with q- function for coupled fixed point
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed point
 

More from VARUN KUMAR

Distributed rc Model
Distributed rc ModelDistributed rc Model
Distributed rc Model
VARUN KUMAR
 
Electrical Wire Model
Electrical Wire ModelElectrical Wire Model
Electrical Wire Model
VARUN KUMAR
 
Interconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI DesignInterconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI Design
VARUN KUMAR
 
Introduction to Digital VLSI Design
Introduction to Digital VLSI DesignIntroduction to Digital VLSI Design
Introduction to Digital VLSI Design
VARUN KUMAR
 
Challenges of Massive MIMO System
Challenges of Massive MIMO SystemChallenges of Massive MIMO System
Challenges of Massive MIMO System
VARUN KUMAR
 
E-democracy or Digital Democracy
E-democracy or Digital DemocracyE-democracy or Digital Democracy
E-democracy or Digital Democracy
VARUN KUMAR
 
Ethics of Parasitic Computing
Ethics of Parasitic ComputingEthics of Parasitic Computing
Ethics of Parasitic Computing
VARUN KUMAR
 
Action Lines of Geneva Plan of Action
Action Lines of Geneva Plan of ActionAction Lines of Geneva Plan of Action
Action Lines of Geneva Plan of Action
VARUN KUMAR
 
Geneva Plan of Action
Geneva Plan of ActionGeneva Plan of Action
Geneva Plan of Action
VARUN KUMAR
 
Fair Use in the Electronic Age
Fair Use in the Electronic AgeFair Use in the Electronic Age
Fair Use in the Electronic Age
VARUN KUMAR
 
Software as a Property
Software as a PropertySoftware as a Property
Software as a Property
VARUN KUMAR
 
Orthogonal Polynomial
Orthogonal PolynomialOrthogonal Polynomial
Orthogonal Polynomial
VARUN KUMAR
 
Patent Protection
Patent ProtectionPatent Protection
Patent Protection
VARUN KUMAR
 
Copyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy LawCopyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy Law
VARUN KUMAR
 
Property Right and Software
Property Right and SoftwareProperty Right and Software
Property Right and Software
VARUN KUMAR
 
Investigating Data Trials
Investigating Data TrialsInvestigating Data Trials
Investigating Data Trials
VARUN KUMAR
 
Gaussian Numerical Integration
Gaussian Numerical IntegrationGaussian Numerical Integration
Gaussian Numerical Integration
VARUN KUMAR
 
Censorship and Controversy
Censorship and ControversyCensorship and Controversy
Censorship and Controversy
VARUN KUMAR
 
Romberg's Integration
Romberg's IntegrationRomberg's Integration
Romberg's Integration
VARUN KUMAR
 
Introduction to Censorship
Introduction to Censorship Introduction to Censorship
Introduction to Censorship
VARUN KUMAR
 

More from VARUN KUMAR (20)

Distributed rc Model
Distributed rc ModelDistributed rc Model
Distributed rc Model
 
Electrical Wire Model
Electrical Wire ModelElectrical Wire Model
Electrical Wire Model
 
Interconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI DesignInterconnect Parameter in Digital VLSI Design
Interconnect Parameter in Digital VLSI Design
 
Introduction to Digital VLSI Design
Introduction to Digital VLSI DesignIntroduction to Digital VLSI Design
Introduction to Digital VLSI Design
 
Challenges of Massive MIMO System
Challenges of Massive MIMO SystemChallenges of Massive MIMO System
Challenges of Massive MIMO System
 
E-democracy or Digital Democracy
E-democracy or Digital DemocracyE-democracy or Digital Democracy
E-democracy or Digital Democracy
 
Ethics of Parasitic Computing
Ethics of Parasitic ComputingEthics of Parasitic Computing
Ethics of Parasitic Computing
 
Action Lines of Geneva Plan of Action
Action Lines of Geneva Plan of ActionAction Lines of Geneva Plan of Action
Action Lines of Geneva Plan of Action
 
Geneva Plan of Action
Geneva Plan of ActionGeneva Plan of Action
Geneva Plan of Action
 
Fair Use in the Electronic Age
Fair Use in the Electronic AgeFair Use in the Electronic Age
Fair Use in the Electronic Age
 
Software as a Property
Software as a PropertySoftware as a Property
Software as a Property
 
Orthogonal Polynomial
Orthogonal PolynomialOrthogonal Polynomial
Orthogonal Polynomial
 
Patent Protection
Patent ProtectionPatent Protection
Patent Protection
 
Copyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy LawCopyright Vs Patent and Trade Secrecy Law
Copyright Vs Patent and Trade Secrecy Law
 
Property Right and Software
Property Right and SoftwareProperty Right and Software
Property Right and Software
 
Investigating Data Trials
Investigating Data TrialsInvestigating Data Trials
Investigating Data Trials
 
Gaussian Numerical Integration
Gaussian Numerical IntegrationGaussian Numerical Integration
Gaussian Numerical Integration
 
Censorship and Controversy
Censorship and ControversyCensorship and Controversy
Censorship and Controversy
 
Romberg's Integration
Romberg's IntegrationRomberg's Integration
Romberg's Integration
 
Introduction to Censorship
Introduction to Censorship Introduction to Censorship
Introduction to Censorship
 

Recently uploaded

Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
ethical hacking in wireless-hacking1.ppt
Gaussian process in machine learning

  • 1. Gaussian Process in Machine Learning. Subject: Machine Learning. Dr. Varun Kumar (IIIT Surat), Lecture 15.
  • 2. Outline: (1) Introduction to Gaussian Distributed Random Variable; (2) Central Limit Theorem; (3) MLE vs MAP; (4) Gaussian Process for Linear Regression; (5) References.
  • 3. Introduction to Gaussian Distributed Random Variable (rv): Gaussian distribution
       1. The general expression for the PDF of a univariate Gaussian distributed random variable is
          $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$,
          where $\sigma$ is the standard deviation, $\mu$ is the mean, and $\sigma^2$ is the variance.
       2. The general expression for the PDF of a multivariate Gaussian distributed random variable is
          $P(X; \mu_x, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det \Sigma}} e^{-\frac{1}{2}(X-\mu_x)^T \Sigma^{-1} (X-\mu_x)}$,
          where $X = [x_1, x_2, \ldots, x_d]^T$ is the d-dimensional input random vector, $\mu_x = [\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_d}]^T$ is the d-dimensional mean vector, and $\Sigma$ is the covariance matrix of size $d \times d$.
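The two density formulas on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `gaussian_pdf` is made up for illustration and is not part of the lecture.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Density of N(mu, cov) evaluated at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), via a linear solve
    # rather than an explicit matrix inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    norm_const = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm_const)

# d = 1 reduces to the univariate formula with sigma^2 = cov[0, 0].
p_uni = gaussian_pdf([0.0], [0.0], [[1.0]])
# d = 2 with identity covariance, evaluated at the mean.
p_bi = gaussian_pdf([1.0, -1.0], [1.0, -1.0], np.eye(2))
```

At the mean, the exponent vanishes, so `p_uni` equals 1/sqrt(2π) and `p_bi` equals 1/(2π).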
  • 4. Properties of Gaussian distributed random variable
       1. Addition: the sum of two independent Gaussian distributed rvs is also Gaussian. Let $X_1 \sim N(\mu_{X_1}, \Sigma_{X_1X_1})$ and $X_2 \sim N(\mu_{X_2}, \Sigma_{X_2X_2})$ be two independent Gaussian distributed rvs. Then
          $Z = X_1 + X_2 \sim N(\mu_{X_1} + \mu_{X_2}, \Sigma_{X_1X_1} + \Sigma_{X_2X_2})$.
       2. Normalization: the density integrates to one,
          $\int_y p(y; \mu, \Sigma)\,dy = 1$.
       3. Marginalization: the marginal of a jointly Gaussian vector is also Gaussian,
          $p(X_1) = \int_{X_2} p(X_1, X_2; \mu, \Sigma)\,dX_2$, with the integral taken over the whole range of $X_2$.
       4. Conditioning: the conditional distribution of $X_1$ given $X_2$ is also Gaussian,
          $p(X_1 | X_2) = \frac{p(X_1, X_2; \mu, \Sigma)}{\int_{X_1} p(X_1, X_2; \mu, \Sigma)\,dX_1}$.
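The conditioning property has a well-known closed form for a jointly Gaussian vector: the conditional mean is $\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and the conditional covariance is $\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$. The slide does not write these out, so the sketch below is an illustration under that standard result; the function name `condition_gaussian` and the example numbers are assumptions.

```python
import numpy as np

def condition_gaussian(mu, cov, idx_a, idx_b, x_b):
    """Mean and covariance of X_a | X_b = x_b for a joint N(mu, cov)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    cov_aa = cov[np.ix_(idx_a, idx_a)]
    cov_ab = cov[np.ix_(idx_a, idx_b)]
    cov_bb = cov[np.ix_(idx_b, idx_b)]
    # gain = cov_ab @ inv(cov_bb), computed with a linear solve.
    gain = np.linalg.solve(cov_bb, cov_ab.T).T
    cond_mean = mu_a + gain @ (x_b - mu_b)
    cond_cov = cov_aa - gain @ cov_ab.T
    return cond_mean, cond_cov

# Two unit-variance variables with correlation 0.8; observe X2 = 1.
mu = [0.0, 0.0]
cov = [[1.0, 0.8], [0.8, 1.0]]
cond_mean, cond_cov = condition_gaussian(mu, cov, [0], [1], [1.0])
```

Here the observation pulls the mean of X1 to 0.8 and shrinks its variance from 1 to 0.36. Marginalization needs no computation at all: the marginal of X1 simply keeps the matching entries of `mu` and `cov`.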
  • 5. Central limit theorem
       ⇒ Let $\{X_1, \ldots, X_n\}$ be a random sample of size n.
       ⇒ All random samples are independent and identically distributed (i.i.d.), each with mean $\mu$ and finite variance $\sigma^2$.
       ⇒ The sample average $\bar{X}_n = \frac{X_1 + X_2 + \ldots + X_n}{n}$ tends to a Gaussian distribution as $n \to \infty$.
       ⇒ By the law of large numbers, the sample average converges almost surely to the expected value $\mu$.
       ⇒ Let Z be the standardized sample average, $Z = \lim_{n \to \infty} \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$.
       ⇒ Resultant PDF: $f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$, i.e. $Z \sim N(0, 1)$; equivalently, for large n, $\bar{X}_n$ is approximately $N(\mu, \sigma^2/n)$.
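A small simulation can make the theorem concrete. This is a sketch using only the Python standard library; the choice of Uniform(0, 1) samples (mean 1/2, variance 1/12), the sample size, and the function name `standardized_means` are assumptions made for illustration.

```python
import random
import statistics

def standardized_means(n, trials, seed=0):
    """Draw `trials` sample means of n Uniform(0, 1) values and standardize them."""
    rng = random.Random(seed)
    mu = 0.5                      # mean of Uniform(0, 1)
    sigma = (1.0 / 12.0) ** 0.5   # standard deviation of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        zs.append((n ** 0.5) * (xbar - mu) / sigma)
    return zs

zs = standardized_means(n=50, trials=5000)
z_mean = statistics.fmean(zs)
z_std = statistics.pstdev(zs)
# For a standard normal, about 68% of the mass lies within one unit of zero.
frac_within_1 = sum(abs(z) <= 1.0 for z in zs) / len(zs)
```

Although the individual samples are uniform, the standardized means should have a mean near 0, a standard deviation near 1, and roughly 68% of their values inside [-1, 1].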
  • 6. Continued
  • 7. MLE vs MAP: Maximum likelihood estimator (MLE)
       Let $y = ax + n$, where $n \sim N(0, \sigma^2)$. Then
          $\hat{x}_{MLE}(y) = \arg\max_x f_Y(y|x)$, where $f_Y(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y-ax)^2}{2\sigma^2}}$.
       For a measurement $y = \bar{y}$, the likelihood is maximized when $\bar{y} = a\hat{x}_{MLE}$, i.e. $\hat{x}_{MLE} = \bar{y}/a$.
       Note: the MLE requires no assumption about the distribution of x.
  • 8. Maximum a posteriori probability (MAP)
       1. Maximum a priori: $x_{apriori} = \arg\max_x f_X(x)$.
       2. Maximum a posteriori probability (MAP): $\hat{x}_{MAP} = \arg\max_x f_X(x|y)$, where
          $f_X(x|y) = \frac{f_Y(y|x) f_X(x)}{f_Y(y)} = \frac{f_Y(y|x) f_X(x)}{\int_X f_Y(y|x) f_X(x)\,dx}$.
       ⇒ If x is uniformly distributed a priori, then $\hat{x}_{MLE} = \hat{x}_{MAP}$.
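The relation between the two estimators can be illustrated on the scalar model y = ax + n from the MLE slide. The sketch below assumes a zero-mean Gaussian prior x ~ N(0, prior_var), which is not stated in the slides; under that assumption, setting the derivative of the log posterior to zero gives x_MAP = a·prior_var·y / (a²·prior_var + noise_var). The function names are made up for the example.

```python
def x_mle(y, a):
    """MLE for y = a*x + n with Gaussian noise: maximizes f_Y(y|x)."""
    return y / a

def x_map(y, a, noise_var, prior_var):
    """MAP estimate under the ASSUMED prior x ~ N(0, prior_var)."""
    return a * prior_var * y / (a * a * prior_var + noise_var)

y_obs, a, noise_var = 2.0, 0.5, 1.0
mle = x_mle(y_obs, a)
# An informative prior pulls the estimate toward the prior mean 0.
map_tight = x_map(y_obs, a, noise_var, prior_var=1.0)
# A very wide prior is nearly flat, so MAP approaches the MLE.
map_flat = x_map(y_obs, a, noise_var, prior_var=1e12)
```

With these numbers the MLE is 4.0, the MAP estimate under a unit-variance prior is 0.8, and the nearly flat prior reproduces the MLE, matching the slide's remark about a uniform prior.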
  • 9. Linear regression
       Suppose we have data $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.
       ⇒ MLE: $p(D|w) = \prod_{i=1}^{n} p(y_i|x_i; w)$, where $y_i | x_i; w \sim N(w^T x_i, \sigma^2)$ for all i.
       ⇒ MAP: $p(w|D) = \frac{p(D|w)p(w)}{p(D)} = \frac{p(D|w)p(w)}{\int_w p(D|w)p(w)\,dw} \propto p(D|w)p(w)$.
       ⇒ Predictive distribution: $p(y|x; D) = \int_w p(y|x; w)\,p(w|D)\,dw$.
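For the Gaussian likelihood above, maximizing p(D|w) is the same as ordinary least squares. If we further assume a Gaussian prior w ~ N(0, prior_var·I), which the slide leaves unspecified, the MAP estimate becomes ridge regression with penalty noise_var / prior_var. The sketch below uses NumPy with synthetic data; all names and numbers are illustrative assumptions, and rows of `X` are the samples x_i.

```python
import numpy as np

def fit_mle(X, y):
    """Least-squares weights, i.e. the MLE under Gaussian noise."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fit_map(X, y, noise_var, prior_var):
    """MAP weights under the ASSUMED prior w ~ N(0, prior_var * I)."""
    lam = noise_var / prior_var
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y = w_true^T x + Gaussian noise with std 0.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_mle = fit_mle(X, y)
w_map = fit_map(X, y, noise_var=0.01, prior_var=1.0)
```

The prior shrinks the MAP weights slightly toward zero; as `prior_var` grows, `fit_map` approaches `fit_mle`.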
  • 10. Continued
       In general, the posterior predictive distribution is
          $P(Y|D, X) = \int_w P(Y, w|D, X)\,dw = \int_w P(Y|w, D, X)\,P(w|D)\,dw$.
       The above is often intractable in closed form. For a Gaussian likelihood and a Gaussian prior, however, the result is itself Gaussian, and its mean and covariance can be written as
          $P(y_*|D, x_*) \sim N(\mu_{y_*|D}, \Sigma_{y_*|D})$,
       where $\mu_{y_*|D} = K_*^T (K + \sigma^2 I)^{-1} y$ and $\Sigma_{y_*|D} = K_{**} - K_*^T (K + \sigma^2 I)^{-1} K_*$.
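The predictive mean $K_*^T (K + \sigma^2 I)^{-1} y$ and covariance $K_{**} - K_*^T (K + \sigma^2 I)^{-1} K_*$ translate almost line for line into NumPy. The sketch below is an illustration only: the function name `gp_predict`, the linear kernel k(x, x') = xᵀx' and the toy data are assumptions chosen so the result can be verified by hand.

```python
import numpy as np

def gp_predict(K, K_star, K_star_star, y, noise_var):
    """Predictive mean and covariance for noisy observations y.

    K is (n, n), K_star is (n, m), K_star_star is (m, m).
    """
    n = K.shape[0]
    A = K + noise_var * np.eye(n)
    alpha = np.linalg.solve(A, y)                 # (K + s^2 I)^{-1} y
    mean = K_star.T @ alpha
    cov = K_star_star - K_star.T @ np.linalg.solve(A, K_star)
    return mean, cov

# Toy data on the line y = 2x, with an ASSUMED linear kernel.
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])
X_test = np.array([[4.0]])

K = X_train @ X_train.T
K_star = X_train @ X_test.T
K_star_star = X_test @ X_test.T

mean, cov = gp_predict(K, K_star, K_star_star, y_train, noise_var=0.1)
```

For this rank-one kernel the Sherman–Morrison identity gives the answer in closed form: the mean is 112/14.1 (just under the noise-free value 8) and the variance is 1.6/14.1.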
  • 11. Gaussian process
       ⇒ Problem: f is an infinite-dimensional function, but the multivariate Gaussian distribution is defined for finite-dimensional random vectors.
       ⇒ Definition: a GP is a collection of random variables (RVs) such that the joint distribution of every finite subset of RVs is multivariate Gaussian: $f \sim GP(\mu, k)$, where $\mu(x)$ and $k(x, x')$ are the mean and covariance functions.
       ⇒ We need to model the predictive distribution $P(f_*|x_*, D)$.
       ⇒ We can use a Bayesian approach with a GP prior, $P(f|x) \sim N(\mu, \Sigma)$, and condition it on the training data D to model the joint distribution of $f = f(X)$ (vector of training observations) and $f_* = f(x_*)$ (prediction at the test input).
  • 12. Gaussian Process Regression (GPR)
       We observe training labels that are drawn from the zero-mean Gaussian prior: $y = [y_1, y_2, \ldots, y_n, y_t]^T \sim N(0, \Sigma)$.
       ⇒ All training and test labels are drawn from an (n+m)-dimensional Gaussian distribution.
       ⇒ n is the number of training points.
       ⇒ m is the number of testing points.
       We consider the following properties of $\Sigma$:
       1. $\Sigma_{ij} = E((Y_i - \mu_i)(Y_j - \mu_j))$.
       2. $\Sigma$ is always positive semi-definite.
       3. $\Sigma_{ii} = Var(Y_i)$, thus $\Sigma_{ii} \geq 0$.
       4. If $Y_i$ and $Y_j$ are independent, i.e. $x_i$ is very different from $x_j$, then $\Sigma_{ij} = \Sigma_{ji} = 0$. If $x_i$ is similar to $x_j$, then $\Sigma_{ij} = \Sigma_{ji} > 0$.
  • 13. Continued
       We can observe that this is very similar to the kernel matrix in SVMs. Therefore, we can simply let $\Sigma_{ij} = K(x_i, x_j)$. For example:
       (a) If we use the RBF kernel, then $\Sigma_{ij} = \tau e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}$.
       (b) If we use the polynomial kernel, then $\Sigma_{ij} = \tau (1 + x_i^T x_j)^d$.
       We can decompose $\Sigma$ as
          $\Sigma = \begin{pmatrix} K & K_* \\ K_*^T & K_{**} \end{pmatrix}$,
       where K is the training kernel matrix, $K_*$ is the training-testing kernel matrix, $K_*^T$ is the testing-training kernel matrix, and $K_{**}$ is the testing kernel matrix.
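The two kernels and the block decomposition of Σ can be sketched in NumPy as follows. The function names, default hyperparameters and the toy inputs are assumptions for illustration; rows of each input array are data points.

```python
import numpy as np

def rbf_kernel(A, B, tau=1.0, sigma=1.0):
    """tau * exp(-||a - b||^2 / (2 sigma^2)) for all row pairs of A and B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return tau * np.exp(-sq / (2.0 * sigma**2))

def poly_kernel(A, B, tau=1.0, degree=2):
    """tau * (1 + a^T b)^degree for all row pairs of A and B."""
    return tau * (1.0 + A @ B.T) ** degree

def joint_covariance(kernel, X_train, X_test):
    """Assemble Sigma = [[K, K_*], [K_*^T, K_**]]."""
    K = kernel(X_train, X_train)
    K_star = kernel(X_train, X_test)
    K_star_star = kernel(X_test, X_test)
    return np.block([[K, K_star], [K_star.T, K_star_star]])

X_train = np.array([[0.0], [1.0], [5.0]])
X_test = np.array([[0.5]])
Sigma = joint_covariance(rbf_kernel, X_train, X_test)
Sigma_poly = joint_covariance(poly_kernel, X_train, X_test)
```

With the RBF kernel, the entry for the nearby pair (0, 1) is much larger than the entry for the distant pair (0, 5), which matches the property that similar inputs have positively correlated labels while very different inputs are nearly uncorrelated.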
  • 14. Continued
       The conditional distribution of the (noise-free) values of the latent function f can be written as
          $f_* | (Y_1 = y_1, \ldots, Y_n = y_n, x_1, \ldots, x_n, x_t) \sim N(K_*^T K^{-1} y,\; K_{**} - K_*^T K^{-1} K_*)$.
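The noise-free conditional can be tried end to end on a toy problem. The sketch below assumes an RBF kernel with unit hyperparameters, targets taken from sin(x), and a tiny diagonal jitter for numerical stability; none of these choices come from the slides. It factorizes K with a Cholesky decomposition instead of inverting it, which is a common implementation choice.

```python
import numpy as np

def rbf(A, B, tau=1.0, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return tau * np.exp(-d2 / (2.0 * sigma**2))

def gp_noise_free(X_train, y_train, X_test, jitter=1e-10):
    """Mean K_*^T K^{-1} y and covariance K_** - K_*^T K^{-1} K_*."""
    K = rbf(X_train, X_train) + jitter * np.eye(len(X_train))
    K_star = rbf(X_train, X_test)
    K_star_star = rbf(X_test, X_test)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    V = np.linalg.solve(L, K_star)                             # L^{-1} K_*
    mean = K_star.T @ alpha
    cov = K_star_star - V.T @ V
    return mean, cov

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.sin(X_train[:, 0])
# One test input on a training point, one between points, one far away.
X_test = np.array([[1.0], [1.5], [6.0]])

mean, cov = gp_noise_free(X_train, y_train, X_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

At the training input x = 1 the predictive mean reproduces the observed label and the standard deviation collapses to nearly zero; far from the data (x = 6) the standard deviation returns to the prior value of about 1.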
  • 15. Conclusion
       Gaussian Process Regression has the following properties:
       1. GPs are an elegant and powerful ML method.
       2. We get a measure of uncertainty for the predictions for free.
       3. GPs work very well for regression problems with small training data set sizes.
       4. Running time is $O(n^3)$ because of the matrix inversion, which gets slow when $n \gg 0$ ⇒ use sparse GPs for large n.
       5. GPs are a little bit more involved for classification (non-Gaussian likelihood).
       6. We can model non-Gaussian likelihoods in regression and do approximate inference, e.g. for count data (Poisson distribution).
  • 16. References
       T. M. Mitchell, The discipline of machine learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
       E. Alpaydin, Introduction to machine learning. MIT Press, 2020.
       K. Weinberger, https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote15.html, May 2018.