Gaussian process regression is a non-parametric Bayesian approach to supervised learning. It models an unknown function by placing a prior directly over functions, so that the posterior incorporates the constraints imposed by the training data. The kernel trick makes it possible to work with an infinite-dimensional feature space without ever defining the features explicitly: by selecting a valid covariance function we implicitly choose the features and encode prior knowledge about the solution.
Conceptual Introduction to Gaussian Processes
1. Gaussian Process Regression
An intuitive introduction
Juan Pablo Carbajal
Urban Water Management (Siedlungswasserwirtschaft)
Eawag - aquatic research
Dübendorf, Switzerland
juanpablo.carbajal@eawag.ch
November 24, 2017
2. The learning problem in a nutshell
[Figure: scatter plot of the data points (t_i, y_i), with t roughly in [0.2, 0.8] and y in [-2, 2].]
Given the data (t_i, y_i) = (t, y), what model should we use?
3. The learning problem
The learning problem: use a set of observations to uncover an underlying process, for prediction (and maybe for understanding).
Yaser Abu-Mostafa. Learning from data. https://work.caltech.edu/telecourse.html
4. The learning problem
Input: x, a position on a map
Output: y, the height of the terrain
Target function: f : X → Y, the height map
Data: (x_1, y_1), ..., (x_n, y_n), field measurements
Hypothesis: g : X → Y, the formula to be used
5. The learning problem
[Diagram: the standard learning setup, illustrated with a text-classification example.]
Unknown target function: how legal-like is the text?
Training examples: the available text snippets
Hypothesis set: the possible text classification functions
Learning algorithm: picks a hypothesis based on the training examples
Final hypothesis: the classification function actually used
6. Data set
[Figure: the same scatter plot of the data (t_i, y_i) as before.]
Given the data (t_i, y_i) = (t, y), what model should we use?
7. Naive regression
[Figure: the data together with a cubic polynomial fit.]
Propose a cubic polynomial model,
$$w_0 + w_1 t + w_2 t^2 + w_3 t^3 = y(t),$$
or, in matrix form,
$$\begin{pmatrix} 1 & t & t^2 & t^3 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix} = y(t), \qquad \phi(t)^\top w = y(t).$$
Evaluating at the n = 3 data points gives the linear system
$$\begin{pmatrix} 1 & t_1 & t_1^2 & t_1^3 \\ 1 & t_2 & t_2^2 & t_2^3 \\ 1 & t_3 & t_3^2 & t_3^3 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}, \qquad \Phi^\top w = \begin{pmatrix} \phi(t_1)^\top \\ \phi(t_2)^\top \\ \phi(t_3)^\top \end{pmatrix} w = y.$$
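As a minimal sketch of this slide (assuming NumPy; the data values below are invented for illustration), the cubic design matrix can be built and the system solved like this:

```python
import numpy as np

# Invented data: n = 3 observations (t_i, y_i)
t = np.array([0.25, 0.50, 0.75])
y = np.array([1.0, -2.0, 1.5])

# Design matrix Phi^T: row j is phi(t_j)^T = (1, t_j, t_j^2, t_j^3), shape n x N = 3 x 4
PhiT = np.vander(t, N=4, increasing=True)

# Phi^T w = y is underdetermined (3 equations, 4 unknowns);
# lstsq returns the minimum-norm solution derived on the next slides
w, *_ = np.linalg.lstsq(PhiT, y, rcond=None)

# Evaluate the fitted polynomial on a grid
t_grid = np.linspace(0.2, 0.8, 50)
y_grid = np.vander(t_grid, N=4, increasing=True) @ w
```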
8. Pseudo-inverse
$\Phi^\top$ is an n × N (3 × 4) matrix with N ≥ n, so rank $\Phi^\top \le n$.
With a feature vector $\phi$ complex enough we have that rank $\Phi^\top = n$, i.e. the n row vectors of the matrix are linearly independent and $(\Phi^\top \Phi)^{-1}$ exists.
$\Phi^\top \Phi$ is called the Gramian matrix: the matrix of all scalar products.
$$\Phi^\top w = y \;\rightarrow\; \overbrace{\Phi^\top \Phi (\Phi^\top \Phi)^{-1}}^{I}\, \Phi^\top w = \Phi^\top \underbrace{\Phi (\Phi^\top \Phi)^{-1} \Phi^\top w}_{w^*} = y$$
$$\Phi (\Phi^\top \Phi)^{-1} \Phi^\top w = \underbrace{\Phi (\Phi^\top \Phi)^{-1}}_{\text{Moore-Penrose pseudoinverse}}\, y = w^*$$
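A short sketch of this derivation (same invented data as the previous snippet; NumPy assumed):

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75]); y = np.array([1.0, -2.0, 1.5])  # invented data
PhiT = np.vander(t, N=4, increasing=True)                          # n x N
Phi = PhiT.T                                                       # N x n, columns phi(t_j)

G = PhiT @ Phi                            # Gramian: all scalar products phi(t_i) . phi(t_j)
w_star = Phi @ np.linalg.solve(G, y)      # w* = Phi (Phi^T Phi)^{-1} y

# Agrees with NumPy's Moore-Penrose pseudoinverse, and interpolates the data
assert np.allclose(w_star, np.linalg.pinv(PhiT) @ y)
assert np.allclose(PhiT @ w_star, y)
```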
9. A change of perspective
Instead of looking at the rows of the matrix, look at the columns. These are N linearly independent functions $\psi_i(t) = t^i$, $i = 0, \ldots, N-1$, evaluated at the data points. The model looks like
$$y(t) = \sum_{i=0}^{N-1} \psi_i(t)\, w_i$$
and the regression problem now looks like
$$\Psi(t)\, w = \begin{pmatrix} \psi_0(t) & \psi_1(t) & \psi_2(t) & \psi_3(t) \end{pmatrix} w = y(t), \qquad \Psi w = y,$$
where $\Psi$ stacks the row $\Psi(t_i)$ for each data point. Note that $\Psi = \Phi^\top$.
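A tiny sketch of the column view (same invented data as before): each column of $\Phi^\top$ is one basis function evaluated at the data points.

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75])
PhiT = np.vander(t, N=4, increasing=True)   # n x N design matrix from before

# Column i of Phi^T is psi_i evaluated at the data points: psi_i(t) = t^i
for i in range(4):
    assert np.allclose(PhiT[:, i], t ** i)
```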
10. A change of perspective
$\Psi$ is an n × N (3 × 4) matrix with N ≥ n, so rank $\Psi \le n$. If rank $\Psi = n$ (the n row vectors are linearly independent), then $(\Psi \Psi^\top)^{-1}$ exists.
$K = \Psi \Psi^\top$ is called the covariance matrix:
$$K_{ij} = \sum_{k=0}^{N-1} \psi_k(t_i)\, \psi_k(t_j).$$
$$\Psi w = y \;\rightarrow\; \overbrace{\Psi \Psi^\top (\Psi \Psi^\top)^{-1}}^{I}\, \Psi w = \Psi \underbrace{\Psi^\top (\Psi \Psi^\top)^{-1} \Psi w}_{w^*} = y$$
$$\Psi^\top (\Psi \Psi^\top)^{-1} \Psi w = \underbrace{\Psi^\top (\Psi \Psi^\top)^{-1}}_{\text{Moore-Penrose pseudoinverse}}\, y = w^*$$
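The same solution, now computed through the covariance matrix K = ΨΨᵀ (a sketch with the invented data from before):

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75]); y = np.array([1.0, -2.0, 1.5])  # invented data
Psi = np.vander(t, N=4, increasing=True)   # Psi = Phi^T, rows (psi_0(t_i), ..., psi_3(t_i))

K = Psi @ Psi.T                            # K_ij = sum_k psi_k(t_i) psi_k(t_j)
w_star = Psi.T @ np.linalg.solve(K, y)     # w* = Psi^T (Psi Psi^T)^{-1} y

assert np.allclose(Psi @ w_star, y)        # w* interpolates the data
```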
11. Recapitulation: the problem
Given n examples {(t_i, y_i)}, propose a model using N ≥ n linearly independent functions (a.k.a. features),
$$f(t) = \sum_{i=0}^{N-1} \psi_i(t)\, w_i,$$
and find some good {w_i}.
Hansen, Per Christian. Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion. Vol. 4. SIAM, 1998.
Wendland, Holger. Scattered data approximation. Vol. 17. Cambridge University Press, 2004.
12. Recapitulation: the solution
Data view. Think in terms of n feature vectors $\phi_j$ in a (high-dimensional) space $\mathbb{R}^N$,
$$\phi_j = \begin{pmatrix} \psi_0(t_j) & \ldots & \psi_{N-1}(t_j) \end{pmatrix}^\top, \quad j = 1, \ldots, n.$$
The solution reads
$$f(t) = \Phi(t)^\top w^* = \overbrace{\Phi(t)^\top \Phi}^{\text{scalar products}}\, \bigl(\underbrace{\Phi^\top \Phi}_{\text{scalar products}}\bigr)^{-1}\, y.$$
Function view. Think in terms of an N-dimensional function space H spanned by the $\psi_i(t)$. The solution reads
$$f(t) = \Psi(t)\, w^* = \overbrace{\Psi(t) \Psi^\top}^{\text{covariance}}\, \bigl(\underbrace{\Psi \Psi^\top}_{\text{covariance}}\bigr)^{-1}\, y = k(t, \mathbf{t})\, k(\mathbf{t}, \mathbf{t})^{-1}\, y.$$
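In the function view the weights disappear entirely: prediction needs only kernel evaluations. A sketch with the cubic polynomial kernel k(t, t') = Σ_p t^p t'^p, the scalar product of the feature vectors (data invented as before):

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75]); y = np.array([1.0, -2.0, 1.5])  # invented data

def k(a, b):
    """Cubic polynomial kernel: k(t, t') = sum_{p=0}^{3} t^p t'^p."""
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    return sum((a[:, None] ** p) * (b[None, :] ** p) for p in range(4))

# f(t) = k(t, t_data) k(t_data, t_data)^{-1} y  -- no weights w anywhere
t_grid = np.linspace(0.2, 0.8, 50)
f_grid = k(t_grid, t) @ np.linalg.solve(k(t, t), y)
```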
13. The Kernel trick
To calculate the solution we only need scalar products or covariances: we never use the actual $\{\phi_j\}$ or $\{\psi_i\}$,
$$\mathrm{cov}_\Psi(t, t') = k(t, t') = \Phi(t) \cdot \Phi(t').$$
Infinite features. Now we can use N = ∞, i.e. infinitely many features or basis functions!
By selecting valid covariance functions we implicitly select the features of our model.
How to choose the covariance function? Prior knowledge about the solution.
Rasmussen, C., Williams, C. (2006). Gaussian Processes for Machine Learning. http://www.gaussianprocess.org/gpml/
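For example, swapping the finite polynomial kernel for a squared-exponential covariance, the classic choice in Rasmussen & Williams, implicitly uses infinitely many basis functions. A sketch, with an arbitrary lengthscale and the same invented data:

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75]); y = np.array([1.0, -2.0, 1.5])  # invented data

def k_se(a, b, ell=0.2):
    """Squared-exponential covariance k(t, t') = exp(-(t - t')^2 / (2 ell^2))."""
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Same prediction formula as before, only the covariance function changed
t_grid = np.linspace(0.2, 0.8, 50)
f_grid = k_se(t_grid, t) @ np.linalg.solve(k_se(t, t), y)
```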
15. Digression: back to the solution
Let's call the pseudoinverse $\Psi^+$. The proposed solution is
$$\Psi^+ y = w^*, \qquad \Psi w^* = \Psi \Psi^+ y = y \;\rightarrow\; y(t) = \Psi(t)\, w^*.$$
The left-hand side of the arrow is the interpolation; the right-hand side is the intra- or extrapolation.
But for any random vector ξ we have
$$\hat{w}^* = w^* + \overbrace{(I - \Psi^+ \Psi)}^{\text{projector onto null}\,\Psi}\, \xi, \qquad \Psi \hat{w}^* = y + \bigl(\Psi - \underbrace{\Psi \Psi^+}_{I} \Psi\bigr)\, \xi = y.$$
$\hat{w}^*$ also solves the interpolation problem. There are many solutions! (Unless $\Psi^+ \Psi = I$, i.e. null $\Psi = \{0\}$, i.e. the matrix is invertible: not our case.)
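A sketch of this non-uniqueness (invented data as before): projecting random vectors onto null Ψ yields new weight vectors that still interpolate.

```python
import numpy as np

t = np.array([0.25, 0.50, 0.75]); y = np.array([1.0, -2.0, 1.5])  # invented data
Psi = np.vander(t, N=4, increasing=True)

Psi_pinv = np.linalg.pinv(Psi)                   # Psi^+
w_star = Psi_pinv @ y                            # minimum-norm solution
P_null = np.eye(4) - Psi_pinv @ Psi              # projector onto null(Psi)

rng = np.random.default_rng(0)
for _ in range(5):
    w_hat = w_star + P_null @ rng.standard_normal(4)   # another solution
    assert np.allclose(Psi @ w_hat, y)                 # still interpolates
```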
16. Digression: back to the solution
[Figure: several curves $y(t) = \Psi(t)\,(w^* + (I - \Psi^+ \Psi)\, \xi)$ for random ξ, all passing through the data points.]
17. Gaussian Process
18. Thank you!