Random Feature Selection for Online Gaussian
Process-Based Learning
Erick Lin, Dr. Byron Boots
School of Interactive Computing
Background
Gaussian processes (GP) are a powerful class of models that are increasingly being used
in applications as diverse as control systems engineering, geostatistics, and medical image
analysis. A Gaussian process is defined as a collection of random variables, any finite number
of which have a joint Gaussian (also called normal) distribution. Informally, GP can be
thought of as the infinite-dimensional version of a multivariate Gaussian distribution, where
every point x in the domain is associated with a Gaussian random variable f(x) ∼ N(µ, σ²).
This means that a Gaussian process may be specified by a mean function µ(x) and a kernel
function k(x, x′), which gives the covariance between the random variables at any two points
x and x′, and overall can be represented as

    f(x) ∼ GP(µ(x), k(x, x′)).    (1)
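For concreteness (this is an illustration, not part of the proposed method), the following Octave/MATLAB sketch draws sample functions from a zero-mean GP prior with a squared-exponential kernel; the lengthscale and the evaluation grid are arbitrary choices made for the example.

    % Illustrative sketch (not part of the proposal): draw three sample functions
    % from a zero-mean GP prior with a squared-exponential kernel
    % k(x, x') = exp(-(x - x')^2 / (2*ell^2)). The lengthscale ell and the grid
    % are arbitrary.
    x   = linspace(-5, 5, 200)';          % evaluation points
    ell = 1.0;                            % assumed lengthscale hyperparameter
    K   = exp(-(x - x').^2 / (2*ell^2));  % covariance matrix over the grid
    K   = K + 1e-8*eye(numel(x));         % small jitter for numerical stability
    L   = chol(K, 'lower');               % Cholesky factor, K = L*L'
    f   = L * randn(numel(x), 3);         % three samples of f(x) ~ GP(0, k)
    plot(x, f);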
Gaussian processes can be used to perform supervised learning, or the general task
of inferring unknown outputs for test inputs based on previously observed training data
(input-output pairs). We focus primarily on models for regression, which are supervised
learning models that produce real-valued outputs. From a statistical perspective, Gaussian
process regression (GPR) is closely related to Bayesian linear regression (BLR) in that the
distribution of hypotheses is conditioned on the training data using Bayesian inference.
Their connection is further illustrated through the derivation of GPR from BLR in the
following paragraph.
The standard linear regression model takes the form y = f(x) + ε with f(x) = xᵀw,
where w is the parameter vector, f is the hypothesis, and y is the observed output. If we
assume that w ∼ N(0, Σₚ) and ε ∼ N(0, σₙ²), then it can be shown that

    p(w | y, X) ∼ N((1/σₙ²) A⁻¹ X y, A⁻¹)    (2)

where A = σₙ⁻² X Xᵀ + Σₚ⁻¹, and as a result, the predictive distribution of f∗ at a test input
x∗ for BLR is given by

    p(f∗ | x∗, y, X) = ∫ p(f∗ | x∗, w) p(w | y, X) dw    (3)
                     = N((1/σₙ²) x∗ᵀ A⁻¹ X y, x∗ᵀ A⁻¹ x∗).    (4)
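The sketch below works through (2)–(4) numerically on synthetic data. It assumes the convention of [2] that X is a d × N matrix whose columns are the training inputs; the values of σₙ² and Σₚ are arbitrary illustrative choices.

    % Minimal sketch of the BLR posterior and predictive in (2)-(4). X is d x N
    % (one training input per column), y is N x 1, and the hyperparameters sn2
    % (noise variance) and Sigma_p (prior covariance) are assumed given.
    d = 3; N = 50;
    X = randn(d, N);                       % synthetic training inputs
    w_true = randn(d, 1);
    sn2 = 0.1;                             % noise variance sigma_n^2
    y = X' * w_true + sqrt(sn2)*randn(N, 1);
    Sigma_p = eye(d);                      % prior covariance of w

    A = X*X'/sn2 + inv(Sigma_p);           % eq. (2): posterior precision
    w_mean = (A \ (X*y)) / sn2;            % posterior mean (1/sn2) A^{-1} X y

    xs = randn(d, 1);                      % a test input x*
    f_mean = xs' * w_mean;                 % eq. (4): predictive mean
    f_var  = xs' * (A \ xs);               % eq. (4): predictive variance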
If x is now replaced with a function φ(x) that maps x into a higher-dimensional feature
space, the model f(x) = φ(x)ᵀw becomes capable of fitting nonlinear functions, and the
predictive distribution becomes

    p(f∗ | x∗, y, X) = N((1/σₙ²) φ(x∗)ᵀ A⁻¹ Φ(X) y, φ(x∗)ᵀ A⁻¹ φ(x∗))    (5)

where Φ(X) is the matrix whose columns are the feature vectors φ(x) of the training inputs,
and A = σₙ⁻² Φ(X) Φ(X)ᵀ + Σₚ⁻¹ accordingly. If the image of φ is infinite-dimensional,
then (5) expresses Gaussian process regression in a specific form that is useful for our
purposes.
Gaussian processes can be considered nonparametric – that is, the effective number of
parameters grows with the number of data points, allowing great flexibility in fitting
large-scale data. In practice, however, traditional GPR cannot be applied in such scenarios
due to its heavy computational requirements: O(N³) time complexity and O(N²) space
complexity in the number of training points N, where the respective bottlenecks are the
inversion of the covariance matrix K over all pairs of training points and the storage of
K [2].
One proposed solution is that, instead of an infinite- or high-dimensional feature mapping
φ (in fact, k(x, x′) = ⟨φ(x), φ(x′)⟩), a randomized low-dimensional feature mapping z can
be used whose inner product z(x)ᵀz(x′) approximates k(x, x′) to high accuracy, enabling the
use of the much faster and computationally less expensive BLR to approximate GPR in the
random feature space [1].
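As one concrete realization (a sketch under the assumption of a squared-exponential kernel), the random Fourier features of Rahimi and Recht [1] give such a mapping z; the feature dimension, input dimension, and lengthscale below are illustrative.

    % Sketch of the random Fourier feature map of [1] for the squared-exponential
    % kernel: z(x)' z(x') approximates k(x, x') in expectation. D, ell, and the
    % input dimension d are illustrative choices.
    d = 3;                                 % input dimension
    D = 200;                               % number of random features
    ell = 1.0;                             % kernel lengthscale
    W = randn(D, d) / ell;                 % frequencies from the kernel's spectral density
    b = 2*pi*rand(D, 1);                   % random phase offsets
    z = @(x) sqrt(2/D) * cos(W*x + b);     % feature map z: R^d -> R^D

    % check against the exact kernel at two random points
    x1 = randn(d,1); x2 = randn(d,1);
    k_exact  = exp(-sum((x1 - x2).^2) / (2*ell^2));
    k_approx = z(x1)' * z(x2);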
Objectives
Our goal is to combine the approaches outlined in the previous section with original ap-
proaches of our own to produce a novel variant of GPR able to predict and make decisions
based on data that is both potentially massive in scale and streaming in real time.
For handling the latter characteristic in particular, we may take a hint from Bayesian
linear regression, which has a recursive solution [3] that allows it to be performed online,
i.e., updated with new training data without having to retrain on the previous data. In
outline, online Bayesian linear regression begins with a prior p(w), where the prior
distribution is given by w ∼ N(0, Σₚ), and recursively applies the Bayes filter

    p(w | y₁, …, yₖ, Xₖ) = [p(yₖ | w, Xₖ) / (p(y₁ | X₁) ⋯ p(yₖ | Xₖ))] p(w | y₁, …, yₖ₋₁, Xₖ₋₁)    (6)

once for each incoming training output yₖ, where Xₖ is the matrix consisting of the first
k training input vectors. It can be shown that the posterior itself is Gaussian, which in
turn yields a Gaussian distribution for the predicted output p(f∗ | x∗, y₁, …, yₖ, Xₖ) at
any test input x∗. We propose that the same procedure can be repeated with xₖ replaced by
φ(xₖ) and Xₖ by Φ(Xₖ) to obtain the corresponding distribution for Gaussian process
regression, which then has the desired ability to perform online learning.
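A minimal sketch of this idea, reusing the feature map z and the synthetic X, y, xs, and sn2 from the earlier sketches: the posterior over w in the random feature space is maintained through rank-one updates to its precision matrix, one training pair at a time. The prior Σₚ = I is an assumed choice for illustration.

    % Sketch of the proposed online update: recursive BLR in the random feature
    % space, processing one training pair (x_k, y_k) at a time. Reuses z, X, y,
    % xs, D, and sn2 from the sketches above; Sigma_p = I is an assumed prior.
    P = eye(D);                            % posterior precision, initially Sigma_p^{-1}
    r = zeros(D, 1);                       % accumulates Phi(X_k) * y_{1:k} / sn2
    for k = 1:numel(y)
      phi_k = z(X(:, k));                  % lift x_k into the random feature space
      P = P + phi_k*phi_k' / sn2;          % rank-one precision update
      r = r + phi_k*y(k) / sn2;
    end
    w_mean = P \ r;                        % posterior mean after all updates
    phi_s  = z(xs);                        % prediction at the test input x*
    f_mean = phi_s' * w_mean;              % predictive mean, cf. (5)
    f_var  = phi_s' * (P \ phi_s);         % predictive variance, cf. (5)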
My plan is to first work out the mathematical details of our proposed learning model,
which synthesizes random feature selection with the recursive solution of BLR to create a
fast online approximation of GPR, and to verify its correctness, making modifications or
adopting further ideas as necessary.
I will then implement the GPR model and begin testing it on an inverse dynamics problem,
in the form of motion planning for a 7-DOF robotic arm, comparing its performance against
established benchmarks.
If time permits, I will also work toward automated selection of the hyperparameters (free
parameters) using maximum-likelihood and/or cross-validation techniques.
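As one possible starting point (an illustration only, not a commitment to a particular method), a candidate lengthscale could be scored by the exact GP negative log marginal likelihood on a small batch, again reusing X, y, and sn2 from the sketches above.

    % Illustrative sketch: grid search over the lengthscale ell, scored by the
    % exact GP negative log marginal likelihood on a small batch (X, y, sn2 as above).
    sqd = @(A) sum(A.^2,1)' + sum(A.^2,1) - 2*(A'*A);   % pairwise squared distances
    ells = logspace(-1, 1, 20);
    nlml = zeros(size(ells));
    for i = 1:numel(ells)
      K = exp(-sqd(X) / (2*ells(i)^2)) + sn2*eye(size(X,2));
      L = chol(K, 'lower');
      alpha = L' \ (L \ y);                              % K^{-1} y via Cholesky
      nlml(i) = 0.5*(y'*alpha) + sum(log(diag(L))) + 0.5*size(X,2)*log(2*pi);
    end
    [~, ibest] = min(nlml);
    best_ell = ells(ibest);                              % selected lengthscale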
Materials and Methods
Octave/MATLAB will be used to implement our GPR algorithm. Personal desktop computers
running Ubuntu and/or Windows will be used to run the learning algorithm on the training
and test datasets, and we will make use of a Barrett Technology® Inc. WAM™ Arm in our lab.
References
[1] Rahimi, A., and Recht, B. Random features for large-scale kernel machines. In
Neural Information Processing Systems (2007).
[2] Rasmussen, C. E., and Williams, C. K. Gaussian Processes for Machine Learning.
The MIT Press, Cambridge, Massachusetts, 2006.
[3] Särkkä, S. Lecture 2: From linear regression to Kalman filter and beyond, January
2012.