Gaussian Processes: Applications in Machine Learning
1. Gaussian Processes: Applications in Machine Learning
Abhishek Agarwal
(05329022)
Under the Guidance of Prof. Sunita Sarawagi
KReSIT, IIT Bombay
Seminar Presentation
March 29, 2006
2. Outline
Introduction to Gaussian Processes (GP)
Prior & Posterior Distributions
GP Models: Regression
GP Models: Binary Classification
Covariance Functions
Conclusion.
3. Introduction
Supervised Learning
Gaussian Processes
Defines a distribution over functions.
A collection of random variables, any finite number of which have a joint Gaussian distribution. [1][2]
f ∼ GP(m, k)
Hyperparameters and Covariance function.
Predictions
4. Prior Distribution
Represents our belief about the distribution over functions, which we express through its parameters.
Example: GP(m, k) with m(x) = (1/4) x^2 and k(x, x') = exp(−(1/2)(x − x')^2).
To draw a sample from the distribution:
Pick some data points.
Find the distribution parameters at each point:
µ_i = m(x_i) and Σ_ij = k(x_i, x_j), for i, j = 1, . . . , n
Draw the function values from the resulting joint Gaussian, as in the sketch below.
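A minimal sketch of this sampling recipe, assuming only numpy; the mean and covariance functions follow the slide's example m(x) = (1/4)x^2 and k(x, x') = exp(−(1/2)(x − x')^2):

```python
import numpy as np

def m(x):
    # Mean function from the example: m(x) = x^2 / 4
    return 0.25 * x ** 2

def k(xi, xj):
    # Squared-exponential covariance: k(x, x') = exp(-(x - x')^2 / 2)
    return np.exp(-0.5 * (xi - xj) ** 2)

# 1. Pick some data points.
xs = np.linspace(-5, 5, 50)

# 2. Find the distribution parameters at each point.
mu = m(xs)                               # mu_i = m(x_i)
Sigma = k(xs[:, None], xs[None, :])      # Sigma_ij = k(x_i, x_j)

# 3. Draw function values from the joint Gaussian N(mu, Sigma);
#    a small jitter keeps Sigma numerically positive definite.
f = np.random.multivariate_normal(mu, Sigma + 1e-8 * np.eye(len(xs)))
```

Repeating the last draw several times produces the kind of prior function samples shown in the next figure.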
5. Prior Distribution (contd.)
Figure: Prior distribution over functions using a Gaussian Process (function values plotted against data points).
6. Posterior Distribution
The distribution changes in the presence of the training data D = (X, y).
Functions that are consistent with D are given higher probability.
Figure: Posterior distribution over functions using a Gaussian Process (function values plotted against data points).
7. Posterior Distribution (contd.)
Prediction for unlabeled data x∗
The GP gives the distribution of function values at x∗.
Let f be the function values at the training points in D and f∗ the value at x∗.
f and f∗ have a joint Gaussian distribution, represented as:
(f, f∗) ∼ N( (µ, µ∗), [Σ, Σ∗; Σ∗^T, Σ∗∗] )
The conditional distribution of f∗ given f can be expressed as:
f∗ | f ∼ N( µ∗ + Σ∗^T Σ^{-1} (f − µ), Σ∗∗ − Σ∗^T Σ^{-1} Σ∗ )   (1)
8. Posterior Distribution (contd.)
The posterior in Eq. (1) is again a Gaussian process:
f∗ | D ∼ GP(m_D, k_D), where
m_D(x) = m(x) + Σ(X, x)^T Σ^{-1} (f − m)
k_D(x, x') = k(x, x') − Σ(X, x)^T Σ^{-1} Σ(X, x')
Figure: Prediction from the GP posterior (function values plotted against data points).
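A small illustrative sketch of the posterior prediction above, assuming numpy, a zero prior mean, a squared-exponential covariance, and noisy observations (so Σ becomes K + σ_n^2 I); this is not the deck's own code, just one way to compute m_D and k_D at a set of test inputs:

```python
import numpy as np

def se_kernel(a, b):
    # Squared-exponential covariance on 1-D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

def gp_predict(X, y, X_star, noise_var=1e-2):
    """Posterior mean and covariance at test inputs X_star, following Eq. (1)
    with a zero prior mean and Sigma replaced by K + sigma_n^2 I."""
    K = se_kernel(X, X) + noise_var * np.eye(len(X))   # Sigma (plus observation noise)
    K_s = se_kernel(X, X_star)                         # Sigma_*
    K_ss = se_kernel(X_star, X_star)                   # Sigma_**
    mean = K_s.T @ np.linalg.solve(K, y)               # Sigma_*^T Sigma^{-1} (f - mu)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)       # Sigma_** - Sigma_*^T Sigma^{-1} Sigma_*
    return mean, cov

# Example usage with a handful of noisy observations of sin(x):
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y = np.sin(X)
mu_star, cov_star = gp_predict(X, y, np.linspace(-5, 5, 100))
```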
9. GP Models: Regression
A GP arises directly from the Bayesian linear regression model
f(x) = φ(x)^T w, with prior w ∼ N(0, Σ_p).
The parameters of the induced distribution are:
E[f(x)] = φ(x)^T E[w] = 0
E[f(x) f(x')] = φ(x)^T E[w w^T] φ(x') = φ(x)^T Σ_p φ(x')
So f(x) and f(x') are jointly Gaussian with zero mean and covariance φ(x)^T Σ_p φ(x').
10. GP Models: Regression (contd.)
In regression, the posterior distribution over the weights is given by Bayes' rule (see Eq. 9):
posterior = (likelihood × prior) / marginal likelihood
Both the prior p(f | X) and the likelihood p(y | f, X) are Gaussian:
prior: f | X ∼ N(0, K)   (see Eq. 5)
likelihood: y | f ∼ N(f, σ_n^2 I)
The marginal likelihood p(y | X) (see Eq. 6) is defined as:
p(y | X) = ∫ p(y | f, X) p(f | X) df   (2)
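As a concrete illustration of the Gaussian marginal likelihood (its closed form is Eq. 6 on the Extra slide), a hedged numpy sketch using a Cholesky factorisation for numerical stability; the kernel matrix K and the noise variance are assumed to be given:

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_var):
    """log p(y|X) = -1/2 y^T (K + s^2 I)^{-1} y - 1/2 log|K + s^2 I| - n/2 log 2*pi  (Eq. 6)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # = 1/2 log|K + s^2 I|
            - 0.5 * n * np.log(2.0 * np.pi))
```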
11. GP Models: Classification
Modeling a binary classifier:
Squash the output of a regression model through a response function, such as the sigmoid.
Example: the linear logistic regression model
p(C1 | x) = λ(x^T w), with λ(z) = 1 / (1 + exp(−z))
The likelihood is expressed as (see Eq. 7):
p(y_i | x_i, w) = σ(y_i f_i), where f_i = f(x_i) = x_i^T w,
and therefore it is non-Gaussian.
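A tiny sketch of this squashing step, assuming numpy and labels y_i in {−1, +1}:

```python
import numpy as np

def sigmoid(z):
    # Logistic response function: lambda(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(y_i, x_i, w):
    # p(y_i | x_i, w) = sigma(y_i * f_i) with f_i = x_i^T w  (Eq. 7)
    return sigmoid(y_i * (x_i @ w))
```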
12. GP Models: Classification (contd.)
The distribution over the latent function, after seeing the test data, is given as:
p(f∗ | X, y, x∗) = ∫ p(f∗ | X, x∗, f) p(f | X, y) df,   (3)
where p(f | X, y) = p(y | f) p(f | X) / p(y | X) is the posterior over the latent variables.
Computation of the above integral is analytically intractable:
both the likelihood and the posterior are non-Gaussian.
We therefore need an analytic approximation of the integrals.
13. GP Models: Laplace Approximations
Gaussian approximation of p(f | X, y):
Using a second-order Taylor expansion, we obtain
q(f | X, y) = N(f | f̂, A^{-1}),
where f̂ = argmax_f p(f | X, y) and A = −∇∇ log p(f | X, y) |_{f = f̂}.
To find f̂ we use Newton's method, because of the non-linearity of log p(f | X, y) (see Eq. 9).
The prediction is given as:
π∗ = p(y∗ = +1 | X, y, x∗) = ∫ σ(f∗) p(f∗ | X, y, x∗) df∗   (4)
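A hedged sketch of the Newton iteration for finding f̂ under a logistic likelihood, assuming numpy and labels in {−1, +1}; it implements the update implied above directly rather than the numerically safer formulation given in Rasmussen and Williams [1]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode(K, y, n_iter=20):
    """Newton's method for f_hat = argmax_f p(f | X, y) with a logistic likelihood."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        grad = (y + 1) / 2 - pi               # gradient of log p(y|f)
        W = np.diag(pi * (1 - pi))            # W = -Hessian of log p(y|f)
        # Newton update: f <- (K^{-1} + W)^{-1} (W f + grad)
        f = np.linalg.solve(np.linalg.inv(K) + W, W @ f + grad)
    # The Laplace approximation is then q(f | X, y) = N(f_hat, A^{-1}) with A = K^{-1} + W.
    return f
```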
14. Covariance Function
Encodes our belief about the prior distribution over functions.
Some properties:
Stationary
Isotropic
Dot-Product Covariance
Example: the Squared Exponential (SE) covariance function:
cov(f(x_p), f(x_q)) = exp(−(1/2) |x_p − x_q|^2)
Learned along with the other hyper-parameters.
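For illustration, a small sketch (numpy assumed) of the SE covariance with the two hyper-parameters usually learned alongside it, a signal variance and a length-scale; the slide's expression is the special case where both equal 1:

```python
import numpy as np

def se_cov(xp, xq, signal_var=1.0, lengthscale=1.0):
    # cov(f(x_p), f(x_q)) = signal_var * exp(-|x_p - x_q|^2 / (2 * lengthscale^2))
    sqdist = np.sum((np.asarray(xp) - np.asarray(xq)) ** 2)
    return signal_var * np.exp(-0.5 * sqdist / lengthscale ** 2)
```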
15. Summary and Future Work
Current Research:
Fast sparse approximation algorithms for matrix inversion.
Approximation algorithms for non-Gaussian likelihoods.
The GP approach has outperformed traditional methods in many applications:
Gaussian Process based Positioning System (GPPS) [6]
Multiuser Detection (MUD) in CDMA receivers [7]
GP models are more powerful and flexible than simple linear parametric models, yet less complex than models such as multi-layer perceptrons. [1]
16. References
[1] Rasmussen and Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[2] Matthias Seeger. Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14(2):69-106, 2004.
[3] Christopher Williams. Bayesian Classification with Gaussian Processes. IEEE Trans. Pattern Analysis and Machine Intelligence, 1998.
[4] Rasmussen and Williams. Gaussian Processes for Regression. In Proceedings of NIPS, 1996.
[5] Rasmussen. Evaluation of Gaussian Processes and Other Methods for Non-linear Regression. PhD thesis, Dept. of Computer Science, University of Toronto, 1996. Available from http://www.cs.utoronto.ca/~carl/
17. References (contd.)
[6] Anton Schwaighofer, et al. GPPS: A Gaussian Process Positioning System for Cellular Networks. In Proceedings of NIPS, 2003.
[7] Murillo-Fuentes, et al. Gaussian Processes for Multiuser Detection in CDMA Receivers. Advances in Neural Information Processing Systems, 2005.
[8] David MacKay. Introduction to Gaussian Processes.
[9] C. Williams. Gaussian Processes. In M. A. Arbib, editor, Handbook of Brain Theory and Neural Networks, pages 466-470. The MIT Press, second edition, 2002.
18. Thank You !!
Questions ??
19. Extra
Prior:
log p(f | X) = −(1/2) f^T K^{-1} f − (1/2) log |K| − (n/2) log 2π   (5)
Marginal likelihood:
log p(y | X) = −(1/2) y^T (K + σ_n^2 I)^{-1} y − (1/2) log |K + σ_n^2 I| − (n/2) log 2π   (6)
Likelihood:
p(y = +1 | x, w) = σ(x^T w)   (7)
For a symmetric likelihood, σ(−z) = 1 − σ(z), so:
p(y_i | x_i, w) = σ(y_i x_i^T w)   (8)
20. Extra (contd.)
Setting the first derivative of the log posterior to zero at its maximum gives:
f̂ = K ∇ log p(y | f̂)
Posterior over the weights (Bayes' rule):
p(w | y, X) = p(y | X, w) p(w) / p(y | X)