1. Kernel Analysis of
Deep Networks
By:
Grégoire Montavon
Mikio L. Braun
Klaus-Robert Müller
(Technical University of Berlin)
JMLR 2011
Presented by:
Behrang Mehrparvar
(University of Houston)
April 8th, 2014
7. Problem Specification
Deep Learning is still a Black Box!
Theoretical aspect
− e.g. studying depth in sum-product networks
Analytical arguments
− e.g. analysis of depth
Experimental results
− e.g. performance in application domains
Visualization
− e.g. measuring invariance
8. Kernel Methods
Decouple the learning algorithm from the data representation
Kernel operator:
− Measures similarity between pairs of points
− Encodes all prior knowledge about the learning problem
In this paper:
− Not a learning machine
− Abstraction tool to model the deep network
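As a concrete illustration of the kernel operator as a similarity measure (a minimal sketch, not the paper's code; the function name and the default bandwidth are mine), the Gaussian (RBF) kernel used throughout this line of work can be written as:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    The bandwidth sigma encodes the prior notion of the scale at which
    two points count as similar.
    """
    # Squared Euclidean distances between all pairs of rows of X and Y
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * sigma**2))
```

The kernel matrix, not the raw features, is what the analysis below operates on; this is the sense in which the kernel decouples the method from the representation.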
9. Kernel Methods (cont.)
Kernel Methods
− model the deep network
Used to quantify ...
− the goodness of representations
− the evolution of good representations
10. Hypothesis
1) Representations become simpler and more accurate with depth
2) The structure of the network (its restrictions) determines how quickly
representations are formed
– Evolution from the distribution of pixels to the distribution of classes
11. Problem Specification
Problem: the role of depth in the goodness of representations
Challenge: how to define and measure goodness
Solution:
– Simplicity
• Dimensionality: number of kernel PCs
• Number of local variations
– Accuracy
• Classification error
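Simplicity can be made concrete as, for example, the number of leading kernel PCA eigenvalues needed to explain most of the variance (a hypothetical measure in the spirit of this slide, not the paper's exact definition; the function name and the 95% threshold are my choices):

```python
import numpy as np

def n_relevant_components(K, var_fraction=0.95):
    """Count how many kernel PCA components explain `var_fraction` of the variance.

    K : (n, n) symmetric kernel matrix on one layer's representation.
    """
    n = K.shape[0]
    # Center the kernel matrix in feature space
    H = np.eye(n) - np.ones((n, n)) / n
    eigvals = np.linalg.eigvalsh(H @ K @ H)[::-1]  # sort descending
    eigvals = np.clip(eigvals, 0.0, None)          # drop tiny negatives
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, var_fraction) + 1)
```

A simpler representation needs fewer components; accuracy is then measured separately as the classification error in the retained subspace.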
13. Method
1) Train the deep network
2) Infer the representation at each layer
3) Apply kernel PCA to each layer's representation
4) Project the data points onto the first d eigenvectors
5) Analyze the results
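Steps 3 and 4 above can be sketched as follows (a minimal NumPy illustration under my own naming, not the authors' code; the input kernel matrix is assumed to be computed on one layer's representation from step 2):

```python
import numpy as np

def kernel_pca_projection(K, d):
    """Project points onto the first d kernel principal components.

    K : (n, n) symmetric kernel matrix on one layer's representation.
    Returns the (n, d) matrix of projections.
    """
    n = K.shape[0]
    # Step 3: center the kernel matrix in feature space
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # Eigendecomposition, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: projection onto the leading d components, scaled so the
    # squared norm of each column equals its eigenvalue
    keep = np.clip(eigvals[:d], 0.0, None)
    return eigvecs[:, :d] * np.sqrt(keep)
```

Step 5 then analyzes, for each layer, how quantities such as the classification error on these d-dimensional projections behave as d grows.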
24. Architectures
Multilayer Perceptrons
– No preconditioning on the learning problem
– Prior: NONE
Pretrained Multilayer Perceptrons
– Better capture the underlying structure of the input
– Already contain part of the solution
– Prior: generative model of the input
Convolutional Neural Networks
– Prior: Spatial invariance
28. Observation
MNIST:
– MLP: discrimination is solved greedily in the early layers
– PMLP and CNN: discrimination is postponed to the last layers
CIFAR:
– MLP: does not discriminate until the last layer
– PMLP and CNN: discrimination is spread across more layers
WHY?!
– Good observation, but no explanation!
– Possible hints: dataset, priors, etc.
30. Observation
Regularities in PMLP and CNN
– Facilitate the construction of a structured solution
– Control the rate of discrimination at each layer
32. Comments
Strengths
– Important and interesting problem
– Simple and intuitive approach
– Well designed experiments
– Good analysis of results
Weaknesses
– Too many observations
• e.g. role of sigma in scale invariance
– Observations are left unexplained
34. References
1) Bengio, Yoshua, and Olivier Delalleau. "On the expressive power of deep architectures." Algorithmic Learning Theory. Springer Berlin Heidelberg, 2011.
2) Poon, Hoifung, and Pedro Domingos. "Sum-product networks: A new deep architecture." 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011.
3) Braun, Mikio L., Joachim M. Buhmann, and Klaus-Robert Müller. "On relevant dimensions in kernel feature spaces." Journal of Machine Learning Research 9 (2008): 1875-1908.
4) http://deeplearning.net/