Stacked Local Auto-Correlation (SLAC) Features (MIRU2014)
Hideki Nakayama
Nakayama Lab., Machine Perception Group
Grad. School of Information Science and Technology, The University of Tokyo
• Deep learning
  ◦ Successive local response filters and pooling layers
  ◦ State-of-the-art performance on many tasks & benchmarks
• Traditional BoW-based models are often referred to as "shallow learning"
  (interpreted as a single-layer network)
2
[A. Krizhevsky et al., NIPS’12]
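To make "successive local response filters and pooling layers" concrete, here is a minimal NumPy sketch (ours, not from the slides) of two stacked filter-plus-pooling stages; the filter weights, sizes, and the ReLU nonlinearity are arbitrary illustrative choices.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Correlate a 2-D image with a small kernel (no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool(fmap, size=2):
    """Non-overlapping average pooling."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    fmap = fmap[:h, :w]
    return fmap.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((28, 28))              # toy gray-scale image
k1 = rng.standard_normal((3, 3))      # first-layer filter (random, for illustration)
k2 = rng.standard_normal((3, 3))      # second-layer filter

h1 = avg_pool(np.maximum(conv2d_valid(x, k1), 0))   # local filter -> ReLU -> pooling
h2 = avg_pool(np.maximum(conv2d_valid(h1, k2), 0))  # ...and again: "successive" layers
print(h1.shape, h2.shape)             # (13, 13) and (5, 5)
```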
To achieve a certain level of representational power...
• Deep models are believed to require fewer free parameters or neurons
  [Larochelle et al., 2007] [Bengio, 2009] [Delalleau and Bengio, 2011]
  (not fully proven, except in some specific cases)
• If successfully trained, deep models promise:
  ◦ Better generalization
  ◦ Computational efficiency
  ◦ Scalability
• However, optimization of deep models is challenging
  ◦ Non-convex, local minima, many heuristic hyperparameters...
  ◦ Optimizing a shallow network is relatively easy (convex in many cases)
Objection: "Do Deep Nets Really Need to be Deep?" [Ba & Caruana, 2014]
3
Global training of deep models vs. stacking single-layer learning modules

Stacking single-layer learning modules:
• Suboptimal (layer-wise) training
• Reasonable performance
  ◦ Even random weights could work! [Jarrett et al., 2009]
• Ease of tuning
• Stability in learning
• Flexibility in the choice of layer modules
Claim: the structure of the deep network itself is of primary importance!

Global training of deep models:
• Global optimality through the entire network
• State-of-the-art performance
• Difficulty in optimization
• Computational cost
• Constraints on layer modules
Claim: fine-tuning (backpropagation) through the entire network is the key to the best performance!
4
Stacking has been empirically studied on top of the bag-of-words framework:
• Hyperfeatures [Agarwal et al., ECCV'06]
  ◦ Hierarchically stack bag-of-visual-words layers
• Deep Fisher Network [Simonyan et al., NIPS'13]
• Deep Sparse Coding [He et al., SDM'14]
6
• Higher-order Local Auto-Correlation (HLAC) features
  ◦ Non-linear filter (mask) responses + average pooling
  ◦ Successfully deployed in many visual recognition applications
• Cons:
  ◦ Higher-order correlations & masks are required to achieve good performance,
    making the feature representation high-dimensional
7
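To make the HLAC idea concrete, here is a minimal NumPy sketch (ours, not the implementation behind these slides) of first-order local auto-correlation over a 3x3 neighbourhood: each feature is the image-wide average of the product between a pixel and one shifted copy. Higher orders multiply in further shifted copies, which is why the number of masks, and hence the dimensionality, grows quickly.

```python
import numpy as np
from itertools import product

def lac_first_order(image):
    """First-order local auto-correlation over a 3x3 neighbourhood:
    x(a) = mean_r f(r) * f(r + a) for each displacement a.
    (The classical HLAC feature set additionally removes displacement
    patterns that are equivalent up to a shift and adds higher orders.)"""
    h, w = image.shape
    center = image[1:h - 1, 1:w - 1]
    feats = []
    for dy, dx in product((-1, 0, 1), repeat=2):
        shifted = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        feats.append(np.mean(center * shifted))   # mask response + average pooling
    return np.array(feats)

img = np.random.default_rng(0).random((32, 32))
print(lac_first_order(img).shape)                 # (9,)
```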
• Sum-product networks [Poon and Domingos, UAI'11]
  ◦ A deep network in which each node (neuron) outputs the sum or product of its input variables
• To represent the same functions, the number of nodes has to grow [Delalleau & Bengio, NIPS'11]:
  ◦ Exponentially in a shallow network
  ◦ Linearly in a deep network
8
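As a toy illustration of this depth/size gap (our example, in the spirit of the Delalleau & Bengio argument rather than their exact construction): a balanced product of pairwise sums has a compact deep sum-product form, while the equivalent depth-2 network must enumerate every monomial of the expansion.

```latex
% Deep form for n = 8 inputs: 4 sum nodes + 3 product nodes (O(n) nodes overall)
f(x_1,\dots,x_8) \;=\; \bigl((x_1+x_2)(x_3+x_4)\bigr)\,\bigl((x_5+x_6)(x_7+x_8)\bigr)

% Shallow (depth-2) form: a single sum over all monomials of the expansion,
% here 2^4 = 16 terms, and in general 2^{n/2} terms for n inputs
f \;=\; \sum_{i_1\in\{1,2\}} \sum_{i_2\in\{3,4\}} \sum_{i_3\in\{5,6\}} \sum_{i_4\in\{7,8\}} x_{i_1} x_{i_2} x_{i_3} x_{i_4}
```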
• So, why not use deep models?
9
Proposed method: Stacked Local Auto-Correlation (SLAC) features
• Hierarchically compute low-order local correlations
• Naturally includes a ConvNet-like structure
(Architecture figure: a local auto-correlation (LAC) + compression block, repeated multiple times)
10
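One way to read the block diagram, expressed as a rough Python sketch (our interpretation under simplifying assumptions, not the authors' code): each layer forms first-order local auto-correlations of the current multi-channel feature map, compresses the resulting channels with PCA, and the layer is applied repeatedly before a final global pooling. In practice the PCA bases would be learned on training data; here they are fitted per image only to keep the sketch self-contained.

```python
import numpy as np
from sklearn.decomposition import PCA

def slac_layer(fmap, n_components=64):
    """One SLAC-style layer (illustrative): first-order local auto-correlation
    over a 3x3 neighbourhood, then PCA compression of the channel dimension.
    fmap has shape (H, W, C)."""
    H, W, C = fmap.shape
    center = fmap[1:H - 1, 1:W - 1, :]
    feats = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            shifted = fmap[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx, :]
            feats.append(center * shifted)        # low-order local correlation
    feats = np.concatenate(feats, axis=-1)        # (H-2, W-2, 9*C)
    flat = feats.reshape(-1, feats.shape[-1])
    k = min(n_components, flat.shape[0], flat.shape[1])
    compressed = PCA(n_components=k).fit_transform(flat)   # PCA compression
    return compressed.reshape(H - 2, W - 2, k)

x = np.random.default_rng(0).random((32, 32, 3))  # toy color image
for _ in range(2):                                # "repeat multiple times"
    x = slac_layer(x)
descriptor = x.mean(axis=(0, 1))                  # global average pooling
print(descriptor.shape)
```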
Experimental setup
• Datasets
  ◦ MNIST [LeCun, 1999]: digit recognition, 60k training / 10k test samples, 28x28 pixels
  ◦ CIFAR-10 [Krizhevsky, 2009]: object recognition, 50k training / 10k test samples, 32x32 pixels
  ◦ Caltech-101 [Fei-Fei, 2004]: object recognition, 30 training / 15 test samples per class
• Classifier
  ◦ Logistic regression (a generic sketch follows below)
11
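The classification stage could look like the following generic scikit-learn sketch (the exact solver, regularisation, and preprocessing used in the experiments are not specified here, and the data below are dummies with the 1176-dim SLAC size used only for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_* would hold SLAC/HLAC descriptors, y_* the class labels; dummy data here.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 1176)), rng.integers(0, 10, 500)
X_test, y_test = rng.random((100, 1176)), rng.integers(0, 10, 100)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```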
• SLAC achieves better performance than standard HLAC with reduced feature dimensions
12
(Bar charts: classification accuracy (%) on MNIST (gray scale) and CIFAR-10 (color).
MNIST compares HLAC 2nd-order (35 dim), HLAC 2nd-order with mask size 5 (219 dim),
HLAC 3rd-order (153 dim), HLAC 3rd-order (2245 dim), and SLAC 2-layers (1176 dim).
CIFAR-10 compares HLAC 1st-order (45 dim), HLAC 2nd-order (739 dim), HLAC 2nd-order
with mask size 5 (5419 dim), HLAC 3rd-order (8023 dim), and SLAC 2-layers (1176 dim).)
• Replace raw patches with densely sampled SIFT descriptors (SIFT-SLAC); a rough dense-SIFT sketch follows the chart below
13
(Bar chart: classification accuracy (%) on Caltech-101, comparing SLAC 3-layers (2628 dim),
SIFT-SLAC 1-layer (2628 dim), SIFT-SLAC 3-layers (2628 dim), SIFT-BoVW (4000 dim),
and SIFT-Fisher (8192 dim).)
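Dense SIFT sampling can be approximated with OpenCV by placing keypoints on a regular grid and computing descriptors at them; the sketch below is a generic stand-in (the step, patch size, and the file name "example.jpg" are arbitrary choices, not values from the paper).

```python
import cv2

def dense_sift(gray, step=4, size=8.0):
    """SIFT descriptors on a regular grid instead of detected keypoints."""
    sift = cv2.SIFT_create()
    h, w = gray.shape
    keypoints = [cv2.KeyPoint(float(x), float(y), size)
                 for y in range(step, h - step, step)
                 for x in range(step, w - step, step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors                      # shape: (num_grid_points, 128)

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
if img is not None:
    print(dense_sift(img).shape)
```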
• Combining SLAC layers with the Fisher framework boosts performance (see the fusion sketch after the chart below)
  ◦ Different statistical properties can be exploited
14
(Bar chart: classification accuracy (%) on Caltech-101, comparing SIFT-Fisher (a),
SIFT-SLAC (1-layer)-Fisher (b), SIFT-SLAC (2-layers)-Fisher (c), and the combinations
(a) + (b) and (a) + (b) + (c).)
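How the representations are combined is not spelled out in this summary; one common baseline consistent with "exploiting different statistical properties" is to L2-normalise each descriptor block and concatenate them before the classifier, as in the hypothetical sketch below (the dimensions for block (b) are made up).

```python
import numpy as np

def l2_normalize(x, eps=1e-10):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def combine(*blocks):
    """Late fusion by concatenating L2-normalised feature blocks
    (one plausible scheme; score-level fusion would be another option)."""
    return np.hstack([l2_normalize(b) for b in blocks])

rng = np.random.default_rng(0)
fisher = rng.random((100, 8192))          # (a) SIFT-Fisher, 8192 dim as in the slides
slac_fisher = rng.random((100, 8192))     # (b) hypothetical SIFT-SLAC(1-layer)-Fisher
print(combine(fisher, slac_fisher).shape) # (100, 16384)
```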
Summary
• Deep learning by stacking is a simple yet powerful and flexible framework for integrating various single-layer modules
• Stacked local auto-correlation (SLAC) features
  ◦ Iterate computation of local auto-correlation and PCA compression
  ◦ More efficient than standard HLAC, which computes everything in a single layer
  ◦ Using multiple layers makes sense
• Learning polynomials is a hot topic in ML
  ◦ R. Livni et al., Vanishing Component Analysis, In Proc. ICML, 2013.
  ◦ A. Andoni et al., Learning Polynomials with Neural Networks, In Proc. ICML, 2014.
15