Andrey V. Savchenko (National Research University Higher School of Economics), Vladimir Milov (N. Novgorod State Technical University), Natalya Belova (NRU HSE, Moscow) - Sequential Hierarchical Image Recognition based on the Pyramid Histograms of Oriented Gradients with Small Samples
AIST Conference 2015 http://aistconf.org
1. Andrey V. Savchenko
National Research University Higher School of Economics
Email: avsavchenko@hse.ru
Co-authors:
Vladimir Milov (N. Novgorod State Technical University)
Natalya Belova (NRU HSE, Moscow)
National Research University Higher School of Economics
Nizhny Novgorod
THE 4TH INTERNATIONAL CONFERENCE ON ANALYSIS
OF IMAGES, SOCIAL NETWORKS, AND TEXTS
Sequential Hierarchical Image
Recognition based on the Pyramid
Histograms of Oriented Gradients
with Small Samples
2. Outline
1.Overview. Rough sets. Three-way decisions
2.Hierarchical image recognition. Pyramid HOG
3.Sequential three-way decisions in image recognition
4.Experimental results. Face recognition
5.Conclusion and future work
3. Overview. Rough sets
Pawlak, Zdzisław. Rough Sets: Theoretical Aspects of
Reasoning About Data, 1991
Conferences:
1. JRS (Joint Rough Set Symposium):
- RSEISP: Rough Sets and Emerging Intelligent
Systems Paradigms
- RSCTC: Conference on Rough Sets and Current
Trends in Computing
2. International Joint Conference on Rough Sets
(IJCRS)
3. Rough Set Theory Workshop (RST)
…
Key idea: set is represented with lower and upper
approximations
Three regions of a target set S from universal set U:
1. Positive region POS(S)
2. Negative region NEG(S)
3. Boundary region: U-POS(S)-NEG(S)
4. Three-way decisions (TWD) and binary
classification
Yiyu Yao, Three-way decisions with probabilistic rough sets, Information
Sciences, 2010
“Rules constructed from the three regions are associated with different
actions and decisions, which immediately leads to the notion of three-
way decision rules. A positive rule makes a decision of acceptance,
a negative rule makes a decision of rejection, and a boundary rule
makes a decision of abstaining”
Pattern recognition: a query object X must be assigned to one of C classes specified by a database of reference (model) objects. The class label of the rth model object is assumed to be known.
In case of binary classification (C=2):
1. Positive decision - accept the first class
2. Negative decision - reject the first class and accept the second class
3. Boundary decision - delay the final decision and accept neither the first nor the second class
Class label of the rth model object: $c(r) \in \{1,\dots,C\}$
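The three decision regions for binary classification can be sketched with two posterior thresholds (the values of `alpha` and `beta` below are hypothetical, standing for the positive/negative region boundaries):

```python
def three_way_decision(p_first, alpha=0.7, beta=0.3):
    """Three-way decision for binary classification.

    p_first: estimated posterior probability of the first class.
    alpha, beta: acceptance/rejection thresholds (illustrative values).
    """
    if p_first >= alpha:
        return "accept"    # positive region: accept the first class
    if p_first <= beta:
        return "reject"    # negative region: accept the second class
    return "abstain"       # boundary region: delay the final decision
```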
5. Three-way decisions and multi-class
recognition
Obvious enhancement to multi-class recognition – (C+1)-way decisions
1. Accept class c=1
2. Accept class c=2
…
C. Accept class c=C
C+1. Boundary decision: delay the decision process if the classification result is
unreliable
A known way to reject unreliable decisions is Chow's rule.
Chow C. On optimum recognition error and reject tradeoff // IEEE Trans. Inf. Theory, 1970
Reject the decision (reject option) if
$$\max_{c \in \{1,\dots,C\}} P(c \mid X) \le p_0, \qquad p_0 = \frac{\Pi_{10} - \Pi_{ro}}{\Pi_{10} + \Pi_{01} - \Pi_{ro}}$$
1) $\Pi_{10}$ – loss of an incorrect decision that has not been rejected
2) $\Pi_{01}$ – loss of rejecting a correct decision
3) $\Pi_{ro}$ – cost of the reject option ($\Pi_{ro} \le \Pi_{10}$)
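A minimal sketch of Chow's rule, assuming the class posteriors have already been estimated (the loss values below are illustrative, not from the slides):

```python
def chow_reject(posteriors, p10=1.0, p01=0.5, pro=0.3):
    """Chow's rule: reject if the maximal posterior does not exceed p0.

    p0 = (P10 - Pro) / (P10 + P01 - Pro), where P10 is the loss of an
    incorrect non-rejected decision, P01 the loss of rejecting a correct
    decision, and Pro the cost of the reject option (illustrative values).
    Returns the accepted class index, or None for the reject option.
    """
    p0 = (p10 - pro) / (p10 + p01 - pro)
    best = max(range(len(posteriors)), key=lambda c: posteriors[c])
    return None if posteriors[best] <= p0 else best
```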
6. Sequential three-way decisions and granular
computing
Key question: how to make a
decision if the reject option
was chosen?
Yao Y. Granular Computing and
Sequential Three-Way Decisions
//Proc. of Rough Sets and Knowledge
Technology, LNCS, 2013:
"Objects with a non-commitment
decision may be further
investigated by using fine-grained
granules"
Issues to address:
1) How to define granularity levels in a general way for practically important composite objects, so that coarse granularity levels are processed faster than fine ones?
2) The most reliable decision is not necessarily obtained at the finest granularity level. How should the final decision be made if the reject option is chosen at the last level?
3) Is it possible to apply sequential TWD at each granularity level?
[Scheme: the query object X is fed to the level-1 classifier; if the decision is found unreliable, it is passed to the level-2 classifier, and so on up to the level-L classifier, which produces the resulting class.]
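The sequential scheme above can be sketched as a loop over granularity levels (`classify_at_level` and `is_reliable` are caller-supplied assumptions, not functions from the original work):

```python
def sequential_twd(classify_at_level, is_reliable, num_levels):
    """Sequential three-way decisions over granularity levels.

    classify_at_level(l) returns (label, score) for level l (coarse -> fine);
    is_reliable(score) implements the reject option. If every level abstains,
    the decision of the finest level is kept (one possible fallback).
    """
    label, score = None, None
    for level in range(1, num_levels + 1):
        label, score = classify_at_level(level)
        if is_reliable(score):
            return label, level        # commit at the first reliable level
    return label, num_levels           # all levels abstained: use the finest one
```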
7. Histograms of Oriented Gradients (HOG)
Proposed in (Dalal and Triggs, 2005)
Criterion (nearest-neighbor matching of segment histograms with alignment):
$$\min_{r \in \{1,\dots,R\}} \; \min_{|\Delta_1| \le \Delta,\, |\Delta_2| \le \Delta} \; \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \rho\big(H(k_1,k_2),\, H_r(k_1+\Delta_1,\, k_2+\Delta_2)\big)$$
Image descriptors:
1. Local (SIFT, SURF, etc.):
a) Keypoint extraction
b) Descriptor extraction
2. Global (color histograms, HOG)
a) Object detection
b) Descriptor extraction
Gradient orientation histogram (from (Lowe, 2004))
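A minimal numpy-only sketch of such a grid-based gradient-orientation-histogram descriptor (deliberately simplified: no block normalization or overlapping cells, unlike the full Dalal-Triggs HOG; all names are illustrative):

```python
import numpy as np

def grid_orientation_histograms(image, grid=(10, 10), bins=8):
    """Split the image into a grid of segments and compute, in each segment,
    a gradient-orientation histogram weighted by gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned gradients
    k1, k2 = grid
    h, w = image.shape
    descriptor = np.zeros((k1, k2, bins))
    for i in range(k1):
        for j in range(k2):
            sl = (slice(i * h // k1, (i + 1) * h // k1),
                  slice(j * w // k2, (j + 1) * w // k2))
            hist, _ = np.histogram(orientation[sl], bins=bins,
                                   range=(0.0, np.pi), weights=magnitude[sl])
            descriptor[i, j] = hist / (hist.sum() + 1e-9)  # per-segment norm
    return descriptor
```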
8. Statistical classification
Instead of the unknown class distributions, let us use the Gaussian Parzen kernel. Thus, for group-choice classification of a segment under the naïve assumption of feature independence inside each segment, the generalized PNN is used.
The classification task is reduced to testing simple hypotheses. In the case of equal prior probabilities, the Bayesian rule is as follows:
$$f\big(X(k_1,k_2) \,\big|\, W_r(\tilde k_1,\tilde k_2)\big) = \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1,k_2),\, x^{(r)}_{j_r}(\tilde k_1,\tilde k_2)\big)$$
Final classifier under the assumption of segment independence:
A production layer is added to the traditional PNN structure.
Unfortunately, if distribution estimates are used instead of the unknown distributions, this approach is not optimal. It is necessary to check complex hypotheses.
$$\max_{r \in \{1,\dots,R\}} p_r \cdot f\big(X(k_1,k_2) \,\big|\, W_r(\tilde k_1,\tilde k_2)\big)$$
$$\max_{r \in \{1,\dots,R\}} p_r \cdot \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \min_{|\Delta_1| \le \Delta,\, |\Delta_2| \le \Delta} \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1,k_2),\, x^{(r)}_{j_r}(k_1+\Delta_1,\, k_2+\Delta_2)\big)$$
9. Segment homogeneity testing
Idea [Borovkov, 1984]: check the complex hypothesis of homogeneity of the feature samples.
The following criterion is known to be asymptotically minimax.
The distribution of the hypothesis $W_r(k_1,k_2,\tilde k_1,\tilde k_2)$ is estimated from the united sample $\{X(k_1,k_2),\, X_r(\tilde k_1,\tilde k_2)\}$, as in the Lehmann-Rosenblatt test.
Homogeneity-Testing PNN (HT-PNN) for piecewise-regular object recognition
(Savchenko//Proc. of ANNPR, LNAI, 2012)
$$\max_{r \in \{1,\dots,R\}} \; \sup_{f_1,\dots,f_R} \; f\big(X(k_1,k_2),\, X_1(\tilde k_1,\tilde k_2),\dots,X_R(\tilde k_1,\tilde k_2) \,\big|\, W_r(k_1,k_2,\tilde k_1,\tilde k_2)\big)$$
$$\max_{r \in \{1,\dots,R\}} \min_{|\Delta_1| \le \Delta,\, |\Delta_2| \le \Delta} \prod_{k_1=1}^{K_1} \prod_{k_2=1}^{K_2} \Bigg[ \prod_{j=1}^{n} \frac{\frac{1}{n} \sum\limits_{i=1}^{n} K\big(x_j(k_1,k_2), x_i(k_1,k_2)\big)}{\frac{1}{n+n_r} \Big( \sum\limits_{i=1}^{n} K\big(x_j(k_1,k_2), x_i(k_1,k_2)\big) + \sum\limits_{i_r=1}^{n_r} K\big(x_j(k_1,k_2), x^{(r)}_{i_r}(k_1{+}\Delta_1, k_2{+}\Delta_2)\big) \Big)} \times$$
$$\prod_{j_r=1}^{n_r} \frac{\frac{1}{n_r} \sum\limits_{i_r=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{i_r}\big)}{\frac{1}{n+n_r} \Big( \sum\limits_{i=1}^{n} K\big(x^{(r)}_{j_r}(k_1{+}\Delta_1, k_2{+}\Delta_2), x_i(k_1,k_2)\big) + \sum\limits_{i_r=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{i_r}\big) \Big)} \Bigg]$$
10. Discrete features
Computing complexity of the PNN and the HT-PNN:
$$O\Big((2\Delta+1)^2 \cdot U V \cdot \sum_{r=1}^{R} U_r V_r\Big)$$
If there are only N feature values, segments are described by their histograms (HOGs): $\{w_i(k_1,k_2)\}$ and $\{\theta^{(r)}_i(\tilde k_1,\tilde k_2)\}$, $i = 1,\dots,N$.
Smoothed histograms and the united-sample histogram:
$$\theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2) = \sum_{j=1}^{N} K_{ij}\, \theta^{(r)}_j(\tilde k_1,\tilde k_2), \qquad w_{K;i}(k_1,k_2) = \sum_{j=1}^{N} K_{ij}\, w_j(k_1,k_2)$$
$$\tilde\theta^{(r)}_{\Sigma;i}(k_1,k_2,\tilde k_1,\tilde k_2) = \frac{n \cdot w_{K;i}(k_1,k_2) + n_r \cdot \theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2)}{n + n_r}$$
PNN – equivalent to the Kullback-Leibler divergence if the smoothing parameter σ→0
HT-PNN – generalization of the Jensen-Shannon divergence
Approximate HT-PNN (A-HT-PNN) – generalization of the chi-square distance
See details in (Savchenko // Neural Networks, 2013)
$$\rho_{\mathrm{PNN}}(X,X_r) = \min_{|\Delta_1| \le \Delta,\, |\Delta_2| \le \Delta} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \sum_{i=1}^{N} w_{K;i}(k_1,k_2) \ln \frac{w_{K;i}(k_1,k_2)}{\theta^{(r)}_{K;i}(k_1+\Delta_1,\, k_2+\Delta_2)} \;\to\; \min_{r \in \{1,\dots,R\}}$$
$$\rho_{\mathrm{HT\text{-}PNN}}(X,X_r) = \sum_{i=1}^{N} \left( n\, w_{K;i}(k_1,k_2) \ln \frac{w_{K;i}(k_1,k_2)}{\tilde\theta^{(r)}_{\Sigma;i}(k_1,k_2,\tilde k_1,\tilde k_2)} + n_r\, \theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2) \ln \frac{\theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2)}{\tilde\theta^{(r)}_{\Sigma;i}(k_1,k_2,\tilde k_1,\tilde k_2)} \right)$$
$$\rho_{\mathrm{A\text{-}HT\text{-}PNN}}(X,X_r) = \sum_{i=1}^{N} \left( n\, w_{K;i}(k_1,k_2)\, \frac{w_{K;i}(k_1,k_2) - \tilde\theta^{(r)}_{\Sigma;i}}{\tilde\theta^{(r)}_{\Sigma;i}} + n_r\, \theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2)\, \frac{\theta^{(r)}_{K;i}(\tilde k_1,\tilde k_2) - \tilde\theta^{(r)}_{\Sigma;i}}{\tilde\theta^{(r)}_{\Sigma;i}} \right)$$
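For plain histograms, the three dissimilarities above can be sketched as follows (a simplified illustration that ignores the smoothing kernel and segment alignment; `n1` and `n2` play the role of the sample sizes n and n_r):

```python
import numpy as np

def kl(w1, w2):
    """Kullback-Leibler divergence between histograms (PNN distance as sigma -> 0)."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    return float(np.sum(w1 * np.log((w1 + 1e-12) / (w2 + 1e-12))))

def js(w1, w2, n1=1.0, n2=1.0):
    """Weighted Jensen-Shannon-type divergence (HT-PNN distance for discrete
    features); for n1 = n2 = 1 it equals twice the classical JS divergence."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    m = (n1 * w1 + n2 * w2) / (n1 + n2)            # united-sample histogram
    return n1 * kl(w1, m) + n2 * kl(w2, m)

def chi2(w1, w2):
    """Chi-square distance (the A-HT-PNN approximation of the JS-type criterion)."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    return float(np.sum((w1 - w2) ** 2 / (w1 + w2 + 1e-12)))
```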
11. Definition of granularity levels.
Hierarchical image recognition. Pyramid HOG
Proposed in (Bosch, Zisserman, Munoz // CIVR, 2007)
Objects are divided into L pyramid levels.
We focus on the small sample size (SSS) problem. Criterion – the nearest neighbor rule with a weighted sum of distances:
$$\rho_{\mathrm{PHOG}}(X,X_r) = \sum_{l=1}^{L} w(l) \cdot \rho_l(X,X_r)$$
Key issue – insufficient performance. This criterion requires $\big(\sum_{l=1}^{L} K_1(l)\, K_2(l)\big) \big/ \big(K_1(1)\, K_2(1)\big)$ times more calculations than the conventional HOG.
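A minimal sketch of the nearest-neighbor rule with this weighted-sum criterion (all names are illustrative; `rho` can be any per-level dissimilarity, e.g. Euclidean or KL):

```python
def phog_nearest_neighbor(query_levels, models_levels, weights, rho):
    """Nearest-neighbor rule with the pyramid criterion
    rho_PHOG(X, X_r) = sum_l w(l) * rho(X^(l), X_r^(l)).

    query_levels: per-level descriptors of the query.
    models_levels: list of per-level descriptors, one entry per model object.
    """
    def phog(model_levels):
        return sum(w * rho(q, m)
                   for w, q, m in zip(weights, query_levels, model_levels))
    return min(range(len(models_levels)), key=lambda r: phog(models_levels[r]))
```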
12. Sequential three-way decisions and granular
computing in image recognition
1. The nearest neighbor rule is used at each granularity level l:
$$\nu(l) = \mathop{\arg\min}_{r \in \{1,\dots,R\}} \rho_l(X, X_r)$$
2. The posterior probability is estimated based on the properties of the HT-PNN
3. Final decision in case of an unreliable decision at the finest granularity level – choose the least unreliable level
[Scheme: the query image X is classified by the nearest-neighbor classifier at level 1; if the decision is unreliable, the next level is used, up to level L; if the level-L decision is still unreliable, classifier fusion produces the resulting class c(ν).]
$$l^{*} = \mathop{\arg\max}_{l \in \{1,\dots,L\}} \hat P\big(W_{\nu(l)} \,\big|\, X\big), \qquad \hat P\big(W_{\nu(l)} \,\big|\, X\big) = \frac{\exp\big(-U V(l)\, \rho_l(X, X_{\nu(l)})\big)}{\sum_{r=1}^{R} \exp\big(-U V(l)\, \rho_l(X, X_r)\big)}$$
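The fusion rule above can be sketched as a softmax of scaled negative distances at each level, committing to the level with the largest estimated posterior (`uv_per_level` stands for the UV(l) scale factors; all names are illustrative):

```python
import math

def fuse_levels(distances_per_level, uv_per_level):
    """Classifier fusion over granularity levels: at each level l the posterior
    of the nearest model nu(l) is estimated as
    exp(-UV(l)*rho_l(X, X_nu(l))) / sum_r exp(-UV(l)*rho_l(X, X_r)),
    and the level with the largest posterior wins."""
    best_label, best_level, best_posterior = None, None, -1.0
    for level, (dists, uv) in enumerate(zip(distances_per_level, uv_per_level),
                                        start=1):
        weights = [math.exp(-uv * d) for d in dists]
        nu = min(range(len(dists)), key=lambda r: dists[r])
        posterior = weights[nu] / sum(weights)
        if posterior > best_posterior:
            best_label, best_level, best_posterior = nu, level, posterior
    return best_label, best_level, best_posterior
```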
13. Sequential analysis at each granularity level
If a distance falls into the negative region, it is not a distance between objects from different classes. Hence, it should be a distance between objects of the same class, and there is no need to continue matching with the other models!
Warning! The performance of the nearest neighbor rule is insufficient if the number of classes is high.
Solution: for each rth model object, check whether hypothesis Wr can be accepted without further verification of the remaining models.
A probabilistic rough set of the distances between objects from different classes is created:
1. Positive region: $POS(\rho_l) = \big\{ (X_1,X_2) : \rho_l(X_1,X_2) > \rho^{(l)}_1 \big\}$
2. Negative region: $NEG(\rho_l) = \big\{ (X_1,X_2) : \rho_l(X_1,X_2) < \rho^{(l)}_0 \big\}$
3. Boundary region: $BND(\rho_l) = \big\{ (X_1,X_2) : \rho^{(l)}_0 \le \rho_l(X_1,X_2) \le \rho^{(l)}_1 \big\}$
Termination condition: $\rho_l(X, X_r) < \rho^{(l)}_0$
It is a termination condition in approximate nearest neighbor methods
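The early-termination idea can be sketched as a sequential matching loop (`rho0` is an assumed negative-region threshold; names are illustrative):

```python
def sequential_nn(query, models, rho, rho0):
    """Sequential analysis at one granularity level: models are matched one by
    one, and if a distance falls into the negative region (rho < rho0) the
    match is accepted immediately, without checking the remaining models."""
    best_r, best_d = None, float("inf")
    for r, model in enumerate(models):
        d = rho(query, model)
        if d < rho0:
            return r, d            # negative region: same-class distance, stop early
        if d < best_d:
            best_r, best_d = r, d
    return best_r, best_d          # otherwise fall back to the nearest neighbor
```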
14. Real-time recognition with large database
The k-NN rule requires a brute-force search over the whole database.
Small training sample: C ≈ R
• Small database (tens of classes)
• Medium-sized DB: problems with the accuracy and computational speed of automatic real-time recognition
• Very large DB: automated content-based object retrieval (approximate k-NN)
Solutions:
• Modern hardware
• Parallel computing
• Simplification of similarity measure or its parameters.
• Approximate nearest neighbor methods:
1. ANN library (Arya, Mount, etc. // Journal of the ACM, 1998): kd-trees. Only
Minkowski distances are supported.
2. Hashing Techniques, LSH (Locality-Sensitive Hashing) (Gionis, Indyk,
Motwani, R. // Proc. of VLDB, 1999). Applications in Google Correlate
(Vanderkam, Schonberger, Rowley, Kumar, Nearest Neighbor Search in
Google Correlate // Tech. report, 2013)
3. FLANN library (Muja, Lowe // Proc. of VISAPP, 2009)
4. NonMetricSpaceLib (Boytsov, Bilegsaikhan // Proc. of SISAP, LNCS, 2013)
15. Medium-sized databases. Maximum-Likelihood
Directed Enumeration Method (DEM)
1. Asymptotically (n, nr→∞), the number of distance calculations in the DEM is constant (does not depend on the DB size R)
2. The DEM is the optimal greedy algorithm for the HT-PNN.
Idea: at each step the next model is selected to maximize the likelihood of the previously calculated distances
Initialization: r1 is chosen to maximize the average probability of obtaining a correct decision at the k=2-th step
$$r_{k+1} = \mathop{\arg\min}_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \sum_{i=1}^{k} \varphi(\mu, r_i), \qquad \varphi(\mu, r_i) \approx \frac{\big( \rho(X, X_{r_i}) - \rho_{\mathrm{HT\text{-}PNN}}(X_\mu, X_{r_i}) \big)^2}{2\, \rho(X_\mu, X_{r_i})}$$
$$r_1 = \mathop{\arg\max}_{\nu \in \{1,\dots,R\}} \frac{1}{R} \sum_{\mu=1}^{R} \prod_{r=1}^{R} \Phi\Big( \sqrt{K n / 2}\, \big( \rho_{\mu,r} - \rho_{\mu,\nu} \big) \Big)$$
Based on the asymptotic properties of the HT-PNN, this rule is equivalent to
$$r_{k+1} = \mathop{\arg\max}_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \prod_{i=1}^{k} f\big( X_{r_i} \,\big|\, W_\mu \big)$$
Details in (Savchenko // Pattern Recognition, 2012), (Savchenko //Proc. of ICVS, 2013)
and (Savchenko // Proc. of PReMI, LNCS, 2015 – accepted)
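A simplified greedy sketch of the directed enumeration idea (using a squared-deviation surrogate for φ and precomputed model-to-model distances; parameter names are illustrative, not the exact ML weighting of the cited papers):

```python
def directed_enumeration(query, models, rho, model_dists, rho0, max_checks):
    """After each computed distance, pick as the next candidate the unchecked
    model mu whose stored model-to-model distances best match the distances
    already observed for the query. Stops when a distance falls below rho0
    (termination condition) or after max_checks distance computations."""
    checked = {}
    current = 0                                  # start from an arbitrary model
    for _ in range(max_checks):
        d = rho(query, models[current])
        if d < rho0:
            return current, d                    # reliable match: stop early
        checked[current] = d
        remaining = [m for m in range(len(models)) if m not in checked]
        if not remaining:
            break
        # deviation of candidate mu from all previously computed distances
        def phi(mu):
            return sum((checked[r] - model_dists[mu][r]) ** 2 for r in checked)
        current = min(remaining, key=phi)
    best = min(checked, key=checked.get)         # fall back to the best so far
    return best, checked[best]
```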
16. Experimental study. Face recognition
Parameters:
1. L=2 granularity levels (10x10 and 20x20)
2. Alignment of HOGs: Δ=1
3. Threshold p0=0.85
Testing: 20 repetitions of random subsampling cross-validation
Essex dataset (323 persons, training set: 5187 photos, test set: 1224 photos)
Pipeline: image → face detection (OpenCV LBP) → conversion to grayscale → gamma correction → 3x3 median filter → contrast equalization → gradient magnitude/orientation estimation → segmentation (10x10 grid) → PHOG descriptor
17. Experimental results (1a).
Sequential three-way decisions. HT-PNN
Error rate, % Average recognition time, ms.
1. The error rate of the hierarchical approach is 0.6-1% lower than the error rate at the finest level (20x20 grid)
2. The average recognition time of the sequential TWD is 3.5-4.5 times lower than that of the PHOG
3. The recognition time of sequential TWD with database enumeration is 25-45% lower than that of sequential TWD
[Charts: error rate (%) and average recognition time (ms.) vs. the number of models R ∈ {323, 500, 700, 1000} for L=1 grid 10x10, L=1 grid 20x20, Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration]
18. Experimental results (1b). Sequential three-way
decisions. Euclidean metric
Error rate, % Average recognition time, ms.
1. The error rate is 1-2.5% higher than for the HT-PNN
2. Sequential TWD with database enumeration is 2-2.5 times faster than the PHOG and 20-30% faster than the original sequential TWD
[Charts: error rate (%) and average recognition time (ms.) vs. the number of models R ∈ {323, 500, 700, 1000} for L=1 grid 10x10, L=1 grid 20x20, Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration]
19. Experimental results (2).
OpenCV face recognition
Error rate, % Average recognition time, ms.
1. SVM classifier (libSVM) of HOGs (10x10 grid)
2. Eigenfaces (Turk, Pentland // CVPR, 1991)
3. Fisherfaces (Belhumeur et al. //IEEE Trans. on PAMI, 1997)
4. Histograms of Local Binary Patterns (LBP) (Ahonen et. al //ECCV, 2004)
[Charts: error rate (%) and average recognition time (ms.) vs. the number of models R ∈ {323, 500, 700, 1000} for SVM with HOG (10x10), Eigenfaces, Fisherfaces, and LBP]
In the case of one image per person, the SVM's accuracy is 3% and 5.5% lower than the accuracy of the Euclidean distance and the HT-PNN with the same features (10x10 grid), respectively. If the number of images per class is higher (3-4, R=1000), the SVM is expectedly better.
20. Experimental results (3). ML-DEM
Average recognition time (ms.), T=1 and T=8 threads: brute force vs. randomized KD tree vs. ordering permutation vs. ML-DEM vs. Pivot ML-DEM
[Charts for the original training set (R=5187, error rate 0.164%) and the reduced training set of 880 medoids (R=881, error rate 0.573%)]
The ML-DEM is compared with the following approximate NN methods
1. Randomized kd-tree (Silpa-Anan C., Hartley R. Optimised KD-trees for fast image
descriptor matching // CVPR, 2008)
2. Ordering permutation (Gonzalez E.C., Figueroa K., Navarro G. Effective Proximity Retrieval
by Ordering Permutations // IEEE Trans. on PAMI, 2008)
22. Experimental results (4b). Sequential three-way
decisions. Euclidean metric
Error rate, % Average recognition time, ms.
1. L=3 granularity levels allow increasing the accuracy
2. Sequential TWD speeds up the recognition procedure 2-4 times in comparison with PHOG
[Charts: error rate (%) and average recognition time (ms.) vs. the number of models R ∈ {65, 75, 150, 225} for grids 10x10 and 20x20, PHOG (10x10+20x20), sequential PHOG (10x10+20x20), PHOG (10x10+15x15+20x20), and sequential PHOG (10x10+15x15+20x20)]
23. Conclusion
1. The insufficient performance of hierarchical image recognition methods is highlighted.
2. The possibility of applying rough set theory, three-way decisions and granular computing in image recognition is explored.
3. A fast decision method of sequential analysis for Pyramid HOG features is proposed.
4. We experimentally demonstrated that the proposed approach can be efficiently applied even with the conventional Euclidean metric.
5. If the number of classes C is large, the brute-force solution is not computationally efficient. Hence, approximate nearest neighbor algorithms can be applied.
Future work
1. Explore modern features extracted with deep neural networks for
unconstrained face recognition
2. Experimental study of more sophisticated segmentation methods
3. Application of proposed approach in other pattern recognition tasks, e.g.
speech recognition
4. Apply our approach with other classifiers for which a reject option is available, e.g., the one-against-all multi-class support vector machine