2.
Outline
• A Discriminatively Trained, Multiscale, Deformable Part Model
• Object Detection with Discriminatively Trained Part Based Models
• Work to do
3.
A Discriminatively Trained, Multiscale, Deformable Part Model
CVPR’08
4.
Part-Based Model
[Figure: model template with root filters (coarse resolution), part filters (finer resolution) and deformation models]
Each component has a root filter F0 and n part models (Fi, vi, di), where Fi is a part filter, vi is the anchor position of part i relative to the root, and di holds the coefficients of the quadratic deformation cost.
5.
Feature Pyramid
Object hypothesis
z = (p0,..., pn)
p0 : location of root
p1,..., pn : location of parts
Score is the sum of filter scores minus deformation costs
[Figure: image pyramid and the corresponding HOG feature pyramid]
Multiscale model captures features at two resolutions
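The two-resolution idea can be sketched as index arithmetic on the pyramid. This is a minimal sketch under assumptions not stated on the slide: level 0 is the input image, the image scale halves every `levels_per_octave` levels (a hypothetical parameter name), and part filters are evaluated exactly one octave below the root level, where feature-map coordinates are doubled.

```python
def pyramid_scales(num_levels, levels_per_octave):
    """Image scale of each pyramid level relative to the input image.

    Hypothetical schedule: the scale halves every `levels_per_octave` levels.
    """
    return [2.0 ** (-level / levels_per_octave) for level in range(num_levels)]


def part_level(root_level, levels_per_octave):
    """Level one octave finer than the root level, i.e. twice its resolution."""
    level = root_level - levels_per_octave
    if level < 0:
        raise ValueError("root level has no level one octave finer")
    return level


def root_to_part_coords(x0, y0):
    """Doubling the resolution doubles feature-map coordinates."""
    return 2 * x0, 2 * y0
```

A root placed at (x0, y0) on level l therefore has its parts searched around (2x0, 2y0) on the finer level, which is why roots can only be placed on levels that have a level one octave below them.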
6.
Score Function
Score of a hypothesis = “data cost” (filter scores) minus “spatial prior” (deformation costs):
$$\mathrm{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b$$
where (dxi, dyi) = (xi, yi) − (2(x0, y0) + vi) is the displacement of the i-th part relative to its anchor position, and
$$\phi_d(dx, dy) = (dx, dy, dx^2, dy^2) \qquad (4)$$
are deformation features.
Note that if di = (0, 0, 1, 1) the deformation cost for the i-th part is the squared distance between its actual position and its anchor position relative to the root. In general the deformation cost is an arbitrary separable quadratic function of the displacements.
The bias term b is introduced in the score to make the scores of multiple models comparable when we combine them into a mixture model.
The score of a hypothesis z can be expressed in terms of a dot product, β · ψ(H, z), between a vector of model parameters β (concatenation of filters and deformation parameters) and a vector ψ(H, z) (concatenation of HOG features and part displacement features):
$$\beta = (F_0, \dots, F_n, d_1, \dots, d_n, b) \qquad (5)$$
$$\psi(H, z) = (\phi(H, p_0), \dots, \phi(H, p_n), -\phi_d(dx_1, dy_1), \dots, -\phi_d(dx_n, dy_n), 1) \qquad (6)$$
This illustrates a connection between our models and linear classifiers. We use this relationship for learning the model parameters with the latent SVM framework.
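The equivalence between the two forms of the score can be checked numerically. This is a minimal sketch in plain Python, assuming each filter Fi and each feature window φ(H, pi) has already been flattened into a list of numbers and the displacements (dxi, dyi) are given; the function names are illustrative and not taken from the authors' code.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("length mismatch")
    return sum(a * b for a, b in zip(u, v))


def phi_d(dx, dy):
    """Deformation features phi_d(dx, dy) = (dx, dy, dx^2, dy^2), equation (4)."""
    return [dx, dy, dx * dx, dy * dy]


def score_direct(filters, feats, defs, disps, b):
    """Sum of filter scores minus deformation costs plus bias.

    filters[i], feats[i]: flattened F_i and phi(H, p_i) for i = 0..n (root first).
    defs[i - 1], disps[i - 1]: d_i and (dx_i, dy_i) for parts i = 1..n.
    """
    data = sum(dot(F, f) for F, f in zip(filters, feats))
    deform = sum(dot(d, phi_d(*dd)) for d, dd in zip(defs, disps))
    return data - deform + b


def score_dot(filters, feats, defs, disps, b):
    """The same score written as beta . psi(H, z), equations (5) and (6)."""
    beta = [x for F in filters for x in F] + [x for d in defs for x in d] + [b]
    psi = ([x for f in feats for x in f]
           + [-x for dd in disps for x in phi_d(*dd)]
           + [1.0])
    return dot(beta, psi)
```

Because the score is linear in β once the placement z is fixed, the parameters can be learned with a linear (latent) SVM.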
7.
Matching
Find the best hypothesis
[Figure: matching results]
• Define an overall score for each root location
- Based on best placement of parts
$$\mathrm{score}(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}(p_0, \dots, p_n)$$
• High-scoring root locations define detections
- “sliding window approach”
• Efficient computation: dynamic programming + generalized distance transforms (max-convolution)
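The max-convolution step can be sketched for the pure quadratic case di = (0, 0, a, a), where the deformation cost is a times the squared distance. Under that assumption the cost is separable, so a 1-D lower-envelope distance transform can be run along rows and then along columns in linear time. This is a minimal sketch; the general cost with linear terms dx, dy needs a small extension that is not shown here.

```python
import math


def dt1d_max(r, a=1.0):
    """Return (D, arg) with D[p] = max_q r[q] - a*(p - q)**2 and arg[p] a maximizing q."""
    n = len(r)
    if n == 0:
        return [], []
    f = [-x for x in r]          # turn the max problem into a min problem
    v = [0] * n                  # parabola centers in the lower envelope
    z = [0.0] * (n + 1)          # boundaries between neighbouring parabolas
    k = 0
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        while True:
            s = ((f[q] + a * q * q) - (f[v[k]] + a * v[k] * v[k])) / (2 * a * (q - v[k]))
            if s > z[k]:
                break
            k -= 1               # parabola v[k] is hidden by parabola q
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    D, arg = [0.0] * n, [0] * n
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        D[p] = -(f[v[k]] + a * (p - v[k]) ** 2)
        arg[p] = v[k]
    return D, arg


def dt2d_max(R, a=1.0):
    """Separable 2-D transform: rows first, then columns of the row result."""
    if not R:
        return []
    h, w = len(R), len(R[0])
    rows = [dt1d_max(row, a)[0] for row in R]
    cols = [dt1d_max([rows[y][x] for y in range(h)], a)[0] for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]
```

Applied to a part filter's response map, D gives the best part score for every anchor position in one pass; adding the shifted transforms of all parts to the root response then yields score(p0) at every root location.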
8.
Latent SVM (MI-SVM)
• Classifiers that score an example x using
$$f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)$$
- β are model parameters
- z are latent values
• Training data D = (⟨x1, y1⟩, . . . , ⟨xn, yn⟩), yi ∈ {−1, 1}
• We would like to find β such that yi fβ(xi) > 0
• Minimize
$$L_D(\beta) = \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i f_\beta(x_i))$$
Semi-convexity
• Maximum of convex functions is convex
• fβ(x) = max over z ∈ Z(x) of β · Φ(x, z) is convex in β
• max(0, 1 − yi fβ(xi)) is convex for negative examples, since for yi = −1 it equals max(0, 1 + fβ(xi))
• L_D(β) is convex if the latent values for positive examples are fixed
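The objective and the semi-convexity argument can be made concrete with a small sketch. It assumes each example comes with an explicit, finite list of candidate feature vectors Φ(x, z), one per latent value z ∈ Z(x); the helper names are hypothetical. Fixing the best latent value for each positive example gives an objective that is convex in β, equals L_D at the current β, and bounds it from above elsewhere.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f_beta(beta, candidates):
    """f_beta(x) = max over z in Z(x) of beta . Phi(x, z)."""
    return max(dot(beta, phi) for phi in candidates)


def latent_svm_objective(beta, data, C):
    """L_D(beta) = 1/2 ||beta||^2 + C * sum of hinge losses.

    data: list of (candidates, y) with y in {-1, +1}.
    """
    reg = 0.5 * dot(beta, beta)
    hinge = sum(max(0.0, 1.0 - y * f_beta(beta, cands)) for cands, y in data)
    return reg + C * hinge


def fix_positive_latents(beta, data):
    """Keep only the highest-scoring latent value for each positive example."""
    fixed = []
    for cands, y in data:
        if y == 1:
            best = max(cands, key=lambda phi: dot(beta, phi))
            fixed.append(([best], y))
        else:
            fixed.append((cands, y))  # negatives stay a max over Z(x): still convex
    return fixed
```

Alternating between fixing the positive latent values and minimizing the resulting convex objective is one way to exploit this structure.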
9.
Object Detection with Discriminatively Trained Part Based Models
PAMI’09
10.
Modifications
• Optimization function
• Lower-dimensional but more informative features
• Bounding box prediction
• Contextual information
11.
HOG with PCA
The first 11 eigenvectors capture almost all information
[Figure (PAMI’09, Fig. 6): PCA of HOG features. Each eigenvector is displayed as a 4 by 9 matrix so that each row corresponds to one normalization factor and each column to one orientation bin. The eigenvalues are displayed on top of the eigenvectors. The linear subspace spanned by the top 11 eigenvectors captures essentially all of the information in a feature vector. Note how all of the top eigenvectors are either constant along each column or row of the matrix representation.]
Eigenvalues, in decreasing order:
0.45617 0.04390 0.02462 0.01339 0.00629 0.00556 0.00456 0.00391 0.00367
0.00353 0.00310 0.00063 0.00030 0.00020 0.00018 0.00018 0.00017 0.00014
0.00013 0.00011 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00007
0.00006 0.00005 0.00004 0.00004 0.00003 0.00003 0.00003 0.00002 0.00002
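The claim that the first 11 eigenvectors capture almost all information can be checked directly from the eigenvalues in the figure, since each eigenvalue is the variance along its eigenvector. This short sketch only transcribes those 36 numbers; the helper names are illustrative.

```python
# Eigenvalues transcribed from the figure (36 = 4 normalizations x 9 orientations).
EIGENVALUES = [
    0.45617, 0.04390, 0.02462, 0.01339, 0.00629, 0.00556, 0.00456, 0.00391, 0.00367,
    0.00353, 0.00310, 0.00063, 0.00030, 0.00020, 0.00018, 0.00018, 0.00017, 0.00014,
    0.00013, 0.00011, 0.00010, 0.00010, 0.00009, 0.00009, 0.00008, 0.00008, 0.00007,
    0.00006, 0.00005, 0.00004, 0.00004, 0.00003, 0.00003, 0.00003, 0.00002, 0.00002,
]


def explained_fraction(eigvals, k):
    """Fraction of total variance captured by the k largest eigenvalues."""
    vals = sorted(eigvals, reverse=True)
    return sum(vals[:k]) / sum(vals)


def smallest_k(eigvals, threshold):
    """Smallest k whose top-k eigenvalues reach the given variance fraction."""
    vals = sorted(eigvals, reverse=True)
    total = sum(vals)
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running / total >= threshold:
            return k
    return len(vals)
```

With these values the top 11 eigenvalues account for about 99.5% of the total, and 11 is the smallest number of components that reaches 99%, consistent with the caption.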
12.
Post-Processing
• Bounding box prediction: learning a regression model to figure out the bounding box coordinates
• Contextual information: re-scoring the detection windows with the scores of all categories
From Section 7.3, Contextual Information:
We have implemented a simple procedure to rescore detections using contextual information. Let (D1, . . . , Dk) be a set of detections obtained using k different models (for different object categories) in an image I. Each detection (B, s) ∈ Di is defined by a bounding box B = (x1, y1, x2, y2) and a score s. We define the context of I in terms of a k-dimensional vector c(I) = (σ(s1), . . . , σ(sk)), where si is the score of the highest scoring detection in Di, and σ(x) = 1/(1 + exp(−2x)) is a logistic function for renormalizing the scores.
To rescore a detection (B, s) in an image I we build a 25-dimensional feature vector with the original score of the detection, the top-left and bottom-right bounding box coordinates, and the image context,
$$g = (\sigma(s), x_1, y_1, x_2, y_2, c(I)). \qquad (30)$$
The coordinates x1, y1, x2, y2 ∈ [0, 1] are normalized by the width and height of the image. We use a category-specific classifier to score this vector to obtain a new score for the detection.
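Building the rescoring vector g can be sketched as follows. With k = 20 categories, as in PASCAL VOC, g has 1 + 4 + 20 = 25 entries. Assumptions not in the text: detections arrive as a list of (box, score) pairs per model with boxes in pixel coordinates, and a model with no detections contributes 0.0 to c(I). The category-specific classifier that scores g is not shown.

```python
import math


def sigma(x):
    """Logistic renormalization 1 / (1 + exp(-2x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * x))
    e = math.exp(2.0 * x)
    return e / (1.0 + e)


def context_vector(detections_per_model):
    """c(I): sigma of the best score from each of the k models.

    Assumption: a model with no detections in the image contributes 0.0.
    """
    return [sigma(max(s for _, s in dets)) if dets else 0.0
            for dets in detections_per_model]


def rescoring_features(box, score, context, width, height):
    """g = (sigma(s), x1, y1, x2, y2, c(I)) with coordinates normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    return [sigma(score), x1 / width, y1 / height, x2 / width, y2 / height] + list(context)
```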
13.
PASCAL VOC 2008
[Figure: precision/recall results on the person category, PASCAL VOC 2008]
14.
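Average Precision, person category
09 Base   09 BB   09 Cont   08
0.407     0.423   0.431     0.42
(09 = PAMI’09 system: Base, + bounding box prediction (BB), + contextual information (Cont); 08 = CVPR’08 system)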
15.
Work to Do
Cell model work modifications
Integrate other methods into cell model work
Another direction