Compound Structure Detection

AUTOMATIC DETECTION OF
COMPOUND STRUCTURES
FROM MULTIPLE HIERARCHICAL
SEGMENTATIONS
Hüseyin Gökhan Akçay
Department of Computer Engineering, Bilkent University, Bilkent, 06800, Ankara
akcay@cs.bilkent.edu.tr
21 Sept. 2016
H. G. Akçay Compound Structure Detection 21 Sept. 2016 1 / 78

Motivation
Large scale global content about the Earth.
Small local details (upto 30 cm resolution).
1.6 terabytes of data by the ESA’s multispectral
high-resolution imaging satellite.
The WorldView-2 satellite collects 975,000 square
kilometers of imagery per day.

Motivation
A challenging problem in remote sensing image mining is
the detection of heterogeneous compound structures such
as different types of residential, industrial, and agricultural
areas.
Compound structures are comprised of spatial
arrangements of simple primitive objects such as buildings,
trees and road segments.
Detection of compound structures is a challenging problem
because
They contain thousands of primitive objects.
They mostly do not have distinctive features.
Primitives can arrange in many different combinations in the
overhead view.

Motivation
Figure: 75 × 75m2 compound structures in WorldView-2 images.

Literature Review
Primitive object detection.
Residential-factory buildings, local roads, vehicles, airplanes
and boats.
Window-based approaches.
Bag-of-words representation.
Enforces artiﬁcial boundaries on the image.
Assumes the whole window corresponds to a compound
structure.
Segmentation-based approaches.
The grouping criteria do not involve spatial arrangements.
Graph-based approaches.
Speciﬁc arrangements such as alignment and parallelism.
Structural graph matching.

Problem Deﬁnition
We propose a generic method for the modeling and
detection of compound structures.
Target structures can involve arrangements of an unknown
number of different types of primitive objects.
The detection task is formulated as the selection of multiple
coherent subsets of candidate regions obtained from
multiple hierarchical segmentations.
To avoid over- or under-segmentation of candidate regions,
we search for the most meaningful regions at different
scales.
We propose a constrained region selection framework
which allows to specify global constraints on the selected
regions.

Overview
Learning
Inference
Example
structure
Feature
extraction
Spatial
arrangement
model
Maximum
likelihood
estimation
Probabilistic
region
process
Selected
regions
Region
selection
Candidate
regions
Hierarchical
region
extraction
Image
Gibbs
sampling
S-W sampling/
QP
V
H(V)
p(V|β)
I
1
2
5 6
9 10
3 4
7
11
8
12 13 14
G = (V, E)
1
2
5 6
9 10
3 4
7
11
8
12 13 14
V∗
Figure: Object/process diagram of the proposed approach.

Compound Structure Model
Primitive Representation
Compound structures are composed of spatial
arrangements of multiple, relatively homogeneous, and
compact primitive objects.
We assume that a compound structure V consists of R
layers of primitive object maps, V = r=1,...,R Vr
.
Figure: Primitive object layers.
Each primitive object vi is represented by an ellipse
vi = (li, si, θi).

Spatial Arrangement Model
For a given compound structure consisting of N primitive
objects, we construct a neighborhood graph G = (V, E).
V = {v1, . . . , vN} correspond to the individual primitive
objects,
E = r1,r2=1,...,R Er1r2 where Er1r2 denotes the edges
between the vertices at layers Vr1 and Vr2 .
Figure: Neighborhood graph construction for multiple primitive layers.

For each (vi, vj) ∈ E, we compute the following ﬁve features:
Distance between the
closest pixels, φ1(vi, vj),
Relative orientation,
φ2(vi, vj),
φ2
φ3
φ1 φ4
Angle between the line joining the centroids of the two
objects and the major axis of vi as the reference object,
φ3(vi, vj),
Distance between the closest antipodal pixels that lie on the
major axes, φ4(vi, vj),
Relative size, φ5(vi, vj).

We also compute the following four individual features for
each primitive object vi:
Area, φ6(vi),
Eccentricity, φ7(vi),
Solidity, φ8(vi),
Regularity, φ9(vi).

A one-dimensional marginal histogram Hr1r2
k (Er1r2 ) is
constructed for each pairwise feature φk , k = 1, . . . , 5,
computed over all edges for each pair of layers Vr1 and Vr2 .
Also, a one-dimensional marginal histogram Hr
k (Vr
) is
constructed for each individual feature φk , k = 6, . . . , 8,
computed over all vertices at each layer Vr
.
The concatenation H(V) of all marginal histograms is used
as a non-parametric approximation to the distribution of the
primitive objects in the compound structure.

Figure: Example histograms for the building layers of four different
types of compound structures.

Probabilistic Region Processes
Each primitive object vi (i.e., the ellipse parameters) is
considered a vector-valued random variable.
A compound structure is represented by a set of random
variables that leads to a region process.
The region process is governed by the Gibbs distribution
p(V|β) =
1
Zv
exp βT
H(V) (1)
where β is the parameter vector controlling each histogram
bin, and Zv is the partition function.
A region process is equivalent to a Markov random ﬁeld
(MRF).

Learning
Maximum Likelihood Estimation
Suppose that we observe a set of i.i.d. region processes
V = {V1, . . . , VM}.
We can estimate a compound structure model via
maximum likelihood estimation (MLE) of β by maximizing
(β|V) =
M
m=1
log p(Vm|β). (2)
The gradient of the log-likelihood is given by
d (β|V)
dβ
= Ep[H(V)] −
1
M
N
m=1
H(Vm). (3)
We use the stochastic gradient ascent algorithm where the
expectation Ep[H(V)] is approximated by a ﬁnite sum of
histograms of samples V(s), s = 1, . . . , S.

Learning
Maximum Likelihood Estimation
Figure: An example iteration for updating β corresponding to the
relative orientation histogram bins.

Learning
Sampling Region Processes
(a) (b) t = 0 (c) t = 50
(d) t = 200 (e) t = 600 (f) t = 1000
Figure: Illustration of the Gibbs sampler for two primitive layers.

Inference and Region Selection
Hierarchical Region Extraction
Given a compound structure model with learned parameter
vector β, we would like to automatically detect all of its
instances in an input image.
The detection problem is posed as the selection of multiple
subgroups of candidate regions coming from multiple
hierarchical segmentations.
Figure: Hierarchical segmentation trees for two primitive layers.
Each selected group of regions constitutes an instance of
the example compound structure in the large image.

The ﬁrst step is the identiﬁcation of candidate regions for
each layer Vr
by using a hierarchical segmentation
algorithm.
The next step is to connect the potentially related vertices
at all levels to represent the neighbor relationships.
Within-level edges (⊆ Er1r2 , r1 = r2): Voronoi tessellations.
Between-level edges (⊆ Er1r2 , r1 = r2): Ancestor-descendant
relations.
Between-layer edges (⊆ Er1r2 , r1 = r2): Proximity-based
neighbors.
Figure: Hierarchical segmentation trees for two primitive layers.

Figure: Graph construction for two primitive layers (i.e., building and
pool). The hierarchical candidate regions at three and two levels for
these layers are shown in red and light blue, respectively. The edges
that represent parent-child relationship for both layers are shown.

pool). The edges that represent the within- and between-level
neighbor relationship within the same layer are shown.

pool). The edges that represent the within- and between-level
neighbor relationship between the layers are shown. For better
visualization of edges, only 20 percent of all between-layer edges are
shown.

Inference and Region Selection Inference without Constraints
Bayesian Formulation
Given a graph G = (V, E), the problem can be formulated
as the selection of a subset V∗
among all regions V as
V∗
= arg max
V ⊆V
p(V |I) = arg max
V ⊆V
p(I|V )p(V ) (4)
where p(I|V ) is the observed spectral data likelihood for
the compound structure in the image, and p(V ) acts as the
spatial prior according to the learned appearance and
arrangement model.

CRF Formulation
We formulate the selection problem in (4) using a
conditional random ﬁeld (CRF).
Let X = {x1, . . . , xM} where xi ∈ {0, 1}, i = 1, . . . , M, be the
set of indicator variables associated with the vertices V of G
so that xi = 1 implies region vi being selected.
Our CRF formulation deﬁnes a posterior distribution as
p(X|I, V) ∝ p(I|X, V)p(X, V)
=
1
Zx
vi ∈V
exp ψc
i + ψs
i xi
(vi ,vj )∈E
exp ψa
ij xixj . (5)

CRF Formulation
The vertex bias terms ψc
and ψs
representing color and
shape, respectively, and edge weights ψa
representing
arrangement are deﬁned as
ψc
i =
−1
2
(yi − µr
)T
(Σr
)−1
(yi − µr ), ∀vi ∈ Vr
, r = 1, . . . , R
(6)
ψs
i =
9
k=6
βr
k,Ir
k
φk (vi )
, ∀vi ∈ Vr
, r = 1, . . . , R
(7)
ψa
ij =
5
k=1
βr1r2
k,I
r1r2
k
φk (vi ,vj )
, ∀(vi, vj) ∈ E, r1, r2 = 1, . . .
(8)

CRF Inference
Selecting V∗
in (4) is equivalent to estimating the joint MAP
labels given by
X∗
= arg max
X
p(X|I, V). (9)
Exact inference of the CRF formulation is intractable in
general graphs.
An approximate solution can be obtained by a Markov chain
Monte Carlo sampler.
We developed a sampling algorithm that samples the labels
of many variables at once.

CRF Inference
Figure: Illustration of the primitive sampling procedure.

Inference with Constraints
Our objective is to obtain the maximum probability
estimates of the indicator variables, xi, i = 1, . . . , N
satisfying convex inequality and equality constraints.
We reformulate the problem as quadratic programming
under convex constraints.
The problem in Equation (4) can be rewritten as
V∗
= arg max
V ⊆V
V ⊆Ω
p(V |I) = arg max
V ⊆V
V ⊆Ω
p(I|V )p(V ). (10)
where Ω ∈ RN
is a nonempty polyhedral convex set
determined by a set of constraints.

Quadratic Programming Formulation
For a V ⊆ V, log p(V |β∗
) can be written as follows
log p(V |β) =
5
k=1
R
r1=1
R
r2=1 (vi ,vj )∈Er1r2
βr1r2
k,I
r1r2
k
φk (vi ,vj )
xixj
+
9
k=6
R
r=1 vi ∈Vr
βr
k,Ir
k
φk (vi )
xi − log ZX .
(11)
where ZX is the partition function.

Quadratic Programming Formulation
Let W = 5
k=1
R
r1=1
R
r2=1 Wr1r2
k where each Wr1r2
k is an
N × N afﬁnity matrix.
Each element of this matrix is calculated as
Wr1r2
k (i, j) = −βr1r2
k,I
r1r2
k
φk (vi ,vj )
.
Also, let q = 9
k=6
R
r=1 qr
k where each qr
k is an N × 1
potential vector.
Each element of this vector is calculated as
qr
k (i) = βr
k,Ir
k
φk (vi )
.
The problem can be formulated as
minimize
x
− log p(V |β) =
1
2
XT
WX + qT
X + log ZX
subject to X ∈ Ω,
X ∈ {0, 1}.
(12)

DC Programming Inference
The problem can be formulated as
minimize
x
− log p(V |β) =
1
2
XT
WX + qT
X + log ZX
subject to X ∈ Ω,
X ∈ {0, 1}.
(13)
First, a linear programming relaxation is applied to the 0 − 1
integer program so that 0 ≤ x ≤ 1.
Since W is not assumed positive semideﬁnite, the resulting
linearly constrained quadratic problem is not convex.
The objective function can be reformulated as a difference
of two convex functions.

Difference of Convex Programming
Formulation
A Difference of Convex (DC) problem is deﬁned as
P = min f(X) = g(X) − h(X) : X ∈ RN
(14)
where g : RN
→ R and h : RN
→ R are convex functions.
Consider the dual program
D = min f∗
(Y) = h∗
(Y) − g∗
(Y) : Y ∈ RN
(15)
where g∗
is the conjugate function of g.
An iterative primal-dual algorithm constructs two alternating
sequences {X(t)
} and {Y(t)
} such that
g(X(t)) − h(X(t)) and g∗(Y(t)) − h∗(Y(t)) are decreasing,
converging to the optimal solutions, X∗ and Y∗, to the primal
and dual problems, respectively.

DC Programming Inference
Let W = QΛQT
be the eigenvalue decomposition of W, and
Λ+
= diag(Λ+
1 , . . . , Λ+
N ) (respectively, Λ−
= diag(Λ−
1 , . . . , Λ−
N ))
be the positive semideﬁnite diagonal matrix (respectively,
negative semideﬁnite diagonal matrix) of Λ.
We rewrite the nonconvex symmetric quadratic objective
function as
1
2
XT
WX + qT
X = g(X) − h(X)
g(X) =
1
2
XT
W+
X + qT
X + χΩ(X)
h(X) = −
1
2
XT
W−
X
(16)
where W+
= QΛ+
QT
, W−
= QΛ−
QT
, and χΩ(X) is the
space enclosed by the constraints X ∈ {0, 1} and Ω.

Experiments
We present detailed results of four different kinds of
experiments:
1 Using a single layer without imposing any constraint.
2 Using multiple layers without imposing any constraint.
3 Using a single layer by imposing constraints.
4 Using multiple layers by imposing constraints.

Experiments Unconstrained Single-layer Experiments
Results-Urban Structures
Figure: 2500 × 4000 pixels Ankara data set.

Table: Detection scenarios for the experiments. Example primitives
used for learning the compound structure model for each scenario are
shown in a different color. The number of polygons and buildings in
the validation data are also given.
Scenario 1 2 3 4 5 6
Example
primitives
# polygons 162 98 48 195 60 16
# buildings 1519 870 1117 1796 771 219

Figure: Candidate regions hierarchy.

Figure: Marginal probabilities for the ﬁrst scenario.

Figure: Marginal probabilities for the second scenario.

Figure: Marginal probabilities for the third scenario.

Figure: Marginal probabilities for the fourth scenario.

Figure: Marginal probabilities for the ﬁfth scenario.

*
Figure: Marginal probabilities for the sixth scenario.

Figure: Zoomed detection examples.

Figure: Zoomed detection examples. Each row corresponds to a
particular scenario.

Results-Orchards
Figure: 3000 × 8000 pixels Kusadasi data set.

Results-Orchards
Figure: (Up) Candidate regions. (Down) Selected regions.

Results-Orchards
Figure: Example results for the detection of orchards in the subimage
on the left column. The right column shows the corresponding
marginal probabilities of the selected regions (the copper colormap) as
well as the discarded input candidate regions (white).

Results-Orchards
Figure: Example results for the detection of orchards. The left column
shows the marginal probabilities at the end of selection. The right
column shows the thresholded detections overlayed as red.

Results-Refugee Camps
Figure: 1102 × 971 pixels Darfur data set.

Results-Refugee Camps
Figure: Example results for the detection of refugee camps as rural
structures.

Experiments Unconstrained Multi-layer Experiments
Results-Housing Estates
Figure: 3000 × 8000 pixels Kusadasi data set.

Figure: (Up) Examples of local details of red building rooftops. (Down)
An example hierarchy.

The selection algorithm that used only the building layer
could not detect several housing estates.
The idea was to add a pool layer that can provide additional
cues for ﬁnding the missed buildings.
The initial model was extended by learning the
arrangements of buildings with respect to pools as well.

Figure: (Up) Candidate regions. (Down) Selected regions from the
building and pool layers.H. G. Akc¸ay Compound Structure Detection 21 Sept. 2016 58 / 78

Table: The number of candidate and detected regions for single and
multi-layer selection scenarios.
Single-layer Multi-layer
Candidates Detected Candidates Detected
Building 67,983 11,173 67,983 11,871
Pool - - 16,276 436
Total 67,983 11,173 84,259 12,307

Figure: Samples obtained by the selection procedure ran on single
and multiple layers.

(a) (b)
Figure: The selected regions using (a) only the building layer. (b)
building and pool layers. Newly detected housing estates that was
missed with single layer selection is enclosed by a red convex hull.

Figure: Ground view of a missed housing estate with single layer
selection.

Experiments
Constrained Single-layer Experiments

Experiments
Problem Deﬁnition
Unconstrained selection involved overlapping regions at
different levels of the hierarchy.
To overcome this problem, we require at most one region
should be selected per path where a path corresponds to
the set of vertices from a leaf to the root.
Formally, we select an optimal subset V∗
⊆ V such that
∀a, b ∈ V∗
, a ∈ descendant(b) and b ∈ descendant(a).

Experiments
Constrained Single-layer Experiments
Let A be a |P| × |V| matrix.
P denotes all the paths from the leaves to the roots of the
input hierarchical forest.
A(i, j) = 1 implies vi ∈ pj ∈ P.
The problem can be reformulated as
minimize
x
1
2
XT
WX + qT
X + log ZX
subject to AX ≤ 1,
0 ≤ X ≤ 1.
(17)
The resulting problem is solved by the DC inference
algorithm.

Experiments
Table: The number of selected regions for unconstrained and
constrained selection scenarios.
# cand.s 70,644 70,644 70,644 70,644 70,644 22,195
Uncnstr. 3191 1828 3819 3201 2027 1612
Cnstr. 1485 856 2562 1740 811 263

Experiments
(a) (b)
Figure: Zoomed detection examples. (a) shows the RGB image for a
300 × 300 sub-scene. (b) shows the hierarchy of candidate regions
(two-level hierarchy from bottom to top).

Experiments
(a)
(b)
Figure: Zoomed detection examples. (a) shows the RGB image for a
300 × 300 sub-scene. (b) shows the hierarchy of candidate regions
(six-level hierarchy from bottom to top).

Experiments
Constrained Multi-layer Experiments
The last set of experiments uses two primitive layers and
enforces geometrical constraints between them.
We search for nearby alike buildings and green areas where
each building group must have a green area in the middle.
We strictly require that the distance between the centroid of
the centroids of a selected group of similar buildings and
the centroid of a selected large green area cannot exceed a
distance threshold δ .

Experiments
Constrained Multi-layer Experiments
Let V1
and V2
represent the building and green area layers.
The desired set of regions can be selected by
minimize
x
1
2
XT
WX + qT
X + log ZX
subject to AX ≤ 1,
1
k1
vi ∈V1
sh
i xi −
1
k2
vj ∈V2
sh
j xj ≤ δ
1
k1
vi ∈V1
sw
i xi −
1
k2
vj ∈V2
sw
j xj ≤ δ
vi ∈V1
xi = k1
vj ∈V2
xj = k2
0 ≤ X ≤ 1.
(18)
where (sh
i , sw
i ) is the centroid of region vi, k1 and k2 denote
the number of buildings and green areas to be selected.

Experiments Constrained Multi-layer Experiments
Results-Buildings & Green Areas
Figure: 2500 × 4000 pixels Ankara data set.

Figure: Selected regions for the green areas surrounded by buildings.

(a) RGB (b) Building candidates
(c) Green candi-
dates
(d) Selection (e) Overlay
Figure: A zoomed detection example for k1 = 4, k2 = 1.

(a) RGB (b) Building candidates
(c) Green candi-
dates
(d) Selection (e) Overlay
Figure: A zoomed detection example for k1 = 6, k2 = 1.

(a) k1 = 4, fVal = −2.95 (b) k1 = 8, fVal = −3.23
(c) k1 = 6, fVal = −3.16 (d) k1 = 4, fVal = −2.97
Figure: Zoomed detection examples for different values of k1 = 4, 6, 8.

Experiments
Summary & Conclusions
We described a generic method for the modeling and
detection of compound structures that consisted of
arrangements of mostly unknown number of primitives.
The modeling process built an MRF-based contextual
model for the compound structure of interest.
The detection task involved a combinatorial selection
problem where multiple subsets of candidate regions from
multiple hierarchical segmentations were selected.
We also handled hard constraints on the candidate regions.

Experiments
Summary & Conclusions
Experiments using urban, industrial, agricultural and rural
structures showed that the proposed method can provide
good localization of instances of compound structures.
The multi-layered experiments showed that selection of
some objects required the selection of objects in other
layers that had spatial relation with them.
One of the most important bottlenecks in terms of accuracy
was the errors in the input hierarchical segmentations.
Future work includes
Using the detection results for adjusting wrong segmentation
results.
Inferring the primitive objects inside a compound structure.

Compound Structure Detection

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Compound Structure Detection

Similar to Compound Structure Detection (20)

Recently uploaded

Recently uploaded (20)

Compound Structure Detection