Receding Horizon Stochastic Control Algorithms for Sensor Management ACC 2010
Darin Hitchings and David A. Castañón
Abstract— The increasing use of smart sensors that can
dynamically adapt their observations has created a need for
algorithms to control the information acquisition process. While
such problems can usually be formulated as stochastic control
problems, the resulting optimization problems are complex and
difficult to solve in real-time applications. In this paper, we
consider sensor management problems for sensors that are
trying to find and classify objects. We propose alternative
approaches for sensor management based on receding horizon
control using a stochastic control approximation to the sensor
management problem. This approximation can be solved using
combinations of linear programming and stochastic control
techniques for partially observed Markov decision problems
in a hierarchical manner. We explore the performance of our
proposed receding horizon algorithms in simulations using
heterogeneous sensors, and show that their performance is
close to that of a theoretical lower bound. Our results also
suggest that a modest horizon is sufficient to achieve near-
optimal performance.
I. INTRODUCTION
Recent advances in embedded computing have introduced
a new generation of sensors that have the capability of
adapting their sensing dynamically in response to collected
information. For instance, unmanned aerial vehicles (UAVs)
have multiple sensors, such as radar and electro-optical
cameras, which can dynamically change their fields of view and
measurement modes. These advances have created a need
for a commensurate theory of sensor management (SM)
and control to ensure that relevant information is collected
for the mission of the sensor system given the available
sensor resources. There are numerous applications involving
surveillance, diagnosis and fault identification that require
such control.
One of the earliest examples of SM arose in the context
of Search, with applications to anti-submarine warfare [1].
Sensors had the ability to move spatially and allocate their
search effort over time and space. Most of the early work
on search theory focused on open-loop search plans rather
than feedback control of search trajectories [2]. Extensions
of search theory to problems requiring adaptive feedback
strategies have been developed in some restricted contexts
[3].
Adaptive SM has its roots in the field of statistics, where
Bayesian experiment design was used to configure subse-
quent experiments based on observed information. Wald [4],
[5] considered sequential hypothesis testing with costly ob-
servations. Lindley [6] and Kiefer [7] expanded the concepts
to include variations in potential measurements. Chernoff [8]
and Fedorov [9] used Cramér-Rao bounds for selecting
sequences of measurements for nonlinear regression problems.

(This work was supported by a grant from AFOSR. The authors are
with the Dept. of Electrical & Computer Eng., Boston University:
dhitchin@bu.edu, dac@bu.edu.)
Most of the strategies proposed for Bayesian experiment
design involve single-step optimization criteria, resulting
in greedy or myopic strategies that optimize bounds on
the expected performance after the next experiment. Other
approaches to adaptive SM using single-stage optimization
have been proposed using alternative information theoretic
measures [10], [11].
Feedback control approaches to SM that consider opti-
mization over time have also been explored. Athans [12]
considered a two-point boundary value approach to control-
ling the error covariance in linear estimators by choosing
the measurement matrices. Multi-armed bandit formulations
have been used to control individual sensors in applications
related to target tracking [13], [14]. Such approaches are
restricted to single-sensor control, selecting among individual
subproblems to measure, in order to obtain solutions using
Gittins indices [15], [16]. Approximate dynamic program-
ming (DP) techniques have also been proposed using ap-
proximations to the optimal cost-to-go based on information
theoretic measures evaluated using Monte Carlo techniques
[17], [18]. A good overview of these techniques is available
in [19].
The above approaches for dynamic feedback control are
limited in application to problems with a small number
of sensor-action choices and simple constraints because the
algorithms must enumerate and evaluate the various control
actions. In [20], combinatorial optimization techniques are
integrated into a DP formulation to obtain approximate
stochastic dynamic programming (SDP) algorithms that ex-
tend to large numbers of sensor actions. Subsequent work
in [21] derived an SDP formulation using partially ob-
served Markov decision processes (POMDPs) and obtained
a computable lower bound to the achievable performance of
feedback strategies for complex multi-sensor management
problems. The lower bound was obtained by a convex
relaxation of the original combinatorial POMDP using mixed
strategies and averaged constraints. However, the results in
[21] do not specify algorithms with performance close to the
lower bound.
In this paper, we develop and implement algorithms for the
efficient computation of adaptive SM strategies for complex
problems involving multiple sensors with different observa-
tion modes and large numbers of objects. The algorithms
are based on using the lower bound formulation from [21]
as an objective in a receding horizon (RH) optimization
problem and developing techniques for obtaining feasible
decisions from the mixed strategy solutions. The resulting
algorithms are scalable to large numbers of tasks, and
suitable for real-time SM. We also extend the model of [21]
to incorporate search actions in addition to classification.
We evaluate alternative approaches for obtaining feasible
decision strategies, and evaluate the resulting performance of
the RH algorithms using multi-sensor simulations. Our sim-
ulation results demonstrate that our RH algorithms achieve
performance comparable to the predicted lower bound of [21]
and shed insight into the relative value of different strategies
for partitioning sensor resources either geographically or by
sensor specialization.
The rest of this paper is organized as follows: Section II
describes the formulation of the stochastic SM problem.
Section III provides an example of the column generation
technique for generating mixed strategies for SM. Section IV
discusses how we create feasible, sequenced sensor schedules
from these mixed strategies. Section V documents our sim-
ulation results for various scenarios. Section VI summarizes
our results and discusses areas for future work.
II. PROBLEM FORMULATION AND BACKGROUND
The problem formulation is an extension of the POMDP
formulation presented in [21]. Assume that there are a finite
number of locations 1, . . . , N, each of which may have an
object with a given type, or which may be empty. Assume
that there is a set of S sensors, each of which has multiple
sensor modes, and that each sensor can observe one and only
one location at each time with a selected mode.
Let xi ∈ {0, 1, . . . , D} denote the state of location i,
where xi = 0 if location i is unoccupied, and otherwise
xi = k > 0 indicates location i has an object of type
k. Let πi(0) be a discrete probability distribution
over the D + 1 possible states of the ith location,
for i = 1, . . . , N, where D ≥ 2.
Assume additionally that the random variables
xi, i = 1, . . . , N are mutually independent.
There are s = 1, . . . , S sensors, each of which has m =
1, . . . , Ms possible modes of observation. We assume there
is a series of T discrete decision stages where sensors can
select which location to measure, where T is large enough
so that all of the sensors can use their available resources.
At each stage, each sensor can choose to employ one and
only one of its modes on a single location to collect a noisy
measurement concerning the state xi at that location. Each
sensor s has a limited set of locations that it can observe,
denoted by Os ⊆ {1, . . . , N}. A sensor action by sensor s
at stage t is a pair:
us(t) = (is(t), ms(t)) (1)
consisting of a location to observe, is ∈ Os, and a mode for
that observation, ms.
Sensor measurements are modeled as belonging to a finite
set y ∈ {1, . . . , Ls}. The likelihood of the measured value
is assumed to depend on the sensor s, sensor mode m,
location i and on the true state at the location xi but not
on the states of other locations. Denote this likelihood as
P(y|xi, i, s, m). We assume that this likelihood is time-
invariant, and that the random measurements yi,s,m(t) are
conditionally independent of other measurements yj,σ,n(τ)
given the location states xi, xj, for all sensor modes m, n,
provided i ≠ j or τ ≠ t.
Each sensor s has a limited quantity Rs of resources avail-
able for measurements. Associated with the use of mode m
by sensor s on location i is a resource cost rs(us(t)) to use
this mode, representing power or some other type of resource
required to use this mode from this sensor.
Σ_{t=0}^{T−1} rs(us(t)) ≤ Rs   ∀ s ∈ [1, . . . , S]   (2)
This is a hard constraint for each realization of observations
and decisions.
Let I(t) denote the sequence of past sensing actions and
measurement outcomes up to and including stage t − 1:
I(t) = {(us(k), ys(k)), s = 1, . . . , S; k = 0, . . . , t − 1}
Under the assumption of conditional independence of mea-
surements and independence of individual states at each lo-
cation, the conditional probability of (x1, . . . , xN ) given I(t)
can be factored as a product of belief states at each location.
Denote the belief state at location i as πi(t) = p(xi|I(t)).
When a sensor measurement is taken, the belief state is up-
dated according to Bayes’ Rule. A measurement of location i
with the sensor-mode combination us(t) = (i, m) at stage t
that generates observable y(t) updates the belief vector as:
πi(t + 1) = diag{P(y(t)|xi = j, i, s, m)} πi(t) / ( 1ᵀ diag{P(y(t)|xi = j, i, s, m)} πi(t) )   (3)
where 1 is the D + 1 dimensional vector of all ones. Eq. (3)
captures the relevant information dynamics that SM controls.
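The update in Eq. (3) is a standard normalized Bayes' rule over the D + 1 location states. A minimal sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def belief_update(pi, likelihood_col):
    """Bayes' Rule update of Eq. (3): pi is the prior belief over the
    D+1 location states, and likelihood_col[j] = P(y(t) | xi = j, i, s, m)
    for the observed measurement value y(t)."""
    unnormalized = likelihood_col * pi        # diag{P(y|xi=j, i, s, m)} pi(t)
    return unnormalized / unnormalized.sum()  # divide by 1^T diag{...} pi(t)

# Example: binary state, uniform prior, and a sensor that reports the
# correct state with probability 0.9.
post = belief_update(np.array([0.5, 0.5]), np.array([0.9, 0.1]))
# post -> [0.9, 0.1]
```

Note that the denominator is exactly the predictive probability of the observed value, so the posterior always sums to one.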
In addition to information dynamics, there are resource
dynamics that characterize the available resources at stage t.
The dynamics for sensor s are given as:
Rs(t + 1) = Rs(t) − rs(us(t)); Rs(0) = Rs (4)
These dynamics constrain the admissible decisions by a
sensor, in that it can only use modes that do not use more
resources than are available.
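The resource dynamics of Eq. (4) and the admissibility check are simple bookkeeping; a small sketch under the same notation (the function name is illustrative):

```python
def step_resources(R_s, cost):
    """Resource dynamics of Eq. (4): Rs(t+1) = Rs(t) - rs(us(t)).
    A mode is admissible only if its resource cost does not exceed
    the sensor's remaining resources."""
    if cost > R_s:
        raise ValueError("action infeasible: insufficient resources")
    return R_s - cost

R_next = step_resources(100.0, 2.0)  # Rs(t+1) = 98.0
```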
Given the final information I(T), the quality of the
information collected is measured by making an estimate of
the state of each location i given the available information.
Denote these estimates as vi, i = 1, . . . , N. The Bayes' cost
of selecting estimate vi when the true state is xi is denoted
c(xi, vi) ∈ ℝ, with c(xi, vi) ≥ 0. The objective of the SM
stochastic control formulation is to minimize:
J = Σ_{i=1}^{N} E[c(xi, vi)]   (5)
by selecting adaptive sensor control policies and final esti-
mates subject to the dynamics of Eq. (3) and the constraints
of Eq. (4) and Eq. (2).
The results in [21] provide an SDP algorithm to solve
the above problem, with cost-to-go at stage t depending
on the joint belief state π(t) = [π1(t), . . . , πN (t)] and the
residual resource state R(t) = [R1(t), . . . , RS(t)]. Because
of this dependency, the cost-to-go does not decouple over
locations. This leads to a very large POMDP problem with
combinatorially many actions and an underlying belief state
of dimension (D + 1)^N, which is computationally intractable
unless there are very few locations.
In [21], the above stochastic control problem was replaced
with a simpler problem that provided a lower bound on the
optimal cost, by expanding the set of admissible strategies,
replacing the constraints of Eq. (2) by the “soft” constraints:
E[ Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs   ∀ s ∈ [1, . . . , S]   (6)
To solve the simpler problem, [21] proposed incorporation of
the soft constraints in Eq. (6) into the objective function using
Lagrange multipliers λs for each sensor s. The augmented
objective function is:
J̄_λ = J + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (7)
A key result in [21] was that when the optimization of Eq. (7)
was done over mixed strategies for given values of Lagrange
multipliers λs, the stochastic control problem decoupled into
independent POMDPs for each location, and the optimiza-
tion could be performed using feedback strategies for each
location i that depended only on the information collected
for that location, Ii(t). These POMDPs have an underlying
information state-space of dimension D + 1, corresponding
to the number of possible states at a single location, and
can be solved efficiently. Because the measurements and
possible sensor actions are finite-valued, the set of possible
SM strategies Γ is also finite. Let Q(Γ) denote the set of
mixed strategies that assign probability q(γ) to the choice of
strategy γ ∈ Γ. The problem of finding the optimal mixed
strategies can be written as:
min_{q∈Q(Γ)} Σ_{γ∈Γ} q(γ) E^γ[J(γ)]   (8)

subject to

Σ_{γ∈Γ} q(γ) E^γ[ Σ_{i=1}^{N} Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs,   s ∈ [1, . . . , S]   (9)

Σ_{γ∈Γ} q(γ) = 1   (10)
where we have one constraint for each of the S sensor re-
source pools and an additional simplex constraint in Eq. (10)
which ensures that q ∈ Q(Γ) forms a valid probability
distribution. This is a large linear program (LP), whose
variables correspond to the strategies in Γ. However,
the total number of constraints is S+1, which establishes that
optimal solutions of this LP are mixtures of no more than
S + 1 strategies. Thus, one can use a column generation
approach [22], [23], [24] to quickly identify an optimal
mixed strategy. In this approach, one solves Eq. (8) and
Eq. (9) restricting the mixed strategies to be mixtures of
a small subset Γ′ ⊂ Γ. The solution of the restricted LP
has optimal dual prices λs, s = 1, . . . , S. Using these prices,
one can determine a corresponding optimal pure strategy by
minimizing:
Jλ = Σ_{i=1}^{N} E[c(xi, vi)] + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (11)
which the results in [21] show can be decoupled into
N independent optimization problems, one for each location.
Each of these problems is solved as a POMDP using standard
algorithms such as point-based value iteration (PBVI) [25]
to determine the best pure strategy γ1 for these prices. If the
best pure strategy γ1 is already in the set Γ′, then the solution
of Eq. (8) and Eq. (9) restricted to Q(Γ′) is an optimal mixed
strategy over all of Q(Γ). Otherwise, the strategy γ1 is added
to the admissible set Γ′, and the iteration is repeated. The
result is a set of mixed strategies that achieve a performance
level that is a lower bound on the original SM optimization
problem with hard constraints.
III. COLUMN GENERATION AND POMDP SUBPROBLEM
EXAMPLE
We present an example to illustrate the column generation
algorithm and POMDP algorithms discussed previously. In
this simple example we consider 100 objects (N=100), 2 pos-
sible object types (D=2) with X = {non-military vehicle,
military vehicle}, and 2 sensors that each have one mode
(S = 2 and Ms = 1 ∀ s ∈ {1, 2}). Sensor s actions have
resource costs: rs, where r1 = 1, r2 = 2. Sensors return
2 possible observation values, corresponding to binary object
classifications, with likelihoods:
P(yi,1,1(t)|xi, u1(t)):      P(yi,2,1(t)|xi, u2(t)):
    0.90  0.10                   0.92  0.08
    0.10  0.90                   0.08  0.92
where the (j, k) matrix entry denotes the likelihood that y =
j if xi = k. The second sensor has 2% better performance
than the first sensor but requires twice as many resources to
use. Each sensor has Rs = 100 units of resources, and can
view each location. Each of the 100 locations has a uniform
prior of πi = [0.5 0.5]ᵀ ∀ i. For the performance objective,
we use c(xi, vi) = 1 if vi ≠ xi and 0 otherwise, i.e., the
cost is 1 unit for a classification error.
Table I demonstrates the column generation solution pro-
cess. The first three columns are initialized by guessing val-
ues of resource prices and obtaining the POMDP solutions,
yielding expected costs and expected resource use for each
sensor at those resource prices. A small LP is solved to obtain
the optimal mixture of the first three strategies γ1, . . . , γ3,
and a corresponding set of dual prices. These dual prices are
used in the POMDP solver to generate the fourth column
γ4, which yields a strategy that is different from that of the
first 3 columns. The LP is re-solved for mixtures of the first
4 strategies, yielding new resource prices that are used to
generate the next column. This process continues until the
solution using the prices after 7 columns yields a strategy that
was already represented in a previous column, terminating
                 γ1      γ2     γ3     γ4     γ5     γ6     γ7
min              50.0    2.80   2.44   1.818  8      10     6.22
R1               0       218    200    0      0      100    150    ≤ 100
R2               0       0      36     800    200    0      18     ≤ 100
Simplex          1       1      1      1      1      1      1      = 1
Optimal cost     -       -      26.22  21.28  7.35   5.95   5.95
Mixture weights  0       0.424  0      0      0.500  0.076  0
λc_1             1.0e15  0.024  0.010  0.238  0.227  0.217  0.061
λc_2             1.0e15  0.025  0.015  0      0.060  0.210  0.041

TABLE I: Column generation example with 100 objects. The
tableau is displayed in its final form after convergence. λc_s describe
the lambda trajectories up until convergence. R1 and R2 are
resource constraints. γ1 is a 'do-nothing' strategy. Bold numbers
represent useful solution data.
Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and
6 of Table I. The frequency of choosing each of these 3 strategies
is controlled by the relative proportion of the mixture weight qc ∈
(0, 1), with c ∈ {2, 5, 6}.
the algorithm. The optimal mixture combines the strategies
of the second, fifth and sixth columns. When the master prob-
lem converges, the optimal cost J∗ for the mixed strategy
is 5.95 units. The resulting policy graphs are illustrated in
Fig. 1, where branches up indicate measurements y = 1
(‘non-military’) and down y = 2 (‘military’). The red and
green nodes denote the final decision, vi, for a location.
Note that the strategy of column 5 uses only the second
sensor, whereas the strategies of columns 2 and 6 use only
the first sensor. The mixed strategy allows the soft resource
constraints to be satisfied with equality. Table I also shows
the resource costs and expected classification performance
of each column.
The example illustrates some of the issues associated with
the use of soft constraints in the optimization: the resulting
solution does not lead to SM strategies that will always
satisfy the hard constraints Eq. (2). We address this issue
in the subsequent section.
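The final tableau of Table I can be checked by solving the master LP of Eqs. (8)-(10) directly over the seven generated columns. A sketch using SciPy's `linprog` (the column data below is transcribed from Table I; variable names are ours):

```python
import numpy as np
from scipy.optimize import linprog

# Per-column expected cost and expected resource use from Table I.
cost = [50.0, 2.80, 2.44, 1.818, 8.0, 10.0, 6.22]
R1   = [0, 218, 200, 0, 0, 100, 150]   # expected sensor-1 resource use
R2   = [0, 0, 36, 800, 200, 0, 18]     # expected sensor-2 resource use

res = linprog(c=cost,
              A_ub=[R1, R2], b_ub=[100, 100],   # Eq. (9): E[use] <= Rs
              A_eq=[np.ones(7)], b_eq=[1.0],    # Eq. (10): simplex
              bounds=(0, None), method="highs")

# res.fun reproduces the optimal mixed-strategy cost J* ~ 5.95;
# Table I reports the optimal mixture on columns 2, 5 and 6.
```

Because only S + 1 = 3 constraints are active, the LP vertex mixes at most three pure strategies, consistent with the mixture reported in the tableau.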
IV. RECEDING HORIZON CONTROL
The column generation algorithm described previously
solves the approximate SM problem with “soft” constraints
in terms of mixed strategies that, on average, satisfy the
resource constraints. However, for control purposes, one
must select actual SM actions that satisfy the hard constraints
Eq. (2). Another issue is that the solutions of the decoupled
POMDPs provide individual sensor schedules for each lo-
cation that must be interleaved into a single coherent sensor
schedule. Furthermore, exact solution of the small decoupled
POMDPs for each set of prices can be time consuming,
making the resulting algorithm unsuited for real-time SM.
To address this, we will explore a set of RH algorithms
that will convert the mixed strategy solutions discussed in the
previous section to actions that satisfy the hard constraints,
and limit the computational complexity of the resulting
algorithm. The RH algorithms have adjustable parameters
whose effects we will explore in simulation.
The RH algorithms start at stage t with an information
state/resource state pair, consisting of available information
about each location i = 1, . . . , N represented by the condi-
tional probability vector πi(t) and available sensor resources
Rs(t), s = 1, . . . , S. The first step in the algorithms is
to solve the SM problem of Eq. (5) starting at stage t to
final stage T subject to soft constraints Eq. (6), using the
hierarchical column generation / POMDP algorithms to get
a set of mixed strategies. We introduce a parameter corre-
sponding to the maximum number of sensing actions per
location to control the resulting computational complexity
of the POMDP algorithms.
The second step is to select sensing actions to implement
at the current stage t from the mixed strategies. These
strategies are mixtures of at most S + 1 pure strategies,
with associated probabilistic weights. We explore three ap-
proaches for selecting sensing actions:
• str1: Select the pure strategy with maximum probability.
• str2: Randomly select a pure strategy per location
according to the optimal mixture probabilities.
• str3: Select the pure strategy with positive probability
that minimizes the expected sensor resource use (and
thus leaves resources for use in future stages.)
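The three selection rules above can be sketched as follows (the function, the method labels, and the scalar per-strategy resource figures, taken loosely from the Table I mixture, are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_pure_strategy(strategies, weights, expected_resource_use, method):
    """Pick one pure strategy from a mixed strategy. 'strategies',
    'weights' and 'expected_resource_use' are parallel lists; the
    weights come from the master LP and sum to one."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # guard against rounding
    if method == "str1":                         # most probable strategy
        idx = int(np.argmax(weights))
    elif method == "str2":                       # sample per mixture weights
        idx = int(rng.choice(len(strategies), p=weights))
    elif method == "str3":                       # cheapest strategy with q > 0
        support = np.flatnonzero(weights > 0)
        use = np.asarray(expected_resource_use, dtype=float)
        idx = int(support[np.argmin(use[support])])
    else:
        raise ValueError(method)
    return strategies[idx]

# Mixture resembling Table I: strategies 2, 5, 6 with weights
# 0.424, 0.500, 0.076 and expected resource use 218, 200, 100.
chosen = select_pure_strategy(["g2", "g5", "g6"],
                              [0.424, 0.500, 0.076],
                              [218, 200, 100], "str3")   # -> "g6"
```

Under str3 the cheapest supported strategy is chosen, which leaves the most resources for replanning at later stages.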
Once pure strategies for each location have been selected,
the third step is to select a sensing action to be implemented
for each location. Our approach is to select the first sensing
action of the pure strategy for each location. Note that
there may not be enough sensor resources to execute the
selected actions, particularly in the case where the pure
strategy with maximum probability is selected. To address
this, we rank sensing actions by their expected entropy
gain [26], which is the expected reduction in entropy of
the conditional probability distribution, πi(t), based on the
anticipated measurement value. We schedule sensor actions
in order of decreasing expected entropy gain, and perform
those actions at stage t that have enough sensor resources to
be feasible.
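The expected entropy gain used for this ranking can be computed from the current belief πi(t) and the measurement likelihoods of the candidate sensor-mode action; a sketch assuming the finite observation alphabet of Section II (function names are ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_entropy_gain(pi, likelihood):
    """Expected reduction in entropy of belief pi from one measurement,
    where likelihood[y, j] = P(y | xi = j) for the candidate action."""
    gain = entropy(pi)
    for y in range(likelihood.shape[0]):
        p_y = float(likelihood[y] @ pi)           # predictive prob. of y
        if p_y > 0:
            posterior = likelihood[y] * pi / p_y  # Bayes update for outcome y
            gain -= p_y * entropy(posterior)
    return gain

# A 0.9-accurate binary sensor on a uniform binary prior:
L = np.array([[0.9, 0.1], [0.1, 0.9]])
g = expected_entropy_gain(np.array([0.5, 0.5]), L)   # ~ 0.531 bits
```

Actions are then scheduled in order of decreasing gain until the remaining sensor resources are exhausted.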
The measurements collected from the scheduled actions
are used to update the information states πi(t + 1) using
Eq. (3). The resources used by the actions are eliminated
from the available resources to compute Rs(t + 1) using
Eq. (4). The RH algorithm is then executed from the new
information state/resource state condition.
           Search                Low-res               Hi-res
           y1    y2    y3        y1    y2    y3        y1    y2    y3
empty      0.92  0.04  0.04      0.95  0.03  0.02      0.95  0.03  0.02
car        0.08  0.46  0.46      0.05  0.85  0.10      0.02  0.95  0.03
truck      0.08  0.46  0.46      0.05  0.10  0.85      0.02  0.90  0.08
military   0.08  0.46  0.46      0.05  0.10  0.85      0.02  0.03  0.95

TABLE II: Observation likelihoods for different sensor
modes with the observation symbols y1, y2 and y3.
Fig. 2: Illustration of scenario with two partially-overlapping
sensors.
V. SIMULATION RESULTS
In order to evaluate the relative performance of the differ-
ent RH algorithms, we performed a set of experiments with
simulations. In these experiments, there were 100 locations,
each of which could be empty, or have objects of three types,
so the possible states of location i were xi ∈ {0, 1, 2, 3}
where type 1 represents cars, type 2 trucks, and type 3
military vehicles. Sensors can have several modes: a search
mode, a low resolution mode and a high resolution mode.
The search mode primarily detects the presence of objects;
the low resolution mode can identify cars, but confuses
the other two types, whereas the high resolution mode can
separate the three types. Observations are modeled as having
three possible values. The search mode consumes 0.25 units
of resources, whereas the low-resolution mode consumes
1 unit and the high resolution mode 5 units, uniformly
for each sensor and location. Table II shows the likelihood
functions that were used in the simulations.
Initially, each location has one of two prior probability
distributions: πi(0) = [0.10 0.60 0.20 0.10]ᵀ for
i ∈ [1, . . . , 10], or πi(0) = [0.80 0.12 0.06 0.02]ᵀ for i ∈
[11, . . . , 100]. Thus, the first 10 locations are likely to contain
objects, whereas the other 90 locations are likely to be empty.
When multiple sensors are present, they may share some
locations in common, and have locations that can only be
seen by a specific sensor, as illustrated in Fig. 2.
The cost function used in the experiments, c(xi, vi), is
shown in Table III. The parameter MD represents the cost
of a missed detection, and will be varied in the experiments.
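Given a final belief πi(T), the estimate vi minimizing the expected Bayes' cost under Table III is computed directly; a small sketch with MD = 10 and the 'likely empty' prior used in the experiments (function name is ours):

```python
import numpy as np

# Decision costs c(xi, vi) of Table III with missed-detection cost MD = 10.
MD = 10.0
C = np.array([[0, 1, 1, 1],      # true state: empty
              [1, 0, 0, 1],      # true state: car
              [1, 0, 0, 1],      # true state: truck
              [MD, MD, MD, 0]])  # true state: military

def bayes_estimate(pi):
    """Estimate v minimizing the expected cost sum_x pi(x) c(x, v)."""
    expected = C.T @ pi           # expected cost of each candidate estimate
    return int(np.argmin(expected)), float(expected.min())

v, exp_cost = bayes_estimate(np.array([0.80, 0.12, 0.06, 0.02]))
# v -> 0 ('empty'), expected cost 0.38
```

Even with MD = 10, the strongly 'empty' prior still makes 'empty' the minimum-cost declaration here; with a flatter belief the MD term would push the decision toward 'military'.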
Table IV shows simulation results for a search and clas-
sify scenario involving 2 identical sensors (with the same
xi \ vi    empty  car  truck  military
empty      0      1    1      1
car        1      0    0      1
truck      1      0    0      1
military   MD     MD   MD     0

TABLE III: Decision costs
            MD = 1                 MD = 5                  MD = 10
        str1   str2   str3    str1   str2   str3     str1   str2   str3
Hor. 3
Res 30  3.64   3.85   3.85    11.82  12.88  12.23    15.28  14.57  14.50
Res 50  2.40   2.80   2.43    6.97   6.93   7.84     10.98  9.99   10.45
Res 70  2.45   2.32   1.88    3.44   3.99   4.04     6.14   6.48   5.10
Hor. 4
Res 30  3.58   3.46   3.52    12.28  12.62  11.90    14.48  15.91  15.59
Res 50  2.37   2.21   2.33    7.44   7.44   7.20     9.94   9.28   10.65
Res 70  1.68   1.33   1.60    3.59   3.57   3.62     6.30   5.18   5.86
Hor. 6
Res 30  3.51   3.44   3.73    11.17  11.85  12.09    15.17  14.99  13.6
Res 50  2.28   2.11   2.31    7.29   8.02   7.70     10.67  10.47  11.25
Res 70  1.43   1.38   1.44    3.60   3.73   3.84     4.91   5.09   5.94
Bounds
Res 30  3.35                  11.50                  13.85
Res 50  2.21                  6.27                   9.40
Res 70  1.32                  2.95                   4.96

TABLE IV: Simulation results for 2 homogeneous, multi-modal
sensors. str1: select the most likely pure strategy for all locations;
str2: randomize the choice of strategy per location according to
mixture probabilities; str3: select the strategy that yields the least
expected use of resources for all locations.
visibility), evaluating different versions of the RH control
algorithms and with different resource levels. The variable
“Horizon” is the total number of sensor actions allowed per
location plus one additional action for estimating the location
content. The table shows results for different resource levels
per sensor, from 30 to 70 units, and displays the lower bound
performance computed. The missed detection cost MD is
varied from 1 to 10. The results shown in Table IV represent
the average of 100 Monte Carlo simulation runs of the RH
algorithms.
The results show that using a longer horizon in the
planning improves performance only minimally, so a RH
replanning approach with a short horizon can reduce
computation time with limited performance
degradation. The results also show that the different RH
algorithms have performance close to the optimal lower
bound in most cases, with the exception being the case of
MD = 5 with 70 units of sensing resources per sensor. For
a horizon 6 plan, the longest horizon studied, the simulation
performance is close to that of the associated bound. In
terms of which strategy is preferable for converting the
mixed strategies, the results of Table IV are unclear. For
short planning horizons in the RH algorithms, the preferred
strategy appears to be to use the least resources (str3), thus
allowing for improvement from replanning. For the longer
horizons, there was no significant difference in performance
among the three strategies. To illustrate the computational
requirements of this scenario (4 states, 3 observations, 2 sen-
sors (6 actions), full sensor-overlap), the number of columns
generated by the column generation algorithm to compute a
set of mixed strategies was on the order of 10-20 columns
for the horizon 6 algorithms, which takes about 60 s on
a 2.2 GHz single-core Intel P4 machine under Linux, using
C code in 'Debug' mode (with 1000 belief points for PBVI).
Memory usage without optimizations is around 3 MB.
There are typically 4-5 planning sessions in a simulation.
Profiling indicates that roughly 80% of the computing time
                Homogeneous                  Heterogeneous
                MD = 1  MD = 5  MD = 10      MD = 1  MD = 5  MD = 10
Horizon 3
Res[150, 150]   5.69    16.93   30.38        6.34    18.15   31.23
Res[250, 250]   4.61    16.11   25.92        5.53    16.77   29.32
Res[350, 350]   4.23    15.30   21.45        5.12    16.41   27.41
Horizon 4
Res[150, 150]   5.02    16.06   20.61        5.64    16.85   20.61
Res[250, 250]   3.94    9.46    12.66        4.58    12.05   14.87
Res[350, 350]   3.35    8.58    12.47        4.28    9.41    12.65
Horizon 6
Res[150, 150]   4.62    15.66   19.56        5.27    16.20   19.56
Res[250, 250]   2.92    8.24    10.91        3.32    8.83    11.35
Res[350, 350]   2.18    4.86    7.15         2.66    6.63    9.17

TABLE V: Comparison of lower-bounds for 2 homogeneous, bi-
modal sensors vs. 2 heterogeneous sensors.
goes towards value backups in the PBVI routine and 15%
goes towards (recursively) tracing decision-trees in order to
back out (deduce) the measurement costs from hyperplane
costs. (Every node in a decision-tree / policy-graph (for
each pure strategy) has a corresponding hyperplane with
a vector of cost coefficients that represent classification +
measurement costs).
In the next set of experiments, we compare the use of
heterogeneous sensors that have different modes available. In
these experiments, the 100 locations are guaranteed to have
an object, so xi = 0 is not feasible. The prior probability
of object type for each location is πi(0) = [0 0.7 0.2 0.1]ᵀ.
Table V shows the results of experiments with sensors that
have all sensing modes, versus an experiment where one
sensor has only a low-resolution mode and the other sensor
has both high and low-resolution modes. The table shows the
lower bounds predicted by the column generation algorithm,
to illustrate the change in performance expected from the
different architectural choices of sensors. The results indicate
that specialization of one sensor can lead to significant
degradation in performance due to inefficient use of its
resources.
The next set of results explore the effect of spatial distribu-
tion of sensors. We consider experiments where there are two
homogeneous sensors which have only partially-overlapping
coverage zones. (We define a ‘visibility group’ as a set
of sensors that have a common coverage zone). Table VI
gives bounds for different percentages of overlap. Note
that, even when there is only 20% overlap, the achievable
performance is similar to that of the 100% overlap case in
Table V, indicating that proper choice of strategies can lead
to efficient sharing of resources from different sensors and
equalizing their workload. The last set of results show the
performance of the RH algorithms for three homogeneous
sensors with partial overlap and different resource levels.
The visibility groups are graphically portrayed in Fig. 3.
Table VII presents the simulated cost values averaged over
100 simulations of the different RH algorithms and the lower
bounds. The results support our previous conclusions: when
a short horizon is used in the RH algorithm, and there are
sufficient resources, the strategy that uses the least resources
is preferred as it allows for replanning when new information
                Overlap 60%                  Overlap 20%
                MD = 1  MD = 5  MD = 10      MD = 1  MD = 5  MD = 10
Horizon 3
Res[150, 150]   5.69    16.93   30.38        5.69    16.93   30.38
Res[250, 250]   4.61    16.11   25.98        4.61    16.11   25.92
Res[350, 350]   4.23    15.30   21.45        4.23    15.30   21.45
Horizon 4
Res[150, 150]   5.02    16.06   20.61        5.02    15.93   20.61
Res[250, 250]   3.94    9.46    12.66        3.94    9.46    12.66
Res[350, 350]   3.35    8.58    12.47        3.35    8.58    12.47
Horizon 6
Res[150, 150]   4.62    15.66   19.56        4.62    15.66   19.56
Res[250, 250]   2.92    8.25    10.91        2.94    8.24    10.91
Res[350, 350]   2.18    4.86    7.19         2.18    4.86    7.16

TABLE VI: Comparison of performance bounds with 2 homo-
geneous sensors with partial overlap in coverage. Only the bold
numbers are different.
Fig. 3: The 7 visibility groups for the 3 sensor experiment
indicating the number of locations in each group.
is available. If the RH algorithm uses a longer horizon, then
its performance approaches the theoretical lower bound, and
the difference in performance between the three approaches
for sampling the mixed strategy to obtain a pure strategy
is statistically insignificant. Our results suggest that RH
control with modest horizons of 2 or 3 sensor actions per
location can yield performance close to the best achievable
performance using mixed strategies. If shorter horizons are
used to reduce computation, then an approach that samples
mixed strategies by using the smallest amount of resources
is preferred. The results also show that, with proper SM,
geographically distributed sensors with limited visibility can
be coordinated to achieve equivalent performance to centrally
              MD = 1                MD = 5                MD = 10
          str1  str2  str3     str1  str2  str3     str1  str2  str3
Horizon 3
 Res 100  5.26  6.08  5.57    17.23 17.44 16.79    22.02 21.93 22.16
 Res 166  5.91  4.81  3.13    10.23 11.91  9.21    14.19 16.66 12.85
 Res 233  3.30  3.75  3.43    10.15  9.32  5.88    14.49 12.55  8.21
Horizon 4
 Res 100  5.32  5.58  5.93    17.26 16.88 16.17    21.92 20.94 21.35
 Res 166  3.42  4.07  3.24     8.63  8.00  9.04    12.05 11.71 14.08
 Res 233  3.65  3.07  3.29     5.27  7.14  5.38     8.25 10.08  7.90
Horizon 6
 Res 100  5.79  5.51  5.98    17.13 17.90 17.44    22.03 20.56 22.17
 Res 166  2.96  2.68  2.72    10.22  8.33  9.08     9.82 11.47 11.57
 Res 233  1.52  2.00  1.70     4.81  4.13  4.24     5.64  7.20  5.11
Bounds
 Res 100        4.62                15.66                19.56
 Res 166        2.92                 8.22                10.89
 Res 233        2.18                 4.87                 7.18
TABLE VII: Simulation results for 3 homogeneous sensors with
partial overlap as shown in Fig. 3.
pooled resources.
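The least-resource sampling rule discussed above can be made concrete with a small sketch. All names here are hypothetical (the three sampling strategies str1-str3 are not specified in this excerpt); the sketch only illustrates extracting a pure strategy from the support of an LP mixed-strategy solution:

```python
import random

def pure_strategy(support, rule="least_resources"):
    """Convert a mixed strategy into a pure one.

    support: list of (probability, resource_cost, plan) tuples
             describing the support of the master LP solution.
    rule:    "least_resources" keeps the cheapest plan in the
             support; "sample" draws a plan at random according
             to the LP probabilities.
    """
    if rule == "least_resources":
        return min(support, key=lambda s: s[1])[2]
    if rule == "sample":
        plans = [s[2] for s in support]
        weights = [s[0] for s in support]
        return random.choices(plans, weights=weights)[0]
    raise ValueError("unknown rule: %s" % rule)

# Toy support set: three candidate plans with LP weights 0.5/0.3/0.2
support = [(0.5, 120, "plan_A"), (0.3, 90, "plan_B"), (0.2, 150, "plan_C")]
print(pure_strategy(support))  # least-resource rule keeps "plan_B"
```

The least-resource rule is deterministic, which is why it leaves the most slack for replanning once new observations arrive.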
In terms of the computational complexity of our RH al-
gorithms, the main bottleneck is the solution of the POMDP
problems. The LPs solved in the column generation approach
are small and are solved in minimal time. Solving the
POMDPs required to generate each column (one POMDP
for each visibility group in cases with partial sensor overlap)
is tractable by virtue of the hierarchical breakdown of the
SM problem into independent subproblems. These computations
could also be accelerated on multi-core CPUs or (NVIDIA) GPUs,
since the per-group POMDPs are independent and thus highly
parallelizable.
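The loop structure just described can be sketched as follows. This is not the paper's implementation: solve_pomdp is a toy stand-in for the point-based POMDP solver, and the LP master is reduced to a dual-damping placeholder. The control flow, however, mirrors the hierarchical decomposition: price one candidate column per visibility group in parallel, and stop when no column has negative reduced cost.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_pomdp(group, duals):
    """Stand-in for the single-group POMDP column oracle. A real
    implementation would run a point-based POMDP solver priced by
    the LP duals; this toy rule returns (column, reduced_cost)."""
    reduced_cost = duals[group] - 0.5 * group
    return {"group": group}, reduced_cost

def column_generation(groups, max_iters=10):
    """Toy column-generation loop: an LP master over columns, with
    one POMDP subproblem per visibility group, priced in parallel
    because the subproblems are independent."""
    columns = []
    duals = {g: 1.0 for g in groups}
    for _ in range(max_iters):
        with ThreadPoolExecutor() as pool:
            priced = list(pool.map(lambda g: solve_pomdp(g, duals), groups))
        new = [col for col, rc in priced if rc < 0]
        if not new:
            break  # no improving column: the LP master is optimal
        columns.extend(new)
        # A real implementation re-solves the LP master here and reads
        # fresh dual prices; this toy simply damps the duals.
        duals = {g: d * 0.5 for g, d in duals.items()}
    return columns
```

With 7 visibility groups and the toy pricing rule, a single pricing round generates columns for the groups whose reduced cost is negative; in the actual algorithm the number of rounds is governed by the LP master's convergence.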
VI. CONCLUSIONS
In this paper, we introduce RH algorithms for near-
optimal, closed-loop SM with multi-modal, resource-
constrained, heterogeneous sensors. These RH algorithms
exploit a lower bound formulation developed in earlier work
that decomposes the SM optimization into a master problem,
which is addressed with linear-programming techniques, and
single location stochastic control problems that are solved
using POMDP algorithms. The resulting algorithm generates
mixed strategies for sensor plans, and the RH algorithms
convert these mixed strategies into sensor actions that satisfy
sensor resource constraints.
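The conversion from mixed strategies to executed actions fits the standard receding-horizon pattern: replan, commit to the first action of the sampled plan, observe, update the belief, repeat. A minimal generic sketch follows; all callables are hypothetical placeholders, not the paper's interfaces:

```python
def receding_horizon(belief, steps, plan, sample, observe, update):
    """Generic RH control loop; all callables are placeholders.

    plan(belief)    -> mixed strategy from the lower-bound optimization
    sample(mixed)   -> pure action sequence meeting resource constraints
    observe(action) -> measurement obtained by executing the action
    update(b, a, z) -> posterior belief after action a, observation z
    """
    history = []
    for _ in range(steps):
        mixed = plan(belief)        # LP + POMDP decomposition
        action = sample(mixed)[0]   # commit only to the first action
        z = observe(action)         # execute and measure
        belief = update(belief, action, z)
        history.append((action, z))
    return belief, history
```

Committing only to the first action at each step is what lets a short-horizon planner still exploit new information, which is consistent with the simulation finding that modest horizons suffice.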
Our simulation results show that the RH algorithms
achieve performance close to that of the theoretical lower
bounds in [21]. The results also highlight the respective
benefits of choosing a longer horizon for RH strategies and of
alternative approaches to sampling the mixed-strategy solutions.
Our simulations also show the effects of geographically
distributing sensors so that there is limited overlap in field
of view and the effects of specializing sensors by using a
restricted number of modes.
There are many interesting directions for extensions to
this work. First, one could consider the presence of object
dynamics, where objects can arrive at or depart from specific
locations. Second, one can also consider options for sensor
motion, where individual sensors can change locations and
thus observe new areas. Third, one could consider a set of
objects that have deterministic, but time-varying, visibility
profiles. Finally, one could consider approaches that reduce
the computational complexity of the resulting algorithms, ei-
ther through exploitation of parallel computing architectures
or through the use of off-line learning or other approximation
techniques.
REFERENCES
[1] B. Koopman, Search and Screening: General Principles with Histor-
ical Applications. Pergamon, New York NY, 1980.
[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, “A survey of
the search theory literature,” Naval Research Logistics, vol. 38, no. 4,
pp. 469–494, 1991.
[3] D. A. Castañón, “Optimal search strategies in dynamic hypothesis test-
ing,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 25,
no. 7, pp. 1130–1138, Jul. 1995.
[4] A. Wald, “On the efficient design of statistical investigations,” The
Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.
[5] ——, “Sequential tests of statistical hypotheses,” The Annals of
Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945. [Online].
Available: http://www.jstor.org/stable/2235829
[6] D. V. Lindley, “On a measure of the information provided by an
experiment,” Annals of Mathematical Statistics, vol. 27, pp. 986–1005,
1956.
[7] J. C. Kiefer, “Optimum experimental designs,” Journal of the Royal
Statistical Society Series B, vol. 21, pp. 272–319, 1959.
[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM,
Philadelphia, PA, 1972.
[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New
York, 1972.
[10] K. Kastella, “Discrimination gain to optimize detection and classifica-
tion,” Systems, Man and Cybernetics, Part A, IEEE Transactions on,
vol. 27, no. 1, pp. 112–116, Jan. 1997.
[11] C. Kreucher, K. Kastella, and A. O. Hero III, “Sensor management
using an active sensing approach,” Signal Processing, vol. 85, no. 3,
pp. 607–624, 2005.
[12] M. Athans, “On the determination of optimal costly
measurement strategies for linear stochastic systems,” Automatica,
vol. 8, no. 4, pp. 397–412, 1972. [Online]. Avail-
able: http://www.sciencedirect.com/science/article/B6V21-47SV18C-
5/2/dc50e03b2ec82f34c592d4056c0da466
[13] V. Krishnamurthy and R. Evans, “Hidden Markov model multiarm
bandits: A methodology for beam scheduling in multitarget tracking,”
Signal Processing, IEEE Transactions on, vol. 49, no. 12, pp. 2893–
2908, Dec. 2001.
[14] R. Washburn, M. Schneider, and J. Fox, “Stochastic dynamic program-
ming based approaches to sensor resource management,” Information
Fusion, 2002. Proceedings of the Fifth International Conference on,
pp. 608–615 vol.1, 2002.
[15] J. C. Gittins, “Bandit processes and dynamic allocation indices,”
Journal of the Royal Statistical Society. Series B (Methodological),
vol. 41, no. 2, pp. 148–177, 1979. [Online]. Available:
http://www.jstor.org/stable/2985029
[16] W. Macready and D. H. Wolpert, “Bandit problems and the explo-
ration/exploitation tradeoff,” Evolutionary Computation, IEEE Trans-
actions on, vol. 2, no. 1, pp. 2–22, Apr. 1998.
[17] C. Kreucher and A. O. Hero III, “Monte Carlo methods for sensor
management in target tracking,” in IEEE Nonlinear Statistical Signal
Processing Workshop, 2006.
[18] E. Chong, C. Kreucher, and A. Hero, “Monte-Carlo-based partially
observable Markov decision process approximations for adaptive sens-
ing,” Discrete Event Systems, 2008. WODES 2008. 9th International
Workshop on, pp. 173–180, May 2008.
[19] D. A. Castañón, A. Hero, D. Cochran, and K. Kastella, Foundations
and Applications of Sensor Management, 1st ed. Springer, 2008,
ch. 1.
[20] D. A. Castañón, “Approximate dynamic programming for sensor
management,” in Proc. 36th Conference on Decision and Control.
IEEE, 1997, pp. 1202–1207.
[21] ——, “Stochastic control bounds on sensor network performance,”
Decision and Control, 2005 and 2005 European Control Conference.
CDC-ECC ’05. 44th IEEE Conference on, pp. 4939–4944, Dec. 2005.
[22] P. C. Gilmore and R. E. Gomory, “A linear programming
approach to the cutting-stock problem,” Operations Research,
vol. 9, no. 6, pp. 849–859, 1961. [Online]. Available:
http://www.jstor.org/stable/167051
[23] G. B. Dantzig and P. Wolfe, “The decomposition algorithm for
linear programs,” Econometrica, vol. 29, no. 4, pp. 767–778, 1961.
[Online]. Available: http://www.jstor.org/stable/1911818
[24] K. A. Yost and A. R. Washburn, “The LP/POMDP marriage: Optimiza-
tion with imperfect information,” Naval Research Logistics, vol. 47,
no. 8, pp. 607–619, 2000.
[25] J. Pineau, G. Gordon, and S. Thrun, “Point-based value iteration: An
anytime algorithm for POMDPs,” in International Joint Conference on
Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.
[26] K. Kastella, “Discrimination gain for sensor management in multitar-
get detection and tracking,” in IEEE-SMC and IMACS Multiconference
CESA 1996, vol. 1, Jul. 1996, pp. 167–172.